ó <¿CVc@ sqdZddlmZddlZddlmZddlmZddlm Z d„Z d„Z d „Z dS( sBLEU score implementation.iÿÿÿÿ(tdivisionN(t word_tokenize(tCounter(tngramsc s€‡‡fd†t|ddƒDƒ}y&tjd„t||ƒDƒƒ}Wntk r_dSXtˆˆƒ}|tj|ƒS(s$ Calculate BLEU score (Bilingual Evaluation Understudy) from Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. "BLEU: a method for automatic evaluation of machine translation." In Proceedings of ACL. http://www.aclweb.org/anthology/P02-1040.pdf >>> weights = [0.25, 0.25, 0.25, 0.25] >>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', ... 'ensures', 'that', 'the', 'military', 'always', ... 'obeys', 'the', 'commands', 'of', 'the', 'party'] >>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops', ... 'forever', 'hearing', 'the', 'activity', 'guidebook', ... 'that', 'party', 'direct'] >>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', ... 'ensures', 'that', 'the', 'military', 'will', 'forever', ... 'heed', 'Party', 'commands'] >>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which', ... 'guarantees', 'the', 'military', 'forces', 'always', ... 'being', 'under', 'the', 'command', 'of', 'the', ... 'Party'] >>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', ... 'army', 'always', 'to', 'heed', 'the', 'directions', ... 'of', 'the', 'party'] >>> bleu([reference1, reference2, reference3], hypothesis1, weights) 0.5045666840058485 >>> bleu([reference1, reference2, reference3], hypothesis2, weights) 0 :param references: reference sentences :type references: list(list(str)) :param hypothesis: a hypothesis sentence :type hypothesis: list(str) :param weights: weights for unigrams, bigrams, trigrams and so on :type weights: list(float) c3 s'|]\}}tˆˆ|ƒVqdS(N(t_modified_precision(t.0tit_(t hypothesist references(sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/translate/bleu_score.pys @ststartics s(|]\}}|tj|ƒVqdS(N(tmathtlog(Rtwtp_n((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/translate/bleu_score.pys Esi(t enumerateR tfsumtzipt ValueErrort_brevity_penaltytexp(R Rtweightstp_nststbp((RR sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/translate/bleu_score.pytbleus,& c sÁtt||ƒƒ}|sdSi‰xW|D]O}tt||ƒƒ}x1|D])}tˆj|dƒ||ƒˆ|>> reference1 = 'the cat is on the mat'.split() >>> reference2 = 'there is a cat on the mat'.split() >>> hypothesis1 = 'the the the the the the the'.split() >>> references = [reference1, reference2] >>> _modified_precision(references, hypothesis1, n=1) 0.2857142857142857 In the modified n-gram precision, a reference word will be considered exhausted after a matching hypothesis word is identified, e.g. >>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', ... 'ensures', 'that', 'the', 'military', 'will', ... 'forever', 'heed', 'Party', 'commands'] >>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which', ... 'guarantees', 'the', 'military', 'forces', 'always', ... 'being', 'under', 'the', 'command', 'of', 'the', ... 'Party'] >>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', ... 'army', 'always', 'to', 'heed', 'the', 'directions', ... 'of', 'the', 'party'] >>> hypothesis = 'of the'.split() >>> references = [reference1, reference2, reference3] >>> _modified_precision(references, hypothesis, n=1) 1.0 >>> _modified_precision(references, hypothesis, n=2) 1.0 An example of a normal machine translation hypothesis: >>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', ... 'ensures', 'that', 'the', 'military', 'always', ... 'obeys', 'the', 'commands', 'of', 'the', 'party'] >>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops', ... 'forever', 'hearing', 'the', 'activity', 'guidebook', ... 'that', 'party', 'direct'] >>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', ... 'ensures', 'that', 'the', 'military', 'will', ... 'forever', 'heed', 'Party', 'commands'] >>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which', ... 'guarantees', 'the', 'military', 'forces', 'always', ... 'being', 'under', 'the', 'command', 'of', 'the', ... 'Party'] >>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', ... 'army', 'always', 'to', 'heed', 'the', 'directions', ... 'of', 'the', 'party'] >>> references = [reference1, reference2, reference3] >>> _modified_precision(references, hypothesis1, n=1) 0.9444444444444444 >>> _modified_precision(references, hypothesis2, n=1) 0.5714285714285714 >>> _modified_precision(references, hypothesis1, n=2) 0.5882352941176471 >>> _modified_precision(references, hypothesis2, n=2) 0.07692307692307693 :param references: A list of reference translations. :type references: list(list(str)) :param hypothesis: A hypothesis translation. :type hypothesis: list(str) :param n: The ngram order. :type n: int ic3 s.|]$\}}|t|ˆ|ƒfVqdS(N(tmin(Rtngramtcount(t max_counts(sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/translate/bleu_score.pys ¦s(RRtmaxtgettdicttitemstsumtvalues(R Rtntcountst referencetreference_countsRtclipped_counts((Rsk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/translate/bleu_score.pyRNsM  +"c s`t|ƒ‰d„|Dƒ}t|d‡fd†ƒ}ˆ|krGdStjd|ˆƒSdS(sË Calculate brevity penalty. As the modified n-gram precision still has the problem from the short length sentence, brevity penalty is used to modify the overall BLEU score according to length. An example from the paper. There are three references with length 12, 15 and 17. And a concise hypothesis of the length 12. The brevity penalty is 1. >>> reference1 = list('aaaaaaaaaaaa') # i.e. ['a'] * 12 >>> reference2 = list('aaaaaaaaaaaaaaa') # i.e. ['a'] * 15 >>> reference3 = list('aaaaaaaaaaaaaaaaa') # i.e. ['a'] * 17 >>> hypothesis = list('aaaaaaaaaaaa') # i.e. ['a'] * 12 >>> references = [reference1, reference2, reference3] >>> _brevity_penalty(references, hypothesis) 1.0 In case a hypothesis translation is shorter than the references, penalty is applied. >>> references = [['a'] * 28, ['a'] * 28] >>> hypothesis = ['a'] * 12 >>> _brevity_penalty(references, hypothesis) 0.2635971381157267 The length of the closest reference is used to compute the penalty. If the length of a hypothesis is 12, and the reference lengths are 13 and 2, the penalty is applied because the hypothesis length (12) is less then the closest reference length (13). >>> references = [['a'] * 13, ['a'] * 2] >>> hypothesis = ['a'] * 12 >>> _brevity_penalty(references, hypothesis) 0.9200444146293233 The brevity penalty doesn't depend on reference order. More importantly, when two reference sentences are at the same distance, the shortest reference sentence length is used. >>> references = [['a'] * 13, ['a'] * 11] >>> hypothesis = ['a'] * 12 >>> bp1 = _brevity_penalty(references, hypothesis) >>> bp2 = _brevity_penalty(reversed(references),hypothesis) >>> bp1 == bp2 == 1 True A test example from mteval-v13a.pl (starting from the line 705): >>> references = [['a'] * 11, ['a'] * 8] >>> hypothesis = ['a'] * 7 >>> _brevity_penalty(references, hypothesis) 0.8668778997501817 >>> references = [['a'] * 11, ['a'] * 8, ['a'] * 6, ['a'] * 7] >>> hypothesis = ['a'] * 7 >>> _brevity_penalty(references, hypothesis) 1.0 :param references: A list of reference translations. :type references: list(list(str)) :param hypothesis: A hypothesis translation. :type hypothesis: list(str) cs s|]}t|ƒVqdS(N(tlen(RR&((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/translate/bleu_score.pys ístkeyc st|ˆƒ|fS(N(tabs(tref_len(tc(sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/translate/bleu_score.pytîsiN(R)RR R(R Rtref_lenstr((R-sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/translate/bleu_score.pyR«s A  ( t__doc__t __future__RR t nltk.tokenizeRt nltk.compatRt nltk.utilRRRR(((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/translate/bleu_score.pyt s  : ]