U C^k @sddlmZddlmZddlZddlZddlZddlZed\ ZZ Z Z Z Z ZZZZGdddeZddd Zdd d Zd d ZddZddZddZedkreGdddejZdS))division)print_functionN c@s eZdZdS)UDErrorN)__name__ __module__ __qualname__r r 9/tmp/pip-install-6_kvzl1k/spacy/bin/ud/conll17_ud_eval.pyrasrTcsJGddd}Gddd}Gddd}|d\}d|}d 7|sTq4|d }dkr|d rrq<j||djtj|sfd d jdD] }|q|rtddjdDd krtd|jd_ dq<| d}t|dkr,td |d|t kr.UDRepresentationcSsg|_g|_g|_g|_dSN) characterstokenswords sentencesselfr r r __init__hsz.load_conllu..UDRepresentation.__init__Nrrrrr r r r UDRepresentationgsrc@s0eZdZddZeddZddZddZd S) zload_conllu..UDSpancSs||_||_||_dSr )startendr )rrrr r r r rssz$load_conllu..UDSpan.__init__cSsd|j|j|jS)N)joinr rrrr r r textzsz load_conllu..UDSpan.textcSs|jSr rrr r r __str__~sz#load_conllu..UDSpan.__str__cSs|jSr rrr r r __repr__sz$load_conllu..UDSpan.__repr__N)rrrrpropertyrrrr r r r UDSpanrs  rc@seZdZddZdS)zload_conllu..UDWordcSs4||_||_||_d|_|tdd|jt<dS)N:r)spancolumns is_multiwordparentDEPRELsplit)rr r!r"r r r rs z$load_conllu..UDWord.__init__Nrr r r r UDWordsr&)rNr #cs|jdkrtd|jdkr|t|jt}|tjkrRtd|jt|r|j|d}d|_|||_dS)NZ remappingzThere is a cycle in a sentencez1Line {}: HEAD '{}' points outside of the sentencer')r#rintr!HEADlenrformat)wordheadr#linenum process_wordZsentence_startZudr r r2s  z!load_conllu..process_wordcSsg|]}|jdkr|qSr )r#.0r.r r r s zload_conllu..z&There are multiple roots in a sentence rzCThe CoNLL-U line {} does not contain 10 tab-separated columns: '{}'. rz5There is an empty FORM in the CoNLL-U file -- line %d-z%Cannot parse multi-word token ID '{}'T)r"zCannot parse word ID '{}'z3Incorrect word ID '{}' for word '{}', expected '{}'zCannot parse HEAD '{}'zHEAD cannot be negativeFz-The CoNLL-U file does not end with empty line)readlinerstrip startswithrappendr r,rrrr%r-IDFORMreplaceextendr mapr*rangeprintr+)file check_parserrr&indexliner.r!rr_Z word_lineZ word_columnsZword_idZhead_idr r0r load_conllues          " "(  rKc sGdddGdddGfdddddfd d }d d ffd d }ddddfddfddfdd}|j|jkrd}|j||j|kr|d7}qtddd|j||dd|j||d||j|j}|r||j|j||j|j||d||dd ||d d ||d!d ||d"d ||d#d ||d$d ||d%d d& } n>||j|j||j|j||d||d'd ||d(d d)} dk rfd*d+} ||d,d | | d-<| S).Nc@seZdZdddZdS)zevaluate..ScoreNcSs|r ||nd|_|r||nd|_||r.Score.__init__)NNNrr r r r ScoresrTc@seZdZddZdS)zevaluate..AlignmentWordcSs||_||_d|_d|_dSr ) gold_word system_word gold_parentsystem_parent_gold_alignedrrUrVr r r rsz(evaluate..AlignmentWord.__init__Nrr r r r AlignmentWordsrZcs(eZdZddZfddZddZdS)zevaluate..AlignmentcSs||_||_g|_i|_dSr ) gold_words system_words matched_wordsmatched_words_map)rr[r\r r r rsz$evaluate..Alignment.__init__cs |j||||j|<dSr )r]r>r^rYrZr r append_aligned_wordssz0evaluate..Alignment.append_aligned_wordscSsN|jD]B}|jjdk r|jjnd|_|jjdk rB|j|jjdnd|_qdS)Nr)r]rUr#rWrVr^getrX)rrr r r fill_parentss   z(evaluate..Alignment.fill_parentsN)rrrrr`rbr r_r r Alignment s rccSs*tjdkr"t|tr"|dS|S)Nrutf-8)sys version_info isinstancestrdecodelowerrr r r rl szevaluate..lowerc sd\}}}g}g}d}d}d} |t|kr|t|kr|dkrN||dnd} |dkrf||dnd} ||j||jkr|s|d7}|t| |d7}q||j||jkr| s|d7}|t| |d7}q|||j||jk7}||j||jkr0|t||d} d}n>||j||jkrf|t||d}d} nd} d}|d7}|d7}qt|t||d||S)N)rrrrFr'T)r,rr>rjstripr) Z gold_spansZ system_spansrSgisirQrRZcomboZprevious_end_si_earlierZprevious_end_gi_earlierZ previous_siZ previous_girTr r spans_score%sB    zevaluate..spans_scorecSsdSNr'r )wr r r Mzevaluate..c sd\}}}}|jD]}|||7}q|jD]}|||7}q*|jD]}|||j7}qB|dkrj|||S|jD].}||j|j||j|jkrp|||j7}qp||||S)N)rrrr)r[r\r]rUrWrVrX) alignmentZkey_fnZ weight_fngoldsystemZalignedrSr.rrpr r alignment_scoreMs      z!evaluate..alignment_scorecSs:|t|krdS||jr*||jj|kS||jj|kS)NT)r,r"r rr)rimultiword_span_endr r r beyond_endcs   zevaluate..beyond_endcSs|jr|jj|kr|jjS|Sr )r"r r)r.r{r r r extend_endjszevaluate..extend_endcs||jrB||jj}||jsx||jj||jjkrx|d7}n6||jj}||jsx||jj||jjkrx|d7}||}}|||r|||s|t|kr|t|ks||jj||jjkr|||}|d7}q|||}|d7}q||||fSrr)r"r rrr,)r[r\rnror{gsss)r|r}r r find_multiword_spanos&  "  "     z%evaluate..find_multiword_spanc s4fddt||D}tt||D]}ttD]}|||jt||jtkrd|d||kr|dkr||d|dnd|||<t||||d||kr||d|nd|||<t||||dkr|||dnd|||<qBq,|S)Ncsg|]}dgqS)rr )r4rzrorr r r5sz1evaluate..compute_lcs..r'r)rDreversedr!r@max) r[r\rnror~rlcsgs)rlrr compute_lcss,D:@zevaluate..compute_lcsc s||}d\}}|t|kr|t|kr||jsD||jrH||||\}}}}||kr||kr̈||||||}d\}} | ||kr|||kr̈||| jt|||jtkr|||| |||| d7} |d7}q|| || d||kr*|| d|ndkr<| d7} q|d7}qq||jj||jjf||jj||jjfkr||||||d7}|d7}q||jj||jjkr|d7}q|d7}q||S)N)rrr'r) r,r"r!r@r`r rrrb) r[r\rvrnror~rrrr)rcrrrlr r align_wordss2 , 4  2   zevaluate..align_wordsrr'zDThe concatenation of tokens in gold file and in system file differ! zFFirst 20 differing characters in gold file: '{}' and system file: '{}'rcSs |jtSr )r!UPOSrsr#r r r rtrucSs |jtSr )r!XPOSrr r r rtrucSs |jtSr r!FEATSrr r r rtrucSs|jt|jt|jtfSr )r!rrrrr r r rtrucSs |jtSr r!LEMMArr r r rtrucSs|Sr r rr r r rtrucSs||jtfSr r!r$rr r r rtru) Tokens SentencesWordsrrFeatsAllTagsLemmasUASLAScSs |jtSr rrr r r rtrucSs |jtSr rrr r r rtru)rrrrrcs|jtdS)Ng?)rar!r$)r.)deprel_weightsr r weighted_lasszevaluate..weighted_lascSs||jtfSr rrr r r rtru WeightedLAS)r rr-rrr r) gold_ud system_udrrGrqryrrHrvresultrr ) rcrZrTr|rrr}rrlr evaluates\  ( '                  rcCsn|dkr dSi}|D]T}|ds|s,q|d}t|dkrTtd|t|d||d<q|S)Nr)r(rLzBExpected two columns in the UD Relations weights file on line '{}'r'r)r=rmr<r%r, ValueErrorr-float)Z weights_filerrIr!r r r load_deprel_weightss rcCs.t|fdditjdkrddini}t|S)Nmoderrdencodingrf)openrgrhrK)path_filer r r load_conllu_files&rcCs*t|j}t|j}t|j}t|||Sr )r gold_file system_filerweightsr)argsrrrr r r evaluate_wrappers   rc Cs@t}|jdtdd|jdtdd|jddtddd d d |jd d dddd|}|jdk rv|jsvd|_t|}|jst d d|dj nddddddddddg }|jdk r| dt d t d!|D]\}t d" |d||j d||jd||j ||jdk r2d# d||jnd$qdS)%Nrz,Name of the CoNLL-U file with the gold data.)typehelprz1Name of the CoNLL-U file with the predicted data.z --weightsz-wrZdeprel_weights_filezKCompute WeightedLAS using given weights for Universal Dependency Relations.)rdefaultmetavarrz --verbosez-vrcountzPrint all metrics.)ractionrr'zLAS F1 Score: {:.2f}drrrrrrrrrrrz:Metrics | Precision | Recall | F1 Score | AligndAccz;-----------+-----------+-----------+-----------+-----------z&{:11}|{:10.2f} |{:10.2f} |{:10.2f} |{}z{:10.2f}r)argparseArgumentParser add_argumentrjFileType parse_argsrverboserrEr-rOr>rMrNrP)parserrZ evaluationmetricsZmetricr r r main sB      &r__main__c@sHeZdZeddZddZddZddZd d Zd d Z d dZ dS) TestAlignmentc Csgd}}|D]}|d}t|dkrR|d7}|d||dt|dkq|d|d|t|d|d|ddD](}|d7}|d||t|dkqqttjdkrtj ntj d |dgS) zKPrepare fake CoNLL-U files with fake HEAD to prevent multiple roots errors.rr9r'z{} {} _ _ _ _ {} _ _ _z{}-{} {} _ _ _ _ _ _ _ _Nrd ) r%r,r>r-r*rKrgrhioStringIOBytesIOr)rlinesZ num_wordsrspartspartr r r _load_words8s   "( zTestAlignment._load_wordscCs |tt||||dSr )Z assertRaisesrrr)rrwrxr r r _test_exceptionHszTestAlignment._test_exceptioncCs|t||||}tdd|D}tdd|D}||dj|dj|djf||||d|||fdS)Ncss&|]}tdt|ddVqdSr'r9Nrr,r%r3r r r Msz)TestAlignment._test_ok..css&|]}tdt|ddVqdSrrr3r r r rNsrrL)rrsumZ assertEqualrMrNrO)rrwrxrSrr[r\r r r _test_okKs zTestAlignment._test_okcCs|dgdgdS)Nab)rrr r r test_exceptionRszTestAlignment.test_exceptioncCs0|dgdgd|dddgdddgddS)Nrr'rcrerrr r r test_equalUszTestAlignment.test_equalcCsb|dgdddgd|dddgddddgd|d gd d gd|dd gdd dgddS)Nz abc a b crrrrebc b cdz abcd a b c dab a bcd c dzde d ez bcd b c derrr r r test_equal_with_multiwordYsz'TestAlignment.test_equal_with_multiwordcCs|dgddddgd|ddgddddgd|dd dgddddgd |dd dgddd gd |d dgdddgd|ddgdd dgd |dd dgddgddS)NZabcdrrrrrabcr'ZbcrLrZcdz abc a BX cz def d EX frrzef e frzcd bc dzab AX BXzcd CX arrr r r test_alignment_szTestAlignment.test_alignmentN) rrr staticmethodrrrrrrrr r r r r7s r)T)NT) __future__rrrrrgZunittestrDr?r@rrrrr+r$ZDEPSZMISC ExceptionrrKrrrrrrZTestCaserr r r r Us$    p (