from __future__ import print_function, unicode_literals

import re

from nltk.tree import Tree
from nltk.tag.mapping import map_tag
from nltk.tag.util import str2tuple
from nltk.compat import python_2_unicode_compatible

from nltk.metrics import accuracy as _accuracy


def accuracy(chunker, gold):
    """
    Score the accuracy of the chunker against the gold standard.
    Strip the chunk information from the gold standard and rechunk it using
    the chunker, then compute the accuracy score.

    :type chunker: ChunkParserI
    :param chunker: The chunker being evaluated.
    :type gold: list(Tree)
    :param gold: The chunk structures to score the chunker on.
    :rtype: float
    """
    gold_tags = []
    test_tags = []
    for gold_tree in gold:
        test_tree = chunker.parse(gold_tree.flatten())
        gold_tags += tree2conlltags(gold_tree)
        test_tags += tree2conlltags(test_tree)
    return _accuracy(gold_tags, test_tags)


class ChunkScore(object):
    """
    A utility class for scoring chunk parsers.  ``ChunkScore`` can
    evaluate a chunk parser's output, based on a number of statistics
    (precision, recall, f-measure, missed chunks, incorrect chunks).
    It can also combine the scores from the parsing of multiple texts;
    this makes it significantly easier to evaluate a chunk parser that
    operates one sentence at a time.

    Texts are evaluated with the ``score`` method.  The results of
    evaluation can be accessed via a number of accessor methods, such
    as ``precision`` and ``f_measure``.  A typical use of the
    ``ChunkScore`` class is::

        >>> chunkscore = ChunkScore()           # doctest: +SKIP
        >>> for correct in correct_sentences:   # doctest: +SKIP
        ...     guess = chunkparser.parse(correct.leaves())   # doctest: +SKIP
        ...     chunkscore.score(correct, guess)              # doctest: +SKIP
        >>> print('F Measure:', chunkscore.f_measure())       # doctest: +SKIP
        F Measure: 0.823

    :ivar kwargs: Keyword arguments:

        - max_tp_examples: The maximum number of actual examples of true
          positives to record.  This affects the ``correct`` member
          function: ``correct`` will not return more than this number
          of true positive examples.  This does *not* affect any of
          the numerical metrics (precision, recall, or f-measure).

        - max_fp_examples: The maximum number of actual examples of false
          positives to record.  This affects the ``incorrect`` member
          function and the ``guessed`` member function: ``incorrect``
          will not return more than this number of examples, and
          ``guessed`` will not return more than this number of true
          positive examples.  This does *not* affect any of the
          numerical metrics (precision, recall, or f-measure).

        - max_fn_examples: The maximum number of actual examples of false
          negatives to record.  This affects the ``missed`` member
          function and the ``correct`` member function: ``missed``
          will not return more than this number of examples, and
          ``correct`` will not return more than this number of true
          negative examples.  This does *not* affect any of the
          numerical metrics (precision, recall, or f-measure).

        - chunk_label: A regular expression indicating which chunks
          should be compared.  Defaults to ``'.*'`` (i.e., all chunks).

    :type _tp: set
    :ivar _tp: Set of true positives
    :type _fp: set
    :ivar _fp: Set of false positives
    :type _fn: set
    :ivar _fn: Set of false negatives

    :type _tp_num: int
    :ivar _tp_num: Number of true positives
    :type _fp_num: int
    :ivar _fp_num: Number of false positives
    :type _fn_num: int
    :ivar _fn_num: Number of false negatives.
    """
    def __init__(self, **kwargs):
        self._correct = set()
        self._guessed = set()
        self._tp = set()
        self._fp = set()
        self._fn = set()
        self._max_tp = kwargs.get('max_tp_examples', 100)
        self._max_fp = kwargs.get('max_fp_examples', 100)
        self._max_fn = kwargs.get('max_fn_examples', 100)
        self._chunk_label = kwargs.get('chunk_label', '.*')
        self._tp_num = 0
        self._fp_num = 0
        self._fn_num = 0
        self._count = 0
        self._tags_correct = 0.0
        self._tags_total = 0.0

        self._measuresNeedUpdate = False

    def _updateMeasures(self):
        # Statistics are recomputed lazily, only when an accessor needs them.
        if self._measuresNeedUpdate:
            self._tp = self._guessed & self._correct
            self._fn = self._correct - self._guessed
            self._fp = self._guessed - self._correct
            self._tp_num = len(self._tp)
            self._fp_num = len(self._fp)
            self._fn_num = len(self._fn)
            self._measuresNeedUpdate = False

    def score(self, correct, guessed):
        """
        Given a correctly chunked sentence, score another chunked
        version of the same sentence.

        :type correct: chunk structure
        :param correct: The known-correct ("gold standard") chunked
            sentence.
        :type guessed: chunk structure
        :param guessed: The chunked sentence to be scored.
        """
        self._correct |= _chunksets(correct, self._count, self._chunk_label)
        self._guessed |= _chunksets(guessed, self._count, self._chunk_label)
        self._count += 1
        self._measuresNeedUpdate = True
        # Keep track of per-tag accuracy (if possible).
        try:
            correct_tags = tree2conlltags(correct)
            guessed_tags = tree2conlltags(guessed)
        except ValueError:
            # Nested chunk structures cannot be encoded as CoNLL tags, so
            # tree2conlltags() raises ValueError; skip tag accuracy for them.
            correct_tags = guessed_tags = ()
        self._tags_total += len(correct_tags)
        self._tags_correct += sum(1 for (t, g) in zip(guessed_tags, correct_tags)
                                  if t == g)

    def accuracy(self):
        """
        Return the overall tag-based accuracy for all texts that have
        been scored by this ``ChunkScore``, using the IOB (conll2000)
        tag encoding.

        :rtype: float
        """
        if self._tags_total == 0:
            return 1
        return self._tags_correct / self._tags_total

    def precision(self):
        """
        Return the overall precision for all texts that have been
        scored by this ``ChunkScore``.

        :rtype: float
        """
        self._updateMeasures()
        div = self._tp_num + self._fp_num
        if div == 0:
            return 0
        else:
            return float(self._tp_num) / div

    def recall(self):
        """
        Return the overall recall for all texts that have been
        scored by this ``ChunkScore``.

        :rtype: float
        """
        self._updateMeasures()
        div = self._tp_num + self._fn_num
        if div == 0:
            return 0
        else:
            return float(self._tp_num) / div

    def f_measure(self, alpha=0.5):
        """
        Return the overall F measure for all texts that have been
        scored by this ``ChunkScore``.

        :param alpha: the relative weighting of precision and recall.
            Larger alpha biases the score towards the precision value,
            while smaller alpha biases the score towards the recall
            value.  ``alpha`` should have a value in the range [0,1].
        :type alpha: float
        :rtype: float
        """
        self._updateMeasures()
        p = self.precision()
        r = self.recall()
        if p == 0 or r == 0:
            return 0
        return 1 / (alpha / p + (1 - alpha) / r)

    def missed(self):
        """
        Return the chunks which were included in the
        correct chunk structures, but not in the guessed chunk
        structures, listed in input order.

        :rtype: list of chunks
        """
        self._updateMeasures()
        chunks = list(self._fn)
        return [c[1] for c in chunks]  # discard position information

    def incorrect(self):
        """
        Return the chunks which were included in the guessed chunk structures,
        but not in the correct chunk structures, listed in input order.

        :rtype: list of chunks
        """
        self._updateMeasures()
        chunks = list(self._fp)
        return [c[1] for c in chunks]  # discard position information

    def correct(self):
        """
        Return the chunks which were included in the correct
        chunk structures, listed in input order.

        :rtype: list of chunks
        """
        chunks = list(self._correct)
        return [c[1] for c in chunks]  # discard position information

    def guessed(self):
        """
        Return the chunks which were included in the guessed
        chunk structures, listed in input order.

        :rtype: list of chunks
        """
        chunks = list(self._guessed)
        return [c[1] for c in chunks]  # discard position information

    def __len__(self):
        self._updateMeasures()
        return self._tp_num + self._fn_num

    def __repr__(self):
        """
        Return a concise representation of this ``ChunkScore``.

        :rtype: str
        """
        return '<ChunkScoring of ' + repr(len(self)) + ' chunks>'

    def __str__(self):
        """
        Return a verbose representation of this ``ChunkScore``.
        This representation includes the IOB accuracy, precision,
        recall, and f-measure scores.  For other information about
        the score, use the accessor methods (e.g., ``missed()`` and
        ``incorrect()``).

        :rtype: str
        """
        return ("ChunkParse score:\n" +
                ("    IOB Accuracy: %5.1f%%\n" % (self.accuracy() * 100)) +
                ("    Precision:    %5.1f%%\n" % (self.precision() * 100)) +
                ("    Recall:       %5.1f%%\n" % (self.recall() * 100)) +
                ("    F-Measure:    %5.1f%%" % (self.f_measure() * 100)))


# Extract the chunks of one sentence, giving each a unique id made of the
# sentence count and the absolute position of the chunk's first word.
def _chunksets(t, count, chunk_label):
    pos = 0
    chunks = []
    for child in t:
        if isinstance(child, Tree):
            if re.match(chunk_label, child.label()):
                chunks.append(((count, pos), child.freeze()))
            pos += len(child.leaves())
        else:
            pos += 1
    return set(chunks)


def tagstr2tree(s, chunk_label="NP", root_label="S", sep='/',
                source_tagset=None, target_tagset=None):
    """
    Divide a string of bracketed tagged text into
    chunks and unchunked tokens, and produce a Tree.
    Chunks are marked by square brackets (``[...]``).  Words are
    delimited by whitespace, and each word should have the form
    ``text/tag``.  Words that do not contain a slash are
    assigned a ``tag`` of None.

    :param s: The string to be converted
    :type s: str
    :param chunk_label: The label to use for chunk nodes
    :type chunk_label: str
    :param root_label: The label to use for the root of the tree
    :type root_label: str
    :param sep: The separator between a word and its tag.  If None,
        each token's text is added to the tree without being split.
    :type sep: str
    :param source_tagset: If both this and ``target_tagset`` are given,
        tags are mapped from this tagset to ``target_tagset`` with
        ``map_tag``.
    :param target_tagset: The tagset to map tags into.
    :rtype: Tree
    """
    WORD_OR_BRACKET = re.compile(r'\[|\]|[^\[\]\s]+')

    stack = [Tree(root_label, [])]
    for match in WORD_OR_BRACKET.finditer(s):
        text = match.group()
        if text[0] == '[':
            if len(stack) != 1:
                raise ValueError('Unexpected [ at char %d' % match.start())
            chunk = Tree(chunk_label, [])
            stack[-1].append(chunk)
            stack.append(chunk)
        elif text[0] == ']':
            if len(stack) != 2:
                raise ValueError('Unexpected ] at char %d' % match.start())
            stack.pop()
        else:
            if sep is None:
                stack[-1].append(text)
            else:
                word, tag = str2tuple(text, sep)
                if source_tagset and target_tagset:
                    tag = map_tag(source_tagset, target_tagset, tag)
                stack[-1].append((word, tag))

    if len(stack) != 1:
        raise ValueError('Expected ] at char %d' % len(s))
    return stack[0]


_LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')


def conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), root_label="S"):
    """
    Return a chunk structure for a single sentence
    encoded in the given CoNLL 2000 style string.
    This function converts a CoNLL IOB string into a tree.
    It uses the specified chunk types
    (defaults to NP, PP and VP), and creates a tree rooted at a node
    labeled S (by default).  Tokens whose chunk type is not in
    ``chunk_types`` are treated as outside any chunk (``O``).

    :param s: The CoNLL string to be converted.
    :type s: str
    :param chunk_types: The chunk types to be converted.
    :type chunk_types: tuple
    :param root_label: The node label to use for the root.
    :type root_label: str
    :rtype: Tree
    """
    stack = [Tree(root_label, [])]

    for lineno, line in enumerate(s.split('\n')):
        if not line.strip():
            continue

        # Decode the line.
        match = _LINE_RE.match(line)
        if match is None:
            raise ValueError('Error on line %d' % lineno)
        (word, tag, state, chunk_type) = match.groups()

        # If it's a chunk type we don't care about, treat it as O.
        if chunk_types is not None and chunk_type not in chunk_types:
            state = 'O'

        # For "Begin"/"Outside", finish any completed chunks; also do so
        # for "Inside" tokens whose type doesn't match the open chunk.
        mismatch_I = state == 'I' and chunk_type != stack[-1].label()
        if state in 'BO' or mismatch_I:
            if len(stack) == 2:
                stack.pop()

        # For "Begin" (or a mismatched "Inside"), start a new chunk.
        if state == 'B' or mismatch_I:
            chunk = Tree(chunk_type, [])
            stack[-1].append(chunk)
            stack.append(chunk)

        # Add the new word token.
        stack[-1].append((word, tag))

    return stack[0]


def tree2conlltags(t):
    """
    Return a list of 3-tuples containing ``(word, tag, IOB-tag)``.
    Convert a tree to the CoNLL IOB tag format.

    :param t: The tree to be converted.
    :type t: Tree
    :rtype: list(tuple)
    :raises ValueError: If a chunk contains a nested subtree.
    """
    tags = []
    for child in t:
        try:
            category = child.label()
            prefix = "B-"
            for contents in child:
                if isinstance(contents, Tree):
                    raise ValueError("Tree is too deeply nested to be printed in CoNLL format")
                tags.append((contents[0], contents[1], prefix + category))
                prefix = "I-"
        except AttributeError:
            # A (word, tag) tuple outside any chunk has no label().
            tags.append((child[0], child[1], "O"))
    return tags


def conlltags2tree(sentence, chunk_types=('NP', 'PP', 'VP'),
                   root_label='S', strict=False):
    """
    Convert the CoNLL IOB format to a tree.

    ``sentence`` is a sequence of ``(word, postag, chunktag)`` triples.
    When ``strict`` is False, a missing chunk tag is treated as ``O`` and
    an ``I-`` tag that does not continue a chunk of the same type starts
    a new chunk; when ``strict`` is True, both cases raise ValueError.
    The ``chunk_types`` argument is not used by this function.
    """
    tree = Tree(root_label, [])
    for (word, postag, chunktag) in sentence:
        if chunktag is None:
            if strict:
                raise ValueError("Bad conll tag sequence")
            else:
                # Treat as O.
                tree.append((word, postag))
        elif chunktag.startswith('B-'):
            tree.append(Tree(chunktag[2:], [(word, postag)]))
        elif chunktag.startswith('I-'):
            if (len(tree) == 0 or not isinstance(tree[-1], Tree) or
                    tree[-1].label() != chunktag[2:]):
                if strict:
                    raise ValueError("Bad conll tag sequence")
                else:
                    # Treat as B-*.
                    tree.append(Tree(chunktag[2:], [(word, postag)]))
            else:
                tree[-1].append((word, postag))
        elif chunktag == 'O':
            tree.append((word, postag))
        else:
            raise ValueError("Bad conll tag %r" % chunktag)
    return tree


def tree2conllstr(t):
    """
    Return a multiline string where each line contains a word, tag and IOB tag.
    Convert a tree to the CoNLL IOB string format.

    :param t: The tree to be converted.
    :type t: Tree
    :rtype: str
    """
    lines = [" ".join(token) for token in tree2conlltags(t)]
    return '\n'.join(lines)


_IEER_DOC_RE = re.compile(r'<DOC>\s*'
                          r'(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?'
                          r'(<DOCTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?'
                          r'(<DATE_TIME>\s*(?P<date_time>.+?)\s*</DATE_TIME>\s*)?'
                          r'<BODY>\s*'
                          r'(<HEADLINE>\s*(?P<headline>.+?)\s*</HEADLINE>\s*)?'
                          r'<TEXT>(?P<text>.*?)</TEXT>\s*'
                          r'</BODY>\s*</DOC>\s*', re.DOTALL)

_IEER_TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')


def _ieer_read_text(s, root_label):
    stack = [Tree(root_label, [])]
    # s will be None if there is no headline in the text;
    # return the empty list in place of a Tree.
    if s is None:
        return []
    for piece_m in re.finditer(r'<[^>]+>|[^\s<]+', s):
        piece = piece_m.group()
        try:
            if piece.startswith('<b_'):
                m = _IEER_TYPE_RE.match(piece)
                if m is None:
                    print('XXXX', piece)
                chunk = Tree(m.group('type'), [])
                stack[-1].append(chunk)
                stack.append(chunk)
            elif piece.startswith('<e_'):
                stack.pop()
            else:
                stack[-1].append(piece)
        except (IndexError, ValueError):
            raise ValueError('Bad IEER string (error at character %d)' %
                             piece_m.start())
    if len(stack) != 1:
        raise ValueError('Bad IEER string')
    return stack[0]
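

# Illustrative sketch only: this helper is hypothetical and is not part of
# NLTK's API. It restates how ChunkScore derives its metrics. Chunks are
# compared as set members, so true positives, false positives and false
# negatives come from set intersection and difference (as in
# _updateMeasures). The F measure is 1 / (alpha/p + (1 - alpha)/r) (as in
# f_measure). The chunks are assumed to be hashable
# (sentence_index, start, label) tuples standing in for the
# ((count, pos), frozen_tree) pairs that _chunksets() builds.
def _sketch_chunk_prf(correct, guessed, alpha=0.5):
    """Return (precision, recall, f_measure) for two collections of chunks."""
    correct, guessed = set(correct), set(guessed)
    tp = len(guessed & correct)   # chunks found in both
    fp = len(guessed - correct)   # guessed but absent from the gold standard
    fn = len(correct - guessed)   # in the gold standard but missed
    precision = float(tp) / (tp + fp) if tp + fp else 0
    recall = float(tp) / (tp + fn) if tp + fn else 0
    if precision == 0 or recall == 0:
        return precision, recall, 0
    # alpha = 1 gives precision, alpha = 0 gives recall.
    return precision, recall, 1 / (alpha / precision + (1 - alpha) / recall)

# Example (hypothetical data): two of the three guessed chunks match the gold
# standard, so precision, recall and the balanced F measure are all 2/3:
#     gold = [(0, 0, 'NP'), (0, 3, 'NP'), (0, 5, 'VP')]
#     guess = [(0, 0, 'NP'), (0, 3, 'NP'), (0, 4, 'PP')]
#     _sketch_chunk_prf(gold, guess)   # -> approximately (0.667, 0.667, 0.667)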