ó <¿CVc@sddlmZddlmZddlmZddlmZddlm Z m Z m Z ddl Tddl Tdefd„ƒYZe d efd „ƒYƒZd efd „ƒYZe d efd„ƒYƒZe defd„ƒYƒZe e defd„ƒYƒƒZdS(iÿÿÿÿ(tunicode_literals(tTree(t ElementTree(traise_unorderable_types(ttotal_orderingtpython_2_unicode_compatiblet string_types(t*tNombankCorpusReadercBsteZdZdd d d dd„Zd d„Zd d„Zd„Zd„Zd d„Z d „Z d „d „Z RS( u” Corpus reader for the nombank corpus, which augments the Penn Treebank with information about the predicate argument structure of every noun instance. The corpus consists of two parts: the predicate-argument annotations themselves, and a set of "frameset files" which define the argument labels used by the annotations, on a per-noun basis. Each "frameset file" contains one or more predicates, such as ``'turn'`` or ``'turn_on'``, each of which is divided into coarse-grained word senses called "rolesets". For each "roleset", the frameset file provides descriptions of the argument roles, along with examples. uuutf8cCs~t|tƒr!t||ƒ}nt|ƒ}tj||||g||ƒ||_||_||_||_ ||_ dS(uÚ :param root: The root directory for this corpus. :param nomfile: The name of the file containing the predicate- argument annotations (relative to ``root``). :param framefiles: A list or regexp specifying the frameset fileids for this corpus. :param parse_fileid_xform: A transform that should be applied to the fileids in this corpus. This should be a function of one argument (a fileid) that returns a string (the new fileid). :param parse_corpus: The corpus containing the parse trees corresponding to this corpus. These parse trees are necessary to resolve the tree pointers used by nombank. N( t isinstanceRtfind_corpus_fileidstlistt CorpusReadert__init__t_nomfilet _framefilest _nounsfilet_parse_fileid_xformt _parse_corpus(tselftroottnomfilet framefilest nounsfiletparse_fileid_xformt parse_corpustencoding((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR s     cCsb|dkr|j}nt|tjƒr6|g}ntg|D]}|j|ƒjƒ^q@ƒS(uV :return: the text contents of the given fileids, as a single string. N(tNonet_fileidsR tcompatRtconcattopentread(Rtfileidstf((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pytraw@s   cs_i‰ˆdk r(‡fd†ˆdOsuinstance_filtercsˆj|ˆS(N(t_read_instance_block(tstream(tkwargsR(sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR&QsRN(RtStreamBackedCorpusViewtabspathRR(RR$((R$R)Rsl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyt instancesHs  cCs+t|j|jƒtd|j|jƒƒS(u :return: a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file. R(R*R+Rtread_line_blockR(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pytlinesTscCsÞ|jdƒd}|jddƒ}|jddƒjddƒ}d|}||jkrotd |ƒ‚ntj|j|ƒjƒƒjƒ}xD|j d ƒD]}|j d |kr£|Sq£Wtd ||fƒ‚d S(uE :return: the xml description for the given roleset. u.iu perc-signu%uoneslashonezerou1/10u 1-slash-10u frames/%s.xmluFrameset file for %s not foundupredicate/rolesetuiduRoleset %s not found in %sN( tsplittreplaceRt ValueErrorRtparseR+Rtgetroottfindalltattrib(Rt roleset_idR$t framefiletetreetroleset((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR9]s  $cCs¨|dk rDd|}||jkr8td|ƒ‚n|g}n |j}g}xH|D]@}tj|j|ƒjƒƒjƒ}|j|j dƒƒqZWt |ƒS(uA :return: list of xml descriptions for rolesets. u frames/%s.xmluFrameset file for %s not foundupredicate/rolesetN( RRR1RR2R+RR3tappendR4tLazyConcatenation(RR$R7RtrsetsR8((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pytrolesetsss      $cCs+t|j|jƒtd|j|jƒƒS(u‰ :return: a corpus view that acts as a list of all noun lemmas in this corpus (from the nombank.1.0.words file). R(R*R+RR-R(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pytnounsˆscCstS(N(tTrue(R%((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR&‘scCssg}xftdƒD]X}|jƒjƒ}|rtj||j|jƒ}||ƒrk|j|ƒqkqqW|S(Nid(trangetreadlinetstriptNombankInstanceR2RRR:(RR(tinstance_filtertblocktitlineR%((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR'‘s   N( t__name__t __module__t__doc__RR R#R,R.R9R=R>R'(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRs     RCcBseeZdd„Zed„ƒZd„Zd„Zd„ZeeddƒZ e ddd„ƒZ RS( c Cs[||_||_||_||_||_||_||_t|ƒ|_| |_ dS(N( tfileidtsentnumtwordnumR$t sensenumbert predicatetpredidttuplet argumentsR( RRKRLRMR$RNRORPRRR((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR §s        cCsD|jjddƒ}|jddƒjddƒ}d||jfS(u¬The name of the roleset used by this instance's predicate. Use ``nombank.roleset() `` to look up information about the roleset.u%u perc-signu1/10u 1-slash-10uoneslashonezerou%s.%s(R$R0RN(Rtr((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR9ÏscCsd|j|j|jfS(Nu'(RKRLRM(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyt__repr__ØscCsvd|j|j|j|j|jf}|j|jdff}x.t|ƒD] \}}|d||f7}qNW|S(Nu%s %s %s %s %surelu %s-%s(RKRLRMR$RNRRROtsorted(Rtstitemstargloctargid((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyt__str__Üs cCsI|jdkrdS|j|jjƒkr/dS|jj|jƒ|jS(N(RRRKR!t parsed_sentsRL(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyt _get_treeäs tdocus The parse tree corresponding to this instance, or None if the corresponding tree is not available.c Cs~|jƒ}t|ƒdkr1td|ƒ‚n|d \}}}}}|d} gt| ƒD]'\} } d| kra| j| ƒ^qa} t| ƒdkr³td|ƒ‚n|dk rÎ||ƒ}nt|ƒ}t|ƒ}| djddƒ\} }tj| ƒ}g}xB| D]:}|jddƒ\}}|j tj|ƒ|fƒqWt |||||||||ƒ S(Niu Badly formatted nombank line: %riu-reliiu-( R/tlenR1t enumeratetpopRtinttNombankTreePointerR2R:RC(RVRRtpiecesRKRLRMR$RNtargsRFtptreltpredlocRPRORRtargRXRY((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR2ìs(  :     N( RHRIRR tpropertyR9RTRZR\ttreet staticmethodR2(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRC¤s '     tNombankPointercBseZdZd„ZRS(un A pointer used by nombank to identify one or more constituents in a parse tree. ``NombankPointer`` is an abstract base class with three concrete subclasses: - ``NombankTreePointer`` is used to point to single constituents. - ``NombankSplitTreePointer`` is used to point to 'split' constituents, which consist of a sequence of two or more ``NombankTreePointer`` pointers. - ``NombankChainTreePointer`` is used to point to entire trace chains in a tree. It consists of a sequence of pieces, which can be ``NombankTreePointer`` or ``NombankSplitTreePointer`` pointers. cCs|jtkrtƒ‚ndS(N(t __class__RltNotImplementedError(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR s(RHRIRJR (((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRls tNombankChainTreePointercBs,eZd„Zd„Zd„Zd„ZRS(cCs ||_dS(N(Rc(RRc((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR &s cCsdjd„|jDƒƒS(Nu*css|]}d|VqdS(u%sN((t.0Re((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pys -s(tjoinRc(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRZ,scCsd|S(Nu((R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRT.scCsG|dkrtdƒ‚ntdg|jD]}|j|ƒ^q+ƒS(NuParse tree not avaialableu*CHAIN*(RR1RRctselect(RRjRe((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRr0s (RHRIR RZRTRr(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRo$s   tNombankSplitTreePointercBs,eZd„Zd„Zd„Zd„ZRS(cCs ||_dS(N(Rc(RRc((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR 6s cCsdjd„|jDƒƒS(Nu,css|]}d|VqdS(u%sN((RpRe((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pys <s(RqRc(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRZ;scCsd|S(Nu((R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRT=scCsG|dkrtdƒ‚ntdg|jD]}|j|ƒ^q+ƒS(NuParse tree not avaialableu*SPLIT*(RR1RRcRr(RRjRe((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRr?s (RHRIR RZRTRr(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRs4s   RbcBseeZdZd„Zed„ƒZd„Zd„Zd„Zd„Z d„Z d„Z d „Z RS( u@ wordnum:height*wordnum:height*... wordnum:height, cCs||_||_dS(N(RMtheight(RRMRt((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR Ks cCsã|jdƒ}t|ƒdkrGtg|D]}tj|ƒ^q+ƒS|jdƒ}t|ƒdkrŽtg|D]}tj|ƒ^qrƒS|jdƒ}t|ƒdkrÂtd|ƒ‚ntt|dƒt|dƒƒS(Nu*iu,u:iubad nombank pointer %ri(R/R^RoRbR2RsR1Ra(RVRctelt((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR2Os  cCsd|j|jfS(Nu%s:%s(RMRt(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRZbscCsd|j|jfS(NuNombankTreePointer(%d, %d)(RMRt(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRTescCsdx&t|ttfƒr(|jd}qWt|tƒsB||kS|j|jkoc|j|jkS(Ni(R RoRsRcRbRMRt(Rtother((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyt__eq__hs    cCs ||k S(N((RRv((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyt__ne__rscCsrx&t|ttfƒr(|jd}qWt|tƒsNt|ƒt|ƒkS|j|j f|j|j fkS(Ni(R RoRsRcRbtidRMRt(RRv((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyt__lt__us   cCs,|dkrtdƒ‚n||j|ƒS(NuParse tree not avaialable(RR1ttreepos(RRj((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRrs cCs|dkrtdƒ‚n|g}g}d}xätrt|dtƒrÑt|ƒt|ƒkrt|jdƒn|dcd7<|dt|dƒkrº|j|d|dƒq|jƒ|jƒq3||jkrÿt |t|ƒ|j d ƒS|d7}|jƒq3WdS(u} Convert this pointer to a standard 'tree position' pointer, given that it points to the given tree. uParse tree not avaialableiiÿÿÿÿiN( RR1R?R RR^R:R`RMRQRt(RRjtstackR{RM((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyR{ƒs$      ( RHRIRJR RkR2RZRTRwRxRzRrR{(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyRbCs     N(t __future__Rt nltk.treeRt xml.etreeRtnltk.internalsRt nltk.compatRRRtnltk.corpus.reader.utiltnltk.corpus.reader.apiR RtobjectRCRlRoRsRb(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/nombank.pyt s"  ‘m