from __future__ import unicode_literals
import re
from xml.etree import ElementTree

from nltk import compat
from nltk.tree import Tree
from nltk.internals import raise_unorderable_types
from nltk.compat import total_ordering

from nltk.corpus.reader.util import *
from nltk.corpus.reader.api import *


class PropbankCorpusReader(CorpusReader):
    """
    Corpus reader for the propbank corpus, which augments the Penn
    Treebank with information about the predicate argument structure
    of every verb instance.  The corpus consists of two parts: the
    predicate-argument annotations themselves, and a set of "frameset
    files" which define the argument labels used by the annotations,
    on a per-verb basis.  Each "frameset file" contains one or more
    predicates, such as ``'turn'`` or ``'turn_on'``, each of which is
    divided into coarse-grained word senses called "rolesets".  For
    each "roleset", the frameset file provides descriptions of the
    argument roles, along with examples.
    """

    def __init__(self, root, propfile, framefiles='',
                 verbsfile=None, parse_fileid_xform=None,
                 parse_corpus=None, encoding='utf8'):
        """
        :param root: The root directory for this corpus.
        :param propfile: The name of the file containing the predicate-
            argument annotations (relative to ``root``).
        :param framefiles: A list or regexp specifying the frameset
            fileids for this corpus.
        :param verbsfile: The name of the file listing the verb lemmas
            in this corpus (read by ``verbs()``).
        :param parse_fileid_xform: A transform that should be applied
            to the fileids in this corpus.  This should be a function
            of one argument (a fileid) that returns a string (the new
            fileid).
        :param parse_corpus: The corpus containing the parse trees
            corresponding to this corpus.  These parse trees are
            necessary to resolve the tree pointers used by propbank.
        :param encoding: The encoding used to read the corpus files.
        """
        # A regexp for framefiles is expanded into the matching fileids.
        if isinstance(framefiles, compat.string_types):
            framefiles = find_corpus_fileids(root, framefiles)
        framefiles = list(framefiles)
        CorpusReader.__init__(self, root, [propfile, verbsfile] + framefiles,
                              encoding)

        self._propfile = propfile
        self._framefiles = framefiles
        self._verbsfile = verbsfile
        self._parse_fileid_xform = parse_fileid_xform
        self._parse_corpus = parse_corpus

    def raw(self, fileids=None):
        """
        :return: the text contents of the given fileids, as a single string.
        """
        if fileids is None:
            fileids = self._fileids
        elif isinstance(fileids, compat.string_types):
            fileids = [fileids]
        return concat([self.open(f).read() for f in fileids])

    def instances(self, baseform=None):
        """
        :return: a corpus view that acts as a list of
            ``PropbankInstance`` objects, one for each verb instance in
            the corpus.
        :param baseform: If given, only instances whose predicate has
            this baseform are returned.
        """
        kwargs = {}
        if baseform is not None:
            kwargs['instance_filter'] = lambda inst: inst.baseform == baseform
        return StreamBackedCorpusView(
            self.abspath(self._propfile),
            lambda stream: self._read_instance_block(stream, **kwargs),
            encoding=self.encoding(self._propfile))

    def lines(self):
        """
        :return: a corpus view that acts as a list of strings, one for
            each line in the predicate-argument annotation file.
        """
        return StreamBackedCorpusView(self.abspath(self._propfile),
                                      read_line_block,
                                      encoding=self.encoding(self._propfile))

    def roleset(self, roleset_id):
        """
        :return: the XML description for the given roleset.
        """
        baseform = roleset_id.split('.')[0]
        framefile = 'frames/%s.xml' % baseform
        if framefile not in self._framefiles:
            raise ValueError('Frameset file for %s not found' % roleset_id)

        etree = ElementTree.parse(self.abspath(framefile).open()).getroot()
        for roleset in etree.findall('predicate/roleset'):
            if roleset.attrib['id'] == roleset_id:
                return roleset
        raise ValueError('Roleset %s not found in %s' % (roleset_id, framefile))

    def rolesets(self, baseform=None):
        """
        :return: list of XML descriptions for rolesets; if ``baseform``
            is given, only the rolesets from that verb's frameset file.
        """
        if baseform is not None:
            framefile = 'frames/%s.xml' % baseform
            if framefile not in self._framefiles:
                raise ValueError('Frameset file for %s not found' % baseform)
            framefiles = [framefile]
        else:
            framefiles = self._framefiles

        rsets = []
        for framefile in framefiles:
            etree = ElementTree.parse(self.abspath(framefile).open()).getroot()
            rsets.append(etree.findall('predicate/roleset'))
        return LazyConcatenation(rsets)

    def verbs(self):
        """
        :return: a corpus view that acts as a list of all verb lemmas
            in this corpus (from the verbs.txt file).
        """
        return StreamBackedCorpusView(self.abspath(self._verbsfile),
                                      read_line_block,
                                      encoding=self.encoding(self._verbsfile))

    def _read_instance_block(self, stream, instance_filter=lambda inst: True):
        block = []
        # Read up to 100 lines per block; blank lines are skipped.
        for i in range(100):
            line = stream.readline().strip()
            if line:
                inst = PropbankInstance.parse(
                    line, self._parse_fileid_xform, self._parse_corpus)
                if instance_filter(inst):
                    block.append(inst)
        return block


@compat.python_2_unicode_compatible
class PropbankInstance(object):
    """
    A single predicate-argument annotation, i.e. one line of the
    propbank annotation file.
    """

    def __init__(self, fileid, sentnum, wordnum, tagger, roleset,
                 inflection, predicate, arguments, parse_corpus=None):
        self.fileid = fileid              # file containing the parse tree
        self.sentnum = sentnum            # sentence index within ``fileid``
        self.wordnum = wordnum            # word index of the predicate
        self.tagger = tagger              # annotator identifier
        self.roleset = roleset            # roleset id: baseform.sense
        self.inflection = inflection      # a ``PropbankInflection``
        self.predicate = predicate        # pointer to the predicate
        self.arguments = tuple(arguments) # (pointer, argid) pairs, predicate excluded
        self.parse_corpus = parse_corpus  # corpus of parse trees, or None

    @property
    def baseform(self):
        """The baseform of the predicate."""
        return self.roleset.split('.')[0]

    @property
    def sensenumber(self):
        """The sense number of the predicate."""
        return self.roleset.split('.')[1]

    @property
    def predid(self):
        """Identifier of the predicate."""
        return 'rel'

    def __repr__(self):
        return ('<PropbankInstance: %s, sent %s, word %s>' %
                (self.fileid, self.sentnum, self.wordnum))

    def __str__(self):
        s = '%s %s %s %s %s %s' % (self.fileid, self.sentnum, self.wordnum,
                                  self.tagger, self.roleset, self.inflection)
        items = self.arguments + ((self.predicate, 'rel'),)
        for (argloc, argid) in sorted(items):
            s += ' %s-%s' % (argloc, argid)
        return s

    def _get_tree(self):
        if self.parse_corpus is None:
            return None
        if self.fileid not in self.parse_corpus.fileids():
            return None
        return self.parse_corpus.parsed_sents(self.fileid)[self.sentnum]

    tree = property(_get_tree, doc="""
        The parse tree corresponding to this instance, or None if
        the corresponding tree is not available.""")

    @staticmethod
    def parse(s, parse_fileid_xform=None, parse_corpus=None):
        pieces = s.split()
        if len(pieces) < 7:
            raise ValueError('Badly formatted propbank line: %r' % s)

        # Six fixed fields, then exactly one '-rel' pointer and the arguments.
        (fileid, sentnum, wordnum,
         tagger, roleset, inflection) = pieces[:6]
        rel = [p for p in pieces[6:] if p.endswith('-rel')]
        args = [p for p in pieces[6:] if not p.endswith('-rel')]
        if len(rel) != 1:
            raise ValueError('Badly formatted propbank line: %r' % s)

        if parse_fileid_xform is not None:
            fileid = parse_fileid_xform(fileid)

        sentnum = int(sentnum)
        wordnum = int(wordnum)
        inflection = PropbankInflection.parse(inflection)

        # Strip the '-rel' suffix to get the predicate's pointer.
        predicate = PropbankTreePointer.parse(rel[0][:-4])

        # Split each argument at its first '-', so labels like 'ARGM-TMP'
        # stay whole.
        arguments = []
        for arg in args:
            argloc, argid = arg.split('-', 1)
            arguments.append((PropbankTreePointer.parse(argloc), argid))

        return PropbankInstance(fileid, sentnum, wordnum, tagger,
                                roleset, inflection, predicate,
                                arguments, parse_corpus)


class PropbankPointer(object):
    """
    A pointer used by propbank to identify one or more constituents in
    a parse tree.  ``PropbankPointer`` is an abstract base class with
    three concrete subclasses:

      - ``PropbankTreePointer`` is used to point to single constituents.
      - ``PropbankSplitTreePointer`` is used to point to 'split'
        constituents, which consist of a sequence of two or more
        ``PropbankTreePointer`` pointers.
      - ``PropbankChainTreePointer`` is used to point to entire trace
        chains in a tree.  It consists of a sequence of pieces, which
        can be ``PropbankTreePointer`` or ``PropbankSplitTreePointer``
        pointers.
    """

    def __init__(self):
        if self.__class__ == PropbankPointer:
            raise NotImplementedError()


@compat.python_2_unicode_compatible
class PropbankChainTreePointer(PropbankPointer):

    def __init__(self, pieces):
        # Chain elements: PropbankTreePointer or PropbankSplitTreePointer.
        self.pieces = pieces

    def __str__(self):
        return '*'.join('%s' % p for p in self.pieces)

    def __repr__(self):
        return '<PropbankChainTreePointer: %s>' % self

    def select(self, tree):
        if tree is None:
            raise ValueError('Parse tree not available')
        return Tree('*CHAIN*', [p.select(tree) for p in self.pieces])


@compat.python_2_unicode_compatible
class PropbankSplitTreePointer(PropbankPointer):

    def __init__(self, pieces):
        # Split elements: all PropbankTreePointer pointers.
        self.pieces = pieces

    def __str__(self):
        return ','.join('%s' % p for p in self.pieces)

    def __repr__(self):
        return '<PropbankSplitTreePointer: %s>' % self

    def select(self, tree):
        if tree is None:
            raise ValueError('Parse tree not available')
        return Tree('*SPLIT*', [p.select(tree) for p in self.pieces])


@total_ordering
@compat.python_2_unicode_compatible
class PropbankTreePointer(PropbankPointer):
    """
    A pointer to a single constituent, written ``wordnum:height``: the
    word at position ``wordnum`` (counting every leaf of the parse
    tree, including traces), raised ``height`` levels, where height 0
    is the word's preterminal.  ``parse()`` also accepts chains
    (``wordnum:height*wordnum:height*...``) and split pointers
    (``wordnum:height,wordnum:height,...``).
    """

    def __init__(self, wordnum, height):
        self.wordnum = wordnum
        self.height = height

    @staticmethod
    def parse(s):
        # Chains (xx*yy*zz) are split first, so their pieces may be splits.
        pieces = s.split('*')
        if len(pieces) > 1:
            return PropbankChainTreePointer([PropbankTreePointer.parse(elt)
                                             for elt in pieces])

        # Split constituents (xx,yy,zz).
        pieces = s.split(',')
        if len(pieces) > 1:
            return PropbankSplitTreePointer([PropbankTreePointer.parse(elt)
                                             for elt in pieces])

        # A single wordnum:height pointer.
        pieces = s.split(':')
        if len(pieces) != 2:
            raise ValueError('bad propbank pointer %r' % s)
        return PropbankTreePointer(int(pieces[0]), int(pieces[1]))

    def __str__(self):
        return '%s:%s' % (self.wordnum, self.height)

    def __repr__(self):
        return 'PropbankTreePointer(%d, %d)' % (self.wordnum, self.height)

    def __eq__(self, other):
        # Chain and split pointers compare by their first piece.
        while isinstance(other, (PropbankChainTreePointer,
                                 PropbankSplitTreePointer)):
            other = other.pieces[0]

        if not isinstance(other, PropbankTreePointer):
            return self is other

        return (self.wordnum == other.wordnum and self.height == other.height)

    def __ne__(self, other):
        return not self == other

    def __lt__(self, other):
        while isinstance(other, (PropbankChainTreePointer,
                                 PropbankSplitTreePointer)):
            other = other.pieces[0]

        if not isinstance(other, PropbankTreePointer):
            return id(self) < id(other)

        # Earlier words first; at the same word, larger constituents first.
        return (self.wordnum, -self.height) < (other.wordnum, -other.height)

    def select(self, tree):
        if tree is None:
            raise ValueError('Parse tree not available')
        return tree[self.treepos(tree)]

    def treepos(self, tree):
        """
        Convert this pointer to a standard 'tree position' pointer,
        given that it points to the given tree.
        """
        if tree is None:
            raise ValueError('Parse tree not available')
        stack = [tree]
        treepos = []

        wordnum = 0
        while True:
            if isinstance(stack[-1], Tree):
                # Tree node: move on to its next child.
                if len(treepos) < len(stack):
                    treepos.append(0)
                else:
                    treepos[-1] += 1
                if treepos[-1] < len(stack[-1]):
                    stack.append(stack[-1][treepos[-1]])
                else:
                    # No children left: pop up a level.
                    stack.pop()
                    treepos.pop()
            else:
                # Leaf: either the target word, or count it and go back up.
                if wordnum == self.wordnum:
                    return tuple(treepos[:len(treepos) - self.height - 1])
                else:
                    wordnum += 1
                    stack.pop()


@compat.python_2_unicode_compatible
class PropbankInflection(object):
    """
    The inflection of a propbank predicate, encoded as five characters:
    form, tense, aspect, person, voice.  ``'-'`` marks an unspecified
    position.
    """
    # Form
    INFINITIVE = 'i'
    GERUND = 'g'
    PARTICIPLE = 'p'
    FINITE = 'v'
    # Tense
    FUTURE = 'f'
    PAST = 'p'
    PRESENT = 'n'
    # Aspect
    PERFECT = 'p'
    PROGRESSIVE = 'o'
    PERFECT_AND_PROGRESSIVE = 'b'
    # Person
    THIRD_PERSON = '3'
    # Voice
    ACTIVE = 'a'
    PASSIVE = 'p'
    # Unspecified
    NONE = '-'

    def __init__(self, form='-', tense='-', aspect='-', person='-', voice='-'):
        self.form = form
        self.tense = tense
        self.aspect = aspect
        self.person = person
        self.voice = voice

    def __str__(self):
        return self.form + self.tense + self.aspect + self.person + self.voice

    def __repr__(self):
        return '<PropbankInflection: %s>' % self

    _VALIDATE = re.compile(r'[igpv\-][fpn\-][pob\-][3\-][ap\-]$')

    @staticmethod
    def parse(s):
        if not isinstance(s, compat.string_types):
            raise TypeError('expected a string')
        if len(s) != 5 or not PropbankInflection._VALIDATE.match(s):
            raise ValueError('Bad propbank inflection string %r' % s)
        return PropbankInflection(*s)
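

# Illustrative sketch (not part of NLTK): the pointer grammar handled by
# ``PropbankTreePointer.parse``, reduced to plain tuples so that it runs
# without a parse corpus.  The helper name ``_sketch_parse_pointer`` and its
# tuple encoding are hypothetical.  As in ``parse``, '*' (trace chains) is
# split first, then ',' (split constituents), then ':'; so a chain element
# may itself be a split pointer, but a split element cannot be a chain.
def _sketch_parse_pointer(s):
    """Return ('chain', [...]), ('split', [...]) or (wordnum, height)."""
    pieces = s.split('*')
    if len(pieces) > 1:
        return ('chain', [_sketch_parse_pointer(p) for p in pieces])
    pieces = s.split(',')
    if len(pieces) > 1:
        return ('split', [_sketch_parse_pointer(p) for p in pieces])
    pieces = s.split(':')
    if len(pieces) != 2:
        raise ValueError('bad propbank pointer %r' % s)
    return (int(pieces[0]), int(pieces[1]))

# Usage:
#   _sketch_parse_pointer('9:1')          -> (9, 1)
#   _sketch_parse_pointer('0:1*3:0,5:0')  -> ('chain', [(0, 1), ('split', [(3, 0), (5, 0)])])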
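

# Illustrative sketch (not part of NLTK): decoding the five-character
# inflection field checked by ``PropbankInflection.parse``.  The regular
# expression is the same as ``PropbankInflection._VALIDATE``; the helper
# name and the dict it returns are hypothetical.  Position i of the string
# fills the i-th argument of PropbankInflection(form, tense, aspect,
# person, voice).
import re

_SKETCH_INFLECTION_RE = re.compile(r'[igpv\-][fpn\-][pob\-][3\-][ap\-]$')
_SKETCH_FIELDS = ('form', 'tense', 'aspect', 'person', 'voice')


def _sketch_decode_inflection(s):
    """Map each position of an inflection string to its field name."""
    if len(s) != 5 or not _SKETCH_INFLECTION_RE.match(s):
        raise ValueError('Bad propbank inflection string %r' % s)
    return dict(zip(_SKETCH_FIELDS, s))

# Usage: 'vn-3a' is finite (v), present (n), no aspect (-), third
# person (3), active (a):
#   _sketch_decode_inflection('vn-3a')['tense']  -> 'n'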
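

# Illustrative sketch (not part of NLTK): how ``PropbankTreePointer.treepos``
# turns ``wordnum:height`` into a tree position.  Leaves are counted left to
# right (traces included, since they are leaves too) until the word at
# ``wordnum`` is reached; ``height + 1`` indices are then dropped from the
# leaf's path, so height 0 selects the word's preterminal.  To stay
# self-contained, this hypothetical helper uses ``(label, children)`` tuples
# instead of ``nltk.Tree``, and treats any non-tuple node as a leaf.
def _sketch_treepos(tree, wordnum, height):
    count = [0]  # leaves seen so far (a list so the closure can update it)

    def walk(node, path):
        if not isinstance(node, tuple):  # a leaf (word or trace)
            if count[0] == wordnum:
                return path
            count[0] += 1
            return None
        for i, child in enumerate(node[1]):
            found = walk(child, path + (i,))
            if found is not None:
                return found
        return None

    leaf_path = walk(tree, ())
    if leaf_path is None or height + 1 > len(leaf_path):
        raise ValueError('pointer %d:%d does not fit this tree' % (wordnum, height))
    return leaf_path[:len(leaf_path) - height - 1]

# Usage, for (S (NP (DT the) (NN dog)) (VP (VBD barked))):
#   t = ('S', [('NP', [('DT', ['the']), ('NN', ['dog'])]),
#              ('VP', [('VBD', ['barked'])])])
#   _sketch_treepos(t, 0, 1)  -> (0,)     the NP
#   _sketch_treepos(t, 2, 0)  -> (1, 0)   the VBD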
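

# Illustrative sketch (not part of NLTK): the field layout that
# ``PropbankInstance.parse`` expects in each annotation line, without
# converting pointers or inflections into objects.  The helper name is
# hypothetical, and the example line in the usage note is invented for
# illustration rather than taken from the corpus.
def _sketch_split_propbank_line(line):
    pieces = line.split()
    if len(pieces) < 7:
        raise ValueError('Badly formatted propbank line: %r' % line)
    fileid, sentnum, wordnum, tagger, roleset, inflection = pieces[:6]
    rel = [p for p in pieces[6:] if p.endswith('-rel')]
    args = [p for p in pieces[6:] if not p.endswith('-rel')]
    if len(rel) != 1:
        raise ValueError('Badly formatted propbank line: %r' % line)
    return {
        'fileid': fileid,
        'sentnum': int(sentnum),
        'wordnum': int(wordnum),
        'tagger': tagger,
        'roleset': roleset,
        'inflection': inflection,
        'predicate': rel[0][:-4],  # drop the '-rel' suffix
        # split at the first '-' only, so labels like 'ARGM-MOD' survive
        'arguments': [tuple(a.split('-', 1)) for a in args],
    }

# Usage:
#   line = 'example.mrg 0 8 gold join.01 vf--a 0:2-ARG0 7:0-ARGM-MOD 8:0-rel 9:1-ARG1'
#   _sketch_split_propbank_line(line)['arguments']
#   -> [('0:2', 'ARG0'), ('7:0', 'ARGM-MOD'), ('9:1', 'ARG1')]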