๓ <ฟCVc@@sคdZddlmZmZdZddlTddlmZmZddl m Z defd„ƒYZ dd „Z d efd „ƒYZd efd „ƒYZdS(u& Corpus reader for the SemCor Corpus. i(tabsolute_importtunicode_literalsu epytext en(t*(tXMLCorpusReadert XMLCorpusView(tTreetSemcorCorpusReadercB@sกeZdZed„Zdd„Zdd„ZddpBdpBdd„Zdd„Z dd „Z ddpudpudd „Z d „Z d „Z ed „ƒZRS(u Corpus reader for the SemCor Corpus. For access to the complete XML data structure, use the ``xml()`` method. For access to simple word lists and tagged word lists, use ``words()``, ``sents()``, ``tagged_words()``, and ``tagged_sents()``. cC@s)tj|||ƒ||_||_dS(N(Rt__init__t_lazyt_wordnet(tselftroottfileidstwordnettlazy((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyRs cC@s|j|dtttƒS(ur :return: the given file(s) as a list of words and punctuation symbols. :rtype: list(str) uword(t_itemstFalse(R R ((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pytwordsscC@s|j|dtttƒS(uฤ :return: the given file(s) as a list of chunks, each of which is a list of words and punctuation symbols that form a unit. :rtype: list(list(str)) uchunk(RR(R R ((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pytchunks%suposusemubothcC@s%|j|dt|dk|dkƒS(uc :return: the given file(s) as a list of tagged chunks, represented in tree form. :rtype: list(Tree) :param tag: `'pos'` (part of speech), `'sem'` (semantic), or `'both'` to indicate the kind of tags to include. Semantic tags consist of WordNet lemma IDs, plus an `'NE'` node if the chunk is a named entity without a specific entry in WordNet. (Named entities of type 'other' have no lemma. Other chunks not in WordNet have no semantic tag. Punctuation tokens have `None` for their part of speech tag.) uchunkusemupos(RR(R R ttag((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyt tagged_chunks.s cC@s|j|dtttƒS(u˜ :return: the given file(s) as a list of sentences, each encoded as a list of word strings. :rtype: list(list(str)) uword(RtTrueR(R R ((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pytsents=scC@s|j|dtttƒS(u˜ :return: the given file(s) as a list of sentences, each encoded as a list of chunks. :rtype: list(list(list(str))) uchunk(RRR(R R ((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyt chunk_sentsEscC@s%|j|dt|dk|dkƒS(u“ :return: the given file(s) as a list of sentences. Each sentence is represented as a list of tagged chunks (in tree form). :rtype: list(list(Tree)) :param tag: `'pos'` (part of speech), `'sem'` (semantic), or `'both'` to indicate the kind of tags to include. Semantic tags consist of WordNet lemma IDs, plus an `'NE'` node if the chunk is a named entity without a specific entry in WordNet. (Named entities of type 'other' have no lemma. Other chunks not in WordNet have no semantic tag. Punctuation tokens have `None` for their part of speech tag.) uchunkusemupos(RR(R R R((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyt tagged_sentsMs c @s{|dkr%| r%‡fd†}nˆjr4tnˆj}tgˆj|ƒD]$}||||||ˆjƒ^qPƒS(Nuwordc@s"tˆjrtnˆj|ŒƒS(N(tLazyConcatenationRtSemcorWordViewt_words(targs(R (sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyt`s(RRRtconcattabspathsR (R R tunitt bracket_senttpos_tagtsem_tagt_tfileid((R sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyR\s c C@s๔|dkst‚g}tj|ƒjƒ}xฎ|jdƒD]}g} x[t|ƒD]M} tj| ||||jƒ} |dkr–| j | ƒqV| j | ƒqVW|rอ|j t |j d| ƒƒq=|j | ƒq=Wd|ks๐t‚|S(u] Helper used to implement the view methods -- returns a list of tokens, (segmented) words, chunks, or sentences. The tokens and chunks may optionally be tagged (with POS and sense information). :param fileid: The name of the underlying file. :param unit: One of `'token'`, `'word'`, or `'chunk'`. :param bracket_sent: If true, include sentence bracketing. :param pos_tag: Whether to include part-of-speech tags. :param sem_tag: Whether to include semantic tags, namely WordNet lemma and OOV named entity status. utokenuworduchunku.//susnum(utokenuworduchunkN(tAssertionErrort ElementTreetparsetgetroottfindallt_all_xmlwords_inRt_wordR textendtappendtSemcorSentencetattribtNone( R R%R R!R"R#tresulttxmldoctxmlsenttsenttxmlwordtitm((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyRfs  cC@sR|j}|sd}n|jd|ƒ}|jdƒ}|dk rw|d|}dt|jd ƒd ƒd } n d}} |jd |ƒ} |jdƒ} d|jƒk} |jdƒ} |dkr"| rๆ| rๆ|}n8|f|r๛| fnd|r|| | | fnd}|S|jdƒ}|dkrA|S| dk rพy|j|ƒ}Wqพtk rบyd|| t| ƒf}Wqปtk rถ|d| d| }qปXqพXn|rึt | |ƒgn|}|r| r| dk r t |t d|ƒgƒSt d|ƒSn1|r<| dk r<t ||ƒS|rJ|d S|SdS(Nuulemmaulexsnu%unuvuaurusu:iiurdfuwnsnupnuposutokenu_uwordu %s.%s.%02du.uNE(unuvuaurus((( ttexttgetR1tinttsplittkeystlemma_from_keyt Exceptiont ValueErrorR(R6R R"R#R ttkntlemmatlexsnt sense_keytwnpostredeftsensenumt isOOVEntitytposR7twwtsensetbottom((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyR,‰sN   $   8    !   N(t__name__t __module__t__doc__RRR1RRRRRRRRt staticmethodR,(((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyRs      #cC@sV|dkrg}nx:|D]2}|jdkrA|j|ƒqt||ƒqW|S(Nuwfupunc(uwfupunc(R1RR.R+(teltR2tchild((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyR+ฦs  R/cB@seZdZd„ZRS(u‹ A list of words, augmented by an attribute ``num`` used to record the sentence identifier (the ``n`` attribute from the XML). cC@s||_tj||ƒdS(N(tnumtlistR(R RRtitems((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyRาs (RLRMRNR(((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyR/อsRcB@s2eZdZd„Zd„Zd„Zd„ZRS(uN A stream backed corpus view specialized for use with the BNC corpus. cC@sY|rd}nd}||_||_||_||_||_tj|||ƒdS(u{ :param fileid: The name of the underlying file. :param unit: One of `'token'`, `'word'`, or `'chunk'`. :param bracket_sent: If true, include sentence bracketing. :param pos_tag: Whether to include part-of-speech tags. :param sem_tag: Whether to include semantic tags, namely WordNet lemma and OOV named entity status. u.*/su.*/s/(punc|wf)N(t_unitt_sentt_pos_tagt_sem_tagR RR(R R%R R!R"R#R ttagspec((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyRฺs       cC@s'|jr|j|ƒS|j|ƒSdS(N(RVt handle_sentt handle_word(R RPtcontext((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyt handle_elt๎s cC@s%tj||j|j|j|jƒS(N(RR,RURWRXR (R RP((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyR[๒scC@s‹g}xn|D]f}|jdkr`|j|ƒ}|jdkrP|j|ƒqs|j|ƒq td|jƒ‚q Wt|jd|ƒS(NuwfupuncuworduUnexpected element %susnum(uwfupunc(RR[RUR-R.R?R/R0(R RPR5RQR7((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyRZ๕s (RLRMRNRR]R[RZ(((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyRึs    N(RNt __future__RRt __docformat__tnltk.corpus.reader.apitnltk.corpus.reader.xmldocsRRt nltk.treeRRR1R+RSR/R(((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/semcor.pyt s ด