ó <¿CVc@s{ddlZddlmZddlmZmZddlmZd„Zdefd„ƒYZ defd „ƒYZ dS( iÿÿÿÿN(tcompat(tStreamBackedCorpusViewtconcat(t CorpusReadercs%tjˆƒd‡fd†ƒ}|S(Ncs5|jddƒ|s%|jƒ}nˆ|||S(Nttags(tpoptNonetfileids(tselfRtkwargs(tfun(sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyt decorators(t functoolstwrapsR(R R ((R sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyt _parse_argss tIPIPANCorpusReadercBseZdZd„Zdd„Zdd„Zdd„Zdd„Zdddd„Z e dd„ƒZ e dd„ƒZ e dd „ƒZ e dd „ƒZe dd „ƒZe dd „ƒZd „Zd„Zd„Zdd„Zd„Zd„Zd„ZRS(s5 Corpus reader designed to work with corpus created by IPI PAN. See http://korpus.pl/en/ for more details about IPI PAN corpus. The corpus includes information about text domain, channel and categories. You can access possible values using ``domains()``, ``channels()`` and ``categories()``. You can use also this metadata to filter files, e.g.: ``fileids(channel='prasa')``, ``fileids(categories='publicystyczny')``. The reader supports methods: words, sents, paras and their tagged versions. You can get part of speech instead of full tag by giving "simplify_tags=True" parameter, e.g.: ``tagged_sents(simplify_tags=True)``. Also you can get all tags disambiguated tags specifying parameter "one_tag=False", e.g.: ``tagged_paras(one_tag=False)``. You can get all tags that were assigned by a morphological analyzer specifying parameter "disamb_only=False", e.g. ``tagged_words(disamb_only=False)``. The IPIPAN Corpus contains tags indicating if there is a space between two tokens. To add special "no space" markers, you should specify parameter "append_no_space=True", e.g. ``tagged_words(append_no_space=True)``. As a result in place where there should be no space between two tokens new pair ('', 'no-space') will be inserted (for tagged data) and just '' for methods without tags. The corpus reader can also try to append spaces between words. To enable this option, specify parameter "append_space=True", e.g. ``words(append_space=True)``. As a result either ' ' or (' ', 'space') will be inserted between tokens. By default, xml entities like " and & are replaced by corresponding characters. You can turn off this feature, specifying parameter "replace_xmlentities=False", e.g. ``words(replace_xmlentities=False)``. cCstj|||ddƒdS(N(Rt__init__R(RtrootR((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyR;scCsm|s|jƒ}ng}xB|j|ƒD]1}t|dƒ}|j|jƒƒWdQXq+Wdj|ƒS(Ntrt(Rt_list_morph_filestopentappendtreadtjoin(RRt filecontentstfileidtinfile((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pytraw>scCs%|s|jƒ}n|j|dƒS(Ntchannel(Rt _parse_header(RR((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pytchannelsHscCs%|s|jƒ}n|j|dƒS(Ntdomain(RR(RR((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pytdomainsMscCsA|s|jƒ}ng|j|dƒD]}|j|ƒ^q(S(NtkeyTerm(RRt _map_category(RRtcat((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyt categoriesRscCs|dk r3|dk r3|dk r3tdƒ‚n|dkrd|dkrd|dkrdtj|ƒSt|tjƒr‚|g}nt|tjƒr |g}nt|tjƒr¾|g}n|rÔ|jd|ƒS|rê|jd|ƒS|jd|d|jƒSdS(NsNYou can specify only one of channels, domains and categories parameter at onceRR R"tmap( Rt ValueErrorRRt isinstanceRt string_typest_list_morph_files_byR#(RRR!R%((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyRXs$      c KsAtg|j|ƒD]'}|j|dtjdt|^qƒS(NtmodeR(RRt_viewtIPIPANCorpusViewt SENTS_MODEtFalse(RRR R((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pytsentsnsc KsAtg|j|ƒD]'}|j|dtjdt|^qƒS(NR+R(RRR,R-t PARAS_MODER/(RRR R((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pytparastscKs8tg|j|ƒD]}|j|dt|^qƒS(NR(RRR,R/(RRR R((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pytwordszscKs;tg|j|ƒD]!}|j|dtj|^qƒS(NR+(RRR,R-R.(RRR R((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyt tagged_sentsscKs;tg|j|ƒD]!}|j|dtj|^qƒS(NR+(RRR,R-R1(RRR R((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyt tagged_paras…scKs2tg|j|ƒD]}|j||^qƒS(N(RRR,(RRR R((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyt tagged_words‹scCs g|j|ƒD] }|^qS(N(tabspaths(RRtf((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyRscCs,g|j|ƒD]}|jddƒ^qS(Ns morph.xmls header.xml(Rtreplace(RRR8((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyt_list_header_files“scCs]tƒ}xG|j|ƒD]6}|j||ƒ}x|D]}|j|ƒq8WqWt|ƒS(N(tsetR:t_get_tagtaddtlist(RRttagtvaluesR8t values_listtv((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyR—s   c Cs¥|jƒ}tƒ}xƒ|D]{}|j|ƒjddƒ}|j||ƒ}xE|D]=} |dk rw|| ƒ} n| |krV|j|ƒqVqVWqWt|ƒS(Ns morph.xmls header.xml(RR;tabspathR9R<RR=R>( RR?R@R&Rt ret_fileidsR8tfpRAtvalue((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyR*Ÿs      cCs£g}t|dƒ}|jƒ}WdQXd}xltrž|jd||ƒ}|dkr_|S|jd|d|ƒ}|j||t|ƒd|!ƒq3WdS(NRiti(RRtTruetfindRtlen(RR8R?RRtheaderttag_endttag_pos((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyR<¬s  cCs/|jdƒ}|dkr|S||dSdS(NRHiÿÿÿÿi(RJ(RR$tpos((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyR#·s c Ks<|jdtƒ}|jddƒ}|jdtƒ}|jdtƒ}|jdtƒ}|jdtƒ}|jdtƒ} |jd tƒ} t|ƒdkr»td |jƒƒ‚n| rØ| rØtd ƒ‚n| r|só| só| rtd ƒ‚nt|d|d|d|d|d|d|d| d | ƒS( NRR+it simplify_tagstone_tagt disamb_onlytappend_no_spacet append_spacetreplace_xmlentitiessUnexpected arguments: %ss;You cannot specify both one_tag=False and disamb_only=Falses[You cannot specify simplify_tags, one_tag or disamb_only with functions other than tagged_*(RRIR/RKR'tkeysR-( RtfilenameR RR+RPRQRRRSRTRU((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyR,¾s(  N(t__name__t __module__t__doc__RRRRR!R%RRR0R2R3R4R5R6RR:RR*R<R#R,(((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyRs4"        R-cBsJeZdZdZdZdd„Zd„Zd„Zd„Zd„Z RS(iiicKsÚtj||d|dƒt|_d|_|jdtƒ|_|jdtƒ|_ |jdt j ƒ|_ |jdtƒ|_ |jdtƒ|_|jdtƒ|_|jdtƒ|_|jd tƒ|_dS( NiRRRR+RPRQRSRTRU(RRRR/t in_sentencetpositionRRIt show_tagsRRR-t WORDS_MODER+RPRQRSRTRU(RRWtstartposR ((sk/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/ipipan.pyRàs  c Cs¡g}g}t}t}tƒ}|j|ƒ}xjtrœt|ƒdkrj|j|ƒ|j|ƒ}n|dgkrŠ| s†t‚gS|jƒ}|jt|ƒd7_|j dƒrÊt|_ q3|j dƒrÜq3|j dƒr/|j r|r| r|j |ƒnt}t}d} tƒ}q3|j dƒrÞ|j rµt|_ |j|ƒ|j |jkrv|gS|j |jkr¥|j r¡|j |ƒn|S|j|ƒq™|j |jkr™|j|ƒ|gSq3|j dƒr'|dd !} |jr™| jd d ƒjd d ƒ} q™q3|j dƒr‹|j sU|jdƒdkr™||jdƒd|jdƒ!} |j| ƒq™q3|j dƒr1|jr!|jrÕg|D]} | jdƒd^q³}n|j sé|j r|j| t|ƒfƒq.|j| |jƒfƒq™|j| ƒq3|j dƒr‡|j rRt}n|jr™|jrt|jdƒq„|jdƒq™q3|j dƒr3q3q3WdS(NiRssno-spacess  Ã