ó <¿CVc@s:dZddlZddlmZddlmZddlTededddÄdÅgƒZed e dd ƒZ ede dddddddƒZ ede dddddƒZede dddddƒZededgƒZededƒZededddƒZededd gdÆdd$ddƒZed%ed&dÇdd+ƒZed,ed&ddÈdÉgƒZed0ed1ƒZed2ed3ddƒZed4e d5d6ddddƒZed7ed8d9d:d;d<gƒZ ed=e!d>dd.ƒZ"ed?edddÊdËdÌgƒZ#edEedddFƒZ$edGe%dHƒZ&edIedddFƒZ'edJe(dKdddd ƒZ)edLe*dMƒZ+edNe,dOdd+ƒZ-edPe.dQddRƒZ/edSe0dTƒZ1edUe2dddddƒZ3edVe4ddWdXddƒZ5edYe dZdd[dd$dd+d\d]ƒZ6ed^e7ddWd_ddƒZ8ed`e9dadd+ƒZ:edbe!dddƒZ;edce<dddd ƒZ=edee>dfdd$ƒZ?edge@dhdd.ƒZAedieBdjdddkdldd ƒZCedmeDdndodpgƒZEedqeFdrdd ƒZGedseFdrdd ƒZHedteIdudWdudd.ƒZJedveKdwddxdd$ƒZLedyeMdd gdd.ƒZNedze7d{dddd.ƒZOed|ePd}ƒZQed~eRdKƒZSedeTd€dWd€dd+ƒZUedeVd‚dd+ƒZWedƒeXd}ƒZYed„eZd…gdddd+ƒZ[ed†eddd.ƒZ\ed‡e!dHdd ƒZ]edˆeTd‰dŠid‹gdŒ6dgdŽ6ddƒZ^ede_dHdd ƒZ`ede_d‘dd ƒZaede_d’dd ƒZbed“ecdd$ƒZded”eeƒZfed”egd•dd$ddƒZhed–eid—ƒZjed˜e d™dd$ddƒZkedšeld›dœeddžemƒdŸendd$ddƒZoed ed¡dd.ƒZped¢eqd£ƒZred¤esƒZted¥ed1dd ƒZued¦evd§d¨d̓Zwed¬exd}ƒZyed­ed®dd.ƒZzed¯e{ed°e|d±dd ƒƒZ}ed²e~d³ƒZedªe!dHddƒZ€ed´eƒZ‚edµeƒd¶d·d¸d¹„ekƒZ„edºe…dºd·d»d¼„ekƒZ†edµeƒd¶d·d¸d½„eLƒZ‡edºe…dºd·d»d¾„eLƒZˆed¿e‰dÀe}ƒZŠdÁ„Z‹eŒdÂkr*nddÄZŽdS(Îs™ NLTK corpus readers. The modules in this package provide functions that can be used to read corpus files in a variety of formats. These functions can be used to read both the corpus files that are distributed in the NLTK corpus package, and corpus files that are part of external corpora. Available Corpora ================= Please see http://www.nltk.org/nltk_data/ for a complete list. Install corpora using nltk.download(). Corpus Reader Functions ======================= Each corpus module defines one or more "corpus reader functions", which can be used to read documents from that corpus. These functions take an argument, ``item``, which is used to indicate which document should be read from the corpus: - If ``item`` is one of the unique identifiers listed in the corpus module's ``items`` variable, then the corresponding document will be loaded from the NLTK corpus package. 
- If ``item`` is a filename, then that file will be read.

Additionally, corpus reader functions can be given lists of item
names; in which case, they will return a concatenation of the
corresponding documents.

Corpus reader functions are named based on the type of information
they return.  Some common examples, and their return types, are:

- words(): list of str
- sents(): list of (list of str)
- paras(): list of (list of (list of str))
- tagged_words(): list of (str,str) tuple
- tagged_sents(): list of (list of (str,str))
- tagged_paras(): list of (list of (list of (str,str)))
- chunked_sents(): list of (Tree w/ (str,str) leaves)
- parsed_sents(): list of (Tree with str leaves)
- parsed_paras(): list of (list of (Tree with str leaves))
- xml(): A single xml ElementTree
- raw(): unprocessed corpus contents

For example, to read a list of the words in the Brown Corpus, use
``nltk.corpus.brown.words()``:

    >>> from nltk.corpus import brown
    >>> print(", ".join(brown.words()))
    The, Fulton, County, Grand, Jury, said, ...
iÿÿÿÿN(tRegexpTokenizer(tLazyCorpusLoader(t*tabcs (?!\.).*\.txttencodingtsciencetlatin_1truraltutf8talpinottagsettbrowns c[a-z]\d\dtcat_filescats.txttasciitcess_cats (?!\.).*\.tbftunknowns ISO-8859-15tcess_esptcmudicttcomtranstcomparative_sentencesslabeledSentences\.txtslatin-1t conll2000s train.txtstest.txttNPtVPtPPtwsjt conll2002s.*\.(test|train).*tLOCtPERtORGtMISCsutf-8t conll2007teuss ISO-8859-2tesptcrubadans.*\.txttdependency_treebanks.*\.dptflorestas (?!\.).*\.ptbt#t framenet_v15sfrRelation.xmlsframeIndex.xmlsfulltextIndex.xmls luIndex.xmls semTypes.xmlt gazetteerss(?!LICENSE|\.).*\.txttgenesissfinnish|french|germantswedishtcp865s.*tutf_8t gutenbergtlatin1tieers(?!README|\.).*t inauguraltindians (?!\.).*\.postipipans(?!\.).*morph\.xmltjeitas .*\.chasens knbc/corpus1s.*/KN.*seuc-jpt lin_thesauruss.*\.lspt mac_morphotmachadot cat_patterns ([a-z]*)/.*t masc_taggeds(spoken|written)/.*\.txtscategories.txttsept_t movie_reviewss (neg|pos)/.*t mte_teip5s (oana).*\.xmltnamestnkjpttnps_chats(?!README|\.).*\.xmltopinion_lexicons(\w+)\-words\.txttpl196xs [a-z]-.*\.xmlt textid_files textids.txttppattachttrainingttesttdevsettproduct_reviews_1s^(?!Readme).*\.txttproduct_reviews_2t pros_conssIntegrated(Cons|Pros)\.txttptbs/(WSJ/\d\d/WSJ_\d\d|BROWN/C[A-Z]/C[A-Z])\d\d.MRGs allcats.txttqctreuterss(training|test).*trtes (?!\.).*\.xmltsensevaltsentence_polaritysrt-polarity\.(neg|pos)t sentiwordnetsSentiWordNet_3.0.0.txtt shakespearetsinica_treebanktparsedt state_uniont stopwordst subjectivitys"(quote.tok.gt9|plot.tok.gt9)\.5000tcat_maptsubjsquote.tok.gt9.5000tobjsplot.tok.gt9.5000tswadeshtpanlex_swadeshsswadesh110/.*\.txtsswadesh207/.*\.txtt switchboardttimits.+\.tagsttoolboxs(?!.*(README|\.)).*\.(dic|txt)streebank/combineds wsj_.*\.mrgstreebank/taggeds wsj_.*\.postsent_tokenizers(?<=/\.)\s*(?![^\[]*\])tgapstpara_block_readers treebank/rawswsj_.*ttwitter_sampless.*\.jsontudhrtudhr2tuniversal_treebanks_v20s .*\.conllt 
columntypestignoretwordstpostverbnettwebtexts(?!README|\.).*\.txttwordnettomws.*/wn-data-.*\.tabt wordnet_ics.*\.dattycoetpropbanksprop.txtsframes/.*\.xmls verbs.txtcCstjdd|ƒS(Ns ^wsj/\d\d/R=(tretsub(tfilename((sf/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/__init__.pytss nombank.1.0snombank.1.0.wordscCstjdd|ƒS(Ns ^wsj/\d\d/R=(RpRq(Rr((sf/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/__init__.pyRsscCs |jƒS(N(tupper(Rr((sf/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/__init__.pyRs scCs |jƒS(N(Rt(Rr((sf/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/__init__.pyRsstsemcorsbrown./tagfiles/br-.*\.xmlcCsêtjƒtjƒtjƒtjƒtjƒtjƒtjƒtjƒt jƒt jƒt jƒt jƒt jƒtjƒtjƒtjƒtjƒtjƒtjƒtjƒtjƒtjƒtjƒdS(N(RtdemoR RRRR'R+R-R.R/R;RBRMRPRQRSRTR\R]ttreebankRbRjRg(((sf/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/__init__.pyRvs.                      
t__main__cCsjddl}xWt|jƒD]F}t|j|dƒ}t|tƒrt|dƒr|jƒqqWdS(Niÿÿÿÿt_unload( t nltk.corpustdirtcorpustgetattrtNonet isinstancet CorpusReaderthasattrRy(tmoduletnltktnameRX((sf/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/__init__.pytteardown_module9s  (Rslatin_1(Rsutf8(sNPRR(RRRR(Rs ISO-8859-2(R sutf8(sfinnish|french|germanslatin_1(sswedishscp865(s.*sutf_8( signoreswordssignoresignorespossignoresignoresignoresignoresignore(t__doc__Rpt nltk.tokenizeRtnltk.corpus.utilRtnltk.corpus.readertPlaintextCorpusReaderRtAlpinoCorpusReaderR tCategorizedTaggedCorpusReaderR tBracketParseCorpusReaderRRtCMUDictCorpusReaderRtAlignedCorpusReaderRt ComparativeSentencesCorpusReaderRtConllChunkCorpusReaderRRtDependencyCorpusReaderRtCrubadanCorpusReaderR!R"R#tFramenetCorpusReadertframenettWordListCorpusReaderR&R'R+tIEERCorpusReaderR-R.tIndianCorpusReaderR/tIPIPANCorpusReaderR0tChasenCorpusReaderR1tKNBCorpusReadertknbctLinThesaurusCorpusReaderR2tMacMorphoCorpusReaderR3t*PortugueseCategorizedPlaintextCorpusReaderR4R6t CategorizedPlaintextCorpusReaderR9tMTECorpusReadert multext_eastR;tNKJPCorpusReaderR<tNPSChatCorpusReaderR>tOpinionLexiconCorpusReaderR?tPl196xCorpusReaderR@tPPAttachmentCorpusReaderRBtReviewsCorpusReaderRFRGtProsConsCorpusReaderRHt#CategorizedBracketParseCorpusReaderRItStringCategoryCorpusReaderRJRKtRTECorpusReaderRLtSensevalCorpusReaderRMt CategorizedSentencesCorpusReaderRNtSentiWordNetCorpusReaderROtXMLCorpusReaderRPtSinicaTreebankCorpusReaderRQRSRTRUtSwadeshCorpusReaderRYt swadesh110t swadesh207tSwitchboardCorpusReaderR[tTimitCorpusReaderR\tTimitTaggedCorpusReadert timit_taggedtToolboxCorpusReaderR]RwtChunkedCorpusReadertTruet!tagged_treebank_para_block_readerttreebank_chunkt treebank_rawtTwitterCorpusReaderRatUdhrCorpusReaderRbRctConllCorpusReadertuniversal_treebankstVerbnetCorpusReaderRiRjtWordNetCorpusReaderR€RktWordNetICCorpusReaderRmRgtYCOECorpusReaderRntPropbankCorpusReaderRotNombankCorpusReadertnombankt propbank_ptbt 
nombank_ptbtSemcorCorpusReaderRuRvt__name__R~R…(((sf/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/__init__.pyt<s¨                                 #