ó <ŋCVc@sĻdZddlZddlZddlmZddlmZddlTddlTej dƒZ ej dƒZ ej dƒZ ej d ƒZ d efd „ƒYZdS( sĖ Sinica Treebank Corpus Sample http://rocling.iis.sinica.edu.tw/CKIP/engversion/treebank.htm 10,000 parsed sentences, drawn from the Academia Sinica Balanced Corpus of Modern Chinese. Parse tree notation is based on Information-based Case Grammar. Tagset documentation is available at http://www.sinica.edu.tw/SinicaCorpus/modern_e_wordtype.html Language and Knowledge Processing Group, Institute of Information Science, Academia Sinica It is distributed with the Natural Language Toolkit under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike License [http://creativecommons.org/licenses/by-nc-sa/2.5/]. References: Feng-Yi Chen, Pi-Fang Tsai, Keh-Jiann Chen, and Chu-Ren Huang (1999) The Construction of Sinica Treebank. Computational Linguistics and Chinese Language Processing, 4, pp 87-104. Huang Chu-Ren, Keh-Jiann Chen, Feng-Yi Chen, Keh-Jiann Chen, Zhao-Ming Gao, and Kuang-Yu Chen. 2000. Sinica Treebank: Design Criteria, Annotation Guidelines, and On-line Interface. Proceedings of 2nd Chinese Language Processing Workshop, Association for Computational Linguistics. Chen Keh-Jiann and Yu-Ming Hsieh (2004) Chinese Treebanks and Grammar Extraction, Proceedings of IJCNLP-04, pp560-565. iĸĸĸĸN(t sinica_parse(tmap_tag(t*s^#\S+\ss (?<=\))#.*$s:([^:()|]+):([^:()|]+)s:[^:()|]+:([^:()|]+)tSinicaTreebankCorpusReadercBs5eZdZd„Zd„Zdd„Zd„ZRS(s) Reader for the sinica treebank. cCs7|jƒ}tjd|ƒ}tjd|ƒ}|gS(Nt(treadlinet IDENTIFIERtsubtAPPENDIX(tselftstreamtsent((st/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/sinica_treebank.pyt _read_block;s cCs t|ƒS(N(R(R R ((st/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/sinica_treebank.pyt_parseAscCs~gtj|ƒD]\}}||f^q}|rz||jkrzg|D]'\}}|t|j||ƒf^qJ}n|S(N(tTAGWORDtfindallt_tagsetR(R R ttagsettttwt tagged_sent((st/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/sinica_treebank.pyt_tagDs.7cCs tj|ƒS(N(tWORDR(R R ((st/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/sinica_treebank.pyt_wordJsN(t__name__t __module__t__doc__R R tNoneRR(((st/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/sinica_treebank.pyR7s    (Rtostret nltk.treeRtnltk.tagRtnltk.corpus.reader.utiltnltk.corpus.reader.apitcompileRRRRtSyntaxCorpusReaderR(((st/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/sinica_treebank.pyt's