ó <¿CVc@s¿dZddlZddlTddlTejdƒZejdƒZejdƒZejdƒZejdƒZ ejd ƒZ ejd ƒZ d e fd „ƒYZ d efd„ƒYZdS(sÎ CorpusReader for the Comparative Sentence Dataset. - Comparative Sentence Dataset information - Annotated by: Nitin Jindal and Bing Liu, 2006. Department of Computer Sicence University of Illinois at Chicago Contact: Nitin Jindal, njindal@cs.uic.edu Bing Liu, liub@cs.uic.edu (http://www.cs.uic.edu/~liub) Distributed with permission. Related papers: - Nitin Jindal and Bing Liu. "Identifying Comparative Sentences in Text Documents". Proceedings of the ACM SIGIR International Conference on Information Retrieval (SIGIR-06), 2006. - Nitin Jindal and Bing Liu. "Mining Comprative Sentences and Relations". Proceedings of Twenty First National Conference on Artificial Intelligence (AAAI-2006), 2006. - Murthy Ganapathibhotla and Bing Liu. "Mining Opinions in Comparative Sentences". Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), Manchester, 18-22 August, 2008. iÿÿÿÿN(t*s^\*+$s s s ss(\d)_((?:[\.\w\s/-](?!\d_))+)s\((?!.*\()(.*)\)$t ComparisoncBs2eZdZddddddd„Zd„ZRS(sN A Comparison represents a comparative sentence and its constituents. cCs:||_||_||_||_||_||_dS(s^ :param text: a string (optionally tokenized) containing a comparation. :param comp_type: an integer defining the type of comparison expressed. Values can be: 1 (Non-equal gradable), 2 (Equative), 3 (Superlative), 4 (Non-gradable). :param entity_1: the first entity considered in the comparison relation. :param entity_2: the second entity considered in the comparison relation. :param feature: the feature considered in the comparison relation. :param keyword: the word or phrase which is used for that comparative relation. N(ttextt comp_typetentity_1tentity_2tfeaturetkeyword(tselfRRRRRR((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt__init__6s     cCs.dj|j|j|j|j|j|jƒS(Ns]Comparison(text="{}", comp_type={}, entity_1="{}", entity_2="{}", feature="{}", keyword="{}")(tformatRRRRRR(R((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt__repr__Is N(t__name__t __module__t__doc__tNoneR R (((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR2s t ComparativeSentencesCorpusReadercBs›eZdZeZeƒddd„Zdd„Zdd„Z d„Z dd„Z d„Z dd„Z dd „Zd „Zd „Zd „Zd „ZRS(sf Reader for the Comparative Sentence Dataset by Jindal and Liu (2006). >>> from nltk.corpus import comparative_sentences >>> comparison = comparative_sentences.comparisons()[0] >>> comparison.text ['its', 'fast-forward', 'and', 'rewind', 'work', 'much', 'more', 'smoothly', 'and', 'consistently', 'than', 'those', 'of', 'other', 'models', 'i', "'ve", 'had', '.'] >>> comparison.entity_2 'models' >>> (comparison.feature, comparison.keyword) ('rewind', 'more') >>> len(comparative_sentences.comparisons()) 853 tutf8cCs,tj||||ƒ||_||_dS(s¶ :param root: The root directory for this corpus. :param fileids: a list or regexp specifying the fileids in this corpus. :param word_tokenizer: tokenizer for breaking sentences or paragraphs into words. Default: `WhitespaceTokenizer` :param sent_tokenizer: tokenizer for breaking paragraphs into sentences. :param encoding: the encoding that should be used to read the corpus. N(t CorpusReaderR t_word_tokenizert_sent_tokenizer(Rtroottfileidstword_tokenizertsent_tokenizertencoding((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR as  cCs€|dkr|j}nt|tjƒr6|g}ntg|j|ttƒD]*\}}}|j||j d|ƒ^qOƒS(s Return all comparisons in the corpus. :param fileids: a list or regexp specifying the ids of the files whose comparisons have to be returned. :return: the given file(s) as a list of Comparison objects. :rtype: list(Comparison) RN( Rt_fileidst isinstancetcompatt string_typestconcattabspathstTruet CorpusViewt_read_comparison_block(RRtpathtenctfileid((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt comparisonsps   cCs{tg|j|ttƒD]*\}}}|j||jd|ƒ^qƒ}tg|D]}|rV|jƒ^qVƒ}|S(s& Return a set of all keywords used in the corpus. :param fileids: a list or regexp specifying the ids of the files whose keywords have to be returned. :return: the set of keywords and comparative phrases used in the corpus. :rtype: set(str) R(RRR R!t_read_keyword_blocktsettlower(RRR#R$R%t all_keywordsRt keywords_set((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytkeywords€s F+cCshg}|jdƒjƒ}xF|jdƒD]5}| s+|jdƒrMq+n|j|jƒƒq+W|S(s‚ Return the list of words and constituents considered as clues of a comparison (from listOfkeywords.txt). slistOfkeywords.txts s//(topentreadtsplitt startswithtappendtstrip(RR,traw_texttline((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytkeywords_readmescCs_|dkr|j}nt|tƒr3|g}ntg|D]}|j|ƒjƒ^q=ƒS(sÊ :param fileids: a list or regexp specifying the fileids that have to be returned as a raw string. :return: the given file(s) as a single string. :rtype: str N(RRRRRR-R.(RRtf((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytraws    cCs|jdƒjƒS(s@ Return the contents of the corpus readme file. s README.txt(R-R.(R((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytreadmeªscCsJtg|j|ttƒD]*\}}}|j||jd|ƒ^qƒS(sc Return all sentences in the corpus. :param fileids: a list or regexp specifying the ids of the files whose sentences have to be returned. :return: all sentences of the corpus as lists of tokens (or as plain strings, if no word tokenizer is specified). :rtype: list(list(str)) or list(str) R(RRR R!t_read_sent_block(RRR#R$R%((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytsents°s cCsJtg|j|ttƒD]*\}}}|j||jd|ƒ^qƒS(s) Return all words and punctuation symbols in the corpus. :param fileids: a list or regexp specifying the ids of the files whose words have to be returned. :return: the given file(s) as a list of words and punctuation symbols. :rtype: list(str) R(RRR R!t_read_word_block(RRR#R$R%((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytwords½s cCsxtr|jƒ}|sgStjt|ƒ}|rtjt|ƒ}tjt|ƒ}|jƒjƒ}|jr‹|jj |ƒ}n|jƒg}|r²x|D]}t tj d|ƒj dƒƒ} t d|d| ƒ} |jƒ}tj|ƒ} | ryxq| D]f\} } | dkr6| jƒ| _q | dkrT| jƒ| _q | dkr | jƒ| _q q Wntj|ƒ}|rž|d| _n|j| ƒq¨Wn|rxT|D]I}t tj d|ƒj dƒƒ} t d|d| ƒ} |j| ƒq¿Wn|SqWdS( Ns iRRt1t2t3i(R treadlinetretfindallt COMPARISONtGRAD_COMPARISONtNON_GRAD_COMPARISONR2RttokenizetinttmatchtgroupRtENTITIES_FEATSRRRtKEYWORDRR1(RtstreamR4tcomparison_tagstgrad_comparisonstnon_grad_comparisonstcomparison_texttcomparison_bundletcompRt comparisontentities_featstcodet entity_featR((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR"ÊsJ     !     !cCs4g}x'|j|ƒD]}|j|jƒqW|S(N(R"R1R(RRLR,RS((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR'ûscCsèxátrã|jƒ}tjt|ƒr\x,trU|jƒ}tjt|ƒr*Pq*q*Wqntjt|ƒ rtj|ƒ rtjt|ƒ r|j rÊg|j j |ƒD]}|j j |ƒ^q®S|j j |ƒgSqqWdS(N( R R@RARHtSTARSRBRCRJtCLOSE_COMPARISONRRFR(RRLR4tsent((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR9s    # ,cCs1g}x$|j|ƒD]}|j|ƒqW|S(N(R9textend(RRLR<RY((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR;sN(R R RtStreamBackedCorpusViewR!tWhitespaceTokenizerRR R&R,R5R7R8R:R<R"R'R9R;(((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyRNs    1  (RRAtnltk.corpus.reader.apit nltk.tokenizetcompileRWRCRXRDRERJRKtobjectRRR(((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt#s