є <┐CVc@s┐dZddlZddlTddlTejdГZejdГZejdГZejdГZejdГZ ejd ГZ ejd ГZdefdДГYZ d efdДГYZdS(s╬ CorpusReader for the Comparative Sentence Dataset. - Comparative Sentence Dataset information - Annotated by: Nitin Jindal and Bing Liu, 2006. Department of Computer Sicence University of Illinois at Chicago Contact: Nitin Jindal, njindal@cs.uic.edu Bing Liu, liub@cs.uic.edu (http://www.cs.uic.edu/~liub) Distributed with permission. Related papers: - Nitin Jindal and Bing Liu. "Identifying Comparative Sentences in Text Documents". Proceedings of the ACM SIGIR International Conference on Information Retrieval (SIGIR-06), 2006. - Nitin Jindal and Bing Liu. "Mining Comprative Sentences and Relations". Proceedings of Twenty First National Conference on Artificial Intelligence (AAAI-2006), 2006. - Murthy Ganapathibhotla and Bing Liu. "Mining Opinions in Comparative Sentences". Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), Manchester, 18-22 August, 2008. i N(t*s^\*+$sss ss(\d)_((?:[\.\w\s/-](?!\d_))+)s$(?!.*\()(.*)$$t ComparisoncBs2eZdZdddddddДZdДZRS(sN A Comparison represents a comparative sentence and its constituents. cCs:||_||_||_||_||_||_dS(s^ :param text: a string (optionally tokenized) containing a comparation. :param comp_type: an integer defining the type of comparison expressed. Values can be: 1 (Non-equal gradable), 2 (Equative), 3 (Superlative), 4 (Non-gradable). :param entity_1: the first entity considered in the comparison relation. :param entity_2: the second entity considered in the comparison relation. :param feature: the feature considered in the comparison relation. :param keyword: the word or phrase which is used for that comparative relation. N(ttextt comp_typetentity_1tentity_2tfeaturetkeyword(tselfRRRRRR((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt__init__6s cCs.dj|j|j|j|j|j|jГS(Ns]Comparison(text="{}", comp_type={}, entity_1="{}", entity_2="{}", feature="{}", keyword="{}")(tformatRRRRRR(R((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt__repr__IsN(t__name__t __module__t__doc__tNoneR R(((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR2st ComparativeSentencesCorpusReadercBsЫeZdZeZeГdddДZddДZddДZ dДZ ddДZdДZddДZ dd ДZd ДZdДZdДZd ДZRS(sf Reader for the Comparative Sentence Dataset by Jindal and Liu (2006). >>> from nltk.corpus import comparative_sentences >>> comparison = comparative_sentences.comparisons()[0] >>> comparison.text ['its', 'fast-forward', 'and', 'rewind', 'work', 'much', 'more', 'smoothly', 'and', 'consistently', 'than', 'those', 'of', 'other', 'models', 'i', "'ve", 'had', '.'] >>> comparison.entity_2 'models' >>> (comparison.feature, comparison.keyword) ('rewind', 'more') >>> len(comparative_sentences.comparisons()) 853 tutf8cCs,tj||||Г||_||_dS(s╢ :param root: The root directory for this corpus. :param fileids: a list or regexp specifying the fileids in this corpus. :param word_tokenizer: tokenizer for breaking sentences or paragraphs into words. Default: `WhitespaceTokenizer` :param sent_tokenizer: tokenizer for breaking paragraphs into sentences. :param encoding: the encoding that should be used to read the corpus. N(tCorpusReaderR t_word_tokenizert_sent_tokenizer(Rtroottfileidstword_tokenizertsent_tokenizertencoding((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR as cCsА|dkr|j}nt|tjГr6|g}ntg|j|ttГD]*\}}}|j||j d|Г^qOГS(s Return all comparisons in the corpus. :param fileids: a list or regexp specifying the ids of the files whose comparisons have to be returned. :return: the given file(s) as a list of Comparison objects. :rtype: list(Comparison) RN( Rt_fileidst isinstancetcompattstring_typestconcattabspathstTruet CorpusViewt_read_comparison_block(RRtpathtenctfileid((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytcomparisonsps cCs{tg|j|ttГD]*\}}}|j||jd|Г^qГ}tg|D]}|rV|jГ^qVГ}|S(s& Return a set of all keywords used in the corpus. :param fileids: a list or regexp specifying the ids of the files whose keywords have to be returned. :return: the set of keywords and comparative phrases used in the corpus. :rtype: set(str) R(RRR R!t_read_keyword_blocktsettlower(RRR#R$R%tall_keywordsRtkeywords_set((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytkeywordsАs F+cCshg}|jdГjГ}xF|jdГD]5}|s+|jdГrMq+n|j|jГГq+W|S(sВ Return the list of words and constituents considered as clues of a comparison (from listOfkeywords.txt). slistOfkeywords.txts s//(topentreadtsplitt startswithtappendtstrip(RR,traw_texttline((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytkeywords_readmeРscCs_|dkr|j}nt|tГr3|g}ntg|D]}|j|ГjГ^q=ГS(s╩ :param fileids: a list or regexp specifying the fileids that have to be returned as a raw string. :return: the given file(s) as a single string. :rtype: str N(RRRRRR-R.(RRtf((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytrawЭs cCs|jdГjГS(s@ Return the contents of the corpus readme file. s README.txt(R-R.(R((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytreadmeкscCsJtg|j|ttГD]*\}}}|j||jd|Г^qГS(sc Return all sentences in the corpus. :param fileids: a list or regexp specifying the ids of the files whose sentences have to be returned. :return: all sentences of the corpus as lists of tokens (or as plain strings, if no word tokenizer is specified). :rtype: list(list(str)) or list(str) R(RRR R!t_read_sent_block(RRR#R$R%((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytsents░s cCsJtg|j|ttГD]*\}}}|j||jd|Г^qГS(s) Return all words and punctuation symbols in the corpus. :param fileids: a list or regexp specifying the ids of the files whose words have to be returned. :return: the given file(s) as a list of words and punctuation symbols. :rtype: list(str) R(RRR R!t_read_word_block(RRR#R$R%((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pytwords╜s cCsxtr|jГ}|sgStjt|Г}|rtjt|Г}tjt|Г}|jГjГ}|jrЛ|jj |Г}n|jГg}|r▓x|D]}t tjd|ГjdГГ} t d|d| Г} |jГ}tj|Г}|ryxq|D]f\}} |dkr6| jГ| _q|dkrT| jГ| _q|dkr| jГ| _qqWntj|Г}|rЮ|d| _n|j| ГqиWn|rxT|D]I}t tjd|ГjdГГ} t d|d| Г} |j| Гq┐Wn|SqWdS( Ns iRRt1t2t3i(R treadlinetretfindallt COMPARISONtGRAD_COMPARISONtNON_GRAD_COMPARISONR2RttokenizetinttmatchtgroupRtENTITIES_FEATSRRRtKEYWORDRR1(RtstreamR4tcomparison_tagstgrad_comparisonstnon_grad_comparisonstcomparison_texttcomparison_bundletcompRt comparisontentities_featstcodetentity_featR((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR"╩sJ ! !cCs4g}x'|j|ГD]}|j|jГqW|S(N(R"R1R(RRLR,RS((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR'√scCsшxсtrу|jГ}tjt|Гr\x,trU|jГ}tjt|Гr*Pq*q*Wqntjt|Гrtj|Гrtjt|Гr|j r╩g|j j |ГD]}|jj |Г^qоS|jj |ГgSqqWdS(N(R R@RARHtSTARSRBRCRJtCLOSE_COMPARISONRRFR(RRLR4tsent((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR9s # ,cCs1g}x$|j|ГD]}|j|ГqW|S(N(R9textend(RRLR<RY((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR;sN(RR RtStreamBackedCorpusViewR!tWhitespaceTokenizerRR R&R,R5R7R8R:R<R"R'R9R;(((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyRNs 1 (RRAtnltk.corpus.reader.apit nltk.tokenizetcompileRWRCRXRDRERJRKtobjectRRR(((sv/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt#s