ó
<¿CVc @ s¿ d Z d d l Z d d l Td d l Te j d ƒ Z e j d ƒ Z e j d ƒ Z e j d ƒ Z e j d ƒ Z e j d ƒ Z
e j d
ƒ Z d e f d „ ƒ YZ
d
e f d „ ƒ YZ d S( sÎ
CorpusReader for the Comparative Sentence Dataset.
- Comparative Sentence Dataset information -
Annotated by: Nitin Jindal and Bing Liu, 2006.
Department of Computer Science
University of Illinois at Chicago
Contact: Nitin Jindal, njindal@cs.uic.edu
Bing Liu, liub@cs.uic.edu (http://www.cs.uic.edu/~liub)
Distributed with permission.
Related papers:
- Nitin Jindal and Bing Liu. "Identifying Comparative Sentences in Text Documents".
Proceedings of the ACM SIGIR International Conference on Information Retrieval
(SIGIR-06), 2006.
- Nitin Jindal and Bing Liu. "Mining Comparative Sentences and Relations".
Proceedings of Twenty First National Conference on Artificial Intelligence
(AAAI-2006), 2006.
- Murthy Ganapathibhotla and Bing Liu. "Mining Opinions in Comparative Sentences".
Proceedings of the 22nd International Conference on Computational Linguistics
(Coling-2008), Manchester, 18-22 August, 2008.
iÿÿÿÿN( t *s ^\*+$s s s
s s (\d)_((?:[\.\w\s/-](?!\d_))+)s \((?!.*\()(.*)\)$t
Comparisonc B s2 e Z d Z d d d d d d d „ Z d „ Z RS( sN
A Comparison represents a comparative sentence and its constituents.
c C s: | | _ | | _ | | _ | | _ | | _ | | _ d S( s^
:param text: a string (optionally tokenized) containing a comparison.
:param comp_type: an integer defining the type of comparison expressed.
Values can be: 1 (Non-equal gradable), 2 (Equative), 3 (Superlative),
4 (Non-gradable).
:param entity_1: the first entity considered in the comparison relation.
:param entity_2: the second entity considered in the comparison relation.
:param feature: the feature considered in the comparison relation.
:param keyword: the word or phrase which is used for that comparative relation.
N( t textt comp_typet entity_1t entity_2t featuret keyword( t selfR R R R R R ( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt __init__6 s c C s. d j | j | j | j | j | j | j ƒ S( Ns] Comparison(text="{}", comp_type={}, entity_1="{}", entity_2="{}", feature="{}", keyword="{}")( t formatR R R R R R ( R ( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt __repr__I s N( t __name__t
__module__t __doc__t NoneR R ( ( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR 2 s t ComparativeSentencesCorpusReaderc B s› e Z d Z e Z e ƒ d d d „ Z d d „ Z d d „ Z d „ Z
d d „ Z d „ Z d d „ Z
d d „ Z d
„ Z d „ Z d „ Z d
„ Z RS( sf
Reader for the Comparative Sentence Dataset by Jindal and Liu (2006).
>>> from nltk.corpus import comparative_sentences
>>> comparison = comparative_sentences.comparisons()[0]
>>> comparison.text
['its', 'fast-forward', 'and', 'rewind', 'work', 'much', 'more', 'smoothly',
'and', 'consistently', 'than', 'those', 'of', 'other', 'models', 'i', "'ve",
'had', '.']
>>> comparison.entity_2
'models'
>>> (comparison.feature, comparison.keyword)
('rewind', 'more')
>>> len(comparative_sentences.comparisons())
853
t utf8c C s, t j | | | | ƒ | | _ | | _ d S( s¶
:param root: The root directory for this corpus.
:param fileids: a list or regexp specifying the fileids in this corpus.
:param word_tokenizer: tokenizer for breaking sentences or paragraphs
into words. Default: `WhitespaceTokenizer`
:param sent_tokenizer: tokenizer for breaking paragraphs into sentences.
:param encoding: the encoding that should be used to read the corpus.
N( t CorpusReaderR t _word_tokenizert _sent_tokenizer( R t roott fileidst word_tokenizert sent_tokenizert encoding( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR a s c C s€ | d k r | j } n t | t j ƒ r6 | g } n t g | j | t t ƒ D]* \ } } } | j | | j d | ƒ^ qO ƒ S( s
Return all comparisons in the corpus.
:param fileids: a list or regexp specifying the ids of the files whose
comparisons have to be returned.
:return: the given file(s) as a list of Comparison objects.
:rtype: list(Comparison)
R N(
R t _fileidst
isinstancet compatt string_typest concatt abspathst Truet
CorpusViewt _read_comparison_block( R R t patht enct fileid( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt comparisonsp s c C s{ t g | j | t t ƒ D]* \ } } } | j | | j d | ƒ^ q ƒ } t g | D] } | rV | j ƒ ^ qV ƒ } | S( s&
Return a set of all keywords used in the corpus.
:param fileids: a list or regexp specifying the ids of the files whose
keywords have to be returned.
:return: the set of keywords and comparative phrases used in the corpus.
:rtype: set(str)
R ( R R R R! t _read_keyword_blockt sett lower( R R R# R$ R% t all_keywordsR t keywords_set( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt keywords€ s F+c C sh g } | j d ƒ j ƒ } xF | j d ƒ D]5 } | s+ | j d ƒ rM q+ n | j | j ƒ ƒ q+ W| S( s‚
Return the list of words and constituents considered as clues of a
comparison (from listOfkeywords.txt).
s listOfkeywords.txts
s //( t opent readt splitt
startswitht appendt strip( R R, t raw_textt line( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt keywords_readme s c C s_ | d k r | j } n t | t ƒ r3 | g } n t g | D] } | j | ƒ j ƒ ^ q= ƒ S( sÊ
:param fileids: a list or regexp specifying the fileids that have to be
returned as a raw string.
:return: the given file(s) as a single string.
:rtype: str
N( R R R R R R- R. ( R R t f( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt raw s
c C s | j d ƒ j ƒ S( s@
Return the contents of the corpus readme file.
s
README.txt( R- R. ( R ( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt readmeª s c C sJ t g | j | t t ƒ D]* \ } } } | j | | j d | ƒ^ q ƒ S( sc
Return all sentences in the corpus.
:param fileids: a list or regexp specifying the ids of the files whose
sentences have to be returned.
:return: all sentences of the corpus as lists of tokens (or as plain
strings, if no word tokenizer is specified).
:rtype: list(list(str)) or list(str)
R ( R R R R! t _read_sent_block( R R R# R$ R% ( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt sents° s
c C sJ t g | j | t t ƒ D]* \ } } } | j | | j d | ƒ^ q ƒ S( s)
Return all words and punctuation symbols in the corpus.
:param fileids: a list or regexp specifying the ids of the files whose
words have to be returned.
:return: the given file(s) as a list of words and punctuation symbols.
:rtype: list(str)
R ( R R R R! t _read_word_block( R R R# R$ R% ( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyt words½ s c C s xt r| j ƒ } | s g St j t | ƒ } | r t j t | ƒ } t j t | ƒ } | j ƒ j ƒ } | j r‹ | j j | ƒ } n | j ƒ g } | r²x| D]} t
t j d | ƒ j d ƒ ƒ } t
d | d | ƒ }
| j ƒ } t j | ƒ } | ryxq | D]f \ } }
| d k r6|
j ƒ |
_ q| d k rT|
j ƒ |
_ q| d k r|
j ƒ |
_ qqWn t j | ƒ } | rž| d |
_ n | j |
ƒ q¨ Wn | rxT | D]I } t
t j d | ƒ j d ƒ ƒ } t
d | d | ƒ }
| j |
ƒ q¿Wn | Sq Wd S( Ns i R R t 1t 2t 3i ( R t readlinet ret findallt
COMPARISONt GRAD_COMPARISONt NON_GRAD_COMPARISONR2 R t tokenizet intt matcht groupR t ENTITIES_FEATSR R R t KEYWORDR R1 ( R t streamR4 t comparison_tagst grad_comparisonst non_grad_comparisonst comparison_textt comparison_bundlet compR t
comparisont entities_featst codet entity_featR ( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR" Ê sJ
!
!c C s4 g } x' | j | ƒ D] } | j | j ƒ q W| S( N( R" R1 R ( R RL R, RS ( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR' û s c C sè xá t rã | j ƒ } t j t | ƒ r\ x, t rU | j ƒ } t j t | ƒ r* Pq* q* Wq n t j t | ƒ r t j | ƒ r t j t | ƒ r | j rÊ g | j j
| ƒ D] } | j j
| ƒ ^ q® S| j j
| ƒ g Sq q Wd S( N( R R@ RA RH t STARSRB RC RJ t CLOSE_COMPARISONR RF R ( R RL R4 t sent( ( sv /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/comparative_sents.pyR9 s # ,c C s1 g } x$ | j | ƒ D] } | j | ƒ q W| S( N( R9 t extend( R RL R<