ó
<¿CVc @ s† d Z d d l m Z d d l m Z d d l Td d l Td d l Td „ Z e j d e
f d „ ƒ Yƒ Z d e f d „ ƒ YZ
d
S( u
Corpus reader for the Recognizing Textual Entailment (RTE) Challenge Corpora.
The files were taken from the RTE1, RTE2 and RTE3 datasets and then
regularized.
Filenames are of the form rte*_dev.xml and rte*_test.xml. The latter are the
gold standard annotated files.
Each entailment corpus is a list of 'text'/'hypothesis' pairs. The following
example is taken from RTE3::
 <pair id="1" entailment="YES" task="IE" length="short" >
    <t>The sale was made to pay Yukos' US$ 27.5 billion tax bill,
    Yuganskneftegaz was originally sold for US$ 9.4 billion to a little known
    company Baikalfinansgroup which was later bought by the Russian
    state-owned oil company Rosneft .</t>
   <h>Baikalfinansgroup was sold to Rosneft.</h>
 </pair>
In order to provide globally unique IDs for each pair, a new attribute
``challenge`` has been added to the root element ``entailment-corpus`` of each
file, taking values 1, 2 or 3. The GID is formatted 'm-n', where 'm' is the
challenge number and 'n' is the pair ID.
iÿÿÿÿ( t unicode_literals( t compat( t *c C s0 i d d 6d d 6d d 6d d 6} | | j ƒ S( uí
Normalize the string value in an RTE pair's ``value`` or ``entailment``
attribute as an integer (1, 0).
:param value_string: the label used to classify a text/hypothesis pair
:type value_string: str
:rtype: int
i u TRUEi u FALSEu YESu NO( t upper( t value_stringt valdict( ( sh /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/rte.pyt norm* s
t RTEPairc B s5 e Z d Z d d d d d d d d „ Z d „ Z RS( uø
Container for RTE text-hypothesis pairs.
The entailment relation is signalled by the ``value`` attribute in RTE1, and by
``entailment`` in RTE2 and RTE3. These both get mapped on to the ``entailment``
attribute of this class.
c C s | | _ | j d | _ d | j | j f | _ | d j | _ | d j | _ d | j k rz t | j d ƒ | _ n1 d | j k r¢ t | j d ƒ | _ n | | _ d | j k rÍ | j d | _ n | | _ d | j k rø | j d | _ n | | _ d S(
uË
:param challenge: version of the RTE challenge (i.e., RTE1, RTE2 or RTE3)
:param id: identifier for the pair
:param text: the text component of the pair
:param hyp: the hypothesis component of the pair
:param value: classification label for the pair
:param task: attribute for the particular NLP task that the data was drawn from
:param length: attribute for the length of the text of the pair
u idu %s-%si i u valueu
entailmentu tasku lengthN(
t challenget attribt idt gidt textt hypR t valuet taskt length( t selft pairR R
R R
R R R ( ( sh /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/rte.pyt __init__C s c C s, | j r d | j | j f Sd | j Sd S( Nu u ( R R
( R ( ( sh /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/rte.pyt __repr__c s N( t __name__t
__module__t __doc__t NoneR R ( ( ( sh /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/rte.pyR : s t RTECorpusReaderc B s e Z d Z d „ Z d „ Z RS( u¼
Corpus reader for corpora in RTE challenges.
This is just a wrapper around the XMLCorpusReader. See module docstring above for the expected
structure of input documents.
c C sW y | j d } Wn t k
r* d } n Xg | j d ƒ D] } t | d | ƒ^ q; S( u÷
Map the XML input into an RTEPair.
This uses the ``getiterator()`` method from the ElementTree package to
find all the ``<pair>`` elements.
:param doc: a parsed XML document
:rtype: list(RTEPair)
u challengeu pairR N( R t KeyErrorR t getiteratorR ( R t docR R ( ( sh /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/rte.pyt _read_etreer s
c C sM t | t j ƒ r | g } n t g | D] } | j | j | ƒ ƒ ^ q( ƒ S( u¥
Build a list of RTEPairs from an RTE corpus.
:param fileids: a list of RTE corpus fileids
:type fileids: list
:rtype: list(RTEPair)
( t
isinstanceR t string_typest concatR t xml( R t fileidst fileid( ( sh /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/rte.pyt pairs„ s ( R R R R R$ ( ( ( sh /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/rte.pyR j s N( R t
__future__R t nltkR t nltk.corpus.reader.utilt nltk.corpus.reader.apit nltk.corpus.reader.xmldocsR t python_2_unicode_compatiblet objectR t XMLCorpusReaderR ( ( ( sh /private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/rte.pyt " s
/