ó <¿CVc@s²dZddlmZddlZddlmZd„Zd„Zdefd„ƒYZ d „Z e d „Z d „Z d „Z d „Zedkr®e ƒe ƒeƒndS(sˆ Simple classifier for RTE corpus. It calculates the overlap in words and named entities between text and hypothesis, and also whether there are words / named entities in the hypothesis which fail to occur in the text, since this is an indicator that the hypothesis is more informative than (i.e not entailed by) the text. TO DO: better Named Entity classification TO DO: add lemmatization iÿÿÿÿ(tprint_functionN(taccuracycCs |jƒs|jƒrtStS(sj This just assumes that words in all caps or titles are named entities. :type token: str (tistitletisuppertTruetFalse(ttoken((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pytnescCs8tjjj|dtjjjƒ}|dk r4|S|S(sA Use morphy from WordNet to find the base form of verbs. tposN(tnltktcorpustwordnettmorphytVERBtNone(twordtlemma((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pyt lemmatize#s$ tRTEFeatureExtractorcBs5eZdZeed„Zed„Zed„ZRS(s™ This builds a bag of words for both the text and the hypothesis after throwing away some stopwords, then calculates overlap and difference. csŒ||_tddddddddd d d d d ddgƒ|_tddddddgƒ|_ddlm}|dƒ}|j|jƒ|_|j|j ƒ|_ t|jƒ|_ t|j ƒ|_ ˆrt‡fd†|jDƒƒ|_ t‡fd†|j Dƒƒ|_ n|jrO|j |j|_ |j |j|_ n|j |j @|_ |j |j |_|j |j |_dS(s­ :param rtepair: a ``RTEPair`` from which features should be extracted :param stop: if ``True``, stopwords are thrown away. :type stop: bool tatthetitttheytoftinttotisthavetaretweretandtveryt.t,tnotnottnevertfailedtrejectedtdeniediÿÿÿÿ(tRegexpTokenizers([A-Z]\.)+|\w+|\$[\d\.]+c3s|]}ˆ|ƒVqdS(N((t.0R(R(sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pys Isc3s|]}ˆ|ƒVqdS(N((R)R(R(sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pys JsN(tstoptsett stopwordstnegwordst nltk.tokenizeR(ttokenizettextt text_tokensthypt hyp_tokenst text_wordst hyp_wordst_overlapt _hyp_extrat _txt_extra(tselftrtepairR*RR(t tokenizer((Rsl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pyt__init__1s( ! "% cCs‡td„|jDƒƒ}|dkr?|r;td|ƒn|S|dkrs|rhtd|j|ƒn|j|Std|ƒ‚dS(s° Compute the overlap between text and hypothesis. :param toktype: distinguish Named Entities from ordinary words :type toktype: 'ne' or 'word' css!|]}t|ƒr|VqdS(N(R(R)R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pys \sRs ne overlapRs word overlapsType not recognized:'%s'N(R+R6tprintt ValueError(R9ttoktypetdebugt ne_overlap((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pytoverlapUs   cCsTtd„|jDƒƒ}|dkr)|S|dkr@|j|Std|ƒ‚dS(s² Compute the extraneous material in the hypothesis. :param toktype: distinguish Named Entities from ordinary words :type toktype: 'ne' or 'word' css!|]}t|ƒr|VqdS(N(R(R)R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pys osRRsType not recognized: '%s'N(R+R7R>(R9R?R@tne_extra((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pyt hyp_extrahs    (t__name__t __module__t__doc__RRR<RBRD(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pyR,s$ cCs¸t|ƒ}i}t|d‰ss rte1_dev.xmls rte2_dev.xmls rte3_dev.xmlcss|]}||jfVqdS(N(RR(R)RS((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pys Œss rte1_test.xmls rte2_test.xmls rte3_test.xmlsTraining classifier...sTesting classifier...sAccuracy: %6.4f(R R trtetpairsR=R(ttrainerRPttrainttestRStlabelt classifiertacc((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pytrte_classifier…s 1  +cCsstjjjdgƒd }xP|D]H}tƒx8tt|ƒƒD]$}td|t|ƒ|fƒqCWq#WdS(Ns rte1_dev.xmlis %-15s => %s(R R RTRUR=tsortedRQ(RURStkey((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pyt demo_featuresžs  cCsrtjjjdgƒd}t|ƒ}t|jƒt|jdƒƒt|jdƒƒt|jdƒƒdS(Ns rte3_dev.xmli!RR( R R RTRURR=R5RBRD(R:RO((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pytdemo_feature_extractor¦s   cs‡ddl‰y ˆjdƒ‡fd†}WnDtk rry‡fd†}Wqstk rnˆjj}qsXnXˆjj|ƒdS(Niÿÿÿÿs/usr/local/bin/megamcsˆjj|dƒS(Ntmegam(tMaxentClassifierRW(tx(R (sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pyt³scsˆjj|dƒS(NtBFGS(RbRW(Rc(R (sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pyRd¶s(R t config_megamR>RbRWtclassifyR\(RV((R sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pytdemo¯s    t__main__(RGt __future__RR tnltk.classify.utilRRRtobjectRRQR\R_R`RhRE(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/rte_classify.pyts  L