ó <¿CVc @súdZddlmZddlmZddlZddlmZidddd d d d d dg d6dddgd6dddd d d d d dddg d6Ze ddddddƒZ e ddddddƒZ d„Z d„Z ded„Zejd„Zd„Zd„Zded„Zddd d!„Zeed"„Zd#„Zd$ed%„Zd$d&„Zd'„Zd(d)„Zd*„Zd+„Ze d,kröddl!Z!dd-l"m#Z#ed.d$ƒed.d$ƒeƒeƒeƒeƒndS(/s© Code for extracting relational triples from the ieer and conll2002 corpora. Relations are stored internally as dictionaries ('reldicts'). The two serialization outputs are "rtuple" and "clause". - An rtuple is a tuple of the form ``(subj, filler, obj)``, where ``subj`` and ``obj`` are pairs of Named Entity mentions, and ``filler`` is the string of words occurring between ``sub`` and ``obj`` (with no intervening NEs). Strings are printed via ``repr()`` to circumvent locale variations in rendering utf-8 encoded strings. - A clause is an atom of the form ``relsym(subjsym, objsym)``, where the relation, subject and object have been canonicalized to single strings. iÿÿÿÿ(tprint_function(t defaultdictN(thtmlentitydefstLOCATIONt ORGANIZATIONtPERSONtDURATIONtDATEtCARDINALtPERCENTtMONEYtMEASUREtieertLOCtPERtORGt conll2002tFACILITYtGPEtacecCs%y t|SWntk r |SXdS(sF Expand an NE class name. :type type: str :rtype: str N(t short2longtKeyError(ttype((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pyt_expand,s  cCs%y t|SWntk r |SXdS(sJ Abbreviate an NE class name. :type type: str :rtype: str N(t long2shortR(R((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pyt class_abbrev7s  t cspy|j|ƒSWnXtk rk|r>|jd„|DƒƒSddlm‰|j‡fd†|DƒƒSXdS(sà Join a list into a string, turning tags tuples into tag strings or just words. :param untag: if ``True``, omit the tag from tagged input strings. :type lst: list :rtype: str css|]}|dVqdS(iN((t.0ttup((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pys Nsiÿÿÿÿ(t tuple2strc3s|]}ˆ|ƒVqdS(N((RR(R(se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pys PsN(tjoint TypeErrortnltk.tagR(tlsttseptuntag((Rse/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pyt_joinCs cCs7y||jdƒSWntk r2|jdƒSXdS(s` Translate one entity to its ISO Latin value. Inspired by example from effbot.org iiN(tgroupR(tmtdefs((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pytdescape_entityRs  cCsXt|ddtƒ}|jƒ}tjdƒ}|jt|ƒ}|jddƒ}|S(s• Convert a list of strings into a canonical symbol. :type lst: list :return: a Unicode string without whitespace :rtype: unicode t_R#s&(\w+?);t.t(R$tTruetlowertretcompiletsubR(treplace(R!tsymtENT((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pytlist2symes  cCs}ddlm}g}gdg}xT|D]L}t||ƒsR|dj|ƒq)||d<|j|ƒgdg}q)W|S(sÑ Group a chunk structure into a list of 'semi-relations' of the form (list(str), ``Tree``). In order to facilitate the construction of (``Tree``, string, ``Tree``) triples, this identifies pairs whose first member is a list (possibly empty) of terminal strings, and whose second member is a ``Tree`` of the form (NE_label, terminals). :param tree: a chunk tree :return: a list of pairs (list(str), ``Tree``) :rtype: list of tuple iÿÿÿÿ(tTreeiiN(t nltk.treeR5tNonet isinstancetappend(ttreeR5t semi_relstsemi_reltdtr((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pyt tree2semi_relss     icCs‰g}x|t|ƒdkr„ttƒ}t|dd| ƒ|d<|ddjƒ|dÞs( t NE_CLASSESRt ValueErrorR>ttexttheadlineRStlisttfilter( R@REtdoctcorpusRWRORNtreldictst relfilter((RERWR@ROse/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pyt extract_rels°s    cCs—t|dƒ|d|dt|dƒ|dg}d}|r_|dg|}d|}n|rƒ|j|d ƒ|d }nt|ƒ}||S( sy Pretty print the reldict as an rtuple. :param reldict: a relation dictionary :type reldict: defaultdict R@RARCRERFs[%s: %r] %r [%s: %r]R?s...%r)RHs(%r...(RR9ttuple(RRR?RHtitemstformatt printargs((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pytrtupleæs5   cCs||d|df}d|S(s¸ Print the relation in clausal form. :param reldict: a relation dictionary :type reldict: defaultdict :param relsym: a label for the relation :type relsym: str RBRGs %s(%r, %r)((RRtrelsymRe((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pytclause÷sic Csddlm}|rŠyDddl}|jdƒ}|j|_|jƒ}|jdƒWqŠtk r†ddl }|j dƒqŠXnt j dƒ}t ƒt dƒt d d ƒxâ|jƒD]Ô}xË|j|ƒD]º} |rt | jƒt dƒnx‘td d | ddd|ƒD]q} t t| ddƒƒ|r y8| d| d| jf} |jd| ƒ|jƒWq‘tk rq‘Xq q WqÛWqÅW|ryG|jdƒt ƒt dƒt dƒx|D]} t | ƒqÕWWqtk rýqXndS(s. Select pairs of organizations and locations whose mentions occur with an intervening occurrence of the preposition "in". If the sql parameter is set to True, then the entity pairs are loaded into an in-memory database, and subsequently pulled out using an SQL "SELECT" query. iÿÿÿÿ(R Ns:memory:sPcreate table Locations (OrgName text, LocationName text, DocID text)s/Cannot import sqlite; sql flag will be ignored.s.*\bin\b(?!\b.+ing)s'IEER: in(ORG, LOC) -- just the clauses:t=i-iRR R`R RWRitINRARFsJinsert into Locations values (?, ?, ?)sTselect OrgName from Locations where LocationName = 'Atlanta's,Extract data from SQL table: ORGs in Atlantat-s===============s---------------(t nltk.corpusR tsqlite3tconnecttOptimizedUnicodet text_factorytcursortexecutet ImportErrortwarningstwarnR.R/RMtfileidst parsed_docstdocnoRcRjtcommitt NameError( RPtsqlR Rot connectiontcurRvRltfileR_trelRhtrow((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pytin_demo sT          %      c Csøddlm}d}tj|tjƒ}tƒtdƒtddƒx§|jƒD]™}x|j|ƒD]}t}}|r§t|j ƒtdƒt }}nxBt dd |d d d |ƒD]"}tt |d |d|ƒƒqÆWqmWqWWdS(Niÿÿÿÿ(R s› (.*( # assorted roles analyst| chair(wo)?man| commissioner| counsel| director| economist| editor| executive| foreman| governor| head| lawyer| leader| librarian).*)| manager| partner| president| producer| professor| researcher| spokes(wo)?man| writer| ,\sof\sthe?\s* # "X, of (the) Y" s(IEER: has_role(PER, ORG) -- raw rtuples:Rki-iRRR`R RWR?RHs===============( RnR R.R/tVERBOSERMRxRytFalseRzR,RcRh( RPR trolestROLESR€R_R?RHR((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pyt roles_demoHs     %cCs§ddlm}ddlm}tdƒtddƒg|jƒD].}|j|ƒD]}|j|jf^qXqE}x'|d D]}tƒtd|ƒq„WdS( Niÿÿÿÿ(R (R5sIEER: First 20 HeadlinesRki-is%s: %s( RnR R6R5RMRxRyRzR\(R R5R€R_ttreesR:((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pytieer_headlineszs Aic CsÐddlm}d}tj|tjƒ}tƒtdƒtddƒx|jdƒD]n}t}}|r}t}}nxHt dd |d d d |d dƒD]"}tt |dtdtƒƒq¢WqZWdS(sh Find the copula+'van' relation ('of') in the Dutch tagged training corpus from CoNLL 2002. iÿÿÿÿ(Rsù ( is/V| # 3rd sing present and was/V| # past forms of the verb zijn ('be') werd/V| # and also present wordt/V # past of worden ('become) ) .* # followed by anything van/Prep # followed by van ('of') s;Dutch CoNLL2002: van(PER, ORG) -- raw rtuples with context:Rki-s ned.trainRRR`RRWROi R?RHN( RnRR.R/R„RMt chunked_sentsR…R,RcRh(RPRtvnvtVANR_R?RHR((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pytconllneds    +c CsÇddlm}d}tj|tjƒ}tƒtdƒtddƒg|jdƒD]1}tdd |d d d |ƒD] }|^q|qZ}x(|d D]}tt|ddƒƒqœWtƒdS(Niÿÿÿÿ(Rs. .* ( de/SP| del/SP ) s=Spanish CoNLL2002: de(ORG, LOC) -- just the first 10 clauses:Rki-s esp.trainRR R`RRWi RitDE( RnRR.R/R„RMR‹RcRj(RtdeRR_Rtrelstr((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pytconllesp±s 1c CsÅtƒtdƒtddƒtjdƒ}g}xŠttjjjƒd ƒD]l\}}tj|ƒ}t dd|dd d |d d ƒ}x*|D]"}td j |t |ƒƒƒq—WqQWdS(NsB1500 Sentences from Penn Treebank, as processed by NLTK NE ChunkerRki-sC.*(chairman|president|trader|scientist|economist|analyst|partner).*iÜRRR`RRWROis {0:<5}{1}( RMR.R/t enumeratetnltkR`ttreebankt tagged_sentstne_chunkRcRfRh(tROLER‘titsentR((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pyt ne_chunkedÆs )$ t__main__(t relextractRP($t__doc__t __future__Rt collectionsRR.t nltk.compatRRYtdictRRRRR…R$t entitydefsR(R4R>RSR7RcRhRjR,RƒRˆRŠRŽR“Rœt__name__R•tnltk.semRž(((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/sem/relextract.pytsJ       6 > 2  $