ó <¿CVc@sdZddlmZmZddlmZddlmZddlm Z ddl Z ddl Z ddl m Z ddlmZmZed efd „ƒYƒZd efd „ƒYZd „Zed„Zd„Zd„Zd„ZdZdZdZedkreƒndS(u˜ Tools for reading and writing dependency trees. The input is assumed to be in Malt-TAB format (http://stp.lingfil.uu.se/~nivre/research/MaltXML.html). iÿÿÿÿ(tprint_functiontunicode_literals(t defaultdict(tchain(tpformatN(tTree(tpython_2_unicode_compatiblet string_typestDependencyGraphcBs(eZdZddeddd„Zd„Zd„Zd„Zd„Z d„Z d„Z d „Z d „Z d „Zd „Zeeddd „ƒZd„Zd„Zd„Zdeddd„Zed„Zd„Zd„Zdd„Zd„Zd„Zd„Zd„Zd„Zd„Z RS(uQ A container for the nodes and labelled edges of a dependency structure. uROOTc Csvtd„ƒ|_|jdjidd6dd6dd6ƒd |_|rr|j|d|d|d |d |ƒnd S( u¥Dependency graph. We place a dummy `TOP` node with the index 0, since the root node is often assigned 0 as its head. This also means that the indexing of the nodes corresponds directly to the Malt-TAB format, which starts at 1. If zero-based is True, then Malt-TAB-like input with node numbers starting at 0 and the root node assigned -1 (as produced by, e.g., zpar). :param str cell_separator: the cell separator. If not provided, cells are split by whitespace. :param str top_relation_label: the label by which the top relation is identified, for examlple, `ROOT`, `null` or `TOP`. cSsIi dd6dd6dd6dd6dd6dd6dd6ttƒd6dd 6S( Nuaddressuwordulemmauctagutagufeatsuheadudepsurel(tNoneRtlist(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyt9s  iuTOPuctagutaguaddresstcell_extractort zero_basedtcell_separatorttop_relation_labelN(RtnodestupdateR troott_parse(tselfttree_strR R RR((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyt__init__'s   cCs|j|=dS(uw Removes the node with the given address. References to this node in others will still exist. N(R(Rtaddress((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytremove_by_addressWscCslxe|jjƒD]T}g}x;|dD]/}||krI|j|ƒq'|j|ƒq'W||d>> dg = DependencyGraph( ... 'John N 2\n' ... 'loves V 0\n' ... 'Mary N 2' ... ) >>> print(dg.to_dot()) digraph G{ edge [dir=forward] node [shape=plaintext] 0 [label="0 (None)"] 0 -> 2 [label="ROOT"] 1 [label="1 (John)"] 2 [label="2 (loves)"] 2 -> 1 [label=""] 2 -> 3 [label=""] 3 [label="3 (Mary)"] } u digraph G{ uedge [dir=forward] unode [shape=plaintext] tkeycSs|dS(Nuaddress((tv((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyR ¬su %s [label="%s (%s)"]uaddressuwordudepsu %s -> %s [label="%s"]u %s -> %s u }N(tsortedRRtitemsR (RtsRtreltdepsR((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytto_dots  %#  $ c Cs|jƒ}y:tjddgdtjdtjdtjdtƒ}Wntk retdƒ‚nX|j|ƒ\}}|r™tdj|ƒƒ‚n|S( u7Show SVG representation of the transducer (IPython magic). >>> dg = DependencyGraph( ... 'John N 2\n' ... 'loves V 0\n' ... 'Mary N 2' ... ) >>> dg._repr_svg_().split('\n')[0] '' udotu-Tsvgtstdintstdouttstderrtuniversal_newlinesu0Cannot find the dot binary from Graphviz packageu?Cannot create svg representation by running dot from string: {}( R3t subprocesstPopentPIPEtTruetOSErrort Exceptiont communicatetformat(Rt dot_stringtprocesstoutterr((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyt _repr_svg_¸s       cCs t|jƒS(N(RR(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyt__str__×scCsdjt|jƒƒS(Nu (R?tlenR(R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyt__repr__ÚscCsWt|ƒE}g|jƒjdƒD]$}t|d|d|d|ƒ^q%SWdQXdS(uï :param filename: a name of a file in Malt-TAB format :param zero_based: nodes in the input file are numbered starting from 0 rather than 1 (as produced by, e.g., zpar) :param str cell_separator: the cell separator. If not provided, cells are split by whitespace. :param str top_relation_label: the label by which the top relation is identified, for examlple, `ROOT`, `null` or `TOP`. :return: a list of DependencyGraphs u R RRN(topentreadtsplitR(tfilenameR RRtinfileR((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytloadÝscsKtj|j|djƒƒ}|j|d‰t‡fd†|DƒƒS(ul Returns the number of left children under the node specified by the given address. udepsuaddressc3s!|]}|ˆkrdVqdS(iN((t.0tc(tindex(sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pys ýs(Rt from_iterableRRtsum(Rt node_indextchildren((RPsl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyt left_childrenös csKtj|j|djƒƒ}|j|d‰t‡fd†|DƒƒS(um Returns the number of right children under the node specified by the given address. udepsuaddressc3s!|]}|ˆkrdVqdS(iN((RNRO(RP(sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pys s(RRQRRRR(RRSRT((RPsl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytright_childrenÿs cCs2|j|dƒs.|j|dj|ƒndS(Nuaddress(R+RR(RR((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytadd_nodesc Cs½d„}d„}d„}d„} i|d6|d6|d6| d6} t|tƒrqd „|jd ƒDƒ}nd „|Dƒ} d „| Dƒ} d} xÈt| d dƒD]´\} }|j|ƒ}| dkràt|ƒ} n| t|ƒksøt‚|dkr>y| | }Wq>tk r:tdj | ƒƒ‚q>Xny+||| ƒ\} }}}}}}}Wn8t tfk r£||ƒ\}}}}}}}nX|dkr¶qªnt |ƒ}|rÕ|d7}n|j | j i| d6|d6|d6|d6|d6|d6|d6|d6ƒ| dkrB|dkrB|}n|j |d|j| ƒqªW|j dd|r¬|j dd|d}|j ||_||_n tjdƒdS(u½Parse a sentence. :param extractor: a function that given a tuple of cells returns a 7-tuple, where the values are ``word, lemma, ctag, tag, feats, head, rel``. :param str cell_separator: the cell separator. If not provided, cells are split by whitespace. :param str top_relation_label: the label by which the top relation is identified, for examlple, `ROOT`, `null` or `TOP`. cSs+|\}}}|||||d|dfS(Nu((tcellsRPtwordttagthead((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytextract_3_cellsscSs.|\}}}}|||||d||fS(Nu((RXRPRYRZR[R1((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytextract_4_cellssc Ss[|\}}}}}}}yt|ƒ}Wntk r>nX|||||d||fS(Nu(tintt ValueError( RXRPt line_indexRYtlemmaRZt_R[R1((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytextract_7_cells#s  c Ssd|\ }}}}}}}} } } yt|ƒ}Wntk rGnX|||||||| fS(N(R^R_( RXRPR`RYRatctagRZtfeatsR[R1Rb((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytextract_10_cells,s $ iiii css|] }|VqdS(N((RNtline((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pys =su css|]}|jƒVqdS(N(trstrip(RNtl((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pys ?scss|]}|r|VqdS(N((RNRi((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pys @ststartiuUNumber of tab-delimited fields ({0}) not supported by CoNLL(10) or Malt-Tab(4) formatu_uaddressuwordulemmauctagutagufeatsuheadureliudepsuBThe graph doesn't contain a node that depends on the root element.N(t isinstanceRRJR t enumerateRFtAssertionErrortKeyErrorR_R?t TypeErrorR^RRRRRtwarningstwarn(Rtinput_R R RRR\R]RcRft extractorstlinest cell_numberRPRgRXRYRaRdRZReR[R1t root_address((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyR sl       +%        cCs'|d}|r#|dkr#|Sn|S(Nuwordu,((RRtfiltertw((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyt_word~s   cCso|j|ƒ}|d}ttj|djƒƒƒ}|rgt|g|D]}|j|ƒ^qKƒS|SdS(u¦ Turn dependency graphs into NLTK trees. :param int i: index of a node :return: either a word (if the indexed node is a leaf) or a ``Tree``. uwordudepsN(R*R.RRQRRt_tree(RtiRRYR2R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyRz…s  )cCs[|j}|d}ttj|djƒƒƒ}t|g|D]}|j|ƒ^q?ƒS(u– Starting with the ``root`` node, build a dependency tree using the NLTK ``Tree`` constructor. Dependency labels are omitted. uwordudeps(RR.RRQRRRz(RRRYR2R((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyttree”s  ccs¥|s|j}n|d|df}xxttj|djƒƒƒD]W}|j|ƒ}||d|d|dffVx|jd|ƒD] }|VqŽWqFWdS(us Extract dependency triples of the form: ((head word, head tag), rel, (dep word, dep tag)) uworductagudepsurelRN(RR.RRQRR*ttriples(RRR[R{Rttriple((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyR}Ÿs & cCs,y|j|dSWntk r'dSXdS(Nuhead(Rt IndexErrorR (RR{((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyt_hd¯s cCs,y|j|dSWntk r'dSXdS(Nurel(RRR (RR{((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyt_relµs c Cs6i}xL|jjƒD];}x2|dD]&}t|d|gƒ}d||>> dg = DependencyGraph(treebank_data) >>> dg.contains_cycle() False >>> cyclic_dg = DependencyGraph() >>> top = {'word': None, 'deps': [1], 'rel': 'TOP', 'address': 0} >>> child1 = {'word': None, 'deps': [2], 'rel': 'NTOP', 'address': 1} >>> child2 = {'word': None, 'deps': [4], 'rel': 'NTOP', 'address': 2} >>> child3 = {'word': None, 'deps': [1], 'rel': 'NTOP', 'address': 3} >>> child4 = {'word': None, 'deps': [3], 'rel': 'NTOP', 'address': 4} >>> cyclic_dg.nodes = { ... 0: top, ... 1: child1, ... 2: child2, ... 3: child3, ... 4: child4, ... } >>> cyclic_dg.root = top >>> cyclic_dg.contains_cycle() [3, 1, 2, 4] udepsuaddressii(RRttupletget_cycle_pathR*tFalse( Rt distancesRRR,Rbt new_entriestpair1tpair2tpairtpath((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytcontains_cycle¼s$  ! # cCsŠx)|dD]}||kr |dgSq WxW|dD]K}|j|j|ƒ|ƒ}t|ƒdkr7|jd|dƒ|Sq7WgS(Nudepsuaddressi(RƒR*RFtinsert(Rt curr_nodetgoal_node_indexRRŠ((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pyRƒîs cs€|dkrd‰n?|dkr*d‰n*|dkr?d‰ntdj|ƒƒ‚dj‡fd †t|jjƒƒDƒƒS( u® The dependency graph in CoNLL format. :param style: the style to use for the format (3, 4, 10 columns) :type style: int :rtype: str iu{word} {tag} {head} iu{word} {tag} {head} {rel} i u9{i} {word} {lemma} {ctag} {tag} {feats} {head} {rel} _ _ uUNumber of tab-delimited fields ({0}) not supported by CoNLL(10) or Malt-Tab(4) formatuc3s:|]0\}}|ddkrˆjd||VqdS(utaguTOPR{N(R?(RNR{R(ttemplate(sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pys s(R_R?tjoinR.RR/(Rtstyle((Rsl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/dependencygraph.pytto_conllùs       cCsÉddl}ttdt|jƒƒƒ}g|D]6}|j|ƒr1||j|ƒ|j|ƒf^q1}i|_x&|D]}|j|d|j|s.  ÿÿ  + V