ó <æCVc@`s;ddlmZddlmZddlmZddlZddlZddlmZddlm Z ddl m Z yDddl m Z dd lmZdd lmZdd lmZWnek rĻnXdd lmZmZmZd efd„ƒYZdefd„ƒYZdefd„ƒYZd„ZdS(i(tabsolute_import(tdivision(tprint_functionN(tremove(tdeepcopy(t itemgetter(tarray(tsparse(tload_svmlight_file(tsvm(tParserItDependencyGraphtDependencyEvaluatort ConfigurationcB`s5eZdZd„Zd„Zed„Zd„ZRS(s Class for holding configuration which is the partial analysis of the input sentence. The transition based parser aims at finding set of operators that transfer the initial configuration to the terminal configuration. The configuration includes: - Stack: for storing partially proceeded words - Buffer: for storing remaining input words - Set of arcs: for storing partially built dependency tree This class also provides a method to represent a configuration as list of features. cC`sXdg|_ttdt|jƒƒƒ|_g|_|j|_t|jƒ|_dS(s¶ :param dep_graph: the representation of an input in the form of dependency graph. :type dep_graph: DependencyGraph where the dependencies are not specified. iiN( tstacktlisttrangetlentnodestbuffertarcst_tokenst _max_address(tselft dep_graph((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyt__init__,s  !  cC`s3dt|jƒdt|jƒdt|jƒS(NsStack : s Buffer : s Arcs : (tstrRRR(R((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyt__str__8scC`sC|dkrtS|dkr tS|tkr?|dkr?tSntS(ss Check whether a feature is informative The flag control whether "_" is informative or not tt_N(tNonetFalsetTrue(Rtfeattflag((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyt_check_informative<s    cC`sØg}t|jƒdkrn|jt|jƒd}|j|}|j|dtƒrp|jd|dƒnd|kr§|j|dƒr§|jd|dƒn|j|dƒrŅ|jd|dƒnd |kr)|j|d ƒr)|d jd ƒ}x"|D]}|jd |ƒq Wnt|jƒdkr“|jt|jƒd }|j|}|j|dƒr“|jd |dƒq“nd}d}d} d} xw|jD]l\} } } | |krµ| | kr÷| |kr÷| }| } n| | kr!| |kr!| }| } q!qµqµW|j| ƒrH|jd| ƒn|j| ƒrn|jd| ƒqnnt|jƒdkr¤|jd}|j|}|j|dtƒrĖ|jd|dƒnd|kr|j|dƒr|jd|dƒn|j|dƒr-|jd|dƒnd |kr„|j|d ƒr„|d jd ƒ}x"|D]}|jd|ƒqfWnt|jƒdkr|jd}|j|}|j|dtƒrį|jd|dƒn|j|dƒr|jd|dƒqnt|jƒd krl|jd }|j|}|j|dƒrl|jd|dƒqlnt|jƒdkrÉ|jd}|j|}|j|dƒrÉ|jd|dƒqÉnd}d}d} d} xw|jD]l\} } } | |krė| | kr-| |kr-| }| } n| | krW| |krW| }| } qWqėqėW|j| ƒr~|jd| ƒn|j| ƒr¤|jd| ƒq¤n|S(s/ Extract the set of features for the current configuration. Implement standard features as describe in Table 3.2 (page 31) in Dependency Parsing book by Sandra Kubler, Ryan McDonal, Joakim Nivre. Please note that these features are very basic. :return: list(str) iitwordt STK_0_FORM_tlemmat STK_0_LEMMA_ttagt STK_0_POS_tfeatst|t STK_0_FEATS_it STK_1_POS_i@Bi’’’’Rt STK_0_LDEP_t STK_0_RDEP_t BUF_0_FORM_t BUF_0_LEMMA_t BUF_0_POS_t BUF_0_FEATS_t BUF_1_FORM_t BUF_1_POS_t BUF_2_POS_it BUF_3_POS_t BUF_0_LDEP_t BUF_0_RDEP_( RRRR#R tappendtsplitRR(Rtresultt stack_idx0ttokenR*R!t stack_idx1t left_mostt right_mostt dep_left_mosttdep_right_mosttwitrtwjt buffer_idx0t buffer_idx1t buffer_idx2t buffer_idx3((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pytextract_featuresJs¤                (t__name__t __module__t__doc__RRRR#RK(((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyR s    t TransitioncB`sSeZdZdZdZdZdZd„Zd„Zd„Z d„Z d „Z RS( s½ This class defines a set of transition which is applied to a configuration to get another configuration Note that for different parsing algorithm, the transition is different. tLEFTARCtRIGHTARCtSHIFTtREDUCEcC`sD||_|tjtjgkr@tdtjtjfƒ‚ndS(s¢ :param alg_option: the algorithm option of this parser. Currently support `arc-standard` and `arc-eager` algorithm :type alg_option: str s% Currently we only support %s and %s N(t_algotTransitionParsert ARC_STANDARDt ARC_EAGERt ValueError(Rt alg_option((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyR¼s  c C`sšt|jƒdks*t|jƒdkr.dS|jddkrEdS|jt|jƒd}t}|jtjkr¬x2|jD]$\}}}||krt}qqWn|rč|jj ƒ|jd}|jj |||fƒndSdS(s Note that the algorithm for left-arc is quite similar except for precondition for both arc-standard and arc-eager :param configuration: is the current configuration :return : A new configuration or -1 if the pre-condition is not satisfied ii’’’’iN( RRRR RTRURWRRtpopR:( Rtconftrelationtidx_wiR"t idx_parentREt idx_childtidx_wj((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pytleft_arcČs*   cC`sŽt|jƒdks*t|jƒdkr.dS|jtjkr…|jjƒ}|jd}||jd<|jj|||fƒnU|jt|jƒd}|jjdƒ}|jj|ƒ|jj|||fƒdS(sų Note that the algorithm for right-arc is DIFFERENT for arc-standard and arc-eager :param configuration: is the current configuration :return : A new configuration or -1 if the pre-condition is not satisfied ii’’’’iN( RRRRTRURVRZRR:(RR[R\R]R`((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyt right_arcćs*  cC`sŸ|jtjkrdSt|jƒdkr/dS|jt|jƒd}t}x/|jD]$\}}}||krYt}qYqYW|r—|jjƒndSdS(sé Note that the algorithm for reduce is only available for arc-eager :param configuration: is the current configuration :return : A new configuration or -1 if the pre-condition is not satisfied i’’’’iiN( RTRURWRRRRR RZ(RR[R]R"R^RER_((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pytreduceös  cC`s?t|jƒdkrdS|jjdƒ}|jj|ƒdS(só Note that the algorithm for shift is the SAME for arc-standard and arc-eager :param configuration: is the current configuration :return : A new configuration or -1 if the pre-condition is not satisfied ii’’’’N(RRRZRR:(RR[R]((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pytshift s( RLRMRNtLEFT_ARCt RIGHT_ARCRRRSRRaRbRcRd(((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyRO±s   RUcB`skeZdZdZdZd„Zd„Zd„Zd„Zd„Z d„Z d „Z d „Z d „Z RS( sl Class for transition based parser. Implement 2 algorithms which are "arc-standard" and "arc-eager" s arc-standards arc-eagercC`s_||j|jgkr7td|j|jfƒ‚n||_i|_i|_i|_dS(s  :param algorithm: the algorithm option of this parser. Currently support `arc-standard` and `arc-eager` algorithm :type algorithm: str s% Currently we only support %s and %s N(RVRWRXt _algorithmt _dictionaryt _transitiont_match_transition(Rt algorithm((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyR s   cC`sR|j|}|j|}|ddkr.dS|d|dkrJ|dSdSdS(NR$theadtaddresstrel(RR(RR^R_tdepgraphtp_nodetc_node((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyt_get_dep_relation.s  cC`sdg}x>|D]6}|jj|t|jƒƒ|j|j|ƒq Wdjd„t|ƒDƒƒS(sč :param features: list of feature string which is needed to convert to binary features :type features: list(str) :return : string of binary features in libsvm format which is 'featureID:value' pairs t cs`s|]}t|ƒdVqdS(s:1.0N(R(t.0t featureID((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pys Fs(Rht setdefaultRR:tjointsorted(Rtfeaturestunsorted_resulttfeature((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyt_convert_to_binary_features:s  c C`s-g}xc|jD]X}|j|}d|kr|d}|d}|dk rh|j||fƒqhqqWxŗ|D]²\}}||kr |}|}|}nx‚t|d|ƒD]m}xdtt|jƒƒD]M} | |ksī| |krŠ|| f|krtS| |f|krtSqŠqŠWq“WqsWtS(NRlRmi(RRR:RRRR ( RRotarc_listtkeytnodetchildIdxt parentIdxttemptktm((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyt_is_projectiveHs*       cC`sm|jj|t|jƒdƒ||j|j|>> from nltk.parse import DependencyGraph, DependencyEvaluator >>> from nltk.parse.transitionparser import TransitionParser, Configuration, Transition >>> gold_sent = DependencyGraph(""" ... Economic JJ 2 ATT ... news NN 3 SBJ ... has VBD 0 ROOT ... little JJ 5 ATT ... effect NN 3 OBJ ... on IN 5 ATT ... financial JJ 8 ATT ... markets NNS 6 PC ... . . 3 PU ... """) >>> conf = Configuration(gold_sent) ###################### Check the Initial Feature ######################## >>> print(', '.join(conf.extract_features())) STK_0_POS_TOP, BUF_0_FORM_Economic, BUF_0_LEMMA_Economic, BUF_0_POS_JJ, BUF_1_FORM_news, BUF_1_POS_NN, BUF_2_POS_VBD, BUF_3_POS_JJ ###################### Check The Transition ####################### Check the Initialized Configuration >>> print(conf) Stack : [0] Buffer : [1, 2, 3, 4, 5, 6, 7, 8, 9] Arcs : [] A. Do some transition checks for ARC-STANDARD >>> operation = Transition('arc-standard') >>> operation.shift(conf) >>> operation.left_arc(conf, "ATT") >>> operation.shift(conf) >>> operation.left_arc(conf,"SBJ") >>> operation.shift(conf) >>> operation.shift(conf) >>> operation.left_arc(conf, "ATT") >>> operation.shift(conf) >>> operation.shift(conf) >>> operation.shift(conf) >>> operation.left_arc(conf, "ATT") Middle Configuration and Features Check >>> print(conf) Stack : [0, 3, 5, 6] Buffer : [8, 9] Arcs : [(2, 'ATT', 1), (3, 'SBJ', 2), (5, 'ATT', 4), (8, 'ATT', 7)] >>> print(', '.join(conf.extract_features())) STK_0_FORM_on, STK_0_LEMMA_on, STK_0_POS_IN, STK_1_POS_NN, BUF_0_FORM_markets, BUF_0_LEMMA_markets, BUF_0_POS_NNS, BUF_1_FORM_., BUF_1_POS_., BUF_0_LDEP_ATT >>> operation.right_arc(conf, "PC") >>> operation.right_arc(conf, "ATT") >>> operation.right_arc(conf, "OBJ") >>> operation.shift(conf) >>> operation.right_arc(conf, "PU") >>> operation.right_arc(conf, "ROOT") >>> operation.shift(conf) Terminated Configuration Check >>> print(conf) Stack : [0] Buffer : [] Arcs : [(2, 'ATT', 1), (3, 'SBJ', 2), (5, 'ATT', 4), (8, 'ATT', 7), (6, 'PC', 8), (5, 'ATT', 6), (3, 'OBJ', 5), (3, 'PU', 9), (0, 'ROOT', 3)] B. Do some transition checks for ARC-EAGER >>> conf = Configuration(gold_sent) >>> operation = Transition('arc-eager') >>> operation.shift(conf) >>> operation.left_arc(conf,'ATT') >>> operation.shift(conf) >>> operation.left_arc(conf,'SBJ') >>> operation.right_arc(conf,'ROOT') >>> operation.shift(conf) >>> operation.left_arc(conf,'ATT') >>> operation.right_arc(conf,'OBJ') >>> operation.right_arc(conf,'ATT') >>> operation.shift(conf) >>> operation.left_arc(conf,'ATT') >>> operation.right_arc(conf,'PC') >>> operation.reduce(conf) >>> operation.reduce(conf) >>> operation.reduce(conf) >>> operation.right_arc(conf,'PU') >>> print(conf) Stack : [0, 3, 9] Buffer : [] Arcs : [(2, 'ATT', 1), (3, 'SBJ', 2), (0, 'ROOT', 3), (5, 'ATT', 4), (3, 'OBJ', 5), (5, 'ATT', 6), (8, 'ATT', 7), (6, 'PC', 8), (3, 'PU', 9)] ###################### Check The Training Function ####################### A. Check the ARC-STANDARD training >>> import tempfile >>> import os >>> input_file = tempfile.NamedTemporaryFile(prefix='transition_parse.train', dir=tempfile.gettempdir(), delete=False) >>> parser_std = TransitionParser('arc-standard') >>> print(', '.join(parser_std._create_training_examples_arc_std([gold_sent], input_file))) Number of training examples : 1 Number of valid (projective) examples : 1 SHIFT, LEFTARC:ATT, SHIFT, LEFTARC:SBJ, SHIFT, SHIFT, LEFTARC:ATT, SHIFT, SHIFT, SHIFT, LEFTARC:ATT, RIGHTARC:PC, RIGHTARC:ATT, RIGHTARC:OBJ, SHIFT, RIGHTARC:PU, RIGHTARC:ROOT, SHIFT >>> parser_std.train([gold_sent],'temp.arcstd.model') Number of training examples : 1 Number of valid (projective) examples : 1 ... >>> remove(input_file.name) B. Check the ARC-EAGER training >>> input_file = tempfile.NamedTemporaryFile(prefix='transition_parse.train', dir=tempfile.gettempdir(),delete=False) >>> parser_eager = TransitionParser('arc-eager') >>> print(', '.join(parser_eager._create_training_examples_arc_eager([gold_sent], input_file))) Number of training examples : 1 Number of valid (projective) examples : 1 SHIFT, LEFTARC:ATT, SHIFT, LEFTARC:SBJ, RIGHTARC:ROOT, SHIFT, LEFTARC:ATT, RIGHTARC:OBJ, RIGHTARC:ATT, SHIFT, LEFTARC:ATT, RIGHTARC:PC, REDUCE, REDUCE, REDUCE, RIGHTARC:PU >>> parser_eager.train([gold_sent],'temp.arceager.model') Number of training examples : 1 Number of valid (projective) examples : 1 ... >>> remove(input_file.name) ###################### Check The Parsing Function ######################## A. Check the ARC-STANDARD parser >>> result = parser_std.parse([gold_sent], 'temp.arcstd.model') >>> de = DependencyEvaluator(result, [gold_sent]) >>> de.eval() >= (0, 0) True B. Check the ARC-EAGER parser >>> result = parser_eager.parse([gold_sent], 'temp.arceager.model') >>> de = DependencyEvaluator(result, [gold_sent]) >>> de.eval() >= (0, 0) True Note that result is very poor because of only one training example. N((((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pytdemozs(t __future__RRRR§R®tosRtcopyRtoperatorRtnumpyRtscipyRtsklearn.datasetsRtsklearnR t ImportErrort nltk.parseR R R tobjectR RORURŅ(((sm/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/transitionparser.pyt s(   “g’c