ó <¿CVc@sødZddlZddlmZd„ZejZd„ZdZ yddl m Z Wne k rsd„Z nXd Z d ZdZd efd „ƒYZd efd„ƒYZdefd„ƒYZdefd„ƒYZdefd„ƒYZdS(sÌ Provides scoring functions for a number of association measures through a generic, abstract implementation in ``NgramAssocMeasures``, and n-specific ``BigramAssocMeasures`` and ``TrigramAssocMeasures``. iÿÿÿÿN(treducecCstj|dƒS(Ng@(t_mathtlog(tx((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytscCstd„|ƒS(NcSs||S(N((Rty((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyRs(R(ts((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyRsg#B’ ¡œÇ;(t fisher_exactcOs t‚dS(N(tNotImplementedError(t_argst_kwargs((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyRsiiþÿÿÿtNgramAssocMeasurescBs¹eZdZdZed„ƒZed„ƒZed„ƒZed„ƒZ ed„ƒZ ed„ƒZ ed„ƒZ ed „ƒZ ed „ƒZed „ƒZed „ƒZRS( s¿ An abstract class defining a collection of generic association measures. Each public method returns a score, taking the following arguments:: score_fn(count_of_ngram, (count_of_n-1gram_1, ..., count_of_n-1gram_j), (count_of_n-2gram_1, ..., count_of_n-2gram_k), ..., (count_of_1gram_1, ..., count_of_1gram_n), count_of_total_words) See ``BigramAssocMeasures`` and ``TrigramAssocMeasures`` Inheriting classes should define a property _n, and a method _contingency which calculates contingency values from marginals in order for all association measures defined here to be usable. icGstdƒ‚dS(s>Calculates values of a contingency table from marginal values.s?The contingency table is not availablein the general ngram caseN(R(t marginals((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyt _contingency>scGstdƒ‚dS(sACalculates values of contingency table marginals from its values.s?The contingency table is not availablein the general ngram caseN(R(t contingency((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyt _marginalsDsc#s‰tˆƒ}gtˆjƒD]‰dˆ>^q}xPttˆƒƒD]<‰t‡‡‡fd†|Dƒƒt|ˆjdƒVqEWdS(s3Calculates expected values for a contingency table.ic3s>|]4‰t‡‡‡fd†tdˆjƒDƒƒVqdS(c3s-|]#}|ˆ@ˆˆ@krˆ|VqdS(N((t.0R(tconttitj(sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pys SsiN(tsumtranget_n(R(tclsRR(Rsj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pys SsN(RRRtlent_producttfloat(RRtn_alltbits((RRRsj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyt_expected_valuesJs  & cGst|tƒ|tS(s Scores ngrams by their frequency(RtNGRAMtTOTAL(R ((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytraw_freqXscGs?|tt|tƒt|t|jdƒ|ttdS(sScores ngrams using Student's t test with independence hypothesis for unigrams, as in Manning and Schutze 5.3.1. igà?(RRtUNIGRAMSRRRt_SMALL(RR ((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyt student_t]s cGs;|j|Œ}|j|ƒ}td„t||ƒDƒƒS(sZScores ngrams using Pearson's chi-square as in Manning and Schutze 5.3.3. css+|]!\}}||d|tVqdS(iN(R"(Rtobstexp((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pys ns(R RRtzip(RR Rtexps((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytchi_sqgs cOs,|t|jddƒtt|tƒƒS(sÂScores ngrams using a variant of mutual information. The keyword argument power sets an exponent (default 3) for the numerator. No logarithm of the result is calculated. tpoweri(RtgetRRR!(R tkwargs((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytmi_likeqscGs5t|t|t|jdƒtt|tƒƒS(s^Scores ngrams by pointwise mutual information, as in Manning and Schutze 5.4. i(t_log2RRRRR!(RR ((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytpmizs cGs<|j|Œ}|jtd„t||j|ƒƒDƒƒS(sOScores ngrams using likelihood ratios as in Manning and Schutze 5.3.4. css7|]-\}}|tt|ƒ|ttƒVqdS(N(t_lnRR"(RR$R%((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pys ˆs(R RRR&R(RR R((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytlikelihood_ratio‚s cGsGt|tƒt|t|jdƒ}|tt|t|ƒdS(s1Scores ngrams using the Poisson-Stirling measure.i(RR!RRRRR-(RR R%((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytpoisson_stirling‹s cGs+|j|Œ}t|dƒt|d ƒS(s&Scores ngrams using the Jaccard index.iiÿÿÿÿ(R RR(RR R((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytjaccard’s(t__name__t __module__t__doc__Rt staticmethodR Rt classmethodRR R#R(R,R.R0R1R2(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR )s    tBigramAssocMeasurescBs}eZdZdZed„ƒZed„ƒZed„ƒZed„ƒZ ed„ƒZ ed„ƒZ ed„ƒZ RS( s€ A collection of bigram association measures. Each association measure is provided as a function with three arguments:: bigram_score_fn(n_ii, (n_ix, n_xi), n_xx) The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word in question, while x indicates the appearance of any word. Thus, for example: n_ii counts (w1, w2), i.e. the bigram being scored n_ix counts (w1, *) n_xi counts (*, w2) n_xx counts (*, *), i.e. any bigram This may be shown with respect to a contingency table:: w1 ~w1 ------ ------ w2 | n_ii | n_oi | = n_xi ------ ------ ~w2 | n_io | n_oo | ------ ------ = n_ix TOTAL = n_xx icCs<|\}}||}||}|||||||fS(sECalculates values of a bigram contingency table from marginal values.((tn_iit n_ix_xi_tupletn_xxtn_ixtn_xitn_oitn_io((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR ·s   cCs'|||||f||||fS(sACalculates values of contingency table marginals from its values.((R9R>R?tn_oo((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR¿sccsZt|ƒ}xGtdƒD]9}||||dA||||dAt|ƒVqWdS(s3Calculates expected values for a contingency table.iiiN(RRR(RR;R((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyRÄs cGsU|j|Œ\}}}}t||||dƒ||||||||S(sdScores bigrams using phi-square, the square of the Pearson correlation coefficient. i(R R(RR R9R?R>R@((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytphi_sqÌscCs)|\}}||j|||f|ƒS(sƒScores bigrams using chi-square, i.e. phi-sq multiplied by the number of bigrams, as in Manning and Schutze 5.3.3. (RA(RR9R:R;R<R=((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR(Ös cGsI|j|Œ\}}}}t||g||ggddƒ\}}|S(sºScores bigrams using Fisher's Exact Test (Pedersen 1996). Less sensitive to small counts than PMI or Chi Sq, but also more expensive to compute. Requires scipy. t alternativetless(R R(RR R9R?R>R@toddstpvalue((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytfisherÞs*cCs"|\}}dt|ƒ||S(s(Scores bigrams using Dice's coefficient.i(R(R9R:R;R<R=((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytdiceês ( R3R4R5RR6R RRR7RAR(RFRG(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR8™s  tTrigramAssocMeasurescBs2eZdZdZed„ƒZed„ƒZRS(sÄ A collection of trigram association measures. Each association measure is provided as a function with four arguments:: trigram_score_fn(n_iii, (n_iix, n_ixi, n_xii), (n_ixx, n_xix, n_xxi), n_xxx) The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word in question, while x indicates the appearance of any word. Thus, for example: n_iii counts (w1, w2, w3), i.e. the trigram being scored n_ixx counts (w1, *, *) n_xxx counts (*, *, *), i.e. any trigram icCs°|\}}}|\}}} ||} ||} ||} | || | } ||| | }||| | }||| | | | ||}|| | | | |||fS(sÔCalculates values of a trigram contingency table (or cube) from marginal values. >>> TrigramAssocMeasures._contingency(1, (1, 1, 1), (1, 73, 1), 2000) (1, 0, 0, 0, 0, 72, 0, 1927) ((tn_iiit n_iix_tuplet n_ixx_tupletn_xxxtn_iixtn_ixitn_xiitn_ixxtn_xixtn_xxitn_oiitn_ioitn_iiotn_ooitn_oiotn_iootn_ooo((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR s   " c Gsv|\}}}}}}}}|||||||f||||||||||||ft|ƒfS(s»Calculates values of contingency table marginals from its values. >>> TrigramAssocMeasures._marginals(1, 0, 0, 0, 0, 72, 0, 1927) (1, (1, 1, 1), (1, 73, 1), 2000) (R( RRIRSRTRVRURWRXRY((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyRs(R3R4R5RR6R R(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyRHñstQuadgramAssocMeasurescBs2eZdZdZed„ƒZed„ƒZRS(s3 A collection of quadgram association measures. Each association measure is provided as a function with five arguments:: trigram_score_fn(n_iiii, (n_iiix, n_iixi, n_ixii, n_xiii), (n_iixx, n_ixix, n_ixxi, n_xixi, n_xxii, n_xiix), (n_ixxx, n_xixx, n_xxix, n_xxxi), n_all) The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word in question, while x indicates the appearance of any word. Thus, for example: n_iiii counts (w1, w2, w3, w4), i.e. the quadgram being scored n_ixxi counts (w1, *, *, w4) n_xxxx counts (*, *, *, *), i.e. any quadgram ic"CsÎ|\}}}}|\} } } } } }|\}}}}||}||}||}| |||}| |||}| |||}||||||||}||}||||}| |||}||||||||}| |||}||||||||}||||||||} |||||||||||||||| }!||||||||||||||| |!fS(sXCalculates values of a quadgram contingency table from marginal values. (("tn_iiiit n_iiix_tuplet n_iixx_tuplet n_ixxx_tupletn_xxxxtn_iiixtn_iixitn_ixiitn_xiiitn_iixxtn_ixixtn_ixxitn_xixitn_xxiitn_xiixtn_ixxxtn_xixxtn_xxixtn_xxxitn_oiiitn_ioiitn_iioitn_ooiitn_oioitn_iooitn_oooitn_iiiotn_oiiotn_ioiotn_ooiotn_iiootn_oiootn_ioootn_oooo((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR ?s*   " """Bc Gs›|\}}}}}}}}} } } } } }}}|| }||}||}||}||| | }||| | }||||}||||}||||}||| | }|||| || | |}|||| || | |}|||| || | | }||||||||}t|ƒ}|||||f||||||f||||f|fS(sCalculates values of contingency table marginals from its values. QuadgramAssocMeasures._marginals(1, 0, 2, 46, 552, 825, 2577, 34967, 1, 0, 2, 48, 7250, 9031, 28585, 356653) (1, (2, 553, 3, 1), (7804, 6, 3132, 1378, 49, 2), (38970, 17660, 100, 38970), 440540) (R( RR[RnRoRqRpRrRsRtRuRvRwRxRyRzR{R|R`RaRbRcRdReRfRgRhRiRjRkRlRmR((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR\s*6    """" (R3R4R5RR6R R(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyRZ)stContingencyMeasurescBs&eZdZd„Zed„ƒZRS(sWraps NgramAssocMeasures classes such that the arguments of association measures are contingency table values rather than marginals. cCs‰d|jj|j_xlt|ƒD]^}|jdƒr>q#nt||ƒ}|jdƒsq|j||ƒ}nt|||ƒq#WdS(sAConstructs a ContingencyMeasures given a NgramAssocMeasures classt Contingencyt__t_N(t __class__R3tdirt startswithtgetattrt_make_contingency_fntsetattr(tselftmeasurestktv((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyt__init__„scs.‡‡fd†}ˆj|_ˆj|_|S(s‡From an association measure function, produces a new function which accepts contingency table values as its arguments. csˆˆj|ŒŒS(N(R(R(Rˆtold_fn(sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pytres”s(R5R3(RˆRŒR((RˆRŒsj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR…s  (R3R4R5R‹R6R…(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyR}s (R5tmathRt functoolsRR-RR/RR"t scipy.statsRt ImportErrorRR!RtobjectR R8RHRZR}(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/association.pyt s$      pX8V