ó <¿CVc @s-dZddlmZmZddlZddlmZddlmZddl m Z m Z ddl m Z ddlmZmZdd lmZejeƒZed efd „ƒYƒZed kr)ddlZddlZdd lmZejƒZejddddddddƒejddddddddƒejdddddddgdd ƒejd!d"dd#dddgdd$ƒejd%d&dd'dd(ƒejd)d*dd+dd,dd-ƒejd.d/dd0dd1dd2ƒejd3d4dd5dd6dd7ƒejd8d9dd:de dd;ƒejd<d=dd>de!dd?dd@ƒej"ƒ\Z#Z$e#j% rej&ƒe'ƒnej(dAdBdCe)e#j*ƒƒgZ+e,e#j%dDƒâZ-xØe-D]ÐZ.e.j/e#j0ƒZ1e1dEe2e1dFd!ƒe3e1dj4ƒj/e#j5ƒƒf\Z6Z7Z8e#j9e#j:kp†e;e#j9ƒdEkobe6e#j9kp†e;e#j:ƒdEko†e6e#j:kr¢e+j<e6e7e8fƒnqÒWWdQXe#j=rÜee+e>ee#jƒe#j=ƒƒZ?nee+e>ee#jƒƒZ?e#j@rneAe>e?e#jBƒƒƒejCƒndS(GuN Implementations of inter-annotator agreement coefficients surveyed by Artstein and Poesio (2007), Inter-Coder Agreement for Computational Linguistics. An agreement coefficient calculates the amount that annotators agreed on label assignments beyond what is expected by chance. In defining the AnnotationTask class, we use naming conventions similar to the paper's terminology. There are three types of objects in an annotation task: the coders (variables "c" and "C") the items to be annotated (variables "i" and "I") the potential categories to be assigned (variables "k" and "K") Additionally, it is often the case that we don't want to treat two different labels as complete disagreement, and so the AnnotationTask constructor can also take a distance metric as a final argument. Distance metrics are simply functions that take two arguments, and return a value between 0.0 and 1.0 indicating the distance between them. If not supplied, the default is binary comparison between the arguments. The simplest way to initialize an AnnotationTask is with a list of triples, each containing a coder's assignment for one object in the task: task = AnnotationTask(data=[('c1', '1', 'v1'),('c2', '1', 'v1'),...]) Note that the data list needs to contain the same number of triples for each individual coder, containing category values for the same set of items. Alpha (Krippendorff 1980) Kappa (Cohen 1960) S (Bennet, Albert and Goldstein 1954) Pi (Scott 1955) TODO: Describe handling of multiple coders and missing data Expected results from the Artstein and Poesio survey paper: >>> from nltk.metrics.agreement import AnnotationTask >>> import os.path >>> t = AnnotationTask(data=[x.split() for x in open(os.path.join(os.path.dirname(__file__), "artstein_poesio_example.txt"))]) >>> t.avg_Ao() 0.88 >>> t.pi() 0.7995322418977615... >>> t.S() 0.8199999999999998... This would have returned a wrong value (0.0) in @785fb79 as coders are in the wrong order. Subsequently, all values for pi(), S(), and kappa() would have been wrong as they are computed with avg_Ao(). >>> t2 = AnnotationTask(data=[('b','1','stat'),('a','1','stat')]) >>> t2.avg_Ao() 1.0 The following, of course, also works. >>> t3 = AnnotationTask(data=[('a','1','othr'),('b','1','othr')]) >>> t3.avg_Ao() 1.0 iÿÿÿÿ(tprint_functiontunicode_literalsN(tgroupby(t itemgetter(tFreqDisttConditionalFreqDist(t deprecated(tpython_2_unicode_compatiblet iteritems(tbinary_distancetAnnotationTaskcBseZdZded„Zd„Zd„Zdd„Zd„Z d„Z d„Z e dƒdddd „ƒZ dd „Zd „Zd „Zd „Zd„Zdd„Zdd„Zd„Zd„Zd„Zd„Zd„Zd„Zd„Zdd„Zdd„ZRS(u/Represents an annotation task, i.e. people assign labels to items. Notation tries to match notation in Artstein and Poesio (2007). In general, coders and items can be represented as any hashable object. Integers, for example, are fine, though strings are more readable. Labels must support the distance functions applied to them, so e.g. a string-edit-distance makes no sense if your labels are integers, whereas interval distance needs numeric values. A notable case of this is the MASI metric, which requires Python sets. cCsV||_tƒ|_tƒ|_tƒ|_g|_|dk rR|j|ƒndS(u.Initialize an empty annotation task. N(tdistancetsettItKtCtdatatNonet load_array(tselfRR ((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pyt__init__cs      cCsdjtd„|jƒƒS(Nu cSs2d|d|djddƒdj|dƒfS(Nu%s %s %sucoderuitemu_u u,ulabels(treplacetjoin(tx((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytps(RtmapR(R((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pyt__str__oscCssxl|D]d\}}}|jj|ƒ|jj|ƒ|jj|ƒ|jji|d6|d6|d6ƒqWdS(u¥Load the results of annotation. The argument is a list of 3-tuples, each representing a coder's labeling of an item: (coder,item,label) ucoderulabelsuitemN(RtaddRR Rtappend(Rtarraytcodertitemtlabels((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pyRts cså|p |j}t‡‡‡fd†|Dƒƒ}|dˆkrct‡‡fd†|Dƒƒ}nt‡‡fd†|Dƒƒ}dt|j|d|dƒƒ}tjdˆˆˆ|ƒtjd|d|dd|ƒ|S( u6Agreement between two coders on a given item c3s;|]1}|dˆˆfkr|dˆkr|VqdS(ucoderuitemN((t.0R(tcAtcBti(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys ˆsucoderc3s5|]+}|dˆkr|dˆkr|VqdS(ucoderuitemN((R!R(R#R$(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys Šsc3s5|]+}|dˆkr|dˆkr|VqdS(ucoderuitemN((R!R(R"R$(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys Œsgð?ulabelsu.Observed agreement between %s and %s on %s: %fu"Distance between "%r" and "%r": %f(RtnexttfloatR tlogtdebug(RR"R#R$Rtk1tk2tret((R"R#R$sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytagr€s""$  cs#tt‡fd†|jDƒƒƒS(Nc3s%|]}|dˆkrdVqdS(ulabelsiN((R!R(tk(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys –s(R&tsumR(RR-((R-sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytNk•scs&tt‡‡fd†|jDƒƒƒS(Nc3s5|]+}|dˆkr|dˆkrdVqdS(uitemulabelsiN((R!R(R$R-(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys ™s(R&R.R(RR$R-((R$R-sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytNik˜scs&tt‡‡fd†|jDƒƒƒS(Nc3s5|]+}|dˆkr|dˆkrdVqdS(ucoderulabelsiN((R!R(tcR-(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys œs(R&R.R(RR1R-((R1R-sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytNck›suUse Nk, Nik or Nck insteadcCsÞ|dk r6|dkr6|dkr6|j|ƒ}n‹|dk ro|dk ro|dkro|j||ƒ}nR|dk r¨|dk r¨|dkr¨|j||ƒ}ntd|||fƒ‚tjd||||ƒ|S(uHImplements the "n-notation" used in Artstein and Poesio (2007) u7You must pass either i or c, not both! (k=%r,i=%r,c=%r)uCount on N[%s,%s,%s]: %dN(RR/R0R2t ValueErrorR'R((RR-R$R1R+((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytNžs$$$cCs4|p |j}tt|dt|ƒƒt|ƒƒS(Ntkey(RRtsortedR(RtfieldR((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pyt _grouped_data®scs}ˆjd‡‡fd†ˆjDƒƒ}tt‡‡‡fd†|Dƒƒƒttˆjƒƒ}tjdˆˆ|ƒ|S(u=Observed agreement between two coders on all items. uitemc3s+|]!}|dˆˆfkr|VqdS(ucoderN((R!R(R"R#(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys ¶sc3s-|]#\}}ˆjˆˆ||ƒVqdS(N(R,(R!Rt item_data(R"R#R(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys ·su(Observed agreement between %s and %s: %f(R8RR&R.tlenR R'R((RR"R#RR+((R"R#Rsh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytAo²s(;cCsxd}d}|jjƒ}xL|jD]A}|j|ƒx+|D]#}||||ƒ7}|d7}q?Wq%W||}|S(uP Calculates the average of function results for each coder pair ii(Rtcopytremove(RtfunctionttotaltntsR"R#R+((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pyt_pairwise_average»s   cCs&|j|jƒ}tjd|ƒ|S(uAAverage observed agreement across all coders and items. uAverage observed agreement: %f(RBR;R'R((RR+((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytavg_AoÊsc Cséd}x‘|jdƒD]€\}}td„|Dƒƒ}x[t|ƒD]M\}}x>t|ƒD]0\}}|t||ƒ|j||ƒ7}q^WqEWqWdtt|jƒt|jƒt|jƒdƒ|} tj d| ƒ| S(u©The observed disagreement for the alpha coefficient. The alpha coefficient, unlike the other metrics, uses this rather than observed agreement. guitemcss|]}|dVqdS(ulabelsN((R!R((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys Úsgð?iuObserved disagreement: %f( R8RRR&R R:R RR'R(( RR?R$titemdatat label_freqstjtnjtltnlR+((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytDo_alphaÒs0;gð?c s d}‡‡fd†|jDƒ}xJ|jd|ƒD]6\}}||jt|ƒdt|ƒdƒ7}q5W|t|jƒ|}tjdˆˆ|ƒ|S(uGThe observed disagreement for the weighted kappa coefficient. gc3s+|]!}|dˆˆfkr|VqdS(ucoderN((R!R(R"R#(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys èsuitemulabelsu+Observed disagreement between %s and %s: %f(RR8R R%R:R R'R(( RR"R#t max_distanceR?RR$RDR+((R"R#sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytDo_Kw_pairwiseãscs/ˆj‡‡fd†ƒ}tjd|ƒ|S(u$Averaged over all labelers csˆj||ˆƒS(N(RL(R"R#(RKR(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pyRösuObserved disagreement: %f(RBR'R((RRKR+((RKRsh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytDo_KwòscCs5dtt|jƒƒ}|jƒ|d|}|S(u,Bennett, Albert and Goldstein 1954 gð?(R&R:RRC(RtAeR+((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytSûscCsŠd}td„|jDƒƒ}x(t|ƒD]\}}||d7}q,W|tt|jƒt|jƒdƒ}|jƒ|d|S(u_Scott 1955; here, multi-pi. Equivalent to K from Siegel and Castellan (1988). gcss|]}|dVqdS(ulabelsN((R!R((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys sii(RRRR&R:R RRC(RR?RER-tfRN((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytpis *cCsud}tt|jƒƒ}td„|jDƒƒ}x:|jƒD],}|||||||||7}qAW|S(Ngcss#|]}|d|dfVqdS(ulabelsucoderN((R!R((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys s(R&R:R RRt conditions(RR"R#RNtnitemsRER-((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytAe_kappas *cCsJ|j||ƒ}|j||ƒ|d|}tjd|||ƒ|S(u gð?u(Expected agreement between %s and %s: %f(RTR;R'R((RR"R#RNR+((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytkappa_pairwisescCs|j|jƒS(uNCohen 1960 Averages naively over kappas for each coder pair. (RBRU(R((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytkappa scCs(|j|jƒ}|jƒ|d|S(ulDavies and Fleiss 1982 Averages over observed and expected agreements for each coder pair. gð?(RBRTRC(RRN((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pyt multi_kappa'scCsãd}td„|jDƒƒ}xW|jD]L}||}x9|jD].}|t|||ƒ|j||ƒ7}qCWq)Wdt|jƒt|jƒt|jƒt|jƒd|}tj d|ƒd|j ƒ|}|S(uKrippendorff 1980 gcss|]}|dVqdS(ulabelsN((R!R((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys 5sgð?iuExpected disagreement: %f( RRRR&R R:R RR'R(RJ(RtDeRERFRGRHR+((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytalpha/s 0Bc sØd}t‡‡fd†|jDƒƒ}xS|jD]H}x?|jD]4}||ˆ||ˆ||j||ƒ7}qBWq2W||tt|jƒdƒ}tjdˆˆ|ƒ|j ˆˆƒ} d| |} | S(uCohen 1968 gc3s9|]/}|dˆˆfkr|d|dfVqdS(ucoderulabelsN((R!R(R"R#(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pys Dsiu+Expected disagreement between %s and %s: %fgð?( RRRR tpowR:R R'R(RL( RR"R#RKR?RERFRHRXtDoR+((R"R#sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytweighted_kappa_pairwise?s6 csˆj‡‡fd†ƒS(uCohen 1968 csˆj||ˆƒS(N(R\(R"R#(RKR(sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pyRTs(RB(RRK((RKRsh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytweighted_kappaPsN(t__name__t __module__t__doc__RR RRRR,R/R0R2RR4R8R;RBRCRJRLRMRORQRTRURVRWRYR\R](((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pyR Us4                  u__main__(R u-du --distancetdestudistancetdefaultubinary_distancethelpudistance metric to useu-au --agreementu agreementukappau"agreement coefficient to calculateu-eu --excludeuexcludetactionuappendu8coder names to exclude (may be specified multiple times)u-iu --includeuincludeu.coder names to include, same format as excludeu-fu--fileufileuPfile to read labelings from, each line with three columns: 'labeler item labels'u-vu --verboseuverboseu0u+how much debugging to print on stderr (0-4)u-cu --columnsepu columnsepu uIchar/string that separates the three columns in the file, defaults to tabu-lu --labelsepulabelsepu,u[char/string that separates labels (if labelers can assign more than one), defaults to commau-pu --presenceupresenceu=convert each labeling into 1 or 0, based on presence of LABELu-Tu --thoroughuthoroughu store_trueu6calculate agreement for every subset of the annotatorstleveli2i urii(DR`t __future__RRtloggingt itertoolsRtoperatorRtnltk.probabilityRRtnltk.internalsRt nltk.compatRRtnltk.metrics.distanceR t getLoggert__file__R'tobjectR R^tretoptparset nltk.metricsR t OptionParsertparsert add_optionRtFalset parse_argstoptionst remaindertfilet print_helptexitt basicConfigtinttverboseRtopentinfileRHtsplitt columnsepttokststrt frozensettstriptlabelsepRtobject_R tincludetexcludeR:Rtpresencetgetattrttasktthoroughtprintt agreementtshutdown(((sh/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/metrics/agreement.pytFst ÿ     !!   ! E$$# '