"""
A classifier based on the Naive Bayes algorithm.  In order to find the
probability for a label, this algorithm first uses the Bayes rule to
express P(label|features) in terms of P(label) and P(features|label):

|                       P(label) * P(features|label)
|  P(label|features) = ------------------------------
|                              P(features)

The algorithm then makes the 'naive' assumption that all features are
independent, given the label:

|                       P(label) * P(f1|label) * ... * P(fn|label)
|  P(label|features) = --------------------------------------------
|                                         P(features)

Rather than computing P(features) explicitly, the algorithm just
calculates the numerator for each label, and normalizes them so they
sum to one:

|                       P(label) * P(f1|label) * ... * P(fn|label)
|  P(label|features) = --------------------------------------------
|                        SUM[l]( P(l) * P(f1|l) * ... * P(fn|l) )
"""
from __future__ import print_function, unicode_literals

from collections import defaultdict

from nltk.probability import FreqDist, DictionaryProbDist, ELEProbDist, sum_logs
from nltk.classify.api import ClassifierI


class NaiveBayesClassifier(ClassifierI):
    """
    A Naive Bayes classifier.  Naive Bayes classifiers are
    parameterized by two probability distributions:

      - P(label) gives the probability that an input will receive each
        label, given no information about the input's features.

      - P(fname=fval|label) gives the probability that a given feature
        (fname) will receive a given value (fval), given the
        label (label).

    If the classifier encounters an input with a feature that has
    never been seen with any label, then rather than assigning a
    probability of 0 to all labels, it will ignore that feature.

    The feature value 'None' is reserved for unseen feature values;
    you generally should not use 'None' as a feature value for one of
    your own features.
    """
    def __init__(self, label_probdist, feature_probdist):
        """
        :param label_probdist: P(label), the probability distribution
            over labels.  It is expressed as a ``ProbDistI`` whose
            samples are labels.  I.e., P(label) =
            ``label_probdist.prob(label)``.
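        :param feature_probdist: P(fname=fval|label), the probability
            distribution for feature values, given labels.  It is
            expressed as a dictionary whose keys are ``(label, fname)``
            pairs and whose values are ``ProbDistI`` objects over feature
            values.  I.e., P(fname=fval|label) =
            ``feature_probdist[label,fname].prob(fval)``.  If a given
            ``(label,fname)`` is not a key in ``feature_probdist``, then
            it is assumed that the corresponding P(fname=fval|label)
            is 0 for all values of ``fval``.
        """
        self._label_probdist = label_probdist
        self._feature_probdist = feature_probdist
        self._labels = list(label_probdist.samples())

    def labels(self):
        return self._labels

    def classify(self, featureset):
        return self.prob_classify(featureset).max()

    def prob_classify(self, featureset):
        # Discard any feature names that were never seen with any label;
        # otherwise every label would be assigned a probability of 0.
        featureset = featureset.copy()
        for fname in list(featureset.keys()):
            for label in self._labels:
                if (label, fname) in self._feature_probdist:
                    break
            else:
                del featureset[fname]

        # Start with the log probability of each label on its own.
        logprob = {}
        for label in self._labels:
            logprob[label] = self._label_probdist.logprob(label)

        # Then add in the log probability of each feature given the label.
        for label in self._labels:
            for (fname, fval) in featureset.items():
                if (label, fname) in self._feature_probdist:
                    feature_probs = self._feature_probdist[label, fname]
                    logprob[label] += feature_probs.logprob(fval)
                else:
                    # The log of an empty sum, i.e. log(0).
                    logprob[label] += sum_logs([])

        return DictionaryProbDist(logprob, normalize=True, log=True)

    def show_most_informative_features(self, n=10):
        # Print the n most informative features, with the ratio between
        # the most and least likely label for each.
        cpdist = self._feature_probdist
        print('Most Informative Features')

        for (fname, fval) in self.most_informative_features(n):
            def labelprob(l):
                return cpdist[l, fname].prob(fval)

            labels = sorted([l for l in self._labels
                             if fval in cpdist[l, fname].samples()],
                            key=labelprob)
            if len(labels) == 1:
                continue
            l0 = labels[0]
            l1 = labels[-1]
            if cpdist[l0, fname].prob(fval) == 0:
                ratio = 'INF'
            else:
                ratio = '%8.1f' % (cpdist[l1, fname].prob(fval) /
                                   cpdist[l0, fname].prob(fval))
            print(('%24s = %-14r %6s : %-6s = %s : 1.0' %
                   (fname, fval, ("%s" % l1)[:6], ("%s" % l0)[:6], ratio)))

    def most_informative_features(self, n=100):
        # The set of (fname, fval) pairs used by this classifier.
        features = set()
        # The max and min probability seen for each (fname, fval) pair
        # across all labels.
        maxprob = defaultdict(lambda: 0.0)
        minprob = defaultdict(lambda: 1.0)

        for (label, fname), probdist in self._feature_probdist.items():
            for fval in probdist.samples():
                feature = (fname, fval)
                features.add(feature)
                p = probdist.prob(fval)
                maxprob[feature] = max(p, maxprob[feature])
                minprob[feature] = min(p, minprob[feature])
                if minprob[feature] == 0:
                    features.discard(feature)

        # Sort by the min/max probability ratio: the smaller the ratio,
        # the more informative the feature.
        features = sorted(features,
                          key=lambda feature_:
                          minprob[feature_] / maxprob[feature_])
        return features[:n]

    @classmethod
    def train(cls, labeled_featuresets, estimator=ELEProbDist):
        label_freqdist = FreqDist()
        feature_freqdist = defaultdict(FreqDist)
        feature_values = defaultdict(set)
        fnames = set()

        # Count how often each feature value occurred, given the label
        # and the feature name.
        for featureset, label in labeled_featuresets:
            label_freqdist[label] += 1
            for fname, fval in featureset.items():
                feature_freqdist[label, fname][fval] += 1
                feature_values[fname].add(fval)
                fnames.add(fname)

        # A feature with no value for an instance gets the implicit
        # value None; count those missing values per (label, fname).
        for label in label_freqdist:
            num_samples = label_freqdist[label]
            for fname in fnames:
                count = feature_freqdist[label, fname].N()
                if num_samples - count > 0:
                    feature_freqdist[label, fname][None] += num_samples - count
                    feature_values[fname].add(None)

        # Create the P(label) distribution.
        label_probdist = estimator(label_freqdist)

        # Create the P(fval|label, fname) distributions.
        feature_probdist = {}
        for ((label, fname), freqdist) in feature_freqdist.items():
            probdist = estimator(freqdist, bins=len(feature_values[fname]))
            feature_probdist[label, fname] = probdist

        return cls(label_probdist, feature_probdist)


def demo():
    from nltk.classify.util import names_demo
    classifier = names_demo(NaiveBayesClassifier.train)
    classifier.show_most_informative_features()


if __name__ == '__main__':
    demo()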