ó <¿CVc@@sCdZddlmZddlmZddlZddlmZddlm Z ddl m Z ddl m Z dd lmZmZddljZd „Zd „Zd „Zd „Zdefd„ƒYZdefd„ƒYZdefd„ƒYZdefd„ƒYZdefd„ƒYZdefd„ƒYZdS(s¹Various classifier implementations. Also includes basic feature extractor methods. Example Usage: :: >>> from textblob import TextBlob >>> from textblob.classifiers import NaiveBayesClassifier >>> train = [ ... ('I love this sandwich.', 'pos'), ... ('This is an amazing place!', 'pos'), ... ('I feel very good about these beers.', 'pos'), ... ('I do not like this restaurant', 'neg'), ... ('I am tired of this stuff.', 'neg'), ... ("I can't deal with this", 'neg'), ... ("My boss is horrible.", "neg") ... ] >>> cl = NaiveBayesClassifier(train) >>> cl.classify("I feel amazing!") 'pos' >>> blob = TextBlob("The beer is good. But the hangover is horrible.", classifier=cl) >>> for s in blob.sentences: ... print(s) ... print(s.classify()) ... The beer is good. pos But the hangover is horrible. neg .. versionadded:: 0.6.0 i(tabsolute_import(tchainN(t basestring(tcached_property(t FormatError(t word_tokenize(t strip_punct is_filelikec@s2d„‰tj‡fd†|Dƒƒ}t|ƒS(s±Return a set of all words in a dataset. :param dataset: A list of tuples of the form ``(words, label)`` where ``words`` is either a string of a list of tokens. cS@s't|tƒrt|dtƒS|SdS(Nt include_punc(t isinstanceRRtFalse(twords((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyttokenize9sc3@s!|]\}}ˆ|ƒVqdS(N((t.0R t_(R (sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pys >s(Rt from_iterabletset(tdatasett all_words((R sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyt_get_words_from_dataset1s cC@sNt|tƒr4td„t|dtƒDƒƒ}ntd„|Dƒƒ}|S(Ncs@s!|]}t|dtƒVqdS(tallN(RR (R tw((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pys CsRcs@s!|]}t|dtƒVqdS(RN(RR (R R((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pys Fs(R RRRR (tdocumentttokens((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyt_get_document_tokensAs  c@s8t|ƒ}t|ƒ‰t‡fd†|Dƒƒ}|S(sEA basic document feature extractor that returns a dict indicating what words in ``train_set`` are contained in ``document``. :param document: The text to extract features from. Can be a string or an iterable. :param list train_set: Training data set, a list of tuples of the form ``(words, label)``. c3@s*|] }dj|ƒ|ˆkfVqdS(u contains({0})N(tformat(R tword(R(sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pys Ss(RRtdict(Rt train_sett word_featurestfeatures((Rsj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pytbasic_extractorIs    cC@s&t|ƒ}td„|Dƒƒ}|S(sdA basic document feature extractor that returns a dict of words that the document contains. cs@s$|]}dj|ƒtfVqdS(u contains({0})N(RtTrue(R R((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pys ]s(RR(RRR((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pytcontains_extractorXs tBaseClassifiercB@s\eZdZedd„Zdd„Zed„ƒZd„Z d„Z d„Z d„Z RS( sÀAbstract classifier class from which all classifers inherit. At a minimum, descendant classes must implement a ``classify`` method and have a ``classifier`` property. :param train_set: The training set, either a list of tuples of the form ``(text, classification)`` or a file-like object. ``text`` may be either a string or an iterable. :param callable feature_extractor: A feature extractor function that takes one or two arguments: ``document`` and ``train_set``. :param str format: If ``train_set`` is a filename, the file format, e.g. ``"csv"`` or ``"json"``. If ``None``, will attempt to detect the file format. :param kwargs: Additional keyword arguments are passed to the constructor of the :class:`Format ` class used to read the data. Only applies when a file-like object is passed as ``train_set``. .. versionadded:: 0.6.0 cK@sL||_||_t|ƒr6|j||ƒ|_n ||_d|_dS(N(t format_kwargstfeature_extractorRt _read_dataRtNonettrain_features(tselfRR$Rtkwargs((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyt__init__ws     cC@sƒ|s-tj|ƒ}|smtdƒ‚qmn@tjƒ}||jƒkrctdj|ƒƒ‚n||}|||jjƒS(shReads a data file and returns an iterable that can be used as testing or training data. s@Could not automatically detect format for the given data source.s'{0}' format not supported.( tformatstdetectRt get_registrytkeyst ValueErrorRR#t to_iterable(R(RRt format_classtregistry((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR%€s  cC@stdƒ‚dS(sThe classifier object.s)Must implement the "classifier" property.N(tNotImplementedError(R(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyt classifier‘scC@stdƒ‚dS(sClassifies a string of text.s#Must implement a "classify" method.N(R3(R(ttext((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pytclassify–scC@stdƒ‚dS(sTrains the classifier.s Must implement a "train" method.N(R3(R(tlabeled_featureset((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyttrainšscC@stdƒ‚dS(s3Returns an iterable containing the possible labels.s!Must implement a "labels" method.N(R3(R(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pytlabelsžscC@s?y|j||jƒSWn!ttfk r:|j|ƒSXdS(sWExtracts features from a body of text. :rtype: dictionary of features N(R$Rt TypeErrortAttributeError(R(R5((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pytextract_features¢sN( t__name__t __module__t__doc__RR&R*R%RR4R6R8R9R<(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR"bs    tNLTKClassifiercB@skeZdZd Zed d„Zd„Zed„ƒZ d„Z d„Z d„Z d d„Z d„ZRS( sHAn abstract class that wraps around the nltk.classify module. Expects that descendant classes include a class variable ``nltk_class`` which is the class in the nltk.classify module to be wrapped. Example: :: class MyClassifier(NLTKClassifier): nltk_class = nltk.classify.svm.SvmClassifier cK@sWtt|ƒj||||g|jD]!\}}|j|ƒ|f^q)|_dS(N(tsuperR@R*RR<R'(R(RR$RR)tdtc((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR*½scC@s+|jj}djd|dt|jƒƒS(Ns <{cls} trained on {n} instances>tclstn(t __class__R=RtlenR(R(t class_name((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyt__repr__Âs cC@s2y|jƒSWntk r-tdƒ‚nXdS(sThe classifier.s@NLTKClassifier must have a nltk_class variable that is not None.N(R8R;R/(R(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR4Çs cO@sMy)|jj|j||Ž|_|jSWntk rHtdƒ‚nXdS(sŸTrain the classifier with a labeled feature set and return the classifier. Takes the same arguments as the wrapped NLTK class. This method is implicitly called when calling ``classify`` or ``accuracy`` methods and is included only to allow passing in arguments to the ``train`` method of the wrapped NLTK class. .. versionadded:: 0.6.2 :rtype: A classifier s@NLTKClassifier must have a nltk_class variable that is not None.N(t nltk_classR8R'R4R;R/(R(targsR)((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR8Ðs   cC@s |jjƒS(s&Return an iterable of possible labels.(R4R9(R(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR9ãscC@s|j|ƒ}|jj|ƒS(sIClassifies the text. :param str text: A string of text. (R<R4R6(R(R5t text_features((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR6çscC@sht|ƒr|j|ƒ}n|}g|D]!\}}|j|ƒ|f^q+}tjj|j|ƒS(sGCompute the accuracy on a test set. :param test_set: A list of tuples of the form ``(text, label)``, or a file pointer. :param format: If ``test_set`` is a filename, the file format, e.g. ``"csv"`` or ``"json"``. If ``None``, will attempt to detect the file format. (RR%R<tnltkR6taccuracyR4(R(ttest_setRt test_dataRBRCt test_features((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyRNïs .cO@s‰|j|7_g|jD]!\}}|j|ƒ|f^q|_y"|jj|j||Ž|_Wntk r„tdƒ‚nXtS(s½Update the classifier with new training data and re-trains the classifier. :param new_data: New data as a list of tuples of the form ``(text, label)``. s@NLTKClassifier must have a nltk_class variable that is not None.( RR<R'RJR8R4R;R/R (R(tnew_dataRKR)RBRC((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pytupdateÿs1 N(R=R>R?R&RJRR*RIRR4R8R9R6RNRS(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR@®s      tNaiveBayesClassifiercB@s5eZdZejjZd„Zd„Zd„Z RS(sPA classifier based on the Naive Bayes algorithm, as implemented in NLTK. :param train_set: The training set, either a list of tuples of the form ``(text, classification)`` or a filename. ``text`` may be either a string or an iterable. :param feature_extractor: A feature extractor function that takes one or two arguments: ``document`` and ``train_set``. :param format: If ``train_set`` is a filename, the file format, e.g. ``"csv"`` or ``"json"``. If ``None``, will attempt to detect the file format. .. versionadded:: 0.6.0 cC@s|j|ƒ}|jj|ƒS(s²Return the label probability distribution for classifying a string of text. Example: :: >>> classifier = NaiveBayesClassifier(train_data) >>> prob_dist = classifier.prob_classify("I feel happy this morning.") >>> prob_dist.max() 'positive' >>> prob_dist.prob("positive") 0.7 :rtype: nltk.probability.DictionaryProbDist (R<R4t prob_classify(R(R5RL((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyRU$scO@s|jj||ŽS(sŽReturn the most informative features as a list of tuples of the form ``(feature_name, feature_value)``. :rtype: list (R4tmost_informative_features(R(RKR)((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pytinformative_features7scO@s|jj||ŽS(soDisplays a listing of the most informative features for this classifier. :rtype: None (R4tshow_most_informative_features(R(RKR)((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pytshow_informative_features?s( R=R>R?RMR6RTRJRURWRY(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyRTs    tDecisionTreeClassifiercB@s5eZdZejjjZd„ZeZ d„Z RS(sRA classifier based on the decision tree algorithm, as implemented in NLTK. :param train_set: The training set, either a list of tuples of the form ``(text, classification)`` or a filename. ``text`` may be either a string or an iterable. :param feature_extractor: A feature extractor function that takes one or two arguments: ``document`` and ``train_set``. :param format: If ``train_set`` is a filename, the file format, e.g. ``"csv"`` or ``"json"``. If ``None``, will attempt to detect the file format. .. versionadded:: 0.6.2 cO@s|jj||ŽS(sReturn a string containing a pretty-printed version of this decision tree. Each line in the string corresponds to a single decision tree node or leaf, and indentation is used to display the structure of the tree. :rtype: str (R4t pretty_format(R(RKR)((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR[ZscO@s|jj||ŽS(s­Return a string representation of this decision tree that expresses the decisions it makes as a nested set of pseudocode if statements. :rtype: str (R4t pseudocode(R(RKR)((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR\fs( R=R>R?RMR6t decisiontreeRZRJR[tpprintR\(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyRZHs  tPositiveNaiveBayesClassifiercB@sMeZdZejjZedd„Zd„Z d„Z dddd„Z RS(sA variant of the Naive Bayes Classifier that performs binary classification with partially-labeled training sets, i.e. when only one class is labeled and the other is not. Assuming a prior distribution on the two labels, uses the unlabeled set to estimate the frequencies of the features. Example usage: :: >>> from text.classifiers import PositiveNaiveBayesClassifier >>> sports_sentences = ['The team dominated the game', ... 'They lost the ball', ... 'The game was intense', ... 'The goalkeeper catched the ball', ... 'The other team controlled the ball'] >>> various_sentences = ['The President did not comment', ... 'I lost the keys', ... 'The team won the game', ... 'Sara has two kids', ... 'The ball went off the court', ... 'They had the ball for the whole game', ... 'The show is over'] >>> classifier = PositiveNaiveBayesClassifier(positive_set=sports_sentences, ... unlabeled_set=various_sentences) >>> classifier.classify("My team lost the game") True >>> classifier.classify("And now for something completely different.") False :param positive_set: A collection of strings that have the positive label. :param unlabeled_set: A collection of unlabeled strings. :param feature_extractor: A feature extractor function. :param positive_prob_prior: A prior estimate of the probability of the label ``True``. .. versionadded:: 0.7.0 gà?cK@sx||_||_||_g|jD]}|j|ƒ^q%|_g|jD]}|j|ƒ^qM|_||_dS(N(R$t positive_sett unlabeled_setR<tpositive_featurestunlabeled_featurestpositive_prob_prior(R(R`RaR$RdR)RB((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR*™s   %%cC@s:|jj}djd|dt|jƒdt|jƒƒS(NsH<{cls} trained on {n_pos} labeled and {n_unlabeled} unlabeled instances>RDtn_post n_unlabeled(RFR=RRGR`Ra(R(RH((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyRI¥s  cO@s+|jj|j|j|jƒ|_|jS(sTrain the classifier with a labeled and unlabeled feature sets and return the classifier. Takes the same arguments as the wrapped NLTK class. This method is implicitly called when calling ``classify`` or ``accuracy`` methods and is included only to allow passing in arguments to the ``train`` method of the wrapped NLTK class. :rtype: A classifier (RJR8RbRcRdR4(R(RKR)((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR8¬s cO@s½||_|rL|j|7_|jg|D]}|j|ƒ^q,7_n|r|j|7_|jg|D]}|j|ƒ^qo7_n|jj|j|j|j||Ž|_t S(sÖUpdate the classifier with new data and re-trains the classifier. :param new_positive_data: List of new, labeled strings. :param new_unlabeled_data: List of new, unlabeled strings. ( RdR`RbR<RaRcRJR8R4R (R(tnew_positive_datatnew_unlabeled_dataRdRKR)RB((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyRSºs  $ $N( R=R>R?RMR6R_RJR!R*RIR8R&RS(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyR_os&   tMaxEntClassifiercB@s2eZejjjjZejjjZd„ZRS(cC@s|j|ƒ}|jj|ƒS(s®Return the label probability distribution for classifying a string of text. Example: :: >>> classifier = MaxEntClassifier(train_data) >>> prob_dist = classifier.prob_classify("I feel happy this morning.") >>> prob_dist.max() 'positive' >>> prob_dist.prob("positive") 0.7 :rtype: nltk.probability.DictionaryProbDist (R<R4RU(R(R5tfeats((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyRU×s( R=R>RMR6tmaxenttMaxentClassifierR?RJRU(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyRiÓs(R?t __future__Rt itertoolsRRMttextblob.compatRttextblob.decoratorsRttextblob.exceptionsRttextblob.tokenizersRttextblob.utilsRRttextblob.formatsR+RRRR!tobjectR"R@RTRZR_Ri(((sj/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/textblob/textblob/classifiers.pyt!s&     Ld6'd