"""
A SentimentAnalyzer is a tool to implement and facilitate Sentiment Analysis tasks
using NLTK features and classifiers, especially for teaching and demonstrative
purposes.
"""

from __future__ import print_function

from collections import defaultdict

from nltk.classify.util import apply_features, accuracy as eval_accuracy
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import (
    BigramAssocMeasures,
    precision as eval_precision,
    recall as eval_recall,
    f_measure as eval_f_measure,
)
from nltk.probability import FreqDist
from nltk.sentiment.util import save_file, timer


class SentimentAnalyzer(object):
    """
    A Sentiment Analysis tool based on machine learning approaches.
    """

    def __init__(self, classifier=None):
        # Maps each extractor function to a list of kwargs dicts, one per call.
        self.feat_extractors = defaultdict(list)
        self.classifier = classifier

    def all_words(self, documents, labeled=None):
        """
        Return all words/tokens from the documents (with duplicates).

        :param documents: a list of documents: either (words, label) tuples or
            plain lists of words (see `labeled`).
        :param labeled: if `True`, assume that each document is represented by a
            (words, label) tuple: (list(str), str). If `False`, each document is
            considered as being a simple list of strings: list(str). If `None`
            (the default), the format is inferred from the first document.
        :rtype: list(str)
        :return: A list of all words/tokens in `documents`.
        """
        all_words = []
        if labeled is None:
            labeled = documents and isinstance(documents[0], tuple)
        if labeled:
            for words, _label in documents:
                all_words.extend(words)
        else:
            for words in documents:
                all_words.extend(words)
        return all_words

    def apply_features(self, documents, labeled=None):
        """
        Apply all feature extractor functions to the documents. This is a wrapper
        around `nltk.classify.util.apply_features`, with `self.extract_features`
        as the feature function (`feature_func` below).

        If `labeled=False`, return featuresets as:
            [feature_func(doc) for doc in documents]
        If `labeled=True`, return featuresets as:
            [(feature_func(words), label) for (words, label) in documents]

        :param documents: a list of documents. If `labeled=True`, the method
            expects a list of (words, label) tuples.
        :param labeled: whether `documents` holds (words, label) tuples. If
            `None`, this is inferred by `nltk.classify.util.apply_features`.
        :rtype: LazyMap
        """
        return apply_features(self.extract_features, documents, labeled)

    def unigram_word_feats(self, words, top_n=None, min_freq=0):
        """
        Return the most common words/tokens, to be used as word features.

        :param words: a list of words/tokens.
        :param top_n: number of best words/tokens to use, sorted by frequency.
            If `None`, all words are considered.
        :param min_freq: only keep words that occur more than `min_freq` times.
        :rtype: list(str)
        :return: A list of at most `top_n` words/tokens (with no duplicates)
            sorted by frequency.
        """
        # Stopwords are not removed.
        unigram_feats_freqs = FreqDist(words)
        return [w for w, f in unigram_feats_freqs.most_common(top_n)
                if f > min_freq]

    def bigram_collocation_feats(self, documents, top_n=None, min_freq=3,
                                 assoc_measure=BigramAssocMeasures.pmi):
        """
        Return `top_n` bigram features (using `assoc_measure`).
        Note that this method is based on bigram collocation measures, and not
        on simple bigram frequency.

        :param documents: a list (or iterable) of documents, each of which is a
            list of tokens.
        :param top_n: number of best bigrams to use, sorted by association
            measure.
        :param assoc_measure: bigram association measure to use as score
            function.
        :param min_freq: the minimum number of occurrences of bigrams to take
            into consideration.

        :return: `top_n` ngrams scored by the given association measure.
        """
        finder = BigramCollocationFinder.from_documents(documents)
        finder.apply_freq_filter(min_freq)
        return finder.nbest(assoc_measure, top_n)

    def classify(self, instance):
        """
        Classify a single instance applying the features that have already been
        stored in the SentimentAnalyzer.

        :param instance: a list (or iterable) of tokens.
        :return: the classification result given by applying the classifier.
        """
        instance_feats = self.apply_features([instance], labeled=False)
        return self.classifier.classify(instance_feats[0])

    def add_feat_extractor(self, function, **kwargs):
        """
        Add a new function to extract features from a document. This function
        will be used in extract_features().
        Important: in this step our kwargs only represent additional parameters,
        and NOT the document we have to parse. The document will always be the
        first parameter in the parameter list, and it will be added in the
        extract_features() function.

        :param function: the extractor function to add to the list of feature
            extractors.
        :param kwargs: additional parameters required by `function`.
        """
        self.feat_extractors[function].append(kwargs)

    def extract_features(self, document):
        """
        Apply extractor functions (and their parameters) to the present document.
        We pass `document` as the first parameter of the extractor functions.
        If we want to use the same extractor function multiple times, we have to
        add it to the extractors with `add_feat_extractor` using multiple sets of
        parameters (one for each call of the extractor function).

        :param document: the document that will be passed as argument to the
            feature extractor functions.
        :return: A dictionary of populated features extracted from the document.
        :rtype: dict
        """
        all_features = {}
        for extractor, param_sets in self.feat_extractors.items():
            for param_set in param_sets:
                feats = extractor(document, **param_set)
                # Merge the features of every call, not only the last one.
                all_features.update(feats)
        return all_features

    def train(self, trainer, training_set, save_classifier=None, **kwargs):
        """
        Train classifier on the training set, optionally saving the output in the
        file specified by `save_classifier`.
        Additional arguments depend on the specific trainer used. For example,
        a MaxentClassifier can use the `max_iter` parameter to specify the number
        of iterations, while a NaiveBayesClassifier cannot.

        :param trainer: `train` method of a classifier.
            E.g.: NaiveBayesClassifier.train
        :param training_set: the training set to be passed as argument to the
            classifier `train` method.
        :param save_classifier: the filename of the file where the classifier
            will be stored (optional).
        :param kwargs: additional parameters that will be passed as arguments to
            the classifier `train` function.
        :return: A classifier instance trained on the training set.
        """
        print("Training classifier")
        self.classifier = trainer(training_set, **kwargs)
        if save_classifier:
            save_file(self.classifier, save_classifier)
        return self.classifier

    def evaluate(self, test_set, classifier=None, accuracy=True, f_measure=True,
                 precision=True, recall=True, verbose=False):
        if classifier is None:
            classifier = self.classifier
        print("Evaluating {0} results...".format(type(classifier).__name__))
        metrics_results = {}
        if accuracy:
            accuracy_score = eval_accuracy(classifier, test_set)
            metrics_results["Accuracy"] = accuracy_score
        # ...
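
# Illustrative sketch (not part of NLTK's API): a self-contained mirror of how
# add_feat_extractor() and extract_features() are meant to combine features,
# per their docstrings. It assumes a hypothetical extractor, `_contains_word`,
# and uses a plain defaultdict instead of SentimentAnalyzer so that it runs
# without NLTK installed. The same extractor is registered twice with
# different kwargs, so it is called once per parameter set, the document is
# always passed as the first argument, and the features from every call are
# merged into one dict.


def _contains_word(document, word):
    """Hypothetical extractor: one boolean feature telling if `word` occurs."""
    return {"contains({0})".format(word): word in document}


def _demo_feature_extractors(document=("a", "good", "movie")):
    """Return the merged features of two registrations of `_contains_word`."""
    from collections import defaultdict

    # extractor function -> list of kwargs dicts, one dict per call
    registry = defaultdict(list)
    registry[_contains_word].append({"word": "good"})
    registry[_contains_word].append({"word": "bad"})

    features = {}
    for extractor, param_sets in registry.items():
        for params in param_sets:
            features.update(extractor(document, **params))
    return features

# _demo_feature_extractors() returns
# {'contains(good)': True, 'contains(bad)': False}: one feature per
# registered parameter set, instead of only the features of the last call.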