'''Various tokenizer implementations.

.. versionadded:: 0.4.0
'''
from __future__ import absolute_import
from itertools import chain

import nltk

from textblob.utils import strip_punc
from textblob.base import BaseTokenizer
from textblob.decorators import requires_nltk_corpus


class WordTokenizer(BaseTokenizer):

    '''NLTK's recommended word tokenizer (currently the TreebankWordTokenizer).
    Uses regular expressions to tokenize text. Assumes text has already been
    segmented into sentences.

    Performs the following steps:

    * split standard contractions, e.g. don't -> do n't
    * split commas and single quotes
    * separate periods that appear at the end of a line
    '''

    def tokenize(self, text, include_punc=True):
        '''Return a list of word tokens.

        :param text: string of text.
        :param include_punc: (optional) whether to include punctuation as
            separate tokens. Defaults to True.
        '''
        tokens = nltk.tokenize.word_tokenize(text)
        if include_punc:
            return tokens
        else:
            # Strip punctuation from each token unless the token comes from
            # a contraction, e.g. "Let's" => ["Let", "'s"]; tokens that are
            # pure punctuation are dropped entirely.
            return [word if word.startswith("'") else strip_punc(word, all=False)
                    for word in tokens if strip_punc(word, all=False)]


class SentenceTokenizer(BaseTokenizer):

    '''NLTK's sentence tokenizer (currently the PunktSentenceTokenizer).
    Uses an unsupervised algorithm to build a model for abbreviation words,
    collocations, and words that start sentences, then uses that model to
    find sentence boundaries.
    '''

    @requires_nltk_corpus
    def tokenize(self, text):
        '''Return a list of sentences.'''
        return nltk.tokenize.sent_tokenize(text)


#: Convenience function for tokenizing sentences
sent_tokenize = SentenceTokenizer().itokenize

_word_tokenizer = WordTokenizer()  # Singleton word tokenizer


def word_tokenize(text, include_punc=True, *args, **kwargs):
    '''Convenience function for tokenizing text into words.

    NOTE: NLTK's word tokenizer expects sentences as input, so the text will
    be tokenized to sentences before being tokenized to words.
    '''
    words = chain.from_iterable(
        _word_tokenizer.itokenize(sentence, include_punc, *args, **kwargs)
        for sentence in sent_tokenize(text))
    return words
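
# ---------------------------------------------------------------------------
# Usage sketch (illustrative only, not part of the module's public API):
# a minimal demonstration of the convenience functions above. It assumes
# NLTK's "punkt" sentence model is already installed, e.g. via
# nltk.download('punkt'); the sample string and the behavior noted in the
# comments are illustrative assumptions, not guaranteed outputs.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    sample = "Dr. Smith can't come today. He sent his apologies."

    # Punkt treats "Dr." as an abbreviation rather than a sentence boundary,
    # so this should yield two sentences, not three.
    print(list(sent_tokenize(sample)))

    # Treebank-style word tokenization splits contractions in two
    # (e.g. "can't" -> "ca", "n't"). With include_punc=False, pure
    # punctuation tokens such as the final period are dropped.
    print(list(word_tokenize(sample)))
    print(list(word_tokenize(sample, include_punc=False)))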