# coding: utf-8
# Source: mxnet/contrib/text/utils.py (mxnet-1.2.1, Python 2.7 egg)
"""Provide utilities for text data processing."""
from __future__ import absolute_import
from __future__ import print_function

import collections
import re


def count_tokens_from_str(source_str, token_delim=' ', seq_delim='\n',
                          to_lower=False, counter_to_update=None):
    r"""Counts tokens in the specified string.

    For token_delim='<td>' and seq_delim='<sd>', a specified string of two
    sequences of tokens may look like::

    <td>token1<td>token2<td>token3<td><sd><td>token4<td>token5<td><sd>


    Parameters
    ----------
    source_str : str
        A source string of tokens.
    token_delim : str, default ' '
        A token delimiter.
    seq_delim : str, default '\n'
        A sequence delimiter.
    to_lower : bool, default False
        Whether to convert `source_str` to lower case.
    counter_to_update : collections.Counter or None, default None
        The collections.Counter instance to be updated with the token counts
        of `source_str`. If None, return a new collections.Counter instance
        counting tokens from `source_str`.


    Returns
    -------
    collections.Counter
        The `counter_to_update` collections.Counter instance after being
        updated with the token counts of `source_str`. If `counter_to_update`
        is None, return a new collections.Counter instance counting tokens
        from `source_str`.


    Examples
    --------
    >>> source_str = ' Life is great ! \n life is good . \n'
    >>> count_tokens_from_str(source_str, ' ', '\n', True)
    Counter({'!': 1, '.': 1, 'good': 1, 'great': 1, 'is': 2, 'life': 2})
    """
    # Split on either delimiter and drop the empty strings that result from
    # leading, trailing, or adjacent delimiters.
    source_str = filter(None,
                        re.split(token_delim + '|' + seq_delim, source_str))
    if to_lower:
        source_str = [t.lower() for t in source_str]

    if counter_to_update is None:
        return collections.Counter(source_str)
    else:
        counter_to_update.update(source_str)
        return counter_to_update
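
The following is a minimal, self-contained sketch of how `count_tokens_from_str` can be used to accumulate counts across several strings, and of one pitfall in its splitting step. It does not import mxnet: `count_tokens` is a hypothetical stand-in that mirrors the logic above, written only so the example runs on its own. Because the two delimiters are joined into a regular expression, a delimiter such as `'.'` is treated as a pattern; escaping it with `re.escape` is a caller-side workaround assumed here, not something the original function does.

```python
import collections
import re


def count_tokens(source_str, token_delim=' ', seq_delim='\n',
                 to_lower=False, counter_to_update=None):
    """Hypothetical stand-in mirroring count_tokens_from_str."""
    # Both delimiters are interpreted as regular-expression patterns.
    pattern = token_delim + '|' + seq_delim
    # Drop empty strings produced by leading, trailing, or adjacent delimiters.
    tokens = [t for t in re.split(pattern, source_str) if t]
    if to_lower:
        tokens = [t.lower() for t in tokens]
    if counter_to_update is None:
        return collections.Counter(tokens)
    counter_to_update.update(tokens)
    return counter_to_update


# Accumulate counts from two strings into a single Counter.
counts = count_tokens(' Life is great ! \n life is good . \n', to_lower=True)
counts = count_tokens('life is short', counter_to_update=counts)

# An unescaped '.' matches every character, so no tokens survive the split.
unescaped = count_tokens('a.b.a', token_delim='.')

# Escaping the delimiter makes it match a literal dot only.
dotted = count_tokens('a.b.a', token_delim=re.escape('.'))
```

Passing the returned counter back in as `counter_to_update` keeps one running tally instead of merging several counters afterwards.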