# coding: utf-8
"""Provide utilities for text data processing."""
from __future__ import absolute_import
from __future__ import print_function

import collections
import re


def count_tokens_from_str(source_str, token_delim=' ', seq_delim='\n',
                          to_lower=False, counter_to_update=None):
    """Counts tokens in the specified string.

    For token_delim=' ' and seq_delim='\\n', a specified string of two sequences
    of tokens may look like::

        token1 token2 token3
        token4 token5

    Parameters
    ----------
    source_str : str
        A source string of tokens.
    token_delim : str, default ' '
        A token delimiter.
    seq_delim : str, default '\\n'
        A sequence delimiter.
    to_lower : bool, default False
        Whether to convert `source_str` to lower case.
    counter_to_update : collections.Counter or None, default None
        The collections.Counter instance to be updated with the token counts of
        `source_str`. If None, return a new collections.Counter instance counting
        tokens from `source_str`.

    Returns
    -------
    collections.Counter
        The `counter_to_update` collections.Counter instance after being updated
        with the token counts of `source_str`. If `counter_to_update` is None,
        return a new collections.Counter instance counting tokens from
        `source_str`.

    Examples
    --------
    >>> source_str = ' Life is great ! \\n life is good . \\n'
    >>> count_tokens_from_str(source_str, ' ', '\\n', True)
    Counter({'is': 2, 'life': 2, '!': 1, '.': 1, 'good': 1, 'great': 1})
    """
    # Split on either delimiter and drop empty strings produced by
    # leading/trailing or consecutive delimiters.
    source_str = filter(None, re.split(token_delim + '|' + seq_delim, source_str))
    if to_lower:
        source_str = [t.lower() for t in source_str]

    if counter_to_update is None:
        return collections.Counter(source_str)
    else:
        counter_to_update.update(source_str)
        return counter_to_update