"""
S-Expression Tokenizer

``SExprTokenizer`` is used to find parenthesized expressions in a
string.  In particular, it divides a string into a sequence of
substrings that are either parenthesized expressions (including any
nested parenthesized expressions), or other whitespace-separated
tokens.

    >>> from nltk.tokenize import SExprTokenizer
    >>> SExprTokenizer().tokenize('(a b (c d)) e f (g)')
    ['(a b (c d))', 'e', 'f', '(g)']

By default, `SExprTokenizer` will raise a ``ValueError`` exception if
used to tokenize an expression with non-matching parentheses:

    >>> SExprTokenizer().tokenize('c) d) e (f (g')
    Traceback (most recent call last):
      ...
    ValueError: Un-matched close paren at char 1

The ``strict`` argument can be set to False to allow for
non-matching parentheses.  Any unmatched close parentheses will be
listed as their own s-expression; and the last partial sexpr with
unmatched open parentheses will be listed as its own sexpr:

    >>> SExprTokenizer(strict=False).tokenize('c) d) e (f (g')
    ['c', ')', 'd', ')', 'e', '(f (g']

The characters used for open and close parentheses may be customized
using the ``parens`` argument to the `SExprTokenizer` constructor:

    >>> SExprTokenizer(parens='{}').tokenize('{a b {c d}} e f {g}')
    ['{a b {c d}}', 'e', 'f', '{g}']

The s-expression tokenizer is also available as a function:

    >>> from nltk.tokenize import sexpr_tokenize
    >>> sexpr_tokenize('(a b (c d)) e f (g)')
    ['(a b (c d))', 'e', 'f', '(g)']

"""

import re

from nltk.tokenize.api import TokenizerI


class SExprTokenizer(TokenizerI):
    """
    A tokenizer that divides strings into s-expressions.
    An s-expression can be either:

      - a parenthesized expression, including any nested parenthesized
        expressions, or
      - a sequence of non-whitespace non-parenthesis characters.

    For example, the string ``(a (b c)) d e (f)`` consists of four
    s-expressions: ``(a (b c))``, ``d``, ``e``, and ``(f)``.

    By default, the characters ``(`` and ``)`` are treated as open and
    close parentheses, but alternative strings may be specified.

    :param parens: A two-element sequence specifying the open and close
        parentheses that should be used to find sexprs.  This will
        typically be either a two-character string, or a list of two
        strings.
    :type parens: str or list
    :param strict: If true, then raise an exception when tokenizing an
        ill-formed sexpr.
    """

    def __init__(self, parens='()', strict=True):
        if len(parens) != 2:
            raise ValueError('parens must contain exactly two strings')
        self._strict = strict
        self._open_paren = parens[0]
        self._close_paren = parens[1]
        self._paren_regexp = re.compile('%s|%s' % (re.escape(parens[0]),
                                                   re.escape(parens[1])))

    def tokenize(self, text):
        """
        Return a list of s-expressions extracted from *text*.
        For example:

            >>> SExprTokenizer().tokenize('(a b (c d)) e f (g)')
            ['(a b (c d))', 'e', 'f', '(g)']

        All parentheses are assumed to mark s-expressions.
        (No special processing is done to exclude parentheses that occur
        inside strings, or following backslash characters.)

        If the given expression contains non-matching parentheses,
        then the behavior of the tokenizer depends on the ``strict``
        parameter to the constructor.  If ``strict`` is ``True``, then
        raise a ``ValueError``.  If ``strict`` is ``False``, then any
        unmatched close parentheses will be listed as their own
        s-expression; and the last partial s-expression with unmatched
        open parentheses will be listed as its own s-expression:

            >>> SExprTokenizer(strict=False).tokenize('c) d) e (f (g')
            ['c', ')', 'd', ')', 'e', '(f (g']

        :param text: the string to be tokenized
        :type text: str or iter(str)
        :rtype: iter(str)
        """
        result = []
        pos = 0
        depth = 0
        for m in self._paren_regexp.finditer(text):
            paren = m.group()
            if depth == 0:
                result += text[pos:m.start()].split()
                pos = m.start()
            if paren == self._open_paren:
                depth += 1
            if paren == self._close_paren:
                if self._strict and depth == 0:
                    raise ValueError('Un-matched close paren at char %d'
                                     % m.start())
                depth = max(0, depth - 1)
                if depth == 0:
                    result.append(text[pos:m.end()])
                    pos = m.end()
        if self._strict and depth > 0:
            raise ValueError('Un-matched open paren at char %d' % pos)
        if pos < len(text):
            result.append(text[pos:])
        return result


sexpr_tokenize = SExprTokenizer().tokenize
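The depth-tracking scan that ``tokenize`` performs can be sketched in isolation. ``sexpr_split`` below is a hypothetical stand-alone helper (not part of NLTK) that reproduces the non-strict behavior: whitespace-separated tokens outside parentheses, whole balanced groups inside, unmatched close parens emitted as their own tokens, and a trailing unterminated group kept whole.

```python
import re


def sexpr_split(text, open_paren='(', close_paren=')'):
    """Illustrative non-strict s-expression splitter (not the NLTK API)."""
    paren_regexp = re.compile('%s|%s' % (re.escape(open_paren),
                                         re.escape(close_paren)))
    result, pos, depth = [], 0, 0
    for m in paren_regexp.finditer(text):
        if depth == 0:
            # Outside any group: flush whitespace-separated tokens.
            result += text[pos:m.start()].split()
            pos = m.start()
        if m.group() == open_paren:
            depth += 1
        else:
            # Tolerate unmatched close parens instead of raising.
            depth = max(0, depth - 1)
            if depth == 0:
                result.append(text[pos:m.end()])
                pos = m.end()
    if pos < len(text):
        # Trailing text: keep an unterminated group whole, else split it.
        result += [text[pos:]] if depth else text[pos:].split()
    return result


print(sexpr_split('(a b (c d)) e f (g)'))
# ['(a b (c d))', 'e', 'f', '(g)']
```

Like the class above, this treats every paren character as structural; parens inside quoted strings or after backslashes get no special handling.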