A finite state machine specialized for regular-expression-based text filters,
this module defines the following classes:

- `StateMachine`, a state machine
- `State`, a state superclass
- `StateMachineWS`, a whitespace-sensitive version of `StateMachine`
- `StateWS`, a state superclass for use with `StateMachineWS`
- `SearchStateMachine`, uses `re.search()` instead of `re.match()`
- `SearchStateMachineWS`, uses `re.search()` instead of `re.match()`
- `ViewList`, extends standard Python lists.
- `StringList`, string-specific ViewList.

Exception classes:

- `StateMachineError`
- `UnknownStateError`
- `DuplicateStateError`
- `UnknownTransitionError`
- `DuplicateTransitionError`
- `TransitionPatternNotFound`
- `TransitionMethodNotFound`
- `UnexpectedIndentationError`
- `TransitionCorrection`: Raised to switch to another transition.
- `StateCorrection`: Raised to switch to another state & transition.

Functions:

- `string2lines()`: split a multi-line string into a list of one-line strings


How To Use This Module
======================
(See the individual classes, methods, and attributes for details.)

1. Import it: ``import statemachine`` or ``from statemachine import ...``.
   You will also need to ``import re``.

2. Derive a subclass of `State` (or `StateWS`) for each state in your state
   machine::

       class MyState(statemachine.State):

   Within the state's class definition:

   a) Include a pattern for each transition, in `State.patterns`::

          patterns = {'atransition': r'pattern', ...}

   b) Include a list of initial transitions to be set up automatically, in
      `State.initial_transitions`::

          initial_transitions = ['atransition', ...]
   c) Define a method for each transition, with the same name as the
      transition pattern::

          def atransition(self, match, context, next_state):
              # do something
              result = [...]  # a list
              return context, next_state, result
              # context, next_state may be altered

      Transition methods may raise an `EOFError` to cut processing short.

   d) You may wish to override the `State.bof()` and/or `State.eof()` implicit
      transition methods, which handle the beginning- and end-of-file.

   e) In order to handle nested processing, you may wish to override the
      attributes `State.nested_sm` and/or `State.nested_sm_kwargs`.

      If you are using `StateWS` as a base class, in order to handle nested
      indented blocks, you may wish to:

      - override the attributes `StateWS.indent_sm`,
        `StateWS.indent_sm_kwargs`, `StateWS.known_indent_sm`, and/or
        `StateWS.known_indent_sm_kwargs`;
      - override the `StateWS.blank()` method; and/or
      - override or extend the `StateWS.indent()`, `StateWS.known_indent()`,
        and/or `StateWS.firstknown_indent()` methods.

3. Create a state machine object::

       sm = StateMachine(state_classes=[MyState, ...],
                         initial_state='MyState')

4. Obtain the input text, which needs to be converted into a tab-free list of
   one-line strings. For example, to read text from a file called
   'inputfile'::

       input_string = open('inputfile').read()
       input_lines = statemachine.string2lines(input_string)

5. Run the state machine on the input text and collect the results, a list::

       results = sm.run(input_lines)

6. Remove any lingering circular references::

       sm.unlink()


`StateMachine`
    A finite state machine for text filters using regular expressions.

    The input is provided in the form of a list of one-line strings (no
    newlines). States are subclasses of the `State` class. Transitions consist
    of regular expression patterns and transition methods, and are defined in
    each state.
    The state machine is started with the `run()` method, which returns the
    results of processing in a list.

`StateMachine.__init__()`
    Initialize a `StateMachine` object; add state objects.

    Parameters:

    - `state_classes`: a list of `State` (sub)classes.
    - `initial_state`: a string, the class name of the initial state.
    - `debug`: a boolean; produce verbose output if true (nonzero).

`StateMachine.at_eof()`
    Return 1 if the input is at or past end-of-file.

`StateMachine.at_bof()`
    Return 1 if the input is at or before beginning-of-file.

`StateMachine.previous_line()`
    Load `self.line` with the `n`'th previous line and return it.

`StateMachine.goto_line()`
    Jump to absolute line offset `line_offset`, load and return it.

`State.add_initial_transitions()`
    Make and add transitions listed in `self.initial_transitions`.

`State.add_transitions()`
    Add a list of transitions to the start of the transition list.

    Parameters:

    - `names`: a list of transition names.
    - `transitions`: a mapping of names to transition tuples.

    Exceptions: `DuplicateTransitionError`, `UnknownTransitionError`.

`State.add_transition()`
    Add a transition to the start of the transition list.

    Parameter `transition`: a ready-made transition 3-tuple.

    Exception: `DuplicateTransitionError`.

`State.remove_transition()`
    Remove a transition by `name`.

    Exception: `UnknownTransitionError`.
`State.make_transition()`
    Make & return a transition tuple based on `name`.

    This is a convenience function to simplify transition creation.

    Parameters:

    - `name`: a string, the name of the transition pattern & method. This
      `State` object must have a method called '`name`', and a dictionary
      `self.patterns` containing a key '`name`'.
    - `next_state`: a string, the name of the next `State` object for this
      transition. A value of ``None`` (or absent) implies no state change
      (i.e., continue with the same state).

    Exceptions: `TransitionPatternNotFound`, `TransitionMethodNotFound`.

`State.make_transitions()`
    Return a list of transition names and a transition mapping.

    Parameter `name_list`: a list, where each entry is either a transition
    name string, or a 1- or 2-tuple (transition name, optional next state
    name).

`State.no_match()`
    Called when there is no match from `StateMachine.check_line()`.

    Return the same values returned by transition methods:

    - context: unchanged;
    - next state name: ``None``;
    - empty result list.

    Override in subclasses to catch this event.

`State.bof()`
    Handle beginning-of-file. Return unchanged `context`, empty result.

    Override in subclasses.

    Parameter `context`: application-defined storage.

`State.eof()`
    Handle end-of-file. Return empty result.

    Override in subclasses.

    Parameter `context`: application-defined storage.

`State.nop()`
    A "do nothing" transition method.

    Return unchanged `context` & `next_state`, empty result. Useful for
    simple state changes (actionless transitions).
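The transition machinery described for `State` (a pattern per transition, a method of the same name, and a fallback when nothing matches) can be sketched in a few lines. This is a standalone illustration and not the docutils implementation: `TinyState`, `KeyValue`, and `run` are hypothetical names, and the sketch assumes a single state with no state switching.

```python
import re


class TinyState:
    """Hypothetical stand-in for a `State` subclass (not the docutils class)."""

    patterns = {}
    initial_transitions = []

    def __init__(self):
        # Build (compiled pattern, bound method, next state name) 3-tuples,
        # mirroring the "transition 3-tuple" mentioned in the docstrings.
        self.transition_order = list(self.initial_transitions)
        self.transitions = {
            name: (re.compile(self.patterns[name]), getattr(self, name), None)
            for name in self.transition_order
        }

    def no_match(self, context):
        # Same shape as a transition method's return value:
        # unchanged context, no state change, empty result list.
        return context, None, []

    def check_line(self, line, context):
        # Try each transition in order; the first matching pattern wins.
        for name in self.transition_order:
            pattern, method, next_state = self.transitions[name]
            match = pattern.match(line)
            if match:
                return method(match, context, next_state)
        return self.no_match(context)


class KeyValue(TinyState):
    patterns = {'comment': r'#', 'pair': r'(\w+)\s*=\s*(.*)'}
    initial_transitions = ['comment', 'pair']

    def comment(self, match, context, next_state):
        return context, next_state, []

    def pair(self, match, context, next_state):
        return context, next_state, [(match.group(1), match.group(2))]


def run(state, lines):
    """Feed one-line strings to a single state and collect the results."""
    results, context = [], None
    for line in lines:
        context, _next_state, result = state.check_line(line, context)
        results.extend(result)
    return results
```

Calling `run(KeyValue(), lines)` on a list of one-line strings returns the collected `(key, value)` pairs; lines that match no pattern fall through to `no_match()` and contribute nothing.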
`StateMachineWS`
    `StateMachine` subclass specialized for whitespace recognition.

    There are three methods provided for extracting indented text blocks:

    - `get_indented()`: use when the indent is unknown.
    - `get_known_indented()`: use when the indent is known for all lines.
    - `get_first_known_indented()`: use when only the first line's indent is
      known.

`StateMachineWS.get_indented()`
    Return a block of indented lines of text, and info.

    Extract an indented block where the indent is unknown for all lines.

    :Parameters:
        - `until_blank`: Stop collecting at the first blank line if true.
        - `strip_indent`: Strip common leading indent if true (default).

    :Return:
        - the indented block (a list of lines of text),
        - its indent,
        - its first line offset from BOF, and
        - whether or not it finished with a blank line.

`StateMachineWS.get_known_indented()`
    Return an indented block and info.

    Extract an indented block where the indent is known for all lines.
    Starting with the current line, extract the entire text block with at
    least `indent` indentation (which must be whitespace, except for the
    first line).

    :Parameters:
        - `indent`: The number of indent columns/characters.
        - `until_blank`: Stop collecting at the first blank line if true.
        - `strip_indent`: Strip `indent` characters of indentation if true
          (default).

    :Return:
        - the indented block,
        - its first line offset from BOF, and
        - whether or not it finished with a blank line.
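For the case where the indent is unknown, the extraction described above amounts to collecting lines until the first unindented one, tracking the smallest indent seen, and then stripping it. The following is a minimal sketch under stated assumptions, not the library's code: `get_indented_block` is a hypothetical helper that works on a plain list of tab-free strings and returns only the block, its indent, and the blank-finish flag.

```python
def get_indented_block(lines, start=0, until_blank=False, strip_indent=True):
    """Return (block, indent, blank_finish) for the indented run at `start`."""
    indent = None          # smallest indent seen so far
    end = start
    blank_finish = True    # running off the end of input counts as "blank"
    while end < len(lines):
        line = lines[end]
        if line and line[0] != ' ':
            # First unindented line ends the block; record whether the
            # line just before it was blank.
            blank_finish = end > start and not lines[end - 1].strip()
            break
        stripped = line.lstrip()
        if not stripped:
            if until_blank:
                blank_finish = True
                break
        else:
            line_indent = len(line) - len(stripped)
            indent = line_indent if indent is None else min(indent, line_indent)
        end += 1
    block = lines[start:end]
    if strip_indent and indent:
        block = [line[indent:] for line in block]
    return block, indent or 0, blank_finish
```

With `until_blank=True` the block stops at the first blank line, so a deeper first paragraph keeps its own indent as the common one; with the default, later shallower lines lower the common indent for the whole block.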
`ViewList`
    List with extended functionality: slices of ViewList objects are child
    lists, linked to their parents. Changes made to a child list also affect
    the parent list. A child list is effectively a "view" (in the SQL sense)
    of the parent list. Changes to parent lists, however, do *not* affect
    active child lists. If a parent list is changed, any active child lists
    should be recreated.

    The start and end of the slice can be trimmed using the `trim_start()`
    and `trim_end()` methods, without affecting the parent list. The link
    between child and parent lists can be broken by calling `disconnect()` on
    the child list.

    Also, ViewList objects keep track of the source & offset of each item.
    This information is accessible via the `source()`, `offset()`, and
    `info()` methods.
`ViewList.trim_start()`
    Remove items from the start of the list, without touching the parent.

`ViewList.trim_end()`
    Remove items from the end of the list, without touching the parent.
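The parent/child behaviour described for `ViewList` (writes to a child reach the parent, trimming does not, and `disconnect()` severs the link) can be illustrated with a much smaller class. This is a hedged sketch only: `View` and its `child()` method are hypothetical names, and source/offset tracking is left out entirely.

```python
class View:
    """Hypothetical, simplified child-view list (not the docutils ViewList)."""

    def __init__(self, data, parent=None, parent_offset=0):
        self.data = list(data)
        self.parent = parent
        self.parent_offset = parent_offset

    def child(self, start, stop):
        # A child holds its own copy of the items plus a link to the parent.
        return View(self.data[start:stop], parent=self, parent_offset=start)

    def __setitem__(self, i, value):
        self.data[i] = value
        if self.parent is not None:
            # Propagate the change to the corresponding parent slot.
            self.parent[i + self.parent_offset] = value

    def trim_start(self, n=1):
        # Drop items from the front of this view only; the parent is left
        # alone, but the offset into it must move forward.
        if n > len(self.data):
            raise IndexError("can't trim %s items from a list of size %s"
                             % (n, len(self.data)))
        if n < 0:
            raise IndexError('Trim size must be >= 0.')
        del self.data[:n]
        if self.parent is not None:
            self.parent_offset += n

    def disconnect(self):
        # Break the link; later writes stay local to this view.
        self.parent = None


parent = View(['a', 'b', 'c', 'd'])
view = parent.child(1, 3)   # view.data == ['b', 'c']
view[0] = 'B'               # also updates parent.data[1]
view.trim_start()           # view.data == ['c']; parent keeps all four items
```

Adjusting `parent_offset` in `trim_start()` is what keeps later writes through the trimmed view aligned with the right parent slot.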
`ViewList.info()`
    Return source & offset for index `i`.

`ViewList.source()`
    Return source for index `i`.

`ViewList.offset()`
    Return offset for index `i`.

`ViewList.disconnect()`
    Break link between this list and parent list.

`ViewList.xitems()`
    Return iterator yielding (source, offset, value) tuples.

`ViewList.pprint()`
    Print the list in `grep` format (`source:offset:value` lines)

`StringList`
    A `ViewList` with string-specific methods.

`StringList.trim_left()`
    Trim `length` characters off the beginning of each item, in-place, from
    index `start` to `end`. No whitespace-checking is done on the trimmed
    text. Does not affect slice parent.

`StringList.get_text_block()`
    Return a contiguous block of text.

    If `flush_left` is true, raise `UnexpectedIndentationError` if an
    indented line is encountered before the text block ends (with a blank
    line).
`StringList.get_indented()`
    Extract and return a StringList of indented lines of text.

    Collect all lines with indentation, determine the minimum indentation,
    remove the minimum indentation from all indented lines (unless
    `strip_indent` is false), and return them. All lines up to but not
    including the first unindented line will be returned.

    :Parameters:
        - `start`: The index of the first line to examine.
        - `until_blank`: Stop collecting at the first blank line if true.
        - `strip_indent`: Strip common leading indent if true (default).
        - `block_indent`: The indent of the entire block, if known.
        - `first_indent`: The indent of the first line, if known.

    :Return:
        - a StringList of indented lines with minimum indent removed;
        - the amount of the indent;
        - a boolean: did the indented block finish with a blank line or EOF?

`StringList.pad_double_width()`
    Pad all double-width characters in self by appending `pad_char` to each.
    For East Asian language support.
`StringList.replace()`
    Replace all occurrences of substring `old` with `new`.

`TransitionCorrection`
    Raise from within a transition method to switch to another transition.

    Raise with one argument, the new transition name.

`StateCorrection`
    Raise from within a transition method to switch to another state.

    Raise with one or two arguments: new state name, and an optional new
    transition name.

`string2lines()`
    Return a list of one-line strings with tabs expanded, no newlines, and
    trailing whitespace stripped.

    Each tab is expanded with between 1 and `tab_width` spaces, so that the
    next character's index becomes a multiple of `tab_width` (8 by default).

    Parameters:

    - `astring`: a multi-line string.
    - `tab_width`: the number of columns between tab stops.
    - `convert_whitespace`: convert form feeds and vertical tabs to spaces?

`_exception_data()`
    Return exception information:

    - the exception's class name;
    - the exception object;
    - the name of the file containing the offending code;
    - the line number of the offending code;
    - the function name of the offending code.
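The behaviour documented for `string2lines()` (tabs expanded to the next tab stop, newlines removed, trailing whitespace stripped, optional conversion of form feeds and vertical tabs) can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the module's own code: `expand_lines` and `_FF_VT` are hypothetical names.

```python
import re

# Form feeds and vertical tabs: the characters the docstring says may be
# converted to spaces. (The escapes here yield the literal control characters.)
_FF_VT = re.compile('[\v\f]')


def expand_lines(text, tab_width=8, convert_whitespace=False):
    """Split `text` into tab-free one-line strings without trailing space."""
    if convert_whitespace:
        # Must happen before splitting: str.splitlines() treats form feeds
        # and vertical tabs as line boundaries.
        text = _FF_VT.sub(' ', text)
    return [line.expandtabs(tab_width).rstrip() for line in text.splitlines()]
```

`str.expandtabs()` pads each tab with between 1 and `tab_width` spaces so that the following character lands on a multiple of `tab_width`, which matches the rule stated in the docstring.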