ó <żCVc@słdZddlmZddlmZmZmZddlmZm Z ddl m Z m Z ddl mZddlmZmZdZed efd „ƒYƒZd „Zd S( uý A general interface to the SENNA pipeline that supports any of the operations specified in SUPPORTED_OPERATIONS. Applying multiple operations at once has the speed advantage. For example, Senna will automatically determine POS tags if you are extracting named entities. Applying both of the operations will cost only the time of extracting the named entities. The SENNA pipeline has a fixed maximum size of the sentences that it can read. By default it is 1024 token/sentence. If you have larger sentences, changing the MAX_SENTENCE_SIZE value in SENNA_main.c should be considered and your system specific binary should be rebuilt. Otherwise this could introduce misalignment errors. The input is: - path to the directory that contains SENNA executables. If the path is incorrect, Senna will automatically search for executable file specified in SENNA environment variable - List of the operations needed to be performed. - (optionally) the encoding of the input data (default:utf-8) >>> from __future__ import unicode_literals >>> from nltk.classify import Senna >>> pipeline = Senna('/usr/share/senna-v2.0', ['pos', 'chk', 'ner']) >>> sent = 'Dusseldorf is an international business center'.split() >>> [(token['word'], token['chk'], token['ner'], token['pos']) for token in pipeline.tag(sent)] [('Dusseldorf', 'B-NP', 'B-LOC', 'NNP'), ('is', 'B-VP', 'O', 'VBZ'), ('an', 'B-NP', 'O', 'DT'), ('international', 'I-NP', 'O', 'JJ'), ('business', 'I-NP', 'O', 'NN'), ('center', 'I-NP', 'O', 'NN')] i˙˙˙˙(tunicode_literals(tpathtseptenviron(tPopentPIPE(t architecturetsystem(tTaggerI(t text_typetpython_2_unicode_compatibleuhttp://ml.nec-labs.com/senna/tSennacBsGeZdddgZdd„Zd„Zd„Zd„Zd„ZRS( uposuchkuneruutf-8cCsł||_tj|ƒt|_|j|jƒ}tj|ƒsŚdtkrŚtjtdƒt|_|j|jƒ}tj|ƒsŁtd||fƒ‚qŁqŚn||_ dS(NuSENNAu3Senna executable expected at %s or %s but not found( t _encodingRtnormpathRt_patht executabletisfileRtOSErrort operations(tselft senna_pathRtencodingt exe_file_1t exe_file_2((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/senna.pyt__init__8s  cCs–tƒ}|dkrNtƒd}|dkr>tj|dƒStj|dƒS|dkrjtj|dƒS|dkr†tj|d ƒStj|d ƒS( uĆ The function that determines the system specific binary that should be used in the pipeline. In case, the system is not known the default senna binary will be used. uLinuxiu64bitu senna-linux64u senna-linux32uWindowsusenna-win32.exeuDarwinu senna-osxusenna(RRRtjoin(Rt base_pathtos_nametbits((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/senna.pyRKs      cCsJi}d}x7tjD],}||jkr|||<|d7}qqW|S(u¨ A method that calculates the order of the columns that SENNA pipeline will output the tags into. This depends on the operations being ordered. i(R tSUPPORTED_OPERATIONSR(Rt_maptit operation((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/senna.pyR]s cCs|j|gƒdS(uI Applies the specified operation(s) on a list of tokens. i(t tag_sents(Rttokens((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/senna.pyttagjscCsI|j}tj|j|jƒƒsCtd|j|jƒƒ‚n|j|jƒd|jddg}|jg|jD]}d|^qwƒdjd„|Dƒƒd}t |t ƒrŇ|rŇ|j |ƒ}nt |dt d t d t ƒ}|jd |ƒ\}}|} |jd kr0td |ƒ‚n|rH|j|ƒ} n|jƒ} gg} d } d } xŮ| jƒjdƒD]Â}|sŽ| jgƒ| d7} d } qn|jdƒ}i}x&| D]}|| |jƒ||ststdintstdouttstderrtinputiu!Senna command failed! Details: %siu uwordu´Misalignment error occurred at sentence number %d. Possible reason is that the sentence size exceeded the maximum size. Check the documentation of Senna class for more information.i˙˙˙˙(R RRRRRtextendRRt isinstanceR tencodeRRt communicatet returncodet RuntimeErrortdecodeRtstriptsplittappendt IndexError(Rt sentencesRt _senna_cmdtopt_inputtpR'R(t senna_outputtmap_ttagged_sentencestsentence_indext token_indext tagged_wordttagstresultR#((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/senna.pyR!psL $'      (t__name__t __module__RRRRR#R!(((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/senna.pyR 3s    cCsPddlm}ytddddgƒ}Wntk rK|dƒ‚nXdS(Ni˙˙˙˙(tSkipTestu/usr/share/senna-v2.0uposuchkuneruSenna executable not found(tnoseRDR R(tmoduleRDttagger((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/senna.pyt setup_moduleŽs  N(t__doc__t __future__RtosRRRt subprocessRRtplatformRRt nltk.tag.apiRt nltk.compatR R t _senna_urlR RH(((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/classify/senna.pyt%sz