U C^Ȱ@sddlmZmZddlZddlZddlmZddlZddlZddl m Z ddl m Z ddl m Z mZddlmZddlZddlZddlmZmZd d lmZd d lmZd d lmZd d lmZd dlmZm Z m!Z!d dl"m#Z#m$Z$m%Z%m&Z&d dl'm(Z(d dl)m*Z*d dl+m,Z,m-Z-d dl.m/Z/m0Z0d dl1m2Z2m3Z3d dl1m4Z4d dl5m6Z6d dl7m8Z8d dl9m:Z:d dl;mm?Z?m@Z@mAZAmBZBd dlCmDZDd dlCmEZEdZFGdddeGZHGd d!d!eGZIGd"d#d#eGZJd$d%ZKGd&d'd'eLZMd(d)ZNd*d+ZOGd,d-d-ZPdS).)absolute_importunicode_literalsN) minibatch) OrderedDict)contextmanager)copydeepcopy)Model)chaincycle) Tokenizer)Vocab) Lemmatizer)Lookups) analyze_pipesanalyze_all_pipesvalidate_attrs)izip basestring_ is_python2 class_types) GoldParse)Scorer)link_vectors_to_modelscreate_default_optimizer)IS_STOPLANG)TOKENIZER_PREFIXESTOKENIZER_SUFFIXES)TOKENIZER_INFIXES) TOKEN_MATCH)TAG_MAP)Doc) LEX_ATTRSis_stop)ErrorsWarningsdeprecation_warning user_warning)util)aboutFc@seZdZedddZedddZedddZeddd Zd d d gZe Z e e Z e eZe eZeeZiZeZiZeZiZiZd dddZgZgZdS) BaseDefaultsNcCs|dkr|j|d}t|dS)Nnlplookups)create_lookupsr)clsr.r0r31/tmp/pip-install-6_kvzl1k/spacy/spacy/language.pycreate_lemmatizer*s zBaseDefaults.create_lemmatizercst|fdd|jD}t|jkrV|jtd}|tjjkrV|tjj|t }| D]\}}t |}| ||qd|S)Ncsi|]\}}||qSr3r3).0namefilenamerootr3r4 3sz/BaseDefaults.create_lookups..) r*Zget_module_path resourcesrlex_attr_gettersregistryr0updategetritemsZload_language_dataZ add_table)r2r. filenameslangr0r7r8datar3r9r4r10s    zBaseDefaults.create_lookupsc Cs||}|j||d}t|j}tjt|jd|t<t ||j ||d}|j D]*\}}| D]\}} |j ||| qbqR|S)Nr/)Zstops)r=tag_map lemmatizerr0)r1r5dictr= functoolspartialr% stop_wordsrrrE morph_rulesrAZ morphologyZadd_special_case) r2r.r0rFr=vocabZtag_strexcZorth_strattrsr3r3r4 create_vocab>s  zBaseDefaults.create_vocabcCs|j}|j}|jr t|jjnd}|jr8t|jjnd}|jrPt |jj nd}|dk rb|j n| |}t ||||||dS)N)rules prefix_search suffix_searchinfix_finditer token_match)tokenizer_exceptionsrTprefixesr*Zcompile_prefix_regexsearchsuffixesZcompile_suffix_regexinfixesZcompile_infix_regexfinditerrLrOr )r2r.rPrTrQrRrSrLr3r3r4create_tokenizerPs"zBaseDefaults.create_tokenizertaggerparsernerZltrT) directionZhas_caseZ has_letters)NN)N)N)N) __name__ __module__ __qualname__ classmethodr5r1rOr[ pipe_namesr!rTtuplerrVrrXr rYrGr"rErUsetrJrKr$r=Zsyntax_iteratorsr<Zwriting_systemZsingle_orth_variantsZpaired_orth_variantsr3r3r3r4r,)s.      r,c@seZdZdZeZdZdddiZdddifdd Ze d d Z e d d Z e j dd Z e ddZ e ddZe ddZe ddZe ddZe ddZe ddZe ddZe dd Zd!d"Zefd#d$ZdWd%d&Zd'd(Zd)d*Zd+d,Zd-d.Zgdfd/d0Zd1d2Zd3d4Z d5d6Z!dXd8d9Z"dYd:d;Z#dd?Z%d[d@dAZ&d\dDdEZ'e(dFdGZ)dBdHdIgdBddJfdKdLZ*dMdNZ+e,dfdOdPZ-e,dfdQdRZ.e,dfdSdTZ/e,dfdUdVZ0dS)]Languagea[A text-processing pipeline. Usually you'll load this once per process, and pass the instance around your application. Defaults (class): Settings, data and factory methods for creating the `nlp` object and processing pipeline. lang (unicode): Two-letter language ID, i.e. ISO code. DOCS: https://spacy.io/api/language N tokenizercCs |j|SN)Defaultsr[r-r3r3r4zLanguage.Ti@BcKstjj}|j|t||_d|_|dkrp|jj }||f| di}|j j dkr| di d|j _ n0|j r|j r|j |j krttjj|j |j d||_|dkr|jj}||f| di}||_g|_||_d|_dS)a0Initialise a Language object. vocab (Vocab): A `Vocab` object. If `True`, a vocab is created via `Language.Defaults.create_vocab`. make_doc (callable): A function that takes text and returns a `Doc` object. Usually a `Tokenizer`. meta (dict): Custom meta data for the Language class. Is written to by models to add model meta data. max_length (int) : Maximum number of characters in a single text. The current v2 models may run out memory on extremely long texts, due to large internal allocations. You should segment these texts into meaningful units, e.g. paragraphs, subsections etc, before passing them to spaCy. Default maximum length is 1,000,000 characters (1mb). As a rule of thumb, if all pipeline components are enabled, spaCy's default models currently requires roughly 1GB of temporary memory per 100,000 characters in one text. RETURNS (Language): The newly constructed object. NTrLvectorsr7)r.rLrh)r*r> factoriesget_allr?rG_meta_pathrjrOr@rmr7rC ValueErrorr&ZE150formatrLr[rhpipeline max_length _optimizer)selfrLmake_docrumetakwargsZuser_factoriesfactoryr3r3r4__init__s&    zLanguage.__init__cCs|jSri)rqrwr3r3r4pathsz Language.pathcCs|jjr|jd|jjn|jd|j|jdd|jdd|jddtj|jdd |jd d |jd d |jd d |jd d |jjt|jj |jj j |jj j d|jd<|j |jd<|j |jd<|j|jd<|jS)NrCr7modelversionz0.0.0Z spacy_versionz>={} descriptionauthoremailurllicense)widthrmkeysr7rmrtrnlabels)rLrCrp setdefaultrsr+ __version__Zvectors_lengthlenrmZn_keysr7rdpipe_factories pipe_labelsr}r3r3r4rys(     z Language.metacCs ||_dSri)rp)rwvaluer3r3r4ryscCs |dS)N tensorizerget_piper}r3r3r4rszLanguage.tensorizercCs |dS)Nr\rr}r3r3r4r\szLanguage.taggercCs |dS)Nr]rr}r3r3r4r]szLanguage.parsercCs |dS)Nr^rr}r3r3r4entityszLanguage.entitycCs |dS)NZ entity_linkerrr}r3r3r4linkerszLanguage.linkercCs |dS)Nmatcherrr}r3r3r4rszLanguage.matchercCsdd|jDS)zwGet names of available pipeline components. RETURNS (list): List of component name strings, in order. cSsg|] \}}|qSr3r3)r6 pipe_name_r3r3r4 sz'Language.pipe_names..rtr}r3r3r4rdszLanguage.pipe_namescCs(i}|jD]\}}t|d|||<q |S)zGet the component factories for the available pipeline components. RETURNS (dict): Factory names, keyed by component names. r{)rtgetattr)rwrnrpiper3r3r4rszLanguage.pipe_factoriescCs2t}|jD] \}}t|dr t|j||<q |S)zGet the labels set by the pipeline components, if available (if the component exposes a labels property). RETURNS (dict): Labels keyed by component name. r)rrthasattrlistr)rwrr7rr3r3r4rs  zLanguage.pipe_labelscCs:|jD]\}}||kr|Sqttjj||jddS)zGet a pipeline component for a given component name. name (unicode): Name of pipeline component to get. RETURNS (callable): The pipeline component. DOCS: https://spacy.io/api/language#get_pipe r7optsN)rtKeyErrorr&E001rsrd)rwr7r componentr3r3r4rs zLanguage.get_pipecCsN||jkr8|dkr&ttjj|dnttjj|d|j|}||f|S)a0Create a pipeline component from a factory. name (unicode): Factory name to look up in `Language.factories`. config (dict): Configuration parameters to initialise component. RETURNS (callable): Pipeline component. DOCS: https://spacy.io/api/language#create_pipe Zsbdr7)rnrr&ZE108rsZE002)rwr7configr{r3r3r4 create_pipes  zLanguage.create_pipec Cst|dsLtjjt||d}t|trD||jkrD|tjj|d7}t ||dkr^t |}||j kr~t tj j||j dtt|t|t|t|gdkrt tjd}||f} |st|||gst|j}|j| n|r|jd| n|r,||j kr,|j |}|j|j || nZ|rj||j krj|j |d}|j|j |d| nt tjj|pz||j dtrt|j|||dS) aAdd a component to the processing pipeline. Valid components are callables that take a `Doc` object, modify it and return it. Only one of before/after/first/last can be set. Default behaviour is "last". component (callable): The pipeline component. name (unicode): Name of pipeline component. Overwrites existing component.name attribute if available. If no name is set and the component exposes no name attribute, component.__name__ is used. An error is raised if a name already exists in the pipeline. before (unicode): Component name to insert component directly before. after (unicode): Component name to insert component directly after. first (bool): Insert component first / not first in the pipeline. last (bool): Insert component last / not last in the pipeline. DOCS: https://spacy.io/api/language#add_pipe __call__rr7)rNrrr )rr&E003rsrepr isinstancerrnZE004rrr*get_component_namerdE007sumboolZE006anyrrtappendinsertindexrENABLE_PIPELINE_ANALYSISr) rwrr7beforeafterfirstlastmsgZ pipe_indexrr3r3r4add_pipe,s:   $   zLanguage.add_pipecCs ||jkS)a$Check if a component name is present in the pipeline. Equivalent to `name in nlp.pipe_names`. name (unicode): Name of the component. RETURNS (bool): Whether a component of the name exists in the pipeline. DOCS: https://spacy.io/api/language#has_pipe )rd)rwr7r3r3r4has_pipe^s zLanguage.has_pipecCs||jkr ttjj||jdt|dsltjjt||d}t|t rd||j krd|tj j|d7}t|||f|j |j |<trt|j dS)zReplace a component in the pipeline. name (unicode): Name of the component to replace. component (callable): Pipeline component. DOCS: https://spacy.io/api/language#replace_pipe rrrrN)rdrrr&rrsrrrrrrnZE135rtrrr)rwr7rrr3r3r4 replace_pipeis  zLanguage.replace_pipecCsh||jkr ttjj||jd||jkr@ttjj||jd|j|}||j|df|j|<dS)zRename a pipeline component. old_name (unicode): Name of the component to rename. new_name (unicode): New name of the component. DOCS: https://spacy.io/api/language#rename_pipe rr N)rdrrr&rrsrrrt)rwZold_namenew_nameir3r3r4 rename_pipe|s    zLanguage.rename_pipecCsF||jkr ttjj||jd|j|j|}trBt |j|S)zRemove a component from the pipeline. name (unicode): Name of the component to remove. RETURNS (tuple): A `(name, component)` tuple of the removed component. DOCS: https://spacy.io/api/language#remove_pipe r) rdrrr&rrsrtpoprrr)rwr7removedr3r3r4 remove_pipes   zLanguage.remove_pipecCst||jkr(ttjjt||jd||}|dkr>i}|jD]b\}}||krVqDt|dsxttj jt ||d||f| |i}|dkrDttj j|dqD|S)aApply the pipeline to some text. The text can span multiple sentences, and can contain arbtrary whitespace. Alignment into the original string is preserved. text (unicode): The text to be processed. disable (list): Names of the pipeline components to disable. component_cfg (dict): An optional dictionary with extra keyword arguments for specific components. RETURNS (Doc): A container for accessing the annotations. DOCS: https://spacy.io/api/language#call )lengthruNrrr) rrurrr&ZE088rsrxrtrrtyper@ZE005)rwtextdisable component_cfgdocr7procr3r3r4rs   zLanguage.__call__cGs4t|dkr&t|dttfr&|d}t|f|S)a^Disable one or more pipeline components. If used as a context manager, the pipeline will be restored to the initial state at the end of the block. Otherwise, a DisabledPipes object is returned, that has a `.restore()` method you can use to undo your changes. DOCS: https://spacy.io/api/language#disable_pipes r r)rrrre DisabledPipes)rwnamesr3r3r4 disable_pipesszLanguage.disable_pipescCs ||Sri)rh)rwrr3r3r4rxszLanguage.make_docc sdg}g}t||D]t\}}t|tr2||}t|tsvfdd|D}|rjtjj|d}t|t|f|}| || |q||fS)z+Format golds and docs before update models.)wordstagsZheadsdepsentitiesZcatslinkscsg|]}|kr|qSr3r3)r6kZ expected_keysr3r4rsz3Language._format_docs_and_golds..)Zunexpexp) ziprrrxrr&ZE151rsrrr) rwdocsgoldsZ gold_objsZdoc_objsrgold unexpectederrr3rr4_format_docs_and_goldss      zLanguage._format_docs_and_goldscs4t|t|kr,ttjjt|t|dt|dkr.get_gradsr?dropsgdlossesr)N)r IndexErrorr&ZE009rsrvrr opsralphab1b2rrtrandomshufflerr@rr?rA)rwrrrrrrrpipesr7rrzrrrr3rr4r?s6          zLanguage.updatecst|dkrdS|dkr4|jdkr.ttj|_|j}t|}t|D] \}}t|trD| |||<qDt|j }t ||dkri}idfdd }|j |_ |j|_|j|_|D]\\} } t| dsqi| j|f||d|| iD]\} \} } || | | dqq|S) aMake a "rehearsal" update to the models in the pipeline, to prevent forgetting. Rehearsal updates run an initial copy of the model over some data, and update the model so its current predictions are more like the initial ones. This is useful for keeping a pretrained model on-track, even if you're updating it with a smaller set of examples. docs (iterable): A batch of `Doc` objects. drop (float): The dropout rate. sgd (callable): An optimizer. RETURNS (dict): Results from the update. EXAMPLE: >>> raw_text_batches = minibatch(raw_texts) >>> for labelled_batch in minibatch(zip(train_docs, train_golds)): >>> docs, golds = zip(*train_docs) >>> nlp.update(docs, golds) >>> raw_batch = [nlp.make_doc(text) for text in next(raw_text_batches)] >>> nlp.rehearse(raw_batch) rNcs||f|<dSrir3rrr3r4r,sz$Language.rehearse..get_gradsrehearserr)N)rrvrr rr enumeraterrrxrtrrrrrrrr@rA)rwrrrrrrrrr7rrrrr3rr4rs6         zLanguage.rehearseccs@|jD]\}}t|dr||}q|D]\}}||fVq(dS)a,Can be called before training to pre-process gold data. By default, it handles nonprojectivity and adds missing tags to the tag map. docs_golds (iterable): Tuples of `Doc` and `GoldParse` objects. YIELDS (tuple): Tuples of preprocessed `Doc` and `GoldParse` objects. preprocess_goldN)rtrr)rw docs_goldsr7rrrr3r3r4r;s    zLanguage.preprocess_goldc Ks8|dkrdd}n>|D]6\}}|}|D] \}}|dD]}|j|}q.r devicerpretrained_vectorsbegin_training)rtr)rrLr@r*use_gpurmrDshaper rasarrayrr7rrvrtrr?r) rwZget_gold_tuplesrrcfgrZannots_bracketsZannotswordr7rrzr3r3r4rHs@         zLanguage.begin_trainingcKs|dddkrJt|d|jjjjddkrJtj |jjj|jj_t |j|jjjjdrr|jjj |d<|dkrt tj}||_ |jD]\}}t|drt|j|_q|j S)aContinue training a pretrained model. Create and return an optimizer, and initialize "rehearsal" for any pipeline component that has a .rehearse() method. Rehearsal is used to prevent models from "forgetting" their initialised "knowledge". To perform rehearsal, collect samples of text you want the models to retain performance on, and call nlp.rehearse() with a batch of Doc objects. rrrr rN_rehearsal_model)r@r*rrLrmrDrr rrrr7rrvrtrrrr)rwrrr7rr3r3r4resume_trainingts    zLanguage.resume_trainingFc s|dkrtjd}|dkr i}t|\}}fdd|D}t|}jD]F\}} ||i} | d|t| dst|| | }qL| j|f| }qLt||D]R\} } t | t st | f| } |rt | |di} | d||j | | f| q|S) a<Evaluate a model's pipeline components. docs_golds (iterable): Tuples of `Doc` and `GoldParse` objects. verbose (bool): Print debugging information. batch_size (int): Batch size to use. scorer (Scorer): Optional `Scorer` to use. If not passed in, a new one will be created. component_cfg (dict): An optional dictionary with extra keyword arguments for specific components. RETURNS (Scorer): The scorer containing the evaluation results. DOCS: https://spacy.io/api/language#evaluate Nrcs$g|]}t|tr|n|qSr3)rrrxr6rr}r3r4rsz%Language.evaluate.. batch_sizerscorerverbose) rrtrrr@rr_piperrrprintZscore) rwrrrrrrrr7rrzrrr3r}r4evaluates0          zLanguage.evaluatec +svfdd|jD}|D]&}z t|Wqtk r<YqXqdV|D]&}z t|WqJtk rnYqJXqJdS)aReplace weights of models in the pipeline with those provided in the params dictionary. Can be used as a contextmanager, in which case, models go back to their original weights after the block. params (dict): A dictionary of parameters keyed by model ID. **cfg: Config parameters. EXAMPLE: >>> with nlp.use_params(optimizer.averages): >>> nlp.to_disk('/tmp/checkpoint') cs$g|]\}}t|dr|qS) use_params)rrr6r7rparamsr3r4rs z'Language.use_params..N)rtnext StopIteration)rwr rcontextscontextr3r r4rs   zLanguage.use_paramsrir c #s&t|\}} tr(|dkr(ttjd}|dkr:ttj|dkrJt }|rt|\} } dd| D}dd| D} j |||||d} t | | D]\}}||fVqdS|dkri}g}j D]b\}}||krq| |i}|d|t|d r tj|j f|}ntjt||d }||q|dkrD||||} n&fd d|D} |D]}|| } qZt}t}d}d }| D]}|V|r|||d kr|||d7}n`t|d kr||}}|dkrtjj}n,jj|\}}j||j|d }qdS)aProcess texts as a stream, and yield `Doc` objects in order. texts (iterator): A sequence of texts to process. as_tuples (bool): If set to True, inputs should be a sequence of (text, context) tuples. Output will then be a sequence of (doc, context) tuples. Defaults to False. batch_size (int): The number of texts to buffer. disable (list): Names of the pipeline components to disable. cleanup (bool): If True, unneeded strings are freed to control memory use. Experimental. component_cfg (dict): An optional dictionary with extra keyword arguments for specific components. n_process (int): Number of processors to process texts, only supported in Python3. If -1, set `multiprocessing.cpu_count()`. YIELDS (Doc): Documents in the order of the original text. DOCS: https://spacy.io/api/language#pipe r rcss|]}|dVqdS)rNr3r6Ztcr3r3r4 sz Language.pipe..css|]}|dVqdS)r Nr3rr3r3r4rs)rr n_processrNrr)rrzc3s|]}|VqdSrirxr6rr}r3r4r!sri') itertoolsteerr)r'ZW023r(ZW016mp cpu_countrrrtr@rrrHrIrr_multiprocessing_pipeweakrefWeakSetaddrrrLstringsZ_cleanup_stale_strings _reset_cacherh)rwtextsZ as_tuples n_threadsrrcleanuprr raw_textsZ text_context1Z text_context2r rrrrr7rrzfrZ recent_refsZold_refsZoriginal_strings_dataZnr_seenrrr3r}r4rsz                 z Language.pipec #st|\}}ddt|D}tddt|D\}}t||} t| ||d} | | fddt||D} | D] } | qt ddt |D} fdd| D}z.cSsg|]}tdqS)F)rZPiper$r3r3r4rJs) chunk_sizecs(g|] \}}tjtj||fdqS))targetargs)rProcess _apply_pipesrx)r6ZrchZschrrwr3r4rVscss|]}|VqdSri)recv)r6r+r3r3r4r_sz1Language._multiprocessing_pipe..c3s|]}tj|VqdSri)r#rL from_bytes)r6Zbyte_docr}r3r4r`sr r)rrrangerr_Sendersendstartr from_iterabler terminaterstep)rwrrrrr!Ztexts_qZbytedocs_recv_chZbytedocs_send_chZ batch_textssenderZprocsrZ byte_docsrrrrr3r*r4rCs.    zLanguage._multiprocessing_pipecs|dk rttj|}t|}t}fdd|d<fdd|d<jD]:\}}t|ds`qL||krjqLt|dsvqL|fd d||<qLfd d|d <t|||dS) a]Save the current state to a directory. If a model is loaded, this will include the model. path (unicode or Path): Path to a directory, which will be created if it doesn't exist. exclude (list): Names of components or serialization fields to exclude. DOCS: https://spacy.io/api/language#to_disk Ncsjj|dgdSNrL)exclude)rhto_diskpr}r3r4rkzsz"Language.to_disk..rhcs|dtjS)Nw)openwritesrsly json_dumpsryr8r}r3r4rk}s  meta.jsonr7r7cSs|j|dgdSr5)r7r9rr3r3r4rkrlcs j|Sri)rLr7r8r}r3r4rkrlrL) r(r'W014r* ensure_pathrrtrr7)rwr~r6r serializersr7rr3r}r4r7ks"     zLanguage.to_diskcs|dk rttj|}t|}t}fdd|d<fdd|d<fdd|d<jD].\}}||krnq\t|d szq\|fd d||<q\|dsd|krt |dg}t ||||_ S) aLoads state from a directory. Modifies the object in place and returns it. If the saved `Language` object contains a model, the model will be loaded. path (unicode or Path): A path to a directory. exclude (list): Names of components or serialization fields to exclude. RETURNS (Language): The modified `Language` object. DOCS: https://spacy.io/api/language#from_disk Ncsjt|Sri)ryr?r= read_jsonr8r}r3r4rkrlz$Language.from_disk..r?csj|otSri)rL from_disk_fix_pretrained_vectors_namer8r}r3r4rksrLcsjj|dgdSr5)rhrEr8r}r3r4rksrhrEcSs|j|dgdSr5)rEr@r3r3r4rks) r(r'rAr*rBrrtrexistsrrErq)rwr~r6r deserializersr7rr3r}r4rEs&    zLanguage.from_diskc s|dk rttj|}t}fdd|d<fdd|d<fdd|d<jD].\}}||krdqRt|d spqR|fd d||<qRt|||}t||S) aSerialize the current state to a binary string. exclude (list): Names of components or serialization fields to exclude. RETURNS (bytes): The serialized form of the `Language` object. DOCS: https://spacy.io/api/language#to_bytes Ncs jSri)rLto_bytesr3r}r3r4rkrlz#Language.to_bytes..rLcsjjdgdSr5)rhrIr3r}r3r4rkrlrhcs tjSri)r=r>ryr3r}r3r4rkrlr?rIcSs|jdgdSr5rI)rr3r3r4rkrl) r(r'rArrtrr*get_serialization_excluderI)rwr6rrzrCr7rr3r}r4rIs  zLanguage.to_bytesc s|dk rttj|}t}fdd|d<fdd|d<fdd|d<jD].\}}||krdqRt|d spqR|fd d||<qRt|||}t|||S) aLoad state from a binary string. bytes_data (bytes): The data to load from. exclude (list): Names of components or serialization fields to exclude. RETURNS (Language): The `Language` object. DOCS: https://spacy.io/api/language#from_bytes Ncsjt|Sri)ryr?r=Z json_loadsbr}r3r4rkrlz%Language.from_bytes..r?csj|otSri)rLr,rFrLr}r3r4rksrLcsjj|dgdSr5)rhr,rLr}r3r4rksrhr,cSs|j|dgdSr5)r,)rMrr3r3r4rks) r(r'rArrtrr*rKr,)rw bytes_datar6rrzrHr7rr3r}r4r,s   zLanguage.from_bytes)NNNNN)rNNN)NNN)NNN)N)FrNN)1r`rarb__doc__r,rjrCrnr|propertyr~rysetterrr\r]rrrrdrrrrGrrrrrrrrrxrr?rrrrrrrrrrer7rErIr,r3r3r3r4rgxs   +              2   - 4 ,  * " m( &rgc@s.eZdZdZdeedfddZddZdS)raDecorator for pipeline components. Can decorate both function components and class components and will automatically register components in the Language.factories. If the component is a class and needs access to the nlp object or config parameters, it can expose a from_nlp classmethod that takes the nlp object and **cfg arguments and returns the initialized component. NFcCs$||_t||_t||_||_dS)aQDecorate a pipeline component. name (unicode): Default component and factory name. assigns (list): Attributes assigned by component, e.g. `["token.pos"]`. requires (list): Attributes required by component, e.g. `["token.dep"]`. retokenizes (bool): Whether the component changes the tokenization. N)r7rassignsrequires retokenizes)rwr7rRrSrTr3r3r4r|s  zcomponent.__init__csd|d|dd}|jp"t}|_|_|j_|j_|j_fdd}|tjj<S)Nrr cs,tdrj|f|Sttr(SS)Nfrom_nlp)rrUrr)r.robjr3r4r{ s   z#component.__call__..factory) r7r*rr{rRrSrTrgrn)rwr'rzZ factory_namer{r3rVr4rs   zcomponent.__call__)r`rarbrOrer|rr3r3r3r4rs  rcCsd|jkr0|jddr0|jdd|jj_nX|jjjsFd|jj_nBd|jkr~d|jkr~d|jd|jdf}||jj_n ttj|jjjdkrt |j|j D]6\}}t |dsq|j di|jjj|j dd<qdS) Nrmr7rCz %s_%s.vectorsrrZdeprecation_fixes vectors_name)ryr@rLrmr7sizerrr&ZE092rrtrrr)r.rXr7rr3r3r4rFs      rFc@s0eZdZdZddZddZddZdd Zd S) rz)Manager for temporary pipeline disabling.cs>|_||_tj|_t||fdd|DdS)Nc3s|]}|VqdSri)r)r6r7r-r3r4r8sz)DisabledPipes.__init__..)r.rrrtoriginal_pipelinerr|extend)rwr.rr3r-r4r|0s   zDisabledPipes.__init__cCs|Srir3r}r3r3r4 __enter__:szDisabledPipes.__enter__cGs |dSri)restore)rwr'r3r3r4__exit__=szDisabledPipes.__exit__csTjjj}j_fdd|D}|rD|j_ttjj|dgdd<dS)zARestore the pipeline to its state when DisabledPipes was created.cs g|]\}}j|s|qSr3)r.rrr}r3r4rCs z)DisabledPipes.restore..)rN)r.rtrZrrr&ZE008rs)rwcurrentrr3r}r4r]@s zDisabledPipes.restoreN)r`rarbrOr|r\r^r]r3r3r3r4r-s  rccsDt|}dD]}||kr ||q |D]}||f|}|Vq(dS)N)rr)rGr)rrrzargrr3r3r4rKs  rcsF|}fdd|D}|D] }||}q|dd|DqdS)aWorker for Language.pipe receiver (multiprocessing.Connection): Pipe to receive text. Usually created by `multiprocessing.Pipe()` sender (multiprocessing.Connection): Pipe to send doc. Usually created by `multiprocessing.Pipe()` c3s|]}|VqdSrir3rrr3r4r`sz_apply_pipes..cSsg|] }|qSr3rJrr3r3r4rdsz _apply_pipes..N)r@r/)rxrZrecieverr4rrrr3rr4r)Vs  r)c@s(eZdZdZddZddZddZdS) r.zAUtil for sending data to multiprocessing workers in Language.pipecCs(t||_tt||_||_d|_dS)Nr)iterrDr queuesr%count)rwrDrbr%r3r3r4r|js z_Sender.__init__cCs4tt|jt|j|jD]\}}||qdS)z1Send chunk_size items from self.data to channels.N)rislicerrDr rbr%put)rwitemqr3r3r4r/ps  z _Sender.sendcCs,|jd7_|j|jkr(d|_|dS)zfTell sender that comsumed one item. Data is sent to the workers after every chunk_size calls.r rN)rcr%r/r}r3r3r4r3xs z _Sender.stepN)r`rarbrOr|r/r3r3r3r3r4r.gsr.)Q __future__rrrrZ spacy.utilrrrH collectionsr contextlibrrrZ thinc.neuralr r=multiprocessingrr r rhr rLrrFrr0rZanalysisrrrcompatrrrrrrrrZ_mlrrrNrrZlang.punctuationrrr Zlang.tokenizer_exceptionsr!Z lang.tag_mapr"tokensr#Zlang.lex_attrsr$r%errorsr&r'r(r)rr*r+robjectr,rgrrFrrrr)r.r3r3r3r4s\                Ox.