from __future__ import unicode_literals

from thinc.t2v import Pooling, max_pool, mean_pool
from thinc.neural._classes.difference import Siamese, CauchySimilarity

from .pipes import Pipe
from ..language import component
from .._ml import link_vectors_to_models


@component("sentencizer_hook", assigns=["doc.user_hooks"])
class SentenceSegmenter(object):
    """A simple spaCy hook that allows custom sentence boundary detection
    logic without requiring the dependency parse. To change the sentence
    boundary detection strategy, pass a generator function `strategy` on
    initialization, or assign a new strategy to the `.strategy` attribute.
    Sentence detection strategies should be generators that take `Doc`
    objects and yield a `Span` object for each sentence.
    """

    def __init__(self, vocab, strategy=None):
        self.vocab = vocab
        # Fall back to the built-in punctuation strategy.
        if strategy is None or strategy == "on_punct":
            strategy = self.split_on_punct
        self.strategy = strategy

    def __call__(self, doc):
        # Doc.sents will call this strategy instead of using the parser.
        doc.user_hooks["sents"] = self.strategy
        return doc

    @staticmethod
    def split_on_punct(doc):
        start = 0
        seen_period = False
        for i, token in enumerate(doc):
            # Start a new sentence at the first non-punctuation token
            # after a sentence-final mark.
            if seen_period and not token.is_punct:
                yield doc[start : token.i]
                start = token.i
                seen_period = False
            elif token.text in (".", "!", "?"):
                seen_period = True
        if start < len(doc):
            yield doc[start : len(doc)]


@component("similarity", assigns=["doc.user_hooks"])
class SimilarityHook(Pipe):
    """
    Experimental: A pipeline component to install a hook for supervised
    similarity into `Doc` objects. Requires a `Tensorizer` to pre-process
    documents. The similarity model can be any object obeying the Thinc
    `Model` interface. By default, the model pools each document's tensor
    by concatenating its elementwise max and elementwise mean, then compares
    the two pooled vectors using the Cauchy-like similarity function from
    Chen (2013):

        >>> similarity = 1. / (1. + (W * (vec1-vec2)**2).sum())

    where `W` is a vector of per-dimension weights, initialized to 1.
    """

    def __init__(self, vocab, model=True, **cfg):
        self.vocab = vocab
        # `True` means "build the default model in begin_training".
        self.model = model
        self.cfg = dict(cfg)

    @classmethod
    def Model(cls, length):
        return Siamese(Pooling(max_pool, mean_pool), CauchySimilarity(length))

    def __call__(self, doc):
        """Install similarity hook"""
        doc.user_hooks["similarity"] = self.predict
        return doc

    def pipe(self, docs, **kwargs):
        for doc in docs:
            yield self(doc)

    def predict(self, doc1, doc2):
        self.require_model()
        return self.model.predict([(doc1, doc2)])

    def update(self, doc1_doc2, golds, sgd=None, drop=0.0):
        self.require_model()
        # Note: `golds`, `sgd` and the backprop callback `bp_sims` are not
        # used, so this method does not currently train the model.
        sims, bp_sims = self.model.begin_update(doc1_doc2, drop=drop)

    def begin_training(self, _=tuple(), pipeline=None, sgd=None, **kwargs):
        """Allocate model, using width from tensorizer in pipeline.

        _ (iterable): Gold-standard training data (ignored here).
        pipeline (list): The pipeline the model is part of. The first
            component's model width (`nO`) sets the similarity width.
        sgd: Optional optimizer; one is created if none is given.
        RETURNS: The optimizer.
        """
        if self.model is True:
            self.model = self.Model(pipeline[0].model.nO)
            link_vectors_to_models(self.vocab)
        if sgd is None:
            sgd = self.create_optimizer()
        return sgd
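

# Illustrative sketches (hypothetical helpers, not part of spaCy's API).
# This is a minimal, dependency-free sketch, assuming plain Python lists
# stand in for spaCy `Doc` objects and Thinc tensors. The helpers mirror
# three pieces of logic in this module:
# - the sentence-splitting logic of `SentenceSegmenter.split_on_punct`;
# - max+mean pooling in the spirit of `Pooling(max_pool, mean_pool)`;
# - the Cauchy-like similarity formula quoted in the `SimilarityHook`
#   docstring.
# The names `_example_split_on_punct`, `_example_max_mean_pool` and
# `_example_cauchy_similarity` are made up for illustration.


def _example_split_on_punct(tokens):
    """Group token strings into sentences.

    A new sentence starts at the first non-punctuation token that follows
    ".", "!" or "?", so runs such as "?", "!" stay with the sentence they end.
    """

    def is_punct(text):
        # Assumption: approximates spaCy's `Token.is_punct` by treating any
        # non-empty token without letters or digits as punctuation.
        return bool(text) and not any(ch.isalnum() for ch in text)

    sentences = []
    start = 0
    seen_period = False
    for i, text in enumerate(tokens):
        if seen_period and not is_punct(text):
            sentences.append(tokens[start:i])
            start = i
            seen_period = False
        elif text in (".", "!", "?"):
            seen_period = True
    if start < len(tokens):
        sentences.append(tokens[start:])
    return sentences


def _example_max_mean_pool(token_vectors):
    """Concatenate the elementwise max and elementwise mean over token rows."""
    columns = list(zip(*token_vectors))
    maxes = [max(col) for col in columns]
    means = [sum(col) / len(col) for col in columns]
    return maxes + means


def _example_cauchy_similarity(vec1, vec2, weights=None):
    """Compute 1 / (1 + sum(W * (vec1 - vec2) ** 2))."""
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    if weights is None:
        # W starts at 1 for every dimension, as the docstring describes.
        weights = [1.0] * len(vec1)
    distance = sum(w * (a - b) ** 2 for w, a, b in zip(weights, vec1, vec2))
    return 1.0 / (1.0 + distance)


# Usage (hypothetical values):
#   _example_split_on_punct(["Hi", ".", "Bye", "!"])
#       -> [["Hi", "."], ["Bye", "!"]]
#   _example_cauchy_similarity([0.0, 0.0], [1.0, 1.0])  -> 1/3
# In the default model the pooled vectors of the two documents are what
# get compared, i.e. roughly
#   _example_cauchy_similarity(_example_max_mean_pool(t1),
#                              _example_max_mean_pool(t2))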