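from __future__ import unicode_literals

from collections import OrderedDict

from .symbols import NOUN, VERB, ADJ, PUNCT, PROPN
from .errors import Errors
from .lookups import Lookups


class Lemmatizer(object):
    """
    The Lemmatizer supports simple part-of-speech-sensitive suffix rules and
    lookup tables.

    DOCS: https://spacy.io/api/lemmatizer
    """

    @classmethod
    def load(cls, *args, **kwargs):
        raise NotImplementedError(Errors.E172)

    def __init__(self, lookups, *args, **kwargs):
        """Initialize a Lemmatizer.

        lookups (Lookups): The lookups object containing the (optional) tables
            "lemma_rules", "lemma_index", "lemma_exc" and "lemma_lookup".
        RETURNS (Lemmatizer): The newly constructed object.
        """
        if args or kwargs or not isinstance(lookups, Lookups):
            raise ValueError(Errors.E173)
        self.lookups = lookups

    def __call__(self, string, univ_pos, morphology=None):
        """Lemmatize a string.

        string (unicode): The string to lemmatize, e.g. the token text.
        univ_pos (unicode / int): The token's universal part-of-speech tag.
        morphology (dict): The token's morphological features following the
            Universal Dependencies scheme.
        RETURNS (list): The available lemmas for the string.
        """
        lookup_table = self.lookups.get_table("lemma_lookup", {})
        # Without suffix rules, fall back to the plain lookup table.
        if "lemma_rules" not in self.lookups:
            return [lookup_table.get(string, string)]
        # Normalize the POS to the key used by the rule tables.
        if univ_pos in (NOUN, "NOUN", "noun"):
            univ_pos = "noun"
        elif univ_pos in (VERB, "VERB", "verb"):
            univ_pos = "verb"
        elif univ_pos in (ADJ, "ADJ", "adj"):
            univ_pos = "adj"
        elif univ_pos in (PUNCT, "PUNCT", "punct"):
            univ_pos = "punct"
        elif univ_pos in (PROPN, "PROPN"):
            # Proper nouns are returned unchanged.
            return [string]
        else:
            # Any other part of speech is only lowercased.
            return [string.lower()]
        # Uninflected forms need no rule-based lemmatization.
        if self.is_base_form(univ_pos, morphology):
            return [string.lower()]
        index_table = self.lookups.get_table("lemma_index", {})
        exc_table = self.lookups.get_table("lemma_exc", {})
        rules_table = self.lookups.get_table("lemma_rules", {})
        lemmas = self.lemmatize(
            string,
            index_table.get(univ_pos, {}),
            exc_table.get(univ_pos, {}),
            rules_table.get(univ_pos, []),
        )
        return lemmas

    def is_base_form(self, univ_pos, morphology=None):
        """
        Check whether we're dealing with an uninflected paradigm, so we can
        avoid lemmatization entirely.

        univ_pos (unicode / int): The token's universal part-of-speech tag.
        morphology (dict): The token's morphological features following the
            Universal Dependencies scheme.
        RETURNS (bool): Whether the token is already in its base form.
        """
        if morphology is None:
            morphology = {}
        if univ_pos == "noun" and morphology.get("Number") == "sing":
            return True
        elif univ_pos == "verb" and morphology.get("VerbForm") == "inf":
            return True
        # Finite present-tense verbs without a Number feature count as base forms.
        elif univ_pos == "verb" and (
            morphology.get("VerbForm") == "fin"
            and morphology.get("Tense") == "pres"
            and morphology.get("Number") is None
        ):
            return True
        elif univ_pos == "adj" and morphology.get("Degree") == "pos":
            return True
        elif morphology.get("VerbForm") == "inf":
            return True
        elif morphology.get("VerbForm") == "none":
            return True
        elif morphology.get("Degree") == "pos":
            return True
        else:
            return False

    def noun(self, string, morphology=None):
        return self(string, "noun", morphology)

    def verb(self, string, morphology=None):
        return self(string, "verb", morphology)

    def adj(self, string, morphology=None):
        return self(string, "adj", morphology)

    def det(self, string, morphology=None):
        return self(string, "det", morphology)

    def pron(self, string, morphology=None):
        return self(string, "pron", morphology)

    def adp(self, string, morphology=None):
        return self(string, "adp", morphology)

    def num(self, string, morphology=None):
        return self(string, "num", morphology)

    def punct(self, string, morphology=None):
        return self(string, "punct", morphology)

    def lookup(self, string, orth=None):
        """Look up a lemma in the table, if available. If no lemma is found,
        the original string is returned.

        string (unicode): The original string.
        orth (int): Optional hash of the string to look up. If not set, the
            string will be used and hashed.
        RETURNS (unicode): The lemma if the string was found, otherwise the
            original string.
        """
        lookup_table = self.lookups.get_table("lemma_lookup", {})
        key = orth if orth is not None else string
        if key in lookup_table:
            return lookup_table[key]
        return string

    def lemmatize(self, string, index, exceptions, rules):
        orig = string
        string = string.lower()
        forms = []
        oov_forms = []
        # Apply each (old, new) suffix rule to the lowercased string.
        for old, new in rules:
            if string.endswith(old):
                form = string[: len(string) - len(old)] + new
                if not form:
                    continue
                # Prefer candidates that are known base forms (or non-alphabetic).
                if form in index or not form.isalpha():
                    forms.append(form)
                else:
                    oov_forms.append(form)
        # Remove duplicates but preserve the order in which rules applied.
        forms = list(OrderedDict.fromkeys(forms))
        # Exceptions take priority: put them at the front of the list.
        for form in exceptions.get(string, []):
            if form not in forms:
                forms.insert(0, form)
        # Fall back to out-of-index candidates, then to the original string.
        if not forms:
            forms.extend(oov_forms)
        if not forms:
            forms.append(orig)
        return forms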
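
# Usage sketch (illustrative only, not part of this module). The toy tables
# below are hypothetical; their shapes are inferred from how the methods above
# read them: "lemma_rules" maps a POS to (old_suffix, new_suffix) pairs,
# "lemma_index" maps a POS to known base forms, "lemma_exc" maps a POS to
# {form: [lemmas]} exceptions, and "lemma_lookup" maps a string straight to
# its lemma. Registering the tables via Lookups.add_table is assumed.
#
#     from spacy.lemmatizer import Lemmatizer
#     from spacy.lookups import Lookups
#
#     lookups = Lookups()
#     lookups.add_table("lemma_rules", {"noun": [["s", ""], ["ies", "y"]]})
#     lookups.add_table("lemma_index", {"noun": ["duck", "fly"]})
#     lookups.add_table("lemma_exc", {"noun": {"geese": ["goose"]}})
#     lookups.add_table("lemma_lookup", {"ducks": "duck"})
#     lemmatizer = Lemmatizer(lookups)
#
#     lemmatizer("flies", "NOUN")   # ["fly"]: "ies" -> "y" yields an indexed form
#     lemmatizer("geese", "NOUN")   # ["goose"]: no rule matches, exception used
#     lemmatizer("Ducks", "PROPN")  # ["Ducks"]: proper nouns are returned as-is
#     lemmatizer.lookup("ducks")    # "duck"
#
# Because "lemma_rules" is present, __call__ takes the rule path; without that
# table it would return [lookup_table.get(string, string)] instead.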