B An]\@sddlmZmZddlZddlZddlZddlZddlmZddl Z ddl m Z ddl m Z ddlmZddlZddlZddlZddlZddlmZy ddlZWnek rdZYnXdd lmZdd lmZmZmZmZm Z dd lm!Z!dd l"m#Z#m$Z$m%Z%ia&ee'j(d a)da*ddZ+ddZ,ddZ-ddZ.dvddZ/ddZ0ddZ1ddZ2d d!Z3d"d#Z4dwd$d%Z5d&d'Z6d(d)Z7d*d+Z8d,d-Z9d.d/Z:d0d1Z;d2d3Zdyd8d9Z?d:d;Z@dd?ZBd@dAZCdBdCZDdDdEZEdFdGZFdHdIZGdzdJdKZHd{dMdNZIdOdPZJdQdRZKdSdTZLdeMfdUdVZNd|dXdYZOdZd[ZPd\d]ZQd^d_ZRd`daZSdbdcZTdddeZUdfdgZVd}dhdiZWdjdkZXdldmZYdndoZZdpdqZ[Gdrdsdse\Z]Gdtdudue^Z_dS)~)unicode_literalsprint_functionN)Path) OrderedDict)Model)NumpyOps)Draft4Validator)ORTH)cupy CudaStreampath2str basestring_unicode_) import_file)ErrorsWarningsdeprecation_warningdataFcCs|adS)N) _PRINT_ENV)valuerm/home/app_decipher_dev_19-4/dev/decipher-analysis/serverless-application/helper/df_spacy/python/spacy/util.py set_env_log$srcCs|tkS)aCheck whether a Language class is already loaded. Language classes are loaded lazily, to avoid expensive setup code associated with the language data. lang (unicode): Two-letter language code, e.g. 'en'. RETURNS (bool): Whether a Language class has been loaded. ) LANGUAGES)langrrrlang_class_is_loaded)s rc Cstd|}|dk r|t|<|S|tkrytd|d}Wn6tk rp}zttjj||dWdd}~XYnXt||j dt|<t|S)zImport and load a Language class. lang (unicode): Two-letter language code, e.g. 'en'. RETURNS (Language): Language class. Zspacy_languagesNz.lang.%sspacy)rerrr) get_entry_pointr importlib import_module ImportErrorrZE048formatgetattr__all__)r entry_pointmodulerrrrget_lang_class5s &r(cCs |t|<dS)zSet a custom Language class name that can be loaded via get_lang_class. name (unicode): Name of Language class. cls (Language): Language class. N)r)nameclsrrrset_lang_classJsr+TcCs|stStrtSdSdS)zGet path to spaCy data directory. require_exists (bool): Only return path if it exists, otherwise None. RETURNS (Path or None): Data path or None. N) _data_pathexists)require_existsrrr get_data_pathTsr/cCs t|adS)z_Set path to spaCy data directory. path (unicode or Path): Path to new data directory. N) ensure_pathr,)pathrrr set_data_path`sr2cCst|trt|S|SdS)zEnsure string is converted to a Path. path: Anything. If string, it's converted to Path. RETURNS: Path or original argument. N) isinstancerr)r1rrrr0is r0cKst}|r|s(ttjjt|dt|tr|t dd| DkrXt |f|St |rlt |f|St|rtt|f|Snt|drt|f|Sttjj|ddS)aLoad a model from a shortcut link, package or data path. name (unicode): Package name, shortcut link or model path. **overrides: Specific overrides, like pipeline components to disable. RETURNS (Language): `Language` class with the loaded model. )r1cSsg|] }|jqSr)r)).0drrr szload_model..r-)r)N)r/r-IOErrorrZE049r#r r3rsetiterdirload_model_from_link is_packageload_model_from_packagerload_model_from_pathhasattrZE050)r) overrides data_pathrrr load_modelus       rAcKsPt|d}yt||}Wn&tk rBttjj|dYnX|jf|S)zCLoad a model from a shortcut link, or directory in spaCy data path.z __init__.py)r))r/rAttributeErrorr7rZE051r#load)r)r?r1r*rrrr:s r:cKst|}|jf|S)z'Load a model from an installed package.)r r!rC)r)r?r*rrrr<s r<c Ks|s t|}t|d}|fd|i|}|dg}|dg}|dkrT|jj}n |dkr`g}xD|D]<}||krf|di|i}|j||d} |j| |d qfW||S) zLoad a model from a data directory path. Creates Language class with pipeline from meta.json and then calls from_disk() with path.rmetapipelinedisableT)FNZ pipeline_args)config)r))get_model_metar(getZDefaultsZ pipe_namesZ create_pipeZadd_pipe from_disk) model_pathrDr?r*ZnlprErFr)rG componentrrrr=s      r=cKs`t|j}t|}d|d|d|df}||}|sRttjjt|dt ||f|S)a&Helper function to use in the `load()` method of a model package's __init__.py. init_file (unicode): Path to model's __init__.py, i.e. `__file__`. **overrides: Specific overrides, like pipeline components to disable. RETURNS (Language): `Language` class with loaded model. z%s_%s-%srr)version)r1) rparentrHr-r7rE052r#r r=)Z init_filer?rKrDZdata_dirr@rrrload_model_from_init_pys rPcCst|}|s&ttjjt|d|d}|sHttjj|dt |}x.dD]&}||ksl||sXt tj j|dqXW|S)zGet model meta.json from a directory path and validate its contents. path (unicode or Path): Path to model directory. RETURNS (dict): The model's meta data. )r1z meta.json)rr)rM)setting) r0r-r7rrOr#r is_fileZE053srsly read_json ValueErrorZE054)r1rK meta_pathrDrQrrrrHs  rHcCs>|}tjj}x$|D]}|dd|krdSqWdS)zCheck if string maps to a package installed via pip. name (unicode): Name of package. RETURNS (bool): True if installed package, False if not. -_TF)lower pkg_resourcesZ working_setZby_keykeysreplace)r)packagespackagerrrr;s   r;cCs|}t|}t|jjS)z|Get the path to an installed package. name (unicode): Package name. RETURNS (Path): Path to installed package. )rYr r!r__file__rN)r)pkgrrrget_package_paths racCs*i}x t|D]}|||j<qW|S)zGet registered entry points from other packages for a given key, e.g. 'spacy_factories' and return them as a dictionary, keyed by name. key (unicode): Entry point name. RETURNS (dict): Entry points, keyed by name. )rZiter_entry_pointsrCr))keyresultr&rrrget_entry_pointssrecCs*x$t|D]}|j|kr |Sq WdS)zCheck if registered entry point is available for a given name and load it. Otherwise, return None. key (unicode): Entry point name. value (unicode): Name of entry point to load. RETURNS: The loaded entry point or None. N)rZrbr)rC)rcrr&rrrrs rcCs4ytjj}|dkrdSWntk r.dSXdS)zCheck if user is running spaCy from a Jupyter notebook by detecting the IPython kernel. Mainly used for the displaCy visualizer. RETURNS (bool): True if in Jupyter, False if not. ZZMQInteractiveShellTF)Z get_ipython __class____name__ NameError)shellrrr is_in_jupyter s rjcCs&tdkr dSttjtrdStSdS)N)r r3ropsr)requirerrrget_cuda_streams  rmcCs6tdkr |Stj|jd|jd}|j||d|SdS)NC)orderdtype)stream)r ndarrayshaperpr8)rqZ numpy_arrayarrayrrr get_async!s rucCst|tkrt}nt}d|tjkrb|tjd|}tr^t|dt|dd||S|tjkr|tj|}trt|dt|dd||Strt|dt|d|SdS)NZSPACY_=Zviaz$SPACY_$z by default) typefloatintupperosenvironrprintrepr)r)defaultZ type_convertrrrrenv_opt*s   rc CsHt|}|}|d}WdQRXddd|D}t|S)N |cSs"g|]}|rdt|qS)^)stripreescape)r4piecerrrr6Dszread_regex..)r0openreadsplitjoinrcompile)r1file_entries expressionrrr read_regex?s  rcCsHd|kr&ddd|D}t|Sddd|D}t|SdS)zCompile a sequence of prefix rules into a regex object. entries (tuple): The prefix rules, e.g. spacy.lang.punctuation.TOKENIZER_PREFIXES. RETURNS (regex object): The regex object. to be used for Tokenizer.prefix_search. (rcSs"g|]}|rdt|qS)r)rrr)r4rrrrr6Rsz(compile_prefix_regex..cSsg|]}|rd|qS)r)r)r4rrrrr6VsN)rrr)rrrrrcompile_prefix_regexIs  rcCsddd|D}t|S)zCompile a sequence of suffix rules into a regex object. entries (tuple): The suffix rules, e.g. spacy.lang.punctuation.TOKENIZER_SUFFIXES. RETURNS (regex object): The regex object. to be used for Tokenizer.suffix_search. rcSsg|]}|r|dqS)rw)r)r4rrrrr6`sz(compile_suffix_regex..)rrr)rrrrrcompile_suffix_regexZsrcCsddd|D}t|S)zCompile a sequence of infix rules into a regex object. entries (tuple): The infix rules, e.g. spacy.lang.punctuation.TOKENIZER_INFIXES. RETURNS (regex object): The regex object. to be used for Tokenizer.infix_finditer. rcSsg|]}|r|qSr)r)r4rrrrr6jsz'compile_infix_regex..)rrr)rrrrrcompile_infix_regexdsrcGstt||S)aQExtend an attribute function with special cases. If a word is in the lookups, the value is returned. Otherwise the previous function is used. default_func (callable): The default function to execute. *lookups (dict): Lookup dictionary mapping string to attribute value. RETURNS (callable): Lexical attribute getter. ) functoolspartial_get_attr_unless_lookup) default_funclookupsrrr add_lookupsns rcCs&x|D]}||kr||SqW||S)Nr)rrstringlookuprrrr{s  rcGst|}x|D]z}xj|D]^\}}tdd|DsJttjj||dddd|D}||krttjj||dqW| |qWt |dd}|S)zUpdate and validate tokenizer exceptions. Will overwrite exceptions. base_exceptions (dict): Base exceptions. *addition_dicts (dict): Exceptions to add to the base dict, in order. RETURNS (dict): Combined tokenizer exceptions. css|]}t|ttVqdS)N)r3r r)r4attrrrr szupdate_exc..)rcZorthscss|]}|tVqdS)N)r )r4rrrrrs'u’) dictitemsallrUrZE055r#rZE056update expand_exc)Zbase_exceptionsZaddition_dictsexcZ additionsZorthZ token_attrsZdescribed_orthrrr update_excs  rcs\ddt|}xF|D]:\}}|kr|}fdd|D}|||<qW|S)aHFind string in tokenizer exceptions, duplicate entry and replace string. For example, to add additional versions with typographic apostrophes. excs (dict): Tokenizer exceptions. search (unicode): String to find and replace. replace (unicode): Replacement. RETURNS (dict): Combined tokenizer exceptions. cSs t|}|t|||t<|S)N)rr r\)tokensearchr\fixedrrr _fix_tokenszexpand_exc.._fix_tokencsg|]}|qSrr)r4t)rr\rrrr6szexpand_exc..)rrr\)Zexcsrr\Znew_excs token_stringtokensZnew_key new_valuer)rr\rrrs   rcCs~|dks|dksttj|dkr(d}n|dkr8||7}t|td|}|dkrV|}n|dkrf||7}t|t||}||fS)Nr r)rUrZE057minmax)lengthstartstopsteprrrnormalize_slices rccs`t|trt|}n|}t|}x8t|}tt|t|}t|dkrNPt|Vq$WdS)zlIterate over batches of items. `size` may be an iterator, so that batch-size can vary on each step. rN) r3rz itertoolsrepeatiternextlistislicelen)rsizesize_ batch_sizebatchrrr minibatchs   rc#s2fdd}t}x||V||9}qWdS)aZYield an infinite series of compounding values. Each time the generator is called, a value is produced by multiplying the previous value by the compound rate. EXAMPLE: >>> sizes = compounding(1., 10., 1.5) >>> assert next(sizes) == 1. >>> assert next(sizes) == 1 * 1.5 >>> assert next(sizes) == 1.5 * 1.5 cskrt|St|S)N)rr)r)rrrrclipszcompounding..clipN)ry)rrZcompoundrcurrr)rrr compoundings  rc#s:fdd}t}x||V||7}qWdS)aYield an infinite series of values that step from a start value to a final value over some number of steps. Each step is (stop-start)/steps. After the final value is reached, the generator continues yielding that value. EXAMPLE: >>> sizes = stepping(1., 200., 100) >>> assert next(sizes) == 1. >>> assert next(sizes) == 1 * (200.-1.) / 100 >>> assert next(sizes) == 1 + (200.-1.) / 100 + (200.-1.) / 100 cskrt|St|S)N)rr)r)rrrrrszstepping..clipN)ry)rrZstepsrrr)rrrsteppings  rc#s>fdd}d}x&|dd||V|d7}qWdS)z5Yield an infinite series of linearly decaying values.cskrt|St|S)N)rr)r)rrrrrszdecaying..clipg?r Nr)rrdecayrnr_updr)rrrdecayings rc cst|trt|}n|}t|}xt|}g}xt|dkry|rNt|\}}nt|}Wntk rv|rr|VdSX|||8}|r|||fq2||q2W|r$|Vq$WdS)z.Create minibatches of a given number of words.rN)r3rzrrrr StopIterationappend) rrZtuplesZ count_wordsrrrdocZgoldrrrminibatch_by_wordss,     rccst|}g}ypxjx.ttd|t|D]}|t|q*Wt|x*ttd|D]}|rp|Vq\Pq\WqWWn6t k rt|x|r|VqWt YnXdS)uShuffle an iterator. This works by holding `bufsize` items back and yielding them sometime later. Obviously, this is not unbiased – but should be good enough for batching. Larger bufsize means less bias. From https://gist.github.com/andres-erbsen/1307752 iterable (iterable): Iterator to shuffle. bufsize (int): Items to hold back. YIELDS (iterable): The shuffled iterator. r N) rrangerandomrandintrrrshufflepopr)iterablebufsizebufirrr itershuffle$s    rcCsBt}x0|D]$\}}|dd|kr|||<qWt|S)N.r)rrrrS msgpack_dumps)gettersexclude serializedrcgetterrrrto_bytesAs rcCsJt|}x:|D].\}}|dd|kr||kr|||qW|S)Nrr)rS msgpack_loadsrr) bytes_dataZsettersrmsgrcsetterrrr from_bytesJs  rcCsPt|}|s|x2|D]&\}}|dd|kr"|||q"W|S)Nrr)r0r-mkdirrr)r1Zwritersrrcwriterrrrto_diskSsrcCs@t|}x2|D]&\}}|dd|kr|||qW|S)Nrr)r0rr)r1ZreadersrrcreaderrrrrJ^s rJcCs|ddddS)zPerform a template-specific, rudimentary HTML minification for displaCy. Disclaimer: NOT a general-purpose solution, only removes indentation and newlines. html (unicode): Markup to minify. RETURNS (unicode): "Minified" HTML. z rr)rr\)htmlrrr minify_htmlgsrcCs4|dd}|dd}|dd}|dd}|S) zReplace <, >, &, " with their HTML encoded representation. Intended to prevent HTML errors in rendered displaCy markup. text (unicode): The original text. RETURNS (unicode): Equivalent text to be safely used within HTML. &z&z>"z")r\)textrrr escape_htmlrs     rcCsVy ddl}Wntk r dSXddlm}|jj|}||t_ |t_ |S)Nr)CupyOps) cupy.cuda.devicer"thinc.neural.opsrcudadeviceDeviceuserrkOps)Zgpu_idr rrrrruse_gpus  rcCs.t|tj|tdk r*tj|dS)N)rseednumpyr )rrrrfix_random_seeds  rcCst|S)N)r)schemarrrget_json_validatorsrcCst|}||dS)zHValidate a given schema. This just checks if the schema itself is valid.N)rZ check_schema)r validatorrrrvalidate_schemasrcCsg}xt||dddD]n}|jrDdddd|jD}nd}|jd |}|jrd d|jD}|d d|7}||qW|S) zValidate data against a given JSON schema (see https://json-schema.org). data: JSON-serializable data to validate. validator (jsonschema.DraftXValidator): The validator. RETURNS (list): A list of error messages, if available. cSs|jS)N)r1)errrzvalidate_json..)rcz[{}]z -> cSsg|] }t|qSr)str)r4prrrr6sz!validate_json..r cSsg|]}d|jqS)z - {})r#message)r4Zsuberrrrrr6sz: {})sortedZ iter_errorsr1r#rr contextr)rrerrorsrZerr_pathrZsuberrsrrr validate_jsonsr cCst|}dd|D}xf|D]Z\}}|dkrV|dkrVttjj|d||q |dd|kr tt j j|dq W|S)zHelper function to validate serialization args and manage transition from keyword arguments (pre v2.1) to exclude argument. cSsg|]}|ddqS)rr)r)r4r)rrrr6sz-get_serialization_exclude..)ZvocabF)argrr) rrrrZW015r#rrrUrZE128)Z serializersrkwargsoptionsrcrrrrget_serialization_excludes rc@s*eZdZdZddZd ddZddZdS) SimpleFrozenDictzSimplified implementation of a frozen dict, mainly used as default function or method argument (for arguments that should default to empty dictionary). Will raise an error if user or spaCy attempts to add to dict. cCsttjdS)N)NotImplementedErrorrE095)selfrcrrrr __setitem__szSimpleFrozenDict.__setitem__NcCsttjdS)N)rrr)rrcrrrrrszSimpleFrozenDict.popcCsttjdS)N)rrr)rotherrrrrszSimpleFrozenDict.update)N)rg __module__ __qualname____doc__rrrrrrrrs rc@s,eZdZddZddZddZddZd S) DummyTokenizercKsdS)Nrr)rrrrrrszDummyTokenizer.to_bytescKs|S)Nr)rZ _bytes_datarrrrrszDummyTokenizer.from_bytescKsdS)Nr)r_pathrrrrrszDummyTokenizer.to_diskcKs|S)Nr)rrrrrrrJszDummyTokenizer.from_diskN)rgrrrrrrJrrrrrsr)T)F)F)N)N)r)r)r)` __future__rrr|rZr rpathlibrr collectionsrZthinc.neural._classes.modelrrrrrZ numpy.randomrrSZ jsonschemarZ cupy.randomr r"Zsymbolsr compatr r rrrr rrrrr_rNr,rrrr(r+r/r2r0rAr:r<r=rPrHr;rarerrjrmrurrrrrrrrrrrrrrrrrrrrrJrrrrrrr rrrobjectrrrrrs