from __future__ import unicode_literals, division, print_function

import plac
from pathlib import Path
import srsly
import cProfile
import pstats
import sys
import itertools
import thinc.extra.datasets
from wasabi import msg

from ..util import load_model


@plac.annotations(
    model=("Model to load", "positional", None, str),
    inputs=("Location of input file. '-' for stdin.", "positional", None, str),
    n_texts=("Maximum number of texts to use if available", "option", "n", int),
)
def profile(model, inputs=None, n_texts=10000):
    """
    Profile a spaCy pipeline, to find out which functions take the most time.
    Input should be formatted as one JSON object per line with a key "text".
    It can either be provided as a JSONL file, or be read from sys.stdin.
    If no input file is specified, the IMDB dataset is loaded via Thinc.
    """
    if inputs is not None:
        inputs = _read_inputs(inputs, msg)
    if inputs is None:
        # No input given: fall back to the IMDB training texts via Thinc.
        n_inputs = 25000
        with msg.loading("Loading IMDB dataset via Thinc..."):
            imdb_train, _ = thinc.extra.datasets.imdb()
            inputs, _ = zip(*imdb_train)
        msg.info("Loaded IMDB dataset and using {} examples".format(n_inputs))
        inputs = inputs[:n_inputs]
    with msg.loading("Loading model '{}'...".format(model)):
        nlp = load_model(model)
    msg.good("Loaded model '{}'".format(model))
    # Use at most n_texts texts from the (possibly lazy) input iterable.
    texts = list(itertools.islice(inputs, n_texts))
    # Run the pipeline under cProfile and save the raw stats to Profile.prof.
    cProfile.runctx("parse_texts(nlp, texts)", globals(), locals(), "Profile.prof")
    s = pstats.Stats("Profile.prof")
    msg.divider("Profile stats")
    # Print per-function stats, sorted by internal time.
    s.strip_dirs().sort_stats("time").print_stats()


def parse_texts(nlp, texts):
    import tqdm

    # Stream the texts through the pipeline with a progress bar; the Docs
    # themselves are discarded, only the processing time matters here.
    for doc in nlp.pipe(tqdm.tqdm(texts), batch_size=16):
        pass


def _read_inputs(loc, msg):
    if loc == "-":
        msg.info("Reading input from sys.stdin")
        file_ = sys.stdin
        file_ = (line.encode("utf8") for line in file_)
    else:
        input_path = Path(loc)
        if not input_path.exists() or not input_path.is_file():
            msg.fail("Not a valid input data file", loc, exits=1)
        msg.info("Using data from {}".format(input_path.parts[-1]))
        file_ = input_path.open()
    # Each line is one JSON object; yield the value of its "text" key.
    for line in file_:
        data = srsly.json_loads(line)
        text = data["text"]
        yield text