U á€C^ÞZã @sddlmZmZddlmZddlmZddlZddlZddl Z ddl m Z m Z ddl mZddlmZdd lmZmZd Zd Zd Zd Zejdddefdddefdddefdddefdddefdddefdddefdddefdd-d!d"„ƒZd#d$„Zd%d&„Zd.d'd(„Zd)d*„Zd+d,„Z dS)/é)Úunicode_literalsÚprint_function)ÚPath)ÚCounterN)ÚPrinterÚMESSAGESé)Ú GoldCorpus)Únonproj)Ú load_modelÚget_lang_classé2éédiÐzmodel languageÚ positionalz(location of JSON-formatted training dataz+location of JSON-formatted development dataz"name of model to update (optional)ÚoptionÚbz5Comma-separated names of pipeline components to trainÚpz+Ignore warnings, only show stats and errorsÚflagZIWz-Print additional information and explanationsÚVzDon't pretty-print the resultsZNF)ÚlangÚ train_pathÚdev_pathÚ base_modelÚpipelineÚignore_warningsÚverboseÚ no_formatútagger,parser,nerFc/ sŒ t| |d}| ¡s&|jd|dd| ¡s>|jd|dddd„| d¡Dƒ}|r`t|ƒ‰nt|ƒ} | ƒ‰| d ¡d } d } | d ¡¨t||ƒ} z t |   ˆ¡ƒ} t |   ˆ¡ƒ}Wn0t k ræ}zd   t|ƒ¡} W5d }~XYnXzt |  ˆ¡ƒ}Wn2t k r,}zd  t|ƒ¡} W5d }~XYnXW5QRX| sD| rn| rT| | ¡| rd| | ¡t d¡| d¡t| |ƒ}t||ƒ}t||ƒ}|d}|d}| d¡| d  d |¡¡¡‡fdd„|DƒD]}| d  |¡¡qØ|r| d  |¡¡n| d  |¡¡| d  t| ƒ¡¡| d  t|ƒ¡¡t|ƒsT| d¡t| |¡ƒ}|rz| d  |¡¡n | d¡|sàt| ƒtkràd  t| ƒ¡}t| ƒtkrÀ| |¡n | |¡|jd  tt¡|d| d ¡|d!}| d"  ||dkr d#nd$t|d$ƒ¡¡|d%d&kr>| d'  |d%¡¡|d%d&kr`| d(  |d%¡¡|d$ d)¡}|jd*  t|d+d,¡|dtˆjjƒrÀ| d-  tˆjjƒˆjjj ˆjj!¡¡n | d.¡d/|krt"d0d1„|d/Dƒƒ}|d/}t#ˆd/ƒ‰‡fd2d„|Dƒ}‡fd3d„|Dƒ}d4}d4} d4}!| d5¡| d6  t|ƒt|ƒdkrVd7nd8t|ƒt|ƒdkrpd7nd8¡¡|d9}"| d:  |"|"dkr˜d;nd<¡¡|D]}#t|#ƒd&kr¤| d=¡q¤|rüd>d„| ¡Dƒ}$t|$d+d,}$|jd?  |$¡|d|r|jd@  t|ƒ¡|d|dAr<| dB  |dA¡¡d+}!|D]l}#||#t$kr@| dC  |#||#¡¡d+}| dD¡t%| |#ƒ}%W5QRX|%d&kr@| dE  |#¡¡d+} q@|s¾| dF¡| sÎ| dG¡|!sÞ| dH¡|rø|jdI  t$¡|d| r |jdJ|d|!r| dK¡dL|krf| dM¡dNd„|dODƒ}t#ˆdLƒ‰‡fdPd„|Dƒ}‡fdQd„|Dƒ}| dR  t|ƒt|ƒ¡¡|r¸t|dO ¡d+d,}$|jd?  |$¡|d|rÖ|jd@  t|ƒ¡|dt"|dOƒt"|dOƒkr| dS  t|dOƒt|dOƒ¡¡|dTd&krD| dU¡|dTd&krf| dV¡n"| dW¡|dTd&krf| dX¡dY|krB| dZ¡d[d„|d\Dƒ}ˆj&j'‰| d]  t|ƒt|ƒdkr´d7nd8tˆƒtˆƒdkrÎd7nd8¡¡t|d\ ¡d+d,}$|j|$|d‡fd^d„|Dƒ}&|&s"| d_  ˆj(¡¡|&D]}#| d`  |#ˆj(¡¡q&da|k rØd4}| db¡| dc  |ddt| ƒdkrzdend |d!|dd¡¡|ddt|dƒ}'|'dfkr¾| dg  |'¡¡dhd„|diDƒ}(djd„|diDƒ})dkd„|diDƒ}*|dld&k r*| dm  |dl|dldk r"dend ¡¡|dld&k r`| dn  |dl|dldk rXdend ¡¡| do  t|)ƒt|(ƒdk r€d7nd8¡¡| dp  t|(ƒt|(ƒdk r¨d7nd8¡¡t|di ¡d+d,}$|j|$|d|diD]6}#|di|#t)k rÚ| dq  |#|di|#¡¡d+} qÚg}+|diD]@}#|di|#t)k rdr|#k r|+ *ds  |#t|di|#ƒ¡¡ qt|+ƒd&k r´| dt  t|+ƒt|+ƒdk rŽdend ¡¡|jdu  dv |+¡¡|dd+}t"|(ƒt"|*ƒ rì|jdw  d t"|(ƒt"|*ƒ¡¡|dt"|*ƒt"|(ƒ r"|jdxd t"|*ƒt"|(ƒ¡|d| r<|jdy  t)¡|dt|dzƒdk rl| d{  d |dz¡¡d|¡|dld&k r¢| d}  |dl|dldk ršdend ¡¡|d~d&k rØ| d  |d~|d~dk rÐdend ¡¡| d€¡|j+t,j-},|j+t,j.}-|j+t,j/}.|, r,| d  |,|,dk r$d‚ndƒ¡¡|- rR| d„  |-|-dk rJd…nd†¡¡|. rx| d„  |.|.dk rpd‡ndˆ¡¡|. rˆt d¡d S)‰zÅ Analyze, debug and validate your training and development data, get useful stats, and find problems like invalid entity annotations, cyclic dependencies, low data labels and more. )ÚprettyrzTraining data not foundé©ZexitszDevelopment data not foundcSsg|] }| ¡‘qS©)Ústrip©Ú.0rr"r"ú7/tmp/pip-install-6_kvzl1k/spacy/spacy/cli/debug_data.pyÚ @szdebug_data..ú,zData format validationÚzLoading corpus...z"Training data cannot be loaded: {}Nz%Development data cannot be loaded: {}zCorpus is loadableÚtextszTraining statszTraining pipeline: {}ú, csg|]}|ˆjkr|‘qSr")Z factoriesr$)Únlpr"r&r'ss z2Pipeline component '{}' not available in factorieszStarting with base model '{}'zStarting with blank model '{}'z{} training docsz{} evaluation docszNo evaluation docsz,{} training examples also in evaluation dataz/No overlap between training and evaluation dataz7Low number of examples to train from a blank model ({})z9It's recommended to use at least {} examples (minimum {}))ÚshowzVocab & VectorsÚn_wordsz#{} total {} in the data ({} unique)ÚwordÚwordsÚn_misaligned_wordsrz){} misaligned tokens in the training dataz${} misaligned tokens in the dev dataé z10 most common words: {}T)Úcountsz*{} vectors ({} unique keys, {} dimensions)z$No word vectors present in the modelÚnercss|]}|dkr|VqdS)©ÚOú-Nr"©r%Úlabelr"r"r&Ú ¹szdebug_data..csg|]}|ˆkr|‘qSr"r"©r%Úl©Ú model_labelsr"r&r'¾scsg|]}|ˆkr|‘qSr"r"r;r=r"r&r'¿sFzNamed Entity Recognitionz{} new {}, {} existing {}r9Úlabelsr7z%{} missing {} (tokens with '-' label)ÚvalueÚvalueszEmpty label found in new labelscSs g|]\}}|dkr||f‘qS)r7r")r%r9Úcountr"r"r&r'×sþzNew: {}z Existing: {}Úws_entsz"{} invalid whitespace entity spansz.Low number of examples for new label '{}' ({})zAnalyzing label distribution...z,No examples for texts WITHOUT new label '{}'z&Good amount of examples for all labelsz5Examples without occurrences available for all labelszr,rIr&Ú debug_datas0    ÿÿÿ             ÿ  ÿü  ÿÿÿÿÿÿ ÿü ýÿ  ÿ  üÿÿÿþ  ÿ  ÿÿÿ  ÿ   ÿýüÿ   ÿÿ ÿ ÿ  üÿÿÿÿÿ  üÿ ÿÿÿ  ýÿ  ýÿÿþÿþÿþÿÿÿ ÿ  ÿÿ ÿûÿ ÿýÿýÿýÿý ÿýÿ þÿþÿ    ÿÿÿ r{c Cs®|jd}|jdkrN| d |¡¡t |¡}W5QRX| d |¡¡|S|jdkr’| d |¡¡t |¡}W5QRX| d |¡¡|S|jd |j¡ddd dS) Néÿÿÿÿz.jsonz Loading {}...z Loaded {}z.jsonlzCan't load file extension {}zExpected .json or .jsonlr r!) ÚpartsÚsuffixr[r_ÚsrslyÚ read_jsonrdZ read_jsonlrX)Ú file_pathrwÚ file_nameÚdatar"r"r&Ú _load_file s     ýr„c CsLtƒtƒtƒtƒtƒtƒdddddddtƒdœ}|D]\}}dd„|jDƒ}|d |¡|dt|ƒ7<|dt|jƒt|ƒ7<|d |j¡d |kr6t|jƒD]‚\}}|dkrÄq²|  d ¡rè||j rè|d d 7<|  d ¡r|  d¡d }|d |d 7<q²|dkr²|d dd 7<q²d|krz|d |j ¡t |j  ¡ƒ d¡d krz|dd 7<d|krž|d dd„|jDƒ¡d|kr4|d dd„|jDƒ¡tt|j|jƒƒD]8\}\} } | |krÒ|d | g¡|dd 7<qÒt |j¡r*|dd 7<t |j¡r4|dd 7<q4|S)Nr)r4rErHrMr0rPrCr.r1rKrNrQrFr*cSsg|]}|dk r|‘qS©Nr"©r%Úxr"r"r&r'/sz!_compile_gold..r0r.r1r*r4)úB-úU-zL-rCr )rˆr‰r7rDrEgð?rFrGrHcSsg|]}|dk r|‘qSr…r"r†r"r"r&r'EsrJrMcSsg|]}|dk r|‘qSr…r"r†r"r"r&r'GsrPrKrNrQ)rrpr0ÚupdaterhÚaddrfÚ enumerater4Ú startswithZis_spacerYrEr\rArBrHr?ÚzipZheadsr Zis_nonproj_treeZcontains_cycle) r]rrƒÚdocÚgoldZ valid_wordsÚir9Zcombined_labelÚdepÚheadr"r"r&resbò      recCs,|rd dd„|Dƒ¡Sd dd„|Dƒ¡S)Nr+cSsg|]\}}d ||¡‘qS)z '{}' ({})©r_)r%r<Úcr"r"r&r'Usz"_format_labels..cSsg|]}d |¡‘qS)z'{}'r”r;r"r"r&r'Vs)rg)r?r3r"r"r&roSsrocCs6d}|D](\}}dd„|jDƒ}||kr|d7}q|S)NrcSs"g|]}|dkr| d¡d‘qS)r5r7r )rYr8r"r"r&r'\sz/_get_examples_without_label..r )r4)rƒr9rBrrr?r"r"r&rsYs   rscCs ||jkrtƒS| |¡}|jSr…)Z pipe_namesrpZget_piper?)r,Z pipe_nameryr"r"r&rqbs  rq)NrFFF)F)!Ú __future__rrÚpathlibrÚ collectionsrZplacrbrZwasabirrrr Zsyntaxr Úutilr r rrrtrlrkÚ annotationsr`Úboolr{r„rerorsrqr"r"r"r&ÚsR        ü   óø e6