U C^z @sddlmZddlZddlmZddlmZddlZddlZddl m Z m Z m Z ddl m Z e e e e e e dZd Zd Zejd d defd d defdeddefdddefddddefdeeddefdddefdddefd d&d"d#Zd$d%ZdS)')unicode_literalsN)Path)Printer) conllu2jsoniob2jsonconll_ner2json)ner_jsonl2json)Z conllubioZconlluZconllneriobjsonl)jsonr msg)r r z Input file positionalz!Output directory. '-' for stdout.zType of data to produce: {}optiontz*Number of sentences per doc (0 to disable)n)zSegment sentences (for -c ner)flagsz(Model for sentence segmentation (for -s)bz Converter: {}cz Language (if tokenizer required)lz#Enable appending morphology to tagsrm) input_file output_dir file_typen_sents seg_sentsmodel converterlang morphology-r Fautoc  Cs"|dk} t| d} t|} |tkrD| jd|ddtdd|tkrj|dkrj| jd|d dd| s| jd | dd|dkrt|s| jd |dd| jd d d } |dkr| j dd}|dks|dkr&t | } | dkr| d| }n$| dkr| d| }n | d|tkrD| jd|ddt|}|| |||||| d}|dkrd|}t|t| jd|}|dkrt||n.|dkrt||n|dkrt||| dt||n.|dkrtd|n|dkrtd|dS)a Convert files into JSON format for use with train command and other experiment management functions. If no output_dir is specified, the data is written to stdout, so you can pipe them forward to a JSON file: $ spacy convert some_file.conllu > some_file.json r")no_printzUnknown file type: '{}'zSupported file types: '{}'z, r)ZexitszCan't write .{} data to stdout.z#Please specify an output directory.zInput file not foundzOutput directory not foundrzutf-8)encodingr#Nr r z'Auto-detected token-per-line NER formatz*Auto-detected sentence-per-line NER formatzgCan't automatically detect NER format. Conversion may not succeed. See https://spacy.io/api/cli#convertzCan't find converter for {})rrZuse_morphologyr rr$z.{}r r rz(Generated output file ({} documents): {})rr FILE_TYPESfailformatjoinFILE_TYPES_STDOUTexistsopenreadsuffixautodetect_ner_formatinfowarn CONVERTERSpartsZ with_suffixsrsly write_jsonZ write_jsonlZ write_msgpackZgoodlen)rrrrrrr!rr r$rZ input_path input_dataZconverter_autodetectfuncdatar0Z output_filer<4/tmp/pip-install-6_kvzl1k/spacy/spacy/cli/convert.pyconvert sz              r>cCs|ddd}ddd}td}td}|D]@}|}||rZ|dd7<||r4|d d7<q4|ddkr|d dkrd S|d dkr|ddkrdSdS) N r)r r z\S+\|(O|[IB]-\S+)z\S+\s+(O|[IB]-\S+)$r rr )splitrecompilestripsearch)r9linesZformat_guessesZiob_reZner_reliner<r<r=r1s     r1)r"r rFNFr#N) __future__rZplacpathlibrZwasabirr6rB convertersrrrr r4r(r, annotationsstrr*inttuplekeysboolr>r1r<r<r<r=sJ             V