U C^e@sxddlmZddlZddlZddlmZmZddlm Z ddl m Z ddl m Z ddlmZeeZGdd d ZdS) )shuffleN) zero_initcreate_default_optimizerget_cossim_loss)Model)chain)Affinec@sleZdZdZdZdZdZdZdddZd d Z dd d Z ddZ e ddZ ddZddZe ddZdS) EntityEncoderz Train the embeddings of entity descriptions to fit a fixed-size entity vector (e.g. 64D). This entity vector will be stored in the KB, for further downstream use in the entity model. rig{Gz?cCs||_||_||_||_dSN)nlp input_dim desc_widthepochs)selfrrrrrM/tmp/pip-install-6_kvzl1k/spacy/bin/wiki_entity_linking/train_descriptions.py__init__ szEntityEncoder.__init__c sjdkrtdd}d}t|t|}g}|t|krtj|||}fdd|D}t|}| | ||}t||t|}t d |q,|S)Nz(Can not apply encoder before training itircsg|]}|qSr)_get_doc_embedding).0docrrr 2sz/EntityEncoder.apply_encoder..zEncoded: {} entities)encoder ValueErrorminlenlistrpipenpasarrayextendtolistloggerinfoformat) rdescription_listZ batch_sizestartstop encodingsZdocsZdoc_embeddingsencrrr apply_encoder&s  zEntityEncoder.apply_encoderFcCsF||\}}|rBtd|d|jdtd|dS)Nz"Trained entity descriptions on {} z$(non-unique) descriptions across {} rzFinal loss: {}) _train_modelr%r&r'r)rr(Zto_print processedlossrrrtrain<s zEntityEncoder.traincCs6d}d}||j|jd}d}|}d}t|jD]}t|d} d} t|jt |} |r6| t |kr6g} || | D]"} | | }| |}| |qz| | }| ddkrtd||t | 7}||jk}||kr|}d}n|d7}||jkrd}| d7} | |j} t| |jt |} qZq6||fS)Ng?rTz loss: {} F)_build_networkrrcopyrangerrr BATCH_SIZErrrappend_updater%r&r'MIN_LOSSMAX_NO_IMPROVEMENT)rr(Z best_lossZiter_since_bestr/r0Z descriptionsZ to_continueiZbatch_nrr)r*batchdescrr doc_vectorrrrr.FsB         zEntityEncoder._train_modelcCsttjt|fdd}t|D]6\}}|j|jjjkrJ|jjj|j||<qd||<q|jjj|}tj |dd}|S)Nr<)Zdtyper)Zaxis) r!zerosr enumerateZorthZvocabvectorsZkey2rowdataZmean)rindicesr<wordZ word_vectorsr?rrrrxs z EntityEncoder._get_doc_embeddingc CsRtdti,t|||_|jtt||dd?|_W5QRXt|jj|_ dS)Nz>>g)Z drop_factor) rZdefine_operatorsrr rrmodelropssgd)rZ orig_widthZ hidden_withrrrr4s   zEntityEncoder._build_networkcCsN|jjt||jd\}}|j|t|d\}}|||jd|t|S)N)Zdrop)scoresgolds)rH)rFZ begin_updater!r"DROP _get_lossrHr)rrBZ predictionsZbp_modelr0Zd_scoresrrrr9s zEntityEncoder._updatecCst||\}}||fSr r)rJrIr0Z gradientsrrrrLszEntityEncoder._get_lossN)r )F)__name__ __module__ __qualname____doc__rKr7r:r;rr-r1r. staticmethodrr4r9rLrrrrr s  2  r )randomrloggingZnumpyr!Z spacy._mlrrZspacy.cli.pretrainrZ thinc.v2vrZ thinc.apirZthinc.neural._classes.affiner getLoggerrMr%r rrrrs