U C^@sddlmZmZddlmZddlmZmZmZm Z m Z ddl m Z ddl mZddlmZdd lmZdd lmZdd lmZdd lmZdd lmZddlmZmZddlZddlZddl Z!GdddeZ"GdddeZ#GdddeZ$GdddeZ%GdddeZ&dS))unicode_literalsprint_function)Model)chainclone with_getitemwrap with_reshape)Softmax)ReLu) LayerNorm)Maxout)Residual)Affine)MultiHeadedAttention)PositionwiseFeedForward)PyTorchWrapper PyTorchModuleNc@s eZdZd ddZd d d Zd S)EncoderDecoderr,'cpucCst|||_||_||_||_||_t||||d|_t t ||d|_ t t |||d||_tt||d|_|j|j|jg|_dS)a EncoderDecoder consists of an encoder stack, a decoder stack and an output layer which is a linear + softmax. Parameters explanation: nS: the number of encoders/decoders in the stack nH: the number of heads in the multiheaded attention nM: the token's embedding size nTGT: the number of unique words in output vocabulary )nMnHdevicenSrrrrr)ZnOZnIN)r__init__rrrnTGTrEncoderencrPytorchLayerNormnormr DecoderLayerdecr r projZ_layers)selfrrrr"rr+H/tmp/pip-install-6_kvzl1k/thinc/thinc/neural/_classes/encoder_decoder.pyr!s zEncoderDecoder.__init__皙?c s|\}}}|jj|f|d\}|jj||||f|d\\}}}}|j|\} |jj| |d\} dfdd } | |f| fS)z A batch object flows through the network. It contains input, output and corresponding masks. Input changes while the object travels through the network. Output is the golden output. Input: nB x nL x nM dropNcsZ||d}||d}tjjjjtjjjd}||f|d\}}||d}||fS)Nsgd)Zdtype)ropsZxpzerosshapeZfloat32)Z d_word_probsr1dY2dY1r3dY0dX1dX0X0b_Y2Zbackprop_decodeZbackprop_encodeZbackprop_outputr+r, finish_update9s    z2EncoderDecoder.begin_update..finish_update)N)r$ begin_updater(r&r)) r*inputsr/ZXmaskY0ZYmaskX1Y1_Y2Z word_probsr=r+r:r,r>*s  zEncoderDecoder.begin_updateN)rrrrr)r-__name__ __module__ __qualname__r!r>r+r+r+r,rs rcs&eZdZdfdd ZddZZS) r%rư>rcsNtt|tt|||_tt |||_ ||_ ||_ dS)N) superr%r!nn ParameterZtorchZonestoa_2r3b_2epsr)r*rrPr __class__r+r,r!Es zPytorchLayerNorm.__init__cCsJ|jddd|j}|jddd|j}|j||||j|jS)NT)Zkeepdim)meanrMrstdrNrPrO)r*xrTrUr+r+r,forwardLszPytorchLayerNorm.forward)rrIr)rFrGrHr!rW __classcell__r+r+rQr,r%Dsr%c@s eZdZd ddZd ddZd S) r#rrrcCs6t|tt|||d||_tt||d|_dS)Nr r)rr!r EncoderLayerstackrr%r&)r*rrrrr+r+r,r!Ss zEncoder.__init__r-c sL|\}}|jj||fdd\\}}|j|\}dfdd }||fS)Nr-r.cs||d}||d}|SNr0r+)dX2r1r8r9b_X1b_X2r+r,r=]s  z+Encoder.begin_update..finish_update)N)rZr>r&) r*inputr/r;maskrArCX2r=r+r]r,r>Xs zEncoder.begin_updateN)rrrr)r-rEr+r+r+r,r#Rs r#c@s eZdZd ddZd ddZd S) rYrrrcCsVt|t||d|_t|d||_tt||d|_||_ |j|j|jg|_ dS)Nrrr) rr!rattnrffdrr%r&rlayers_r*rrrr+r+r,r!fs  zEncoderLayer.__init__r-c s|\}|jj|df|d\}|j|\}|}|jj||d\}|j|\}||} dfdd } | |f| fS)Nr.csL|}||d}||d}||7}|}||d}||d}||7}Sr[r+)ZdX6r1ZdX5ZdX4ZdX3r\r8r9r;r^r_Zb_X4Zb_X5r+r,r=xs    z0EncoderLayer.begin_update..finish_update)N)rfr>r&rg) r*r`r/rarArbZX3ZX4ZX5ZX6r=r+rjr,r>ns zEncoderLayer.begin_updateN)rrr)r-rEr+r+r+r,rYes rYc@s eZdZd ddZd ddZd S) r'rrrcCsbt|t||d|_t||d|_tt||d|_t|d||_ |j|j|j|j g|_ dS)Nrcrerd) rr!ry_attnx_attnrr%r&rrgrhrir+r+r,r!s  zDecoderLayer.__init__r-cs|\}}}}|jj||df|d\}|j|\}||} |jj| ||ddf|d\} |j| \} | | } |jj| |d\} dfdd }| |||f|fS)Nr.c sp|\}}||d}|}||d}||d\}}||7}|} | |d} | |d} | |7} ||7}| |fSr[r+) ZdIr1ZdY7ZdXZdY6ZdY5ZdY4ZdY3r9r5r6r7Zb_Y1r<Zb_Y4Zb_Y5Zb_Y7r+r,r=s    z0DecoderLayer.begin_update..finish_update)N)rkr>r&rlrg)r*r`r/r@r;ZX_maskZY_maskrBrDZY3ZY4ZY5ZY6ZY7r=r+rmr,r>s zDecoderLayer.begin_updateN)rrr)r-rEr+r+r+r,r's r')' __future__rrmodelrapirrr r r Zsoftmaxr Zrelur Z layernormrZmaxoutrZresnetrZaffinerZmultiheaded_attentionrZpositionwise_ffdrZextra.wrappersrrcopymathZnumpynprr%r#rYr'r+r+r+r,s&         1#