ó šÄïYc@s•dZddlmZddlZddlZddlZddlmZm Z m Z ddl m Z ddddd „Zd efd „ƒYZdS( s5Definition of various recurrent neural network cells.iÿÿÿÿ(tprint_functionNi(tDataItert DataBatchtDataDesc(tndarrays ic CsÖ|}|dkr(i||6}t}nt}g}x•|D]}g} xq|D]i} | |kr¦|svtd| ƒ‚||kr|d7}n||| <|d7}n| j|| ƒqNW|j| ƒq;W||fS(sEncode sentences and (optionally) build a mapping from string tokens to integer indices. Unknown keys will be added to vocabulary. Parameters ---------- sentences : list of list of str A list of sentences to encode. Each sentence should be a list of string tokens. vocab : None or dict of str -> int Optional input Vocabulary invalid_label : int, default -1 Index for invalid token, like invalid_key : str, default '\n' Key for invalid token. Use '\n' for end of sentence by default. start_label : int lowest index. Returns ------- result : list of list of int encoded sentences vocab : dict of str -> int result vocabulary sUnknown token %siN(tNonetTruetFalsetAssertionErrortappend( t sentencestvocabt invalid_labelt invalid_keyt start_labeltidxt new_vocabtrestsenttcodedtword((s,build/bdist.linux-armv7l/egg/mxnet/rnn/io.pytencode_sentencess$          tBucketSentenceItercBs;eZdZd dddddd„Zd„Zd„ZRS( s†Simple bucketing iterator for language model. The label at each sequence step is the following token in the sequence. Parameters ---------- sentences : list of list of int Encoded sentences. batch_size : int Batch size of the data. invalid_label : int, optional Key for invalid label, e.g. . The default is -1. dtype : str, optional Data type of the encoding. The default data type is 'float32'. buckets : list of int, optional Size of the data buckets. Automatically generated if None. data_name : str, optional Name of the data. The default name is 'data'. label_name : str, optional Name of the label. The default name is 'softmax_label'. layout : str, optional Format of data and label. 'NT' means (batch_size, length) and 'TN' means (length, batch_size). iÿÿÿÿtdatat softmax_labeltfloat32tNTc CsVtt|ƒjƒ|sogttjg|D]} t| ƒ^q,ƒƒD]\} } | |krH| ^qH}n|jƒd} g|D] } g^q†|_x—t|ƒD]‰\} }t j |t|ƒƒ}|t|ƒkrî| d7} q¨ntj ||f|d|ƒ}||t|ƒ*|j|j |ƒq¨Wg|jD]} tj | d|ƒ^q?|_td| ƒ||_||_||_||_||_||_g|_g|_|jdƒ|_||_t|ƒ|_|jdkrUtd|jd||jfd|jƒg|_td|jd||jfd|jƒg|_n~|jdkrÇtd|jd|j|fd|jƒg|_td|jd|j|fd|jƒg|_n td ƒ‚g|_ x`t|jƒD]O\} }|j j!gt"dt|ƒ|d|ƒD]} | | f^q"ƒqìWd|_#|j$ƒdS( Niitdtypes?WARNING: discarded %d sentences longer than the largest bucket.tNtnametshapetlayouts>Invalid layout %s: Must by NT (batch major) or TN (time major)(%tsuperRt__init__t enumeratetnptbincounttlentsortRtbisectt bisect_lefttfullR tasarraytprintt batch_sizetbucketst data_namet label_nameRR tnddatatndlabeltfindt major_axisRtmaxtdefault_bucket_keyRt provide_datat provide_labelt ValueErrorRtextendtrangetcurr_idxtreset(tselfR R,R-R R.R/RRtstitjtndiscardt_Rtbucktbuff((s,build/bdist.linux-armv7l/egg/mxnet/rnn/io.pyR!gs`;  .           G cCsd|_tj|jƒx!|jD]}tjj|ƒq#Wg|_g|_x¯|jD]¤}tj|ƒ}|dd…dd…f|dd…dd…f<|j |dd…df<|jj t j |d|j ƒƒ|jj t j |d|j ƒƒqYWdS(s1Resets the iterator to the beginning of the data.iNiiÿÿÿÿR(R;trandomtshuffleRRR#R0R1t empty_likeR R RtarrayR(R=RCtlabel((s,build/bdist.linux-armv7l/egg/mxnet/rnn/io.pyR<£s   2"cCs>|jt|jƒkr!t‚n|j|j\}}|jd7_|jdkr”|j||||j!j}|j||||j!j}n6|j||||j!}|j||||j!}t |g|gddd|j |dt d|j d|j d|jƒgd t d|jd|j d|jƒgƒS( sReturns the next batch of data.itpadit bucket_keyR6RRRR7(R;R%Rt StopIterationR3R0R,tTR1RR-RR.RRR/(R=R?R@RRI((s,build/bdist.linux-armv7l/egg/mxnet/rnn/io.pytnext³s" ! N(t__name__t __module__t__doc__RR!R<RN(((s,build/bdist.linux-armv7l/egg/mxnet/rnn/io.pyRNs   : (RQt __future__RR'REtnumpyR#tioRRRtRRRR(((s,build/bdist.linux-armv7l/egg/mxnet/rnn/io.pyts   0