ó ùµÈ[c@s˜dZddlmZddlZddlZddlZddlmZm Z m Z ddl m Z dddddd „Zd efd „ƒYZdS( s5Definition of various recurrent neural network cells.iÿÿÿÿ(tprint_functionNi(tDataItert DataBatchtDataDesc(tndarrays ic Csë|}|dkr(i||6}t}nt}g}xª|D]¢} g} x†| D]~} | |kr»|s||s|td| ƒ‚||kr•|d7}n|r¤|} n||| <|d7}n| j|| ƒqNW|j| ƒq;W||fS(sEncode sentences and (optionally) build a mapping from string tokens to integer indices. Unknown keys will be added to vocabulary. Parameters ---------- sentences : list of list of str A list of sentences to encode. Each sentence should be a list of string tokens. vocab : None or dict of str -> int Optional input Vocabulary invalid_label : int, default -1 Index for invalid token, like invalid_key : str, default '\n' Key for invalid token. Use '\n' for end of sentence by default. start_label : int lowest index. unknown_token: str Symbol to represent unknown token. If not specified, unknown token will be skipped. Returns ------- result : list of list of int encoded sentences vocab : dict of str -> int result vocabulary sUnknown token %siN(tNonetTruetFalsetAssertionErrortappend( t sentencestvocabt invalid_labelt invalid_keyt start_labelt unknown_tokentidxt new_vocabtrestsenttcodedtword((sL/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rnn/io.pytencode_sentencess(           tBucketSentenceItercBs;eZdZd dddddd„Zd„Zd„ZRS( s†Simple bucketing iterator for language model. The label at each sequence step is the following token in the sequence. Parameters ---------- sentences : list of list of int Encoded sentences. batch_size : int Batch size of the data. invalid_label : int, optional Key for invalid label, e.g. . The default is -1. dtype : str, optional Data type of the encoding. The default data type is 'float32'. buckets : list of int, optional Size of the data buckets. Automatically generated if None. data_name : str, optional Name of the data. The default name is 'data'. label_name : str, optional Name of the label. The default name is 'softmax_label'. layout : str, optional Format of data and label. 'NT' means (batch_size, length) and 'TN' means (length, batch_size). iÿÿÿÿtdatat softmax_labeltfloat32tNTc CsÈtt|ƒjƒ|sogttjg|D]} t| ƒ^q,ƒƒD]\} } | |krH| ^qH}n|jƒd} g|D] } g^q†|_i}x$t t|ƒƒD]}d||Invalid layout %s: Must by NT (batch major) or TN (time major)(%tsuperRt__init__t enumeratetnptbincounttlentsortRtrangetbisectt bisect_lefttfullR tasarraytprintt batch_sizetbucketst data_namet label_nameRR tnddatatndlabeltfindt major_axisR tmaxtdefault_bucket_keyRt provide_datat provide_labelt ValueErrorRtextendtcurr_idxtreset(tselfR R.R/R R0R1RR tstitjtndiscardt_t valid_bucketstitemRtbucktbuff((sL/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rnn/io.pyR"msj;   54           G cCsd|_tj|jƒx!|jD]}tjj|ƒq#Wg|_g|_x¯|jD]¤}tj|ƒ}|dd…dd…f|dd…dd…f<|j |dd…df<|jj t j |d|j ƒƒ|jj t j |d|j ƒƒqYWdS(s1Resets the iterator to the beginning of the data.iNiiÿÿÿÿR(R<trandomtshuffleRRR$R2R3t empty_likeR R RtarrayR(R>RFtlabel((sL/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rnn/io.pyR=®s   2"cCs>|jt|jƒkr!t‚n|j|j\}}|jd7_|jdkr”|j||||j!j}|j||||j!j}n6|j||||j!}|j||||j!}t |g|gddd|j |dt d|j d|j d|jƒgd t d|jd|j d|jƒgƒS( sReturns the next batch of data.itpadit bucket_keyR8RRR R9(R<R&Rt StopIterationR5R2R.tTR3RR/RR0RR R1(R>R@RARRL((sL/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rnn/io.pytnext¾s" ! N(t__name__t __module__t__doc__RR"R=RQ(((sL/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rnn/io.pyRTs   ? (RTt __future__RR)RHtnumpyR$tioRRRtRRRR(((sL/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rnn/io.pyts    5