ó šÄïYc@@sÿdZddlmZddlmZmZddlZddlZddlZddl Z yddl Z Wne k r…dZ nXddl ZddlmZddlmZmZmZmZddlmZmZdd lmZdd lmZmZdd lmZdd lmZdd lmZdedddgƒfd„ƒYZde fd„ƒYZ!de fd„ƒYZ"de"fd„ƒYZ#de"fd„ƒYZ$d„Z%de"fd„ƒYZ&de"fd„ƒYZ'd„Z(d „Z)e)ƒdS(!s'Data iterators for common data formats.i(tabsolute_import(t OrderedDictt namedtupleNi(t_LIB(tc_arraytc_strtmx_uinttpy_str(tDataIterHandlet NDArrayHandle(t mx_real_t(t check_calltbuild_param_doc(tNDArray(tarray(t concatenatetDataDesctnametshapecB@sDeZdZedd„Zd„Zed„ƒZed„ƒZRS(s3DataDesc is used to store name, shape, type and layout information of the data or the label. The `layout` describes how the axes in `shape` should be interpreted, for example for image data setting `layout=NCHW` indicates that the first axis is number of examples in the batch(N), C is number of channels, H is the height and W is the width of the image. For sequential data, by default `layout` is set to ``NTC``, where N is number of examples in the batch, T the temporal axis representing time and C is the number of channels. Parameters ---------- cls : DataDesc The class. name : str Data name. shape : tuple of int Data shape. dtype : np.dtype, optional Data type. layout : str, optional Data layout. tNCHWcC@s4t|tƒj|||ƒ}||_||_|S(N(tsuperRt__new__tdtypetlayout(tclsRRRRtret((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRBs  cC@s d|j|j|j|jfS(NsDataDesc[%s,%s,%s,%s](RRRR(tself((s(build/bdist.linux-armv7l/egg/mxnet/io.pyt__repr__HscC@s|dkrdS|jdƒS(sûGet the dimension that corresponds to the batch size. When data parallelism is used, the data will be automatically split and concatenated along the batch-size dimension. Axis can be -1, which means the whole array will be copied for each data-parallelism device. Parameters ---------- layout : str layout string. For example, "NCHW". Returns ------- int An axis indicating the batch_size dimension. itNN(tNonetfind(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pytget_batch_axisLs cC@sw|dk rKt|ƒ}g|D](}t|d|d||dƒ^qSg|D]}t|d|dƒ^qRSdS(sªGet DataDesc list from attribute lists. Parameters ---------- shapes : a tuple of (name, shape) types : a tuple of (name, type) iiN(RtdictR(tshapesttypest type_dicttx((s(build/bdist.linux-armv7l/egg/mxnet/io.pytget_listbs  3( t__name__t __module__t__doc__R RRt staticmethodRR%(((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR(s  t DataBatchcB@s2eZdZddddddd„Zd„ZRS(sA data batch. MXNet's data iterator returns a batch of data for each `next` call. This data contains `batch_size` number of examples. If the input data consists of images, then shape of these images depend on the `layout` attribute of `DataDesc` object in `provide_data` parameter. If `layout` is set to 'NCHW' then, images should be stored in a 4-D matrix of shape ``(batch_size, num_channel, height, width)``. If `layout` is set to 'NHWC' then, images should be stored in a 4-D matrix of shape ``(batch_size, height, width, num_channel)``. The channels are often in RGB order. Parameters ---------- data : list of `NDArray`, each array containing `batch_size` examples. A list of input data. label : list of `NDArray`, each array often containing a 1-dimensional array. optional A list of input labels. pad : int, optional The number of examples padded at the end of a batch. It is used when the total number of examples read is not divisible by the `batch_size`. These extra padded examples are ignored in prediction. index : numpy.array, optional The example indices in this batch. bucket_key : int, optional The bucket key, used for bucketing module. provide_data : list of `DataDesc`, optional A list of `DataDesc` objects. `DataDesc` is used to store name, shape, type and layout information of the data. The *i*-th element describes the name and shape of ``data[i]``. provide_label : list of `DataDesc`, optional A list of `DataDesc` objects. `DataDesc` is used to store name, shape, type and layout information of the label. The *i*-th element describes the name and shape of ``label[i]``. cC@s£|dk r0t|ttfƒs0tdƒ‚n|dk r`t|ttfƒs`tdƒ‚n||_||_||_||_||_ ||_ ||_ dS(NsData must be list of NDArrayssLabel must be list of NDArrays( Rt isinstancetlistttupletAssertionErrortdatatlabeltpadtindext bucket_keyt provide_datat provide_label(RR/R0R1R2R3R4R5((s(build/bdist.linux-armv7l/egg/mxnet/io.pyt__init__—s $ $      cC@sWg|jD]}|j^q }g|jD]}|j^q)}dj|jj||ƒS(Ns${}: data shapes: {} label shapes: {}(R/RR0tformatt __class__R&(Rtdt data_shapestlt label_shapes((s(build/bdist.linux-armv7l/egg/mxnet/io.pyt__str__¦s  N(R&R'R(RR6R=(((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR*qs% tDataItercB@skeZdZdd„Zd„Zd„Zd„Zd„Zd„Zd„Z d „Z d „Z d „Z RS( s¤The base class for an MXNet data iterator. All I/O in MXNet is handled by specializations of this class. Data iterators in MXNet are similar to standard-iterators in Python. On each call to `next` they return a `DataBatch` which represents the next batch of data. When there is no more data to return, it raises a `StopIteration` exception. Parameters ---------- batch_size : int, optional The batch size, namely the number of items in the batch. See Also -------- NDArrayIter : Data-iterator for MXNet NDArray or numpy-ndarray objects. CSVIter : Data-iterator for csv data. ImageIter : Data-iterator for images. icC@s ||_dS(N(t batch_size(RR?((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR6ÁscC@s|S(N((R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyt__iter__ÄscC@sdS(s,Reset the iterator to the begin of the data.N((R((s(build/bdist.linux-armv7l/egg/mxnet/io.pytresetÇsc C@sM|jƒrCtd|jƒd|jƒd|jƒd|jƒƒSt‚dS(sæGet next data batch from iterator. Returns ------- DataBatch The data of next batch. Raises ------ StopIteration If the end of the data is reached. R/R0R1R2N(t iter_nextR*tgetdatatgetlabeltgetpadtgetindext StopIteration(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pytnextËs cC@s |jƒS(N(RH(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyt__next__ÞscC@sdS(s}Move to the next batch. Returns ------- boolean Whether the move is successful. N((R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRBáscC@sdS(s‡Get data of current batch. Returns ------- list of NDArray The data of the current batch. N((R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRCëscC@sdS(sGet label of the current batch. Returns ------- list of NDArray The label of the current batch. N((R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRDõscC@sdS(sŸGet index of the current batch. Returns ------- index : numpy.array The indices of examples in the current batch. N(R(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRFÿscC@sdS(s«Get the number of padding examples in the current batch. Returns ------- int Number of padding examples in the current batch. N((R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRE s( R&R'R(R6R@RARHRIRBRCRDRFRE(((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR>®s      t ResizeItercB@sPeZdZed„Zd„Zd„Zd„Zd„Zd„Z d„Z RS(sMResize a data iterator to a given number of batches. Parameters ---------- data_iter : DataIter The data iterator to be resized. size : int The number of batches per epoch to resize to. reset_internal : bool Whether to reset internal iterator on ResizeIter.reset. Examples -------- >>> nd_iter = mx.io.NDArrayIter(mx.nd.ones((100,10)), batch_size=25) >>> resize_iter = mx.io.ResizeIter(nd_iter, 2) >>> for batch in resize_iter: ... print(batch.data) [] [] cC@s†tt|ƒjƒ||_||_||_d|_d|_|j |_ |j |_ |j |_ t |dƒr‚|j |_ ndS(Nitdefault_bucket_key(RRJR6t data_itertsizetreset_internaltcurRt current_batchR4R5R?thasattrRK(RRLRMRN((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR6)s        cC@s&d|_|jr"|jjƒndS(Ni(RORNRLRA(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRA7s  cC@sr|j|jkrtSy|jjƒ|_Wn0tk r^|jjƒ|jjƒ|_nX|jd7_tS(Ni( RORMtFalseRLRHRPRGRAtTrue(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRB<s  cC@s |jjS(N(RPR/(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRCHscC@s |jjS(N(RPR0(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRDKscC@s |jjS(N(RPR2(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRFNscC@s |jjS(N(RPR1(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyREQs( R&R'R(RSR6RARBRCRDRFRE(((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRJs     tPrefetchingItercB@sƒeZdZd d d„Zd„Zed„ƒZed„ƒZd„Z d„Z d„Z d„Z d „Z d „Zd „ZRS( s€Performs pre-fetch for other data iterators. This iterator will create another thread to perform ``iter_next`` and then store the data in memory. It potentially accelerates the data read, at the cost of more memory usage. Parameters ---------- iters : DataIter or list of DataIter The data iterators to be pre-fetched. rename_data : None or list of dict The *i*-th element is a renaming map for the *i*-th iter, in the form of {'original_name' : 'new_name'}. Should have one entry for each entry in iter[i].provide_data. rename_label : None or list of dict Similar to ``rename_data``. Examples -------- >>> iter1 = mx.io.NDArrayIter({'data':mx.nd.ones((100,10))}, batch_size=25) >>> iter2 = mx.io.NDArrayIter({'data':mx.nd.ones((100,10))}, batch_size=25) >>> piter = mx.io.PrefetchingIter([iter1, iter2], ... rename_data=[{'data': 'data_1'}, {'data': 'data_2'}]) >>> print(piter.provide_data) [DataDesc[data_1,(25, 10L),,NCHW], DataDesc[data_2,(25, 10L),,NCHW]] cC@sÁtt|ƒjƒt|tƒs.|g}nt|ƒ|_|jdksRt‚||_||_ ||_ |j ddd|_ gt |jƒD]}tjƒ^q•|_gt |jƒD]}tjƒ^qÀ|_x|jD]}|jƒqåWt|_gt |jƒD] }d^q|_gt |jƒD] }d^q7|_d„}gt |jƒD]$}tjd|d||gƒ^qe|_x(|jD]}|jtƒ|jƒqœWdS(NiicS@sx‰tr‹|j|jƒ|js'Pny|j|jƒ|j|>> data = np.arange(40).reshape((10,2,2)) >>> labels = np.ones([10, 1]) >>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='discard') >>> for batch in dataiter: ... print batch.data[0].asnumpy() ... batch.data[0].shape ... [[[ 36. 37.] [ 38. 39.]] [[ 16. 17.] [ 18. 19.]] [[ 12. 13.] [ 14. 15.]]] (3L, 2L, 2L) [[[ 32. 33.] [ 34. 35.]] [[ 4. 5.] [ 6. 7.]] [[ 24. 25.] [ 26. 27.]]] (3L, 2L, 2L) [[[ 8. 9.] [ 10. 11.]] [[ 20. 21.] [ 22. 23.]] [[ 28. 29.] [ 30. 31.]]] (3L, 2L, 2L) >>> dataiter.provide_data # Returns a list of `DataDesc` [DataDesc[data,(3, 2L, 2L),,NCHW]] >>> dataiter.provide_label # Returns a list of `DataDesc` [DataDesc[softmax_label,(3, 1L),,NCHW]] In the above example, data is shuffled as `shuffle` parameter is set to `True` and remaining examples are discarded as `last_batch_handle` parameter is set to `discard`. Usage of `last_batch_handle` parameter: >>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='pad') >>> batchidx = 0 >>> for batch in dataiter: ... batchidx += 1 ... >>> batchidx # Padding added after the examples read are over. So, 10/3+1 batches are created. 4 >>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='discard') >>> batchidx = 0 >>> for batch in dataiter: ... batchidx += 1 ... >>> batchidx # Remaining examples are discarded. So, 10/3 batches are created. 3 `NDArrayIter` also supports multiple input and labels. >>> data = {'data1':np.zeros(shape=(10,2,2)), 'data2':np.zeros(shape=(20,2,2))} >>> label = {'label1':np.zeros(shape=(10,1)), 'label2':np.zeros(shape=(20,1))} >>> dataiter = mx.io.NDArrayIter(data, label, 3, True, last_batch_handle='discard') Parameters ---------- data: array or list of array or dict of string to array The input data. label: array or list of array or dict of string to array, optional The input label. batch_size: int Batch size of data. shuffle: bool, optional Whether to shuffle the data. Only supported if no h5py.Dataset inputs are used. last_batch_handle : str, optional How to handle the last batch. This parameter can be 'pad', 'discard' or 'roll_over'. 'roll_over' is intended for training and can cause problems if used for prediction. data_name : str, optional The data name. label_name : str, optional The label name. iR1R/t softmax_labelc C@sZtt|ƒj|ƒt|dtd|ƒ|_t|dtd|ƒ|_tj |jddj dƒ|_ |rhtj j |j ƒg|jD][\}} tr·t| tjƒntsâ|t| jƒ|j | jƒfn || f^q“|_g|jD][\}} tr%t| tjƒntsP|t| jƒ|j | jƒfn || f^q|_n|dkr¹|jddj d|jddj d|} |j | |_ ng|jD]} | d^qÃg|jD]} | d^qà|_t|jƒ|_|j j d|_|j|ks:tdƒ‚| |_||_||_dS(NR|R}iitdiscards.batch_size needs to be smaller than data size.(RRR6R€RRR/RSR0RutarangeRtidxtrandomtshuffleRtR+RwRtasnumpytcontextt data_listRat num_sourcetnum_dataR.tcursorR?tlast_batch_handle( RR/R0R?R‡RŽt data_namet label_nameR~Rtnew_nR$((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR6Us*$kn 2A   c C@sLg|jD]>\}}t|t|jgt|jdƒƒ|jƒ^q S(s5The name and shape of data provided by this iterator.i(R/RR-R?R,RR(RR~R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR4xsc C@sLg|jD]>\}}t|t|jgt|jdƒƒ|jƒ^q S(s6The name and shape of label provided by this iterator.i(R0RR-R?R,RR(RR~R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR5€scC@s|j |_dS(s'Ignore roll over data and set to start.N(R?R(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyt hard_resetˆscC@sW|jdkrF|j|jkrF|j |j|j|j|_n |j |_dS(Nt roll_over(RŽRRŒR?(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRAŒs!%cC@s"|j|j7_|j|jkS(N(RR?RŒ(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRB’sc C@sG|jƒr=td|jƒd|jƒd|jƒddƒSt‚dS(NR/R0R1R2(RBR*RCRDRERRG(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRH–s c C@s?|j|jkstdƒ‚|j|j|jkr g|D]Ë}t|dtjtfƒr~|d|j|j|j!nˆt|dt |j |j|j|j!ƒgt |j |j|j|j!ƒD]2}t |j |j|j|j!ƒj |ƒ^q̓^q>S|j|j|j}g|D] }t|dtjtfƒrrt |d|j|d| gƒnÂt t|dt |j |jƒgt |j |jƒD]%}t |j |jƒj |ƒ^qªƒt|dt |j | ƒgt |j | ƒD]"}t |j | ƒj |ƒ^qƒgƒ^q+SdS(s4Load data from underlying arrays, internal use only.sDataIter needs reset.iN(RRŒR.R?R+RuRvR RtsortedR…R,R2R(Rt data_sourceR$R]R1((s(build/bdist.linux-armv7l/egg/mxnet/io.pyt_getdatas Ó cC@s|j|jƒS(N(R–R/(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRCÁscC@s|j|jƒS(N(R–R0(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRDÄscC@sE|jdkr=|j|j|jkr=|j|j|jSdSdS(NR1i(RŽRR?RŒ(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyREÇsN(R&R'R(RRRR6RsR4R5R’RARBRHR–RCRDRE(((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRsS  !     $  t MXDataItercB@sneZdZddd„Zd„Zd„Zd„Zd„Zd„Zd „Z d „Z d „Z d „Z RS( s¾A python wrapper a C++ data iterator. This iterator is the Python wrapper to all native C++ data iterators, such as `CSVIter, `ImageRecordIter`, `MNISTIter`, etc. When initializing `CSVIter` for example, you will get an `MXDataIter` instance to use in your Python code. Calls to `next`, `reset`, etc will be delegated to the underlying C++ data iterators. Usually you don't need to interact with `MXDataIter` directly unless you are implementing your own data iterators in C++. To do that, please refer to examples under the `src/io` folder. Parameters ---------- handle : DataIterHandle, required The handle to the underlying C++ Data Iterator. data_name : str, optional Data name. Default to "data". label_name : str, optional Label name. Default to "softmax_label". See Also -------- src/io : The underlying C++ data iterator implementation, e.g., `CSVIter`. R/R‚cK@s­tt|ƒjƒ||_t|_d|_|jƒ|_|jj d}|jj d}t ||j |j ƒg|_t ||j |j ƒg|_|j d|_dS(Ni(RR—R6thandleRRt_debug_skip_loadRt first_batchRHR/R0RRRR4R5R?(RR˜RRt_R/R0((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR6és   cC@sttj|jƒƒdS(N(R RtMXDataIterFreeR˜(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRnúscC@st|_tjdƒdS(Ns>Set debug_skip_load to be true, will simply return first batch(RSR™tloggingtinfo(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pytdebug_skip_loadýs cC@s,t|_d|_ttj|jƒƒdS(N(RSt_debug_at_beginRRšR RtMXDataIterBeforeFirstR˜(R((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRAs  c C@sÿ|jrP|j rPtd|jƒgd|jƒgd|jƒd|jƒƒS|jdk ru|j}d|_|St |_t j dƒ}t t j|jt j|ƒƒƒ|jrõtd|jƒgd|jƒgd|jƒd|jƒƒSt‚dS(NR/R0R1R2i(R™R R*RCRDRERFRšRRRtctypestc_intR RtMXDataIterNextR˜tbyreftvalueRG(RRrtnext_res((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRH s0    " 0 cC@sK|jdk rtStjdƒ}ttj|jtj |ƒƒƒ|j S(Ni( RšRRSR¢R£R RR¤R˜R¥R¦(RR§((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRBs "cC@s8tƒ}ttj|jtj|ƒƒƒt|tƒS(N( R R RtMXDataIterGetDataR˜R¢R¥R RR(Rthdl((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRC!s "cC@s8tƒ}ttj|jtj|ƒƒƒt|tƒS(N( R R RtMXDataIterGetLabelR˜R¢R¥R RR(RR©((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRD&s "cC@sŸtjdƒ}tjtjƒƒ}ttj|jtj|ƒtj|ƒƒƒtj|j ƒ}tj|j j |ƒ}t j |dt jƒ}|jƒS(NiR(R¢tc_uint64tPOINTERR RtMXDataIterGetIndexR˜R¥t addressoftcontentsR¦t from_addressRut frombuffertuint64tcopy(Rt index_sizet index_datataddresstdbuffertnp_index((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRF+s cC@s8tjdƒ}ttj|jtj|ƒƒƒ|jS(Ni(R¢R£R RtMXDataIterGetPadNumR˜R¥R¦(RR1((s(build/bdist.linux-armv7l/egg/mxnet/io.pyRE6s"( R&R'R(R6RnRŸRARHRBRCRDRFRE(((s(build/bdist.linux-armv7l/egg/mxnet/io.pyR—Ïs        c @s¡tjƒ}tjƒ}tƒ}tjtjƒƒ}tjtjƒƒ}tjtjƒƒ}ttjˆtj|ƒtj|ƒtj|ƒtj|ƒtj|ƒtj|ƒƒƒt|j ƒ‰t |j ƒ}t gt |ƒD]}t||ƒ^qégt |ƒD]}t||ƒ^qgt |ƒD]}t||ƒ^q5ƒ} d ddd} | |j | f} ‡‡fd†} ˆ| _ | | _| S( s Create an io iterator by handle.s%s s%s sReturns s------- s MXDataIter s The result iterator.c@sâg}g}xF|jƒD]8\}}|jt|ƒƒ|jtt|ƒƒƒqWttj|ƒ}ttj|ƒ}tƒ}tt j ˆt t |ƒƒ||tj |ƒƒƒt |ƒrÕtdˆƒ‚nt||S(sECreate an iterator. The parameters listed below can be passed in as keyword arguments. Parameters ---------- name : string, required. Name of the resulting data iterator. Returns ------- dataiter: Dataiter The resulting data iterator. s$%s can only accept keyword arguments(RztappendRtstrRR¢tc_char_pRR RtMXDataIterCreateIterRRaR¥RyR—(R`tkwargst param_keyst param_valsR~tvalt iter_handle(R˜t iter_name(s(build/bdist.linux-armv7l/egg/mxnet/io.pytcreatorZs    s%s %s s%s %s Returns (R¢R¼RR¬R RtMXDataIterGetIterInfoR¥RR¦tintt_build_param_docReR&R(( R˜Rtdesctnum_argst arg_namest arg_typest arg_descstnargR]t param_strtdoc_strRÄ((R˜RÃs(build/bdist.linux-armv7l/egg/mxnet/io.pyt_make_io_iterator;s2       &&, #  cC@s¦tjtjƒƒ}tjƒ}ttjtj|ƒtj|ƒƒƒtj t }xIt |j ƒD]8}tj||ƒ}t |ƒ}t||j |ƒqfWdS(s6List and add all the data iterators to current module.N(R¢R¬tc_void_ptc_uintR RtMXListDataItersR¥tsystmodulesR&ReR¦RÐtsetattr(tplistRMt module_objR]R©tdataiter((s(build/bdist.linux-armv7l/egg/mxnet/io.pyt_init_io_modules (  (*R(t __future__Rt collectionsRRRÔR¢RRfRtt ImportErrorRtnumpyRutbaseRRRRRRR R R R RÇRvR RRRtobjectR*R>RJRTR€RR—RÐRÚ(((s(build/bdist.linux-armv7l/egg/mxnet/io.pyts<       "%I=eA Îl F