"""A `SVRGModule` implements the `Module` API by wrapping an auxiliary module to perform
SVRG optimization logic.
"""

import time
import logging
import mxnet as mx
from mxnet.module import Module
from .svrg_optimizer import _SVRGOptimizer


class SVRGModule(Module):
    """SVRGModule is a module that encapsulates two Modules to accommodate the SVRG optimization technique.
    It is functionally the same as the Module API, except it is implemented using SVRG optimization logic.

    Parameters
    ----------
    symbol : Symbol
    data_names : list of str
        Defaults to `('data')` for a typical model used in image classification.
    label_names : list of str
        Defaults to `('softmax_label')` for a typical model used in image classification.
    logger : Logger
        Defaults to `logging`.
    context : Context or list of Context
        Defaults to ``mx.cpu()``.
    work_load_list : list of number
        Default ``None``, indicating uniform workload.
    fixed_param_names : list of str
        Default ``None``, indicating no network parameters are fixed.
    state_names : list of str
        States are similar to data and label, but not provided by the data iterator.
        Instead they are initialized to 0 and can be set by `set_states()`.
    group2ctxs : dict of str to context or list of context, or list of dict of str to context
        Default is `None`. Mapping the `ctx_group` attribute to the context assignment.
    compression_params : dict
        Specifies the type of gradient compression and additional arguments depending
        on the type of compression being used. For example, 2bit compression requires a threshold.
        Arguments would then be {'type':'2bit', 'threshold':0.5}.
        See mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
    update_freq : int
        Specifies how often (in epochs) to recompute the full gradients used in the SVRG update.
        For instance, update_freq = 2 will calculate the gradients over all data every two epochs.

    Examples
    --------
    >>> # An example of declaring and using SVRGModule.
    >>> mod = SVRGModule(symbol=lro, data_names=['data'], label_names=['lin_reg_label'], update_freq=2)
    >>> mod.fit(di, eval_metric='mse', optimizer='sgd', optimizer_params=(('learning_rate', 0.025),),
    >>>         num_epoch=num_epoch, kvstore='local')
    """

    def __init__(self, symbol, data_names=('data',), label_names=('softmax_label',),
                 logger=logging, context=mx.cpu(), work_load_list=None,
                 fixed_param_names=None, state_names=None, group2ctxs=None,
                 compression_params=None, update_freq=None):
        super(SVRGModule, self).__init__(symbol, data_names=data_names, label_names=label_names,
                                         logger=logger, context=context, work_load_list=work_load_list,
                                         fixed_param_names=fixed_param_names, state_names=state_names,
                                         group2ctxs=group2ctxs, compression_params=compression_params)

        # Type-check update_freq: it must be a positive integer.
        if isinstance(update_freq, int):
            if update_freq <= 0:
                raise ValueError("update_freq in SVRGModule must be a positive integer to represent the "
                                 "frequency for calculating full gradients")
            self.update_freq = update_freq
        else:
            raise TypeError("update_freq in SVRGModule must be an integer to represent the "
                            "frequency for calculating full gradients")

        # Auxiliary module that holds the snapshot weights from the last full-gradient pass.
        self._mod_aux = mx.mod.Module(symbol, data_names, label_names, logger, context,
                                      work_load_list, fixed_param_names, state_names,
                                      group2ctxs, compression_params)

        self._param_dict = None
        self._ctx_len = len(self._context)

    def _reset_bind(self):
        """Internal function to reset binded state for both modules."""
        super(SVRGModule, self)._reset_bind()
        self._mod_aux._reset_bind()

    def reshape(self, data_shapes, label_shapes=None):
        """Reshapes both modules for new input shapes.

        Parameters
        ----------
        data_shapes : list of (str, tuple)
            Typically is ``data_iter.provide_data``.
        label_shapes : list of (str, tuple)
            Typically is ``data_iter.provide_label``.
        """
        super(SVRGModule, self).reshape(data_shapes, label_shapes=label_shapes)
        self._mod_aux.reshape(data_shapes, label_shapes=label_shapes)

    def init_optimizer(self, kvstore='local', optimizer='sgd',
                       optimizer_params=(('learning_rate', 0.01),), force_init=False):
        """Installs and initializes SVRGOptimizer. The SVRGOptimizer is a wrapper class for a regular
        optimizer that is passed in and a special AssignmentOptimizer to accumulate the full gradients.
        If KVStore is 'local' or None, the full gradients will be accumulated locally without pushing
        to the KVStore. Otherwise, additional keys will be pushed to accumulate the full gradients
        in the KVStore.

        Parameters
        ----------
        kvstore : str or KVStore
            Default `'local'`.
        optimizer : str or Optimizer
            Default `'sgd'`.
        optimizer_params : dict
            Default `(('learning_rate', 0.01),)`. The default value is not a dictionary,
            just to avoid pylint warning of dangerous default values.
        force_init : bool
            Default ``False``, indicating whether we should force re-initializing the
            optimizer in the case an optimizer is already installed.
        """
        # Init dict for storing the average of full gradients for each device.
        self._param_dict = [{key: mx.nd.zeros(shape=value.shape, ctx=self._context[i])
                             for key, value in self.get_params()[0].items()}
                            for i in range(self._ctx_len)]

        svrg_optimizer = self._create_optimizer(_SVRGOptimizer.__name__, default_opt=optimizer,
                                                kvstore=kvstore, optimizer_params=optimizer_params)

        super(SVRGModule, self).init_optimizer(kvstore=kvstore, optimizer=svrg_optimizer,
                                               optimizer_params=optimizer_params, force_init=force_init)

        # Init additional keys for accumulating full gradients in the KVStore.
        if self._kvstore:
            for idx, param_on_devs in enumerate(self._exec_group.param_arrays):
                name = self._exec_group.param_names[idx]
                self._kvstore.init(name + "_full", mx.nd.zeros(shape=self._arg_params[name].shape))
                if self._update_on_kvstore:
                    self._kvstore.pull(name + "_full", param_on_devs, priority=-idx)

    def _create_optimizer(self, optimizer, default_opt, kvstore, optimizer_params):
        """Helper function to create the SVRG optimizer. The SVRG optimizer encapsulates two
        optimizers and will redirect update() to the correct optimizer based on the key.

        Parameters
        ----------
        optimizer : str
            Name for the SVRGOptimizer.
        default_opt : str or Optimizer
            The regular optimizer that was passed in.
        kvstore : str or KVStore
            Default `'local'`.
        optimizer_params : dict
            Optimizer params that were passed in.
        """
        # Code below is mirrored from Module.init_optimizer, extracting similar logic
        # for batch_size and rescale_grad.
        batch_size = self._exec_group.batch_size

        (kv_store, update_on_kvstore) = mx.model._create_kvstore(kvstore, self._ctx_len, self._arg_params)
        if kv_store and 'dist' in kv_store.type and '_sync' in kv_store.type:
            batch_size *= kv_store.num_workers
        rescale_grad = 1.0 / batch_size

        idx2name = {}
        if update_on_kvstore:
            idx2name.update(enumerate(self._exec_group.param_names))
        else:
            for k in range(self._ctx_len):
                idx2name.update({i * self._ctx_len + k: n
                                 for i, n in enumerate(self._exec_group.param_names)})

        # Update idx2name to include the new keys for the accumulated full gradients.
        for key in self._param_dict[0].keys():
            max_key = max(list(idx2name.keys())) + 1
            idx2name[max_key] = key + "_full"

        optimizer_params = dict(optimizer_params)
        if 'rescale_grad' not in optimizer_params:
            optimizer_params['rescale_grad'] = rescale_grad
        optimizer_params["default_optimizer"] = default_opt
        optimizer_params["param_idx2name"] = idx2name
        optimizer = mx.optimizer.create(optimizer, **optimizer_params)

        return optimizer

    def bind(self, data_shapes, label_shapes=None, for_training=True,
             inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write'):
        """Binds the symbols to construct executors for both modules. This is necessary before one
        can perform computation with the SVRGModule.

        Parameters
        ----------
        data_shapes : list of (str, tuple)
            Typically is ``data_iter.provide_data``.
        label_shapes : list of (str, tuple)
            Typically is ``data_iter.provide_label``.
        for_training : bool
            Default is ``True``. Whether the executors should be bound for training.
        inputs_need_grad : bool
            Default is ``False``. Whether the gradients to the input data need to be computed.
            Typically this is not needed. But this might be needed when implementing composition
            of modules.
        force_rebind : bool
            Default is ``False``. This function does nothing if the executors are already
            bound. But with this ``True``, the executors will be forced to rebind.
        shared_module : Module
            Default is ``None``. This is used in bucketing. When not ``None``, the shared module
            essentially corresponds to a different bucket -- a module with different symbol
            but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
        """
        # Force rebinding is typically used when one wants to switch from
        # training to prediction phase.
        super(SVRGModule, self).bind(data_shapes, label_shapes, for_training, inputs_need_grad,
                                     force_rebind, shared_module, grad_req)

        if for_training:
            self._mod_aux.bind(data_shapes, label_shapes, for_training, inputs_need_grad,
                               force_rebind, shared_module, grad_req)

    def forward(self, data_batch, is_train=None):
        """Forward computation for both modules. It supports data batches with different shapes,
        such as different batch sizes or different image sizes.
        If reshaping of the data batch relates to modification of the symbol or module, such as
        changing image layout ordering or switching from training to predicting, module rebinding
        is required.

        See Also
        ----------
        :meth:`BaseModule.forward`.

        Parameters
        ----------
        data_batch : DataBatch
            Could be anything with similar API implemented.
        is_train : bool
            Default is ``None``, which means ``is_train`` takes the value of ``self.for_training``.
        """
        super(SVRGModule, self).forward(data_batch, is_train)

        if is_train:
            self._mod_aux.forward(data_batch, is_train)

    def backward(self, out_grads=None):
        """Backward computation.

        See Also
        ----------
        :meth:`BaseModule.backward`.

        Parameters
        ----------
        out_grads : NDArray or list of NDArray, optional
            Gradient on the outputs to be propagated back. This parameter is only needed
            when bind is called on outputs that are not a loss function.
        """
        super(SVRGModule, self).backward(out_grads)

        if self._mod_aux.binded:
            self._mod_aux.backward()

    def update(self):
        """Updates parameters according to the installed optimizer and the gradients computed
        in the previous forward-backward batch. The gradients in the _exec_group will be
        overwritten using the gradients calculated by the SVRG update rule.

        When KVStore is used to update parameters for multi-device or multi-machine training,
        a copy of the parameters is stored in KVStore. Note that for `row_sparse` parameters,
        this function does update the copy of parameters in KVStore, but doesn't broadcast the
        updated parameters to all devices / machines. Please call `prepare` to broadcast
        `row_sparse` parameters with the next batch of data.

        See Also
        ----------
        :meth:`BaseModule.update`.
        """
        self._update_svrg_gradients()
        super(SVRGModule, self).update()

    def update_full_grads(self, train_data):
        """Computes the gradients over all data w.r.t weights of past m epochs.
        For distributed env, it will accumulate full grads in the kvstore.

        Parameters
        ----------
        train_data : DataIter
            Train data iterator.
        """
        param_names = self._exec_group.param_names
        arg, aux = self.get_params()
        self._mod_aux.set_params(arg_params=arg, aux_params=aux)
        train_data.reset()
        nbatch = 0
        padding = 0
        for batch in train_data:
            self._mod_aux.forward(batch, is_train=True)
            self._mod_aux.backward()
            nbatch += 1
            # Accumulate the per-batch gradients on every device.
            for ctx in range(self._ctx_len):
                for index, name in enumerate(param_names):
                    grads = self._mod_aux._exec_group.grad_arrays[index][ctx]
                    self._param_dict[ctx][name] = mx.nd.broadcast_add(self._param_dict[ctx][name],
                                                                      grads, axis=0)
            padding = batch.pad

        true_num_batch = nbatch - padding / train_data.batch_size
        for name in param_names:
            grad_list = []
            for i in range(self._ctx_len):
                self._param_dict[i][name] /= true_num_batch
                grad_list.append(self._param_dict[i][name])
            if self._kvstore:
                # If in distributed mode, push a list of gradients from each worker/device
                # to the KVStore.
                self._accumulate_kvstore(name, grad_list)

    def _accumulate_kvstore(self, key, value):
        """Accumulates full gradients in the KVStore. In a distributed setting, each worker
        sees a portion of data; the full gradients are aggregated from each worker in the KVStore.

        Parameters
        ----------
        key : int or str
            Key in the KVStore.
        value : NDArray, RowSparseNDArray
            Average of the full gradients.
        """
        # Accumulate full gradients for the current epoch.
        self._kvstore.push(key + "_full", value)
        self._kvstore._barrier()
        self._kvstore.pull(key + "_full", value)

        self._allocate_gradients(key, value)

    def _allocate_gradients(self, key, value):
        """Allocates the average of the full gradients accumulated in the KVStore to each device.

        Parameters
        ----------
        key : int or str
            Key in the KVStore.
        value : list of NDArray, list of RowSparseNDArray
            A list of the averaged full gradients in the KVStore.
        """
        for i in range(self._ctx_len):
            self._param_dict[i][key] = value[i] / self._ctx_len

    def _svrg_grads_update_rule(self, g_curr_batch_curr_weight, g_curr_batch_special_weight,
                                g_special_weight_all_batch):
        """Calculates the gradient based on the SVRG update rule.

        Parameters
        ----------
        g_curr_batch_curr_weight : NDArray
            Gradients of current weight of self.mod w.r.t current batch of data.
        g_curr_batch_special_weight : NDArray
            Gradients of the weight of past m epochs of self._mod_aux w.r.t current batch of data.
        g_special_weight_all_batch : NDArray
            Average of full gradients over a full pass of data.

        Returns
        -------
        Gradients calculated for the weights of self.mod, based on the SVRG update rule:
        grads = g_curr_batch_curr_weight - g_curr_batch_special_weight + g_special_weight_all_batch
        """
        for index, grad in enumerate(g_curr_batch_curr_weight):
            grad -= g_curr_batch_special_weight[index]
            grad += g_special_weight_all_batch[index]
        return g_curr_batch_curr_weight

    def _update_svrg_gradients(self):
        """Calculates gradients based on the SVRG update rule."""
        param_names = self._exec_group.param_names
        for ctx in range(self._ctx_len):
            for index, name in enumerate(param_names):
                g_curr_batch_reg = self._exec_group.grad_arrays[index][ctx]
                g_curr_batch_special = self._mod_aux._exec_group.grad_arrays[index][ctx]
                g_special_weight_all_batch = self._param_dict[ctx][name]
                g_svrg = self._svrg_grads_update_rule(g_curr_batch_reg, g_curr_batch_special,
                                                      g_special_weight_all_batch)
                self._exec_group.grad_arrays[index][ctx] = g_svrg

    def fit(self, train_data, eval_data=None, eval_metric='acc',
            epoch_end_callback=None, batch_end_callback=None, kvstore='local',
            optimizer='sgd', optimizer_params=(('learning_rate', 0.01),),
            eval_end_callback=None, eval_batch_end_callback=None,
            initializer=mx.init.Uniform(0.01), arg_params=None, aux_params=None,
            allow_missing=False, force_rebind=False, force_init=False,
            begin_epoch=0, num_epoch=None, validation_metric=None,
            monitor=None, sparse_row_id_fn=None):
        """Trains the module parameters.

        Parameters
        ----------
        train_data : DataIter
            Train DataIter.
        eval_data : DataIter
            If not ``None``, will be used as a validation set and the performance
            after each epoch will be evaluated.
        eval_metric : str or EvalMetric
            Defaults to 'accuracy'. The performance measure used to display during training.
        epoch_end_callback : function or list of functions
            Each callback will be called with the current `epoch`, `symbol`, `arg_params`
            and `aux_params`.
        batch_end_callback : function or list of functions
            Each callback will be called with a `BatchEndParam`.
        kvstore : str or KVStore
            Defaults to 'local'.
        optimizer : str or Optimizer
            Defaults to 'sgd'.
        optimizer_params : dict
            Defaults to ``(('learning_rate', 0.01),)``. The parameters for the optimizer
            constructor. The default value is not a dict, just to avoid pylint warning on
            dangerous default values.
        eval_end_callback : function or list of functions
            These will be called at the end of each full evaluation, with the metrics over
            the entire evaluation set.
        eval_batch_end_callback : function or list of functions
            These will be called at the end of each mini-batch during evaluation.
        initializer : Initializer
            The initializer is called to initialize the module parameters when they are
            not already initialized.
        arg_params : dict
            Defaults to ``None``. If not ``None``, should be existing parameters from a
            trained model or loaded from a checkpoint. `arg_params` has a higher priority
            than `initializer`.
        aux_params : dict
            Defaults to ``None``. Similar to `arg_params`, except for auxiliary states.
        allow_missing : bool
            Defaults to ``False``. Indicates whether to allow missing parameters when
            `arg_params` and `aux_params` are not ``None``. If ``True``, the missing
            parameters will be initialized via the `initializer`.
        force_rebind : bool
            Defaults to ``False``. Whether to force rebinding the executors if already bound.
        force_init : bool
            Defaults to ``False``. Indicates whether to force initialization even if the
            parameters are already initialized.
        begin_epoch : int
            Defaults to 0. Indicates the starting epoch. Usually, if resumed from a
            checkpoint saved at epoch N, this value should be N+1.
        num_epoch : int
            Number of epochs for training.
        sparse_row_id_fn : A callback function
            The function takes `data_batch` as an input and returns a dict of
            str -> NDArray. The resulting dict is used for pulling row_sparse
            parameters from the kvstore, where the str key is the name of the param,
            and the value is the row id of the param to pull.
        validation_metric : str or EvalMetric
            The performance measure used to display during validation.
        """
        assert num_epoch is not None, 'please specify number of epochs'

        self.bind(data_shapes=train_data.provide_data, label_shapes=train_data.provide_label,
                  for_training=True, force_rebind=force_rebind)

        if monitor is not None:
            self.install_monitor(monitor)

        self.init_params(initializer=initializer, arg_params=arg_params, aux_params=aux_params,
                         allow_missing=allow_missing, force_init=force_init)
        self.init_optimizer(kvstore=kvstore, optimizer=optimizer, optimizer_params=optimizer_params)

        if validation_metric is None:
            validation_metric = eval_metric
        if not isinstance(eval_metric, mx.metric.EvalMetric):
            eval_metric = mx.metric.create(eval_metric)

        # training loop
        for epoch in range(begin_epoch, num_epoch):
            eval_metric.reset()
            tic = time.time()
            # Refresh the snapshot weights and the full gradient every `update_freq` epochs.
            if epoch % self.update_freq == 0:
                self.update_full_grads(train_data)

            train_data.reset()
            data_iter = iter(train_data)
            end_of_batch = False
            nbatch = 0
            next_data_batch = next(data_iter)

            while not end_of_batch:
                data_batch = next_data_batch
                if monitor is not None:
                    monitor.tic()

                self.forward_backward(data_batch)
                self.update()

                if isinstance(data_batch, list):
                    self.update_metric(eval_metric, [db.label for db in data_batch], pre_sliced=True)
                else:
                    self.update_metric(eval_metric, data_batch.label)

                try:
                    # pre-fetch next batch
                    next_data_batch = next(data_iter)
                    self.prepare(next_data_batch, sparse_row_id_fn=sparse_row_id_fn)
                except StopIteration:
                    end_of_batch = True

                if monitor is not None:
                    monitor.toc_print()

                if end_of_batch:
                    eval_name_vals = eval_metric.get_name_value()

                if batch_end_callback is not None:
                    batch_end_params = mx.model.BatchEndParam(epoch=epoch, nbatch=nbatch,
                                                              eval_metric=eval_metric,
                                                              locals=locals())
                    for callback in mx.base._as_list(batch_end_callback):
                        callback(batch_end_params)
                nbatch += 1

            # one epoch of training is finished
            for name, val in eval_name_vals:
                self.logger.info('Epoch[%d] Train-%s=%f', epoch, name, val)
            toc = time.time()
            self.logger.info('Epoch[%d] Time cost=%.3f', epoch, (toc - tic))

            # sync aux params across devices
            arg_params, aux_params = self.get_params()
            self.set_params(arg_params, aux_params)

            if epoch_end_callback is not None:
                for callback in mx.base._as_list(epoch_end_callback):
                    callback(epoch, self.symbol, arg_params, aux_params)

            # evaluation on validation set
            if eval_data:
                res = self.score(eval_data, validation_metric,
                                 score_end_callback=eval_end_callback,
                                 batch_end_callback=eval_batch_end_callback, epoch=epoch)
                for name, val in res:
                    self.logger.info('Epoch[%d] Validation-%s=%f', epoch, name, val)

    def prepare(self, data_batch, sparse_row_id_fn=None):
        """Prepares two modules for processing a data batch.

        Usually involves switching a bucket and reshaping.
        For modules that contain `row_sparse` parameters in KVStore,
        it prepares the `row_sparse` parameters based on the sparse_row_id_fn.

        When KVStore is used to update parameters for multi-device or multi-machine training,
        a copy of the parameters is stored in KVStore. Note that for `row_sparse` parameters,
        the `update()` updates the copy of parameters in KVStore, but doesn't broadcast the
        updated parameters to all devices / machines. The `prepare` function is used to
        broadcast `row_sparse` parameters with the next batch of data.

        Parameters
        ----------
        data_batch : DataBatch
            The current batch of data for forward computation.
        sparse_row_id_fn : A callback function
            The function takes `data_batch` as an input and returns a dict of
            str -> NDArray. The resulting dict is used for pulling row_sparse
            parameters from the kvstore, where the str key is the name of the param,
            and the value is the row id of the param to pull.
        """
        super(SVRGModule, self).prepare(data_batch, sparse_row_id_fn=sparse_row_id_fn)
        self._mod_aux.prepare(data_batch, sparse_row_id_fn=sparse_row_id_fn)
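
# ---------------------------------------------------------------------------
# Usage sketch (illustrative, not part of the library): trains an SVRGModule
# on synthetic linear-regression data, expanding the example from the class
# docstring above. The synthetic data, symbol names ('data', 'lin_reg_label',
# 'fc', 'lro') and hyperparameters below are assumptions made for this demo.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    import numpy as np

    # Synthetic regression problem: y = 2x + 1 with a little noise.
    train_x = np.random.rand(1000, 1).astype('float32')
    train_y = (2 * train_x[:, 0] + 1 + 0.01 * np.random.randn(1000)).astype('float32')
    di = mx.io.NDArrayIter(train_x, train_y, batch_size=32, shuffle=True,
                           label_name='lin_reg_label')

    # A one-layer linear-regression network.
    data = mx.sym.Variable('data')
    label = mx.sym.Variable('lin_reg_label')
    fc = mx.sym.FullyConnected(data=data, num_hidden=1, name='fc')
    lro = mx.sym.LinearRegressionOutput(data=fc, label=label, name='lro')

    # update_freq=2: the snapshot weights and the full gradient are refreshed
    # every two epochs; in between, each update uses the variance-reduced
    # gradient g_i(w) - g_i(w_snapshot) + full_grad (see _svrg_grads_update_rule).
    mod = SVRGModule(symbol=lro, data_names=['data'],
                     label_names=['lin_reg_label'], update_freq=2)
    mod.fit(di, eval_metric='mse', optimizer='sgd',
            optimizer_params=(('learning_rate', 0.025),),
            num_epoch=10, kvstore='local')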