"""Parameter optimizer."""
__all__ = ['Trainer']

from .. import optimizer as opt
from ..model import _create_kvstore, _create_sparse_kvstore
from .parameter import ParameterDict, Parameter


class Trainer(object):
    """Applies an `Optimizer` on a set of Parameters. Trainer should
    be used together with `autograd`.

    .. note::

        For the following cases, updates will always happen on kvstore,
        i.e., you cannot set update_on_kvstore=False.

        - dist kvstore with sparse weights or sparse gradients
        - dist async kvstore
        - `optimizer.lr_scheduler` is not None

    Parameters
    ----------
    params : ParameterDict
        The set of parameters to optimize.
    optimizer : str or Optimizer
        The optimizer to use. See help on Optimizer for a list of available
        optimizers.
    optimizer_params : dict
        Key-word arguments to be passed to the optimizer constructor. For example,
        `{'learning_rate': 0.1}`. All optimizers accept learning_rate, wd (weight decay),
        clip_gradient, and lr_scheduler. See each optimizer's constructor for a list
        of additional supported arguments.
    kvstore : str or KVStore
        kvstore type for multi-gpu and distributed training. See help on
        :any:`mxnet.kvstore.create` for more information.
    compression_params : dict
        Specifies the type of gradient compression and additional arguments depending
        on the type of compression being used. For example, 2bit compression requires
        a threshold. Arguments would then be {'type':'2bit', 'threshold':0.5}.
        See the mxnet.KVStore.set_gradient_compression method for more details on
        gradient compression.
    update_on_kvstore : bool, default None
        Whether to perform parameter updates on kvstore. If None, then the trainer
        will choose the more suitable option depending on the type of kvstore. If the
        `update_on_kvstore` argument is provided, the environment variable
        `MXNET_UPDATE_ON_KVSTORE` will be ignored.

    Properties
    ----------
    learning_rate : float
        The current learning rate of the optimizer. Given an Optimizer object
        `optimizer`, its learning rate can be accessed as `optimizer.learning_rate`.
    """
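
    # A minimal usage sketch of the autograd workflow described in the docstring
    # above. `net`, `loss_fn`, `data`, and `label` are hypothetical placeholders,
    # not part of this module; the snippet is illustrative only:
    #
    #     from mxnet import autograd, gluon
    #
    #     trainer = gluon.Trainer(net.collect_params(), 'sgd',
    #                             {'learning_rate': 0.1})
    #     with autograd.record():
    #         loss = loss_fn(net(data), label)
    #     loss.backward()
    #     trainer.step(batch_size=data.shape[0])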
    def __init__(self, params, optimizer, optimizer_params=None, kvstore='device',
                 compression_params=None, update_on_kvstore=None):
        if isinstance(params, (dict, ParameterDict)):
            params = list(params.values())
        if not isinstance(params, (list, tuple)):
            raise ValueError(
                "First argument must be a list or dict of Parameters, "
                "got %s." % (type(params)))
        self._params = []
        self._contains_sparse_weight = False
        self._contains_sparse_grad = False
        self._param2idx = {}
        for i, param in enumerate(params):
            if not isinstance(param, Parameter):
                raise ValueError(
                    "First argument must be a list or dict of Parameters, "
                    "got list of %s." % (type(param)))
            self._param2idx[param.name] = i
            self._params.append(param)
            param._set_trainer(self)
            if param._stype != 'default':
                self._contains_sparse_weight = True
            if param._grad_stype != 'default':
                self._contains_sparse_grad = True
        self._compression_params = compression_params
        self._contexts = self._check_contexts()
        optimizer_params = optimizer_params if optimizer_params else {}
        self._init_optimizer(optimizer, optimizer_params)
        self._scale = self._optimizer.rescale_grad
        self._kvstore_params = {'kvstore': kvstore, 'update_on_kvstore': update_on_kvstore}
        self._kv_initialized = False
        self._kvstore = None
        self._update_on_kvstore = None
        self._distributed = None
        self._params_to_init = []
        self._reset_kvstore()

    def _check_contexts(self):
        contexts = None
        for param in self._params:
            ctx = param.list_ctx()
            assert contexts is None or contexts == ctx, \
                "All Parameters must be initialized on the same set of contexts, " \
                "but Parameter %s is initialized on %s while previous Parameters " \
                "are initialized on %s." % (param.name, str(ctx), str(contexts))
            contexts = ctx
        return contexts

    def _init_optimizer(self, optimizer, optimizer_params):
        param_dict = {i: param for i, param in enumerate(self._params)}
        if isinstance(optimizer, opt.Optimizer):
            assert not optimizer_params, \
                "optimizer_params must be None if optimizer is an instance of " \
                "Optimizer instead of str"
            self._optimizer = optimizer
            # param_dict must not be deep-copied, so that user changes to lr_mult
            # or wd_mult of individual parameters take effect.
            self._optimizer.param_dict = param_dict
        else:
            self._optimizer = opt.create(optimizer, param_dict=param_dict,
                                         **optimizer_params)

        self._updaters = [opt.get_updater(self._optimizer)
                          for _ in self._contexts]

    def _init_params(self):
        """Initialize parameters in the KVStore.

        Parameters with incomplete initialization are ignored.
        """
        assert self._kv_initialized, "Cannot initialize parameters in KVStore " \
                                     "when KVStore is not initialized."
        params_to_init = []
        if self._kvstore:
            for param in self._params_to_init:
                if param._deferred_init:
                    params_to_init.append(param)
                else:
                    param_arrays = param._check_and_get(param._data, list)
                    idx = self._param2idx[param.name]
                    self._kvstore.init(idx, param_arrays[0])
                    if param._stype == 'default':
                        self._kvstore.pull(idx, param_arrays, priority=-idx)

        self._params_to_init = params_to_init

    def _reset_kvstore(self):
        """Reset kvstore."""
        if self._kvstore and 'dist' in self._kvstore.type:
            raise RuntimeError("Cannot reset distributed KVStore.")
        self._kv_initialized = False
        self._kvstore = None
        self._distributed = None
        self._update_on_kvstore = None
        self._params_to_init = [param for param in self._params]

    def _init_kvstore(self):
        """Create kvstore."""
        config = self._kvstore_params
        if self._contains_sparse_weight:
            # Sparse weights must be updated on the kvstore.
            kvstore, update_on_kvstore = _create_sparse_kvstore(config['kvstore'])
            self._distributed = 'dist' in kvstore.type
            if config['update_on_kvstore'] is False:
                raise ValueError("Cannot set update_on_kvstore=False when sparse "
                                 "weights are present.")
        elif self._contains_sparse_grad:
            # Dense weights with sparse gradients: updates default to the workers,
            # except with a distributed kvstore, which requires updates on kvstore.
            arg_arrays = {param.name: param.data(self._contexts[0])
                          for param in self._params}
            kvstore, _ = _create_kvstore(config['kvstore'], len(self._contexts), arg_arrays)
            self._distributed = 'dist' in kvstore.type if kvstore else False
            update_on_kvstore = self._distributed
            if config['update_on_kvstore'] is not None:
                if config['update_on_kvstore'] is False and self._distributed:
                    raise ValueError("Cannot set update_on_kvstore=False on dist kvstore "
                                     "when sparse gradients are present.")
                update_on_kvstore = config['update_on_kvstore']
        else:
            # Dense weights and dense gradients.
            arg_arrays = {param.name: param.data(self._contexts[0])
                          for param in self._params}
            kvstore, update_on_kvstore = _create_kvstore(config['kvstore'],
                                                         len(self._contexts), arg_arrays)
            self._distributed = 'dist' in kvstore.type if kvstore else False
            if self._distributed and 'async' in kvstore.type:
                update_on_kvstore = True
                if config['update_on_kvstore'] is False:
                    raise ValueError("Please set update_on_kvstore=True "
                                     "when training in async mode.")
            if config['update_on_kvstore'] is not None:
                update_on_kvstore = config['update_on_kvstore']

        if kvstore:
            if self._compression_params:
                kvstore.set_gradient_compression(self._compression_params)
            if update_on_kvstore:
                # Register the optimizer with the kvstore so updates happen there.
                kvstore.set_optimizer(self._optimizer)
            self._kvstore = kvstore
            self._update_on_kvstore = update_on_kvstore
        else:
            self._kvstore = None
            self._update_on_kvstore = None

        self._kv_initialized = True

    @property
    def learning_rate(self):
        if not isinstance(self._optimizer, opt.Optimizer):
            raise UserWarning("Optimizer has to be defined before its learning "
                              "rate can be accessed.")
        return self._optimizer.learning_rate

    @property
    def optimizer(self):
        if isinstance(self._optimizer, opt.Optimizer):
            return self._optimizer
        raise UserWarning("Optimizer has not been initialized yet")
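
    # A hedged configuration sketch (illustrative values; `net` is a hypothetical
    # Block, not part of this module) showing how the kvstore-related constructor
    # arguments above are passed: multi-device training that keeps optimizer
    # updates on the workers instead of on the kvstore.
    #
    #     trainer = gluon.Trainer(net.collect_params(), 'adam',
    #                             {'learning_rate': 1e-3, 'wd': 1e-4},
    #                             kvstore='device', update_on_kvstore=False)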
    def set_learning_rate(self, lr):
        """Sets a new learning rate of the optimizer.

        Parameters
        ----------
        lr : float
            The new learning rate of the optimizer.
        """
        if not isinstance(self._optimizer, opt.Optimizer):
            raise UserWarning("Optimizer has to be defined before its learning "
                              "rate is mutated.")
        self._optimizer.set_learning_rate(lr)

    def _row_sparse_pull(self, parameter, out, row_id, full_idx=False):
        """Internal method to invoke pull operations on KVStore. If `full_idx` is set
        to True, `kv.pull` is preferred instead of `kv.row_sparse_pull`.
        """
        # Initialize kvstore and params if not already done.
        if not self._kv_initialized:
            self._init_kvstore()
        if self._params_to_init:
            self._init_params()
        idx = self._param2idx[parameter.name]
        if full_idx and 'dist' not in self._kvstore.type:
            assert row_id.size == out.shape[0]
            self._kvstore.pull(idx, out=out, priority=-idx, ignore_sparse=False)
        else:
            self._kvstore.row_sparse_pull(idx, out=out, row_ids=row_id, priority=-idx)

    def _check_and_rescale_grad(self, scale):
        if self._update_on_kvstore and self._distributed and self._kv_initialized:
            if self._optimizer.rescale_grad != scale:
                raise UserWarning('Possible change in the `batch_size` from previous '
                                  '`step` detected. Optimizer gradient normalizing '
                                  'factor will not change w.r.t new batch_size when '
                                  'update_on_kvstore=True and when distributed kvstore '
                                  'is used.')
        self._optimizer.rescale_grad = scale

    def step(self, batch_size, ignore_stale_grad=False):
        """Makes one step of parameter update. Should be called after
        `autograd.backward()` and outside of `record()` scope.

        For normal parameter updates, `step()` should be used, which internally calls
        `allreduce_grads()` and then `update()`. However, if you need to get the reduced
        gradients to perform certain transformation, such as in gradient clipping, then
        you may want to manually call `allreduce_grads()` and `update()` separately.

        Parameters
        ----------
        batch_size : int
            Batch size of data processed. Gradient will be normalized by `1/batch_size`.
            Set this to 1 if you normalized loss manually with `loss = mean(loss)`.
        ignore_stale_grad : bool, optional, default=False
            If true, ignores Parameters with stale gradient (gradient that has not
            been updated by `backward` after last step) and skips the update.
        """
        rescale_grad = self._scale / batch_size
        self._check_and_rescale_grad(rescale_grad)

        if not self._kv_initialized:
            self._init_kvstore()
        if self._params_to_init:
            self._init_params()

        self._allreduce_grads()
        self._update(ignore_stale_grad)

    def allreduce_grads(self):
        """For each parameter, reduce the gradients from different contexts.

        Should be called after `autograd.backward()`, outside of `record()` scope,
        and before `trainer.update()`.

        For normal parameter updates, `step()` should be used, which internally calls
        `allreduce_grads()` and then `update()`. However, if you need to get the reduced
        gradients to perform certain transformation, such as in gradient clipping, then
        you may want to manually call `allreduce_grads()` and `update()` separately.
        """
        if not self._kv_initialized:
            self._init_kvstore()
        if self._params_to_init:
            self._init_params()
        assert not (self._kvstore and self._update_on_kvstore), \
            'allreduce_grads() when parameters are updated on kvstore ' \
            'is not supported. Try setting `update_on_kvstore` ' \
            'to False when creating trainer.'

        self._allreduce_grads()

    def _allreduce_grads(self):
        if self._kvstore:
            for i, param in enumerate(self._params):
                if param.grad_req != 'null':
                    self._kvstore.push(i, param.list_grad(), priority=-i)
                    if not self._update_on_kvstore:
                        self._kvstore.pull(i, param.list_grad(), priority=-i,
                                           ignore_sparse=self._distributed)
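
    # A sketch of the manual allreduce/clip/update split that `step()` wraps, for
    # cases like gradient clipping (hedged: assumes the trainer was created with
    # update_on_kvstore=False, and that `net` and `batch_size` are hypothetical
    # placeholders; clipping here uses the stock gluon.utils.clip_global_norm
    # helper on the default-context gradients only):
    #
    #     trainer.allreduce_grads()
    #     grads = [p.grad() for p in net.collect_params().values()
    #              if p.grad_req != 'null']
    #     gluon.utils.clip_global_norm(grads, max_norm=1.0)
    #     trainer.update(batch_size)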
    def update(self, batch_size, ignore_stale_grad=False):
        """Makes one step of parameter update.

        Should be called after `autograd.backward()` and outside of `record()` scope,
        and after `trainer.allreduce_grads()`.

        For normal parameter updates, `step()` should be used, which internally calls
        `allreduce_grads()` and then `update()`. However, if you need to get the reduced
        gradients to perform certain transformation, such as in gradient clipping, then
        you may want to manually call `allreduce_grads()` and `update()` separately.

        Parameters
        ----------
        batch_size : int
            Batch size of data processed. Gradient will be normalized by `1/batch_size`.
            Set this to 1 if you normalized loss manually with `loss = mean(loss)`.
        ignore_stale_grad : bool, optional, default=False
            If true, ignores Parameters with stale gradient (gradient that has not
            been updated by `backward` after last step) and skips the update.
        """
        if not self._kv_initialized:
            self._init_kvstore()
        if self._params_to_init:
            self._init_params()
        assert not (self._kvstore and self._update_on_kvstore), \
            'update() when parameters are updated on kvstore ' \
            'is not supported. Try setting `update_on_kvstore` ' \
            'to False when creating trainer.'

        self._check_and_rescale_grad(self._scale / batch_size)
        self._update(ignore_stale_grad)

    def _update(self, ignore_stale_grad=False):
        updates = [[] for _ in self._updaters]

        for i, param in enumerate(self._params):
            if param.grad_req == 'null':
                continue

            if not ignore_stale_grad:
                for data in param._check_and_get(param._data, list):
                    if not data._fresh_grad:
                        raise UserWarning(
                            "Gradient of Parameter `%s` on context %s has not been "
                            "updated by backward since last `step`. This could mean "
                            "a bug in your model that made it only use a subset of "
                            "the Parameters (Blocks) for this iteration. If you are "
                            "intentionally only using a subset, call step with "
                            "ignore_stale_grad=True to suppress this warning and "
                            "skip updating of Parameters with stale gradient"
                            % (param.name, str(data.context)))

            if self._kvstore and self._update_on_kvstore:
                if param._stype == 'default':
                    # 'row_sparse' parameters are not pulled here; they are pulled
                    # on demand via `_row_sparse_pull`.
                    self._kvstore.pull(i, param.list_data(), priority=-i)
                continue

            for upd, arr, grad in zip(updates, param.list_data(), param.list_grad()):
                if not ignore_stale_grad or arr._fresh_grad:
                    upd.append((i, grad, arr))
                    arr._fresh_grad = False

        if not (self._kvstore and self._update_on_kvstore):
            for updater, upd in zip(self._updaters, updates):
                if upd:
                    i, w, g = zip(*upd)
                    updater(i, w, g)

    def save_states(self, fname):
        """Saves trainer states (e.g. optimizer, momentum) to a file.

        Parameters
        ----------
        fname : str
            Path to output states file.

        Note
        ----
        `optimizer.param_dict`, which contains Parameter information (such as
        `lr_mult` and `wd_mult`), will not be saved.
        """
        assert self._optimizer is not None

        if not self._kv_initialized:
            self._init_kvstore()
        if self._params_to_init:
            self._init_params()

        if self._update_on_kvstore:
            assert not self._params_to_init, "Cannot save trainer states when some " \
                                             "parameters are not yet initialized in kvstore."
            self._kvstore.save_optimizer_states(fname, dump_optimizer=True)
        else:
            with open(fname, 'wb') as fout:
                fout.write(self._updaters[0].get_states(dump_optimizer=True))

    def load_states(self, fname):
        """Loads trainer states (e.g. optimizer, momentum) from a file.

        Parameters
        ----------
        fname : str
            Path to input states file.

        Note
        ----
        `optimizer.param_dict`, which contains Parameter information (such as
        `lr_mult` and `wd_mult`), will not be loaded from the file, but rather set
        based on the current Trainer's parameters.
        """
        if not self._kv_initialized:
            self._init_kvstore()
        if self._params_to_init:
            self._init_params()

        if self._update_on_kvstore:
            self._kvstore.load_optimizer_states(fname)
            self._optimizer = self._kvstore._updater.optimizer
        else:
            with open(fname, 'rb') as f:
                states = f.read()
            for updater in self._updaters:
                updater.set_states(states)
                updater.optimizer = self._updaters[0].optimizer
            self._optimizer = self._updaters[0].optimizer
        param_dict = {i: param for i, param in enumerate(self._params)}
        self._optimizer.param_dict = param_dict
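
# A short checkpoint sketch using `save_states`/`load_states` above (the file
# name is illustrative only, not part of this module):
#
#     trainer.save_states('model.trainer.states')
#     # ... later, e.g. when resuming training ...
#     trainer.load_states('model.trainer.states')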