v]c@sdZddlZddlZddlZddlZddlZddlZddlmZddl m Z m Z m Z m Z mZmZmZmZmZddl mZmZmZmZmZmZmZmZmZmZmZmZm Z m!Z!m"Z"m#Z#m$Z$m%Z%ddl m&Z&ddl'm(Z(d d d d d dddddddddddddddddgZ)dZ*de+fd YZ,e,j-Z-e-de,fd!YZ.e-de,fd"YZ/e-de,fd#YZ0e-de,fd$YZ1e-d e,fd%YZ2e-de,fd&YZ3e-de,fd'YZ4e-de.fd(YZ5e-d e,fd)YZ6e-d e,fd*YZ7e-de,fd+YZ8e-d e,fd,YZ9e-de,fd-YZ:e-d e,fd.YZ;e-de,fd/YZ<e-de,fd0YZ=e,j>Z?de+fd1YZ@d2ZAdS(3sWeight updating functions.iNi(tpy_str( tNDArraytzerostcliptsqrttcasttmaximumtabstarraytmultiply(t sgd_updatetsgd_mom_updatet adam_updatetrmsprop_updatetrmspropalex_updatet mp_sgd_updatetmp_sgd_mom_updatetsquaret ftrl_updatet ftml_updatetsignsgd_updatet signum_updatetnag_mom_updatetmp_nag_mom_updatetmulti_sgd_updatetmulti_sgd_mom_updatetmulti_mp_sgd_updatetmulti_mp_sgd_mom_update(tsparse(tnormaltAdaDeltatAdaGradtAdamtAdamaxtDCASGDtFTMLtFtrltLBSGDtNAGtNDabstNadamt OptimizertRMSProptSGDtSGLDtSignumtTesttUpdatertccSGDtcreatet get_updatertregistercCs$g|D]}|D] }|^qqS(N((t nested_listtsublisttitem((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyt _flatten_list+sc BseZdZddddddddedd ZiZedZedZ e dZ d Z d Z d Zd Zd ZdZdZdZdZdZdZdZdZdZdZdZRS(sThe base class inherited by all optimizers. Parameters ---------- rescale_grad : float, optional, default 1.0 Multiply the gradient with `rescale_grad` before updating. Often choose to be ``1.0/batch_size``. param_idx2name : dict from int to string, optional, default None A dictionary that maps int index to string name. clip_gradient : float, optional, default None Clip the gradient by projecting onto the box ``[-clip_gradient, clip_gradient]``. learning_rate : float, optional, default 0.01 The initial learning rate. lr_scheduler : LRScheduler, optional, default None The learning rate scheduler. wd : float, optional, default 0.0 The weight decay (or L2 regularization) coefficient. Modifies objective by adding a penalty for having large weights. sym: Symbol, optional, default None The Symbol this optimizer is applying to. begin_num_update : int, optional, default 0 The initial number of updates. multi_precision : bool, optional, default False Flag to control the internal precision of the optimizer. False: results in using the same precision as the weights (default), True: makes internal 32-bit copy of the weights and applies gradients in 32-bit precision even if actual weights used in the model have lower precision. Turning this on can improve convergence and accuracy when training with float16. param_dict : dict of int -> gluon.Parameter, default None Dictionary of parameter index to gluon.Parameter, used to lookup parameter attributes such as lr_mult, wd_mult, etc. param_dict shall not be deep copied. Properties ---------- learning_rate : float The current learning rate of the optimizer. Given an Optimizer object optimizer, its learning rate can be accessed as optimizer.learning_rate. g?gg{Gz?ic Cs=||_||_||_|dk r6||j_n||_i|_i|_||_||_ iid6|_ |j d|_ ||_ | |_ d|_|dkri}nt|tstd|j|_|dk r|j|jfnd|_| r| ni|_|ji|jidS(Nis:param_idx2name should be a dict of param indexes to names.((t rescale_gradtlrt lr_schedulertNonetbase_lrtwdtlr_multtwd_multtbegin_num_updatet num_updatet_all_index_update_countst_index_update_countt clip_gradienttmulti_precisiont aggregate_numt isinstancetdicttAssertionErrortcopytidx2namet attr_dicttlist_argumentstsym_infot param_dictt set_lr_multt set_wd_mult( tselfR8tparam_idx2nameR=RDt learning_rateR:tsymR@RERO((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyt__init__^s0               - cCs~t|tst|jj}|tjkrmtjd|j |jtj|j tj|jfn|tj|<|S(sRegisters a new optimizer. Once an optimizer is registered, we can create an instance of this optimizer with `create_optimizer` later. Examples -------- >>> @mx.optimizer.Optimizer.register ... class MyOptimizer(mx.optimizer.Optimizer): ... pass >>> optim = mx.optimizer.Optimizer.create_optimizer('MyOptimizer') >>> print(type(optim)) sCWARNING: New optimizer %s.%s is overriding existing optimizer %s.%s( RGttypeRIt__name__tlowerR)t opt_registrytwarningstwarnt __module__(tklasstname((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyR3s    cKs@|jtjkr,tj|j|Std|dS(sInstantiates an optimizer with a given name and kwargs. .. note:: We can use the alias `create` for ``Optimizer.create_optimizer``. Parameters ---------- name: str Name of the optimizer. Should be the name of a subclass of Optimizer. Case insensitive. kwargs: dict Parameters for the optimizer. Returns ------- Optimizer An instantiated optimizer. Examples -------- >>> sgd = mx.optimizer.Optimizer.create_optimizer('sgd') >>> type(sgd) >>> adam = mx.optimizer.create('adam', learning_rate=.1) >>> type(adam) sCannot find optimizer %sN(RYR)RZt ValueError(R_tkwargs((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pytcreate_optimizerscCs*|jdk r|j|jS|jSdS(N(R:R;RAR9(RR((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRTscCsdS(sTCreates auxiliary state for a given weight. Some optimizers require additional states, e.g. as momentum, in addition to gradients in order to update weights. This function creates state for a given weight which will be used in `update`. This function is called only once for each weight. Parameters ---------- index : int An unique index to identify the weight. weight : NDArray The weight. Returns ------- state : any obj The state associated with the weight. N((RRtindextweight((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyt create_statetcCsd}|jrM|jtjkrM|jtj}|f|j||fS|jtjkry|j rytj dn|j||S(sCreates auxiliary state for a given weight, including FP32 high precision copy if original weight is FP16. This method is provided to perform automatic mixed precision training for optimizers that do not support it themselves. Parameters ---------- index : int An unique index to identify the weight. weight : NDArray The weight. Returns ------- state : any obj The state associated with the weight. sAccumulating with float16 in optimizer can lead to poor accuracy or slow convergence. Consider using multi_precision=True option of the optimizerN( R;REtdtypetnumpytfloat16tastypetfloat32ReR[R\(RRRcRdtweight_master_copy((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pytcreate_state_multi_precisionscCs tdS(sXUpdates the given parameter using the corresponding gradient and state. Parameters ---------- index : int The unique index of the parameter into the individual learning rates and weight decays. Learning rates and weight decay may be set via `set_lr_mult()` and `set_wd_mult()`, respectively. weight : NDArray The parameter to be updated. grad : NDArray The gradient of the objective with respect to this parameter. state : any obj The state returned by `create_state()`. N(tNotImplementedError(RRRcRdtgradtstate((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pytupdatescCs|jrs|jtjkrs|d}|d}|jtj}|j||||t|d|jd|n|j||||dS(syUpdates the given parameter using the corresponding gradient and state. Mixed precision version. Parameters ---------- index : int The unique index of the parameter into the individual learning rates and weight decays. Learning rates and weight decay may be set via `set_lr_mult()` and `set_wd_mult()`, respectively. weight : NDArray The parameter to be updated. grad : NDArray The gradient of the objective with respect to this parameter. state : any obj The state returned by `create_state()`. iiRgtoutN(RERgRhRiRjRkRqR(RRRcRdRoRpRltoriginal_statetgrad32((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pytupdate_multi_precision s  cCs+|jdk rtdn ||_dS(sSets a new learning rate of the optimizer. Parameters ---------- lr : float The new learning rate of the optimizer. sLRScheduler of the optimizer has already been defined. Note that set_learning_rate can mutate the value of the learning rate of the optimizer only when the LRScheduler of the optimizer is undefined.N(R:R;t UserWarningR9(RRR9((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pytset_learning_rate%scCs tdS(s4[DEPRECATED] Sets lr scale. Use set_lr_mult instead.N(tDeprecationWarning(RRt args_lrscale((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyt set_lr_scale6scCsi|_|jro|j\}}xK|D]@}||kr(d||kr(t||d|j|RNtfloatRq(RRt args_lr_multtattrt arg_namesR_((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRP:s   %cCsi|_xE|jjD]4}|jdp:|jdsd|j|RKtget(RRtindicesR9t_tlrstiRc((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyt_get_lrss  -cCs|j|gdS(s Gets the learning rate given the index of the weight. Parameters ---------- index : int The index corresponding to the weight. Returns ------- lr : float Learning rate for this index. i(R(RRRc((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyt_get_lrs cCsg|D]}|j^q}xt|D]\}}||jkra||c|j|j9cKs5tt|j|||_||_||_dS(N(RR#RVtbeta1tbeta2tepsilon(RRRRRRa((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRVs  cCsUt|j|jd|jt|j|jd|jt|j|jd|jfS(NRg(RRRRg(RRRcRd((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyResc Cst|tstt|ts*t|j||j|}|j|}|j|}i|jd6|jd6|j d6|j d6|d6}|j r|j |dcKsDtt|jd||||_||_||_||_dS(NRT(RR RVRRRR(RRRTRRRRRa((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRVs    cCs^|jr|jnd}t|j|jd|jd|t|j|jd|jd|fS(NRRgR(RRRRRRg(RRRcRdR((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRes c Cs#t|tstt|ts*t|j||j|}|j|}|j|}d|j|}d|j|} |t j | |9}i|jd6|jd6|j d6|j d6} |j r|j | dcKs#tt|j|||_dS(N(RRRVtfloat_stable_eps(RRtepsRa((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRVscCst|j|jd|jS(NR(RRRR(RRRcRd((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyResc CsJt|tstt|ts*t|j||j|}|j|}|jdk}|}|ri|jd6|jd6} |j r|j | dcKsMtt|jd||||_||_||_||_||_dS(NRT(RR*RVtgamma1tgamma2tcenteredRt clip_weights(RRRTRRRRRRa((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRV+s     cCs|jr^t|j|jd|jt|j|jd|jt|j|jd|jfSt|j|jd|jfSdS(NR(RRRRR(RRRcRd((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRe4s  c Cs<t|tstt|ts*t|j||j|}|j|}i|jd6|jd6|jd6}|j r|j |dcKs,tt|j|||_||_dS(N(RRRVtrhoR(RRRRRa((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRVos cCs(t|j|jt|j|jfS(N(RRR(RRRcRd((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRetsc Cst|tstt|ts*t|j|}|j|||j9}|jdk r~t||j |j}n|\}}||j 9(|d|j ||7(t ||j t ||j |}||j 9(|d|j ||7(||||8(dS(Ng?( RGRRIRRR8RDR;RRRR( RRRcRdRoRpR=tacc_gt acc_deltat current_delta((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRqxs   ((RXR]RRVReRq(((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRVs cBs2eZdZddddZdZdZRS(sThe Ftrl optimizer. Referenced from *Ad Click Prediction: a View from the Trenches*, available at http://dl.acm.org/citation.cfm?id=2488200. eta : .. math:: \eta_{t,i} = \frac{learningrate}{\beta+\sqrt{\sum_{s=1}^tg_{s,i}^2}} The optimizer updates the weight by:: rescaled_grad = clip(grad * rescale_grad, clip_gradient) z += rescaled_grad - (sqrt(n + rescaled_grad**2) - sqrt(n)) * weight / learning_rate n += rescaled_grad**2 w = (sign(z) * lamda1 - z) / ((beta + sqrt(n)) / learning_rate + wd) * (abs(z) > lamda1) If the storage types of weight, state and grad are all ``row_sparse``, **sparse updates** are applied by:: for row in grad.indices: rescaled_grad[row] = clip(grad[row] * rescale_grad, clip_gradient) z[row] += rescaled_grad[row] - (sqrt(n[row] + rescaled_grad[row]**2) - sqrt(n[row])) * weight[row] / learning_rate n[row] += rescaled_grad[row]**2 w[row] = (sign(z[row]) * lamda1 - z[row]) / ((beta + sqrt(n[row])) / learning_rate + wd) * (abs(z[row]) > lamda1) The sparse update only updates the z and n for the weights whose row_sparse gradient indices appear in the current batch, rather than updating it for all indices. Compared with the original update, it can provide large improvements in model training throughput for some applications. However, it provides slightly different semantics than the original update, and may lead to different empirical results. For details of the update algorithm, see :class:`~mxnet.ndarray.ftrl_update`. This optimizer accepts the following parameters in addition to those accepted by :class:`.Optimizer`. Parameters ---------- lamda1 : float, optional L1 regularization coefficient. learning_rate : float, optional The initial learning rate. beta : float, optional Per-coordinate learning rate correlation parameter. g{Gz?g?icKs5tt|j|||_||_||_dS(N(RR$RVtlamda1tbetaR9(RRRRTRRa((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRVs  cCs:t|j|jd|jt|j|jd|jfS(NR(RRRR(RRRcRd((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyResc Cst|tstt|ts*t|j||j|}|j|}i|jd6|jd6|jd6}|j r|j |dgMbp?cKsMtt|jd||||_||_||_||_d|_dS(NRTg?(RR(RVRRRtschedule_decayt m_schedule(RRRTRRRRRa((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRV-s     cCs:t|j|jd|jt|j|jd|jfS(NRg(RRRRg(RRRcRd((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRe6scCst|tstt|ts*t|j||j|}|j|}|j|}||j||}|jdk rt ||j |j}n|j ddt d||j }|j ddt d|d|j } |j||_|j| } |\} } | |j 9(| d|j |7(| |j9(| d|j||7(|d|j} | d| }| dt |j|}d|| | |}|||t||j8(dS(Ng?g?gQ?i(RGRRIRRRRCR8RDR;RRtpowRRRRR(RRRcRdRoRpR9R=Rt momentum_tt momentum_t_1tm_schedule_nextRtv_tt grad_primet m_t_primet v_t_primetm_t_bar((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRq:s.  %)  (RXR]RRVReRq(((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyR(s    cBs)eZdZdZdZdZRS(sThe Test optimizercKstt|j|dS(N(RR.RV(RRRa((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRV`scCst|j|jS(s$Creates a state to duplicate weight.(RRR(RRRcRd((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRecscCs|||j7(||(dS(s"Performs w += rescale_grad * grad.N(R8(RRRcRdRoRp((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRqgs(RXR]RRVReRq(((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyR.]s  cBs>eZdZdZdZdZdZedZRS(sUpdater for kvstore.cCs1||_i|_i|_|jdk|_dS(Ni(t optimizerRt states_syncedRFtaggregate_updates(RRR((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRVqs   cCst|ttfs3|g}|g}|g}n|}|}|}|rh|jj|djjnxt|D]\}}t|trt |||<||}n||j kr|jj ||||j |sN(RGRt as_in_contextRR(RRRpRt synced_state((RRRs:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyRs   cCsptj|}t|trEt|dkrE|\|_|_n ||_tj|jj t |_ dS(sSets updater states.iN( tpickletloadsRGRRRRRHtfromkeystkeysRR(RRR((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyt set_statess ! cCs(tj|r|j|jfn|jS(sGets updater states. Parameters ---------- dump_optimizer : bool, default False Whether to also save the optimizer itself. This would also save optimizer information such as learning rate and weight decay schedules. (R"tdumpsRR(RRtdump_optimizer((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyt get_statess ( RXR]RRVRRR&RR)(((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyR/os   2 cCs t|S(sReturns a closure of the updater needed for kvstore. Parameters ---------- optimizer: Optimizer The optimizer. Returns ------- updater: function The closure of the updater. (R/(R((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pyR2s (BRRRR"R[RRhtbaseRtndarrayRRRRRRRR'RR R R R R RRRRRRRRRRRRRRRtrandomRt__all__R7tobjectR)R3R+R-R#R%R"R&R,R0R RR*RR$R!R(R.RbR1R/R2(((s:/tmp/pip-install-Qvdv_2/mxnet/mxnet/optimizer/optimizer.pytsj      @v  B97MS:M;K8E [