R;]c@sdZddlZddlZddlZddlZddlmZddlmZm Z m Z m Z m Z m Z mZddlmZmZmZmZmZmZmZmZmZmZmZmZddlmZddlmZd efd YZ e j!Z!e!d e fd YZ"e!d e fdYZ#e!de fdYZ$e!de fdYZ%e!de fdYZ&e!de fdYZ'e!de fdYZ(e!de"fdYZ)e!de fdYZ*e!de fdYZ+e!de fd YZ,e!d!e fd"YZ-e!d#e fd$YZ.e!d%e fd&YZ/e!d'e fd(YZ0e!d)e fd*YZ1e j2Z3d+efd,YZ4d-Z5dS(.sWeight updating functions.iNi(tpy_str(tNDArraytzerostcliptsqrttcasttmaximumtabs( t sgd_updatetsgd_mom_updatet adam_updatetrmsprop_updatetrmspropalex_updatet mp_sgd_updatetmp_sgd_mom_updatetsquaret ftrl_updatet ftml_updatetsignsgd_updatet signum_update(tsparse(tnormalt Optimizerc BseZdZddddddddedd ZiZedZedZ e dZ d Z d Z d Zd Zd ZdZdZdZdZdZdZRS(s4The base class inherited by all optimizers. Parameters ---------- rescale_grad : float, optional Multiply the gradient with `rescale_grad` before updating. Often choose to be ``1.0/batch_size``. param_idx2name : dict from int to string, optional A dictionary that maps int index to string name. clip_gradient : float, optional Clip the gradient by projecting onto the box ``[-clip_gradient, clip_gradient]``. learning_rate : float, optional The initial learning rate. lr_scheduler : LRScheduler, optional The learning rate scheduler. wd : float, optional The weight decay (or L2 regularization) coefficient. Modifies objective by adding a penalty for having large weights. sym: Symbol, optional The Symbol this optimizer is applying to. begin_num_update : int, optional The initial number of updates. multi_precision : bool, optional Flag to control the internal precision of the optimizer. ``False`` results in using the same precision as the weights (default), ``True`` makes internal 32-bit copy of the weights and applies gradients in 32-bit precision even if actual weights used in the model have lower precision. Turning this on can improve convergence and accuracy when training with float16. Properties ---------- learning_rate : float The current learning rate of the optimizer. Given an Optimizer object optimizer, its learning rate can be accessed as optimizer.learning_rate. g?gg{Gz?ic Cs||_||_||_|dk r6||j_n||_i|_i|_||_||_ i|_ ||_ | |_ |dkri}nt |tstd|j|_|dk r|j|jfnd|_| r| ni|_|ji|jidS(Ns:param_idx2name should be a dict of param indexes to names.((t rescale_gradtlrt lr_schedulertNonetbase_lrtwdtlr_multtwd_multtbegin_num_updatet num_updatet_index_update_countt clip_gradienttmulti_precisiont isinstancetdicttAssertionErrortcopytidx2namet attr_dicttlist_argumentstsym_infot param_dictt set_lr_multt set_wd_mult( tselfRtparam_idx2nameRR"t learning_rateRtsymRR#R,((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyt__init__Ns,               - cCs~t|tst|jj}|tjkrmtjd|j |jtj|j tj|jfn|tj|<|S(sRegisters a new optimizer. Once an optimizer is registered, we can create an instance of this optimizer with `create_optimizer` later. Examples -------- >>> @mx.optimizer.Optimizer.register ... class MyOptimizer(mx.optimizer.Optimizer): ... pass >>> optim = mx.optimizer.Optimizer.create_optimizer('MyOptimizer') >>> print(type(optim)) sCWARNING: New optimizer %s.%s is overriding existing optimizer %s.%s( R$ttypeR&t__name__tlowerRt opt_registrytwarningstwarnt __module__(tklasstname((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pytregisterns    cKs@|jtjkr,tj|j|Std|dS(sInstantiates an optimizer with a given name and kwargs. .. note:: We can use the alias `create` for ``Optimizer.create_optimizer``. Parameters ---------- name: str Name of the optimizer. Should be the name of a subclass of Optimizer. Case insensitive. kwargs: dict Parameters for the optimizer. Returns ------- Optimizer An instantiated optimizer. Examples -------- >>> sgd = mx.optimizer.Optimizer.create_optimizer('sgd') >>> type(sgd) >>> adam = mx.optimizer.create('adam', learning_rate=.1) >>> type(adam) sCannot find optimizer %sN(R6RR7t ValueError(R<tkwargs((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pytcreate_optimizerscCs*|jdk r|j|jS|jSdS(N(RRR R(R/((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR1scCsdS(sTCreates auxiliary state for a given weight. Some optimizers require additional states, e.g. as momentum, in addition to gradients in order to update weights. This function creates state for a given weight which will be used in `update`. This function is called only once for each weight. Parameters ---------- index : int An unique index to identify the weight. weight : NDArray The weight. Returns ------- state : any obj The state associated with the weight. N((R/tindextweight((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyt create_statescCsd}|jrM|jtjkrM|jtj}|f|j||fS|jtjkry|j rytj dn|j||S(sCreates auxiliary state for a given weight, including FP32 high precision copy if original weight is FP16. This method is provided to perform automatic mixed precision training for optimizers that do not support it themselves. Parameters ---------- index : int An unique index to identify the weight. weight : NDArray The weight. Returns ------- state : any obj The state associated with the weight. sAccumulating with float16 in optimizer can lead to poor accuracy or slow convergence. Consider using multi_precision=True option of the optimizerN( RR#tdtypetnumpytfloat16tastypetfloat32RCR8R9(R/RARBtweight_master_copy((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pytcreate_state_multi_precisionscCs tdS(sXUpdates the given parameter using the corresponding gradient and state. Parameters ---------- index : int The unique index of the parameter into the individual learning rates and weight decays. Learning rates and weight decay may be set via `set_lr_mult()` and `set_wd_mult()`, respectively. weight : NDArray The parameter to be updated. grad : NDArray The gradient of the objective with respect to this parameter. state : any obj The state returned by `create_state()`. N(tNotImplementedError(R/RARBtgradtstate((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pytupdatescCs|jrs|jtjkrs|d}|d}|jtj}|j||||t|d|jd|n|j||||dS(syUpdates the given parameter using the corresponding gradient and state. Mixed precision version. Parameters ---------- index : int The unique index of the parameter into the individual learning rates and weight decays. Learning rates and weight decay may be set via `set_lr_mult()` and `set_wd_mult()`, respectively. weight : NDArray The parameter to be updated. grad : NDArray The gradient of the objective with respect to this parameter. state : any obj The state returned by `create_state()`. iiRDtoutN(R#RDRERFRGRHRNR(R/RARBRLRMRItoriginal_statetgrad32((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pytupdate_multi_precisions  cCs+|jdk rtdn ||_dS(sSets a new learning rate of the optimizer. Parameters ---------- lr : float The new learning rate of the optimizer. sLRScheduler of the optimizer has already been defined. Note that set_learning_rate can mutate the value of the learning rate of the optimizer only when the LRScheduler of the optimizer is undefined.N(RRt UserWarningR(R/R((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pytset_learning_ratescCs tdS(s4[DEPRECATED] Sets lr scale. Use set_lr_mult instead.N(tDeprecationWarning(R/t args_lrscale((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyt set_lr_scale$scCsi|_|jro|j\}}xK|D]@}||kr(d||kr(t||d|j|s"    cCs|j||||dS(N(Ru(R/RARBRLRM((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRNTs(R5R:RiR3RCRuRN(((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRxs   tFTMLcBs2eZdZddddZdZdZRS(s;The FTML optimizer. This class implements the optimizer described in *FTML - Follow the Moving Leader in Deep Learning*, available at http://proceedings.mlr.press/v70/zheng17a/zheng17a.pdf. This optimizer accepts the following parameters in addition to those accepted by :class:`.Optimizer`. Parameters ---------- beta1 : float, optional 0 < beta1 < 1. Generally close to 0.5. beta2 : float, optional 0 < beta2 < 1. Generally close to 1. epsilon : float, optional Small value to avoid division by 0. g333333?g+?g:0yE>cKs5tt|j|||_||_||_dS(N(RnRzR3tbeta1tbeta2tepsilon(R/R{R|R}R?((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR3ks  cCsUt|j|jd|jt|j|jd|jt|j|jd|jfS(NRD(RRsRtRD(R/RARB((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRCqsc Cst|tstt|ts*t|j||j|}|j|}|j|}i|jd6|jd6|j d6|j d6|d6}|j r|j |dcKsDtt|jd||||_||_||_||_dS(NR1(RnRR3R{R|R}Rp(R/R1R{R|R}RpR?((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR3s    cCs^|jr|jnd}t|j|jd|jd|t|j|jd|jd|fS(NRqRDRr(RpRrRRsRtRD(R/RARBRr((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRC s c Cst|tstt|ts*t|j||j|}|j|}|j|}d|j|}d|j|} |t j | |9}i|jd6|jd6|j d6|j d6} |j r|j | dcKs#tt|j|||_dS(N(RnRR3tfloat_stable_eps(R/tepsR?((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR3<scCst|j|jd|jS(NRr(RRsRtRr(R/RARB((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRC@sc CsYt|tstt|ts*t|j||j|}|j|}|jdkop|jdk}|}|ri|jd6|jd6} |j r|j | dcKsMtt|jd||||_||_||_||_||_dS(NR1(RnRR3tgamma1tgamma2tcenteredR}t clip_weights(R/R1RRR}RRR?((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR3|s     cCs|jr^t|j|jd|jt|j|jd|jt|j|jd|jfSt|j|jd|jfSdS(NRr(RRRsRtRr(R/RARB((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRCs  c Cs<t|tstt|ts*t|j||j|}|j|}i|jd6|jd6|jd6}|j r|j |dcKs,tt|j|||_||_dS(N(RnRR3trhoR}(R/RR}R?((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR3s cCs(t|j|jt|j|jfS(N(RRsRt(R/RARB((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRCsc Cs t|tstt|ts*t|j|}|j|||j9}|jdk r~t||j |j}n|\}}|j |d|j |||(t ||j t ||j |}|j |d|j |||(||||8(dS(Ng?( R$RR&RhReRR"RRRRR}( R/RARBRLRMRtacc_gt acc_deltat current_delta((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRNs   !(!(R5R:RiR3RCRN(((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRs tFtrlcBs2eZdZddddZdZdZRS(sThe Ftrl optimizer. Referenced from *Ad Click Prediction: a View from the Trenches*, available at http://dl.acm.org/citation.cfm?id=2488200. eta : .. math:: \eta_{t,i} = \frac{learningrate}{\beta+\sqrt{\sum_{s=1}^tg_{s,i}^2}} The optimizer updates the weight by:: rescaled_grad = clip(grad * rescale_grad, clip_gradient) z += rescaled_grad - (sqrt(n + rescaled_grad**2) - sqrt(n)) * weight / learning_rate n += rescaled_grad**2 w = (sign(z) * lamda1 - z) / ((beta + sqrt(n)) / learning_rate + wd) * (abs(z) > lamda1) If the storage types of weight, state and grad are all ``row_sparse``, **sparse updates** are applied by:: for row in grad.indices: rescaled_grad[row] = clip(grad[row] * rescale_grad, clip_gradient) z[row] += rescaled_grad[row] - (sqrt(n[row] + rescaled_grad[row]**2) - sqrt(n[row])) * weight[row] / learning_rate n[row] += rescaled_grad[row]**2 w[row] = (sign(z[row]) * lamda1 - z[row]) / ((beta + sqrt(n[row])) / learning_rate + wd) * (abs(z[row]) > lamda1) The sparse update only updates the z and n for the weights whose row_sparse gradient indices appear in the current batch, rather than updating it for all indices. Compared with the original update, it can provide large improvements in model training throughput for some applications. However, it provides slightly different semantics than the original update, and may lead to different empirical results. For details of the update algorithm, see :class:`~mxnet.ndarray.ftrl_update`. This optimizer accepts the following parameters in addition to those accepted by :class:`.Optimizer`. Parameters ---------- lamda1 : float, optional L1 regularization coefficient. learning_rate : float, optional The initial learning rate. beta : float, optional Per-coordinate learning rate correlation parameter. g{Gz?g?icKs5tt|j|||_||_||_dS(N(RnRR3tlamda1tbetaR(R/RR1RR?((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR3 s  cCs:t|j|jd|jt|j|jd|jfS(NRr(RRsRtRr(R/RARB((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRCsc Cst|tstt|ts*t|j||j|}|j|}i|jd6|jd6|jd6}|j r|j |dgMbp?cKsMtt|jd||||_||_||_||_d|_dS(NR1g?(RnRR3R{R|R}tschedule_decayt m_schedule(R/R1R{R|R}RR?((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR3ls     cCs:t|j|jd|jt|j|jd|jfS(NRD(RRsRtRD(R/RARB((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRCuscCst|tstt|ts*t|j||j|}|j|}|j|}||j||}|jdk rt ||j |j}n|j ddt d||j }|j ddt d|d|j } |j||_|j| } |\} } |j | d|j || (|j| d|j||| (|d|j} | d| }| dt |j|}d|| | |}|||t||j8(dS(Ng?g?gQ?i(R$RR&ReRgRhR!RR"RRR{tpowRRR|RR}(R/RARBRLRMRRR~t momentum_tt momentum_t_1tm_schedule_nextRtv_tt grad_primet m_t_primet v_t_primetm_t_bar((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRNys*  %)  !(R5R:RiR3RCRN(((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRVs    tTestcBs)eZdZdZdZdZRS(sThe Test optimizercKstt|j|dS(N(RnRR3(R/R?((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR3scCst|j|jS(s$Creates a state to duplicate weight.(RRsRt(R/RARB((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRCscCs|||j7(||(dS(s"Performs w += rescale_grad * grad.N(R(R/RARBRLRM((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRNs(R5R:RiR3RCRN(((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRs  tUpdatercBs>eZdZdZdZdZdZedZRS(sUpdater for kvstore.cCs||_i|_i|_dS(N(t optimizertstatest states_synced(R/R((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyR3s  cCst|trt|}n||jkrY|jj|||j|sN(R$Rt as_in_contextRR(R/RMRtt synced_state((RtR/sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRs   cCsptj|}t|trEt|dkrE|\|_|_n ||_tj|jj t |_ dS(sSets updater states.iN( tpickletloadsR$RtlenRRR%tfromkeystkeysRjR(R/R((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyt set_statess ! cCs(tj|r|j|jfn|jS(sGets updater states. Parameters ---------- dump_optimizer : bool, default False Whether to also save the optimizer itself. This would also save optimizer information such as learning rate and weight decay schedules. (RtdumpsRR(R/tdump_optimizer((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyt get_statess ( R5R:RiR3RRRRjR(((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyRs    cCs t|S(sReturns a closure of the updater needed for kvstore. Parameters ---------- optimizer: Optimizer The optimizer. Returns ------- updater: function The closure of the updater. (R(R((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyt get_updaters (6RiRRR8REtbaseRtndarrayRRRRRRRRRR R R R R RRRRRRRtrandomRtobjectRR=RmRxRzRRRRRRRRRRRRRR@tcreateRR(((sO/usr/local/lib/python2.7/site-packages/mxnet-1.2.1-py2.7.egg/mxnet/optimizer.pyts^    4R h=/73Q3K1K0C 6