"""Weight updating functions."""
import math
import pickle
import logging
from .ndarray import (NDArray, zeros, clip, sqrt, sign, array, maximum,
                      abs as NDabs)
from .ndarray import (sgd_update, sgd_mom_update, adam_update, rmsprop_update,
                      rmspropalex_update, mp_sgd_update, mp_sgd_mom_update)
from .random import normal

# NOTE: this module was recovered from a compiled (.pyc) artifact.  The
# docstrings, signatures, and numeric defaults below come from the readable
# parts of that artifact; method bodies are a best-effort sketch guided by
# the identifiers it contains and by the upstream mxnet/optimizer.py layout.
# Optimizers whose bodies were not legible (the SGD family implied by the
# sgd_update/mp_sgd_update imports) are omitted rather than guessed.


class Optimizer(object):
    """The base class inherited by all optimizers.

    Parameters
    ----------
    rescale_grad : float, optional
        Multiply the gradient with `rescale_grad` before updating. Often
        chosen to be ``1.0/batch_size``.
    param_idx2name : dict from int to string, optional
        A dictionary that maps int index to string name.
    clip_gradient : float, optional
        Clip the gradient by projecting onto the box
        ``[-clip_gradient, clip_gradient]``.
    learning_rate : float, optional
        The initial learning rate.
    lr_scheduler : LRScheduler, optional
        The learning rate scheduler.
    wd : float, optional
        The weight decay (or L2 regularization) coefficient. Modifies the
        objective by adding a penalty for having large weights.
    sym : Symbol, optional
        The Symbol this optimizer is applying to.
    begin_num_update : int, optional
        The initial number of updates.
    """
    def __init__(self, rescale_grad=1., param_idx2name=None, wd=0.,
                 clip_gradient=None, learning_rate=0.01,
                 lr_scheduler=None, sym=None, begin_num_update=0,
                 param_dict=None):
        self.rescale_grad = rescale_grad
        self.lr = learning_rate
        self.lr_scheduler = lr_scheduler
        if lr_scheduler is not None:
            self.lr_scheduler.base_lr = learning_rate

        self.wd = wd
        self.lr_mult = {}
        self.wd_mult = {}
        self.begin_num_update = begin_num_update
        self.num_update = begin_num_update
        self._index_update_count = {}
        self.clip_gradient = clip_gradient

        if param_idx2name is None:
            param_idx2name = {}
        assert isinstance(param_idx2name, dict), \
            'param_idx2name should be a dict of param indexes to names.'
        self.idx2name = param_idx2name.copy()
        self.sym = sym
        self.param_dict = param_dict if param_dict else {}

        self.set_lr_mult({})
        self.set_wd_mult({})

    opt_registry = {}

    @staticmethod
    def register(klass):
        """Registers a new optimizer.

        Once an optimizer is registered, we can create an instance of this
        optimizer with `create_optimizer` later.

        Examples
        --------
        >>> @mx.optimizer.Optimizer.register
        ... class MyOptimizer(mx.optimizer.Optimizer):
        ...     pass
        >>> optim = mx.optimizer.Optimizer.create_optimizer('MyOptimizer')
        >>> print(type(optim))
        """
        assert isinstance(klass, type)
        name = klass.__name__.lower()
        if name in Optimizer.opt_registry:
            logging.warning('WARNING: New optimizer %s.%s is overriding '
                            'existing optimizer %s.%s',
                            klass.__module__, klass.__name__,
                            Optimizer.opt_registry[name].__module__,
                            Optimizer.opt_registry[name].__name__)
        Optimizer.opt_registry[name] = klass
        return klass

    @staticmethod
    def create_optimizer(name, **kwargs):
        """Instantiates an optimizer with a given name and kwargs.

        .. note:: We can use the alias `create` for ``Optimizer.create_optimizer``.

        Parameters
        ----------
        name : str
            Name of the optimizer. Should be the name of a subclass of
            Optimizer. Case insensitive.
        kwargs : dict
            Parameters for the optimizer.

        Returns
        -------
        Optimizer
            An instantiated optimizer.

        Examples
        --------
        >>> sgd = mx.optimizer.Optimizer.create_optimizer('sgd')
        >>> type(sgd)
        >>> adam = mx.optimizer.create('adam', learning_rate=.1)
        >>> type(adam)
        """
        if name.lower() in Optimizer.opt_registry:
            return Optimizer.opt_registry[name.lower()](**kwargs)
        else:
            raise ValueError('Cannot find optimizer %s' % name)

    def create_state(self, index, weight):
        """Creates auxiliary state for a given weight.

        Some optimizers require additional states, e.g. momentum, in addition
        to gradients in order to update weights. This function creates state
        for a given weight which will be used in `update`. This function is
        called only once for each weight.

        Parameters
        ----------
        index : int
            A unique index to identify the weight.
        weight : NDArray
            The weight.

        Returns
        -------
        state : any obj
            The state associated with the weight.
        """

    def update(self, index, weight, grad, state):
        """Updates the given parameter using the corresponding gradient and state.

        Parameters
        ----------
        index : int
            The unique index of the parameter into the individual learning
            rates and weight decays. Learning rates and weight decay may be
            set via `set_lr_mult()` and `set_wd_mult()`, respectively.
        weight : NDArray
            The parameter to be updated.
        grad : NDArray
            The gradient of the objective with respect to this parameter.
        state : any obj
            The state returned by `create_state()`.
        """
        raise NotImplementedError()

    def set_lr_scale(self, args_lrscale):
        """[DEPRECATED] Sets lr scale. Use set_lr_mult instead."""
        raise DeprecationWarning

    def set_lr_mult(self, args_lr_mult):
        """Sets an individual learning rate multiplier for each parameter."""
        self.lr_mult = {}
        if self.sym is not None:
            attr = self.sym.attr_dict()
            for name in self.sym.list_arguments():
                if name in attr and '__lr_mult__' in attr[name]:
                    self.lr_mult[name] = float(attr[name]['__lr_mult__'])
        self.lr_mult.update(args_lr_mult)

    def set_wd_mult(self, args_wd_mult):
        """Sets an individual weight decay multiplier for each parameter."""
        self.wd_mult = {}
        for n in self.idx2name.values():
            if not (n.endswith('_weight') or n.endswith('_gamma')):
                self.wd_mult[n] = 0.0
        if self.sym is not None:
            attr = self.sym.attr_dict()
            for name in self.sym.list_arguments():
                if name in attr and '__wd_mult__' in attr[name]:
                    self.wd_mult[name] = float(attr[name]['__wd_mult__'])
        self.wd_mult.update(args_wd_mult)

    def _update_count(self, index):
        """Updates the per-index update count and the global `num_update`."""
        if index not in self._index_update_count:
            self._index_update_count[index] = self.begin_num_update
        self._index_update_count[index] += 1
        self.num_update = max(self._index_update_count[index], self.num_update)

    def _get_lr(self, index):
        """Gets the learning rate for `index`, applying scheduler and multipliers."""
        if self.lr_scheduler is not None:
            lr = self.lr_scheduler(self.num_update)
        else:
            lr = self.lr

        if index in self.param_dict:
            lr *= self.param_dict[index].lr_mult
        elif index in self.lr_mult:
            lr *= self.lr_mult[index]
        elif index in self.idx2name:
            lr *= self.lr_mult.get(self.idx2name[index], 1.0)
        return lr

    def _get_wd(self, index):
        """Gets the weight decay for `index`, applying multipliers."""
        wd = self.wd
        if index in self.param_dict:
            wd *= self.param_dict[index].wd_mult
        elif index in self.wd_mult:
            wd *= self.wd_mult[index]
        elif index in self.idx2name:
            wd *= self.wd_mult.get(self.idx2name[index], 1.0)
        return wd

# convenience wrapper for Optimizer.register
register = Optimizer.register


@register
class Adam(Optimizer):
    """The Adam optimizer.

    This optimizer accepts the following parameters in addition to those
    accepted by :class:`.Optimizer`.

    Parameters
    ----------
    beta1 : float, optional
        Exponential decay rate for the first moment estimates.
    beta2 : float, optional
        Exponential decay rate for the second moment estimates.
    epsilon : float, optional
        Small value to avoid division by 0.
    """
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999,
                 epsilon=1e-8, **kwargs):
        super(Adam, self).__init__(learning_rate=learning_rate, **kwargs)
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon

    def create_state(self, index, weight):
        return (zeros(weight.shape, weight.context, dtype=weight.dtype),  # mean
                zeros(weight.shape, weight.context, dtype=weight.dtype))  # variance

    def update(self, index, weight, grad, state):
        assert isinstance(weight, NDArray)
        assert isinstance(grad, NDArray)
        lr = self._get_lr(index)
        wd = self._get_wd(index)
        self._update_count(index)

        t = self._index_update_count[index]
        coef_1 = 1. - self.beta1**t
        coef_2 = 1. - self.beta2**t
        lr *= math.sqrt(coef_2) / coef_1

        kwargs = {'beta1': self.beta1, 'beta2': self.beta2,
                  'epsilon': self.epsilon, 'rescale_grad': self.rescale_grad}
        if self.clip_gradient:
            kwargs['clip_gradient'] = self.clip_gradient

        mean, var = state
        adam_update(weight, grad, mean, var, out=weight, lr=lr, wd=wd, **kwargs)


@register
class AdaGrad(Optimizer):
    """The AdaGrad optimizer.

    This optimizer accepts the following parameter in addition to those
    accepted by :class:`.Optimizer`.

    Parameters
    ----------
    eps : float, optional
        Small value to avoid division by 0.
    """
    def __init__(self, eps=1e-7, **kwargs):
        super(AdaGrad, self).__init__(**kwargs)
        self.float_stable_eps = eps

    def create_state(self, index, weight):
        return zeros(weight.shape, weight.context)  # history

    def update(self, index, weight, grad, state):
        assert isinstance(weight, NDArray)
        assert isinstance(grad, NDArray)
        lr = self._get_lr(index)
        wd = self._get_wd(index)
        self._update_count(index)

        grad = grad * self.rescale_grad
        if self.clip_gradient is not None:
            grad = clip(grad, -self.clip_gradient, self.clip_gradient)

        history = state
        history[:] += (grad * grad)
        weight[:] += -lr * (grad / sqrt(history + self.float_stable_eps) + wd * weight)


@register
class RMSProp(Optimizer):
    """The RMSProp optimizer.

    Two versions of RMSProp are implemented:

    If ``centered=False``, we follow
    http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
    by Tieleman & Hinton, 2012.
    For details of the update algorithm see :class:`~mxnet.ndarray.rmsprop_update`.

    If ``centered=True``, we follow http://arxiv.org/pdf/1308.0850v5.pdf (38)-(45)
    by Alex Graves, 2013.
    For details of the update algorithm see :class:`~mxnet.ndarray.rmspropalex_update`.

    This optimizer accepts the following parameters in addition to those
    accepted by :class:`.Optimizer`.

    Parameters
    ----------
    gamma1 : float, optional
        A decay factor of the moving average over past squared gradients.
    gamma2 : float, optional
        A "momentum" factor. Only used if `centered` = ``True``.
    epsilon : float, optional
        Small value to avoid division by 0.
    centered : bool, optional
        Flag to control which version of RMSProp to use. ``True`` will use
        Graves's version of `RMSProp`, ``False`` will use Tieleman & Hinton's
        version of `RMSProp`.
    clip_weights : float, optional
        Clips weights into range ``[-clip_weights, clip_weights]``.
    """
    def __init__(self, learning_rate=0.001, gamma1=0.9, gamma2=0.9,
                 epsilon=1e-8, centered=False, clip_weights=None, **kwargs):
        super(RMSProp, self).__init__(learning_rate=learning_rate, **kwargs)
        self.gamma1 = gamma1
        self.gamma2 = gamma2
        self.centered = centered
        self.epsilon = epsilon
        self.clip_weights = clip_weights

    def create_state(self, index, weight):
        if self.centered:
            return (zeros(weight.shape, weight.context),   # n
                    zeros(weight.shape, weight.context),   # g
                    zeros(weight.shape, weight.context))   # delta
        else:
            return (zeros(weight.shape, weight.context),)  # n

    def update(self, index, weight, grad, state):
        assert isinstance(weight, NDArray)
        assert isinstance(grad, NDArray)
        lr = self._get_lr(index)
        wd = self._get_wd(index)
        self._update_count(index)

        kwargs = {'gamma1': self.gamma1, 'epsilon': self.epsilon,
                  'rescale_grad': self.rescale_grad}
        if self.centered:
            kwargs['gamma2'] = self.gamma2
        if self.clip_gradient:
            kwargs['clip_gradient'] = self.clip_gradient
        if self.clip_weights:
            kwargs['clip_weights'] = self.clip_weights

        if not self.centered:
            (n, ) = state
            rmsprop_update(weight, grad, n, out=weight, lr=lr, wd=wd, **kwargs)
        else:
            n, g, delta = state
            rmspropalex_update(weight, grad, n, g, delta, out=weight,
                               lr=lr, wd=wd, **kwargs)


@register
class AdaDelta(Optimizer):
    """The AdaDelta optimizer, as described in Zeiler, 2012,
    *ADADELTA: An Adaptive Learning Rate Method*.

    This optimizer accepts the following parameters in addition to those
    accepted by :class:`.Optimizer`.

    Parameters
    ----------
    rho : float, optional
        Decay rate for both the squared gradients and the deltas.
    epsilon : float, optional
        Small value to avoid division by 0.
    """
    def __init__(self, rho=0.90, epsilon=1e-5, **kwargs):
        super(AdaDelta, self).__init__(**kwargs)
        self.rho = rho
        self.epsilon = epsilon

    def create_state(self, index, weight):
        return (zeros(weight.shape, weight.context),  # accumulated gradient
                zeros(weight.shape, weight.context))  # accumulated delta

    def update(self, index, weight, grad, state):
        assert isinstance(weight, NDArray)
        assert isinstance(grad, NDArray)
        wd = self._get_wd(index)
        self._update_count(index)

        grad *= self.rescale_grad
        if self.clip_gradient is not None:
            grad = clip(grad, -self.clip_gradient, self.clip_gradient)

        acc_g, acc_delta = state
        acc_g[:] = self.rho * acc_g + (1. - self.rho) * grad * grad
        current_delta = sqrt(acc_delta + self.epsilon) / sqrt(acc_g + self.epsilon) * grad
        acc_delta[:] = self.rho * acc_delta + (1. - self.rho) * current_delta * current_delta

        weight[:] -= current_delta + wd * weight


@register
class Ftrl(Optimizer):
    """The Ftrl optimizer.

    Referenced from *Ad Click Prediction: a View from the Trenches*, available at
    http://dl.acm.org/citation.cfm?id=2488200.

    Parameters
    ----------
    lamda1 : float, optional
        L1 regularization coefficient.
    learning_rate : float, optional
        The initial learning rate.
    beta : float, optional
        Per-coordinate learning rate correlation parameter:

        .. math::
           \\eta_{t,i} = \\frac{learningrate}{\\beta + \\sqrt{\\sum_{s=1}^t g_{s,i}^2}}
    """
    def __init__(self, lamda1=0.01, learning_rate=0.1, beta=1, **kwargs):
        super(Ftrl, self).__init__(**kwargs)
        self.lamda1 = lamda1
        self.beta = beta
        self.lr = learning_rate

    def create_state(self, index, weight):
        return (zeros(weight.shape, weight.context),  # dn
                zeros(weight.shape, weight.context))  # n

    def update(self, index, weight, grad, state):
        assert isinstance(weight, NDArray)
        assert isinstance(grad, NDArray)
        self._update_count(index)
        wd = self._get_wd(index)
        lr = self._get_lr(index)

        grad *= self.rescale_grad
        if self.clip_gradient is not None:
            grad = clip(grad, -self.clip_gradient, self.clip_gradient)

        dn, n = state
        dn += grad - (sqrt(n + grad * grad) - sqrt(n)) * weight / lr
        n += grad * grad

        weight[:] = (sign(dn) * self.lamda1 - dn) / \
                    ((self.beta + sqrt(n)) / lr + wd) * (NDabs(dn) > self.lamda1)


@register
class Adamax(Optimizer):
    """The AdaMax optimizer.

    It is a variant of Adam based on the infinity norm, available at
    http://arxiv.org/abs/1412.6980 Section 7.

    This optimizer accepts the following parameters in addition to those
    accepted by :class:`.Optimizer`.

    Parameters
    ----------
    beta1 : float, optional
        Exponential decay rate for the first moment estimates.
    beta2 : float, optional
        Exponential decay rate for the second moment estimates.
    """
    def __init__(self, learning_rate=0.002, beta1=0.9, beta2=0.999, **kwargs):
        super(Adamax, self).__init__(learning_rate=learning_rate, **kwargs)
        self.beta1 = beta1
        self.beta2 = beta2

    def create_state(self, index, weight):
        return (zeros(weight.shape, weight.context, dtype=weight.dtype),  # mean
                zeros(weight.shape, weight.context, dtype=weight.dtype))  # variance

    def update(self, index, weight, grad, state):
        assert isinstance(weight, NDArray)
        assert isinstance(grad, NDArray)
        lr = self._get_lr(index)
        wd = self._get_wd(index)
        self._update_count(index)

        t = self._index_update_count[index]
        lr /= (1. - self.beta1**t)

        grad = grad * self.rescale_grad + wd * weight
        if self.clip_gradient is not None:
            grad = clip(grad, -self.clip_gradient, self.clip_gradient)

        m_t, u_t = state
        m_t[:] = self.beta1 * m_t + (1. - self.beta1) * grad
        u_t[:] = maximum(self.beta2 * u_t, NDabs(grad))

        weight[:] -= lr * m_t / u_t


@register
class Nadam(Optimizer):
    """The Nesterov Adam optimizer.

    Much like Adam is essentially RMSprop with momentum, Nadam is Adam with
    Nesterov momentum, available at
    http://cs229.stanford.edu/proj2015/054_report.pdf.

    This optimizer accepts the following parameters in addition to those
    accepted by :class:`.Optimizer`.

    Parameters
    ----------
    beta1 : float, optional
        Exponential decay rate for the first moment estimates.
    beta2 : float, optional
        Exponential decay rate for the second moment estimates.
    epsilon : float, optional
        Small value to avoid division by 0.
    schedule_decay : float, optional
        Exponential decay rate for the momentum schedule.
    """
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999,
                 epsilon=1e-8, schedule_decay=0.004, **kwargs):
        super(Nadam, self).__init__(learning_rate=learning_rate, **kwargs)
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.schedule_decay = schedule_decay
        self.m_schedule = 1.

    def create_state(self, index, weight):
        return (zeros(weight.shape, weight.context, dtype=weight.dtype),  # mean
                zeros(weight.shape, weight.context, dtype=weight.dtype))  # variance

    def update(self, index, weight, grad, state):
        assert isinstance(weight, NDArray)
        assert isinstance(grad, NDArray)
        lr = self._get_lr(index)
        wd = self._get_wd(index)
        self._update_count(index)
        t = self._index_update_count[index]

        grad = grad * self.rescale_grad + wd * weight
        if self.clip_gradient is not None:
            grad = clip(grad, -self.clip_gradient, self.clip_gradient)

        # warming momentum schedule
        momentum_t = self.beta1 * (1. - 0.5 * (pow(0.96, t * self.schedule_decay)))
        momentum_t_1 = self.beta1 * (1. - 0.5 * (pow(0.96, (t + 1) * self.schedule_decay)))
        self.m_schedule = self.m_schedule * momentum_t
        m_schedule_next = self.m_schedule * momentum_t_1

        m_t, v_t = state
        m_t[:] = self.beta1 * m_t + (1. - self.beta1) * grad
        v_t[:] = self.beta2 * v_t + (1. - self.beta2) * grad * grad

        grad_prime = grad / (1. - self.m_schedule)
        m_t_prime = m_t / (1. - m_schedule_next)
        v_t_prime = v_t / (1. - pow(self.beta2, t))
        m_t_bar = (1. - momentum_t) * grad_prime + momentum_t_1 * m_t_prime

        weight[:] -= lr * m_t_bar / (sqrt(v_t_prime) + self.epsilon)


@register
class Test(Optimizer):
    """The Test optimizer."""
    def __init__(self, **kwargs):
        super(Test, self).__init__(**kwargs)

    def create_state(self, index, weight):
        """Creates a state to duplicate weight."""
        return zeros(weight.shape, weight.context)

    def update(self, index, weight, grad, state):
        """Performs w += rescale_grad * grad."""
        weight[:] += grad * self.rescale_grad
        state[:] = weight

# convenience wrapper for Optimizer.create_optimizer
create = Optimizer.create_optimizer


class Updater(object):
    """Updater for kvstore."""
    def __init__(self, optimizer):
        self.optimizer = optimizer
        self.states = {}
        self.states_synced = {}

    def __call__(self, index, grad, weight):
        """Updates the weight given its gradient and index."""
        if index not in self.states:
            self.states[index] = self.optimizer.create_state(index, weight)
            self.states_synced[index] = True
        elif not self.states_synced[index]:
            self.states[index] = self.sync_state_context(self.states[index],
                                                         weight.context)
            self.states_synced[index] = True
        self.optimizer.update(index, weight, grad, self.states[index])

    def sync_state_context(self, state, context):
        """Copies a state (possibly nested in tuples or lists) to `context`."""
        if isinstance(state, NDArray):
            return state.as_in_context(context)
        elif isinstance(state, (tuple, list)):
            synced_state = (self.sync_state_context(i, context) for i in state)
            if isinstance(state, tuple):
                return tuple(synced_state)
            else:
                return list(synced_state)
        else:
            return state

    def set_states(self, states):
        """Sets updater states."""
        self.states = pickle.loads(states)
        self.states_synced = dict.fromkeys(self.states.keys(), False)

    def get_states(self):
        """Gets updater states."""
        return pickle.dumps(self.states)


def get_updater(optimizer):
    """Returns a closure of the updater needed for kvstore.

    Parameters
    ----------
    optimizer : Optimizer
        The optimizer.

    Returns
    -------
    updater : function
        The closure of the updater.
    """
    return Updater(optimizer)
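
# ---------------------------------------------------------------------------
# Illustrative usage sketch (not part of the recovered module): it shows how
# the registry alias `create`, the Test optimizer, and `get_updater` fit
# together.  The helper name `_usage_example`, the array shapes, and the
# rescale_grad value are arbitrary additions for illustration; every call it
# makes is defined above in this module.
def _usage_example():
    """Runs two Test-optimizer updates through a kvstore-style updater."""
    # 'test' is the registry key added by the @register decorator on Test.
    opt = create('test', rescale_grad=1.0 / 8)
    updater = get_updater(opt)

    weight = zeros((2, 2))
    grad = array([[1., 1.], [1., 1.]])

    # Test.update performs w += rescale_grad * grad, so each call adds 0.125.
    updater(0, grad, weight)
    updater(0, grad, weight)
    return weight.asnumpy()  # every entry is expected to be 0.25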