"""Contrib optimizers."""
from ..ndarray import (NDArray, clip, contrib, mean, sqrt, square, zeros)
from .optimizer import Optimizer

# Convenience wrapper for Optimizer.register.
register = Optimizer.register  # pylint: disable=invalid-name

__all__ = ['GroupAdaGrad']


@register
class GroupAdaGrad(Optimizer):
    """Adagrad optimizer with row-wise learning rates.

    This class implements the AdaGrad optimizer described in *Adaptive
    Subgradient Methods for Online Learning and Stochastic Optimization*,
    available at http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf,
    but uses only a single learning rate for every row of the parameter
    array.

    This optimizer updates each weight by::

        grad = clip(grad * rescale_grad, clip_gradient)
        history += mean(square(grad), axis=1, keepdims=True)
        div = grad / sqrt(history + float_stable_eps)
        weight -= div * lr

    Weights are updated lazily if the gradient is sparse.

    For details of the update algorithm see
    :class:`~mxnet.ndarray.contrib.group_adagrad_update`.

    This optimizer accepts the following parameters in addition to those
    accepted by :class:`.Optimizer`. Weight decay is not supported.

    Parameters
    ----------
    eps: float, optional
        Initial value of the history accumulator. Avoids division by 0.

    """

    def __init__(self, eps=1e-5, **kwargs):
        super(GroupAdaGrad, self).__init__(**kwargs)
        self.float_stable_eps = eps

    def create_state(self, index, weight):
        # The history accumulator holds a single value per row of the
        # (necessarily 2-D) parameter array.
        assert len(weight.shape) == 2
        history = zeros(
            (weight.shape[0], 1), weight.context, stype=weight.stype)
        return history

    def update(self, index, weight, grad, state):
        assert isinstance(weight, NDArray)
        assert isinstance(grad, NDArray)
        self._update_count(index)
        lr = self._get_lr(index)
        wd = self._get_wd(index)
        assert wd == 0, 'Weight decay is not supported for GroupAdaGrad'

        is_sparse = grad.stype == 'row_sparse'
        if is_sparse:
            kwargs = {
                'epsilon': self.float_stable_eps,
                'rescale_grad': self.rescale_grad
            }
            if self.clip_gradient:
                kwargs['clip_gradient'] = self.clip_gradient
            # Sparse gradients go through the fused operator, which lazily
            # updates only the rows present in the gradient.
            contrib.group_adagrad_update(
                weight, grad, state, out=weight, lr=lr, **kwargs)
        else:
            # Dense path: apply the update rule from the class docstring.
            grad = grad * self.rescale_grad
            if self.clip_gradient is not None:
                grad = clip(grad, -self.clip_gradient, self.clip_gradient)
            state[:] += mean(square(grad), axis=1, keepdims=True)
            div = lr * grad / sqrt(state + self.float_stable_eps)
            weight[:] -= div
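

# Minimal usage sketch (an illustration, not part of the original module):
# it drives the optimizer directly on one 2-D weight to show how
# ``create_state`` and ``update`` fit together. The shapes, learning rate,
# and placeholder names (``_weight``, ``_grad``, ``_opt``, ``_state``) are
# illustrative assumptions. Run it as ``python -m mxnet.optimizer.contrib``
# so the relative imports resolve.
if __name__ == '__main__':
    from ..ndarray import ones

    _weight = ones((4, 3))       # 2-D parameter: one history value per row
    _grad = 0.5 * ones((4, 3))   # a stand-in dense gradient

    _opt = GroupAdaGrad(learning_rate=0.1, eps=1e-5)
    _state = _opt.create_state(0, _weight)   # history accumulator, shape (4, 1)
    _opt.update(0, _weight, _grad, _state)   # in-place row-wise AdaGrad step
    print(_weight)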