"""Wrappers for training optimizers."""
import math

import torch
from tensorflow import keras


def get_optimizer(framework, config):
    """Get the optimizer specified in config for model training.

    Arguments
    ---------
    framework : str
        Name of the deep learning framework used. Current options are
        ``['torch', 'keras']``.
    config : dict
        The config dict generated from the YAML config file.

    Returns
    -------
    An optimizer object for the specified deep learning framework.
    """

    if config['training']['optimizer'] is None:
        raise ValueError('An optimizer must be specified in the config '
                         'file.')

    if framework in ('torch', 'pytorch'):
        return torch_optimizers.get(config['training']['optimizer'].lower())
    elif framework == 'keras':
        return keras_optimizers.get(config['training']['optimizer'].lower())


class TorchAdamW(torch.optim.Optimizer):
    r"""AdamW algorithm as implemented in `Torch_AdamW`_.

    The original Adam algorithm was proposed in `Adam: A Method for Stochastic
    Optimization`_. The AdamW variant was proposed in `Decoupled Weight Decay
    Regularization`_.

    Arguments:
        params (iterable): iterable of parameters to optimize or dicts
            defining parameter groups
        lr (float, optional): learning rate (default: 1e-3)
        betas (Tuple[float, float], optional): coefficients used for computing
            running averages of gradient and its square
            (default: (0.9, 0.999))
        eps (float, optional): term added to the denominator to improve
            numerical stability (default: 1e-8)
        weight_decay (float, optional): weight decay coefficient
            (default: 1e-2)
        amsgrad (boolean, optional): whether to use the AMSGrad variant of
            this algorithm from the paper
            `On the Convergence of Adam and Beyond`_ (default: False)

    .. _Torch_AdamW:
        https://github.com/pytorch/pytorch/pull/3740
    .. _Adam\: A Method for Stochastic Optimization:
        https://arxiv.org/abs/1412.6980
    .. _Decoupled Weight Decay Regularization:
        https://arxiv.org/abs/1711.05101
    .. _On the Convergence of Adam and Beyond:
        https://openreview.net/forum?id=ryQu7f-RZ
    """

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=1e-2, amsgrad=False):
        if not 0.0 <= lr:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if not 0.0 <= eps:
            raise ValueError("Invalid epsilon value: {}".format(eps))
        if not 0.0 <= betas[0] < 1.0:
            raise ValueError(
                "Invalid beta parameter at index 0: {}".format(betas[0]))
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError(
                "Invalid beta parameter at index 1: {}".format(betas[1]))
        defaults = dict(lr=lr, betas=betas, eps=eps,
                        weight_decay=weight_decay, amsgrad=amsgrad)
        super(TorchAdamW, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(TorchAdamW, self).__setstate__(state)
        for group in self.param_groups:
            group.setdefault('amsgrad', False)

    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue

                # Perform decoupled weight decay (the "W" in AdamW)
                p.data.mul_(1 - group['lr'] * group['weight_decay'])

                # Perform the Adam optimization step
                grad = p.grad.data
                if grad.is_sparse:
                    raise RuntimeError('Adam does not support sparse '
                                       'gradients, please consider SparseAdam'
                                       ' instead')
                amsgrad = group['amsgrad']

                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = torch.zeros_like(p.data)
                    # Exponential moving average of squared gradient values
                    state['exp_avg_sq'] = torch.zeros_like(p.data)
                    if amsgrad:
                        # Maintains max of all exp. moving avg. of sq. grad.
                        state['max_exp_avg_sq'] = torch.zeros_like(p.data)

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                if amsgrad:
                    max_exp_avg_sq = state['max_exp_avg_sq']
                beta1, beta2 = group['betas']

                state['step'] += 1

                # Decay the first and second moment running average
                # coefficients
                exp_avg.mul_(beta1).add_(1 - beta1, grad)
                exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
                if amsgrad:
                    # Maintain the max of all 2nd moment running avgs so far
                    torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
                    # Use the max. for normalizing the running avg. of grad
                    denom = max_exp_avg_sq.sqrt().add_(group['eps'])
                else:
                    denom = exp_avg_sq.sqrt().add_(group['eps'])
                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']
                step_size = (group['lr'] * math.sqrt(bias_correction2) /
                             bias_correction1)

                p.data.addcdiv_(-step_size, exp_avg, denom)

        return loss


torch_optimizers = {
    'adadelta': torch.optim.Adadelta,
    'adam': torch.optim.Adam,
    'adamw': TorchAdamW,
    'sparseadam': torch.optim.SparseAdam,
    'adamax': torch.optim.Adamax,
    'asgd': torch.optim.ASGD,
    'rmsprop': torch.optim.RMSprop,
    'sgd': torch.optim.SGD
}

keras_optimizers = {
    'adadelta': keras.optimizers.Adadelta,
    'adagrad': keras.optimizers.Adagrad,
    'adam': keras.optimizers.Adam,
    'adamax': keras.optimizers.Adamax,
    'nadam': keras.optimizers.Nadam,
    'rmsprop': keras.optimizers.RMSprop,
    'sgd': keras.optimizers.SGD
}
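

# ---------------------------------------------------------------------------
# Illustrative usage sketch (not part of the module's public API). It assumes
# a minimal config dict with a 'training' section; the real config schema
# comes from the library's YAML parsing, so treat the keys below (other than
# 'optimizer', which get_optimizer() actually reads) as assumptions. Running
# `python optimizers.py` performs one AdamW step on a toy linear model.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    # hypothetical minimal config; only ['training']['optimizer'] is required
    config = {'training': {'optimizer': 'adamw', 'lr': 1e-3}}
    optimizer_cls = get_optimizer('torch', config)  # -> TorchAdamW

    model = torch.nn.Linear(4, 2)
    optimizer = optimizer_cls(model.parameters(), lr=config['training']['lr'])

    inputs = torch.randn(8, 4)
    targets = torch.randn(8, 2)
    loss = torch.nn.functional.mse_loss(model(inputs), targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print('loss before step:', loss.item())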