{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## ....To Read While the Endpoint is Deploying or While your Hyperparameter Tuning Jobs are Running....\n", "\n", "It is important to elaborate on the DeepAR model's architecture by walking through an example. When interested in quantifying the confidence of the estimates produced, then it's probabilistic forecasts that are wanted. If we have real-valued, it is recommended to opt for the Gaussian likelihood:\n", "$$\\ell(y_t|\\mu_t,\\sigma_t)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\frac{-(y_t-\\mu_t)^2}{2\\sigma^2}}.$$\n", "\n", "$\\theta$ represents the `parameters of the likelihood`. In the case of Gaussian, $\\theta_t$ will represent the mean and standard deviation: $$\\theta_t = \\{\\mu_{t},\\sigma_{t}\\}.$$\n", "\n", "The neural network’s last hidden layer results in $h_{d,t}$. This $h_{d,t}$ will undergo 1 activation function per likelihood parameter. For example, for the Gaussian likelihood, $h_{d,t}$ is transformed by an affine activation function to get the mean:\n", "$$\\mu_{t} = w_{\\mu}^T h_{d,t} + b_{\\mu},$$\n", "and then $h$ is transformed by a softplus activation to get the standard deviation:\n", "$$\\sigma_t = \\log\\left(1 + \\exp(w_{\\sigma}^T h_{d,t} + b_{\\sigma})\\right).$$\n", "\n", "The `activation parameters` are the $w_{\\mu},b_{\\mu},w_{\\sigma},b_{\\sigma}$ parameters within the activation functions. The neural network is trained to learn the fixed constants of the activation parameters. Since the $h_{d,t}$ output vary given each time-step's input, this still allows the likelihood parameters to vary over time, and therefore capture dynamic behaviors in the time series data.\n", "\n", "![DeepAR Training](images/training.png)\n", "\n", "From the above diagram, the input at each time-step is the data point preceding the current time-step’s data, as well as the previous network’s output. For simplicity, on this diagram you aren’t shown covariates which would also be inputs.\n", "\n", "The LSTM layers and the final hidden layer produces the $h_{i,t}$ value, which will undergo an activation function for each parameter of the specified likelihood. To learn the activation function parameters, the neural network takes the $h_{i,t}$ at time $t$ and the data up until time $t$, and performs Stochastic Gradient Descent (SGD) to yield the activation parameters which maximize the likelihood at time $t$. The output layer uses the SGD-optimized activation functions to output the maximum likelihood parameters.\n", "\n", "This is how DeepAR trains its model to your data input. Now you want DeepAR to give you probabilistic forecasts for the next time-step.\n", "\n", "![DeepAR Forecast](images/prediction.png)\n", "\n", "During prediction the input of the current time will be processed by the trained LSTM layers, and subsequently get activated by the optimized activation functions to output the maximum-likelihood theta parameters at time $t+1$. \n", "\n", "Now that DeepAR has completed the likelihood with its parameter estimates, DeepAR can simulate `Monte Carlo (MC) samples` from this likelihood and produce an empirical distribution for the predicted datapoint - the probabilistic forecasts. The Monte Carlo samples produced at time $t+1$ are used as input for time $t+2$, etc, until the end of the prediction horizon. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }