{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building Your Predictor\n",
"\n",
"Forecasting is used in a wide variety of business applications. For example, retailers need to forecast product sales to decide how much stock to hold at each location; manufacturers need to estimate the number of parts required at their factories to optimize their supply chain; businesses need to estimate their flexible workforce needs; utilities need to forecast electricity consumption in order to run an efficient energy network; and enterprises need to estimate their cloud infrastructure needs.\n",
"\n",
"\n",
"\n",
"In this notebook we will walk through the remaining steps of the workflow to build and query your first forecast.\n",
"\n",
"\n",
"## Table Of Contents\n",
"* Step 1: [Setup Amazon Forecast](#setup)\n",
"* Step 2: [Create a Predictor](#createPredictor)\n",
"* Step 3: [Get Predictor Error Metrics from Backtesting](#predictorErrors)\n",
"* Step 4: [Create a Forecast](#createForecast)\n",
"* Step 5: [Query the Forecast](#queryForecast)\n",
"* [Next Steps](#nextSteps)\n",
"\n",
"For more information about the APIs, please check the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Setup Amazon Forecast\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This section sets up the permissions and relevant endpoints."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"import os\n",
"import time\n",
"import pandas as pd\n",
"\n",
"# importing forecast notebook utility from notebooks/common directory\n",
"sys.path.insert( 0, os.path.abspath(\"../../common\") )\n",
"import util\n",
"\n",
"%reload_ext autoreload\n",
"import boto3\n",
"import s3fs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The line below will retrieve your stored variables from the first notebook."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"item_id = client_12\n",
"project = util_power_demo\n",
"data_version = 1\n",
"Forecast length = 24\n",
"Dataset frequency = H\n",
"Timestamp format = yyyy-MM-dd hh:mm:ss\n",
"dataset_group_arn = arn:aws:forecast:us-west-2:730750055343:dataset-group/util_power_demo_1\n",
"role_arn = arn:aws:iam::730750055343:role/ForecastNotebookRole-Basic\n",
"bucket_name = forecast-demo-uci-electricity-jeetub\n",
"region = us-west-2\n"
]
}
],
"source": [
"%store -r\n",
"\n",
"# Print your choices from first notebook\n",
"print(f\"item_id = {item_id}\")\n",
"print(f\"project = {PROJECT}\")\n",
"print(f\"data_version = {DATA_VERSION}\")\n",
"print(f\"Forecast length = {FORECAST_LENGTH}\")\n",
"print(f\"Dataset frequency = {DATASET_FREQUENCY}\")\n",
"print(f\"Timestamp format = {TIMESTAMP_FORMAT}\")\n",
"print(f\"dataset_group_arn = {dataset_group_arn}\")\n",
"print(f\"role_arn = {role_arn}\")\n",
"%store -r bucket_name\n",
"print(f\"bucket_name = {bucket_name}\")\n",
"%store -r region\n",
"print(f\"region = {region}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last part of the setup process is to validate that your account can communicate with Amazon Forecast; the cell below does just that."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Connect API session\n",
"session = boto3.Session(region_name=region) \n",
"forecast = session.client(service_name='forecast') \n",
"forecastquery = session.client(service_name='forecastquery')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Predictors': [],\n",
" 'ResponseMetadata': {'RequestId': 'af1cb6e1-4c9e-42ec-be5a-5ed95505c987',\n",
" 'HTTPStatusCode': 200,\n",
" 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',\n",
" 'date': 'Thu, 21 Oct 2021 22:15:24 GMT',\n",
" 'x-amzn-requestid': 'af1cb6e1-4c9e-42ec-be5a-5ed95505c987',\n",
" 'content-length': '17',\n",
" 'connection': 'keep-alive'},\n",
" 'RetryAttempts': 0}}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# check you can communicate with Forecast API session\n",
"forecast.list_predictors()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create a Predictor \n",
"\n",
"Once the datasets are specified with the corresponding schema, Amazon Forecast will automatically aggregate, at the specified time granularity, all the relevant pieces of information for each item, such as sales, price, promotions, as well as categorical attributes, and generate the desired dataset. Next, one can choose an algorithm (forecasting model) and evaluate how well this particular algorithm works on this dataset. The following graph gives a high-level overview of the forecasting models.\n",
"*(Figure: high-level overview of the forecasting models.)*\n",
"\n",
"\n",
"Amazon Forecast provides several state-of-the-art forecasting algorithms, including classic methods such as ETS, ARIMA, and Prophet, as well as deep learning approaches such as DeepAR+. Classical forecasting methods, such as Autoregressive Integrated Moving Average (ARIMA) or Exponential Smoothing (ETS), fit a single model to each individual time series and then use that model to extrapolate the time series into the future. Amazon's Non-Parametric Time Series (NPTS) forecaster also fits a single model to each individual time series. Unlike the naive or seasonal naive forecasters that use a fixed time index (the previous index $T-1$ or the past season $T - \\tau$) as the prediction for time step $T$, NPTS randomly samples a time index $t \\in \\{0, \\dots, T-1\\}$ in the past to generate a sample for the current time step $T$.\n",
"\n",
"In many applications, you may encounter many similar time series across a set of cross-sectional units. Examples of such time series groupings are demand for different products, server loads, and requests for web pages. In this case, it can be beneficial to train a single model jointly over all of these time series. CNN-QR and DeepAR+ take this approach, outperforming the standard ARIMA and ETS methods when your dataset contains hundreds of related time series. The trained model can also be used for generating forecasts for new time series that are similar to the ones it has been trained on. \n",
"\n",
"While deep learning approaches can outperform standard methods, this is only possible when sufficient training data is available; it is not the case, for example, when a neural network is trained on a time series containing only a few dozen observations. Amazon Forecast provides the best of both worlds, allowing users to either choose a specific algorithm or let Amazon Forecast perform model selection automatically. \n",
"\n",
"\n",
"## How to evaluate a forecasting model?\n",
"\n",
"Before moving forward, let's first introduce the notion of a *backtest* for evaluating forecasting models. The key difference between evaluating forecasting algorithms and standard ML applications is that we must ensure no future information is used when making predictions for the past. In other words, the procedure needs to be causal. \n",
"\n",
"*(Figure: backtest evaluation windows.)*\n",
"\n"
]
},
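{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the NPTS description above concrete, the next cell is a minimal, self-contained sketch (not the Amazon Forecast implementation): for each draw it samples a past time index $t \\in \\{0, \\dots, T-1\\}$ uniformly at random and uses the observed value $z_t$ as a sample for time step $T$. Repeating this yields an empirical predictive distribution from which quantiles can be read off. The toy series here is only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"random.seed(42)\n",
"\n",
"# Toy series of 30 hourly observations\n",
"series = [10 + (i % 7) for i in range(30)]\n",
"\n",
"# NPTS-style sampling: draw a past index t in {0, ..., T-1}\n",
"# and use the observed value z_t as a sample for time step T\n",
"samples = sorted(series[random.randrange(len(series))] for _ in range(1000))\n",
"\n",
"# Read off empirical quantiles from the sampled distribution\n",
"p10, p50, p90 = (samples[int(q * len(samples))] for q in (0.1, 0.5, 0.9))\n",
"print(f\"p10={p10}, p50={p50}, p90={p90}\")"
]
},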
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predictor Name = util_power_demo_1_deep_ar_plus\n"
]
}
],
"source": [
"# Which algorithm do you want to use? Choices are:\n",
"# 1. Choose PerformAutoML=True if you want to let Amazon Forecast choose a recipe automatically. \n",
"# 2. If you know which recipe you want, the next level of automation is PerformHPO=True.\n",
"# 3. Finally, you can specify exactly which recipe and enter your own hyperparameter values\n",
"# https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-choosing-recipes.html\n",
"\n",
"algorithm_arn = 'arn:aws:forecast:::algorithm/'\n",
"algorithm = 'Deep_AR_Plus'\n",
"algorithm_arn_deep_ar_plus = algorithm_arn + algorithm\n",
"predictor_name_deep_ar = f\"{PROJECT}_{DATA_VERSION}_{algorithm.lower()}\"\n",
"print(f\"Predictor Name = {predictor_name_deep_ar}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"create_predictor_response = \\\n",
" forecast.create_predictor(PredictorName=predictor_name_deep_ar,\n",
" AlgorithmArn=algorithm_arn_deep_ar_plus,\n",
" ForecastHorizon=FORECAST_LENGTH,\n",
" PerformAutoML=False,\n",
" PerformHPO=False,\n",
" InputDataConfig= {\"DatasetGroupArn\": dataset_group_arn},\n",
" FeaturizationConfig= {\"ForecastFrequency\": DATASET_FREQUENCY}\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictor_arn_deep_ar = create_predictor_response['PredictorArn']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stop the predictor training job\n",
"\n",
"During development, you may accidentally start training a predictor before you're ready. If you don't want to wait potentially hours for the training heavy lifting to finish, there is a handy \"Stop API\" call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# StopResource\n",
"stop_predictor_arn_deep_ar = forecast.stop_resource(ResourceArn=predictor_arn_deep_ar)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete the predictor\n",
"util.wait_till_delete(lambda: forecast.delete_predictor(PredictorArn = predictor_arn_deep_ar))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit the predictor training job again\n",
"\n",
"Perhaps you fixed something you had forgotten, and now you're ready to really train your predictor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"create_predictor_response = \\\n",
" forecast.create_predictor(PredictorName=predictor_name_deep_ar,\n",
" AlgorithmArn=algorithm_arn_deep_ar_plus,\n",
" ForecastHorizon=FORECAST_LENGTH,\n",
" PerformAutoML=False,\n",
" PerformHPO=False,\n",
" InputDataConfig= {\"DatasetGroupArn\": dataset_group_arn},\n",
" FeaturizationConfig= {\"ForecastFrequency\": DATASET_FREQUENCY}\n",
" )\n",
"predictor_arn_deep_ar = create_predictor_response['PredictorArn']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the status of the predictor. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection, and hyperparameter tuning, it can take several hours for the predictor to become **ACTIVE**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"status = util.wait(lambda: forecast.describe_predictor(PredictorArn=predictor_arn_deep_ar))\n",
"assert status"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.describe_predictor(PredictorArn=predictor_arn_deep_ar)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Get Predictor Error Metrics from Backtesting \n",
"\n",
"After creating the predictors, we can query the forecast accuracy given by the backtest scenario and have a quantitative understanding of the performance of the algorithm. Such a process is iterative in nature during model development. When an algorithm with satisfying performance is found, the customer can deploy the predictor into a production environment, and query the forecasts for a particular item to make business decisions. The figure below shows a sample plot of different quantile forecasts of a predictor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"error_metrics_deep_ar_plus = forecast.get_accuracy_metrics(PredictorArn=predictor_arn_deep_ar)\n",
"error_metrics_deep_ar_plus"
]
},
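{
"cell_type": "markdown",
"metadata": {},
"source": [
"The raw response above is deeply nested. As a convenience, the next cell flattens the backtest metrics into a pandas DataFrame, with one row per test window. It assumes the `GetAccuracyMetrics` response shape (`PredictorEvaluationResults` containing `TestWindows`, each with a `Metrics` dict); adjust the keys if your response differs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Flatten the backtest metrics into one row per test window\n",
"rows = []\n",
"for result in error_metrics_deep_ar_plus['PredictorEvaluationResults']:\n",
"    for window in result['TestWindows']:\n",
"        metrics = window['Metrics']\n",
"        row = {'EvaluationType': window['EvaluationType'],\n",
"               'RMSE': metrics['RMSE']}\n",
"        for wql in metrics['WeightedQuantileLosses']:\n",
"            row[f\"wQL[{wql['Quantile']}]\"] = wql['LossValue']\n",
"        rows.append(row)\n",
"\n",
"pd.DataFrame(rows)"
]
},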
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Create a Forecast "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast_name_deep_ar = f\"{PROJECT}_{DATA_VERSION}_{algorithm.lower()}\"\n",
"print(f\"Forecast Name = {forecast_name_deep_ar}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"create_forecast_response_deep_ar = \\\n",
" forecast.create_forecast(ForecastName=forecast_name_deep_ar,\n",
" PredictorArn=predictor_arn_deep_ar)\n",
"\n",
"forecast_arn_deep_ar = create_forecast_response_deep_ar['ForecastArn']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stop the forecast creation job\n",
"\n",
"During experimentation you may discover the best predictor is not this one. If you don't want to wait potentially hours for the heavy lifting of inference to finish, there is a handy \"Stop API\" call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# StopResource\n",
"stop_forecast_arn_deep_ar = forecast.stop_resource(ResourceArn=forecast_arn_deep_ar)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete forecast\n",
"util.wait_till_delete(lambda: forecast.delete_forecast(ForecastArn = forecast_arn_deep_ar))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit the create forecast job again\n",
"\n",
"Perhaps after experimenting with other predictors you've decided this one is the best after all, and you now want to come back and really create a forecast."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"create_forecast_response_deep_ar = \\\n",
" forecast.create_forecast(ForecastName=forecast_name_deep_ar,\n",
" PredictorArn=predictor_arn_deep_ar)\n",
"\n",
"forecast_arn_deep_ar = create_forecast_response_deep_ar['ForecastArn']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the status of the forecast job. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection, and hyperparameter tuning, it can take several hours for the forecast to become **ACTIVE**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"status = util.wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn_deep_ar))\n",
"assert status"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"forecast.describe_forecast(ForecastArn=forecast_arn_deep_ar)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Query the Forecast "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once created, the forecast results are ready for you to view. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"item_id"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast_response_deep = forecastquery.query_forecast(\n",
" ForecastArn=forecast_arn_deep_ar,\n",
" Filters={\"item_id\": item_id})\n",
"\n",
"forecast_response_deep"
]
},
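{
"cell_type": "markdown",
"metadata": {},
"source": [
"To work with the forecast programmatically, the next cell reshapes the JSON response above into a pandas DataFrame suitable for plotting. It assumes the standard `QueryForecast` response shape: `Forecast` contains `Predictions`, a dict mapping quantile names such as `p10`/`p50`/`p90` to lists of `{Timestamp, Value}` records."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Reshape the quantile predictions into one DataFrame indexed by timestamp\n",
"predictions = forecast_response_deep['Forecast']['Predictions']\n",
"\n",
"forecast_df = pd.DataFrame({quantile: [entry['Value'] for entry in entries]\n",
"                            for quantile, entries in predictions.items()})\n",
"forecast_df.index = pd.to_datetime(\n",
"    [entry['Timestamp'] for entry in next(iter(predictions.values()))])\n",
"forecast_df.head()"
]
},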
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%store forecast_arn_deep_ar\n",
"%store predictor_arn_deep_ar"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next Steps\n",
"\n",
"Congratulations!! You've trained your first Amazon Forecast model and generated your first forecast!!\n",
"\n",
"To dive deeper, here are a couple options for further evaluation:\n",
"