# Building Your Predictor

Forecasting is used in a wide variety of applications and business use cases. For example, retailers need to forecast product sales to decide how much stock to hold at each location, manufacturers need to estimate the number of parts required at their factories to optimize their supply chain, businesses need to estimate their flexible workforce needs, utilities need to forecast electricity consumption to run an efficient energy network, and enterprises need to estimate their cloud infrastructure needs.

<img src="https://amazon-forecast-samples.s3-us-west-2.amazonaws.com/common/images/forecast_overview_steps.png" width="98%">

In this notebook, we walk through the steps shown in the second through fourth boxes above to build and query your first forecast.


## Table of Contents
* Step 1: [Setup Amazon Forecast](#setup)
* Step 2: [Create a Predictor](#createPredictor)
* Step 3: [Get Predictor Error Metrics from Backtesting](#predictorErrors)
* Step 4: [Create a Forecast](#createForecast)
* Step 5: [Query the Forecast](#queryForecast)
* [Next Steps](#nextSteps)

For more information about the APIs, see the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html).

## Step 1: Setup Amazon Forecast<a class="anchor" id="setup"></a>


This section sets up the permissions and relevant endpoints.

In [5]:
import sys
import os
import time
import pandas as pd

# importing forecast notebook utility from notebooks/common directory
sys.path.insert(0, os.path.abspath("../../common"))
import util

%reload_ext autoreload
import boto3
import s3fs

The cell below retrieves the variables you stored in the first notebook.

In [6]:
%store -r

# Print your choices from first notebook
print(f"item_id = {item_id}")
print(f"project = {PROJECT}")
print(f"data_version = {DATA_VERSION}")
print(f"Forecast length = {FORECAST_LENGTH}")
print(f"Dataset frequency = {DATASET_FREQUENCY}")
print(f"Timestamp format = {TIMESTAMP_FORMAT}")
print(f"dataset_group_arn = {dataset_group_arn}")
print(f"role_arn = {role_arn}")
%store -r bucket_name
print(f"bucket_name = {bucket_name}")
%store -r region
print(f"region = {region}")

item_id = client_12
project = util_power_demo
data_version = 1
Forecast length = 24
Dataset frequency = H
Timestamp format = yyyy-MM-dd hh:mm:ss
dataset_group_arn = arn:aws:forecast:us-west-2:730750055343:dataset-group/util_power_demo_1
role_arn = arn:aws:iam::730750055343:role/ForecastNotebookRole-Basic
bucket_name = forecast-demo-uci-electricity-jeetub
region = us-west-2


The last part of the setup is to verify that your account can communicate with Amazon Forecast. The cells below create the API clients and check the connection.

In [7]:
# Connect API session
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast') 
forecastquery = session.client(service_name='forecastquery')

In [8]:
# check you can communicate with Forecast API session
forecast.list_predictors()

{'Predictors': [],
 'ResponseMetadata': {'RequestId': 'af1cb6e1-4c9e-42ec-be5a-5ed95505c987',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Thu, 21 Oct 2021 22:15:24 GMT',
   'x-amzn-requestid': 'af1cb6e1-4c9e-42ec-be5a-5ed95505c987',
   'content-length': '17',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

## Step 2: Create a Predictor <a class="anchor" id="createPredictor"></a>

Once the datasets are specified with their corresponding schemas, Amazon Forecast automatically aggregates all the relevant information for each item at the specified time granularity (for example sales, price, promotions, and categorical attributes) and generates the desired dataset. Next, you can choose an algorithm (forecasting model) and evaluate how well it performs on this dataset. The following figures give a high-level overview of the forecasting models.
<img src="https://amazon-forecast-samples.s3-us-west-2.amazonaws.com/common/images/recipes.png" width="98%">
<img src="https://amazon-forecast-samples.s3-us-west-2.amazonaws.com/common/images/mqcnn.png" width="70%">
<img src="https://amazon-forecast-samples.s3-us-west-2.amazonaws.com/common/images/pred_details.png">
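To make the aggregation step concrete, the pandas sketch below sums hypothetical 15-minute readings into hourly buckets per item, which corresponds to an hourly `DATASET_FREQUENCY` of `H`. The column names, item IDs, and values are made up for illustration, and summing is just one reasonable choice for a demand-like target. Amazon Forecast performs this aggregation for you, so treat this only as a picture of what happens, not as the service's implementation.

```python
import pandas as pd

# Hypothetical raw readings: two items, recorded at sub-hourly intervals.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2014-01-01 00:00", "2014-01-01 00:15", "2014-01-01 00:30",
        "2014-01-01 00:45", "2014-01-01 01:00",
        "2014-01-01 00:00", "2014-01-01 00:30",
    ]),
    "item_id": ["client_12"] * 5 + ["client_13"] * 2,
    "target_value": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0],
})


def aggregate_to_frequency(df, freq="60min"):
    """Sum target values per item into time buckets of the given pandas frequency."""
    return (
        df.set_index("timestamp")
          .groupby("item_id")["target_value"]
          .resample(freq)
          .sum()
          .reset_index()
    )


hourly = aggregate_to_frequency(raw)
print(hourly)
```

Here `client_12` ends up with one value per hour (the four readings in the first hour summed together), which is the shape of data a predictor with an hourly forecast frequency trains on.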


Amazon Forecast provides several state-of-the-art forecasting algorithms, including classical methods such as ETS, ARIMA, and Prophet, and deep learning approaches such as DeepAR+. Classical forecasting methods, such as Autoregressive Integrated Moving Average (ARIMA) or Exponential Smoothing (ETS), fit a single model to each individual time series and then use that model to extrapolate the series into the future. Amazon's Non-Parametric Time Series (NPTS) forecaster also fits a single model to each individual time series. Unlike the naive and seasonal naive forecasters, which use a fixed time index (the previous index $T-1$ or the past season $T - \tau$) as the prediction for time step $T$, NPTS randomly samples a time index $t \in \{0, \dots, T-1\}$ in the past to generate a sample for the current time step $T$.
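The contrast between fixed-index and sampled-index forecasters can be sketched in a few lines of plain Python. This is a simplified illustration, not Amazon's NPTS implementation: `npts_style_samples` draws past indices uniformly, while the AWS documentation describes variants that weight the sampled indices (for example, favoring recent observations). All function names here are hypothetical.

```python
import random


def naive_forecast(history):
    """Naive forecaster: predict step T with the value at index T-1."""
    return history[-1]


def seasonal_naive_forecast(history, season_length):
    """Seasonal naive forecaster: predict step T with the value at index T - tau."""
    return history[-season_length]


def npts_style_samples(history, num_samples, rng):
    """Generate samples for step T by drawing past indices t in {0, ..., T-1}.

    Uniform sampling is the simplest variant and is used here for illustration only.
    """
    T = len(history)
    return [history[rng.randrange(T)] for _ in range(num_samples)]


history = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]
rng = random.Random(42)
samples = npts_style_samples(history, num_samples=1000, rng=rng)

print(naive_forecast(history))              # 7.0
print(seasonal_naive_forecast(history, 3))  # 6.0, the value at index T - tau = 6 - 3 = 3
print(sorted(samples)[len(samples) // 2])   # empirical median of the sampled forecast
```

Because NPTS produces many samples rather than a single value, quantiles of those samples give a probabilistic forecast, whereas the naive forecasters return one point.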

In many applications, you encounter many similar time series across a set of cross-sectional units, for example demand for different products, server loads, or requests for web pages. In this case, it can be beneficial to train a single model jointly over all of these time series. CNN-QR and DeepAR+ take this approach and can outperform the standard ARIMA and ETS methods when your dataset contains hundreds of related time series. The trained model can also generate forecasts for new time series that are similar to the ones it was trained on.
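To show what training "a single model jointly" over many series means, the sketch below fits one linear autoregressive model on lag windows pooled from several series using NumPy least squares. It is a deliberately simple stand-in: DeepAR+ and CNN-QR are neural networks and work very differently. The simulated data and helper names are hypothetical. What the sketch shares with those algorithms is the idea of one set of parameters learned from all series, which can then forecast a series it never saw.

```python
import numpy as np


def make_windows(series, num_lags):
    """Turn one series into (lag features, next value) training pairs."""
    X, y = [], []
    for t in range(num_lags, len(series)):
        X.append(series[t - num_lags:t])
        y.append(series[t])
    return X, y


def fit_global_ar(all_series, num_lags):
    """Fit ONE linear AR model on windows pooled from every series."""
    X, y = [], []
    for s in all_series:
        xs, ys = make_windows(s, num_lags)
        X.extend(xs)
        y.extend(ys)
    # Append a column of ones so the model also learns an intercept.
    X = np.column_stack([np.asarray(X, dtype=float), np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef  # [weight for t-2, ..., weight for t-1, intercept]


def predict_next(coef, recent):
    """One-step forecast from the most recent num_lags values of any series."""
    return float(np.dot(coef[:-1], recent) + coef[-1])


def simulate(x0, x1, length):
    """Hypothetical series that all follow x_t = 0.6*x_{t-1} + 0.3*x_{t-2} + 1."""
    s = [x0, x1]
    while len(s) < length:
        s.append(0.6 * s[-1] + 0.3 * s[-2] + 1.0)
    return s


training_series = [simulate(0, 0, 20), simulate(5, 2, 20), simulate(30, 25, 20)]
coef = fit_global_ar(training_series, num_lags=2)

# A new series that was never part of training can be forecast with the same model.
new_series = [4.0, 7.0]
print(predict_next(coef, new_series[-2:]))  # close to 0.6*7 + 0.3*4 + 1 = 6.4
```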

While deep learning approaches can outperform standard methods, they only do so when there is sufficient training data. That is not the case, for example, when a neural network is trained on a time series with only a few dozen observations. Amazon Forecast offers the best of both worlds: you can either choose a specific algorithm or let Amazon Forecast perform model selection automatically.


## How to evaluate a forecasting model?

Before moving forward, let's introduce the notion of a *backtest* for evaluating forecasting models. The key difference between evaluating forecasting algorithms and standard ML applications is that we must make sure no future information is used to predict the past. In other words, the evaluation procedure must be causal.

<img src="https://amazon-forecast-samples.s3-us-west-2.amazonaws.com/common/images/backtest.png" width=70%>
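The causal split in the figure can be sketched as a rolling-origin backtest: each test window holds `horizon` points, and its training slice ends exactly where the test window begins, so no future value leaks into training. The function name and the `offset` convention (distance between consecutive window ends, defaulting to the horizon) are assumptions for illustration. They loosely mirror, but are not guaranteed to match, the backtest settings Amazon Forecast exposes through `EvaluationParameters` (`NumberOfBacktestWindows`, `BackTestWindowOffset`) in `create_predictor`.

```python
def backtest_splits(series, horizon, num_windows=1, offset=None):
    """Return (train, test) pairs for rolling-origin backtesting, oldest window first.

    Window k (k = 0 is the most recent) tests on series[end - horizon:end], with
    end = len(series) - k * offset, and trains only on the values before it.
    """
    offset = horizon if offset is None else offset
    splits = []
    for k in reversed(range(num_windows)):
        test_end = len(series) - k * offset
        test_start = test_end - horizon
        if test_start <= 0:
            raise ValueError("series is too short for the requested backtest windows")
        splits.append((series[:test_start], series[test_start:test_end]))
    return splits


for train, test in backtest_splits(list(range(10)), horizon=2, num_windows=2):
    print(f"train ends at {train[-1]}, test = {test}")
# train ends at 5, test = [6, 7]
# train ends at 7, test = [8, 9]
```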



In [9]:
# Which algorithm do you want to use?  Choices are:
# 1. Choose PerformAutoML=True if you want to let Amazon Forecast choose a recipe automatically.  
# 2. If you know which recipe you want, the next level of automation is PerformHPO=True.
# 3. Finally, you can specify exactly which recipe and enter your own hyperparameter values
# https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-choosing-recipes.html

algorithm_arn = 'arn:aws:forecast:::algorithm/'
algorithm = 'Deep_AR_Plus'
algorithm_arn_deep_ar_plus = algorithm_arn + algorithm
predictor_name_deep_ar = f"{PROJECT}_{DATA_VERSION}_{algorithm.lower()}"
print(f"Predictor Name = {predictor_name_deep_ar}")

Predictor Name = util_power_demo_1_deep_ar_plus
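The three choices listed in the cell above map onto different `create_predictor` arguments. The hypothetical helper below builds the keyword arguments for each option so the call can be written as `forecast.create_predictor(**request)`. It only uses parameters that appear in this notebook's own `create_predictor` call; for option 3 you would additionally pass your own hyperparameter values (for example through the `TrainingParameters` argument), which this sketch leaves out. With AutoML, the sketch omits `AlgorithmArn` because Forecast chooses the algorithm itself.

```python
def build_predictor_request(name, horizon, dataset_group_arn, frequency,
                            mode="manual", algorithm_arn=None):
    """Build create_predictor keyword arguments for mode 'automl', 'hpo', or 'manual'."""
    request = {
        "PredictorName": name,
        "ForecastHorizon": horizon,
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        "FeaturizationConfig": {"ForecastFrequency": frequency},
    }
    if mode == "automl":
        # Option 1: let Amazon Forecast pick the recipe; no AlgorithmArn is passed.
        request["PerformAutoML"] = True
    elif mode in ("hpo", "manual"):
        if algorithm_arn is None:
            raise ValueError("algorithm_arn is required unless mode='automl'")
        request["AlgorithmArn"] = algorithm_arn
        request["PerformAutoML"] = False
        # Option 2 tunes hyperparameters; option 3 trains with fixed values.
        request["PerformHPO"] = (mode == "hpo")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return request


# Placeholder values; in the notebook you would use PROJECT, FORECAST_LENGTH, etc.
request = build_predictor_request(
    name="my_predictor", horizon=24,
    dataset_group_arn="arn:aws:forecast:us-west-2:111111111111:dataset-group/example",
    frequency="H", mode="automl",
)
print(request)
# With a configured client: forecast.create_predictor(**request)
```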


In [None]:
create_predictor_response = \
    forecast.create_predictor(PredictorName=predictor_name_deep_ar,
                              AlgorithmArn=algorithm_arn_deep_ar_plus,
                              ForecastHorizon=FORECAST_LENGTH,
                              PerformAutoML=False,
                              PerformHPO=False,
                              InputDataConfig= {"DatasetGroupArn": dataset_group_arn},
                              FeaturizationConfig= {"ForecastFrequency": DATASET_FREQUENCY}
                             )

In [None]:
predictor_arn_deep_ar = create_predictor_response['PredictorArn']

### Stop the predictor training job

During development, you may accidentally start training a predictor before you are ready. Rather than waiting, possibly for hours, for training to finish, you can stop it with the `StopResource` API (`stop_resource` in boto3) and then delete the predictor.

In [None]:
# StopResource: ask Amazon Forecast to stop training this predictor
stop_predictor_response = forecast.stop_resource(ResourceArn=predictor_arn_deep_ar)

In [None]:
# Delete the predictor
util.wait_till_delete(lambda: forecast.delete_predictor(PredictorArn=predictor_arn_deep_ar))
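Deletion may not succeed immediately after a stop request, which is presumably why the notebook wraps the call in `util.wait_till_delete`. The function below is a hypothetical stand-in that illustrates a retry pattern. It is not the code in the `util` module. It assumes that boto3 reports service errors as exceptions whose `response["Error"]["Code"]` holds the error name (as botocore's `ClientError` does), and that Amazon Forecast rejects deletion of a still-busy resource with `ResourceInUseException`.

```python
import time


def delete_with_retry(delete_fn, max_attempts=60, delay_seconds=30, sleep=time.sleep):
    """Call delete_fn until it succeeds or the resource no longer exists.

    Retries only on ResourceInUseException; any other error is re-raised.
    Returns False if the resource was still busy after max_attempts tries.
    """
    for _ in range(max_attempts):
        try:
            delete_fn()
            return True
        except Exception as err:  # botocore.exceptions.ClientError in practice
            code = getattr(err, "response", {}).get("Error", {}).get("Code")
            if code == "ResourceNotFoundException":
                return True  # nothing left to delete
            if code != "ResourceInUseException":
                raise
            sleep(delay_seconds)
    return False


# Usage with a configured client (not run here):
# delete_with_retry(lambda: forecast.delete_predictor(PredictorArn=predictor_arn_deep_ar))
```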

### Submit the predictor training job again

Once you have fixed whatever you missed, you are ready to train the predictor for real.

In [None]:
create_predictor_response = \
    forecast.create_predictor(PredictorName=predictor_name_deep_ar,
                              AlgorithmArn=algorithm_arn_deep_ar_plus,
                              ForecastHorizon=FORECAST_LENGTH,
                              PerformAutoML=False,
                              PerformHPO=False,
                              InputDataConfig= {"DatasetGroupArn": dataset_group_arn},
                              FeaturizationConfig= {"ForecastFrequency": DATASET_FREQUENCY}
                             )
predictor_arn_deep_ar = create_predictor_response['PredictorArn']

Check the status of the predictor. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on the data size, model selection, and hyperparameter tuning, it can take several hours for the predictor to become **ACTIVE**.

In [None]:
status = util.wait(lambda: forecast.describe_predictor(PredictorArn=predictor_arn_deep_ar))
assert status
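The cell above relies on the notebook's `util.wait` helper. For readers curious about what such polling involves, here is a hypothetical sketch (not the `util` implementation) that calls a describe function until the `Status` field is `ACTIVE`. Treating `CREATE_FAILED` and `CREATE_STOPPED` as terminal failure states is an assumption based on the status values listed in the Amazon Forecast API reference. `sleep` and `clock` are injectable so the loop can be exercised without waiting.

```python
import time

TERMINAL_FAILURE_STATES = {"CREATE_FAILED", "CREATE_STOPPED"}


def wait_until_active(describe_fn, poll_seconds=60, timeout_seconds=6 * 3600,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll describe_fn() until Status is ACTIVE (True) or a failure state (False)."""
    deadline = clock() + timeout_seconds
    while True:
        status = describe_fn()["Status"]
        if status == "ACTIVE":
            return True
        if status in TERMINAL_FAILURE_STATES:
            return False
        if clock() >= deadline:
            raise TimeoutError(f"still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Usage with a configured client (not run here):
# wait_until_active(lambda: forecast.describe_predictor(PredictorArn=predictor_arn_deep_ar))
```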

In [None]:
forecast.describe_predictor(PredictorArn=predictor_arn_deep_ar)

## Step 3: Get Predictor Error Metrics from Backtesting <a class="anchor" id="predictorErrors"></a>

After the predictor is created, we can query the forecast accuracy measured in the backtest scenario to get a quantitative understanding of the algorithm's performance. This process is iterative during model development. Once you find an algorithm with satisfactory performance, you can deploy the predictor into a production environment and query the forecasts for a particular item to make business decisions. The figure below shows a sample plot of different quantile forecasts of a predictor.

In [None]:
error_metrics_deep_ar_plus = forecast.get_accuracy_metrics(PredictorArn=predictor_arn_deep_ar)
error_metrics_deep_ar_plus
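The accuracy metrics returned above include weighted quantile losses for the predictor's quantiles (by default 0.1, 0.5, and 0.9, reported as p10, p50, and p90). As we understand the Amazon Forecast documentation, the weighted quantile loss at quantile $\tau$ is $2\sum_{i,t}\left[\tau\max(y_{i,t}-q^{(\tau)}_{i,t},0)+(1-\tau)\max(q^{(\tau)}_{i,t}-y_{i,t},0)\right]/\sum_{i,t}|y_{i,t}|$, summed over items $i$ and time steps $t$. The plain-Python sketch below computes it for made-up actuals and quantile forecasts, to show how under-forecasting and over-forecasting are penalized asymmetrically.

```python
def weighted_quantile_loss(actuals, quantile_forecasts, tau):
    """Weighted quantile loss at quantile tau, normalized by the sum of |actuals|."""
    if len(actuals) != len(quantile_forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    numerator = sum(
        tau * max(y - q, 0.0) + (1.0 - tau) * max(q - y, 0.0)
        for y, q in zip(actuals, quantile_forecasts)
    )
    denominator = sum(abs(y) for y in actuals)
    return 2.0 * numerator / denominator


actuals = [10.0, 20.0]
forecast_q = [12.0, 20.0]  # over-forecasts the first step by 2

# Over-forecasting is cheap at a high quantile and expensive at a low one.
print(weighted_quantile_loss(actuals, forecast_q, 0.9))  # about 0.0133
print(weighted_quantile_loss(actuals, forecast_q, 0.1))  # 0.12
```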

## Step 4: Create a Forecast <a class="anchor" id="createForecast"></a>

In [None]:
forecast_name_deep_ar = f"{PROJECT}_{DATA_VERSION}_{algorithm.lower()}"
print(f"Forecast Name = {forecast_name_deep_ar}")

In [None]:
# Generate a forecast from the trained DeepAR+ predictor
create_forecast_response_deep_ar = \
    forecast.create_forecast(ForecastName=forecast_name_deep_ar,
                             PredictorArn=predictor_arn_deep_ar)

forecast_arn_deep_ar = create_forecast_response_deep_ar['ForecastArn']

### Stop the forecast creation job

During experimentation, you may discover that a different predictor is the best one. Rather than waiting, possibly for hours, for this forecast to finish generating, you can stop it with the same `StopResource` API and then delete the forecast.

In [None]:
# StopResource: ask Amazon Forecast to stop creating this forecast
stop_forecast_response = forecast.stop_resource(ResourceArn=forecast_arn_deep_ar)

In [None]:
# Delete the forecast
util.wait_till_delete(lambda: forecast.delete_forecast(ForecastArn=forecast_arn_deep_ar))

### Submit the forecast creation job again

Suppose that, after experimenting with other predictors, you decide this one is the best after all. You can now come back to it and create the forecast for real.

In [None]:
create_forecast_response_deep_ar = \
    forecast.create_forecast(ForecastName=forecast_name_deep_ar,
                             PredictorArn=predictor_arn_deep_ar)

forecast_arn_deep_ar = create_forecast_response_deep_ar['ForecastArn']

Check the status of the forecast. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on the data size, it can take several hours for the forecast to become **ACTIVE**.

In [None]:
status = util.wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn_deep_ar))
assert status

In [None]:
forecast.describe_forecast(ForecastArn=forecast_arn_deep_ar)

## Step 5: Query the Forecast <a class="anchor" id="queryForecast"></a>

Once the forecast is **ACTIVE**, you can query it and view the results for a specific item.

In [None]:
item_id

In [None]:
forecast_response_deep = forecastquery.query_forecast(
    ForecastArn=forecast_arn_deep_ar,
    Filters={"item_id": item_id})

forecast_response_deep
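The raw `query_forecast` response is a nested dictionary. Assuming the layout described in the Forecast Query API reference (`Forecast` → `Predictions` → one list of `{"Timestamp", "Value"}` points per quantile such as `p10`, `p50`, `p90`), the hypothetical helper below pivots it into a pandas DataFrame with one column per quantile, which is easier to inspect or plot.

```python
import pandas as pd


def predictions_to_frame(query_response):
    """Pivot a query_forecast response into a DataFrame: one column per quantile."""
    predictions = query_response["Forecast"]["Predictions"]
    columns = {}
    for quantile, points in predictions.items():
        columns[quantile] = pd.Series(
            [p["Value"] for p in points],
            index=pd.to_datetime([p["Timestamp"] for p in points]),
        )
    return pd.DataFrame(columns).sort_index()


# In the notebook (not run here):
# forecast_df = predictions_to_frame(forecast_response_deep)
# forecast_df.plot()  # requires matplotlib
```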

In [None]:
# Store the ARNs so later notebooks can restore them with %store -r
%store forecast_arn_deep_ar
%store predictor_arn_deep_ar

## Next Steps<a class="anchor" id="nextSteps"></a>

Congratulations! You've trained your first Amazon Forecast model and generated your first forecast!

To dive deeper, here are a few options for further evaluation:
<ul>
    <li><b>To see an example of single-item evaluation in a notebook</b>, see `3.Evaluating_Your_Predictor.ipynb`.</li>
    <li><b>For an example of how to use a notebook and Predictor Backtest Forecasts to evaluate all items at once using custom metrics</b>, see `../advanced/Item_Level_Accuracy/Item_Level_Accuracy_Using_Bike_Example.ipynb`.</li>
    <li><b>Finally, for a production-level example of how to use Amazon QuickSight to visualize Predictor Backtest Forecasts, Forecasts, or both, so you can share and socialize the results with others</b>, <a href="https://aws.amazon.com/solutions/implementations/improving-forecast-accuracy-with-machine-learning/?did=sl_card&trk=sl_card" target="_blank">see our automation solution, Improving Forecast Accuracy</a>.</li>
        <li><a href="https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=forecast-stack&t[…]acy-with-machine-learning-demo.template" target="_blank">Quick launch link for above automation</a></li>
    </ul>
    
For other advanced topics, see the `advanced` section of our notebooks. Here are a few you may want to check out next:
<ul>
    <li>Example of how to use related data: <a href="https://github.com/aws-samples/amazon-forecast-samples/blob/master/notebooks/advanced/Incorporating_Related_Time_Series_dataset_to_your_Predictor/Incorporating_Related_Time_Series_dataset_to_your_Predictor.ipynb" target="_blank">Incorporating Related Time Series</a></li>
    <li>Example of how to use the built-in weather data hosted by AWS: <a href="https://github.com/aws-samples/amazon-forecast-samples/blob/master/notebooks/advanced/Weather_index/1.%20Training%20your%20model%20with%20Weather%20Index.ipynb" target="_blank">Training your model with Weather Index</a></li>
    </ul>