# Introducing Custom Time Frequency Support

Prior to this feature release, customers could train a time-series model and produce forecasts at a specific set of time intervals, which Amazon Forecast refers to as ForecastFrequency. As described [in the documentation](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-predictor.html), forecasting frequencies are expressed in time units that range from minutes, to hours, and up to a year.

With this new feature release, we are extending the concept of frequency units to include a range of allowed values, so you can select a frequency unit together with a value. The following table lists the allowed values for each frequency unit:

|Code|Frequency units|Allowed values|
|--|--|--|
|MIN|Minute|1-59|
|H|Hour|1-23|
|D|Day|1-6|
|W|Week|1-4|
|M|Month|1-11|
|Y|Year|1|

Given this new model, you can create a wider range of forecasting intervals to suit your business use case. For workforce planning, one customer may want to forecast at 8-hour shift intervals. In a financial or demand forecasting scenario, a business might want to produce quarterly forecasts. These scenarios can be achieved with codes of 8H and 3M, respectively. Now that the complete set of intervals is known, you can choose one that fits your specific use case. There are no API changes with this release; we simply allow more values in the existing API request.
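The table above can be turned into a small pre-flight check before calling the API. The following is a minimal sketch, not part of the Amazon Forecast SDK: the helper name `parse_frequency` is hypothetical, and it assumes that codes are written in uppercase exactly as in the table and that a bare unit such as `D` means a value of 1.

```python
import re

# Allowed multiplier range per frequency unit, copied from the table above.
ALLOWED_VALUES = {
    "MIN": range(1, 60),  # 1-59
    "H": range(1, 24),    # 1-23
    "D": range(1, 7),     # 1-6
    "W": range(1, 5),     # 1-4
    "M": range(1, 12),    # 1-11
    "Y": range(1, 2),     # 1
}

# Optional digits followed by one of the unit codes.
_FREQ_PATTERN = re.compile(r"(\d*)(MIN|H|D|W|M|Y)")


def parse_frequency(code):
    """Split a code such as '8H' into (8, 'H'); raise ValueError if invalid."""
    match = _FREQ_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"Unrecognized frequency code: {code!r}")
    digits, unit = match.groups()
    value = int(digits) if digits else 1  # assumption: bare unit means 1
    if value not in ALLOWED_VALUES[unit]:
        raise ValueError(f"{value} is outside the allowed range for unit {unit}")
    return value, unit


print(parse_frequency("8H"))
print(parse_frequency("3M"))
```

A code such as `7D` is rejected by this check because the table caps days at 6; a weekly interval would be written as `1W` instead.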

This notebook provides an example of producing a quarterly forecast from daily data. In addition, the same dataset is used to demonstrate a short-term forecast made up of seven 3-day forecast points. A use case for the short-term forecast might be ordering perishable items every 3 days, where the forecast helps anticipate what the purchase orders would look like.

As before, certain rules about forecast frequency still apply. If no Related Time Series (RTS) data is present, the forecast frequency should always be greater than or equal to the data frequency of the Target Time Series (TTS) dataset. If RTS data is present, the data frequency of the RTS dataset should also match the forecast frequency.
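These two rules can be sketched as a local check. The helper names below are hypothetical, the month and year lengths are rough stand-ins used only for ordering, and the sketch applies the first rule whether or not RTS data is present, which is an assumption rather than something stated above.

```python
import re

# Approximate length of each unit in minutes, used only to order frequencies.
# Assumption: 30-day months and 365-day years are good enough for comparison.
UNIT_MINUTES = {"MIN": 1, "H": 60, "D": 1440, "W": 10080, "M": 43200, "Y": 525600}


def approx_minutes(code):
    """Convert a code such as '3D' into an approximate number of minutes."""
    match = re.fullmatch(r"(\d*)(MIN|H|D|W|M|Y)", code)
    if match is None:
        raise ValueError(f"Unrecognized frequency code: {code!r}")
    digits, unit = match.groups()
    return (int(digits) if digits else 1) * UNIT_MINUTES[unit]


def check_frequencies(forecast_freq, tts_freq, rts_freq=None):
    """Return a list of rule violations; an empty list means no problem found."""
    problems = []
    if approx_minutes(forecast_freq) < approx_minutes(tts_freq):
        problems.append("forecast frequency is finer than the TTS data frequency")
    if rts_freq is not None and approx_minutes(rts_freq) != approx_minutes(forecast_freq):
        problems.append("RTS data frequency does not match the forecast frequency")
    return problems


print(check_frequencies("3D", "D"))                # daily data, 3-day forecast
print(check_frequencies("H", "D"))                 # forecast finer than the data
print(check_frequencies("3D", "D", rts_freq="D"))  # RTS does not match
```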


## Table of Contents
* [Pre-requisites](#prerequisites)
* Step 1: [Import your data](#import)
* Step 2: [Train a predictor](#predictor)
* Step 3: [Generate forecasts](#forecast)
* Step 4: [Query/View the forecasts](#query)
* [Clean-up](#cleanup)

## Pre-requisites <a class="anchor" id="prerequisites"></a>
Before we get started, let's set up the notebook environment, the AWS SDK client for Amazon Forecast, and the IAM Role used by Amazon Forecast to access your data.

#### Setup Environment

In [1]:
%%capture --no-stderr setup

!pip install pandas s3fs matplotlib ipywidgets
!pip install boto3 --upgrade

%reload_ext autoreload

#### Setup Imports

In [2]:
import sys
import os

sys.path.insert( 0, os.path.abspath("../../common") )

import json
import util
import boto3
import s3fs
import pandas as pd

Configure the S3 bucket name and region name for this lesson.

- If you don't have an S3 bucket, create it first on S3.
- Although we have set the region to us-west-2 as a default value below, you can choose any of the regions that the service is available in.

In [3]:
bucket_name = 'input your existing S3 bucket name'
region = 'us-west-2'

#### Create an instance of AWS SDK client for Amazon Forecast

In [4]:
session = boto3.Session(region_name=region)
s3 = session.client('s3')
forecast = session.client(service_name='forecast') 
forecastquery = session.client(service_name='forecastquery')

#### Get IAM Role Amazon Forecast will use to access your data

In [5]:
from sagemaker import get_execution_role
role_arn = get_execution_role()

## Step 1: Import your data <a class="anchor" id="import"></a>

In this step, we will create a **Dataset** and **Import** the data from S3 to Amazon Forecast. To train a Predictor we will need a **DatasetGroup** that groups the input **Datasets**. So, we will end this step by creating a **DatasetGroup** with the imported **Dataset**.

#### Peek at the data and upload it to S3.

Here, we will view the dataset locally, then upload the file to Amazon S3. Amazon Forecast consumes input data from S3.
    
A sample [Target Time Series](https://github.com/aws-samples/amazon-forecast-samples/blob/main/library/content/TargetTimeSeries.md) (TTS) is provided. Follow the link to learn more about target and related time series.

In [6]:
df_tts = pd.read_csv('./data/sample_demand.csv', low_memory=False)
df_tts.head(5)

| |timestamp|item_id|target_value|
|--|--|--|--|
|0|2017-12-01 00:00:00|4|27|
|1|2017-12-01 00:00:00|7|36|
|2|2017-12-01 00:00:00|10|2|
|3|2017-12-01 00:00:00|12|1|
|4|2017-12-01 00:00:00|13|61|


Upload this file to Amazon S3.

In [7]:
project = "custom_frequency"

key_tts = "%s/sample_demand.csv" % project

s3.upload_file( Filename="./data/sample_demand.csv", Bucket=bucket_name, Key=key_tts )

s3_data_path_tts = "s3://" + bucket_name + "/" + key_tts

#### Creating the Dataset

In [8]:
DATASET_FREQUENCY = "H" # H for hourly

TS_DATASET_NAME = "CUSTOM_FREQUENCY_TS"
TS_SCHEMA = {
   "Attributes":[
      {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      },
      {
         "AttributeName":"target_value",
         "AttributeType":"integer"
      }
   ]
}

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
                                                  DatasetType='TARGET_TIME_SERIES',
                                                  DatasetName=TS_DATASET_NAME,
                                                  DataFrequency=DATASET_FREQUENCY,
                                                  Schema=TS_SCHEMA)

ts_dataset_arn = create_dataset_response['DatasetArn']
describe_dataset_response = forecast.describe_dataset(DatasetArn=ts_dataset_arn)

print(f"The dataset is now {describe_dataset_response['Status']}.")

The dataset is now ACTIVE.
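Schema attributes are expected to be listed in the same order as the columns of the CSV file, so it can be worth comparing the two locally before importing. The following is a minimal sketch; the helper name `schema_matches_header` is hypothetical, and it assumes the file starts with a header row, as the sample shown earlier does.

```python
import csv
import io


def schema_matches_header(schema, csv_text):
    """Return True when the schema attribute names equal the CSV header, in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = [attr["AttributeName"] for attr in schema["Attributes"]]
    return header == expected


# Same attribute names and types as TS_SCHEMA in the cell above.
sample_schema = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "target_value", "AttributeType": "integer"},
    ]
}

# Header plus the first data row shown in the preview of the sample file.
sample_csv = "timestamp,item_id,target_value\n2017-12-01 00:00:00,4,27\n"

print(schema_matches_header(sample_schema, sample_csv))
```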


#### Importing the Dataset

In [9]:
TIMESTAMP_FORMAT = "yyyy-MM-dd HH:mm:ss"
TS_IMPORT_JOB_NAME = "PRODUCT_TTS_IMPORT"

ts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,
                                       DatasetArn=ts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": s3_data_path_tts,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       TimestampFormat=TIMESTAMP_FORMAT)

ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']

print("Waiting for the Dataset Import Job to become ACTIVE. This process could take 5-10 minutes.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))

Waiting for the Dataset Import Job to become ACTIVE. This process could take 5-10 minutes.

Current Status:
CREATE_PENDING .
CREATE_IN_PROGRESS .........................................................................
ACTIVE 


The Dataset Import Job is now ACTIVE.
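The `util.wait` helper comes from the shared `common` folder and its implementation is not shown in this notebook. As a rough idea of what such a polling loop might look like, here is a minimal sketch; `wait_for_active` and its parameters are hypothetical names, and it assumes the describe call returns a dictionary with a `Status` key whose failure states end in `FAILED`.

```python
import time


def wait_for_active(describe, poll_seconds=10, max_polls=360, sleep=time.sleep):
    """Call describe() until its 'Status' is ACTIVE; raise on failure or timeout."""
    for _ in range(max_polls):
        response = describe()
        status = response["Status"]
        if status == "ACTIVE":
            return response
        if status.endswith("FAILED"):
            raise RuntimeError(f"Resource entered status {status}: {response.get('Message', '')}")
        sleep(poll_seconds)
    raise TimeoutError("Resource did not become ACTIVE in time")


# Stand-in for a describe_* call, so the sketch runs without AWS access.
fake_statuses = iter(["CREATE_PENDING", "CREATE_IN_PROGRESS", "ACTIVE"])
result = wait_for_active(lambda: {"Status": next(fake_statuses)}, sleep=lambda seconds: None)
print(result["Status"])
```

With a real client, the `describe` argument would be a lambda around a call such as `forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)`, as used in the cell above.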


#### Creating a DatasetGroup

In [10]:
DATASET_GROUP_NAME = "CUSTOM_FREQUENCY_DEMO"
DATASET_ARNS = [ts_dataset_arn]

create_dataset_group_response = \
    forecast.create_dataset_group(Domain="CUSTOM",
                                  DatasetGroupName=DATASET_GROUP_NAME,
                                  DatasetArns=DATASET_ARNS)

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']
describe_dataset_group_response = forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)

print(f"The DatasetGroup is now {describe_dataset_group_response['Status']}.")

The DatasetGroup is now ACTIVE.


## Step 2: Train a predictor <a class="anchor" id="predictor"></a>

In this step, we will create a **Predictor** using the **DatasetGroup** that was created above. After creating the predictor, we will review the accuracy obtained through the backtesting process to get a quantitative understanding of the performance of the predictor.

#### Train a predictor to help generate short term purchase orders

In this example, each forecast point covers a 3-day order period, so one purchase order meets three days of demand. The request is for a horizon of 7 such periods, giving a total of 21 days of coverage.

In [12]:
PREDICTOR_NAME = "PURCHASE_ORDER_PREDICTOR"
FORECAST_HORIZON = 7
FORECAST_FREQUENCY = "3D"

create_auto_predictor_response = \
    forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
                                   ForecastHorizon = FORECAST_HORIZON,
                                   ForecastFrequency = FORECAST_FREQUENCY,
                                   DataConfig = {
                                       'DatasetGroupArn': dataset_group_arn
                                   }
                                  )

po_predictor_arn = create_auto_predictor_response['PredictorArn']
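The relationship between the forecast frequency and the horizon can be checked with plain date arithmetic. This is a minimal sketch with a hypothetical helper, `forecast_points`; the start date is the first forecast timestamp that appears in the Step 4 query output.

```python
from datetime import datetime, timedelta


def forecast_points(start, step_days, horizon):
    """Return the start timestamp of each forecast period."""
    return [start + timedelta(days=step_days * i) for i in range(horizon)]


STEP_DAYS = 3  # from FORECAST_FREQUENCY = "3D"
HORIZON = 7    # from FORECAST_HORIZON = 7
start = datetime(2019, 2, 3)  # first forecast timestamp in the Step 4 output

points = forecast_points(start, STEP_DAYS, HORIZON)
coverage_end = start + timedelta(days=STEP_DAYS * HORIZON)  # exclusive end of coverage

print(points[-1].date())            # start of the last 3-day period
print((coverage_end - start).days)  # total days covered
```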

#### Train a predictor to help anticipate demand over quarters

In this example, a 3-month forecast is created to show estimated demand for a quarterly financial projection.

In [13]:
PREDICTOR_NAME = "QUARTER_PREDICTOR"
FORECAST_HORIZON = 1
FORECAST_FREQUENCY = "3M"

create_auto_predictor_response = \
    forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
                                   ForecastHorizon = FORECAST_HORIZON,
                                   ForecastFrequency = FORECAST_FREQUENCY,
                                   DataConfig = {
                                       'DatasetGroupArn': dataset_group_arn
                                   }
                                  )

quarter_predictor_arn = create_auto_predictor_response['PredictorArn']

Poll until the two predictors, which train in parallel, are complete. Once both are ACTIVE, the workflow can advance.

In [14]:
describe_auto_predictor_response = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=po_predictor_arn))
describe_auto_predictor_response = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=quarter_predictor_arn))

CREATE_IN_PROGRESS ......................................................................................................................................................................................................................................................................
ACTIVE 
ACTIVE 
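Once a predictor is ACTIVE, its backtest accuracy can be retrieved with the boto3 `get_accuracy_metrics` call. The sketch below flattens such a response into a short summary. The helper name `summarize_accuracy` is hypothetical, the sample response is hand-written with placeholder numbers rather than real results, and the field names assume the response shape documented for GetAccuracyMetrics.

```python
def summarize_accuracy(response):
    """Collect the weighted quantile losses of every test window in a response."""
    rows = []
    for result in response.get("PredictorEvaluationResults", []):
        for window in result.get("TestWindows", []):
            metrics = window.get("Metrics", {})
            losses = {
                loss["Quantile"]: loss["LossValue"]
                for loss in metrics.get("WeightedQuantileLosses", [])
            }
            rows.append({
                "evaluation_type": window.get("EvaluationType"),
                "average_wql": metrics.get("AverageWeightedQuantileLoss"),
                "wql_by_quantile": losses,
            })
    return rows


# Placeholder response with made-up numbers, only to exercise the helper.
sample_response = {
    "PredictorEvaluationResults": [
        {
            "TestWindows": [
                {
                    "EvaluationType": "SUMMARY",
                    "Metrics": {
                        "AverageWeightedQuantileLoss": 0.2,
                        "WeightedQuantileLosses": [
                            {"Quantile": 0.5, "LossValue": 0.25},
                            {"Quantile": 0.9, "LossValue": 0.15},
                        ],
                    },
                }
            ]
        }
    ]
}

for row in summarize_accuracy(sample_response):
    print(row)
```

In the notebook, the same helper would be fed with `forecast.get_accuracy_metrics(PredictorArn=po_predictor_arn)` instead of the placeholder dictionary.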


## Step 3: Generate forecasts <a class="anchor" id="forecast"></a>
Finally, we will generate the forecasts using the two predictors above. In reality, you may only need one forecast; this is a teaching example showing how one dataset can be forecasted at multiple custom time frequencies.

#### Generate a forecast from the 3-day Purchase Order Model

Here, the ARN for the 3-day (3D) purchase order model is supplied.  The imported dataset along with the predictor model is used to produce the requested forecast.

In [15]:
FORECAST_NAME = "PURCHASE_ORDER_FORECAST"

create_forecast_response = \
    forecast.create_forecast(ForecastName=FORECAST_NAME,
                             PredictorArn=po_predictor_arn)

po_forecast_arn = create_forecast_response['ForecastArn']

#### Generate a forecast from the Quarter Model

Here, the ARN for the quarter (3M) model is supplied.

In [16]:
FORECAST_NAME = "QUARTER_FORECAST"

create_forecast_response = \
    forecast.create_forecast(ForecastName=FORECAST_NAME,
                             PredictorArn=quarter_predictor_arn)

quarter_forecast_arn = create_forecast_response['ForecastArn']

Poll for the two forecasts to complete. 

In [17]:
status = util.wait(lambda: forecast.describe_forecast(ForecastArn=po_forecast_arn))
status = util.wait(lambda: forecast.describe_forecast(ForecastArn=quarter_forecast_arn))


CREATE_PENDING .
CREATE_IN_PROGRESS ....................................................................................................
ACTIVE 
ACTIVE 


## Step 4: Query forecasts <a class="anchor" id="query"></a>

In this step, a lightweight query API call is made for a sample item to view the forecasted numbers. Observe how the returned dates are spaced according to the custom frequency and, as a general rule of thumb, check that the demand values are in line with the daily average.

In [18]:
item_id="1"

Using the Amazon Forecast Query API, a request for predictions for the named item_id is made using the purchase order forecast. We expect to see predictions every 3 days, which holds true in the dataframe shown below.

In [19]:
forecast_response = forecastquery.query_forecast(
    ForecastArn=po_forecast_arn,
    Filters={"item_id": item_id}
)

forecast_p50_df = pd.DataFrame.from_dict(forecast_response['Forecast']['Predictions']['p50'])

forecast_p50_df

| |Timestamp|Value|
|--|--|--|
|0|2019-02-03T00:00:00|38.617483|
|1|2019-02-06T00:00:00|43.642546|
|2|2019-02-09T00:00:00|43.977778|
|3|2019-02-12T00:00:00|43.579142|
|4|2019-02-15T00:00:00|49.89582|
|5|2019-02-18T00:00:00|43.518435|
|6|2019-02-21T00:00:00|49.395246|
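The spacing of the returned timestamps, and the rule-of-thumb comparison against a daily average, can be checked programmatically. This is a minimal sketch with hypothetical helper names; the p50 values are copied from the output above, and dividing each value by 3 assumes demand is spread evenly across the days of a period.

```python
from datetime import datetime

# p50 predictions copied from the query output above.
p50 = [
    {"Timestamp": "2019-02-03T00:00:00", "Value": 38.617483},
    {"Timestamp": "2019-02-06T00:00:00", "Value": 43.642546},
    {"Timestamp": "2019-02-09T00:00:00", "Value": 43.977778},
    {"Timestamp": "2019-02-12T00:00:00", "Value": 43.579142},
    {"Timestamp": "2019-02-15T00:00:00", "Value": 49.89582},
    {"Timestamp": "2019-02-18T00:00:00", "Value": 43.518435},
    {"Timestamp": "2019-02-21T00:00:00", "Value": 49.395246},
]


def spacing_in_days(predictions):
    """Return the gap in days between each pair of consecutive timestamps."""
    stamps = [datetime.fromisoformat(p["Timestamp"]) for p in predictions]
    return [(later - earlier).days for earlier, later in zip(stamps, stamps[1:])]


def daily_rate(predictions, days_per_step):
    """Spread each period's value evenly over its days (a simplifying assumption)."""
    return [p["Value"] / days_per_step for p in predictions]


print(spacing_in_days(p50))
print([round(rate, 2) for rate in daily_rate(p50, 3)])
```

The per-day figures can then be compared with the average daily `target_value` for the same item in the source data.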


Next, a request for predictions is made using the quarterly forecast. We expect to see a single aggregate prediction for the quarter, which holds true in the dataframe below. This is a synthetic dataset; your use case will differ. The purpose of this example is to show you how to create combinations of time frequency that make sense for your business and obtain the predicted outcomes.

In [20]:
forecast_response = forecastquery.query_forecast(
    ForecastArn=quarter_forecast_arn,
    Filters={"item_id": item_id}
)

forecast_p50_df = pd.DataFrame.from_dict(forecast_response['Forecast']['Predictions']['p50'])

forecast_p50_df

| |Timestamp|Value|
|--|--|--|
|0|2019-03-01T00:00:00|677.744873|


## Clean-up <a class="anchor" id="cleanup"></a>
Run the code below to delete all resources that were created in this notebook.

In [None]:
forecast.delete_resource_tree(ResourceArn = dataset_group_arn)
forecast.delete_resource_tree(ResourceArn = ts_dataset_arn)