# What-if Analysis with RTS/TTS filling options

This notebook shows how to perform what-if analysis by using two different options for filling missing values. As a sample use case, we use product demand and price data that contain some missing values, and we see how the product demand forecast changes when the prices of the products increase.

The steps are as follows:

1. [Import libraries and setup AWS resources](#Import-libraries-and-setup-AWS-resources)
2. [Prepare training dataset CSVs](#Prepare-training-dataset-CSVs)
3. [Create DatasetGroup and Datasets](#Create-DatasetGroup-and-Datasets)
4. [Import the target time series data, and related time series data](#Import-the-target-time-series-data,-and-related-time-series-data)
5. [Create the first Predictor](#Create-the-first-Predictor)
6. [Create Forecast from the first Predictor](#Create-Forecast-from-the-first-Predictor)
7. [Create 2nd Predictor with different futurefill option](#Create-2nd-Predictor-with-different-futurefill-option)
8. [Create Forecast from the 2nd predictor](#Create-Forecast-from-the-2nd-predictor)
9. [Query forecasts, visualize and compare](#Query-forecasts,-visualize-and-compare)
10. [Resource Cleanup](#Resource-cleanup)

**Note**: To get two versions of the forecast with different filling options, this notebook creates two Predictors. It is also possible to perform what-if analysis with a single Predictor by importing the related time series dataset multiple times.

## Import libraries and setup AWS resources

In [None]:
import sys
import os
import time
import datetime

import pandas as pd
import matplotlib.pyplot as plt

import boto3

# importing forecast notebook utility from notebooks/common directory
sys.path.insert( 0, os.path.abspath("../../common") )
import util

Configure the S3 bucket name and region name for this lesson.

- If you don't have an S3 bucket, create one in Amazon S3 first.
- The region defaults to us-west-2 below, but you can choose any region in which Amazon Forecast is available.

In [None]:
text_widget_bucket = util.create_text_widget( "bucket_name", "input your S3 bucket name" )
text_widget_region = util.create_text_widget( "region", "input region name.", default_value="us-west-2" )

In [None]:
bucket_name = text_widget_bucket.value
assert bucket_name, "bucket_name not set."

region = text_widget_region.value
assert region, "region not set."

In [None]:
session = boto3.Session(region_name=region)
s3 = session.client('s3')
forecast = session.client(service_name='forecast') 
forecastquery = session.client(service_name='forecastquery')

In [None]:
# Create the role to provide to Amazon Forecast.
role_name = "ForecastNotebookRole-WhatIfAnalysis"
role_arn = util.get_or_create_iam_role( role_name = role_name )

## Prepare training dataset CSVs
 
1. Load historical product demand data
2. Check the loaded data, and confirm missing values
3. Split into target time series (demand) and related time series (price)
4. Upload them onto S3

In [None]:
df = pd.read_csv( "./data/product_demand_with_nan.csv" )
df

In [None]:
# Visualize the demand (TTS) and price (RTS) of one item.
# Gaps in the lines are missing values.
df[ df["item_id"]=="item_001"].plot( x="timestamp" )

In [None]:
df[df["item_id"]=="item_001" ]

In [None]:
# check how many missing values exist in the data
df.isna().sum()
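Which filling option applies to a missing value depends on where the gap sits in an item's series. As a simplified local illustration, assuming gaps are classified relative to the item's first and last observed value (leading gaps use `frontfill`, interior gaps `middlefill`, trailing gaps `backfill`; `futurefill` covers the forecast horizon of the related series and has no counterpart in these historical rows), the hypothetical helper `classify_gaps` labels each missing position in one item's series.

```python
import math

def classify_gaps(values):
    """Label each missing (None/NaN) position as 'front', 'middle' or 'back'.

    Hypothetical helper. Assumes `values` is one item's series, ordered by
    timestamp and already aligned to the dataset's global time range.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [i for i, v in enumerate(values) if not missing(v)]
    if not observed:
        raise ValueError("series has no observed values")
    first, last = observed[0], observed[-1]

    labels = {}
    for i, v in enumerate(values):
        if missing(v):
            if i < first:
                labels[i] = "front"   # before the first observed value
            elif i > last:
                labels[i] = "back"    # after the last observed value
            else:
                labels[i] = "middle"  # between observed values
    return labels
```

For example, `classify_gaps(df[df["item_id"]=="item_001"]["demand"].tolist())` would label the gaps in the plot above, provided that item's rows cover the whole training period.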

In [None]:
df_tts = df[["item_id", "timestamp", "demand" ]]
df_rts = df[["item_id", "timestamp", "price" ]]

In [None]:
df_tts.to_csv( "./data/tts.csv", index=False )
df_rts.to_csv( "./data/rts.csv", index=False )

#### Upload to S3

In [None]:
version = '0'
project = "whatif_and_filling"+"_"+version

key_tts = "%s/tts.csv" % project
key_rts = "%s/rts.csv" % project

s3.upload_file( Filename="./data/tts.csv", Bucket=bucket_name, Key=key_tts )
s3.upload_file( Filename="./data/rts.csv", Bucket=bucket_name, Key=key_rts )

s3_data_path_tts = "s3://" + bucket_name + "/" + key_tts
s3_data_path_rts = "s3://" + bucket_name + "/" + key_rts
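For the single-Predictor alternative mentioned in the note at the top, one approach is to extend the related time series into the forecast horizon with explicit scenario prices and import it again. The sketch below builds such rows with the standard library only. The helper names, the price factor, the rounding to two decimals, and the assumption that timestamps are first-of-month `YYYY-MM-DD` strings with missing prices given as `None` are all illustrative, not part of this notebook.

```python
import datetime

def add_months(date, months):
    """Return the first day of the month `months` months after `date`.

    Hypothetical helper; assumes monthly data stamped on the first of the month.
    """
    total = date.year * 12 + (date.month - 1) + months
    return datetime.date(total // 12, total % 12 + 1, 1)

def build_whatif_rts(rows, horizon, factor):
    """Return `rows` plus `horizon` future rows per item priced at last known price * factor.

    rows: iterable of (item_id, "YYYY-MM-DD", price_or_None) tuples.
    Assumes every item's rows run to the end of the training period.
    """
    rows = list(rows)
    last_date = {}   # item_id -> latest timestamp seen
    last_price = {}  # item_id -> (timestamp, price) of the latest non-missing price
    for item_id, ts, price in rows:
        date = datetime.date.fromisoformat(ts)
        if item_id not in last_date or date > last_date[item_id]:
            last_date[item_id] = date
        if price is not None and (item_id not in last_price or date > last_price[item_id][0]):
            last_price[item_id] = (date, price)

    future = []
    for item_id in sorted(last_date):
        if item_id not in last_price:
            continue  # no observed price to scale, so no scenario rows for this item
        scenario_price = round(last_price[item_id][1] * factor, 2)
        for step in range(1, horizon + 1):
            future.append((item_id, add_months(last_date[item_id], step).isoformat(), scenario_price))
    return rows + future
```

You could then write the returned rows to a new CSV, upload it, and run another `create_dataset_import_job` against `rts_dataset_arn` before creating a new Forecast; this notebook does not walk through that flow.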

## Create DatasetGroup and Datasets
 
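We create a single DatasetGroup containing one TARGET_TIME_SERIES dataset and one RELATED_TIME_SERIES dataset. Note that we don't need two RELATED_TIME_SERIES datasets: the filling options are specified per Predictor (in `AttributeConfigs`), not per dataset.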

In [None]:
response = forecast.create_dataset_group(
 DatasetGroupName = project + "_dsg",
 Domain="RETAIL",
 )

dataset_group_arn = response['DatasetGroupArn']

In [None]:
DATASET_FREQUENCY = "M"
TIMESTAMP_FORMAT = "yyyy-MM-dd"

schema = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]
}

response = forecast.create_dataset(
 Domain = "RETAIL",
 DatasetType = 'TARGET_TIME_SERIES',
 DatasetName = project + "_tts",
 DataFrequency = DATASET_FREQUENCY, 
 Schema = schema
)

tts_dataset_arn = response['DatasetArn']

In [None]:
schema = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "price", "AttributeType": "float"},
    ]
}

response = forecast.create_dataset(
 Domain = "RETAIL",
 DatasetType = 'RELATED_TIME_SERIES',
 DatasetName = project + "_rts",
 DataFrequency = DATASET_FREQUENCY, 
 Schema = schema
)

rts_dataset_arn = response['DatasetArn']
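Amazon Forecast expects the columns of each CSV to appear in the same order as the attributes in the dataset schema, and the timestamps to match `TIMESTAMP_FORMAT`. A quick local check before importing can save a failed import job. The helper below is a hypothetical sketch: it assumes the pattern `yyyy-MM-dd` corresponds to Python's `%Y-%m-%d`, and it only inspects the header and the first `max_rows` data rows.

```python
import csv
import datetime
import io

def check_csv_against_schema(csv_text, schema, strftime_format="%Y-%m-%d", max_rows=100):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    expected = [a["AttributeName"] for a in schema["Attributes"]]
    if header != expected:
        problems.append(f"header {header} does not match schema order {expected}")
        return problems

    ts_col = expected.index("timestamp")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if line_no > max_rows + 1:
            break
        value = row[ts_col] if ts_col < len(row) else ""
        try:
            datetime.datetime.strptime(value, strftime_format)
        except ValueError:
            problems.append(f"line {line_no}: bad timestamp {value!r}")
    return problems
```

For example, `check_csv_against_schema(open("./data/rts.csv").read(), schema)` checks the RTS file against the schema defined in the cell above.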

In [None]:
forecast.update_dataset_group( 
 DatasetGroupArn = dataset_group_arn, 
 DatasetArns = [
 tts_dataset_arn,
 rts_dataset_arn,
 ]
)

## Import the target time series data, and related time series data

In [None]:
response = forecast.create_dataset_import_job(
 DatasetImportJobName = project + "_tts_import",
 DatasetArn = tts_dataset_arn,
 DataSource = {
 "S3Config" : {
 "Path" : s3_data_path_tts,
 "RoleArn" : role_arn
 }
 },
 TimestampFormat = TIMESTAMP_FORMAT
)

tts_dataset_import_job_arn = response['DatasetImportJobArn']

In [None]:
response = forecast.create_dataset_import_job(
 DatasetImportJobName = project + "_rts_import1",
 DatasetArn = rts_dataset_arn,
 DataSource = {
 "S3Config" : {
 "Path" : s3_data_path_rts,
 "RoleArn" : role_arn
 }
 },
 TimestampFormat = TIMESTAMP_FORMAT
)

rts_dataset_import_job_arn = response['DatasetImportJobArn']

In [None]:
status_indicator = util.StatusIndicator()

while True:
 status = forecast.describe_dataset_import_job( DatasetImportJobArn = tts_dataset_import_job_arn )['Status']
 status_indicator.update(status)
 if status in ('ACTIVE', 'CREATE_FAILED'): break
 time.sleep(10)

status_indicator.end()

In [None]:
status_indicator = util.StatusIndicator()

while True:
 status = forecast.describe_dataset_import_job( DatasetImportJobArn = rts_dataset_import_job_arn )['Status']
 status_indicator.update(status)
 if status in ('ACTIVE', 'CREATE_FAILED'): break
 time.sleep(10)

status_indicator.end()
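The two polling loops above are identical apart from the ARN, and neither has a timeout. A small helper can remove the duplication. This is a hypothetical sketch: `describe` is any zero-argument callable that returns a dict with a `Status` key (for example `lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=tts_dataset_import_job_arn)`), and the default timeout is an arbitrary choice.

```python
import time

def wait_for_status(describe, terminal=("ACTIVE", "CREATE_FAILED"),
                    interval=10, timeout=3600, sleep=time.sleep):
    """Poll describe() until its 'Status' is terminal and return that response.

    Raises TimeoutError if no terminal status is reached within `timeout` seconds.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = describe()
        if response["Status"] in terminal:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {response['Status']} after {timeout} seconds")
        sleep(interval)
```

Returning the whole response, rather than only the status, keeps fields such as an error message available to the caller.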

## Create the first Predictor

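Create the first Predictor, using the futurefill option "min" for `price`. In `config`, missing `demand` values are not front-filled and are filled with the mean in the middle and at the back of each series. Missing `price` values are filled with the mean in the middle and at the back, and `price` values in the forecast horizon are filled using the `min` method.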

In [None]:
PREDICTOR_NAME = project + "_predictor_1"
FORECAST_HORIZON = 3
config = [
    {
        # for Target time series
        "AttributeName": "demand",
        "Transformations": {"frontfill": "none", "middlefill": "mean", "backfill": "mean"}
    },
    {
        # for Related time series
        "AttributeName": "price",
        "Transformations": {"middlefill": "mean", "backfill": "mean", "futurefill": "min"}
    }
]

In [None]:
response = \
 forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
 ForecastHorizon = FORECAST_HORIZON,
 ForecastFrequency = DATASET_FREQUENCY,
 DataConfig = {
 'DatasetGroupArn': dataset_group_arn, 
 'AttributeConfigs':config
 })
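To make the price configuration concrete, here is a rough local approximation of what these fill methods do to one item's series: interior and trailing gaps become the mean of the observed prices, and the forecast horizon is filled with the minimum (this Predictor) or the maximum (the second Predictor). This is only an illustration under the assumption that the aggregates are taken over the item's observed history; Amazon Forecast applies the actual filling server-side, and its aggregation window may differ. `fill_price_series` is a hypothetical name.

```python
import math

def fill_price_series(values, horizon, futurefill):
    """Approximate middlefill/backfill = mean and futurefill = min or max locally.

    values: one item's historical prices in time order (None or NaN = missing).
    Assumes aggregates are computed over that item's observed history.
    Leading gaps are left untouched because this price config sets no frontfill.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not missing(v)]
    if not observed:
        raise ValueError("series has no observed values to aggregate")
    mean = sum(observed) / len(observed)
    first = next(i for i, v in enumerate(values) if not missing(v))

    filled = []
    for i, v in enumerate(values):
        if not missing(v) or i < first:
            filled.append(v)     # observed value, or leading gap left as-is
        else:
            filled.append(mean)  # middle or trailing gap -> mean
    future_value = {"min": min(observed), "max": max(observed)}[futurefill]
    return filled + [future_value] * horizon
```

With `[10.0, None, 14.0, None]` and a horizon of 3, the "min" option yields future prices of 10.0 and the "max" option 14.0, so the second Predictor effectively sees higher future prices.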

In [None]:
predictor_arn_1 = response['PredictorArn']
print(f"Waiting for Predictor with ARN {predictor_arn_1} to become ACTIVE. Depending on data size and predictor settings, it can take several hours to become ACTIVE.\n\nCurrent Status:")
status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn_1))


In [None]:
response = forecast.describe_auto_predictor(PredictorArn=predictor_arn_1)
print(f"\n\nThe Predictor with ARN {predictor_arn_1} is now {response['Status']}.")
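If a resource ends in `CREATE_FAILED`, the notebook simply prints the status and carries on. The describe responses also carry a `Message` field describing the cause of an error, so a small hypothetical helper can stop early with that reason. It assumes only the `Status` and `Message` keys of the response dict.

```python
def ensure_active(response, kind="resource"):
    """Raise RuntimeError unless a describe_* response reports Status == 'ACTIVE'."""
    status = response.get("Status")
    if status != "ACTIVE":
        message = response.get("Message", "no message returned")
        raise RuntimeError(f"{kind} is {status}: {message}")
    return response
```

For example: `ensure_active(forecast.describe_auto_predictor(PredictorArn=predictor_arn_1), "Predictor")`.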

## Create Forecast from the first Predictor

In [None]:
response = forecast.create_forecast(
 ForecastName = project + "_forecast_1",
 PredictorArn = predictor_arn_1
)

In [None]:
forecast_arn_1 = response['ForecastArn']
print(f"Waiting for Forecast with ARN {forecast_arn_1} to become ACTIVE. Depending on data size and predictor settings, it can take several hours to become ACTIVE.\n\nCurrent Status:")
status = util.wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn_1))

In [None]:
response = forecast.describe_forecast(ForecastArn=forecast_arn_1)
print(f"\n\nThe Forecast with ARN {forecast_arn_1} is now {response['Status']}.")

## Create 2nd Predictor with different futurefill option

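Create the second Predictor. Its configuration is identical to the first, except that `price` values in the forecast horizon are filled using the futurefill option "max".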

In [None]:
PREDICTOR_NAME = project + "_predictor_2"
FORECAST_HORIZON = 3
config = [
    {
        # for Target time series
        "AttributeName": "demand",
        "Transformations": {"frontfill": "none", "middlefill": "mean", "backfill": "mean"}
    },
    {
        # for Related time series; we use "max" for the futurefill option
        "AttributeName": "price",
        "Transformations": {"middlefill": "mean", "backfill": "mean", "futurefill": "max"}
    }
]

In [None]:
response = forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
 ForecastHorizon = FORECAST_HORIZON,
 ForecastFrequency = DATASET_FREQUENCY,
 DataConfig = {
 'DatasetGroupArn': dataset_group_arn, 
 'AttributeConfigs':config
 })
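Because the two Predictors should differ only in the `futurefill` value for `price`, one way to guarantee that is to derive both configurations from a single function instead of copying the list. `make_attribute_config` is a hypothetical helper, not part of the notebook's `util` module.

```python
def make_attribute_config(futurefill):
    """Build AttributeConfigs that differ only in the price futurefill method."""
    return [
        {
            # target time series: no frontfill; mean for middle and back gaps
            "AttributeName": "demand",
            "Transformations": {"frontfill": "none", "middlefill": "mean", "backfill": "mean"},
        },
        {
            # related time series: futurefill sets the scenario price in the horizon
            "AttributeName": "price",
            "Transformations": {"middlefill": "mean", "backfill": "mean", "futurefill": futurefill},
        },
    ]
```

With this helper, `make_attribute_config("min")` and `make_attribute_config("max")` would replace the two hand-written `config` lists, and each call returns a fresh list, so editing one configuration cannot affect the other.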

In [None]:
predictor_arn_2 = response['PredictorArn']
print(f"Waiting for Predictor with ARN {predictor_arn_2} to become ACTIVE. Depending on data size and predictor settings, it can take several hours to become ACTIVE.\n\nCurrent Status:")

In [None]:
status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn_2))

In [None]:
response = forecast.describe_auto_predictor(PredictorArn=predictor_arn_2)
print(f"\n\nThe Predictor with ARN {predictor_arn_2} is now {response['Status']}.")

## Create Forecast from the 2nd predictor

In [None]:
response = forecast.create_forecast(
 ForecastName = project + "_forecast_2",
 PredictorArn = predictor_arn_2
)

In [None]:
forecast_arn_2 = response['ForecastArn']
print(f"Waiting for Forecast with ARN {forecast_arn_2} to become ACTIVE. Depending on data size and predictor settings, it can take several hours to become ACTIVE.\n\nCurrent Status:")
status = util.wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn_2))


In [None]:
response = forecast.describe_forecast(ForecastArn=forecast_arn_2)
print(f"\n\nThe Forecast with ARN {forecast_arn_2} is now {response['Status']}.")

## Query forecasts, visualize and compare

We now have two Forecasts that differ only in the futurefill option for `price` (min vs. max). Let's query the forecasted product demand, then visualize and compare it.

In [None]:
training_data_period = ( df_tts["timestamp"].min(), df_tts["timestamp"].max() )

def plot_compare( item_id ):
    plt.figure(figsize=(12, 6))
    plt.title(item_id)

    df_item_actual = df_tts[ df_tts["item_id"]==item_id ]
    plt.plot( pd.to_datetime(df_item_actual["timestamp"]), df_item_actual["demand"], label="actual", color=(1,0,0) )

    def plot_forecast( single_item_forecast, label, color, hatch ):

        x = []
        y_p10 = []
        y_p50 = []
        y_p90 = []

        # visually connect the last actual value with the forecasts (if the item has one)
        df_connect = df_item_actual[ df_item_actual["timestamp"]==training_data_period[1] ].reset_index(drop=True)
        if not df_connect.empty:
            x.append( datetime.datetime.strptime( df_connect.at[ 0, "timestamp" ], "%Y-%m-%d" ).date() )
            y_p10.append( df_connect.at[0,"demand"] )
            y_p50.append( df_connect.at[0,"demand"] )
            y_p90.append( df_connect.at[0,"demand"] )

        for p10, p50, p90 in zip( single_item_forecast["p10"], single_item_forecast["p50"], single_item_forecast["p90"] ):

            # query_forecast returns timestamps such as "2021-01-01T00:00:00"
            date = datetime.datetime.strptime(p50["Timestamp"],"%Y-%m-%dT%H:%M:%S").date()
            x.append(date)

            y_p10.append(p10["Value"])
            y_p50.append(p50["Value"])
            y_p90.append(p90["Value"])

        plt.plot( x, y_p50, label="%s p50" % label, color=color )
        plt.fill_between( x, y_p10, y_p90, label="%s p10-p90" % label, color=color, alpha=0.2, hatch=hatch )

    def plot_price( single_item_price, label, color ):
        x = []
        y = []

        for timestamp, price in zip( single_item_price["timestamp"], single_item_price["price"] ):
            date = datetime.datetime.strptime(timestamp,"%Y-%m-%d").date()
            x.append(date)
            y.append(price)

        plt.plot( x, y, label=label, color=color, linestyle=":" )

    response = forecastquery.query_forecast(
        ForecastArn = forecast_arn_1,
        Filters = { "item_id" : item_id }
    )
    plot_forecast( response["Forecast"]["Predictions"], "forecast1", (0,0,1), "+" )

    response = forecastquery.query_forecast(
        ForecastArn = forecast_arn_2,
        Filters = { "item_id" : item_id }
    )
    plot_forecast( response["Forecast"]["Predictions"], "forecast2", (1,0,1), "x" )

    bottom, top = plt.ylim()
    plt.ylim((-top*0.03, top*1.03))

    plt.legend( loc='lower left' )


In [None]:
for item_id in [ "item_132","item_151","item_234" ]:
 plot_compare(item_id)
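To compare the two scenarios numerically as well as visually, the p50 values of the two `query_forecast` responses can be paired by timestamp. The hypothetical helper below assumes the response layout already used in `plot_compare`: `response["Forecast"]["Predictions"]["p50"]` is a list of `{"Timestamp": ..., "Value": ...}` dicts.

```python
def p50_difference(predictions_1, predictions_2):
    """Return {timestamp: p50 of forecast 2 minus p50 of forecast 1}.

    Only timestamps present in both forecasts are compared.
    """
    p50_1 = {p["Timestamp"]: p["Value"] for p in predictions_1["p50"]}
    p50_2 = {p["Timestamp"]: p["Value"] for p in predictions_2["p50"]}
    return {ts: p50_2[ts] - p50_1[ts] for ts in sorted(p50_1) if ts in p50_2}
```

For one item, you would pass `response["Forecast"]["Predictions"]` from the `forecast_arn_1` and `forecast_arn_2` queries; negative values mean the "max" price scenario forecasts lower demand than the "min" scenario for that month.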

## Resource cleanup

#### To clean up, uncomment the code in the cells below and run them.

#### Delete forecasts

In [None]:
# util.wait_till_delete(lambda: forecast.delete_forecast(ForecastArn = forecast_arn_1))
# util.wait_till_delete(lambda: forecast.delete_forecast(ForecastArn = forecast_arn_2))

#### Delete predictors

In [None]:
# util.wait_till_delete(lambda: forecast.delete_predictor(PredictorArn = predictor_arn_1))
# util.wait_till_delete(lambda: forecast.delete_predictor(PredictorArn = predictor_arn_2))

#### Delete dataset import jobs

In [None]:
# util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn = tts_dataset_import_job_arn))
# util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn = rts_dataset_import_job_arn))

#### Delete datasets

In [None]:
# util.wait_till_delete(lambda: forecast.delete_dataset(DatasetArn = tts_dataset_arn))
# util.wait_till_delete(lambda: forecast.delete_dataset(DatasetArn = rts_dataset_arn))

#### Delete dataset group

In [None]:
# util.wait_till_delete(lambda: forecast.delete_dataset_group(DatasetGroupArn = dataset_group_arn))

#### Delete IAM role

In [None]:
# util.delete_iam_role( role_name )
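The cleanup cells above do not remove the CSV files uploaded to S3. A minimal sketch, assuming you also want to delete them: `delete_uploaded_files` is a hypothetical helper that calls the boto3 S3 client's `delete_object` for each key. As with the other cleanup cells, the call is left commented out.

```python
def delete_uploaded_files(s3_client, bucket, keys):
    """Delete each uploaded object and return the keys that were requested for deletion."""
    for key in keys:
        s3_client.delete_object(Bucket=bucket, Key=key)
    return list(keys)

# delete_uploaded_files(s3, bucket_name, [key_tts, key_rts])
```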