## With Time Alignment Boundary you define when your days, weeks, months and years begin

We are excited to announce that Amazon Forecast now offers a new feature called Time Alignment Boundary. With this feature, customers who generate predictions at daily, weekly, monthly, or yearly forecast frequencies can define when those periods begin.

Prior to Time Alignment Boundary, daily frequencies began at midnight, weekly frequencies began on Monday, monthly frequencies began on the first day of the month, and yearly frequencies began in January. Customers can now choose when each period begins to better meet their needs.
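To make the idea concrete, here is a small illustrative sketch of how the start date of a weekly period shifts with the chosen start day. It is not how Amazon Forecast implements the feature; the function name week_start and the WEEKDAYS list are assumptions made up for this example.

```python
from datetime import date, timedelta

# Day names in the order used by date.weekday() (Monday == 0).
WEEKDAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
            "FRIDAY", "SATURDAY", "SUNDAY"]

def week_start(day: date, start_of_week: str = "MONDAY") -> date:
    """Return the first day of the week-long period that contains `day`."""
    days_since_start = (day.weekday() - WEEKDAYS.index(start_of_week)) % 7
    return day - timedelta(days=days_since_start)

wednesday = date(2019, 2, 6)
print(week_start(wednesday))            # default Monday boundary
print(week_start(wednesday, "FRIDAY"))  # week begins on Friday
print(week_start(wednesday, "SUNDAY"))  # week begins on Sunday
```

The same Wednesday falls into a period that begins on a different date depending on the boundary, which is the effect the two predictors in this notebook demonstrate.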

Time Alignment Boundary is specified when a new predictor is created, either through the AWS Console or through the API.

Using a single input dataset, this notebook trains two weekly-frequency predictors: one that treats Friday as the start of the week and one that treats Sunday as the start of the week. The notebook is saved in an executed state, so you can review the outputs without running each cell.

For this exercise, a small slice of yellow taxi trip records from the [NYC Taxi and Limousine Commission (TLC)](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) is used.


## Table of Contents
* [Initial Setup](#setup)
* Step 1: [Upload sample data to S3](#upload)
* Step 2: [Create a dataset, import data and dataset group](#dataset)
* Step 3: [Create predictors](#predictors)
* Step 4: [Create forecasts](#forecasts)
* Step 5: [View forecasted data](#view)
* Step 6: [Cleanup](#cleanup)



# Initial Setup

### Upgrade boto3

Before proceeding, ensure you have upgraded boto3.

In [1]:
!pip install boto3 --upgrade > /dev/null

### Setup Imports

In [2]:
import boto3
from time import sleep
import subprocess
import sys
import os
import pandas as pd
import calendar

sys.path.insert( 0, os.path.abspath("../../common") )

import json
import util

### Function to suppress printing of account numbers

In [3]:
import re

def mask_arn(input_string):

    mask_regex = re.compile(':[0-9]{12}:')
    mask = mask_regex.search(input_string)

    while mask:
        input_string = input_string.replace(mask.group(),'X'*12)
        mask = mask_regex.search(input_string)

    return input_string
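Note that mask_arn replaces the surrounding colons together with the account number, which is why the masked ARNs printed in this notebook read us-east-1XXXXXXXXXXXXdataset/... without separators. If you would rather keep the ARN delimiters, a variation along the following lines could be used. This is only a sketch; the name mask_account_id and the example ARN are assumptions for illustration.

```python
import re

# A 12-digit AWS account ID, matched only when it sits between two colons.
ACCOUNT_ID = re.compile(r"(?<=:)[0-9]{12}(?=:)")

def mask_account_id(arn: str) -> str:
    """Replace the account ID with twelve X characters, keeping both colons."""
    return ACCOUNT_ID.sub("X" * 12, arn)

# Hypothetical ARN used purely as an example input.
print(mask_account_id("arn:aws:forecast:us-east-1:123456789012:dataset/EXAMPLE"))
```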

### Create an instance of AWS SDK client for Amazon Forecast

In [4]:
# Set your region accordingly, us-east-1 as shown
region = 'us-east-1'
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')

# Checking to make sure we can communicate with Amazon Forecast
assert forecast.list_forecasts()

### Setup IAM Role used by Amazon Forecast to access your data

In [6]:
role_name = "ForecastNotebookRole-Basic"
print(f"Creating Role {mask_arn(role_name)}...")
role_arn = util.get_or_create_iam_role( role_name = role_name )

# echo user inputs without account
print(f"Success! Created role = {mask_arn(role_arn).split('/')[1]}")

Creating Role ForecastNotebookRole-Basic...
The role ForecastNotebookRole-Basic already exists, skipping creation
Done.
Success! Created role = ForecastNotebookRole-Basic


# Step 1: Upload sample data to S3

The dataset has the following 3 columns:
- timestamp: Timestamp at which pick-ups are requested.
- item_id: Pick-up location ID.
- target_value: Number of pick-ups requested around the timestamp at the pick-up location.
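As an illustration of this layout, the sketch below parses a few CSV rows into typed records using only the standard library. The three sample rows are made-up placeholder values, not taken from the taxi data. The sketch also assumes header-less rows in the column order listed above and a year-month-day hour:minute:second timestamp layout.

```python
import csv
import io
from datetime import datetime

# Made-up rows for illustration only: timestamp, item_id, target_value.
SAMPLE_CSV = """2019-01-01 00:00:00,201,3
2019-01-01 01:00:00,201,5
2019-01-01 00:00:00,202,1
"""

def parse_rows(text):
    """Yield (timestamp, item_id, target_value) tuples from CSV text."""
    for timestamp, item_id, target_value in csv.reader(io.StringIO(text)):
        yield (datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
               item_id,              # location IDs are kept as strings
               int(target_value))    # pick-up counts are whole numbers

rows = list(parse_rows(SAMPLE_CSV))
print(rows[0])
```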

Note: As delivered, this uses the sample file in the data folder relative to this notebook. Please take care to ensure this file is available to your notebook.

In [7]:
bucket_name = input("\nEnter S3 bucket name for uploading the data and hit ENTER key:")

s3 = boto3.resource('s3')
s3.meta.client.upload_file('./data/taxi_sample_data.csv', bucket_name, 'taxi_sample_data.csv')

# Step 2: Create a dataset, import data and dataset group

### Create Dataset

In [8]:
DATASET_FREQUENCY = "H"
TS_DATASET_NAME = "TAXI_TIME_ALIGNMENT_BOUNDARY_DEMO"
TS_SCHEMA = {
 "Attributes":[
 {
 "AttributeName":"timestamp",
 "AttributeType":"timestamp"
 },
 {
 "AttributeName":"item_id",
 "AttributeType":"string"
 },
 {
 "AttributeName":"target_value",
 "AttributeType":"integer"
 }
 ]
} 

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
 DatasetType='TARGET_TIME_SERIES',
 DatasetName=TS_DATASET_NAME,
 DataFrequency=DATASET_FREQUENCY,
 Schema=TS_SCHEMA)

ts_dataset_arn = create_dataset_response['DatasetArn']
describe_dataset_response = forecast.describe_dataset(DatasetArn=ts_dataset_arn)

print(f"Dataset ARN {mask_arn(ts_dataset_arn)} is now {describe_dataset_response['Status']}.")

Dataset ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXdataset/TAXI_TIME_ALIGNMENT_BOUNDARY_DEMO is now ACTIVE.


### Import the initial seed data file

In [9]:
TS_IMPORT_JOB_NAME = 'taxi_sample_data'
TIMESTAMP_FORMAT = "yyyy-MM-dd hh:mm:ss"
ts_s3_path = f"s3://{bucket_name}/{TS_IMPORT_JOB_NAME}.csv"
TIMEZONE = "EST"

# Polling interval between API status checks (passed to util.wait)
sleep_duration=60


ts_dataset_import_job_response = \
 forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,
 DatasetArn=ts_dataset_arn,
 DataSource= {
 "S3Config" : {
 "Path": ts_s3_path,
 "RoleArn": role_arn
 } 
 },
 TimestampFormat=TIMESTAMP_FORMAT,
 TimeZone = TIMEZONE)

ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']

print(f"Waiting for Dataset Import Job with ARN {mask_arn(ts_dataset_import_job_arn)} to become ACTIVE.\n\nCurrent Status:\n")

status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn), sleep_duration)
 

Waiting for Dataset Import Job with ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXdataset-import-job/TAXI_TIME_ALIGNMENT_BOUNDARY_DEMO/taxi_sample_data to become ACTIVE.

Current Status:

CREATE_PENDING 
CREATE_IN_PROGRESS .......
ACTIVE 
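The util.wait helper comes from the shared common folder, and its source is not shown in this notebook. As a rough sketch of the polling pattern it appears to follow (call a describe function, read its Status, pause, repeat), consider the hypothetical wait_for_status function below. The terminal status names and the injectable sleep_fn parameter are assumptions for this example.

```python
import time

def wait_for_status(describe, interval_seconds=60,
                    terminal=("ACTIVE", "CREATE_FAILED"), sleep_fn=time.sleep):
    """Call `describe()` until its 'Status' is terminal; return that status."""
    while True:
        status = describe()["Status"]
        print(status)
        if status in terminal:
            return status
        sleep_fn(interval_seconds)

# Stand-in for a describe_* call so the sketch runs without AWS access.
fake_statuses = iter(["CREATE_PENDING", "CREATE_IN_PROGRESS", "ACTIVE"])
final = wait_for_status(lambda: {"Status": next(fake_statuses)},
                        sleep_fn=lambda seconds: None)
```

Against the real service, the describe argument would be a lambda wrapping a call such as forecast.describe_dataset_import_job, as in the cell above.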


### Create a dataset group

In [10]:
DATASET_GROUP_NAME = "TAXI_TIME_ALIGNMENT_BOUNDARY_DEMO"
DATASET_ARNS = [ts_dataset_arn]

create_dataset_group_response = \
 forecast.create_dataset_group(Domain="CUSTOM",
 DatasetGroupName=DATASET_GROUP_NAME,
 DatasetArns=DATASET_ARNS)

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']
describe_dataset_group_response = forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)

print(f"The DatasetGroup with ARN {mask_arn(dataset_group_arn)} is now {describe_dataset_group_response['Status']}.")

The DatasetGroup with ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXdataset-group/TAXI_TIME_ALIGNMENT_BOUNDARY_DEMO is now ACTIVE.


# Step 3: Create predictors

Observe the new TimeAlignmentBoundary parameter of the create_auto_predictor() function. In this example, two predictors are created, each with a 3-week horizon. One predictor sets DayOfWeek to FRIDAY; the other sets DayOfWeek to SUNDAY.

In [11]:
FORECAST_HORIZON = 3
FORECAST_FREQUENCY = "W"

#Create a predictor with week starting Friday
create_auto_predictor_response = \
 forecast.create_auto_predictor(PredictorName = 'TAXI_PREDICTOR_WEEK_FRIDAY',
 ForecastHorizon = FORECAST_HORIZON,
 ForecastFrequency = FORECAST_FREQUENCY,
 DataConfig = {
 'DatasetGroupArn': dataset_group_arn
 },
 TimeAlignmentBoundary={
 "DayOfWeek":"FRIDAY"
 },
 ExplainPredictor = False)

friday_predictor_arn = create_auto_predictor_response['PredictorArn']
print(f"Waiting for Friday Predictor ARN {mask_arn(friday_predictor_arn)} to become ACTIVE.")



#Create a predictor with week starting Sunday
create_auto_predictor_response = \
 forecast.create_auto_predictor(PredictorName = 'TAXI_PREDICTOR_WEEK_SUNDAY',
 ForecastHorizon = FORECAST_HORIZON,
 ForecastFrequency = FORECAST_FREQUENCY,
 DataConfig = {
 'DatasetGroupArn': dataset_group_arn
 },
 TimeAlignmentBoundary={
 "DayOfWeek":"SUNDAY"
 },
 ExplainPredictor = False)

sunday_predictor_arn = create_auto_predictor_response['PredictorArn']
print(f"Waiting for Sunday Predictor ARN {mask_arn(sunday_predictor_arn)} to become ACTIVE.\n\n")


#Wait on the predictors to complete and become active
status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=friday_predictor_arn), sleep_duration)
status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=sunday_predictor_arn), sleep_duration)


Waiting for Friday Predictor ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXpredictor/TAXI_PREDICTOR_WEEK_FRIDAY_01G415VFPA7DRMEFTBWH4WM0M9 to become ACTIVE.
Waiting for Sunday Predictor ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXpredictor/TAXI_PREDICTOR_WEEK_SUNDAY_01G415VFRXXYC865W93D3Y1CRZ to become ACTIVE.
CREATE_PENDING 
CREATE_IN_PROGRESS ............................................
ACTIVE 
CREATE_IN_PROGRESS .........
ACTIVE 


### Additional Examples

In [None]:
FORECAST_HORIZON = 3

# Create a predictor that starts monthly frequencies on the 15th day of the month
FORECAST_FREQUENCY = "M"
PREDICTOR_NAME = "TAXI_PREDICTOR_MONTH_DAY_15"  # placeholder name; choose your own

create_auto_predictor_response = \
 forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
 ForecastHorizon = FORECAST_HORIZON,
 ForecastFrequency = FORECAST_FREQUENCY,
 DataConfig = {
 'DatasetGroupArn': dataset_group_arn
 },
 TimeAlignmentBoundary={
 "DayOfMonth": 15
 },
 ExplainPredictor = False)


# Create a predictor that starts each day at 9AM.
FORECAST_FREQUENCY = "D"
PREDICTOR_NAME = "TAXI_PREDICTOR_DAY_9AM"  # placeholder name; choose your own

create_auto_predictor_response = \
 forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
 ForecastHorizon = FORECAST_HORIZON,
 ForecastFrequency = FORECAST_FREQUENCY,
 DataConfig = {
 'DatasetGroupArn': dataset_group_arn
 },
 TimeAlignmentBoundary={
 "Hour": 9
 },
 ExplainPredictor = False)
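The boundary shapes used in this notebook can be summarized with a small helper. This is a sketch, not part of the AWS SDK: the function name boundary_for and the frequency-to-key mapping are assumptions drawn only from the examples in this notebook (W with DayOfWeek, M with DayOfMonth, D with Hour). Consult the Amazon Forecast API reference for the full set of supported combinations.

```python
# Boundary key paired with each forecast frequency in this notebook's examples.
BOUNDARY_KEY = {"W": "DayOfWeek", "M": "DayOfMonth", "D": "Hour"}

def boundary_for(frequency, value):
    """Build a TimeAlignmentBoundary dict for one of the frequencies above."""
    if frequency not in BOUNDARY_KEY:
        raise ValueError(f"No example boundary for frequency {frequency!r}")
    return {BOUNDARY_KEY[frequency]: value}

print(boundary_for("W", "FRIDAY"))
print(boundary_for("M", 15))
print(boundary_for("D", 9))
```

The returned dictionary has the same shape as the TimeAlignmentBoundary argument passed to create_auto_predictor in the cells above.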

# Step 4: Create forecasts

Here, a forecast is created for each predictor. These will be used to serve requests in Step 5.

In [12]:
create_forecast_response = \
 forecast.create_forecast(ForecastName='TAXI_FORECAST_WEEK_FRIDAY',
 PredictorArn=friday_predictor_arn)

friday_forecast_arn = create_forecast_response['ForecastArn']


create_forecast_response = \
 forecast.create_forecast(ForecastName='TAXI_FORECAST_WEEK_SUNDAY',
 PredictorArn=sunday_predictor_arn)

sunday_forecast_arn = create_forecast_response['ForecastArn']


print(f"Waiting for Friday Forecast with ARN {mask_arn(friday_forecast_arn)} to become ACTIVE.")
print(f"Waiting for Sunday Forecast with ARN {mask_arn(sunday_forecast_arn)} to become ACTIVE.")

status = util.wait(lambda: forecast.describe_forecast(ForecastArn=friday_forecast_arn), sleep_duration)
status = util.wait(lambda: forecast.describe_forecast(ForecastArn=sunday_forecast_arn), sleep_duration)


Waiting for Friday Forecast with ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXforecast/TAXI_FORECAST_WEEK_FRIDAY to become ACTIVE.
Waiting for Sunday Forecast with ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXforecast/TAXI_FORECAST_WEEK_SUNDAY to become ACTIVE.
CREATE_PENDING 
CREATE_IN_PROGRESS ..........
ACTIVE 
CREATE_IN_PROGRESS ....
ACTIVE 


# Step 5: View forecasted data

The cells below use the Amazon Forecast query client to retrieve predictions for a named time series through the API. Queries are run against both the Friday and the Sunday forecasts.

Note how forecasted data points from the Sunday forecast have timestamps aligned to Sunday, while those from the Friday forecast align to Friday. The calendars are printed for visual reference only, to make cross-checking the dates easier.

In [13]:
# Inspect values for a specific pick-up location
item_id='201'

### Sunday Forecasts

In [14]:
#Sunday Forecasts

print(calendar.month(2019, 2))

forecast_response = forecastquery.query_forecast(
 ForecastArn=sunday_forecast_arn,
 Filters={"item_id": item_id}
)

for i in forecast_response['Forecast']['Predictions']['p50']:
 print(i)

   February 2019
Mo Tu We Th Fr Sa Su
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28

{'Timestamp': '2019-02-03T00:00:00', 'Value': 5.769572938646521}
{'Timestamp': '2019-02-10T00:00:00', 'Value': 5.977745991739137}
{'Timestamp': '2019-02-17T00:00:00', 'Value': 5.672434587428659}


### Friday Forecasts

In [15]:
#Friday Forecasts

print(calendar.month(2019, 2))

forecast_response = forecastquery.query_forecast(
 ForecastArn=friday_forecast_arn,
 Filters={"item_id": item_id}
)

for i in forecast_response['Forecast']['Predictions']['p50']:
 print(i)

   February 2019
Mo Tu We Th Fr Sa Su
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28

{'Timestamp': '2019-02-01T00:00:00', 'Value': 7.176211290014157}
{'Timestamp': '2019-02-08T00:00:00', 'Value': 7.440115660652784}
{'Timestamp': '2019-02-15T00:00:00', 'Value': 6.5957757171386255}


# Step 6: Cleanup

You will need to allow a few minutes for each of these steps to complete.


In [None]:
forecast.delete_resource_tree(ResourceArn=dataset_group_arn)

Deleting the dataset group takes a few minutes. The next cell lists the remaining dataset groups so you can check whether the deletion has finished. If your dataset group still appears in the response, wait a couple of minutes and run the cell again. Once the dataset group is no longer listed, proceed to the next step.

In [None]:
forecast.list_dataset_groups()
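Rather than scanning the printed response by eye, the check can be expressed as a small function. This sketch assumes the response layout returned by list_dataset_groups (a DatasetGroups list whose entries carry a DatasetGroupArn) and uses a hypothetical helper name, dataset_group_exists. Pagination through NextToken is ignored for brevity.

```python
def dataset_group_exists(response, dataset_group_arn):
    """Return True if the ARN still appears in a list_dataset_groups response."""
    return any(group["DatasetGroupArn"] == dataset_group_arn
               for group in response.get("DatasetGroups", []))

# Hand-written stand-in responses so the sketch runs without AWS access.
example_arn = "arn:aws:forecast:us-east-1:XXXXXXXXXXXX:dataset-group/EXAMPLE"
before_delete = {"DatasetGroups": [{"DatasetGroupArn": example_arn}]}
after_delete = {"DatasetGroups": []}

print(dataset_group_exists(before_delete, example_arn))
print(dataset_group_exists(after_delete, example_arn))
```

In the notebook, the call would look like dataset_group_exists(forecast.list_dataset_groups(), dataset_group_arn).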

Delete dataset import jobs with TAXI_TIME_ALIGNMENT_BOUNDARY_DEMO in the job ARN.

In [None]:
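response = forecast.list_dataset_import_jobs()

for i in response['DatasetImportJobs']:
    # Only touch import jobs that belong to this demo's dataset
    if 'TAXI_TIME_ALIGNMENT_BOUNDARY_DEMO' in i['DatasetImportJobArn']:
        try:
            print('Deleting', i['DatasetImportJobName'])
            forecast.delete_dataset_import_job(DatasetImportJobArn=i['DatasetImportJobArn'])
        except Exception as error:
            # Report the failure and continue with the remaining jobs
            print('Could not delete', i['DatasetImportJobName'], '-', error)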

It will take a few minutes to delete the dataset import jobs. Once that is complete, the dataset can be deleted as follows in the next cell.

In [None]:
forecast.delete_dataset(DatasetArn=ts_dataset_arn)