{ "cells": [ { "cell_type": "markdown", "id": "76f10c5e", "metadata": { "tags": [] }, "source": [ "# Introducing Custom Time Frequency Support\n", "\n", "Prior to this feature release, customers were able to train a time-series model and produce forecasts against a specific set of time-intervals which Amazon Forecast refers to as ForecastFrequency. As described [here](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-predictor.html) forecasting frequencies are in the form of time units that range from minutes, to hours, and up to a year.\n", "\n", "With this new feature release, we are extending the concept of frequency units to include a range of allowed values. This allows you to select a frequency unit together with a value. The following table explains the valid list of frequency units for each time-scale unit:\n", "\n", "|Code|Frequency units|Allowed values|\n", "|--|--|--|\n", "|MIN|Minute|1-59|\n", "|H|Hour|1-23|\n", "|D|Day|1-6|\n", "|W|Week|1-4|\n", "|M|Month|1-11|\n", "|Y|Year|1|\n", "\n", "Given this new model, you can create a wider range of forecasting intervals according to your business use case. When it comes to workforce planning, one customer may want to forecast at 8-hour shift intervals. In a financial or demand forecasting scenario, a business might want to produce quarterly forecasts. These scenarios can be achieved with codes of 8H and 3M, repsectively. Now that the complete set of intervals are known, you can choose something to fit your specific use-case. There are no API changes with this release; we simple allow more values in the existing API request.\n", "\n", "This notebooks provides an example of producing a forecast at a quarterly basis having data at the daily basis. In addition, with the same dataset, another short-term forecast is demonstrated by having seven 3-day forecasts. The use-case with the short-term forecast might be ordering perishable items every 3-days. This would help anticipate what the purchase orders would look like.\n", "\n", "As before, certain rules about forecast frequency still apply. The forecasting frequency should always be greater than or equal to the Data frequency provided for the Target Time series dataset if Related Timeseries data is not present. Also, if RTS is present, the data frequency of RTS dataset should match the forecast frequency. \n", "\n", "\n", "## Table of Contents\n", "* [Pre-requisites](#prerequisites)\n", "* Step 1: [Import your data](#import)\n", "* Step 2: [Train a predictor](#predictor)\n", "* Step 3: [Generate forecasts](#forecast)\n", "* Step 4: [Query/View the forecasts](#query)\n", "* [Clean-up](#cleanup)" ] }, { "cell_type": "markdown", "id": "16064cb0", "metadata": { "tags": [] }, "source": [ "## Pre-requisites \n", "Before we get started, lets set up the notebook environment, the AWS SDK client for Amazon Forecast and IAM Role used by Amazon Forecast to access your data." ] }, { "cell_type": "markdown", "id": "7186d82e", "metadata": {}, "source": [ "#### Setup Environment" ] }, { "cell_type": "code", "execution_count": 1, "id": "57c13d48", "metadata": {}, "outputs": [], "source": [ "%%capture --no-stderr setup\n", "\n", "!pip install pandas s3fs matplotlib ipywidgets\n", "!pip install boto3 --upgrade\n", "\n", "%reload_ext autoreload" ] }, { "cell_type": "markdown", "id": "6f68a712", "metadata": {}, "source": [ "#### Setup Imports" ] }, { "cell_type": "code", "execution_count": 2, "id": "8989adfa", "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "\n", "sys.path.insert( 0, os.path.abspath(\"../../common\") )\n", "\n", "import json\n", "import util\n", "import boto3\n", "import s3fs\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "51b46743", "metadata": {}, "source": [ "Configure the S3 bucket name and region name for this lesson.\n", "\n", "- If you don't have an S3 bucket, create it first on S3.\n", "- Although we have set the region to us-west-2 as a default value below, you can choose any of the regions that the service is available in." ] }, { "cell_type": "code", "execution_count": 3, "id": "8caa19c6", "metadata": {}, "outputs": [], "source": [ "bucket_name = 'input your existing S3 bucket name'\n", "region = 'us-west-2'" ] }, { "cell_type": "markdown", "id": "c01dfe48", "metadata": {}, "source": [ "#### Create an instance of AWS SDK client for Amazon Forecast" ] }, { "cell_type": "code", "execution_count": 4, "id": "d19f310a", "metadata": {}, "outputs": [], "source": [ "session = boto3.Session(region_name=region)\n", "s3 = session.client('s3')\n", "forecast = session.client(service_name='forecast') \n", "forecastquery = session.client(service_name='forecastquery')" ] }, { "cell_type": "markdown", "id": "1b925cb9", "metadata": {}, "source": [ "#### Get IAM Role Amazon Forecast will use to access your data" ] }, { "cell_type": "code", "execution_count": 5, "id": "c31a719b", "metadata": {}, "outputs": [], "source": [ "from sagemaker import get_execution_role\n", "role_arn = get_execution_role()" ] }, { "cell_type": "markdown", "id": "a3c20acb", "metadata": {}, "source": [ "## Step 1: Import your data. \n", "\n", "In this step, we will create a **Dataset** and **Import** the data from S3 to Amazon Forecast. To train a Predictor we will need a **DatasetGroup** that groups the input **Datasets**. So, we will end this step by creating a **DatasetGroup** with the imported **Dataset**." ] }, { "cell_type": "markdown", "id": "d32fdd1f", "metadata": {}, "source": [ "#### Peek at the data and upload it to S3.\n", "\n", "Here, we will view the dataset locally, then upload the file to Amazon S3. Amazon Forecast consumes input data from S3.\n", " \n", "A sample [Target Time Series](https://github.com/aws-samples/amazon-forecast-samples/blob/main/library/content/TargetTimeSeries.md) (TTS) is provided. Please visit the links here to learn more about target and related time series." ] }, { "cell_type": "code", "execution_count": 6, "id": "ce29d43a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timestampitem_idtarget_value
02017-12-01 00:00:00427
12017-12-01 00:00:00736
22017-12-01 00:00:00102
32017-12-01 00:00:00121
42017-12-01 00:00:001361
\n", "
" ], "text/plain": [ " timestamp item_id target_value\n", "0 2017-12-01 00:00:00 4 27\n", "1 2017-12-01 00:00:00 7 36\n", "2 2017-12-01 00:00:00 10 2\n", "3 2017-12-01 00:00:00 12 1\n", "4 2017-12-01 00:00:00 13 61" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_tts = pd.read_csv('./data/sample_demand.csv', low_memory=False)\n", "df_tts.head(5)" ] }, { "cell_type": "markdown", "id": "cbe3fc7a", "metadata": {}, "source": [ "Upload this file to Amazon S3" ] }, { "cell_type": "code", "execution_count": 7, "id": "0c1d9d3b", "metadata": {}, "outputs": [], "source": [ "project = \"custom_frequency\"\n", "\n", "key_tts = \"%s/sample_demand.csv\" % project\n", "\n", "s3.upload_file( Filename=\"./data/sample_demand.csv\", Bucket=bucket_name, Key=key_tts )\n", "\n", "s3_data_path_tts = \"s3://\" + bucket_name + \"/\" + key_tts" ] }, { "cell_type": "markdown", "id": "6cf1f638", "metadata": {}, "source": [ "#### Creating the Dataset" ] }, { "cell_type": "code", "execution_count": 8, "id": "5bfb9aa6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The dataset is now ACTIVE.\n" ] } ], "source": [ "DATASET_FREQUENCY = \"H\" # H for hourly\n", "\n", "TS_DATASET_NAME = \"CUSTOM_FREQUENCY_TS\"\n", "TS_SCHEMA = {\n", " \"Attributes\":[\n", " {\n", " \"AttributeName\":\"timestamp\",\n", " \"AttributeType\":\"timestamp\"\n", " },\n", " {\n", " \"AttributeName\":\"item_id\",\n", " \"AttributeType\":\"string\"\n", " },\n", " {\n", " \"AttributeName\":\"target_value\",\n", " \"AttributeType\":\"integer\"\n", " }\n", " ]\n", "}\n", "\n", "create_dataset_response = forecast.create_dataset(Domain=\"CUSTOM\",\n", " DatasetType='TARGET_TIME_SERIES',\n", " DatasetName=TS_DATASET_NAME,\n", " DataFrequency=DATASET_FREQUENCY,\n", " Schema=TS_SCHEMA)\n", "\n", "ts_dataset_arn = create_dataset_response['DatasetArn']\n", "describe_dataset_response = forecast.describe_dataset(DatasetArn=ts_dataset_arn)\n", "\n", "print(f\"The dataset is now {describe_dataset_response['Status']}.\")" ] }, { "cell_type": "markdown", "id": "5aa10c42", "metadata": {}, "source": [ "#### Importing the Dataset" ] }, { "cell_type": "code", "execution_count": 9, "id": "20eb4170", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Waiting for Dataset Import Job with to become ACTIVE. This process could take 5-10 minutes.\n", "\n", "Current Status:\n", "CREATE_PENDING .\n", "CREATE_IN_PROGRESS .........................................................................\n", "ACTIVE \n", "\n", "\n", "The Dataset Import Job with is now ACTIVE.\n" ] } ], "source": [ "TIMESTAMP_FORMAT = \"yyyy-MM-dd HH:mm:ss\"\n", "TS_IMPORT_JOB_NAME = \"PRODUCT_TTS_IMPORT\"\n", "\n", "ts_dataset_import_job_response = \\\n", " forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,\n", " DatasetArn=ts_dataset_arn,\n", " DataSource= {\n", " \"S3Config\" : {\n", " \"Path\": s3_data_path_tts,\n", " \"RoleArn\": role_arn\n", " } \n", " },\n", " TimestampFormat=TIMESTAMP_FORMAT)\n", "\n", "ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']\n", "\n", "print(f\"Waiting for Dataset Import Job with to become ACTIVE. This process could take 5-10 minutes.\\n\\nCurrent Status:\")\n", "\n", "status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))" ] }, { "cell_type": "markdown", "id": "38018caa", "metadata": {}, "source": [ "#### Creating a DatasetGroup" ] }, { "cell_type": "code", "execution_count": 10, "id": "9eb6aca5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The DatasetGroup is now ACTIVE.\n" ] } ], "source": [ "DATASET_GROUP_NAME = \"CUSTOM_FREQUENCY_DEMO\"\n", "DATASET_ARNS = [ts_dataset_arn]\n", "\n", "create_dataset_group_response = \\\n", " forecast.create_dataset_group(Domain=\"CUSTOM\",\n", " DatasetGroupName=DATASET_GROUP_NAME,\n", " DatasetArns=DATASET_ARNS)\n", "\n", "dataset_group_arn = create_dataset_group_response['DatasetGroupArn']\n", "describe_dataset_group_response = forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)\n", "\n", "print(f\"The DatasetGroup is now {describe_dataset_group_response['Status']}.\")" ] }, { "cell_type": "markdown", "id": "54fc9ec5", "metadata": {}, "source": [ "## Step 2: Train a predictor \n", "\n", "In this step, we will create a **Predictor** using the **DatasetGroup** that was created above. After creating the predictor, we will review the accuracy obtained through the backtesting process to get a quantitative understanding of the performance of the predictor." ] }, { "cell_type": "markdown", "id": "94157f74", "metadata": {}, "source": [ "#### Train a predictor to help generate short term purchase orders\n", "\n", "In this example a \"3-day\" order is created to meet the demand of three days. In addition, the request is for 7 series of these, giving a total of 21 day coverage." ] }, { "cell_type": "code", "execution_count": 12, "id": "c6979c33", "metadata": {}, "outputs": [], "source": [ "PREDICTOR_NAME = \"PURCHASE_ORDER_PREDICTOR\"\n", "FORECAST_HORIZON = 7\n", "FORECAST_FREQUENCY = \"3D\"\n", "\n", "create_auto_predictor_response = \\\n", " forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,\n", " ForecastHorizon = FORECAST_HORIZON,\n", " ForecastFrequency = FORECAST_FREQUENCY,\n", " DataConfig = {\n", " 'DatasetGroupArn': dataset_group_arn\n", " }\n", " )\n", "\n", "po_predictor_arn = create_auto_predictor_response['PredictorArn']" ] }, { "cell_type": "markdown", "id": "4ced516a", "metadata": {}, "source": [ "#### Train a predictor to help anticipate demand over quarters\n", "\n", "In this example a \"3-month\" forecast is created to show estimated demand for quarter financial projection. " ] }, { "cell_type": "code", "execution_count": 13, "id": "1d9f13ae", "metadata": {}, "outputs": [], "source": [ "PREDICTOR_NAME = \"QUARTER_PREDICTOR\"\n", "FORECAST_HORIZON = 1\n", "FORECAST_FREQUENCY = \"3M\"\n", "\n", "create_auto_predictor_response = \\\n", " forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,\n", " ForecastHorizon = FORECAST_HORIZON,\n", " ForecastFrequency = FORECAST_FREQUENCY,\n", " DataConfig = {\n", " 'DatasetGroupArn': dataset_group_arn\n", " }\n", " )\n", "\n", "quarter_predictor_arn = create_auto_predictor_response['PredictorArn']" ] }, { "cell_type": "markdown", "id": "cbfe05c5", "metadata": {}, "source": [ "Poll for the two predictors, training in parallel, to complete. After both are complete, the workflow can advance." ] }, { "cell_type": "code", "execution_count": 14, "id": "1b63a613", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CREATE_IN_PROGRESS ......................................................................................................................................................................................................................................................................\n", "ACTIVE \n", "ACTIVE \n" ] } ], "source": [ "describe_auto_predictor_response = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=po_predictor_arn))\n", "describe_auto_predictor_response = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=quarter_predictor_arn))" ] }, { "cell_type": "markdown", "id": "46673464", "metadata": {}, "source": [ "## Step 3: Generate forecasts \n", "Finally, we will generate the forecasts using the above two predictors. In reality, you may only need one forecast, this is just a teaching example showing how one dataset can be forecasted at multiple, custom time-frequencies." ] }, { "cell_type": "markdown", "id": "12ddf54c", "metadata": {}, "source": [ "#### Generate a forecast from the 3-day Purchase Order Model\n", "\n", "Here, the ARN for the 3-day (3D) purchase order model is supplied. The imported dataset along with the predictor model is used to produce the requested forecast." ] }, { "cell_type": "code", "execution_count": 15, "id": "1a1a786a", "metadata": {}, "outputs": [], "source": [ "FORECAST_NAME = \"PURCHASE_ORDER_FORECAST\"\n", "\n", "create_forecast_response = \\\n", " forecast.create_forecast(ForecastName=FORECAST_NAME,\n", " PredictorArn=po_predictor_arn)\n", "\n", "po_forecast_arn = create_forecast_response['ForecastArn']" ] }, { "cell_type": "markdown", "id": "72bdff95", "metadata": {}, "source": [ "#### Generate a forecast from the Quarter Model\n", "\n", "Here, the ARN for the quarter (3M) model is supplied." ] }, { "cell_type": "code", "execution_count": 16, "id": "4a5e1900", "metadata": {}, "outputs": [], "source": [ "FORECAST_NAME = \"QUARTER_FORECAST\"\n", "\n", "create_forecast_response = \\\n", " forecast.create_forecast(ForecastName=FORECAST_NAME,\n", " PredictorArn=quarter_predictor_arn)\n", "\n", "quarter_forecast_arn = create_forecast_response['ForecastArn']" ] }, { "cell_type": "markdown", "id": "a2b27ff6", "metadata": {}, "source": [ "Poll for the two forecasts to complete. " ] }, { "cell_type": "code", "execution_count": 17, "id": "6d942bac", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CREATE_PENDING .\n", "CREATE_IN_PROGRESS ....................................................................................................\n", "ACTIVE \n", "ACTIVE \n" ] } ], "source": [ "status = util.wait(lambda: forecast.describe_forecast(ForecastArn=po_forecast_arn))\n", "status = util.wait(lambda: forecast.describe_forecast(ForecastArn=quarter_forecast_arn))\n" ] }, { "cell_type": "markdown", "id": "bf9e1600", "metadata": {}, "source": [ "## Step 4: Query forecasts \n", "\n", "In this step, a lightweight API is made for a couple sample items to view the forecasted numbers. Observe in the dates returned how they are spaced out according the the custom frequency and how the demand value is in alignment with the daily average -- as a general litmus test or rule of thumb." ] }, { "cell_type": "code", "execution_count": 18, "id": "356f18de", "metadata": {}, "outputs": [], "source": [ "item_id=\"1\"" ] }, { "cell_type": "markdown", "id": "8b1a6ba3", "metadata": {}, "source": [ "Using the Amazon Forecast Query API, a request for predictions for the named item_id is made using the purchase order forecast. We expect to see predictions every 3 days which holds true in the dataframe shown below." ] }, { "cell_type": "code", "execution_count": 19, "id": "67f82207", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimestampValue
02019-02-03T00:00:0038.617483
12019-02-06T00:00:0043.642546
22019-02-09T00:00:0043.977778
32019-02-12T00:00:0043.579142
42019-02-15T00:00:0049.895820
52019-02-18T00:00:0043.518435
62019-02-21T00:00:0049.395246
\n", "
" ], "text/plain": [ " Timestamp Value\n", "0 2019-02-03T00:00:00 38.617483\n", "1 2019-02-06T00:00:00 43.642546\n", "2 2019-02-09T00:00:00 43.977778\n", "3 2019-02-12T00:00:00 43.579142\n", "4 2019-02-15T00:00:00 49.895820\n", "5 2019-02-18T00:00:00 43.518435\n", "6 2019-02-21T00:00:00 49.395246" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecast_response = forecastquery.query_forecast(\n", " ForecastArn=po_forecast_arn,\n", " Filters={\"item_id\": item_id}\n", ")\n", "\n", "forecast_p50_df = pd.DataFrame.from_dict(forecast_response['Forecast']['Predictions']['p50'])\n", "\n", "forecast_p50_df" ] }, { "cell_type": "markdown", "id": "712c3cd7", "metadata": {}, "source": [ "Next, a request for predictions is made using the quarterly forecast. We expect to an aggregate prediction for the quarter which holds true and is shown in the dataframe below. This is a synthethic dataset, your use case will differ. The purpose of this example is to show you how to create combinations of time-frequency that make sense for your business and obtain the predicted outcomes." ] }, { "cell_type": "code", "execution_count": 20, "id": "7c41e66f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimestampValue
02019-03-01T00:00:00677.744873
\n", "
" ], "text/plain": [ " Timestamp Value\n", "0 2019-03-01T00:00:00 677.744873" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecast_response = forecastquery.query_forecast(\n", " ForecastArn=quarter_forecast_arn,\n", " Filters={\"item_id\": item_id}\n", ")\n", "\n", "forecast_p50_df = pd.DataFrame.from_dict(forecast_response['Forecast']['Predictions']['p50'])\n", "\n", "forecast_p50_df" ] }, { "cell_type": "markdown", "id": "27350b01", "metadata": {}, "source": [ "## Clean-up \n", "Uncomment the code section to delete all resources that were created in this notebook." ] }, { "cell_type": "code", "execution_count": null, "id": "1d8bc571", "metadata": {}, "outputs": [], "source": [ "forecast.delete_resource_tree(ResourceArn = dataset_group_arn)\n", "forecast.delete_resource_tree(ResourceArn = ts_dataset_arn)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 }