{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Forecast with Cold Start Items\n", "\n", "Consider the situation where a set of related new items is introduced into your catalog. This could happen when you bring a brand-new product to market, start offering a product that others already sell, or take a product into a new region.\n", "\n", "Even though no historical data is available for these new products, you still need to forecast their future demand.\n", "\n", "\n", "When no demand history is available for an item, the scenario is known as the \"cold-start problem\". Amazon Forecast can handle this situation when you use the [Auto Predictor](https://github.com/aws-samples/amazon-forecast-samples/blob/main/library/content/AutoPredictor.md).\n", "\n", "<img src=\"../../common/images/amazon_forecast.png\">\n", "\n", "\n", "# Introduction\n", "\n", "In this notebook, we walk through the process of generating forecasts for cold-start items. Note that generating predictions for cold-start items requires an [Item Metadata](https://github.com/aws-samples/amazon-forecast-samples/blob/main/library/content/ItemMetadata.md) dataset. The item metadata contains categorical data about your established items as well as your new cold-start items. Amazon Forecast looks at the categorical features of your cold-start items, finds similar items with established demand history, and uses them to help estimate demand for the new items.\n", "\n", "In this example, we use a feature called \"type\" in the item metadata file, with items distributed nearly evenly across types A, B, C, and D. A new cold-start item of type \"D\" will tend to have its future demand estimated from the actual historical values of other type \"D\" items.\n", "\n", "Design your item metadata file carefully relative to the target time series file. 
Any item that appears in the item metadata but not in the target time series will receive a cold-start forecast. We therefore recommend keeping end-of-life products out of the item metadata file, especially products with no activity during the target time series timespan, so that forecasts are not generated for dormant products.\n", "\n", "To identify a product as cold-start, add its item ID as a row in your item metadata file and make sure the ID does not appear in the target time series file. For multiple cold-start products, enter each item ID as a separate row in the item metadata file. If you don’t yet have an item ID for a cold-start product, you can use any alphanumeric string of fewer than 64 characters that isn’t already used for another product in your dataset.\n", "\n", "\n", "# Table of Contents\n", "\n", "* Step 0: [Setting up](#setup)\n", "* Step 1: [Preparing the Datasets prior to Amazon Forecast](#prepare)\n", "* Step 2: [Preparing the Datasets inside Amazon Forecast](#import)\n", " * Step 2a: [Creating a Dataset Group](#create)\n", " * Step 2b: [Creating a Target Dataset](#target)\n", " * Step 2c: [Creating an Item Metadata Dataset](#related)\n", " * Step 2d: [Update the Dataset Group](#update)\n", " * Step 2e: [Creating a Target Time Series Dataset Import Job](#targetImport)\n", " * Step 2f: [Creating an Item Metadata Dataset Import Job](#relatedImport)\n", " * Step 2g: [Wait on Dataset Import Jobs](#ImportWait)\n", "* Step 3: [Create the Predictor](#algo)\n", "* Step 4: [Create a Forecast](#forecast)\n", "* Step 5: [Querying the Forecasts](#query)\n", "* Step 6: [Exporting the Forecasts](#export)\n", "* Step 7: [Cleaning up your Resources](#cleanup)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 0: Set up Amazon Forecast<a class=\"anchor\" id=\"setup\">\n", "\n", "This section sets up the permissions and 
relevant endpoints." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: boto3 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (1.23.10)\n", "Requirement already satisfied: botocore<1.27.0,>=1.26.10 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from boto3) (1.26.10)\n", "Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from boto3) (0.5.2)\n", "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from boto3) (0.10.0)\n", "Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from botocore<1.27.0,>=1.26.10->boto3) (1.26.9)\n", "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from botocore<1.27.0,>=1.26.10->boto3) (2.8.2)\n", "Requirement already satisfied: six>=1.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.27.0,>=1.26.10->boto3) (1.16.0)\n" ] } ], "source": [ "!pip install boto3 --upgrade" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "\n", "import boto3\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "# importing forecast notebook utility from notebooks/common directory\n", "sys.path.insert( 0, os.path.abspath(\"../../common\") )\n", "import util\n", "\n", "plt.rcParams['figure.figsize'] = (15.0, 5.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Configure the S3 bucket name and region name for this lesson.\n", "\n", "- If you don't have an S3 bucket, create it first on S3.\n", "- Although we have set the region to us-west-2 as a 
default value below, you can choose any region in which the service is available." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9d8c463d738349109f0f21941a921b10", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Text(value='', description='bucketName', placeholder='input your S3 bucket name')" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "eed74a5485a74c34b6e7d23bb0207935", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Text(value='us-west-2', description='region', placeholder='input region name.')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "text_widget_bucket = util.create_text_widget( \"bucketName\", \"input your S3 bucket name\" )\n", "text_widget_region = util.create_text_widget( \"region\", \"input region name.\", default_value=\"us-west-2\" )" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "bucketName = text_widget_bucket.value\n", "assert bucketName, \"bucketName not set.\"\n", "\n", "region = text_widget_region.value\n", "assert region, \"region not set.\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "session = boto3.Session(region_name=region) \n", "forecast = session.client(service_name='forecast') \n", "forecast_query = session.client(service_name='forecastquery')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from sagemaker import get_execution_role\n", "role_arn = get_execution_role()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 1: Preparing the Datasets<a class=\"anchor\" id=\"prepare\">\n", " \n", " \n", "Here we use a synthetic dataset based on the electricity dataset, which consists of hourly time series for 370 households (item IDs client_0 to client_369). 
\n", "\n", "In this hypothetical scenario, our goal is to generate forecasts for 4 new customers with item IDs client_370 to client_373. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracting data/test.csv.gz to data/test.csv\n", "Done.\n" ] } ], "source": [ "zipLocalFilePath = \"data/test.csv.gz\"\n", "localFilePath = \"data/test.csv\"\n", "\n", "util.extract_gz( zipLocalFilePath, localFilePath )" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>timestamp</th>\n", " <th>target_value</th>\n", " <th>item_id</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>2014-01-01 01:00:00</td>\n", " <td>2.53807106598985</td>\n", " <td>client_0</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2014-01-01 01:00:00</td>\n", " <td>23.648648648648624</td>\n", " <td>client_1</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2014-01-01 01:00:00</td>\n", " <td>0.0</td>\n", " <td>client_2</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>2014-01-01 01:00:00</td>\n", " <td>144.81707317073176</td>\n", " <td>client_3</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>2014-01-01 01:00:00</td>\n", " <td>75.0</td>\n", " <td>client_4</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " timestamp target_value item_id\n", "0 2014-01-01 01:00:00 2.53807106598985 client_0\n", "1 2014-01-01 01:00:00 23.648648648648624 client_1\n", "2 2014-01-01 01:00:00 0.0 client_2\n", "3 2014-01-01 01:00:00 
144.81707317073176 client_3\n", "4 2014-01-01 01:00:00 75.0 client_4" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tdf = pd.read_csv(zipLocalFilePath, dtype = object)\n", "tdf.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "tdf['target_value'] = tdf['target_value'].astype('float')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us first plot the last two weeks of one time series." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 1080x576 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "tdf[tdf['item_id'] == 'client_1'][-24*7*2:]\\\n", " .plot(x='timestamp', y='target_value', figsize=(15, 8)); " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we load an item metadata dataset that contains information for both the established items (client_0 to client_369) and the cold-start items (client_370 to client_373). In this case, the metadata field is called \"type\". This demo uses a single categorical feature, but in practice you will usually have several.\n", "\n", "Note that for cold-start items with little or no demand history, the algorithm can only \"transfer\" information from existing items to new ones through the metadata. Informative, high-quality metadata is therefore key to a good cold-start forecast. 
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>item_id</th>\n", " <th>type</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>369</th>\n", " <td>client_369</td>\n", " <td>D</td>\n", " </tr>\n", " <tr>\n", " <th>370</th>\n", " <td>client_370</td>\n", " <td>A</td>\n", " </tr>\n", " <tr>\n", " <th>371</th>\n", " <td>client_371</td>\n", " <td>B</td>\n", " </tr>\n", " <tr>\n", " <th>372</th>\n", " <td>client_372</td>\n", " <td>C</td>\n", " </tr>\n", " <tr>\n", " <th>373</th>\n", " <td>client_373</td>\n", " <td>D</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " item_id type\n", "369 client_369 D\n", "370 client_370 A\n", "371 client_371 B\n", "372 client_372 C\n", "373 client_373 D" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this metadata contains the cold start items' metadata as well.\n", "localItemMetaDataFilePath = \"data/itemMetaData.csv\"\n", "imdf = pd.read_csv(localItemMetaDataFilePath, dtype = object)\n", "\n", "imdf.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following bar chart shows the number of items for each value of \"type\"." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAA20AAAEsCAYAAABQaVsqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAPeklEQVR4nO3db4xld13H8c/XLsi/GLd2WgslLmpBK4lCVkCrRFiJGNTWaGOJ4MYU+wQUiIkuPgAfiDbGGIwQkhXQNRJIRUgbSZS6QIwPRKZAtGXBNlBLZWmnGpUgAQtfH+wlDMuWtnNner8z83o9Off8zr1zvk9O0/ecs3equwMAAMBM37TqAQAAALh/og0AAGAw0QYAADCYaAMAABhMtAEAAAx2YNUDJMkFF1zQhw4dWvUYAAAAK3HzzTff291r5zo2ItoOHTqU9fX1VY8BAACwElX1b/d3zOORAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMJtoAAAAGE20AAACDiTYAAIDBRBsAAMBgB1Y9wF516Ni7Vz0C38Ad171g1SMAAMCD4k4bAADAYA8YbVX1lqq6p6pu2bR2flXdVFW3LbYHNx17VVXdXlUfr6qf2KnBAQAA9oMH83jknyV5fZI/37R2LMnJ7r6uqo4t9n+zqi5LcnWS70vy+CR/V1VP7u4vbe/YwF7nEeP5PGY8m2toPtcQ8GA94J227v77JP951vIVSU4sXp9IcuWm9bd39xe6+5NJbk/yjO0ZFQAAYP/Z6r9pu6i7TyfJYnvhYv0JST616X13Lda+TlVdW1XrVbW+sbGxxTEAAAD2tu3+IpI6x1qf643dfby7D3f34bW1tW0eAwAAYG/YarTdXVUXJ8lie89i/a4kT9z0vkuSfHrr4wEAAOxvW422G5McXbw+muSGTetXV9U3V9WTklya5J+WGxEAAGD/esBvj6yqtyX5sSQXVNVdSV6T5Lok11fVNUnuTHJVknT3rVV1fZKPJrkvyUt9cyQAAMDWPWC0dfcL7+fQkft5/2uTvHaZoQAAADhju7+IBAAAgG0k2gAAAAYTbQAAAIOJNgAAgMFEGwAAwGCiDQAAYDDRBgAAMJhoAwAAGEy0AQAADCbaAAAABhNtAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMJtoAAAAGE20AAACDiTYAAIDBRBsAAMBgog0AAGAw0QYAADCYaAMAABhMtAEAAAwm2gAAAAYTbQAAAIOJNgAAgMFEGwAAwGCiDQAAYDDRBgAAMJhoAwAAGEy0AQAADCbaAAAABhNtAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMJtoAAAAGWyraquqVVXVrVd1SVW+rqkdV1flVdVNV3bbYHtyuYQEAAPabLUdbVT0hya8lOdzdT01yXpKrkxxLcrK7L01ycrEPAADAFiz7eOSBJI+uqgNJHpPk00muSHJicfxEkiuXPAcAAMC+teVo6+5/T/IHSe5McjrJf3f3e5Jc1N2nF+85neTCc32+qq6tqvWqWt/Y2NjqGAAAAHvaMo9HHsyZu2pPSvL4JI+tqhc92M939/HuPtzdh9fW1rY6BgAAwJ62zOORP57kk9290d3/l+SdSX44yd1VdXGSLLb3LD8mAADA/rRMtN2Z5FlV9ZiqqiRHkpxKcmOSo4v3HE1yw3IjAgAA7F8HtvrB7v5AVb0jyYeS3Jfkw0mOJ3lckuur6pqcCburtmNQAACA/WjL0ZYk3f2aJK85a/kLOXPXDQAAgCUt+5X/AAAA7CDRBgAAMJhoAwAAGEy0AQAADCbaAAAABhNtAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMJtoAAAAGE20AAACDiTYAAIDBRBsAAMBgog0AAGAw0QYAADCYaAMAABjswKoHAACAaQ4de/eqR+AB3HH
dC1Y9wsPGnTYAAIDBRBsAAMBgog0AAGAw0QYAADCYaAMAABhMtAEAAAwm2gAAAAYTbQAAAIOJNgAAgMFEGwAAwGCiDQAAYDDRBgAAMJhoAwAAGEy0AQAADCbaAAAABhNtAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMtlS0VdW3VtU7qupjVXWqqn6oqs6vqpuq6rbF9uB2DQsAALDfLHun7Y+S/E13f0+S709yKsmxJCe7+9IkJxf7AAAAbMGWo62qviXJs5O8OUm6+4vd/V9JrkhyYvG2E0muXG5EAACA/WuZO23fmWQjyZ9W1Yer6k1V9dgkF3X36SRZbC8814er6tqqWq+q9Y2NjSXGAAAA2LuWibYDSZ6e5I3d/bQkn8tDeBSyu4939+HuPry2trbEGAAAAHvXMtF2V5K7uvsDi/135EzE3V1VFyfJYnvPciMCAADsX1uOtu7+TJJPVdVTFktHknw0yY1Jji7Wjia5YakJAQAA9rEDS37+V5O8taoemeQTSX45Z0Lw+qq6JsmdSa5a8hwAAAD71lLR1t0fSXL4HIeOLPNzAQAAOGPZv9MGAADADhJtAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMJtoAAAAGE20AAACDiTYAAIDBRBsAAMBgog0AAGAw0QYAADCYaAMAABhMtAEAAAwm2gAAAAYTbQAAAIOJNgAAgMFEGwAAwGCiDQAAYDDRBgAAMJhoAwAAGEy0AQAADCbaAAAABhNtAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMJtoAAAAGE20AAACDiTYAAIDBRBsAAMBgog0AAGAw0QYAADCYaAMAABhMtAEAAAwm2gAAAAZbOtqq6ryq+nBV/fVi//yquqmqbltsDy4/JgAAwP60HXfaXp7k1Kb9Y0lOdvelSU4u9gEAANiCpaKtqi5J8oIkb9q0fEWSE4vXJ5Jcucw5AAAA9rNl77S9LslvJPnyprWLuvt0kiy2Fy55DgAAgH1ry9FWVT+V5J7uvnmLn7+2qtaran1jY2OrYwAAAOxpy9xpuzzJz1TVHUnenuS5VfUXSe6uqouTZLG951wf7u7j3X24uw+vra0tMQYAAMDeteVo6+5Xdfcl3X0oydVJ3tvdL0pyY5Kji7cdTXLD0lMCAADsUzvxd9quS/K8qrotyfMW+wAAAGzBge34Id39/iTvX7z+jyRHtuPnAgAA7Hc7cacNAACAbSLaAAAABhNtAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMJtoAAAAGE20AAACDiTYAAIDBRBsAAMBgog0AAGAw0QYAADCYaAMAABhMtAEAAAwm2gAAAAYTbQAAAIOJNgAAgMFEGwAAwGCiDQAAYDDRBgAAMJhoAwAAGEy0AQAADCbaAAAABhNtAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMJtoAAAAGE20AAACDiTYAAIDBRBsAAMBgog0AAGAw0QYAADCYaAMAABhMtAEAAAwm2gAAAAbbcrRV1ROr6n1Vdaqqbq2qly/Wz6+qm6rqtsX24PaNCwAAsL8sc6ftviS/3t3fm+RZSV5aVZclOZbkZHdfmuTkYh8AAIAt2HK0dffp7v7Q4vVnk5xK8oQkVyQ5sXjbiSRXLjkjAADAvrUt/6atqg4leVqSDyS5qLtPJ2fCLsmF9/OZa6tqvarWNzY2tmMMAACAPWfpaKuqxyX5qySv6O7/ebCf6+7j3X24uw+vra0tOwYAAMCetFS0VdUjcibY3trd71ws311VFy+OX5zknuVGBAAA2L+W+fbISvLmJKe6+w83HboxydHF66NJbtj6eAAAAPvbgSU+e3mSFyf5l6r6yGLtt5Jcl+T6qromyZ1JrlpqQgAAgH1sy9HW3f+QpO7n8JGt/lw
AAAC+alu+PRIAAICdIdoAAAAGE20AAACDiTYAAIDBRBsAAMBgog0AAGAw0QYAADCYaAMAABhMtAEAAAwm2gAAAAYTbQAAAIOJNgAAgMFEGwAAwGCiDQAAYDDRBgAAMJhoAwAAGEy0AQAADCbaAAAABhNtAAAAg4k2AACAwUQbAADAYKINAABgMNEGAAAwmGgDAAAYTLQBAAAMJtoAAAAGE20AAACDiTYAAIDBRBsAAMBgog0AAGAw0QYAADCYaAMAABhMtAEAAAwm2gAAAAYTbQAAAIOJNgAAgMFEGwAAwGCiDQAAYDDRBgAAMJhoAwAAGEy0AQAADCbaAAAABtuxaKuq51fVx6vq9qo6tlPnAQAA2Mt2JNqq6rwkb0jyk0kuS/LCqrpsJ84FAACwl+3UnbZnJLm9uz/R3V9M8vYkV+zQuQAAAPas6u7t/6FVP5/k+d39ksX+i5M8s7tftuk91ya5drH7lCQf3/ZB2E4XJLl31UPALuYaguW4hmB5rqPZvqO718514MAOnbDOsfY1ddjdx5Mc36Hzs82qar27D696DtitXEOwHNcQLM91tHvt1OORdyV54qb9S5J8eofOBQAAsGftVLR9MMmlVfWkqnpkkquT3LhD5wIAANizduTxyO6+r6peluRvk5yX5C3dfetOnIuHjUdZYTmuIViOawiW5zrapXbki0gAAADYHjv2x7UBAABYnmgDAAAYTLQBAAAMJtp4UKrq8qp6w6rngN2gqr67qi4/x/qPVtV3rWImAGD3Em3cr6r6gar6/aq6I8nvJPnYikeC3eJ1ST57jvXPL44BD1FVXVBVteo5YDeqqrWqWlv1HGydaONrVNWTq+rVVXUqyeuTfCpnvmX0Od39xyseD3aLQ939z2cvdvd6kkMP/ziwu1TVs6rq/VX1zqp6WlXdkuSWJHdX1fNXPR/sBnXGb1fVvTnzi/d/raqNqnr1qmfjoRNtnO1jSY4k+enu/pFFqH1pxTPBbvOob3Ds0Q/bFLB7vT7J7yZ5W5L3JnlJd397kmcn+b1VDga7yCuSXJ7kB7v727r7YJJnJrm8ql650sl4yEQbZ/u5JJ9J8r6q+pOqOpLE4yjw0Hywqn7l7MWquibJzSuYB3abA939nu7+yySf6e5/TJLu9pg+PHi/lOSF3f3Jryx09yeSvGhxjF3kwKoHYJbufleSd1XVY5NcmeSVSS6qqjcmeVd3v2eV88Eu8YqcuY5+MV+NtMNJHpnkZ1c1FOwiX970+vNnHeuHcxDYxR7R3feevdjdG1X1iFUMxNZVt//28Y1V1flJrkryC9393FXPA7tFVT0nyVMXu7d293tXOQ/sFlX1pSSfy5knPR6d5H+/cijJo7rb/3DCA6iqD3X30x/qMWYSbQAAsMds+uXH1x2KX37sOqINAABgMF9EAgAAMJhoAwAAGEy0AQAADCbaAAAABvt/fu2lOaLugeQAAAAASUVORK5CYII=\n", "text/plain": [ "<Figure size 1080x360 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "imdf['type'].value_counts().plot(kind='bar');" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "s3 = session.client('s3')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "targetTimeseriesDatakey = \"cold-start/test.csv\"\n", "\n", 
"s3.upload_file(Filename=localFilePath, Bucket = bucketName, Key = f\"{targetTimeseriesDatakey}\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "itemMetaDatakey = \"cold-start/itemMetaData.csv\"\n", "\n", "s3.upload_file(Filename=localItemMetaDataFilePath, Bucket = bucketName, Key = f\"{itemMetaDatakey}\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "project = \"coldstart_demo\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, we specify the key input data and forecast parameters: an hourly frequency (`H`), a forecast horizon of 48 time steps (48 hours), and the timestamp format used in the target time series file." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "freq = \"H\"\n", "forecast_horizon = 48\n", "timestamp_format = \"yyyy-MM-dd HH:mm:ss\"\n", "delimiter = ','" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2a. Creating a Dataset Group<a class=\"anchor\" id=\"create\">\n", "First, we create an empty dataset group. We add our datasets to it later." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "dataset_group = f\"{project}_grp\"\n", "dataset_arns = []\n", "create_dataset_group_response = forecast.create_dataset_group(Domain=\"CUSTOM\",\n", " DatasetGroupName=dataset_group,\n", " DatasetArns=dataset_arns)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "dataset_group_arn = create_dataset_group_response['DatasetGroupArn']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2b. Creating a Target Dataset<a class=\"anchor\" id=\"target\">\n", "In this example, we define a target time series dataset. This dataset is required to use the service." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, we specify the target time series dataset name, which is derived from the project name." 
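, "

Before creating the dataset, it can help to check locally that the target time series file matches the schema you are about to declare. The sketch below is a hypothetical helper, `check_target_frame`, and is not part of `util` or the Amazon Forecast SDK. It assumes the column order `timestamp`, `target_value`, `item_id` used in this notebook. It also assumes that the Forecast timestamp format `yyyy-MM-dd HH:mm:ss` corresponds to the Python format string `%Y-%m-%d %H:%M:%S`. Missing target values are not flagged. Only values that cannot be parsed as numbers are reported.

```python
import pandas as pd

# Attribute order declared in ts_schema_val in this notebook.
TS_SCHEMA_ORDER = ['timestamp', 'target_value', 'item_id']

# Assumed Python equivalent of the Forecast timestamp format yyyy-MM-dd HH:mm:ss.
PY_TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'


def check_target_frame(df, expected_order=TS_SCHEMA_ORDER):
    # Hypothetical helper: returns a list of problems, empty if none were found.
    problems = []
    if list(df.columns) != expected_order:
        problems.append('column order %s does not match schema %s'
                        % (list(df.columns), expected_order))
    # errors='coerce' turns unparseable timestamps into NaT so they can be counted.
    parsed = pd.to_datetime(df['timestamp'], format=PY_TIMESTAMP_FORMAT, errors='coerce')
    bad_timestamps = int(parsed.isna().sum())
    if bad_timestamps:
        problems.append('%d timestamps do not match %s'
                        % (bad_timestamps, PY_TIMESTAMP_FORMAT))
    # Flag values that are present but not numeric; genuinely missing values are ignored.
    values = pd.to_numeric(df['target_value'], errors='coerce')
    non_numeric = values.isna() & df['target_value'].notna()
    if non_numeric.any():
        problems.append('%d non-numeric target values found' % int(non_numeric.sum()))
    return problems
```

For example, `check_target_frame(pd.read_csv(localFilePath))` should return an empty list when the file is consistent with the schema. Each string in a non-empty list describes one mismatch."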
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "coldstart_demo_ts\n" ] } ], "source": [ "ts_dataset_name = f\"{project}_ts\"\n", "print(ts_dataset_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we specify the schema of the target time series dataset. Make sure the order of the attributes (columns) matches the raw data in the file. We use the same three attributes, in the same order, as in the data preview above." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "ts_schema_val = [{\"AttributeName\": \"timestamp\", \"AttributeType\": \"timestamp\"},\n", " {\"AttributeName\": \"target_value\", \"AttributeType\": \"float\"},\n", " {\"AttributeName\": \"item_id\", \"AttributeType\": \"string\"}]\n", "ts_schema = {\"Attributes\": ts_schema_val}" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "response = forecast.create_dataset(Domain=\"CUSTOM\",\n", " DatasetType='TARGET_TIME_SERIES',\n", " DatasetName=ts_dataset_name,\n", " DataFrequency=freq,\n", " Schema=ts_schema\n", " )\n", "\n", "ts_dataset_arn = response['DatasetArn']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2c. Creating an Item Metadata Dataset<a class=\"anchor\" id=\"related\">\n", "In this example, we define an Item Metadata dataset." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "coldstart_demo_meta\n" ] } ], "source": [ "item_metadata_dataset_name = f\"{project}_meta\"\n", "print(item_metadata_dataset_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specify the schema of the item metadata dataset here. Make sure the order of the attributes matches the columns in the raw data file. Unlike the target time series, the item metadata file has only two columns: `item_id` and the categorical feature. Because what matters is the column order, the schema below names the second attribute `category` even though the column is labeled `type` in the CSV file." 
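, "

As described in the introduction, the cold-start items are exactly the item IDs that appear in the item metadata but not in the target time series. You can therefore list them with a set difference before importing anything. The sketch below, a hypothetical helper named `split_items`, also reports target items that have no metadata row, which you may want to review. It assumes both files use an `item_id` column, as they do in this notebook.

```python
import pandas as pd


def split_items(target_df, meta_df, id_col='item_id'):
    # Hypothetical helper: compares the item IDs found in the two files.
    target_ids = set(pd.unique(target_df[id_col]))
    meta_ids = set(pd.unique(meta_df[id_col]))
    return {
        # In the metadata but not the target series: these receive cold-start forecasts.
        'cold_start': sorted(meta_ids - target_ids),
        # In both files: established items with demand history.
        'established': sorted(meta_ids & target_ids),
        # In the target series but absent from the metadata: worth reviewing.
        'missing_metadata': sorted(target_ids - meta_ids),
    }
```

With the notebook's `tdf` and `imdf`, `split_items(tdf, imdf)['cold_start']` should list client_370 to client_373 if the data matches the setup described above."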
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "meta_schema_val = [{\"AttributeName\": \"item_id\", \"AttributeType\": \"string\"},\n", " {\"AttributeName\": \"category\", \"AttributeType\": \"string\"}]\n", "meta_schema = {\"Attributes\": meta_schema_val}" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "response = forecast.create_dataset(Domain=\"CUSTOM\",\n", " DatasetType='ITEM_METADATA',\n", " DatasetName=item_metadata_dataset_name,\n", " Schema=meta_schema\n", " )\n", "\n", "meta_dataset_arn = response['DatasetArn']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2d. Updating the dataset group with the datasets we created<a class=\"anchor\" id=\"update\">\n", "You can have multiple datasets under the same dataset group. Update it with the datasets we created before." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "dataset_arns = []\n", "dataset_arns.append(ts_dataset_arn)\n", "dataset_arns.append(meta_dataset_arn)\n", "update_dataset_group_response = forecast.update_dataset_group(DatasetGroupArn=dataset_group_arn, DatasetArns=dataset_arns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2e. Creating a Target Time Series Dataset Import Job<a class=\"anchor\" id=\"targetImport\">" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "ts_dataset_import_job_response = forecast.create_dataset_import_job(DatasetImportJobName=dataset_group+\"_1\",\n", " DatasetArn=ts_dataset_arn,\n", " DataSource= {\n", " \"S3Config\" : {\n", " \"Path\": f\"s3://{bucketName}/{targetTimeseriesDatakey}\",\n", " \"RoleArn\": role_arn\n", " } \n", " },\n", " TimestampFormat=timestamp_format)\n", "\n", "ts_dataset_import_job_arn=ts_dataset_import_job_response['DatasetImportJobArn']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2f. 
Creating an Item Metadata Dataset Import Job<a class=\"anchor\" id=\"relatedImport\">" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "meta_dataset_import_job_response = forecast.create_dataset_import_job(DatasetImportJobName=dataset_group,\n", " DatasetArn=meta_dataset_arn,\n", " DataSource= {\n", " \"S3Config\" : {\n", " \"Path\": f\"s3://{bucketName}/{itemMetaDatakey}\",\n", " \"RoleArn\": role_arn\n", " } \n", " })\n", "\n", "meta_dataset_import_job_arn=meta_dataset_import_job_response['DatasetImportJobArn']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2g. Wait for the two dataset import jobs to complete<a class=\"anchor\" id=\"ImportWait\">" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CREATE_PENDING \n", "CREATE_IN_PROGRESS .........\n", "ACTIVE \n", "CREATE_IN_PROGRESS ...........................\n", "ACTIVE \n" ] } ], "source": [ "status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=meta_dataset_import_job_arn))\n", "assert status\n", "status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))\n", "assert status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 3. Create the Predictor with the datasets<a class=\"anchor\" id=\"algo\">" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read about the [Auto Predictor](https://github.com/aws-samples/amazon-forecast-samples/blob/main/library/content/AutoPredictor.md) to learn more." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Waiting for Predictor to become ACTIVE. 
Depending on data size and predictor settings, it can take several hours to be ACTIVE.\n", "\n", "Current Status:\n", "CREATE_PENDING ..\nn", "ACTIVE \n", "\n", "\n", "The Predictor is now ACTIVE.\n" ] } ], "source": [ "PREDICTOR_NAME = project + \"_predictor\"\n", "\n", "create_auto_predictor_response = \\\n", " forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,\n", " ForecastHorizon = forecast_horizon,\n", " ForecastFrequency = freq,\n", " DataConfig = {\n", " 'DatasetGroupArn': dataset_group_arn\n", " })\n", "\n", "predictor_arn = create_auto_predictor_response['PredictorArn']\n", "print(f\"Waiting for Predictor to become ACTIVE. Depending on data size and predictor settings, it can take several hours to be ACTIVE.\\n\\nCurrent Status:\")\n", "\n", "status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn))\n", "\n", "describe_auto_predictor_response = forecast.describe_auto_predictor(PredictorArn=predictor_arn)\n", "print(f\"\\n\\nThe Predictor is now {describe_auto_predictor_response['Status']}.\")\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 4. Creating a Forecast<a class=\"anchor\" id=\"forecast\">\n", "\n", "Next, we use the trained predictor to create a forecast. The forecast contains predictions for every item in the dataset group, including the cold-start items." 
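, "

The notebook relies on `util.wait` from the shared `common` directory to block until each resource finishes creating. Its code is not shown here. To illustrate the idea, the sketch below polls a describe call until the `Status` field reaches a value after which creation will not progress further. `poll_until_done` is a hypothetical name, and the set of terminal statuses is an assumption that you should check against the Amazon Forecast API reference for your resource type. The `sleep` parameter can be swapped out so the loop can be tested without actually waiting.

```python
import time

# Statuses after which a create operation will not progress further. This set is
# an assumption for the sketch; check the API reference for your resource type.
TERMINAL_STATUSES = {'ACTIVE', 'CREATE_FAILED', 'CREATE_STOPPED'}


def poll_until_done(describe, interval=30.0, timeout=None, sleep=time.sleep):
    # describe: zero-argument callable returning a dict with a 'Status' key, e.g.
    # lambda: forecast.describe_forecast(ForecastArn=forecast_arn)
    waited = 0.0
    while True:
        status = describe()['Status']
        if status in TERMINAL_STATUSES:
            # True only when the resource is ready to use.
            return status == 'ACTIVE'
        if timeout is not None and waited >= timeout:
            raise TimeoutError('still %s after %.0f seconds' % (status, waited))
        sleep(interval)
        waited += interval
```

For example, `poll_until_done(lambda: forecast.describe_forecast(ForecastArn=forecast_arn), interval=60)` would return `True` once the forecast becomes ACTIVE and `False` if creation fails."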
] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Waiting for Forecast to become ACTIVE.\n", "\n", "Current Status:\n", "CREATE_PENDING \n", "CREATE_IN_PROGRESS ........................................................................\n", "ACTIVE \n" ] } ], "source": [ "FORECAST_NAME = project + \"_forecast\"\n", "\n", "create_forecast_response = forecast.create_forecast(ForecastName=FORECAST_NAME,\n", " PredictorArn=predictor_arn)\n", "\n", "forecast_arn = create_forecast_response['ForecastArn']\n", "\n", "print(f\"Waiting for Forecast to become ACTIVE.\\n\\nCurrent Status:\")\n", "\n", "status = util.wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 5. Querying the Cold-Start Item Forecast<a class=\"anchor\" id=\"query\">\n", " \n", "Now we plot the forecast for a cold-start item. First, we confirm that there is <b>no historical data</b> for client_373 in the `tdf` DataFrame. Despite this lack of history, a prediction is still generated, as the plot shows. Because client_373 is type D, the prediction draws on the demand patterns of other type D clients." 
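, "

The query cell that follows builds a separate list for each quantile from the `query_forecast` response. A tidier alternative is to turn the response into a DataFrame with one column per quantile, indexed by timestamp. The sketch below assumes the response shape used in that cell, where `['Forecast']['Predictions']` maps each quantile name (such as `p10`, `p50`, `p90`) to a list of points with `Timestamp` and `Value` keys. `predictions_to_frame` is a hypothetical helper name.

```python
import pandas as pd


def predictions_to_frame(forecast_response):
    # forecast_response is the dict returned by query_forecast; its
    # ['Forecast']['Predictions'] maps each quantile to a list of
    # {'Timestamp': ..., 'Value': ...} points.
    predictions = forecast_response['Forecast']['Predictions']
    columns = {}
    for quantile, points in predictions.items():
        columns[quantile] = pd.Series(
            [p['Value'] for p in points],
            index=pd.to_datetime([p['Timestamp'] for p in points]),
        )
    # Building the frame from Series aligns all quantiles on their timestamps.
    return pd.DataFrame(columns).sort_index()
```

You could then plot the `p50` column and shade between the `p10` and `p90` columns, as the plotting cell does with plain lists."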
] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>timestamp</th>\n", " <th>target_value</th>\n", " <th>item_id</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "Empty DataFrame\n", "Columns: [timestamp, target_value, item_id]\n", "Index: []" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tdf[tdf['item_id'] == 'client_373']" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 1080x720 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "forecast_response = forecast_query.query_forecast(\n", " ForecastArn=forecast_arn,\n", " Filters={\"item_id\": \"client_373\"})\n", "\n", "fcst = forecast_response['Forecast']['Predictions']\n", "time_stamp = list(map(lambda x: pd.to_datetime(x['Timestamp']), fcst['p10']))\n", "p10_fcst = list(map(lambda x: x['Value'], fcst['p10']))\n", "p50_fcst = list(map(lambda x: x['Value'], fcst['p50']))\n", "p90_fcst = list(map(lambda x: x['Value'], fcst['p90']))\n", "\n", "plt.figure(figsize=(15, 10))\n", "plt.plot(time_stamp, p50_fcst)\n", "plt.fill_between(time_stamp, p10_fcst, p90_fcst, alpha=0.2)\n", "plt.title(\"Cold-Start Forecast client_373\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 6. 
Exporting your Forecasts<a class=\"anchor\" id=\"export\">\n", " \n", "In Step 5, we used the Forecast Query API to review individual items for quality assurance and testing. At production scale, export the forecasts to S3 in bulk instead. From there, you can use the data in many ways, for example by loading it into your BI platform or into a low-latency database for querying." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "forecast_export_name = f'{project}_cold_start_forecast_export'\n", "forecast_export_name_path = f\"s3://{bucketName}/{forecast_export_name}\"" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CREATE_PENDING .\n", "CREATE_IN_PROGRESS ...........\n", "ACTIVE \n" ] } ], "source": [ "create_forecast_export_response = forecast.create_forecast_export_job(ForecastExportJobName=forecast_export_name,\n", " ForecastArn=forecast_arn,\n", " Destination={\n", " \"S3Config\" : {\n", " \"Path\": forecast_export_name_path,\n", " \"RoleArn\": role_arn\n", " }\n", " })\n", "forecast_export_arn = create_forecast_export_response['ForecastExportJobArn']\n", "\n", "status = util.wait(lambda: forecast.describe_forecast_export_job(ForecastExportJobArn = forecast_export_arn))\n", "assert status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 7. Cleaning up your Resources<a class=\"anchor\" id=\"cleanup\">" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have completed the above steps, you can clean up the resources we created. All delete jobs except `delete_dataset_group` are asynchronous, so we use the `wait_till_delete` helper to wait for each deletion to finish. \n", "Resource limits are documented <a href=\"https://docs.aws.amazon.com/forecast/latest/dg/limits.html\">here</a>. " 
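, "

One common way to wait for an asynchronous delete is to keep retrying the delete call until the service reports that the resource no longer exists. The sketch below shows that pattern with a hypothetical `retry_delete` function. It is not the code of `util.wait_till_delete`, which is not shown in this notebook. The two exception classes are stand-ins. With boto3 you would typically catch `botocore.exceptions.ClientError` and inspect `error.response['Error']['Code']` for `ResourceNotFoundException` or `ResourceInUseException`.

```python
import time


class ResourceNotFound(Exception):
    # Stand-in for the service reporting that the resource no longer exists.
    pass


class ResourceInUse(Exception):
    # Stand-in for the service rejecting the delete while the resource is busy.
    pass


def retry_delete(delete, interval=5.0, max_attempts=60, sleep=time.sleep):
    # Hypothetical helper: call delete() repeatedly until the resource is gone.
    for _ in range(max_attempts):
        try:
            delete()
        except ResourceNotFound:
            # The resource is gone, so the deletion has finished.
            return True
        except ResourceInUse:
            # Deletion, or a dependent resource, is still in progress; try again.
            pass
        sleep(interval)
    return False
```

This pattern assumes that, once the first delete call succeeds, later calls fail with a not-found error after the resource has been removed. That is why a successful call is followed by another attempt rather than an immediate return."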
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete forecast export\n", "util.wait_till_delete(lambda: forecast.delete_forecast_export_job(ForecastExportJobArn = forecast_export_arn))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete forecast\n", "util.wait_till_delete(lambda: forecast.delete_forecast(ForecastArn = forecast_arn))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete predictor\n", "util.wait_till_delete(lambda: forecast.delete_predictor(PredictorArn = predictor_arn))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete the target time series and item metadata dataset import jobs\n", "util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))\n", "util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=meta_dataset_import_job_arn))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete the target time series and item metadata datasets\n", "util.wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=ts_dataset_arn))\n", "util.wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=meta_dataset_arn))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete dataset group\n", "forecast.delete_dataset_group(DatasetGroupArn=dataset_group_arn)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 2 }