{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Data Ready\n", "\n", "Forecasting is used in a variety of applications and business use cases. For example, retailers need to forecast the sales of their products to decide how much stock they need by location, manufacturers need to estimate the number of parts required at their factories to optimize their supply chain, businesses need to estimate their flexible workforce needs, utilities need to forecast electricity consumption in order to run an efficient energy network, and enterprises need to estimate their cloud infrastructure needs.\n", "\n", "<img src=\"https://amazon-forecast-samples.s3-us-west-2.amazonaws.com/common/images/forecast_overview_steps.png\" width=\"98%\">\n", "\n", "In this notebook we will walk through the first steps outlined in the left box above.\n", "\n", "\n", "## Table Of Contents\n", "* Step 1: [Set Up Amazon Forecast](#setup)\n", "* Step 2: [Prepare the Datasets](#DataPrep)\n", "* Step 3: [Create the Dataset Group and Dataset](#DataSet)\n", "* Step 4: [Create the Target Time Series Data Import Job](#DataImport)\n", "* [Next Steps](#nextSteps)\n", "\n", "For more information about the APIs, please check the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Set Up Amazon Forecast<a class=\"anchor\" id=\"setup\"></a>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section sets up the permissions and relevant endpoints."
] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "!pip install pandas s3fs matplotlib ipywidgets\n", "!pip install boto3 --upgrade" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "import pandas as pd\n", "\n", "# importing forecast notebook utility from notebooks/common directory\n", "sys.path.insert( 0, os.path.abspath(\"../../common\") )\n", "import util\n", "\n", "%reload_ext autoreload\n", "import boto3\n", "import s3fs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Configure the S3 bucket name and region name for this lesson.\n", "\n", "- If you don't have an S3 bucket, create it first on S3. \n", "- Although we have set the region to us-west-2 as a default value below, you can choose any of the regions that the service is available in." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "region = 'us-west-2'\n", "bucket_name = 'forecast-demo-uci-electricity-jeetub'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Build Session and Clients for Amazon Forecast\n", "session = boto3.Session(region_name=region) \n", "forecast = session.client(service_name='forecast') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<b>Create IAM Role for Forecast</b> <br>\n", "Like many AWS services, Forecast will need to assume an IAM role in order to interact with your S3 resources securely. 
In the sample notebooks, we use the `get_or_create_iam_role()` utility function to create an IAM role. Please refer to \"notebooks/common/util/fcst_utils.py\" for the implementation." ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating Role ForecastNotebookRole-Basic ...\n", "The role ForecastNotebookRole-Basic exists, ignore to create it\n", "Done.\n", "Success! Created role arn = ForecastNotebookRole-Basic\n" ] } ], "source": [ "# Create the role to provide to Amazon Forecast.\n", "role_name = \"ForecastNotebookRole-Basic\"\n", "print(f\"Creating Role {role_name} ...\")\n", "role_arn = util.get_or_create_iam_role( role_name = role_name )\n", "\n", "# Print only the role name so the account ID in the ARN is not echoed\n", "print(f\"Success! Created role arn = {role_arn.split('/')[1]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last part of the setup process is to validate that your account can communicate with Amazon Forecast. The cell below does just that." ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Predictors': [],\n", " 'ResponseMetadata': {'RequestId': 'f0c87c34-3559-4c84-94cf-cbfcb41a7c49',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 21 Oct 2021 22:10:26 GMT',\n", " 'x-amzn-requestid': 'f0c87c34-3559-4c84-94cf-cbfcb41a7c49',\n", " 'content-length': '17',\n", " 'connection': 'keep-alive'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check that you can communicate with Amazon Forecast\n", "forecast.list_predictors()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Prepare the Datasets<a class=\"anchor\" id=\"DataPrep\"></a>\n", "\n", "For this exercise, we use the individual household electric power consumption dataset. (Dua, D. 
and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.) We aggregate the usage data hourly. \n", "\n", "To begin, use Pandas to read the CSV and to show a sample of the data." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>timestamp</th>\n", " <th>value</th>\n", " <th>item</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>2014-01-01 01:00:00</td>\n", " <td>38.34991708126038</td>\n", " <td>client_12</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2014-01-01 02:00:00</td>\n", " <td>33.5820895522388</td>\n", " <td>client_12</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2014-01-01 03:00:00</td>\n", " <td>34.41127694859037</td>\n", " <td>client_12</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " timestamp value item\n", "0 2014-01-01 01:00:00 38.34991708126038 client_12\n", "1 2014-01-01 02:00:00 33.5820895522388 client_12\n", "2 2014-01-01 03:00:00 34.41127694859037 client_12" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"../../common/data/item-demand-time.csv\", dtype = object, names=['timestamp','value','item'])\n", "df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice in the output above there are 3 columns of data:\n", "\n", "1. The Timestamp\n", "1. A Value\n", "1. 
An Item ID\n", "\n", "These are the 3 required pieces of information to generate a forecast with Amazon Forecast. More can be added, but these 3 must always be present.\n", "\n", "The dataset spans January 1, 2014 to December 31, 2014. We are only going to use January to October to train Amazon Forecast.\n", "\n", "You may notice a variable named `df`. This is a popular naming convention for a Pandas DataFrame object, which is similar to a table in a database. You can learn more here: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "min timestamp = 2014-01-01 01:00:00\n", "max timestamp = 2014-10-31 23:00:00\n" ] } ], "source": [ "# Select January to October for one dataframe.\n", "jan_to_oct = df[(df['timestamp'] >= '2014-01-01') & (df['timestamp'] < '2014-11-01')]\n", "\n", "print(f\"min timestamp = {jan_to_oct.timestamp.min()}\")\n", "print(f\"max timestamp = {jan_to_oct.timestamp.max()}\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# save an item_id for querying later\n", "item_id = \"client_12\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now export the training data to a CSV file and place it in your `data` folder." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Write columns in the same order as the schema defined in Step 3: timestamp, target_value, item_id\n", "jan_to_oct[[\"timestamp\", \"value\", \"item\"]].to_csv(\"data/item-demand-time-train.csv\", header=False, index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now export a second dataset to CSV, this time including November 1st. This extra day will be used to validate our forecast." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "validation = df[(df['timestamp'] >= '2014-01-01') & (df['timestamp'] < '2014-11-02')]\n", "validation[[\"timestamp\", \"value\", \"item\"]].to_csv(\"data/item-demand-time-validation.csv\", header=False, index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data is now ready to be sent to S3, where Forecast will use it later. The following cell uploads the training data to S3." ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [], "source": [ "key=\"elec_data/item-demand-time-train.csv\"\n", "\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_file(\"data/item-demand-time-train.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Create the Dataset Group and Dataset <a class=\"anchor\" id=\"DataSet\"></a>\n", "\n", "In Amazon Forecast, a dataset is a collection of one or more files that contain data relevant to a forecasting task. A dataset must conform to a schema provided by Amazon Forecast. Since data files are imported headerless, it is important to define a schema for your data.\n", "\n", "More details about `Domain` and dataset type can be found in the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html). For this example, we are using the [CUSTOM](https://docs.aws.amazon.com/forecast/latest/dg/custom-domain.html) domain with 3 required attributes: `timestamp`, `target_value` and `item_id`.\n", "\n", "\n", "Next, you need to make some choices. \n", "<ol>\n", " <li><b>How many time units do you want to forecast?</b> For example, if your time unit is Hour and you want to forecast out 1 week, that would be 24*7 = 168 hours, so answer = 168. </li>\n", " <li><b>What is the time granularity for your data?</b> For example, if your time unit is Hour, answer = \"H\".
</li>\n", " <li><b>Think of a name you want to give this project (Dataset Group name)</b>, so all files will have the same names. You should also use this same name for your Forecast DatasetGroup name, to set yourself up for reproducibility. </li>\n", " </ol>" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "# what is your forecast horizon in number time units you've selected?\n", "# e.g. if you're forecasting in months, how many months out do you want a forecast?\n", "FORECAST_LENGTH = 24\n", "\n", "# What is your forecast time unit granularity?\n", "# Choices are: ^Y|M|W|D|H|30min|15min|10min|5min|1min$ \n", "DATASET_FREQUENCY = \"H\"\n", "TIMESTAMP_FORMAT = \"yyyy-MM-dd hh:mm:ss\"\n", "\n", "# What name do you want to give this project? \n", "# We will use this same name for your Forecast Dataset Group name.\n", "PROJECT = 'util_power_demo'\n", "DATA_VERSION = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the Dataset Group\n", "\n", "In this task, we define a container name or Dataset Group name, which will be used to keep track of Dataset import files, schema, and all Forecast results which go together.\n" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset Group Name = util_power_demo_1\n" ] } ], "source": [ "dataset_group = f\"{PROJECT}_{DATA_VERSION}\"\n", "print(f\"Dataset Group Name = {dataset_group}\")" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "dataset_arns = []\n", "create_dataset_group_response = \\\n", " forecast.create_dataset_group(Domain=\"CUSTOM\",\n", " DatasetGroupName=dataset_group,\n", " DatasetArns=dataset_arns)" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [], "source": [ "dataset_group_arn = create_dataset_group_response['DatasetGroupArn']" ] }, { "cell_type": "code", 
"execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'DatasetGroupName': 'util_power_demo_1',\n", " 'DatasetGroupArn': 'arn:aws:forecast:us-west-2:730750055343:dataset-group/util_power_demo_1',\n", " 'DatasetArns': [],\n", " 'Domain': 'CUSTOM',\n", " 'Status': 'ACTIVE',\n", " 'CreationTime': datetime.datetime(2021, 10, 21, 15, 11, 39, 8000, tzinfo=tzlocal()),\n", " 'LastModificationTime': datetime.datetime(2021, 10, 21, 15, 11, 39, 8000, tzinfo=tzlocal()),\n", " 'ResponseMetadata': {'RequestId': 'c476e9e7-8761-4118-9d5e-5874ec63e000',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 21 Oct 2021 22:11:43 GMT',\n", " 'x-amzn-requestid': 'c476e9e7-8761-4118-9d5e-5874ec63e000',\n", " 'content-length': '257',\n", " 'connection': 'keep-alive'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the Schema" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [], "source": [ "# Specify the schema of your dataset here. 
Make sure the order of columns matches the raw data files.\n", "ts_schema ={\n", " \"Attributes\":[\n", " {\n", " \"AttributeName\":\"timestamp\",\n", " \"AttributeType\":\"timestamp\"\n", " },\n", " {\n", " \"AttributeName\":\"target_value\",\n", " \"AttributeType\":\"float\"\n", " },\n", " {\n", " \"AttributeName\":\"item_id\",\n", " \"AttributeType\":\"string\"\n", " }\n", " ]\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the Dataset" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "util_power_demo_1\n" ] } ], "source": [ "ts_dataset_name = f\"{PROJECT}_{DATA_VERSION}\"\n", "print(ts_dataset_name)" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [], "source": [ "response = \\\n", "forecast.create_dataset(Domain=\"CUSTOM\",\n", " DatasetType='TARGET_TIME_SERIES',\n", " DatasetName=ts_dataset_name,\n", " DataFrequency=DATASET_FREQUENCY,\n", " Schema=ts_schema\n", " )" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [], "source": [ "ts_dataset_arn = response['DatasetArn']" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'DatasetArn': 'arn:aws:forecast:us-west-2:730750055343:dataset/util_power_demo_1',\n", " 'DatasetName': 'util_power_demo_1',\n", " 'Domain': 'CUSTOM',\n", " 'DatasetType': 'TARGET_TIME_SERIES',\n", " 'DataFrequency': 'H',\n", " 'Schema': {'Attributes': [{'AttributeName': 'timestamp',\n", " 'AttributeType': 'timestamp'},\n", " {'AttributeName': 'target_value', 'AttributeType': 'float'},\n", " {'AttributeName': 'item_id', 'AttributeType': 'string'}]},\n", " 'EncryptionConfig': {},\n", " 'Status': 'ACTIVE',\n", " 'CreationTime': datetime.datetime(2021, 10, 21, 15, 11, 46, 887000, tzinfo=tzlocal()),\n", " 'LastModificationTime': datetime.datetime(2021, 10, 21, 15, 11, 46, 887000, tzinfo=tzlocal()),\n", " 
'ResponseMetadata': {'RequestId': '7baf7876-ec88-4c63-bce3-44c01ecce913',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 21 Oct 2021 22:11:54 GMT',\n", " 'x-amzn-requestid': '7baf7876-ec88-4c63-bce3-44c01ecce913',\n", " 'content-length': '495',\n", " 'connection': 'keep-alive'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecast.describe_dataset(DatasetArn=ts_dataset_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Update the dataset group with the dataset we created\n", "You can have multiple datasets under the same dataset group. Update the dataset group with the dataset we created above." ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'ResponseMetadata': {'RequestId': 'e7e74418-c60c-424f-95e2-61df025f9251',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 21 Oct 2021 22:12:02 GMT',\n", " 'x-amzn-requestid': 'e7e74418-c60c-424f-95e2-61df025f9251',\n", " 'content-length': '2',\n", " 'connection': 'keep-alive'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset_arns = []\n", "dataset_arns.append(ts_dataset_arn)\n", "forecast.update_dataset_group(DatasetGroupArn=dataset_group_arn, DatasetArns=dataset_arns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Create a Target Time Series Dataset Import Job <a class=\"anchor\" id=\"DataImport\"></a>\n", "\n", "\n", "Now that Forecast knows how to interpret the CSV we are providing, the next step is to import the data from S3 into Amazon Forecast."
] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "S3 URI for your data file = s3://forecast-demo-uci-electricity-jeetub/elec_data/item-demand-time-train.csv\n" ] } ], "source": [ "# Recall path to your data\n", "ts_s3_data_path = \"s3://\"+bucket_name+\"/\"+key\n", "print(f\"S3 URI for your data file = {ts_s3_data_path}\")" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [], "source": [ "ts_dataset_import_job_response = \\\n", " forecast.create_dataset_import_job(DatasetImportJobName=dataset_group,\n", " DatasetArn=ts_dataset_arn,\n", " DataSource= {\n", " \"S3Config\" : {\n", " \"Path\": ts_s3_data_path,\n", " \"RoleArn\": role_arn\n", " } \n", " },\n", " TimestampFormat=TIMESTAMP_FORMAT)" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'arn:aws:forecast:us-west-2:730750055343:dataset-import-job/util_power_demo_1/util_power_demo_1'" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts_dataset_import_job_arn=ts_dataset_import_job_response['DatasetImportJobArn']\n", "ts_dataset_import_job_arn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the status of the dataset import job. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on the data size, this typically takes 5 to 10 minutes."
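, "\n",
"\n",
"The `util.wait` helper called in the next cell hides the polling logic. The following is a minimal sketch of what such a loop might look like, assuming only that the describe call returns a dict with a `Status` key. The name `wait_for_status` and its parameters are hypothetical and are not part of `util` or boto3.\n",
"\n",
"```python\n",
"import time\n",
"\n",
"\n",
"def wait_for_status(describe_fn, poll_seconds=30, timeout_seconds=3600):\n",
"    # Hypothetical helper: poll describe_fn until the resource reaches a final state.\n",
"    # Returns True when the status is ACTIVE, False when it ends in FAILED or STOPPED.\n",
"    deadline = time.monotonic() + timeout_seconds\n",
"    while True:\n",
"        status = describe_fn()['Status']\n",
"        print(status)\n",
"        if status == 'ACTIVE':\n",
"            return True\n",
"        if status.endswith(('FAILED', 'STOPPED')):\n",
"            return False\n",
"        if time.monotonic() >= deadline:\n",
"            raise TimeoutError(f'Still {status} after {timeout_seconds} seconds')\n",
"        time.sleep(poll_seconds)\n",
"```\n",
"\n",
"With the variables from this notebook, it could be called as `wait_for_status(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))`."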
] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CREATE_PENDING ..\n", "CREATE_IN_PROGRESS .............\n", "ACTIVE \n" ] } ], "source": [ "status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))\n", "assert status" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'DatasetImportJobName': 'util_power_demo_1',\n", " 'DatasetImportJobArn': 'arn:aws:forecast:us-west-2:730750055343:dataset-import-job/util_power_demo_1/util_power_demo_1',\n", " 'DatasetArn': 'arn:aws:forecast:us-west-2:730750055343:dataset/util_power_demo_1',\n", " 'TimestampFormat': 'yyyy-MM-dd hh:mm:ss',\n", " 'UseGeolocationForTimeZone': False,\n", " 'DataSource': {'S3Config': {'Path': 's3://forecast-demo-uci-electricity-jeetub/elec_data/item-demand-time-train.csv',\n", " 'RoleArn': 'arn:aws:iam::730750055343:role/ForecastNotebookRole-Basic'}},\n", " 'FieldStatistics': {'item_id': {'Count': 21885,\n", " 'CountDistinct': 3,\n", " 'CountNull': 0,\n", " 'CountLong': 21885,\n", " 'CountDistinctLong': 3,\n", " 'CountNullLong': 0},\n", " 'target_value': {'Count': 21885,\n", " 'CountDistinct': 4635,\n", " 'CountNull': 0,\n", " 'CountNan': 0,\n", " 'Min': '0.0',\n", " 'Max': '209.99170812603649',\n", " 'Avg': 50.0947432986864,\n", " 'Stddev': 38.47197571594975,\n", " 'CountLong': 21885,\n", " 'CountDistinctLong': 4635,\n", " 'CountNullLong': 0,\n", " 'CountNanLong': 0},\n", " 'timestamp': {'Count': 21885,\n", " 'CountDistinct': 7295,\n", " 'CountNull': 0,\n", " 'Min': '2014-01-01T01:00:00Z',\n", " 'Max': '2014-10-31T23:00:00Z',\n", " 'CountLong': 21885,\n", " 'CountDistinctLong': 7295,\n", " 'CountNullLong': 0}},\n", " 'DataSize': 0.0009746281430125237,\n", " 'Status': 'ACTIVE',\n", " 'CreationTime': datetime.datetime(2021, 10, 21, 15, 12, 6, 496000, tzinfo=tzlocal()),\n", " 'LastModificationTime': 
datetime.datetime(2021, 10, 21, 15, 15, 2, 189000, tzinfo=tzlocal()),\n", " 'ResponseMetadata': {'RequestId': 'e45703ac-d2f6-40ee-bf57-e7abc1aa490c',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 21 Oct 2021 22:15:02 GMT',\n", " 'x-amzn-requestid': 'e45703ac-d2f6-40ee-bf57-e7abc1aa490c',\n", " 'content-length': '1190',\n", " 'connection': 'keep-alive'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next Steps<a class=\"anchor\" id=\"nextSteps\"></a>\n", "\n", "At this point you have successfully imported your data into Amazon Forecast, and it is time to move on to the next notebook to build your first model. To continue, execute the cell below to store important variables where they can be used in the next notebook, then open `2.Building_Your_Predictor.ipynb`."
] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Stored 'item_id' (str)\n", "Stored 'PROJECT' (str)\n", "Stored 'DATA_VERSION' (int)\n", "Stored 'FORECAST_LENGTH' (int)\n", "Stored 'DATASET_FREQUENCY' (str)\n", "Stored 'TIMESTAMP_FORMAT' (str)\n", "Stored 'ts_dataset_import_job_arn' (str)\n", "Stored 'ts_dataset_arn' (str)\n", "Stored 'dataset_group_arn' (str)\n", "Stored 'role_arn' (str)\n", "Stored 'bucket_name' (str)\n", "Stored 'region' (str)\n", "Stored 'key' (str)\n" ] } ], "source": [ "# Now save your choices for the next notebook \n", "%store item_id\n", "%store PROJECT\n", "%store DATA_VERSION\n", "%store FORECAST_LENGTH\n", "%store DATASET_FREQUENCY\n", "%store TIMESTAMP_FORMAT\n", "%store ts_dataset_import_job_arn\n", "%store ts_dataset_arn\n", "%store dataset_group_arn\n", "%store role_arn\n", "%store bucket_name\n", "%store region\n", "%store key" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional Topics<a class=\"anchor\" id=\"additionalTopics\"></a>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stop the data import\n", "\n", "During development, you may accidentally start a data import before you're ready. 
If you don't want to wait for the data upload and processing to finish, there is a handy \"Stop API\" call.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# StopResource\n", "stop_ts_dataset_import_job_arn = forecast.stop_resource(ResourceArn=ts_dataset_import_job_arn)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'ResponseMetadata': {'RequestId': '0b91b92b-e435-49aa-9d11-267f3d91ae8f',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 21 Oct 2021 22:05:36 GMT',\n", " 'x-amzn-requestid': '0b91b92b-e435-49aa-9d11-267f3d91ae8f',\n", " 'content-length': '0',\n", " 'connection': 'keep-alive'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete only the target time series dataset import job\n", "# util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))\n", "\n", "# Delete the dataset together with its child resources (such as import jobs)\n", "forecast.delete_resource_tree(ResourceArn=ts_dataset_arn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 4 }