{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How to use Amazon Forecast\n", "\n", "This notebook helps advanced users get started with Amazon Forecast quickly. The demo runs through a typical end-to-end use case for a simple time series forecasting scenario. \n", "\n", "Prerequisites: \n", "[AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/installing.html). \n", "\n", "For more information about the APIs, please check the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html)\n", "\n", "## Table Of Contents\n", "* [Setting up](#setup)\n", "* [Test Setup - Running first API](#hello)\n", "* [Forecasting Example with Amazon Forecast](#forecastingExample)\n", "\n", "**Read Every Cell FULLY before executing it**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup <a class=\"anchor\" id=\"setup\"></a>" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "import time\n", "\n", "import boto3\n", "\n", "# Import the forecast notebook utilities from the notebooks/common directory\n", "sys.path.insert( 0, os.path.abspath(\"../../common\") )\n", "import util" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Configure the S3 bucket name and region name for this lesson.\n", "\n", "- If you don't have an S3 bucket, create one first on S3.\n", "- Although we have set the region to us-west-2 as a default value below, you can choose any of the regions that the service is available in." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text_widget_bucket = util.create_text_widget( \"bucketName\", \"input your S3 bucket name\" )\n", "text_widget_region = util.create_text_widget( \"region\", \"input region name.\", default_value=\"us-west-2\" )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bucketName = text_widget_bucket.value\n", "assert bucketName, \"bucket_name not set.\"\n", "\n", "region = text_widget_region.value\n", "assert region, \"region not set.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "session = boto3.Session(region_name=region) \n", "\n", "forecast = session.client(service_name='forecast') \n", "forecastquery = session.client(service_name='forecastquery')" ] },
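{ "cell_type": "markdown", "metadata": {}, "source": [ "## Test Setup - Running first API <a class=\"anchor\" id=\"hello\"></a>\n", "\n", "As a quick sanity check that the clients above are configured correctly, run one simple read-only API call. This is a minimal sketch; `list_dataset_groups` just lists any existing dataset groups in the region, so an empty list is expected if you have not used Amazon Forecast here before." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A read-only call that verifies credentials, region and service access.\n", "forecast.list_dataset_groups()" ] },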
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Forecasting with Amazon Forecast<a class=\"anchor\" id=\"forecastingExample\"></a>\n", "### Preparing your Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Amazon Forecast, a dataset is a collection of file(s) which contain data that is relevant for a forecasting task. A dataset must conform to a schema provided by Amazon Forecast. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this exercise, we use the individual household electric power consumption dataset. (Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.) We aggregate the usage data hourly. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Amazon Forecast can import data from Amazon S3. We first explore the data locally to see the fields." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "df = pd.read_csv(\"../../common/data/item-demand-time.csv\", dtype = object)\n", "df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now upload the data to S3. Before doing that, go into your AWS Console, select S3 for the service and create a new bucket inside the `Oregon` or `us-west-2` region. Use a bucket name following the convention `amazon-forecast-unique-value-data`. The name must be globally unique; if you get an error, just adjust it until the name works, then update the `bucketName` value you set earlier." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3 = session.client('s3')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "key=\"elec_data/item-demand-time.csv\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3.upload_file(Filename=\"../../common/data/item-demand-time.csv\", Bucket=bucketName, Key=key)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create the IAM role to provide to Amazon Forecast.\n", "role_name = \"ForecastNotebookRole-AutoML\"\n", "role_arn = util.get_or_create_iam_role( role_name = role_name )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### CreateDataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More details about `Domain` and dataset types can be found in the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html). For this example, we are using the [CUSTOM](https://docs.aws.amazon.com/forecast/latest/dg/custom-domain.html) domain with its 3 required attributes: `timestamp`, `target_value` and `item_id`. Also, update the project name below to a unique value in lowercase format." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATASET_FREQUENCY = \"H\" \n", "TIMESTAMP_FORMAT = \"yyyy-MM-dd hh:mm:ss\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "project = 'workshop_forecastdemo_1' # Replace this with a unique name; make sure the entire name is < 30 characters.\n", "datasetName= project+'_ds'\n", "datasetGroupName= project +'_gp'\n", "s3DataPath = \"s3://\"+bucketName+\"/\"+key" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datasetName" ] },
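{ "cell_type": "markdown", "metadata": {}, "source": [ "Before declaring the schema, it can be worth confirming that the raw file actually matches what we are about to declare. The next cell is a minimal, optional sanity check; it assumes the sample CSV has no header row and three columns in the order timestamp, target value, item id, with timestamps matching `TIMESTAMP_FORMAT` above (`%Y-%m-%d %H:%M:%S` in pandas terms)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sanity check on the raw file (assumes no header row).\n", "check_df = pd.read_csv(\"../../common/data/item-demand-time.csv\", header=None)\n", "assert check_df.shape[1] == 3, \"expected 3 columns: timestamp, target_value, item_id\"\n", "# Raises if any timestamp does not match the declared format.\n", "pd.to_datetime(check_df[0], format=\"%Y-%m-%d %H:%M:%S\")\n", "check_df.head(3)" ] },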
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Schema Definition\n", "Here we define the attributes for the model. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Specify the schema of your dataset here. Make sure the order of columns matches the raw data files.\n", "schema ={\n", " \"Attributes\":[\n", " {\n", " \"AttributeName\":\"timestamp\",\n", " \"AttributeType\":\"timestamp\"\n", " },\n", " {\n", " \"AttributeName\":\"target_value\",\n", " \"AttributeType\":\"float\"\n", " },\n", " {\n", " \"AttributeName\":\"item_id\",\n", " \"AttributeType\":\"string\"\n", " }\n", " ]\n", "}\n", "\n", "response=forecast.create_dataset(\n", " Domain=\"CUSTOM\",\n", " DatasetType='TARGET_TIME_SERIES',\n", " DatasetName=datasetName,\n", " DataFrequency=DATASET_FREQUENCY, \n", " Schema = schema\n", " )\n", "datasetArn = response['DatasetArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_dataset_group_response = forecast.create_dataset_group(DatasetGroupName=datasetGroupName,\n", " Domain=\"CUSTOM\",\n", " DatasetArns= [datasetArn]\n", " )\n", "datasetGroupArn = create_dataset_group_response['DatasetGroupArn']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have an existing dataset group, you can update it using **update_dataset_group**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_dataset_group(DatasetGroupArn=datasetGroupArn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create Data Import Job\n", "The import job brings the raw data into the Amazon Forecast system, ready for forecasting. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datasetImportJobName = 'EP_AML_DSIMPORT_JOB_TARGET'\n", "ds_import_job_response=forecast.create_dataset_import_job(DatasetImportJobName=datasetImportJobName,\n", " DatasetArn=datasetArn,\n", " DataSource= {\n", " \"S3Config\" : {\n", " \"Path\":s3DataPath,\n", " \"RoleArn\": role_arn\n", " } \n", " },\n", " TimestampFormat=TIMESTAMP_FORMAT\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_import_job_arn=ds_import_job_response['DatasetImportJobArn']\n", "print(ds_import_job_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the status of the dataset import job. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on the data size, this process typically takes 5 to 10 minutes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status_indicator = util.StatusIndicator()\n", "\n", "while True:\n", " status = forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn)['Status']\n", " status_indicator.update(status)\n", " if status in ('ACTIVE', 'CREATE_FAILED'): break\n", " time.sleep(10)\n", "\n", "status_indicator.end()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create Predictor with custom forecast horizon" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The forecast horizon is the number of time points to be predicted in the future. For weekly data, a value of 12 means 12 weeks. Our example uses hourly data and we want to forecast the next day, so we set the horizon to 24." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we are not sure which recipe will perform best, we can utilise the Auto ML option that the SDK offers." ] },
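{ "cell_type": "markdown", "metadata": {}, "source": [ "For reference, the manual alternative to Auto ML is to set `PerformAutoML=False` and pass an explicit `AlgorithmArn` to `create_predictor`. The cell below is a commented-out, minimal sketch of that variant; the predictor name and the trimmed-down `FeaturizationConfig` are illustrative only. This notebook itself uses the Auto ML call in the cells that follow." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch only (not executed): pick the ETS algorithm manually instead of Auto ML.\n", "# manual_predictor_response = forecast.create_predictor(\n", "#     PredictorName=project + '_ets',\n", "#     AlgorithmArn='arn:aws:forecast:::algorithm/ETS',\n", "#     PerformAutoML=False,\n", "#     PerformHPO=False,\n", "#     ForecastHorizon=24,\n", "#     InputDataConfig={\"DatasetGroupArn\": datasetGroupArn},\n", "#     FeaturizationConfig={\"ForecastFrequency\": \"H\"})" ] },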
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictorName = project+'_autoML'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecastHorizon = 24" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The ETS algorithm ARN; only needed when selecting an algorithm manually, as Auto ML ignores it.\n", "algorithmArn = 'arn:aws:forecast:::algorithm/ETS'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_predictor_response=forecast.create_predictor(PredictorName=predictorName, \n", " ForecastHorizon=forecastHorizon,\n", " PerformAutoML=True,\n", " PerformHPO=False,\n", " EvaluationParameters= {\"NumberOfBacktestWindows\": 1, \n", " \"BackTestWindowOffset\": 24}, \n", " InputDataConfig= {\"DatasetGroupArn\": datasetGroupArn},\n", " FeaturizationConfig= {\"ForecastFrequency\": \"H\", \n", " \"Featurizations\": \n", " [\n", " {\"AttributeName\": \"target_value\", \n", " \"FeaturizationPipeline\": \n", " [\n", " {\"FeaturizationMethodName\": \"filling\", \n", " \"FeaturizationMethodParameters\": \n", " {\"frontfill\": \"none\", \n", " \"middlefill\": \"zero\", \n", " \"backfill\": \"zero\"}\n", " }\n", " ]\n", " }\n", " ]\n", " }\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictorArn=create_predictor_response['PredictorArn']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the status of the predictor. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection and hyperparameters, it can take from 10 minutes to more than one hour to become **ACTIVE**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status_indicator = util.StatusIndicator()\n", "\n", "while True:\n", " status = forecast.describe_predictor(PredictorArn=predictorArn)['Status']\n", " status_indicator.update(status)\n", " if status in ('ACTIVE', 'CREATE_FAILED'): break\n", " time.sleep(10)\n", "\n", "status_indicator.end()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get Error Metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get the accuracy metrics of the predictor we just created using Auto ML. The response will be a dictionary with all available recipes. Auto ML works out the best one for our predictor." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.get_accuracy_metrics(PredictorArn=predictorArn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create Forecast" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now create a forecast using the model that was trained." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecastName= project+'_aml_forecast'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_forecast_response=forecast.create_forecast(ForecastName=forecastName,\n", " PredictorArn=predictorArn)\n", "forecastArn = create_forecast_response['ForecastArn']" ] },
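{ "cell_type": "markdown", "metadata": {}, "source": [ "By default, a forecast is generated at the 0.1, 0.5 and 0.9 quantiles. If you need different quantiles, `create_forecast` also accepts a `ForecastTypes` parameter; the cell below is a commented-out sketch of that option, with an illustrative forecast name." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch only (not executed): request specific quantiles for the forecast.\n", "# custom_forecast_response = forecast.create_forecast(\n", "#     ForecastName=project + '_aml_forecast_custom',\n", "#     PredictorArn=predictorArn,\n", "#     ForecastTypes=[\"0.5\", \"0.9\"])" ] },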
{ "cell_type": "markdown", "metadata": {}, "source": [ "Check the status of the forecast process. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection and hyperparameters, it can take from 10 minutes to more than one hour to become **ACTIVE**. There is no output here while the next cell runs, but that is fine as long as the `*` (the running indicator) is shown." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status_indicator = util.StatusIndicator()\n", "\n", "while True:\n", " status = forecast.describe_forecast(ForecastArn=forecastArn)['Status']\n", " status_indicator.update(status)\n", " if status in ('ACTIVE', 'CREATE_FAILED'): break\n", " time.sleep(10)\n", "\n", "status_indicator.end()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get Forecast" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once created, the forecast results are ready and you can view them. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecastResponse = forecastquery.query_forecast(\n", " ForecastArn=forecastArn,\n", " Filters={\"item_id\":\"client_12\"}\n", ")\n", "print(forecastResponse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Export Forecast" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can export the forecast to your S3 bucket. To do so, a role with S3 put access is needed, but this has already been created earlier in the notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecastExportName= project+'_aml_forecast_export'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "outputPath=\"s3://\"+bucketName+\"/output\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast_export_response = forecast.create_forecast_export_job(\n", " ForecastExportJobName = forecastExportName,\n", " ForecastArn=forecastArn, \n", " Destination = {\n", " \"S3Config\" : {\n", " \"Path\":outputPath,\n", " \"RoleArn\": role_arn\n", " } \n", " }\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecastExportJobArn = forecast_export_response['ForecastExportJobArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status_indicator = util.StatusIndicator()\n", "\n", "while True:\n", " status = forecast.describe_forecast_export_job(ForecastExportJobArn=forecastExportJobArn)['Status']\n", " status_indicator.update(status)\n", " if status in ('ACTIVE', 'CREATE_FAILED'): break\n", " time.sleep(10)\n", "\n", "status_indicator.end()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the S3 bucket for the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3.list_objects(Bucket=bucketName,Prefix=\"output\")" ] },
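{ "cell_type": "markdown", "metadata": {}, "source": [ "If you want to inspect the exported forecast locally, the cell below is a minimal sketch: it downloads the first exported CSV part and previews it with pandas. It assumes the export job above produced at least one `.csv` object under the `output` prefix; the local file name is illustrative only." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: download the first exported CSV part and preview it.\n", "objs = s3.list_objects(Bucket=bucketName, Prefix=\"output\").get(\"Contents\", [])\n", "csv_keys = [o[\"Key\"] for o in objs if o[\"Key\"].endswith(\".csv\")]\n", "s3.download_file(Bucket=bucketName, Key=csv_keys[0], Filename=\"forecast_export_part.csv\")\n", "pd.read_csv(\"forecast_export_part.csv\").head()" ] },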
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Cleanup\n", "\n", "Once we have completed the above steps, we can clean up the resources we created. All the created resources can be deleted with `delete_resource_tree`; since it is an asynchronous operation, we have added the helpful `wait_till_delete` function. To learn more about deleting a parent resource and all its child resources, see the [DeleteResourceTree](https://docs.aws.amazon.com/forecast/latest/dg/API_DeleteResourceTree.html) API. \n", "Resource limits are documented <a href=\"https://docs.aws.amazon.com/forecast/latest/dg/limits.html\">here</a>." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete the DatasetGroup and all its child resources such as Predictor, PredictorBacktestExportJob, Forecast and ForecastExportJob\n", "util.wait_till_delete(lambda: forecast.delete_resource_tree(ResourceArn=datasetGroupArn))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete the Dataset and its child DatasetImportJob resources:\n", "util.wait_till_delete(lambda: forecast.delete_resource_tree(ResourceArn=datasetArn))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete the IAM role\n", "util.delete_iam_role( role_name )" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "toc": { "collapse_to_match_collapsible_headings": false, "colors": { "hover_highlight": "#DAA520", "navigate_num": "#000000", "navigate_text": "#333333", "running_highlight": "#FF0000", "selected_highlight": "#FFD700", "sidebar_border": "#EEEEEE", "wrapper_background": "#FFFFFF" }, "moveMenuLeft": true, "nav_menu": { "height": "253px", "width": "254px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": false, "threshold": 4, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 4 }