{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0534387a",
   "metadata": {},
   "source": [
    "# Now you can train on a global dataset and produce forecasts on selected items\n",
    "\n",
    "We are excited to announce that [Amazon Forecast](https://aws.amazon.com/forecast/) now offers a new feature that allows customers to generate forecasted data points on a selected subset of time-series items.  With this release, customers can import a global set of data, train a predictor (model) on the global set, yet generate forecasted data points (inference) surgically on a subset of time-series items, as needed, using a new \"TimeSeriesSelector\" available in the [CreateForecast API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/forecast.html#ForecastService.Client.create_forecast).\n",
    "\n",
    "This notebook follows patterns seen in other Amazon Forecast notebooks; however, the key differentiator in this example is in Step 4 and Step 5, where two CreateForecast jobs are executed, driven by item_id values inside a CSV input file, resident on S3.  These two small forecasts are then exported to S3 where their content is shown as part of the notebook example.\n",
    "\n",
    "The provided notebook is saved in an executed state, so you may review outputs without having to run each cell, unless you choose to do so.\n",
    "\n",
    "For this exercise, yellow taxi trip records are used from [NYC Taxi and Limousine Commission (TLC)](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34b1bdd8",
   "metadata": {},
   "source": [
    "\n",
    "## Table of Contents\n",
    "* [Initial Setup](#setup)\n",
    "* Step 1: [Upload sample data to S3](#upload)\n",
    "* Step 2: [Create a dataset, import data and dataset group](#dataset)\n",
    "* Step 3: [Create predictor](#predictors)\n",
    "* Step 4: [Create forecasts for subset of items](#forecasts)\n",
    "* Step 5: [Export and view forecasted data](#view)\n",
    "* Step 6: [Cleanup](#cleanup)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01ef6adb",
   "metadata": {},
   "source": [
    "# Initial Setup<a class=\"anchor\" id=\"setup\"></a>\n",
    "\n",
    "### Upgrade boto3\n",
    "\n",
    "Before proceeding, ensure you have upgraded boto3.\n",
    "\n",
    "Important: To use the feature highlighted in this notebook, you will need boto3 1.24.1 or greater.  In order to receive this SDK update, you will need to use Python 3.7 or later as per [this blog](https://aws.amazon.com/blogs/developer/python-support-policy-updates-for-aws-sdks-and-tools/).  As mentioned, Python 3.6 ended on May 30, 2022."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0f433461",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: boto3 in /home/ec2-user/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages (1.24.1)\n",
      "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /home/ec2-user/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages (from boto3) (0.10.0)\n",
      "Requirement already satisfied: botocore<1.28.0,>=1.27.1 in /home/ec2-user/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages (from boto3) (1.27.1)\n",
      "Requirement already satisfied: s3transfer<0.7.0,>=0.6.0 in /home/ec2-user/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages (from boto3) (0.6.0)\n",
      "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages (from botocore<1.28.0,>=1.27.1->boto3) (2.8.1)\n",
      "Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages (from botocore<1.28.0,>=1.27.1->boto3) (1.26.8)\n",
      "Requirement already satisfied: six>=1.5 in /home/ec2-user/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.28.0,>=1.27.1->boto3) (1.15.0)\n",
      "\u001b[33mWARNING: You are using pip version 21.3.1; however, version 22.1.2 is available.\n",
      "You should consider upgrading via the '/home/ec2-user/anaconda3/envs/mxnet_latest_p37/bin/python -m pip install --upgrade pip' command.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!pip install boto3 --upgrade"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d34ac5b3",
   "metadata": {},
   "source": [
    "### Setup Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c3229377",
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "from time import sleep\n",
    "import subprocess\n",
    "import sys\n",
    "import os\n",
    "import pandas as pd\n",
    "import calendar\n",
    "\n",
    "sys.path.insert( 0, os.path.abspath(\"../../common\") )\n",
    "\n",
    "import json\n",
    "import util"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff6b72f3",
   "metadata": {},
   "source": [
    "### Function to supressing printing account numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c85e1bd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def mask_arn(input_string):\n",
    "\n",
    "    mask_regex = re.compile(':[0-9]{12}:')\n",
    "    mask = mask_regex.search(input_string)\n",
    "        \n",
    "    while mask:\n",
    "        input_string = input_string.replace(mask.group(),'X'*12)\n",
    "        mask = mask_regex.search(input_string) \n",
    "        \n",
    "    return input_string"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "901c3845",
   "metadata": {},
   "source": [
    "### Create an instance of AWS SDK client for Amazon Forecast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "46e554d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set your region accordingly, us-east-1 as shown\n",
    "region = 'us-east-1'\n",
    "session = boto3.Session(region_name=region) \n",
    "forecast = session.client(service_name='forecast')\n",
    "\n",
    "# Checking to make sure we can communicate with Amazon Forecast\n",
    "assert forecast.list_forecasts()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6730c18",
   "metadata": {},
   "source": [
    "### Setup IAM Role used by Amazon Forecast to access your data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "034faa43",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating Role ForecastNotebookRole-Basic...\n",
      "The role ForecastNotebookRole-Basic already exists, skipping creation\n",
      "Done.\n",
      "Success! Created role = ForecastNotebookRole-Basic\n"
     ]
    }
   ],
   "source": [
    "role_name = \"ForecastNotebookRole-Basic\"\n",
    "print(f\"Creating Role {mask_arn(role_name)}...\")\n",
    "role_arn = util.get_or_create_iam_role( role_name = role_name )\n",
    "\n",
    "# echo user inputs without account\n",
    "print(f\"Success! Created role = {mask_arn(role_arn).split('/')[1]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f7e0c57",
   "metadata": {},
   "source": [
    "# Step 1: Upload sample data to S3<a class=\"anchor\" id=\"upload\"></a>\n",
    "\n",
    "The dataset has the following 3 columns:\n",
    "- timestamp: Timetamp at which pick-ups are requested.\n",
    "- item_id: Pick-up location ID.\n",
    "- target_value: Number of pick-ups requested around the timestamp at the pick-up location.\n",
    "\n",
    "In addition, two small files are uploaded called subset1 and subset2.  These contain a small list of three locations each and will be used to drive targeted forecast generation.\n",
    "\n",
    "Note:  As delivered, this uses the sample file in the data folder relative to this notebook.  Please take care to ensure this file is available to your notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "299f9cba",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Enter S3 bucket name for uploading the data and hit ENTER key:forecast-us-east-1-XXXXXXXXXXXX\n"
     ]
    }
   ],
   "source": [
    "bucket_name =  input(\"\\nEnter S3 bucket name for uploading the data and hit ENTER key:\")\n",
    "\n",
    "s3 = boto3.resource('s3')\n",
    "s3.meta.client.upload_file('./data/taxi-dec2017-jan2019.csv', bucket_name, 'taxi_dec2017_jan2019.csv')\n",
    "s3.meta.client.upload_file('./data/subset1.csv', bucket_name, 'subset1.csv')\n",
    "s3.meta.client.upload_file('./data/subset2.csv', bucket_name, 'subset2.csv')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb676775",
   "metadata": {},
   "source": [
    "# Step 2: Create a dataset, import data and dataset group<a class=\"anchor\" id=\"dataset\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a39b2ce",
   "metadata": {},
   "source": [
    "### Create Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0f096ece",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXdataset/TAXI_TIME_FORECAST_SUBSET_DEMO is now ACTIVE.\n"
     ]
    }
   ],
   "source": [
    "DATASET_FREQUENCY = \"H\"\n",
    "TS_DATASET_NAME = \"TAXI_TIME_FORECAST_SUBSET_DEMO\"\n",
    "TS_SCHEMA = {\n",
    "   \"Attributes\":[\n",
    "      {\n",
    "         \"AttributeName\":\"timestamp\",\n",
    "         \"AttributeType\":\"timestamp\"\n",
    "      },\n",
    "      {\n",
    "         \"AttributeName\":\"item_id\",\n",
    "         \"AttributeType\":\"string\"\n",
    "      },\n",
    "      {\n",
    "         \"AttributeName\":\"target_value\",\n",
    "         \"AttributeType\":\"integer\"\n",
    "      }\n",
    "   ]\n",
    "}   \n",
    "\n",
    "create_dataset_response = forecast.create_dataset(Domain=\"CUSTOM\",\n",
    "                                                  DatasetType='TARGET_TIME_SERIES',\n",
    "                                                  DatasetName=TS_DATASET_NAME,\n",
    "                                                  DataFrequency=DATASET_FREQUENCY,\n",
    "                                                  Schema=TS_SCHEMA)\n",
    "\n",
    "ts_dataset_arn = create_dataset_response['DatasetArn']\n",
    "describe_dataset_response = forecast.describe_dataset(DatasetArn=ts_dataset_arn)\n",
    "\n",
    "print(f\"Dataset ARN {mask_arn(ts_dataset_arn)} is now {describe_dataset_response['Status']}.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d009acce",
   "metadata": {},
   "source": [
    "### Import the related time series\n",
    "This TTS files contains data for more than 200 item_id values (locations); later we will forecast a total of 6 locations in two Forecast jobs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "ae314cc6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Waiting for Dataset Import Job with ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXdataset-import-job/TAXI_TIME_FORECAST_SUBSET_DEMO/taxi_dec2017_jan2019 to become ACTIVE.\n",
      "CREATE_PENDING \n",
      "CREATE_IN_PROGRESS .........\n",
      "ACTIVE \n"
     ]
    }
   ],
   "source": [
    "TS_IMPORT_JOB_NAME = 'taxi_dec2017_jan2019'\n",
    "TIMESTAMP_FORMAT = \"yyyy-MM-dd hh:mm:ss\"\n",
    "ts_s3_path = f\"s3://{bucket_name}/{TS_IMPORT_JOB_NAME}.csv\"\n",
    "TIMEZONE = \"EST\"\n",
    "\n",
    "#frequency of poll event from API to check status of tasks\n",
    "sleep_duration=60\n",
    "\n",
    "ts_dataset_import_job_response = \\\n",
    "    forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,\n",
    "                                       DatasetArn=ts_dataset_arn,\n",
    "                                       DataSource= {\n",
    "                                         \"S3Config\" : {\n",
    "                                             \"Path\": ts_s3_path,\n",
    "                                             \"RoleArn\": role_arn\n",
    "                                         } \n",
    "                                       },\n",
    "                                       TimestampFormat=TIMESTAMP_FORMAT,\n",
    "                                       TimeZone = TIMEZONE)\n",
    "\n",
    "ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']\n",
    "\n",
    "print(f\"Waiting for Dataset Import Job with ARN {mask_arn(ts_dataset_import_job_arn)} to become ACTIVE.\")\n",
    "\n",
    "status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn), sleep_duration)\n",
    "      "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7d09d7e",
   "metadata": {},
   "source": [
    "### Create a dataset group"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "e0b61893",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The DatasetGroup with ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXdataset-group/TAXI_TIME_FORECAST_SUBSET_DEMO is now ACTIVE.\n"
     ]
    }
   ],
   "source": [
    "DATASET_GROUP_NAME = \"TAXI_TIME_FORECAST_SUBSET_DEMO\"\n",
    "DATASET_ARNS = [ts_dataset_arn]\n",
    "\n",
    "create_dataset_group_response = \\\n",
    "    forecast.create_dataset_group(Domain=\"CUSTOM\",\n",
    "                                  DatasetGroupName=DATASET_GROUP_NAME,\n",
    "                                  DatasetArns=DATASET_ARNS)\n",
    "\n",
    "dataset_group_arn = create_dataset_group_response['DatasetGroupArn']\n",
    "describe_dataset_group_response = forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)\n",
    "\n",
    "print(f\"The DatasetGroup with ARN {mask_arn(dataset_group_arn)} is now {describe_dataset_group_response['Status']}.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c45c8d5c",
   "metadata": {},
   "source": [
    "# Step 3: Create predictor<a class=\"anchor\" id=\"predictors\"></a>\n",
    "\n",
    "This step creates a daily predictor with a short, three-day horizon and minimal features for this demo.  The imported dataset, as delivered, is more than 260 unique item_ids which represents demand at clustered locations.  Later in this notebook, you will see that forecasts are generated only for the locations desired.   The benefit here is a model is trained on a global data set and forecasts can be targeted and produced as demanded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "7ebe9f7c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Waiting for Predictor ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXpredictor/TAXI_TIME_FORECAST_SUBSET_DEMO_01G4GJTP6HNPNVRQ0QVVTPR0T1 to become ACTIVE.\n",
      "CREATE_PENDING \n",
      "CREATE_IN_PROGRESS ..................................................\n",
      "ACTIVE \n"
     ]
    }
   ],
   "source": [
    "FORECAST_HORIZON = 3\n",
    "FORECAST_FREQUENCY = \"D\"\n",
    "\n",
    "create_auto_predictor_response = \\\n",
    "    forecast.create_auto_predictor(PredictorName = 'TAXI_TIME_FORECAST_SUBSET_DEMO',\n",
    "                                   ForecastHorizon = FORECAST_HORIZON,\n",
    "                                   ForecastFrequency = FORECAST_FREQUENCY,\n",
    "                                   DataConfig = {\n",
    "                                       'DatasetGroupArn': dataset_group_arn\n",
    "                                    },\n",
    "                                   ExplainPredictor = False)\n",
    "\n",
    "predictor_arn = create_auto_predictor_response['PredictorArn']\n",
    "print(f\"Waiting for Predictor ARN {mask_arn(predictor_arn)} to become ACTIVE.\")\n",
    "\n",
    "#Wait on the predictors to complete and become active\n",
    "status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn), sleep_duration)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a121127",
   "metadata": {},
   "source": [
    "# Step 4: Create forecasts for subset of items<a class=\"anchor\" id=\"forecasts\"></a>\n",
    "\n",
    "This step is the primary emphasis of this notebook.  CSV file subset1.csv contains a list of item_id (location) 101, 102, 103.  Subset2 is similar in concept, yet it contains another set of item_id values.\n",
    "\n",
    "From the local file system, let's view the subset1.csv.  The same file was uploaded to S3 earlier in this notebook, so it can be read by Amazon Forecast.  This is the file that is driving the TimeSeriesIdentifier data source in the actual Create Forecast API.   If your use case has multiple dimensions, your input file should reflect them as a multi-valued delimited record.  In addition, the schema attribute definition in the below API call should match your physical file structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "b85ebae7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "101\r\n",
      "102\r\n",
      "103\r\n"
     ]
    }
   ],
   "source": [
    "!cat ./data/subset1.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5dff801",
   "metadata": {},
   "source": [
    "The API calls below produce two forecasts, one for each \"needed\" subset of time-series, keeping inference to a minimum.  Without the subset selection in place, the create_forecast API will produce inference for every time-series in the dataset used to train the predictor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "1194a71a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Waiting for Subset 1 Forecast with ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXforecast/TAXI_TIME_FORECAST_SUBSET_1 to become ACTIVE.\n",
      "Waiting for Subset 2 Forecast with ARN arn:aws:forecast:us-east-1XXXXXXXXXXXXforecast/TAXI_TIME_FORECAST_SUBSET_2 to become ACTIVE.\n",
      "CREATE_PENDING \n",
      "CREATE_IN_PROGRESS ....................\n",
      "ACTIVE \n",
      "ACTIVE \n"
     ]
    }
   ],
   "source": [
    "#locations 101, 102, 103\n",
    "subset1_path = f\"s3://{bucket_name}/subset1.csv\"\n",
    "\n",
    "create_forecast_response = \\\n",
    "    forecast.create_forecast(ForecastName= 'TAXI_TIME_FORECAST_SUBSET_1',\n",
    "                             ForecastTypes= [\"0.5\"],\n",
    "                             PredictorArn= predictor_arn,\n",
    "                             TimeSeriesSelector= {\n",
    "                                \"TimeSeriesIdentifiers\": {\n",
    "                                   \"DataSource\": {\n",
    "                                      \"S3Config\": {\n",
    "                                         \"Path\": subset1_path,\n",
    "                                         \"RoleArn\": role_arn\n",
    "                                      } \n",
    "                                   },\n",
    "                                   \"Format\": 'CSV',\n",
    "                                   \"Schema\": {\n",
    "                                      \"Attributes\": [\n",
    "                                         {\n",
    "                                            \"AttributeName\": \"item_id\",\n",
    "                                            \"AttributeType\": \"string\"\n",
    "                                         }\n",
    "                                      ]\n",
    "                                   }\n",
    "                                }\n",
    "                            }\n",
    "                            )\n",
    "\n",
    "subset1_forecast_arn = create_forecast_response['ForecastArn']\n",
    "\n",
    "\n",
    "#locations 201, 202, 203\n",
    "\n",
    "subset2_path = f\"s3://{bucket_name}/subset2.csv\"\n",
    "\n",
    "create_forecast_response = \\\n",
    "    forecast.create_forecast(ForecastName= 'TAXI_TIME_FORECAST_SUBSET_2',\n",
    "                             ForecastTypes= [\"0.5\"],\n",
    "                             PredictorArn= predictor_arn,\n",
    "                             TimeSeriesSelector= {\n",
    "                                \"TimeSeriesIdentifiers\": {\n",
    "                                   \"DataSource\": {\n",
    "                                      \"S3Config\": {\n",
    "                                         \"Path\": subset2_path,\n",
    "                                         \"RoleArn\": role_arn\n",
    "                                      } \n",
    "                                   },\n",
    "                                   \"Format\": 'CSV',\n",
    "                                   \"Schema\": {\n",
    "                                      \"Attributes\": [\n",
    "                                         {\n",
    "                                            \"AttributeName\": \"item_id\",\n",
    "                                            \"AttributeType\": \"string\"\n",
    "                                         }\n",
    "                                      ]\n",
    "                                   }\n",
    "                                }\n",
    "                            }\n",
    "                            )\n",
    "\n",
    "subset2_forecast_arn = create_forecast_response['ForecastArn']\n",
    "\n",
    "\n",
    "print(f\"Waiting for Subset 1 Forecast with ARN {mask_arn(subset1_forecast_arn)} to become ACTIVE.\")\n",
    "print(f\"Waiting for Subset 2 Forecast with ARN {mask_arn(subset2_forecast_arn)} to become ACTIVE.\")\n",
    "\n",
    "status = util.wait(lambda: forecast.describe_forecast(ForecastArn=subset1_forecast_arn), sleep_duration)\n",
    "status = util.wait(lambda: forecast.describe_forecast(ForecastArn=subset2_forecast_arn), sleep_duration)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0ae8237",
   "metadata": {},
   "source": [
    "# Step 5: Export and view forecasted datapoints<a class=\"anchor\" id=\"view\">\n",
    "\n",
    "Below, the notebook shows how to export the time-series forecasted data points (predictions) that were created in the two forecast jobs created in Step 4.  When the job is complete, CSV files are available for each time-series, time step and quantile requested."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "804851ed",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Waiting for Subset 1 Forecast export with arn:aws:forecast:us-east-1XXXXXXXXXXXXforecast-export-job/TAXI_TIME_FORECAST_SUBSET_1/subset1_forecast_export to become ACTIVE.\n",
      "Waiting for Subset 2 Forecast export with arn:aws:forecast:us-east-1XXXXXXXXXXXXforecast-export-job/TAXI_TIME_FORECAST_SUBSET_2/subset2_forecast_export to become ACTIVE.\n",
      "\n",
      "\n",
      "CREATE_PENDING \n",
      "CREATE_IN_PROGRESS .\n",
      "ACTIVE \n",
      "CREATE_IN_PROGRESS .........\n",
      "ACTIVE \n"
     ]
    }
   ],
   "source": [
    "# create task for exporting subset1\n",
    "\n",
    "subset1_export_path = \"s3://\" + bucket_name + \"/subset1\"\n",
    "subset1_export_name = \"subset1_forecast_export\"\n",
    "forecast_response = forecast.create_forecast_export_job(ForecastExportJobName=subset1_export_name,\n",
    "                                    ForecastArn=subset1_forecast_arn,\n",
    "                                    Destination={\n",
    "                                        \"S3Config\": {\n",
    "                                            \"Path\": subset1_export_path,\n",
    "                                            \"RoleArn\": role_arn\n",
    "                                        }\n",
    "                                    })\n",
    "\n",
    "subset1_forecast_export_arn = forecast_response['ForecastExportJobArn']\n",
    "\n",
    "\n",
    "# create task for exporting subset2\n",
    "subset2_export_path = \"s3://\" + bucket_name + \"/subset2\"\n",
    "subset2_export_name = \"subset2_forecast_export\"\n",
    "forecast_response = forecast.create_forecast_export_job(ForecastExportJobName=subset2_export_name,\n",
    "                                    ForecastArn=subset2_forecast_arn,\n",
    "                                    Destination={\n",
    "                                        \"S3Config\": {\n",
    "                                            \"Path\": subset2_export_path,\n",
    "                                            \"RoleArn\": role_arn\n",
    "                                        }\n",
    "                                    })\n",
    "\n",
    "subset2_forecast_export_arn = forecast_response['ForecastExportJobArn']\n",
    "\n",
    "\n",
    "# Next, wait on the task to complete\n",
    "print(f\"Waiting for Subset 1 Forecast export with {mask_arn(subset1_forecast_export_arn)} to become ACTIVE.\")\n",
    "print(f\"Waiting for Subset 2 Forecast export with {mask_arn(subset2_forecast_export_arn)} to become ACTIVE.\\n\\n\")\n",
    "\n",
    "status = util.wait(lambda: forecast.describe_forecast_export_job(ForecastExportJobArn=subset1_forecast_export_arn), sleep_duration)\n",
    "status = util.wait(lambda: forecast.describe_forecast_export_job(ForecastExportJobArn=subset2_forecast_export_arn), sleep_duration)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45008732",
   "metadata": {},
   "source": [
    "### Download exported forecasts locally using CLI's sync feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "aea3db26",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "os.system('aws s3 sync '+subset1_export_path+' ./data/')\n",
    "os.system('aws s3 sync '+subset2_export_path+' ./data/')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3248d322",
   "metadata": {},
   "source": [
    "### View the Predictions from Subset1\n",
    "\n",
    "In the next cells, observe the item_id values in the input subset*.csv files.  The output forecasted data points contain the same item_id values -- as they were driven by the input CSV.  You may notice multiple output files exist too.  This isn't a 1:1 relationship between input item_id and output files; instead this is due to the forecast export job being done from a multi-threaded backend server.  Your number of files and content (cardinality) within may differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "b9139cbb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "101\r\n",
      "102\r\n",
      "103\r\n"
     ]
    }
   ],
   "source": [
    "!cat ./data/subset1.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "50237d4e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "item_id,date,p50\r\n",
      "item_id,date,p50\r\n",
      "101,2019-02-01T00:00:00Z,5.088283521094429\r\n",
      "101,2019-02-02T00:00:00Z,4.567379136327473\r\n",
      "101,2019-02-03T00:00:00Z,4.6882656109336915\r\n",
      "item_id,date,p50\r\n",
      "102,2019-02-01T00:00:00Z,5.999995178848844\r\n",
      "102,2019-02-02T00:00:00Z,5.398893783080317\r\n",
      "102,2019-02-03T00:00:00Z,5.434501873982511\r\n",
      "item_id,date,p50\r\n",
      "103,2019-02-01T00:00:00Z,-6.300245287129642\r\n",
      "103,2019-02-02T00:00:00Z,-6.499229937267954\r\n",
      "103,2019-02-03T00:00:00Z,-6.617409566614679\r\n"
     ]
    }
   ],
   "source": [
    "!cat ./data/subset1_forecast_export*.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9baaeeaf",
   "metadata": {},
   "source": [
    "### View the Predictions from Subset2\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "9b299624",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "201\r\n",
      "202\r\n",
      "203\r\n"
     ]
    }
   ],
   "source": [
    "!cat ./data/subset2.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "0ed77472",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "item_id,date,p50\r\n",
      "item_id,date,p50\r\n",
      "203,2019-02-01T00:00:00Z,8.769411924588741\r\n",
      "203,2019-02-02T00:00:00Z,7.390189030461122\r\n",
      "203,2019-02-03T00:00:00Z,6.6944046360308755\r\n",
      "item_id,date,p50\r\n",
      "202,2019-02-01T00:00:00Z,13.011395465473562\r\n",
      "202,2019-02-02T00:00:00Z,14.803267553466718\r\n",
      "202,2019-02-03T00:00:00Z,11.073257424886641\r\n",
      "item_id,date,p50\r\n",
      "201,2019-02-01T00:00:00Z,1.0144781856082785\r\n",
      "201,2019-02-02T00:00:00Z,0.936826154303063\r\n",
      "201,2019-02-03T00:00:00Z,0.7691473730429197\r\n"
     ]
    }
   ],
   "source": [
    "!cat ./data/subset2_forecast_export*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a8fa243",
   "metadata": {},
   "source": [
    "# Step 6: Cleanup<a class=\"anchor\" id=\"cleanup\">\n",
    "\n",
    "You will need to allow a few minutes for each of these steps to complete.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ba3940d",
   "metadata": {},
   "outputs": [],
   "source": [
    "forecast.delete_resource_tree(ResourceArn=dataset_group_arn)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8947f93e",
   "metadata": {},
   "source": [
    "Once the dataset group has been deleted (allow a few minutes), you may proceed.  The following code will allow you to test and determine when the dataset group has been deleted.   When you run this next cell, you may see your dataset group.  Allow a couple minutes, and try again.  Once your dataset is deleted, you may proceed to next step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a2eaea2",
   "metadata": {},
   "outputs": [],
   "source": [
    "forecast.list_dataset_groups()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "643b1b0b",
   "metadata": {},
   "source": [
    "Delete dataset import jobs with TAXI_TIME_FORECAST_SUBSET_DEMO in the job name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61e26b00",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = forecast.list_dataset_import_jobs()\n",
    "\n",
    "for i in response['DatasetImportJobs']:\n",
    "\n",
    "    try:\n",
    "        if i['DatasetImportJobArn'].index('TAXI_TIME_FORECAST_SUBSET_DEMO'):\n",
    "            print('Deleting',i['DatasetImportJobName'])\n",
    "            forecast.delete_dataset_import_job(DatasetImportJobArn=i['DatasetImportJobArn'])\n",
    "    except:\n",
    "        pass"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7784d6a",
   "metadata": {},
   "source": [
    "It will take a few minutes to delete the dataset import jobs.  Once that is complete, the dataset can be deleted as follows in the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e285ff2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "forecast.delete_dataset(DatasetArn=ts_dataset_arn)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_mxnet_latest_p37",
   "language": "python",
   "name": "conda_mxnet_latest_p37"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}