{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multiple models comparison"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook trains three Amazon Forecast algorithms on the same dataset and compares their accuracy.\n",
    "\n",
    "The algorithms are:\n",
    "  - Prophet\n",
    "  - ETS\n",
    "  - DeepAR+"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "import time\n",
    "import pprint\n",
    "\n",
    "import boto3\n",
    "import pandas as pd\n",
    "\n",
    "# importing forecast notebook utility from notebooks/common directory\n",
    "sys.path.insert( 0, os.path.abspath(\"../../common\") )\n",
    "import util\n",
    "\n",
    "pp = pprint.PrettyPrinter(indent=2)  # Better display for dictionaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Configure the S3 bucket name and region name for this lesson.\n",
    "\n",
    "- If you don't have an S3 bucket, create one first.\n",
    "- The region defaults to us-west-2 below, but you can choose any region in which Amazon Forecast is available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text_widget_bucket = util.create_text_widget( \"bucket_name\", \"input your S3 bucket name\" )\n",
    "text_widget_region = util.create_text_widget( \"region\", \"input region name.\", default_value=\"us-west-2\" )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bucket_name = text_widget_bucket.value\n",
    "assert bucket_name, \"bucket_name not set.\"\n",
    "\n",
    "region = text_widget_region.value\n",
    "assert region, \"region not set.\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The last part of the setup process is to create the clients that communicate with Amazon Forecast, which the cell below does."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session = boto3.Session(region_name=region) \n",
    "forecast = session.client(service_name='forecast') \n",
    "forecastquery = session.client(service_name='forecastquery')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the role to provide to Amazon Forecast.\n",
    "role_name = \"ForecastNotebookRole-CompareMultipleModels\"\n",
    "role_arn = util.get_or_create_iam_role( role_name = role_name )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare your data\n",
    "\n",
    "Prepare the dataset with the following steps:\n",
    "1. Upload the CSV file to S3.\n",
    "2. Create a DatasetGroup.\n",
    "3. Create a Dataset and associate it with the DatasetGroup.\n",
    "4. Import the uploaded CSV file to the Dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Upload the CSV file to S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = session.client('s3')\n",
    "key=\"elec_data/item-demand-time.csv\"\n",
    "s3.upload_file(Filename=\"../../common/data/item-demand-time.csv\", Bucket=bucket_name, Key=key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "project = 'compare_multiple_models'\n",
    "dataset_name= project+'_ds'\n",
    "dataset_group_name= project +'_dsg'\n",
    "s3_data_path = \"s3://\"+bucket_name+\"/\"+key"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create a DatasetGroup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "create_dataset_group_response = forecast.create_dataset_group(\n",
    "    DatasetGroupName=dataset_group_name,\n",
    "    Domain=\"CUSTOM\",\n",
    "    )\n",
    "\n",
    "dataset_group_arn = create_dataset_group_response['DatasetGroupArn']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create a Dataset and associate it with the DatasetGroup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DATASET_FREQUENCY = \"H\"\n",
    "TIMESTAMP_FORMAT = \"yyyy-MM-dd HH:mm:ss\"\n",
    "\n",
    "schema ={\n",
    "   \"Attributes\":[\n",
    "      {\n",
    "         \"AttributeName\":\"timestamp\",\n",
    "         \"AttributeType\":\"timestamp\"\n",
    "      },\n",
    "      {\n",
    "         \"AttributeName\":\"target_value\",\n",
    "         \"AttributeType\":\"float\"\n",
    "      },\n",
    "      {\n",
    "         \"AttributeName\":\"item_id\",\n",
    "         \"AttributeType\":\"string\"\n",
    "      }\n",
    "   ]\n",
    "}\n",
    "\n",
    "response=forecast.create_dataset(\n",
    "    Domain=\"CUSTOM\",\n",
    "    DatasetType='TARGET_TIME_SERIES',\n",
    "    DatasetName=dataset_name,\n",
    "    DataFrequency=DATASET_FREQUENCY, \n",
    "    Schema = schema\n",
    ")\n",
    "\n",
    "dataset_arn = response['DatasetArn']\n",
    "\n",
    "forecast.update_dataset_group(DatasetGroupArn=dataset_group_arn, DatasetArns=[dataset_arn])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Import the uploaded CSV file to the Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "datasetImportJobName = 'EP_DSIMPORT_JOB_TARGET'\n",
    "ds_import_job_response = forecast.create_dataset_import_job(\n",
    "    DatasetImportJobName = datasetImportJobName,\n",
    "    DatasetArn = dataset_arn,\n",
    "    DataSource = {\n",
    "        \"S3Config\" : {\n",
    "            \"Path\":s3_data_path,\n",
    "            \"RoleArn\": role_arn\n",
    "        }\n",
    "    },\n",
    "    TimestampFormat=TIMESTAMP_FORMAT\n",
    ")\n",
    "\n",
    "ds_import_job_arn=ds_import_job_response['DatasetImportJobArn']"
   ]
  },
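  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "status_indicator = util.StatusIndicator()\n",
    "\n",
    "# Poll until the import job reaches a terminal state\n",
    "while True:\n",
    "    status = forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn)['Status']\n",
    "    status_indicator.update(status)\n",
    "    if status in ('ACTIVE', 'CREATE_FAILED'): break\n",
    "    time.sleep(10)\n",
    "\n",
    "status_indicator.end()\n",
    "\n",
    "# Stop here if the import failed, since the predictors below need the imported data\n",
    "assert status == 'ACTIVE', 'Dataset import job ended with status ' + status"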
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the predictors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next step is to create a dictionary that stores useful information about each algorithm: its name, its ARN, and eventually its performance metrics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "algos = ['Prophet', 'ETS', 'Deep_AR_Plus']\n",
    "\n",
    "predictors = {a:{} for a in algos}\n",
    "\n",
    "for p in predictors:\n",
    "    predictors[p]['predictor_name'] = project + '_' + p + '_algo'\n",
    "    predictors[p]['algorithm_arn'] = 'arn:aws:forecast:::algorithm/' + p\n",
    "\n",
    "pp.pprint(predictors)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we also define our forecast horizon: the number of time points to predict into the future. For weekly data, a value of 12 means 12 weeks. Our example uses hourly data and we want to forecast the next day, so we set it to 24."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "forecastHorizon = 24"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following function creates a predictor from a predictor name, an algorithm ARN, and a forecast horizon. We will call it once for each of the three algorithms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_predictor_for_comparison(pred_name, algo_arn, forecast_horizon):\n",
    "    # Fill gaps in the target series: zeros for missing middle and trailing values\n",
    "    featurization = {\n",
    "        'AttributeName': 'target_value',\n",
    "        'FeaturizationPipeline': [\n",
    "            {\n",
    "                'FeaturizationMethodName': 'filling',\n",
    "                'FeaturizationMethodParameters': {\n",
    "                    'frontfill': 'none',\n",
    "                    'middlefill': 'zero',\n",
    "                    'backfill': 'zero',\n",
    "                },\n",
    "            }\n",
    "        ],\n",
    "    }\n",
    "    return forecast.create_predictor(\n",
    "        PredictorName=pred_name,\n",
    "        AlgorithmArn=algo_arn,\n",
    "        ForecastHorizon=forecast_horizon,\n",
    "        PerformAutoML=False,\n",
    "        PerformHPO=False,\n",
    "        # One backtest window that holds out the last forecast_horizon points\n",
    "        EvaluationParameters={\n",
    "            'NumberOfBacktestWindows': 1,\n",
    "            'BackTestWindowOffset': forecast_horizon,\n",
    "        },\n",
    "        InputDataConfig={'DatasetGroupArn': dataset_group_arn},\n",
    "        FeaturizationConfig={\n",
    "            'ForecastFrequency': DATASET_FREQUENCY,\n",
    "            'Featurizations': [featurization],\n",
    "        },\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each of the three algorithms, we create the predictor, wait until training completes, and store its accuracy metrics in our dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for p in predictors:\n",
    "\n",
    "    print('Creating predictor :', p)\n",
    "\n",
    "    predictor_response = create_predictor_for_comparison(predictors[p]['predictor_name'], predictors[p]['algorithm_arn'], forecastHorizon)\n",
    "    predictorArn = predictor_response['PredictorArn']\n",
    "    predictors[p]['predictor_arn'] = predictorArn  # save it right away so the cleanup step can find it\n",
    "\n",
    "    # wait until the predictor reaches a terminal state\n",
    "    status_indicator = util.StatusIndicator()\n",
    "    while True:\n",
    "        status = forecast.describe_predictor(PredictorArn=predictorArn)['Status']\n",
    "        status_indicator.update(status)\n",
    "        if status in ('ACTIVE', 'CREATE_FAILED'): break\n",
    "        time.sleep(10)\n",
    "    status_indicator.end()\n",
    "\n",
    "    # retrieve and store the accuracy metrics, then proceed with the next algorithm\n",
    "    predictors[p]['accuracy'] = forecast.get_accuracy_metrics(PredictorArn=predictorArn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides RMSE, the accuracy metrics include a `LossValue` for each of the 0.1, 0.5, and 0.9 quantiles, which can be compared across algorithms in the same way."
   ]
  },
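  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The quantile losses can be tabulated next to RMSE for each algorithm. The cell below is a minimal sketch, assuming the `get_accuracy_metrics` response stored above lists them under a `WeightedQuantileLosses` key as `Quantile`/`LossValue` pairs. `metrics_table` is a hypothetical helper name introduced here for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def metrics_table(predictors):\n",
    "    # Build one row per predictor: RMSE plus one column per quantile loss\n",
    "    rows = []\n",
    "    for name, info in predictors.items():\n",
    "        metrics = info['accuracy']['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']\n",
    "        row = {'predictor': name, 'RMSE': metrics['RMSE']}\n",
    "        # Assumption: quantile losses are listed under 'WeightedQuantileLosses'\n",
    "        for loss in metrics.get('WeightedQuantileLosses', []):\n",
    "            row['wQL[{}]'.format(loss['Quantile'])] = loss['LossValue']\n",
    "        rows.append(row)\n",
    "    return pd.DataFrame(rows).set_index('predictor')\n",
    "\n",
    "metrics_table(predictors)"
   ]
  },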
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is what we stored so far for DeepAR+:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictors['Deep_AR_Plus']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looping over our dictionary, we can retrieve the Root Mean Square Error (RMSE) for each predictor and plot it as a bar plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Collect one row per predictor, then build the DataFrame in one step\n",
    "# (DataFrame.append was removed in pandas 2.0)\n",
    "rows = []\n",
    "for p in predictors:\n",
    "    score = predictors[p]['accuracy']['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']['RMSE']\n",
    "    rows.append({'predictor': p, 'RMSE': score})\n",
    "\n",
    "scores = pd.DataFrame(rows, columns=['predictor', 'RMSE'])\n",
    "\n",
    "scores.plot.bar( x=\"predictor\", y=\"RMSE\" )"
   ]
  },
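  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same lookup can be used to name the predictor with the lowest RMSE. This is only a sketch: it assumes that RMSE on the single backtest window is the selection criterion, and `rmse_of` is a hypothetical helper introduced here. Other metrics, or more backtest windows, may rank the algorithms differently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rmse_of(name):\n",
    "    # Same path into the accuracy response as in the bar plot above\n",
    "    return predictors[name]['accuracy']['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']['RMSE']\n",
    "\n",
    "best = min(predictors, key=rmse_of)\n",
    "print('Lowest RMSE:', best, rmse_of(best))"
   ]
  },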
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleanup\n",
    "Deleting all Amazon Forecast resources we created above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete predictors\n",
    "for p in predictors:\n",
    "    util.wait_till_delete(lambda: forecast.delete_predictor(PredictorArn = predictors[p]['predictor_arn']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete dataset import job\n",
    "util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=ds_import_job_arn))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete dataset\n",
    "util.wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=dataset_arn))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete dataset group\n",
    "forecast.delete_dataset_group(DatasetGroupArn=dataset_group_arn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete IAM role\n",
    "util.delete_iam_role( role_name )"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}