{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **Amazon Lookout for Equipment** - Demonstration on an anonymized compressor dataset\n",
    "*Part 3: Model training*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialization\n",
    "---\n",
    "Following the data preparation notebook, this repository should now be structured as follow:\n",
    "```\n",
    "/lookout-equipment-demo/getting_started/\n",
    "|\n",
    "├── data/\n",
    "|   |\n",
    "|   ├── labelled-data/\n",
    "|   |   └── labels.csv\n",
    "|   |\n",
    "|   └── training-data/\n",
    "|       └── expander/\n",
    "|           ├── subsystem-01\n",
    "|           |   └── subsystem-01.csv\n",
    "|           |\n",
    "|           ├── subsystem-02\n",
    "|           |   └── subsystem-02.csv\n",
    "|           |\n",
    "|           ├── ...\n",
    "|           |\n",
    "|           └── subsystem-24\n",
    "|               └── subsystem-24.csv\n",
    "|\n",
    "├── dataset/                                <<< Original dataset <<<\n",
    "|   ├── labels.csv\n",
    "|   ├── tags_description.csv\n",
    "|   ├── timeranges.txt\n",
    "|   └── timeseries.zip\n",
    "|\n",
    "├── notebooks/\n",
    "|   ├── 1_data_preparation.ipynb\n",
    "|   ├── 2_dataset_creation.ipynb\n",
    "|   ├── 3_model_training.ipynb              <<< This notebook <<<\n",
    "|   ├── 4_model_evaluation.ipynb\n",
    "|   ├── 5_inference_scheduling.ipynb\n",
    "|   └── config.py\n",
    "|\n",
    "└── utils/\n",
    "    ├── aws_matplotlib_light.py\n",
    "    └── lookout_equipment_utils.py\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Notebook configuration update"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --quiet --upgrade tqdm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imports\n",
    "<span style=\"color: white; background-color: OrangeRed; padding: 0px 15px 0px 15px; border-radius: 20px;\">**Note:** Update the content of the **config.py** file **before** running the following cell</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import config\n",
    "import os\n",
    "import pandas as pd\n",
    "import sagemaker\n",
    "import sys\n",
    "\n",
    "# Helper functions for managing Lookout for Equipment API calls:\n",
    "sys.path.append('../utils')\n",
    "import lookout_equipment_utils as lookout"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DATA       = os.path.join('..', 'data')\n",
    "LABEL_DATA = os.path.join(DATA, 'labelled-data')\n",
    "TRAIN_DATA = os.path.join(DATA, 'training-data', 'expander')\n",
    "\n",
    "ROLE_ARN        = sagemaker.get_execution_role()\n",
    "REGION_NAME     = boto3.session.Session().region_name\n",
    "DATASET_NAME    = config.DATASET_NAME\n",
    "BUCKET          = config.BUCKET\n",
    "PREFIX_TRAINING = config.PREFIX_TRAINING\n",
    "PREFIX_LABEL    = config.PREFIX_LABEL\n",
    "MODEL_NAME      = config.MODEL_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on our previous analysis, we will use the following time ranges:\n",
    "\n",
    "* **Train set:** 1st January 2015 - 31st August 2015: Lookout for Equipment needs at least 180 days of training data. March is one of the anomaly period tagged in the label, so this should not change the modeling behaviour.\n",
    "* **Test set:** 1st September 2015 - 30th November 2015 *(this test set should include both normal and abnormal data to evaluate our model on)*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Loading time ranges:\n",
    "timeranges_fname = os.path.join(DATA, 'timeranges.txt')\n",
    "with open(timeranges_fname, 'r') as f:\n",
    "    timeranges = f.readlines()\n",
    "    \n",
    "training_start   = pd.to_datetime(timeranges[0][:-1])\n",
    "training_end     = pd.to_datetime(timeranges[1][:-1])\n",
    "evaluation_start = pd.to_datetime(timeranges[2][:-1])\n",
    "evaluation_end   = pd.to_datetime(timeranges[3][:-1])\n",
    "\n",
    "print(f'Training period: from {training_start} to {training_end}')\n",
    "print(f'Evaluation period: from {evaluation_start} to {evaluation_end}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model training\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare the model parameters:\n",
    "lookout_model = lookout.LookoutEquipmentModel(model_name=MODEL_NAME,\n",
    "                                              dataset_name=DATASET_NAME,\n",
    "                                              region_name=REGION_NAME)\n",
    "\n",
    "# Set the training / evaluation split date:\n",
    "lookout_model.set_time_periods(evaluation_start,\n",
    "                               evaluation_end,\n",
    "                               training_start,\n",
    "                               training_end)\n",
    "\n",
    "# Set the label data location:\n",
    "lookout_model.set_label_data(bucket=BUCKET, \n",
    "                             prefix=PREFIX_LABEL,\n",
    "                             access_role_arn=ROLE_ARN)\n",
    "\n",
    "# This sets up the rate the service will resample the data before \n",
    "# training:\n",
    "lookout_model.set_target_sampling_rate(sampling_rate='PT5M')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Actually create the model and train it:\n",
    "lookout_model.train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A training is now in progress as captured by the console:\n",
    "    \n",
    "![Training in progress](../assets/model-training-in-progress.png)\n",
    "\n",
    "Use the following cell to capture the model training progress. **This model should take around an hour to be trained.** Key drivers for training time are:\n",
    "* Number of labels in the label dataset (if provided)\n",
    "* Number of datapoints: this number depends on the sampling rate, the number of time series and the time range."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lookout_model.poll_model_training()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A model is now training and we can visualize the results of the back testing on the evaluation window selected at the beginning on this notebook:\n",
    "\n",
    "![Training complete](../assets/model-training-complete.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also click on any detected event to bring up a ranking of the top 15 sensors contributing to it:\n",
    "![Event details](../assets/model-event-details.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "---\n",
    "In this notebook, we use the dataset created in part 2 of this notebook series and trained a Lookout for Equipment model.\n",
    "\n",
    "From here you can either head:\n",
    "* To the next notebook where we will **extract the evaluation data** for this model and use it to perform further analysis on the model results.\n",
    "* Or to the **inference scheduling notebook** where we will start the model, feed it some new data and catch the results."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}