{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# **Amazon Lookout for Equipment** - Demonstration on an anonymized compressor dataset\n", "*Part 3: Model training*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialization\n", "---\n", "Following the data preparation notebook, this repository should now be structured as follows:\n", "```\n", "/lookout-equipment-demo/getting_started/\n", "|\n", "├── data/\n", "| |\n", "| ├── labelled-data/\n", "| | └── labels.csv\n", "| |\n", "| └── training-data/\n", "| └── expander/\n", "| ├── subsystem-01\n", "| | └── subsystem-01.csv\n", "| |\n", "| ├── subsystem-02\n", "| | └── subsystem-02.csv\n", "| |\n", "| ├── ...\n", "| |\n", "| └── subsystem-24\n", "| └── subsystem-24.csv\n", "|\n", "├── dataset/ <<< Original dataset <<<\n", "| ├── labels.csv\n", "| ├── tags_description.csv\n", "| ├── timeranges.txt\n", "| └── timeseries.zip\n", "|\n", "├── notebooks/\n", "| ├── 1_data_preparation.ipynb\n", "| ├── 2_dataset_creation.ipynb\n", "| ├── 3_model_training.ipynb <<< This notebook <<<\n", "| ├── 4_model_evaluation.ipynb\n", "| ├── 5_inference_scheduling.ipynb\n", "| └── config.py\n", "|\n", "└── utils/\n", " ├── aws_matplotlib_light.py\n", " └── lookout_equipment_utils.py\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Notebook configuration update" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --quiet --upgrade tqdm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports\n", "**Note:** Update the content of the **config.py** file **before** running the following cell" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import config\n", "import os\n", "import pandas as pd\n", "import sagemaker\n", "import sys\n", "\n", "# Helper functions for managing Lookout for Equipment API calls:\n", 
"sys.path.append('../utils')\n", "import lookout_equipment_utils as lookout" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATA = os.path.join('..', 'data')\n", "LABEL_DATA = os.path.join(DATA, 'labelled-data')\n", "TRAIN_DATA = os.path.join(DATA, 'training-data', 'expander')\n", "\n", "ROLE_ARN = sagemaker.get_execution_role()\n", "REGION_NAME = boto3.session.Session().region_name\n", "DATASET_NAME = config.DATASET_NAME\n", "BUCKET = config.BUCKET\n", "PREFIX_TRAINING = config.PREFIX_TRAINING\n", "PREFIX_LABEL = config.PREFIX_LABEL\n", "MODEL_NAME = config.MODEL_NAME" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on our previous analysis, we will use the following time ranges:\n", "\n", "* **Train set:** 1st January 2015 - 31st August 2015: Lookout for Equipment needs at least 180 days of training data. March is one of the anomaly period tagged in the label, so this should not change the modeling behaviour.\n", "* **Test set:** 1st September 2015 - 30th November 2015 *(this test set should include both normal and abnormal data to evaluate our model on)*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Loading time ranges:\n", "timeranges_fname = os.path.join(DATA, 'timeranges.txt')\n", "with open(timeranges_fname, 'r') as f:\n", " timeranges = f.readlines()\n", " \n", "training_start = pd.to_datetime(timeranges[0][:-1])\n", "training_end = pd.to_datetime(timeranges[1][:-1])\n", "evaluation_start = pd.to_datetime(timeranges[2][:-1])\n", "evaluation_end = pd.to_datetime(timeranges[3][:-1])\n", "\n", "print(f'Training period: from {training_start} to {training_end}')\n", "print(f'Evaluation period: from {evaluation_start} to {evaluation_end}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model training\n", "---" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prepare the model parameters:\n", "lookout_model = lookout.LookoutEquipmentModel(model_name=MODEL_NAME,\n", " dataset_name=DATASET_NAME,\n", " region_name=REGION_NAME)\n", "\n", "# Set the training / evaluation split date:\n", "lookout_model.set_time_periods(evaluation_start,\n", " evaluation_end,\n", " training_start,\n", " training_end)\n", "\n", "# Set the label data location:\n", "lookout_model.set_label_data(bucket=BUCKET, \n", " prefix=PREFIX_LABEL,\n", " access_role_arn=ROLE_ARN)\n", "\n", "# This sets up the rate the service will resample the data before \n", "# training:\n", "lookout_model.set_target_sampling_rate(sampling_rate='PT5M')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Actually create the model and train it:\n", "lookout_model.train()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A training is now in progress as captured by the console:\n", " \n", "![Training in progress](../assets/model-training-in-progress.png)\n", "\n", "Use the following cell to capture the model training progress. **This model should take around an hour to be trained.** Key drivers for training time are:\n", "* Number of labels in the label dataset (if provided)\n", "* Number of datapoints: this number depends on the sampling rate, the number of time series and the time range." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lookout_model.poll_model_training()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The model is now trained and we can visualize the results of the back testing on the evaluation window selected at the beginning of this notebook:\n", "\n", "![Training complete](../assets/model-training-complete.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also click on any detected event to bring up a ranking of the top 15 sensors contributing to it:\n", "![Event details](../assets/model-event-details.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "---\n", "In this notebook, we used the dataset created in part 2 of this notebook series and trained a Lookout for Equipment model.\n", "\n", "From here you can head either:\n", "* To the next notebook, where we will **extract the evaluation data** for this model and use it to perform further analysis on the model results.\n", "* Or to the **inference scheduling notebook**, where we will start the model, feed it some new data and collect the results." ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }