{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# **Amazon Lookout for Equipment** - Demonstration on an anonymized expander dataset\n", "*Part 4: Model evaluation*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialization\n", "---\n", "Following the data preparation notebook, this repository should now be structured as follow:\n", "```\n", "/lookout-equipment-demo/getting_started/\n", "|\n", "├── data/\n", "| |\n", "| ├── labelled-data/\n", "| | └── labels.csv\n", "| |\n", "| └── training-data/\n", "| └── expander/\n", "| ├── subsystem-01\n", "| | └── subsystem-01.csv\n", "| |\n", "| ├── subsystem-02\n", "| | └── subsystem-02.csv\n", "| |\n", "| ├── ...\n", "| |\n", "| └── subsystem-24\n", "| └── subsystem-24.csv\n", "|\n", "├── dataset/ <<< Original dataset <<<\n", "| ├── labels.csv\n", "| ├── tags_description.csv\n", "| ├── timeranges.txt\n", "| └── timeseries.zip\n", "|\n", "├── notebooks/\n", "| ├── 1_data_preparation.ipynb\n", "| ├── 2_dataset_creation.ipynb\n", "| ├── 3_model_training.ipynb\n", "| ├── 4_model_evaluation.ipynb <<< This notebook <<<\n", "| ├── 5_inference_scheduling.ipynb\n", "| └── config.py\n", "|\n", "└── utils/\n", " ├── aws_matplotlib_light.py\n", " └── lookout_equipment_utils.py\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Notebook configuration update" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --quiet --upgrade tqdm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports\n", "**Note:** Update the content of the **config.py** file **before** running the following cell" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import config\n", "import matplotlib.pyplot as plt\n", "import os\n", "import pandas as pd\n", "import pyarrow as pa\n", "import pyarrow.parquet as pq\n", "import sys\n", "\n", "# Helper functions for managing Lookout for Equipment API calls:\n", "sys.path.append('../utils')\n", "import lookout_equipment_utils as lookout" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATA = os.path.join('..', 'data')\n", "LABEL_DATA = os.path.join(DATA, 'labelled-data')\n", "REGION_NAME = boto3.session.Session().region_name\n", "DATASET_NAME = config.DATASET_NAME\n", "BUCKET = config.BUCKET\n", "PREFIX_TRAINING = config.PREFIX_TRAINING\n", "PREFIX_LABEL = config.PREFIX_LABEL\n", "MODEL_NAME = config.MODEL_NAME" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Loading time ranges:\n", "timeranges_fname = os.path.join(DATA, 'timeranges.txt')\n", "with open(timeranges_fname, 'r') as f:\n", " timeranges = f.readlines()\n", " \n", "training_start = pd.to_datetime(timeranges[0][:-1])\n", "training_end = pd.to_datetime(timeranges[1][:-1])\n", "evaluation_start = pd.to_datetime(timeranges[2][:-1])\n", "evaluation_end = pd.to_datetime(timeranges[3][:-1])\n", "\n", "print(f'Training period: from {training_start} to {training_end}')\n", "print(f'Evaluation period: from {evaluation_start} to {evaluation_end}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### AWS Look & Feel definition for Matplotlib" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "# Load style sheet:\n", 
"plt.style.use('../utils/aws_matplotlib_light.py')\n", "\n", "# Get colors from custom AWS palette:\n", "prop_cycle = plt.rcParams['axes.prop_cycle']\n", "colors = prop_cycle.by_key()['color']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading original datasets for analysis purpose" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's load all our original signals (they will be useful later on):\n", "all_tags_fname = os.path.join(DATA, 'training-data', 'expander.parquet')\n", "table = pq.read_table(all_tags_fname)\n", "all_tags_df = table.to_pandas()\n", "del table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model evaluation\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `DescribeModel` API can be used to extract, among other things, the metrics associated to the trained model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lookout_client = lookout.get_client(region_name=REGION_NAME)\n", "describe_model_response = lookout_client.describe_model(ModelName=MODEL_NAME)\n", "list(describe_model_response.keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The describe model response is a dictionnary. The `labeled_ranges` contains the label provided as an input while the `predicted_ranges` contains all the predicted ranges where Lookout for Equipment detected an anomaly. Let's use the following utility function get these into two dataframes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LookoutDiagnostics = lookout.LookoutEquipmentAnalysis(model_name=MODEL_NAME, tags_df=all_tags_df, region_name=REGION_NAME)\n", "LookoutDiagnostics.set_time_periods(evaluation_start, evaluation_end, training_start, training_end)\n", "predicted_ranges = LookoutDiagnostics.get_predictions()\n", "labels_fname = os.path.join(LABEL_DATA, 'labels.csv')\n", "labeled_range = LookoutDiagnostics.get_labels(labels_fname)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** the labeled range from the model Describe API, only provides any labelled data falling within the evaluation range. We use the original label data to get all of them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now display one of the original signal and map both the labeled and the predicted ranges on the same plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We load the original signal we looked at in the data preparation step:\n", "tag = 'signal-028'\n", "tag_df = all_tags_df.loc[training_start:evaluation_end, [tag]]\n", "tag_df.columns = ['Value']\n", "\n", "# Plot all of that:\n", "fig, axes = lookout.plot_timeseries(\n", " timeseries_df=tag_df, \n", " tag_name=tag,\n", " fig_width=20, \n", " tag_split=evaluation_start, \n", " labels_df=labeled_range,\n", " predictions=predicted_ranges,\n", " custom_grid=False\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Diagnostics\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's compare:\n", "1. The signal values during the periods marked as **anomalies** in the **evaluation period**\n", "2. The signal values deemed as normal during the **training period**\n", "\n", "**We will plot two histograms** for each signal: one in red for the points marked as anomalies and another one in green for all the other normal datapoints. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "We can also plot the data points marked as anomalies directly on each time series signal:\n", "* **In green**, the normal values during both the training and evaluation periods\n", "* **In red**, the values predicted as anomalies by the trained model\n", "* **In grey**, the values marked as anomalies in the labels and excluded from training, so that the model captures the asset behavior when it's operating under normal conditions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axes = LookoutDiagnostics.plot_signals()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now extract a ranked list of these signals:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LookoutDiagnostics.get_ranked_list()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "---\n", "In this notebook, we used the model created in part 3 of this notebook series and performed a few visualizations and diagnostics on the results obtained. You can now move on to the next step, the **inference scheduling notebook**, where we will start the model, feed it some new data, and collect the results." ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }