# **Amazon Lookout for Equipment** - Demonstration on an anonymized expander dataset
*Part 4: Model evaluation*

## Initialization
---
Following the data preparation notebook, this repository should now be structured as follows:
```
/lookout-equipment-demo/getting_started/
|
├── data/
| |
| ├── labelled-data/
| | └── labels.csv
| |
| └── training-data/
| └── expander/
| ├── subsystem-01
| | └── subsystem-01.csv
| |
| ├── subsystem-02
| | └── subsystem-02.csv
| |
| ├── ...
| |
| └── subsystem-24
| └── subsystem-24.csv
|
├── dataset/ <<< Original dataset <<<
| ├── labels.csv
| ├── tags_description.csv
| ├── timeranges.txt
| └── timeseries.zip
|
├── notebooks/
| ├── 1_data_preparation.ipynb
| ├── 2_dataset_creation.ipynb
| ├── 3_model_training.ipynb
| ├── 4_model_evaluation.ipynb <<< This notebook <<<
| ├── 5_inference_scheduling.ipynb
| └── config.py
|
└── utils/
 ├── aws_matplotlib_light.py
 └── lookout_equipment_utils.py
```

### Notebook configuration update

In [None]:
!pip install --quiet --upgrade tqdm

### Imports
**Note:** Update the content of the **config.py** file **before** running the following cell

In [None]:
import boto3
import config
import matplotlib.pyplot as plt
import os
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import sys

# Helper functions for managing Lookout for Equipment API calls:
sys.path.append('../utils')
import lookout_equipment_utils as lookout

### Parameters

In [None]:
DATA = os.path.join('..', 'data')
LABEL_DATA = os.path.join(DATA, 'labelled-data')
REGION_NAME = boto3.session.Session().region_name
DATASET_NAME = config.DATASET_NAME
BUCKET = config.BUCKET
PREFIX_TRAINING = config.PREFIX_TRAINING
PREFIX_LABEL = config.PREFIX_LABEL
MODEL_NAME = config.MODEL_NAME

In [None]:
# Loading time ranges:
timeranges_fname = os.path.join(DATA, 'timeranges.txt')
with open(timeranges_fname, 'r') as f:
    # splitlines() drops the line terminators, so the last line is read
    # correctly even when the file does not end with a newline:
    timeranges = f.read().splitlines()

training_start = pd.to_datetime(timeranges[0])
training_end = pd.to_datetime(timeranges[1])
evaluation_start = pd.to_datetime(timeranges[2])
evaluation_end = pd.to_datetime(timeranges[3])

print(f'Training period: from {training_start} to {training_end}')
print(f'Evaluation period: from {evaluation_start} to {evaluation_end}')

### AWS Look & Feel definition for Matplotlib

In [None]:
%matplotlib inline

# Load style sheet:
plt.style.use('../utils/aws_matplotlib_light.py')

# Get colors from custom AWS palette:
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']

### Loading original datasets for analysis purposes

In [None]:
# Let's load all our original signals (they will be useful later on):
all_tags_fname = os.path.join(DATA, 'training-data', 'expander.parquet')
table = pq.read_table(all_tags_fname)
all_tags_df = table.to_pandas()
del table

## Model evaluation
---

The `DescribeModel` API can be used to extract, among other things, the metrics associated with the trained model:

In [None]:
lookout_client = lookout.get_client(region_name=REGION_NAME)
describe_model_response = lookout_client.describe_model(ModelName=MODEL_NAME)
list(describe_model_response.keys())

The describe model response is a dictionary. The `labeled_ranges` field contains the labels provided as an input, while the `predicted_ranges` field contains all the ranges where Lookout for Equipment detected an anomaly. Let's use the following utility functions to get these into two dataframes:

In [None]:
LookoutDiagnostics = lookout.LookoutEquipmentAnalysis(
    model_name=MODEL_NAME,
    tags_df=all_tags_df,
    region_name=REGION_NAME
)
LookoutDiagnostics.set_time_periods(evaluation_start, evaluation_end, training_start, training_end)
predicted_ranges = LookoutDiagnostics.get_predictions()
labels_fname = os.path.join(LABEL_DATA, 'labels.csv')
labeled_range = LookoutDiagnostics.get_labels(labels_fname)

**Note:** the labeled ranges returned by the `DescribeModel` API only include the labels falling within the evaluation range. We use the original label data to get all of them.

Let's now display one of the original signals and map both the labeled and the predicted ranges on the same plot:
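To make the structure of these ranges more concrete, here is a minimal sketch that parses them by hand. It assumes that the ranges are stored as a JSON string under a `ModelMetrics` key and that each range carries ISO-formatted `start` and `end` fields; the `sample_response` payload and the `extract_ranges` helper are hypothetical and only stand in for the real response and for what the utility class does.

```python
import json
from datetime import datetime

# Hypothetical, hand-written stand-in for a DescribeModel response. The real
# payload comes from lookout_client.describe_model(ModelName=MODEL_NAME).
sample_response = {
    'ModelName': 'my-model',
    'ModelMetrics': json.dumps({
        'labeled_ranges': [
            {'start': '2015-06-01T00:00:00.000000',
             'end': '2015-06-03T12:00:00.000000'},
        ],
        'predicted_ranges': [
            {'start': '2015-06-02T06:00:00.000000',
             'end': '2015-06-02T18:00:00.000000'},
            {'start': '2015-07-10T00:00:00.000000',
             'end': '2015-07-11T00:00:00.000000'},
        ],
    }),
}

def extract_ranges(response, key):
    """Return the (start, end) datetime pairs stored under `key`."""
    # Assumption: the metrics are serialized as a JSON string.
    metrics = json.loads(response['ModelMetrics'])
    return [
        (datetime.fromisoformat(r['start']), datetime.fromisoformat(r['end']))
        for r in metrics.get(key, [])
    ]

labeled = extract_ranges(sample_response, 'labeled_ranges')
predicted = extract_ranges(sample_response, 'predicted_ranges')

for start, end in predicted:
    print(f'Predicted anomaly from {start} to {end} (duration: {end - start})')
```

The resulting list of pairs can be passed to `pd.DataFrame(..., columns=['start', 'end'])` to obtain a dataframe similar in spirit to the ones used in the rest of this notebook.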

In [None]:
# We load the original signal we looked at in the data preparation step:
tag = 'signal-028'
tag_df = all_tags_df.loc[training_start:evaluation_end, [tag]]
tag_df.columns = ['Value']

# Plot all of that:
fig, axes = lookout.plot_timeseries(
 timeseries_df=tag_df, 
 tag_name=tag,
 fig_width=20, 
 tag_split=evaluation_start, 
 labels_df=labeled_range,
 predictions=predicted_ranges,
 custom_grid=False
)

## Diagnostics
---

Let's compare:
1. The signal values during the periods marked as **anomalies** in the **evaluation period**
2. The signal values deemed as normal during the **training period**

**We will plot two histograms** for each signal: one in red for the points marked as anomalies and another in green for the normal data points. We will also compute a distance between these two distributions and rank the signals by decreasing distance. This comparison shows which signals deviate the most from their normal behavior when the model marks them as anomalous. This overview can point a subject matter expert (SME) toward the signals to inspect first when investigating the cause of an anomaly.

In [None]:
LookoutDiagnostics.compute_histograms()
fig, axes = LookoutDiagnostics.plot_histograms()
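The ranking idea can be illustrated with a small, self-contained sketch. It is not the implementation used by the utility class: the choice of a 1-D Wasserstein distance (approximated by comparing quantiles) is an assumption, and the two signals below are synthetic and named hypothetically.

```python
import random

def wasserstein_1d(a, b, n_quantiles=200):
    """Approximate the 1-D Wasserstein distance by averaging quantile gaps."""
    a, b = sorted(a), sorted(b)

    def quantile(values, q):
        # Nearest-rank quantile of an already sorted list.
        idx = min(int(q * len(values)), len(values) - 1)
        return values[idx]

    qs = [(i + 0.5) / n_quantiles for i in range(n_quantiles)]
    return sum(abs(quantile(a, q) - quantile(b, q)) for q in qs) / n_quantiles

random.seed(0)

# Synthetic data: for each (hypothetical) signal, values observed during
# normal operation and values observed inside the predicted anomaly ranges.
signals = {
    'signal-A': {  # shifts strongly during anomalies
        'normal': [random.gauss(0.0, 1.0) for _ in range(500)],
        'anomaly': [random.gauss(3.0, 1.0) for _ in range(500)],
    },
    'signal-B': {  # barely changes during anomalies
        'normal': [random.gauss(0.0, 1.0) for _ in range(500)],
        'anomaly': [random.gauss(0.2, 1.0) for _ in range(500)],
    },
}

distances = {
    name: wasserstein_1d(values['normal'], values['anomaly'])
    for name, values in signals.items()
}

# Rank signals by decreasing distance: the first ones are the best
# candidates to inspect when looking for the cause of an anomaly.
ranking = sorted(distances.items(), key=lambda item: item[1], reverse=True)
for name, distance in ranking:
    print(f'{name}: {distance:.2f}')
```

A signal whose anomalous values overlap with its normal values gets a small distance and ends up at the bottom of the ranking, whichever distance is chosen.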

We can also plot the data points marked as anomalies directly on each time series signal:
* **In green**, the normal values during both the training and evaluation period
* **In red**, the values predicted as anomalies by the trained model
* **In grey**, the values labeled as anomalies, which were excluded from training so that the model captures the asset behavior under normal operating conditions

In [None]:
fig, axes = LookoutDiagnostics.plot_signals()
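The color coding above amounts to assigning each timestamp to one of three categories. The following sketch shows one way to do this on synthetic data; the helper names are hypothetical, and giving predicted ranges precedence over labeled ranges when both apply is an assumption, not necessarily what `plot_signals()` does.

```python
from collections import Counter
from datetime import datetime, timedelta

def in_any_range(ts, ranges):
    """Return True if `ts` falls inside one of the (start, end) ranges."""
    return any(start <= ts <= end for start, end in ranges)

def classify(ts, predicted_ranges, labeled_ranges):
    """Map a timestamp to the color used for it in the plot."""
    if in_any_range(ts, predicted_ranges):
        return 'red'    # predicted as an anomaly by the model
    if in_any_range(ts, labeled_ranges):
        return 'grey'   # labeled as an anomaly and excluded from training
    return 'green'      # normal operating conditions

# Synthetic hourly timestamps covering two days:
origin = datetime(2015, 1, 1)
timestamps = [origin + timedelta(hours=h) for h in range(48)]

# Synthetic ranges (both bounds are inclusive):
labeled_ranges = [(datetime(2015, 1, 1, 6), datetime(2015, 1, 1, 9))]
predicted_ranges = [(datetime(2015, 1, 2, 0), datetime(2015, 1, 2, 2))]

colors_by_ts = {
    ts: classify(ts, predicted_ranges, labeled_ranges) for ts in timestamps
}
print(Counter(colors_by_ts.values()))
```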

Let's now extract the list of signals, ranked by the distance computed above:

In [None]:
LookoutDiagnostics.get_ranked_list()

## Conclusion
---
In this notebook, we used the model created in part 3 of this notebook series and performed a few visualizations and diagnostics on the results obtained. You can now move on to the next step, the **inference scheduling notebook**, where we will start the model, feed it some new data and collect the results.