# **Amazon Lookout for Equipment** - Demonstration on an anonymized compressor dataset
*Part 3: Model training*

## Initialization
---
Following the data preparation notebook, this repository should now be structured as follows:
```
/lookout-equipment-demo/getting_started/
|
├── data/
| |
| ├── labelled-data/
| | └── labels.csv
| |
| └── training-data/
| └── expander/
| ├── subsystem-01
| | └── subsystem-01.csv
| |
| ├── subsystem-02
| | └── subsystem-02.csv
| |
| ├── ...
| |
| └── subsystem-24
| └── subsystem-24.csv
|
├── dataset/ <<< Original dataset <<<
| ├── labels.csv
| ├── tags_description.csv
| ├── timeranges.txt
| └── timeseries.zip
|
├── notebooks/
| ├── 1_data_preparation.ipynb
| ├── 2_dataset_creation.ipynb
| ├── 3_model_training.ipynb <<< This notebook <<<
| ├── 4_model_evaluation.ipynb
| ├── 5_inference_scheduling.ipynb
| └── config.py
|
└── utils/
 ├── aws_matplotlib_light.py
 └── lookout_equipment_utils.py
```

### Notebook configuration update

In [None]:
!pip install --quiet --upgrade tqdm

### Imports
**Note:** Update the content of the **config.py** file **before** running the following cell

In [None]:
import boto3
import config
import os
import pandas as pd
import sagemaker
import sys

# Helper functions for managing Lookout for Equipment API calls:
sys.path.append('../utils')
import lookout_equipment_utils as lookout

### Parameters

In [None]:
DATA = os.path.join('..', 'data')
LABEL_DATA = os.path.join(DATA, 'labelled-data')
TRAIN_DATA = os.path.join(DATA, 'training-data', 'expander')

ROLE_ARN = sagemaker.get_execution_role()
REGION_NAME = boto3.session.Session().region_name
DATASET_NAME = config.DATASET_NAME
BUCKET = config.BUCKET
PREFIX_TRAINING = config.PREFIX_TRAINING
PREFIX_LABEL = config.PREFIX_LABEL
MODEL_NAME = config.MODEL_NAME

Based on our previous analysis, we will use the following time ranges:

* **Train set:** 1st January 2015 - 31st August 2015: Lookout for Equipment needs at least 180 days of training data. March is one of the anomaly periods tagged in the labels, so this should not change the modeling behaviour.
* **Test set:** 1st September 2015 - 30th November 2015 *(this test set should include both normal and abnormal data to evaluate our model on)*

In [None]:
# Loading time ranges (one timestamp per line):
timeranges_fname = os.path.join(DATA, 'timeranges.txt')
with open(timeranges_fname, 'r') as f:
    timeranges = f.readlines()

# strip() removes the trailing newline and still works if the last line has none:
training_start   = pd.to_datetime(timeranges[0].strip())
training_end     = pd.to_datetime(timeranges[1].strip())
evaluation_start = pd.to_datetime(timeranges[2].strip())
evaluation_end   = pd.to_datetime(timeranges[3].strip())

print(f'Training period: from {training_start} to {training_end}')
print(f'Evaluation period: from {evaluation_start} to {evaluation_end}')
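
Before training, it can be worth confirming that the training window meets the 180-day minimum mentioned above and that the evaluation window starts after it. The following is only a minimal sketch: the `check_time_ranges` helper is hypothetical (it is not part of `lookout_equipment_utils`), and the hard-coded dates mirror the ranges listed above rather than the actual contents of `timeranges.txt`.

```python
import pandas as pd

MIN_TRAINING_DAYS = 180  # minimum training duration stated in this notebook

def check_time_ranges(training_start, training_end,
                      evaluation_start, evaluation_end):
    """Hypothetical helper: validate the windows and return the
    training window length in days."""
    training_days = (training_end - training_start).days
    if training_days < MIN_TRAINING_DAYS:
        raise ValueError(
            f'Training window is {training_days} days, '
            f'expected at least {MIN_TRAINING_DAYS}'
        )
    if evaluation_start <= training_end:
        raise ValueError('Evaluation window overlaps the training window')
    if evaluation_end <= evaluation_start:
        raise ValueError('Evaluation window is empty')
    return training_days

# Illustrative dates taken from the ranges described above:
training_days = check_time_ranges(
    pd.to_datetime('2015-01-01'),
    pd.to_datetime('2015-08-31'),
    pd.to_datetime('2015-09-01'),
    pd.to_datetime('2015-11-30'),
)
print(f'Training window: {training_days} days')
```

In the notebook itself, the same check could be called with the `training_start`, `training_end`, `evaluation_start` and `evaluation_end` variables loaded in the previous cell.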

## Model training
---

In [None]:
# Prepare the model parameters:
lookout_model = lookout.LookoutEquipmentModel(model_name=MODEL_NAME,
 dataset_name=DATASET_NAME,
 region_name=REGION_NAME)

# Set the training / evaluation split date:
lookout_model.set_time_periods(evaluation_start,
 evaluation_end,
 training_start,
 training_end)

# Set the label data location:
lookout_model.set_label_data(bucket=BUCKET, 
 prefix=PREFIX_LABEL,
 access_role_arn=ROLE_ARN)

# Set the rate at which the service resamples the data before
# training (ISO 8601 duration, here 5 minutes):
lookout_model.set_target_sampling_rate(sampling_rate='PT5M')

In [None]:
# Actually create the model and train it:
lookout_model.train()

A training job is now in progress, as shown in the console:
 
![Training in progress](../assets/model-training-in-progress.png)

Use the following cell to monitor the model training progress. **This model should take around an hour to train.** Key drivers for training time are:
* Number of labels in the label dataset (if provided)
* Number of datapoints: this number depends on the sampling rate, the number of time series and the time range.
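
To get a feel for the second driver, the number of datapoints can be estimated from the time range, the sampling rate and the number of time series. This is a rough sketch only: `estimate_datapoints` is a hypothetical helper, and the value of 100 time series is an assumed placeholder, not the actual number of sensors in this dataset.

```python
import pandas as pd

def estimate_datapoints(start, end, sampling_minutes, n_series):
    """Rough count of datapoints after resampling, across all time series."""
    # Number of resampled timestamps between start and end (inclusive):
    n_steps = (end - start) // pd.Timedelta(minutes=sampling_minutes) + 1
    return n_steps * n_series

start = pd.Timestamp('2015-01-01')
end = pd.Timestamp('2015-08-31')
n_series = 100  # assumption for illustration only

# Compare the 5-minute rate used in this notebook with a 1-minute rate:
for minutes in (5, 1):
    total = estimate_datapoints(start, end, minutes, n_series)
    print(f'{minutes}-minute sampling: {total:,} datapoints')
```

With the same time range and number of series, moving from a 5-minute to a 1-minute sampling rate multiplies the number of datapoints by roughly five.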

In [None]:
lookout_model.poll_model_training()
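
The internals of `poll_model_training()` are not shown in this notebook. As a hedged illustration of what such a polling loop might look like, the sketch below repeatedly reads a status until it is no longer `IN_PROGRESS`. The `wait_for_training` function and its parameters are hypothetical; the commented-out wiring assumes the boto3 `lookoutequipment` client and its `describe_model` call, which returns a `Status` field.

```python
import time

def wait_for_training(get_status, sleep_seconds=60, max_polls=240):
    """Hypothetical helper: call get_status() until it returns something
    other than 'IN_PROGRESS', then return that final status."""
    for _ in range(max_polls):
        status = get_status()
        if status != 'IN_PROGRESS':
            return status
        time.sleep(sleep_seconds)
    raise TimeoutError(f'Training still in progress after {max_polls} polls')

# Possible wiring with boto3 (needs AWS credentials, hence commented out):
# import boto3
# client = boto3.client('lookoutequipment')
# final_status = wait_for_training(
#     lambda: client.describe_model(ModelName=MODEL_NAME)['Status']
# )

# Offline demonstration with a fake sequence of statuses:
fake_statuses = iter(['IN_PROGRESS', 'IN_PROGRESS', 'SUCCESS'])
final_status = wait_for_training(lambda: next(fake_statuses), sleep_seconds=0)
print(f'Final status: {final_status}')
```

Passing the status lookup as a callable keeps the loop independent of the AWS client, which makes it easy to try out without starting a real training job.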

The model is now trained, and we can visualize the back-testing results over the evaluation window selected at the beginning of this notebook:

![Training complete](../assets/model-training-complete.png)

You can also click on any detected event to bring up a ranking of the top 15 sensors contributing to it:
![Event details](../assets/model-event-details.png)

## Conclusion
---
In this notebook, we used the dataset created in part 2 of this notebook series to train a Lookout for Equipment model.

From here you can either head:
* To the next notebook where we will **extract the evaluation data** for this model and use it to perform further analysis on the model results.
* Or to the **inference scheduling notebook** where we will start the model, feed it some new data and catch the results.