# **Amazon Lookout for Equipment** - SDK Tutorial

## Initialization
---

### Imports

In [None]:
!pip install lookoutequipment

In [None]:
import boto3
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
import os
import pandas as pd
import sagemaker
import sys
import time

from lookoutequipment import plot, dataset, model, evaluation, scheduler

### Parameters

**Note:** Update the **bucket** and **prefix** variables below **before** running the following cell.

Make sure the IAM role used to run your notebook has access to the chosen bucket.

In [None]:
bucket = '<>'
prefix = '<>/' # Keep the trailing slash at the end

plt.style.use('Solarize_Light2')
plt.rcParams['lines.linewidth'] = 0.5
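Since both variables start out as placeholders, a quick check can catch a forgotten update or a missing trailing slash before anything is uploaded. The sketch below is only an illustration: `check_s3_location` is a hypothetical helper, not part of the `lookoutequipment` package.

```python
def check_s3_location(bucket, prefix):
    """Return a list of problems found in the bucket and prefix values."""
    problems = []

    # The notebook ships with '<>' placeholders that must be replaced
    if not bucket or '<' in bucket or '>' in bucket:
        problems.append('bucket still contains the placeholder value')
    if not prefix or '<' in prefix or '>' in prefix:
        problems.append('prefix still contains the placeholder value')
    elif not prefix.endswith('/'):
        problems.append('prefix must end with a trailing slash')

    # The bucket is expected as a bare name, not as an S3 URI
    if bucket.startswith('s3://'):
        problems.append('bucket must be a bare name, without the s3:// scheme')

    return problems

# Hypothetical values, for illustration only
print(check_s3_location('my-bucket', 'lookout/demo'))
```

An empty list means both values look usable.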

### Dataset preparation

In [None]:
data = dataset.load_dataset(dataset_name='expander', target_dir='expander-data')
dataset.upload_dataset('expander-data', bucket, prefix)

## Role definition
--- 
Before you can run this notebook (for instance, from a SageMaker environment), you will need:
* To allow SageMaker to run Lookout for Equipment API calls
* To allow Amazon Lookout for Equipment to access your training data (located in the bucket and prefix defined in the previous cell)

### Authorizing SageMaker to make Lookout for Equipment calls
You need to ensure that this notebook instance has an IAM role which allows it to call the Amazon Lookout for Equipment APIs:

1. In your IAM console, look for the SageMaker execution role attached to your notebook instance (a role with a name like `AmazonSageMaker-ExecutionRole-yyyymmddTHHMMSS`)
2. On the `Permissions` tab, click on `Attach policies`
3. In the Filter policies search field, look for `AmazonLookoutEquipmentFullAccess`, tick the checkbox next to it and click on `Attach policy`

Your notebook can now call any Lookout for Equipment API.
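If you prefer scripting over the console, the same attachment can probably be done with the boto3 IAM client. This is a minimal sketch, assuming the identity running it is allowed to call `iam:AttachRolePolicy`; the helper names are hypothetical, and the policy ARN is the standard form for the AWS managed policy named in step 3.

```python
# ARN of the AWS managed policy named in the console steps
POLICY_ARN = 'arn:aws:iam::aws:policy/AmazonLookoutEquipmentFullAccess'

def role_name_from_arn(role_arn):
    """Extract the role name (last path segment) from an IAM role ARN."""
    return role_arn.split(':role/', 1)[1].split('/')[-1]

def attach_lookout_policy(role_name, iam_client=None):
    """Attach the Lookout for Equipment managed policy to a role."""
    if iam_client is None:
        # Imported lazily so the helper can be exercised with a stub client
        import boto3
        iam_client = boto3.client('iam')

    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=POLICY_ARN)
    return POLICY_ARN

# Possible usage from a SageMaker notebook (not executed here):
# execution_role_arn = sagemaker.get_execution_role()
# attach_lookout_policy(role_name_from_arn(execution_role_arn))
```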

### Giving Lookout for Equipment access to your S3 data
While it runs, Lookout for Equipment accesses your S3 data on several occasions:

* When ingesting the training data
* At training time when accessing the label data
* At inference time to run the input data and output the results

To enable this access, you need to create a role that Lookout for Equipment can assume by following these steps:

1. Log in again to your [**IAM console**](https://console.aws.amazon.com/iamv2/home)
2. On the left menu bar click on `Roles` and then on the `Create role` button located at the top right
3. On the create role screen, select `AWS Service` as the type of trusted entity
4. In the following section (`Choose a use case`), locate `SageMaker` and click on the service name. Not all AWS services appear in these ready-to-configure use cases, which is why we use SageMaker as the baseline for our new role. In the next steps, we will adjust this role to configure it specifically for Amazon Lookout for Equipment.
5. Click on the `Next` button until you reach the last step (`Review`): give a name and a description to your role (for instance `LookoutEquipmentS3AccessRole`)
6. Click on `Create role`: your role is created and you are brought back to the list of existing roles
7. In the search bar, search for the role you just created and choose it from the returned result to see a summary of your role
8. At the top of your screen, you will see a role ARN field: **copy this ARN and paste it in the following cell, replacing the `<>` string below**
9. Click on the cross at the far right of the `AmazonSageMakerFullAccess` managed policy to remove this permission for this role as we don't need it.
10. Click on `Add inline policy` and then on the `JSON` tab. Then fill in the policy with the following document:

In [None]:
print((
    '{\n'
    '    "Version": "2012-10-17",\n'
    '    "Statement": [\n'
    '        {\n'
    '            "Effect": "Allow",\n'
    '            "Action": [\n'
    '                "s3:ListBucket",\n'
    '                "s3:GetObject",\n'
    '                "s3:PutObject"\n'
    '            ],\n'
    '            "Resource": [\n'
    f'                "arn:aws:s3:::{bucket}/*",\n'
    f'                "arn:aws:s3:::{bucket}"\n'
    '            ]\n'
    '        }\n'
    '    ]\n'
    '}'
))

11. Give a name to your policy (for instance: `LookoutEquipmentS3AccessPolicy`) and click on `Create policy`.
12. On the `Trust relationships` tab, choose `Edit trust relationship`.
13. Under policy document, replace the whole policy by the following document and click on the `Update Trust Policy` button on the bottom right:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": "lookoutequipment.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

And you're done! When Amazon Lookout for Equipment tries to read the datasets you just uploaded to S3, it requests permissions from IAM by using the role you just created:
1. The **trust policy** allows Lookout for Equipment to assume this role.
2. The **inline policy** specifies that Lookout for Equipment is authorized to list, read, and write objects in the S3 bucket you specified earlier.
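The console steps above can also be scripted. The following is a minimal sketch using the boto3 IAM client (`create_role` and `put_role_policy`), assuming the caller has permission to create roles. The function names are hypothetical, and the default role and policy names simply reuse the examples suggested in the steps. Building the documents as dictionaries and serializing them with `json.dumps` avoids hand-written JSON strings.

```python
import json

def build_trust_policy():
    """Trust policy letting Lookout for Equipment assume the role."""
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Principal': {'Service': 'lookoutequipment.amazonaws.com'},
            'Action': 'sts:AssumeRole',
        }],
    }

def build_s3_access_policy(bucket):
    """Inline policy granting list, read and write access to the bucket."""
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Action': ['s3:ListBucket', 's3:GetObject', 's3:PutObject'],
            'Resource': [f'arn:aws:s3:::{bucket}/*', f'arn:aws:s3:::{bucket}'],
        }],
    }

def create_lookout_role(bucket,
                        role_name='LookoutEquipmentS3AccessRole',
                        policy_name='LookoutEquipmentS3AccessPolicy',
                        iam_client=None):
    """Create the role, add the inline policy and return the role ARN."""
    if iam_client is None:
        # Imported lazily so the helper can be exercised with a stub client
        import boto3
        iam_client = boto3.client('iam')

    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
        Description='Lets Amazon Lookout for Equipment access S3 data',
    )
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_s3_access_policy(bucket)),
    )
    return response['Role']['Arn']
```

The returned ARN is the value expected in the `role_arn` variable.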

Don't forget to update the **role_arn** variable below with the ARN of the role you just created **before** running the following cell.

In [None]:
role_arn = '<>'

## Lookout for Equipment end-to-end walkthrough
---

### Dataset creation and data ingestion

In [None]:
lookout_dataset = dataset.LookoutEquipmentDataset(
    dataset_name='my_dataset',
    access_role_arn=role_arn,
    component_root_dir=f's3://{bucket}/{prefix}training-data'
)
lookout_dataset.create()
response = lookout_dataset.ingest_data(bucket, prefix + 'training-data/', wait=True)
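To double-check the outcome of an ingestion independently of the SDK wrapper, you could query the service directly. This sketch relies on the boto3 `lookoutequipment` client and its `list_data_ingestion_jobs` call; `ingestion_job_statuses` is a hypothetical helper name, and only the first page of results is read.

```python
def ingestion_job_statuses(dataset_name, client=None):
    """Return (job id, status) pairs for the ingestion jobs of a dataset."""
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client
        import boto3
        client = boto3.client('lookoutequipment')

    response = client.list_data_ingestion_jobs(DatasetName=dataset_name)
    jobs = response.get('DataIngestionJobSummaries', [])

    # Status is expected to be IN_PROGRESS, SUCCESS or FAILED
    return [(job['JobId'], job['Status']) for job in jobs]

# Possible usage (not executed here):
# print(ingestion_job_statuses('my_dataset'))
```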

### Building an anomaly detection model
#### Model training

In [None]:
lookout_model = model.LookoutEquipmentModel(model_name='my_model',
                                            dataset_name='my_dataset')
lookout_model.set_time_periods(data['evaluation_start'],
                               data['evaluation_end'],
                               data['training_start'],
                               data['training_end'])
lookout_model.set_label_data(bucket=bucket,
                             prefix=prefix + 'label-data/',
                             access_role_arn=role_arn)
lookout_model.set_target_sampling_rate(sampling_rate='PT30M')

response = lookout_model.train()
lookout_model.poll_model_training(sleep_time=300)
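The `sampling_rate` value `'PT30M'` is an ISO 8601 duration meaning 30 minutes. As a rough illustration of what resampling implies, the sketch below converts such a duration into a `timedelta` and estimates how many data points a time range contains at that rate. Both helpers are hypothetical, only hour/minute/second durations are handled, and the dates are made up rather than taken from the dataset.

```python
import re
from datetime import datetime, timedelta

# Matches time-only ISO 8601 durations such as PT30M, PT1H or PT10S
_DURATION = re.compile(r'^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$')

def parse_duration(value):
    """Convert a time-only ISO 8601 duration string into a timedelta."""
    match = _DURATION.match(value)
    if match is None or not any(match.groups()):
        raise ValueError(f'Unsupported duration: {value!r}')
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def resampled_points(start, end, sampling_rate):
    """Number of whole sampling intervals between start and end."""
    return (end - start) // parse_duration(sampling_rate)

# One day of data resampled every 30 minutes
print(resampled_points(datetime(2015, 1, 1), datetime(2015, 1, 2), 'PT30M'))  # → 48
```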

#### Trained model evaluation overview

In [None]:
LookoutDiagnostics = evaluation.LookoutEquipmentAnalysis(model_name='my_model', tags_df=data['data'])
predicted_ranges = LookoutDiagnostics.get_predictions()
labels_fname = os.path.join('expander-data', 'labels.csv')
labeled_range = LookoutDiagnostics.get_labels(labels_fname)

In [None]:
TSViz = plot.TimeSeriesVisualization(timeseries_df=data['data'], data_format='tabular')
TSViz.add_signal(['signal-028'])
TSViz.add_labels(labeled_range)
TSViz.add_predictions([predicted_ranges])
TSViz.add_train_test_split(data['evaluation_start'])
TSViz.add_rolling_average(60*24)
TSViz.legend_format = {'loc': 'upper left', 'framealpha': 0.4, 'ncol': 3}
fig, axis = TSViz.plot()
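Beyond the visual comparison, one simple way to quantify the overlap between labels and predictions is to count how many labeled ranges intersect at least one predicted range. The sketch below assumes both sets of ranges have already been converted to `(start, end)` tuples; that conversion depends on the structure returned by `get_labels()` and `get_predictions()`, which is not shown here, so the data is made up and `count_detected_events` is a hypothetical helper.

```python
from datetime import datetime

def count_detected_events(labeled, predicted):
    """Count labeled ranges overlapping at least one predicted range."""
    detected = 0
    for label_start, label_end in labeled:
        # Two ranges overlap when each one starts before the other ends
        if any(p_start <= label_end and p_end >= label_start
               for p_start, p_end in predicted):
            detected += 1
    return detected

# Made-up ranges, for illustration only
labeled = [
    (datetime(2015, 6, 1), datetime(2015, 6, 3)),
    (datetime(2015, 8, 10), datetime(2015, 8, 11)),
]
predicted = [
    (datetime(2015, 5, 30), datetime(2015, 6, 1, 12)),
]
print(count_detected_events(labeled, predicted))  # → 1
```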

#### Plot signal distribution
You might be curious about why Amazon Lookout for Equipment detected an anomalous event. Sometimes, looking at a few of the time series is enough. But sometimes, you need to dig deeper.

The following function aggregates the signal importance of every signal over the evaluation period and sums these contributions over time for each signal. It then takes the top 8 signals and plots two distributions for each: one with the values the signal takes during normal periods (within the evaluation range) and one with the values taken during all the anomalous events detected in the evaluation range. This helps you visualize any significant shift in values for the top contributing signals.

You can also restrict these histograms over a specific range of time by setting the `start` and `end` arguments of the following function with datetime values:

In [None]:
fig = TSViz.plot_histograms(freq='5min', top_n=8)
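The ranking step described above (sum each signal's importance over time, then keep the top contributors) can be sketched in a few lines. This illustrates the idea only and is not the library's internal implementation: it assumes the importances are available as one dictionary per timestamp, mapping signal names to a contribution value, and uses made-up numbers.

```python
from collections import Counter

def top_signals(importance_rows, top_n=8):
    """Sum per-timestamp importances and return the top_n signals."""
    totals = Counter()
    for row in importance_rows:
        # Counter.update adds the values of matching keys together
        totals.update(row)
    return totals.most_common(top_n)

# Made-up contributions for two timestamps
rows = [
    {'signal-001': 0.5, 'signal-002': 0.25},
    {'signal-001': 0.25, 'signal-002': 0.125, 'signal-003': 0.125},
]
print(top_signals(rows, top_n=2))
# → [('signal-001', 0.75), ('signal-002', 0.375)]
```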

### Scheduling inferences
#### Preparing inferencing data

In [None]:
dataset.prepare_inference_data(
    root_dir='expander-data',
    sample_data_dict=data,
    bucket=bucket,
    prefix=prefix,
    start_date='2015-11-21 04:00:00',
    num_sequences=12
)

#### Configuring and starting a scheduler

In [None]:
lookout_scheduler = scheduler.LookoutEquipmentScheduler(
    scheduler_name='my_scheduler',
    model_name='my_model'
)

scheduler_params = {
    'input_bucket': bucket,
    'input_prefix': prefix + 'inference-data/input/',
    'output_bucket': bucket,
    'output_prefix': prefix + 'inference-data/output/',
    'role_arn': role_arn,
    'upload_frequency': 'PT30M',
    'delay_offset': None,
    'timezone_offset': '+00:00',
    'component_delimiter': '_',
    'timestamp_format': 'yyyyMMddHHmmss'
}

lookout_scheduler.set_parameters(**scheduler_params)
response = lookout_scheduler.create()
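The `component_delimiter` and `timestamp_format` parameters describe how the scheduler matches input files to an execution time. As an illustration, and assuming input files follow the pattern `<component><delimiter><timestamp>.csv`, the sketch below lists the file names expected for 12 sequences starting at the `start_date` used earlier, one every 30 minutes. The component name `expander` and the helper itself are hypothetical; the Python format `%Y%m%d%H%M%S` corresponds to `yyyyMMddHHmmss`.

```python
from datetime import datetime, timedelta

def expected_input_files(component, start, frequency_minutes, count,
                         delimiter='_'):
    """List the input file names expected for consecutive executions."""
    names = []
    for i in range(count):
        timestamp = start + i * timedelta(minutes=frequency_minutes)
        names.append(
            f"{component}{delimiter}{timestamp.strftime('%Y%m%d%H%M%S')}.csv"
        )
    return names

files = expected_input_files(
    component='expander',              # hypothetical component name
    start=datetime(2015, 11, 21, 4, 0, 0),
    frequency_minutes=30,              # matches upload_frequency='PT30M'
    count=12,
)
print(files[:2])
# → ['expander_20151121040000.csv', 'expander_20151121043000.csv']
```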

Let's now wait for the scheduler to generate the first execution:

In [None]:
execution_summaries = []

while len(execution_summaries) == 0:
    execution_summaries = lookout_scheduler.list_inference_executions()

    if len(execution_summaries) == 0:
        print('WAITING FOR THE FIRST INFERENCE EXECUTION')
        time.sleep(60)

    else:
        print('FIRST INFERENCE EXECUTED\n')
        break
 
execution_summaries
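The loop above waits for as long as it takes. If you would rather give up after a while (for instance when the scheduler is misconfigured and never produces an execution), a bounded variant could look like the following sketch. `wait_for_first_result` is a hypothetical helper that accepts any callable returning a possibly empty list.

```python
import time

def wait_for_first_result(fetch, sleep_time=60, timeout=3600):
    """Call fetch() until it returns a non-empty result or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        results = fetch()
        if results:
            return results

        # Stop if another sleep would take us past the deadline
        if time.monotonic() + sleep_time > deadline:
            raise TimeoutError('No inference execution found before the timeout')
        time.sleep(sleep_time)

# Possible usage with the scheduler defined above (not executed here):
# execution_summaries = wait_for_first_result(
#     lookout_scheduler.list_inference_executions
# )
```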

#### Post-processing the inference results
Make sure you have some inference results before you run the next cell:

In [None]:
results_df = lookout_scheduler.get_predictions()
results_df.head()

In [None]:
event_details = pd.DataFrame(results_df.iloc[0, 1:]).reset_index()
fig, ax = plot.plot_event_barh(event_details, fig_width=12)

## Cleanup
---
The next cell deletes all the artifacts created by this notebook:

In [None]:
dataset.delete_dataset(dataset_name='my_dataset', delete_children=True, verbose=True)
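For reference, a comparable cleanup can be sketched with the boto3 `lookoutequipment` client. This is not necessarily what `delete_dataset(delete_children=True)` does internally; it only shows one plausible order of operations: a scheduler has to be stopped before it can be deleted, and the model and dataset are removed afterwards. `delete_all` is a hypothetical helper, and a production version would likely need retries and error handling.

```python
import time

def delete_all(dataset_name, model_name, scheduler_name,
               client=None, sleep_time=10):
    """Stop and delete a scheduler, then delete its model and dataset."""
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client
        import boto3
        client = boto3.client('lookoutequipment')

    # Wait until the scheduler is stopped, requesting a stop if it is running
    while True:
        status = client.describe_inference_scheduler(
            InferenceSchedulerName=scheduler_name
        )['Status']
        if status == 'STOPPED':
            break
        if status == 'RUNNING':
            client.stop_inference_scheduler(InferenceSchedulerName=scheduler_name)
        time.sleep(sleep_time)

    # Delete children first, then the dataset itself
    client.delete_inference_scheduler(InferenceSchedulerName=scheduler_name)
    client.delete_model(ModelName=model_name)
    client.delete_dataset(DatasetName=dataset_name)

# Possible usage (not executed here):
# delete_all('my_dataset', 'my_model', 'my_scheduler')
```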