{ "cells": [ { "cell_type": "markdown", "id": "ca84fb81", "metadata": {}, "source": [ "# Amazon Lookout for Equipment no-code Workshop\n", "---\n", "No-code workshop to experiment with Amazon Lookout for Equipment using only the AWS console.\n", "\n", "To get started with Lookout for Equipment, you can do the following:\n", "* Create a project and ingest some historical data\n", "* Train an anomaly detection model using part of the ingested data\n", "* Evaluate the model on a left-out part of the ingested data\n", "* Configure and launch an inference scheduler" ] }, { "cell_type": "markdown", "id": "bdc56ee2", "metadata": {}, "source": [ "## Region selection\n", "Please, check with your instructor about the regions where you will be working for this hands-on workshop. You can change it by selecting the region name in the upper right of the screen:\n", "\n", "![Region selection](pictures/region_selection.png)\n", "\n", "At this time, Lookout for Equipment is available in Europe (Ireland, **eu-west-1**), US East (N. Virginia, **us-east-1**) and Asia Pacific (Seoul, **ap-northeast-2**)." ] }, { "cell_type": "markdown", "id": "e84139f1", "metadata": {}, "source": [ "## Creating an S3 Bucket\n", "Lookout for Equipment imports the training data from any S3 bucket. Let's create one to store the files we will need during this workshop.\n", "\n", "1. Access the S3 management console by typing **S3** in the search bar as shown below and click on S3 to open the service console.\n", "\n", "![S3 console access](pictures/s3_service_search.png)\n", "\n", "2. Click on **Create bucket**\n", "3. Enter a **Bucket name**: for the remaining of this document, we will reference a bucket named *lookout-equipment-no-code-workshop* (as bucket names need to be unique across the S3 service, yours will have to be named otherwise). Scroll down and click on **Create bucket**\n", "4. You are brought back to the bucket list from your account. Click on the name of your bucket and create the following folders:\n", " * `facility/`\n", " * `label-data/`\n", " * `inference-data/input/`\n", " * `inference-data/output/`\n", " \n", "Your bucket content should look similar to the following:\n", "\n", "![S3 Bucket initial content](pictures/s3_bucket_root.png)\n", "\n", "Your bucket is ready to receive some data, let's download and prepare them." ] }, { "cell_type": "markdown", "id": "52167c0e", "metadata": {}, "source": [ "## Prepare input data\n", "### Download data for learning\n", "In this hands-on exercise, you will use a synthetic dataset where all sensors are located in one file (there are 30 of them). Download training and label data to your computer using the following links:\n", "* Click on [**this link**](https://lookoutforequipmentbucket-us-east-1.s3.amazonaws.com/datasets/getting-started/console/pump_sensors.zip) to download the `pump_sensors.zip` archive and unzip it. You end up with a 250 MB time series file named `pump_sensors.csv`.\n", "* Click on [**this link**](https://lookoutforequipmentbucket-us-east-1.s3.amazonaws.com/datasets/getting-started/console/labels.csv) to download the `labels.csv` file\n", "\n", "For this tutorial we will use this synthetic dataset and only use a version where all sensors are in one single CSV file. The first column of this file is a timestamp, so this file contains 31 columns and 432,000 rows. 
The following are sample rows from this time series dataset:\n", "\n", "![Synthetic pump data timeseries overview](pictures/pump_data_overview.png)\n", "\n", "We have access to 10 months of data with a regular sampling rate of 1 minute for every sensor.\n", "\n", "This dataset comes with a label file where each row contains a time period during which we know some anomalies occurred. This file contains 30 anomalous periods; the first and second columns respectively mark the start and the end of each period.\n", "\n", "The following are sample rows from this label dataset: **note the absence of any header or row number**.\n", "\n", "![Synthetic pump data label overview](pictures/pump_label_data.png)\n", "\n", "**Note:** all the timestamps from the training dataset and the labels should be within the same time zone.\n", "\n", "![Signal overview](pictures/signal_overview.png)" ] }, { "cell_type": "markdown", "id": "5f420df5", "metadata": {}, "source": [ "### Alternative input format\n", "In this tutorial, we will consider a single file as it is easier to manage at this stage.\n", "\n", "However, your production system might provide sensor data in individual files (1 file per sensor or 1 file for a group of sensors). Having distinct files removes the need to align timestamps for every column in your data file and might be easier to process depending on what your data generation and collection pipeline looks like.\n", "\n", "Amazon Lookout for Equipment is flexible when it comes to ingesting your data, and the service adapts to these different situations, as you will see at the ingestion step." ] }, { "cell_type": "markdown", "id": "c2355873", "metadata": {}, "source": [ "### Upload input data to S3\n", "Navigate to the `label-data` folder in your S3 bucket and drag and drop the `labels.csv` file there.\n", "\n", "Then, navigate to the `facility` folder and upload your `pump_sensors.csv` file there: depending on your Internet connectivity to the selected AWS region, this upload can take a few minutes to complete:\n", "\n", "![Dataset upload](pictures/dataset_upload.png)\n", "\n", "**Congratulations, your S3 bucket is created and populated: you are ready to create your first Lookout for Equipment project!**" ] }, { "cell_type": "markdown", "id": "11db6dca", "metadata": {}, "source": [ "## Create a new Lookout for Equipment project\n", "Now that your dataset is ready and uploaded to an S3 bucket, we are going to use the console to train a model and see the events detected in your dataset.\n", "\n", "1. Access the Lookout for Equipment management console by typing **Lookout** in the search bar as shown below and click on Lookout for Equipment to open the service console:\n", "\n", "![Lookout for Equipment console access](pictures/lookout_service_search.png)" ] }, { "cell_type": "markdown", "id": "f4478301", "metadata": {}, "source": [ "2. Click on the **Create project** button and give a name to your project:\n", "\n", "![Project creation](pictures/create_project.png)" ] }, { "cell_type": "markdown", "id": "df0ead48", "metadata": {}, "source": [ "3. After your project is created, you're brought to the project dashboard where you can click on **Add dataset** to ingest your time series data:\n", "\n", "![Project dashboard](pictures/project_dashboard.png)" ] }, { "cell_type": "markdown", "id": "d18f6589", "metadata": {}, "source": [ "4. 
On the **Add dataset** screen:\n", " * Click on **Browse** and point to the S3 location where you uploaded your time series data\n", " * Then, select `Create a new role` for the **IAM role**: this lets Lookout for Equipment create a least-privilege role that only has access to the S3 bucket selected in the previous step.\n", " * Then, select `By part of the filename` for the **Schema detection method**\n", " * Finally, choose `_` for the **Delimiter**\n", "\n", "You should see a screen similar to the following one:\n", "\n", "![Data ingestion](pictures/data_ingestion.png)" ] }, { "cell_type": "markdown", "id": "61e27ce6", "metadata": {}, "source": [ "5. Click on the **Start ingestion** button. If you elected to let Lookout for Equipment create a role on your behalf, a blue banner at the top will let you know the role is created and currently propagating through your account. A couple of seconds later, the ingestion starts and you are brought back to the project dashboard where you can see the ingestion status:\n", "\n", "![Data ingestion in progress](pictures/data_ingestion_progress.png)" ] }, { "cell_type": "markdown", "id": "891f87a8", "metadata": {}, "source": [ "6. After 5 to 10 minutes, the ingestion is done. The banner at the top of the dashboard changes to green and displays a success message. During ingestion, Lookout for Equipment prepares the dataset so that it can be used by multiple algorithms at training time. The service also grades your signals and generates a data quality report. After ingestion, you can click on **View dataset** to visualize this report:\n", "\n", "![Signal grading report](pictures/signal_grading_report.png)\n", "\n", "This report grades each signal as:\n", "* `High`: no significant validation errors were detected in the data during ingestion. Data from sensors in this category is considered the most reliable for model training and evaluation.\n", "* `Medium`: one or more **potentially** harmful validation errors were detected in the data during ingestion. Data from sensors in this category is considered less reliable for model training and evaluation. The insights provided for the individual sensors may help you improve your dataset quality (for instance, you may not have noticed a lot of missing data on specific signals)\n", "* `Low`: one or more significant validation errors were detected in the data during ingestion. There's a high probability that training a model on data from sensors in this category will result in poor model performance.\n", "\n", "If you click on the individual sensor grades, you will see the reasons behind Lookout for Equipment's grade. You can check out the [**documentation page**](https://docs.aws.amazon.com//lookout-for-equipment/latest/ug/reading-details-by-sensor.html?icmpid=docs_console_unmapped) dedicated to Evaluating sensor grades to learn more about the errors, explanations, actions taken by Lookout for Equipment on your behalf, and actions recommended to you as a user.\n", "\n", "Now that your time series dataset has been ingested, you can train an anomaly detection model." ] }, { "cell_type": "markdown", "id": "8c375ce0", "metadata": {}, "source": [ "## Training an anomaly detection model\n", "\n", "We are going to create a new model from the **Dataset** page. To achieve this, you can follow these steps:\n", "\n", "1. In the *Details by sensor* section, select all the sensors you want to use to train a model. You can select up to 300 fields to train a single model, and you can use the table headers to sort your signals. 
Select all the fields available in this dataset by navigating the 2 available pages. Your screen should look similar to the following. Note the 30/30 at the top of this section, which means 30 out of the 30 available sensors have been selected:\n", "\n", "![Training field selection](pictures/training_field_selection.png)" ] }, { "cell_type": "markdown", "id": "340d8281", "metadata": {}, "source": [ "2. Click on **Create model**\n", "3. Give a name to your model (`synthetic-pump-model` for instance) and click **Next**\n", "4. Choose `2019-01-01` to `2019-07-31` for the **Training set date range** (the first 7 months: Lookout for Equipment needs a minimum of 90 days of data to train a model)\n", "5. Choose `2019-08-01` to `2019-10-27` for the **Evaluation set date range** (the last 3 months)\n", "6. For the **Sample rate**, choose `15 minutes`\n", " * The sampling rate has an impact on the training time. The original sampling rate of this dataset is `1 minute`: leaving `No rate specified` means Lookout for Equipment will use this value, which yields a training run that lasts around 1.5 hours. For this workshop, you can start by downsampling your data to 15 minutes: training time will then be around 15-20 minutes. Come back to this step later and train a second model with a different sampling rate to appreciate the difference in the results for this particular dataset.\n", "7. Leave the other fields as is and click **Next**\n", "8. Browse to your S3 bucket and select the location where you uploaded the `labels.csv` file during the data preparation phase earlier.\n", " * **This is optional** and heavily impacts the training time.\n", " * You can launch a first training without this parameter and train another one afterward to visualize the difference.\n", "9. In the **Review and train** section, scroll down and click on **Start training**. You are brought back to the project dashboard where you can see the training is now in progress:\n", "\n", "![Training in progress](pictures/training_in_progress.png)" ] }, { "cell_type": "markdown", "id": "a456782c", "metadata": {}, "source": [ "When your model has been trained, the model status transitions to *Completed*. Without labels, this model will take **~15 minutes** to train. If you decide to take the labels into account, the training time will be approximately **1.5 hours** with the default sampling rate of 1 minute and approximately **15-20 minutes** if you selected downsampling to 15 minutes.\n", "\n", "Now that your model has been trained, you can visualize the evaluation results for your model:\n", "\n", "1. Click on **View models** from the project dashboard: the **Models** page is displayed, where you can see the status and creation time of each model trained in this project:\n", "\n", "![Models list](pictures/models_list.png)" ] }, { "cell_type": "markdown", "id": "77131da1", "metadata": {}, "source": [ "2. Click on the *Name* of the model you want to dive deeper into: the **Model details** page is shown. In the top section (*Model overview*), you will see the *Status* (`Ready for inference`) of your model and the *Training time* (`15 minutes`) that was needed to build it:\n", "\n", "![Model overview](pictures/model_overview.png)" ] }, { "cell_type": "markdown", "id": "a3f3933e", "metadata": {}, "source": [
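"If you prefer to check on your trained model from code rather than from the console, the following minimal sketch with the AWS SDK for Python (boto3) retrieves the same status and training time information; it assumes your credentials are configured, uses the model name suggested earlier, and reads fields from the DescribeModel API response:\n", "\n", "```python\n", "import boto3\n", "\n", "lookout = boto3.client('lookoutequipment')\n", "\n", "# Retrieve the status of the model trained above\n", "model = lookout.describe_model(ModelName='synthetic-pump-model')\n", "print(model['Status'])  # SUCCESS once training has completed\n", "print(model['TrainingExecutionStartTime'], '->', model['TrainingExecutionEndTime'])\n", "```\n", "\n", "3. 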
Scroll down to the *Model performance* section, which will look similar to the following:\n", "\n", "![Model performance](pictures/model_performance.png)" ] }, { "cell_type": "markdown", "id": "c5af7e68", "metadata": {}, "source": [ "At the top, you get an overview of the performance of the model, telling you that it was able to:\n", "\n", "* Capture 8 abnormal equipment behavior events in the label ranges (where known anomalies happen): Amazon Lookout for Equipment also computes an average forewarning time of 19 hours and 43 minutes for these events.\n", "* Detect an additional 8 abnormal equipment behavior events outside of the label ranges, with an average duration of 39 minutes.\n", "* You also get a time range plot putting the labelled events (in blue) and the detected ones (in red) next to each other. You can use the slider to zoom and pan over the time range more precisely.\n", "\n", "The *Event details* section unpacks up to the top 15 sensors for each detected event. When the page is displayed, the ranking shown is the one for the first detected event. You can select any detected event by clicking on one of the red bars above to update the *Event details* section. When you select a detected event, you will get a screen similar to the following:\n", "\n", "![Top contributing sensors](pictures/model_top_sensors.png)" ] }, { "cell_type": "markdown", "id": "618a48e7", "metadata": {}, "source": [ "## Deploying a trained model" ] }, { "cell_type": "markdown", "id": "8c053cc9", "metadata": {}, "source": [ "### Configuring an inference scheduler\n", "Now that you have a trained model, you can deploy it by creating an inference scheduler. To do this, follow these steps:\n", "\n", "1. On the model performance page, scroll down and select the **Inference schedule** tab. There, click on **Schedule inference**:\n", "\n", "![Inference, initial page](pictures/inference_empty.png)" ] }, { "cell_type": "markdown", "id": "3242c4ed", "metadata": {}, "source": [ "2. Enter a name for your scheduler: I called mine `synthetic-pump-model-scheduler`\n", "3. Click on the **Browse** button next to the **Input data** S3 location and point to the **inference-data/input/** path in your bucket\n", "4. Select a **Data upload frequency** of `15 minutes`: this parameter has a minimum value equal to the sampling rate of your trained model.\n", "5. Scroll down to the **Output data** section and select the **inference-data/output/** path in your bucket\n", "6. Leave all the other parameters as is and click **Schedule inference** at the bottom of the screen.\n", "\n", "After a few seconds, your scheduler is up and ready. It will wake up within the next 15 minutes and every 15 minutes after that. Once your scheduler is created, you are brought back to the model details page where you can see your model is ready for inference. If you click on the *Inference schedule* tab, you will see a screen similar to this one:\n", "\n", "![Active scheduler](pictures/inference_active.png)" ] }, { "cell_type": "markdown", "id": "3e1ede23", "metadata": {}, "source": [ "### Preparing inference data\n", "Let’s prepare some data so that it can be found by the scheduler: the input file must be a CSV file that follows the same structure as the original dataset. In this example, we had a single CSV file with one timestamp column followed by 30 columns, each containing an individual tag.
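\n", "\n", "The next few paragraphs walk through building this input file by hand. If you would rather script it, the following minimal sketch mirrors those manual steps; it assumes `pandas` is installed, `pump_sensors.csv` is in your working directory, and the `pump_` prefix matches the component name used in this workshop:\n", "\n", "```python\n", "import pandas as pd\n", "from datetime import datetime, timedelta\n", "\n", "# Keep the last 10 rows of the known anomalous day from the original time series\n", "df = pd.read_csv('pump_sensors.csv', index_col=0, parse_dates=True)\n", "sample = df.loc['2019-10-17'].tail(10).copy()\n", "\n", "# Re-align those rows on the start of the current 15-minute inference window (UTC)\n", "now = datetime.utcnow().replace(second=0, microsecond=0)\n", "window_start = now - timedelta(minutes=now.minute % 15)\n", "sample.index = pd.date_range(start=window_start, periods=len(sample), freq='min')\n", "sample.index.name = df.index.name\n", "\n", "# Name the file after the window start so the scheduler can find it\n", "filename = 'pump_' + window_start.strftime('%Y%m%d%H%M%S') + '.csv'\n", "sample.to_csv(filename)\n", "print(filename)\n", "```\n", "\n", "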
[**Click here**](https://lookoutforequipmentbucket-us-east-1.s3.amazonaws.com/datasets/getting-started/console/pump_yyyymmddHHMM00.csv) to download a sample inference input file.\n", "\n", "![Synthetic pump data timeseries overview](pictures/pump_data_overview.png)\n", "\n", "According to the `labels.csv` file, we know that an anomalous period occurred on **2019-10-17**. As an example, we are going to extract the *last 10 rows of that particular day* and put them in a new CSV file.\n", "\n", "When the Amazon Lookout for Equipment scheduler wakes up, given how we configured it in the previous step, it looks for a file with a name that matches this pattern: `{assetname}_{yyyyMMddHHmmss}.csv`.\n", "\n", "At the time this document was written, the date / time was **2022-07-29 at 11.36am UTC**: as the scheduler was configured to wake up every 15 minutes, we will save our CSV file with this name: `pump_20220729113000.csv`, where `pump` is the name of our component in the original dataset schema. The scheduler will wake up at **11.45am UTC** (this is the next time that is a multiple of 15 minutes) and look for a file generated 15 minutes before (11.45am minus 15 minutes is 11.30am). Using this naming convention will allow the scheduler to find this file and open it.\n", "\n", "Next, we need to make sure the timestamps *inside* the first column of this file also match the timestamp in the name: when the scheduler opens this file, it will look for any row with a date that is between the inference execution start time minus 15 minutes and the actual execution time. In our case, we need to have at least one row with a timestamp between 11.30am UTC and 11.45am UTC.\n", "\n", "Update the timestamp column of your CSV file and make sure the UTC date and time matches the timestamp you used to name the file. Your initial file should look somewhat similar to this (note the timestamp column in UTC time *and* matching the timestamp in the filename):\n", "\n", "![Inference input](pictures/inference_input.png)" ] }, { "cell_type": "markdown", "id": "471ee975", "metadata": {}, "source": [ "Then, upload your inference data to your S3 bucket, in the `inference-data/input/` location. After a few minutes, your scheduler will wake up, look into the input location configured above, run the data found against your trained model and output the results in JSON format in the output location in Amazon S3. The scheduler *Inference history* is updated and should look similar to the following: if a file is not found, or if no suitable timestamp is found in the inference input, the status will be marked as *Failed*:\n", "\n", "![Inference success](pictures/inference_success.png)" ] }, { "cell_type": "markdown", "id": "464af206", "metadata": {}, "source": [ "Navigate to your output Amazon S3 location and locate the JSON files to review their content and check the events potentially detected by Amazon Lookout for Equipment. For an example of how you can post-process and visualize these results, feel free to check out the *Python Notebook section* in the Getting Started folder of this GitHub repo.
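\n", "\n", "If you would rather pull the raw results with the AWS SDK for Python (boto3) instead of downloading them from the console, a minimal sketch along these lines can help; the bucket name below is the example used in this workshop, so replace it with yours, and the snippet assumes at least one inference execution has already succeeded:\n", "\n", "```python\n", "import json\n", "import boto3\n", "\n", "s3 = boto3.client('s3')\n", "bucket = 'lookout-equipment-no-code-workshop'  # replace with your bucket name\n", "prefix = 'inference-data/output/'\n", "\n", "# List the result objects and read the most recent one (JSON Lines format)\n", "objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get('Contents', [])\n", "latest = max(objects, key=lambda o: o['LastModified'])\n", "body = s3.get_object(Bucket=bucket, Key=latest['Key'])['Body'].read().decode('utf-8')\n", "\n", "for line in body.strip().splitlines():\n", "    result = json.loads(line)\n", "    print(result['timestamp'], result['prediction'])\n", "```\n", "\n", "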
The following is the first JSON line you will see if you open one of these output files:\n", "\n", "```json\n", "{\n", "    \"timestamp\": \"2022-07-29T11:45:00.000000\", \n", "    \"prediction\": 1, \n", "    \"prediction_reason\": \"ANOMALY_DETECTED\", \n", "    \"anomaly_score\": 0.97399, \n", "    \"diagnostics\": [\n", "        {\"name\": \"pump\\\\Sensor0\", \"value\": 0.04}, \n", "        {\"name\": \"pump\\\\Sensor1\", \"value\": 0.0}, \n", "        {\"name\": \"pump\\\\Sensor10\", \"value\": 0.06}, \n", "        {\"name\": \"pump\\\\Sensor11\", \"value\": 0.0}, \n", "        {\"name\": \"pump\\\\Sensor12\", \"value\": 0.06}, \n", "        \n", "        [...]\n", "        \n", "        {\"name\": \"pump\\\\Sensor9\", \"value\": 0.0}\n", "    ]\n", "}\n", "\n", "```\n", "\n", "For each row with a timestamp that falls between two inference executions, you will get:\n", "\n", "* The actual *timestamp*\n", "* The *prediction* (0 if nothing is detected and 1 if an anomaly is detected)\n", "* The *raw anomaly score* (a floating-point number between 0.0 and 1.0)\n", "* If the prediction is 1, you will also have a *diagnostics* field in the JSON file, which lists each component\\\\sensor pair and its contribution to the event as a percentage (similar to what is shown on the diagnostics page of a trained model)" ] }, { "cell_type": "markdown", "id": "b3057fbb", "metadata": {}, "source": [ "## Conclusion\n", "### Cleanup\n", "Don't forget to stop your scheduler to stop incurring costs. If you stop the scheduler, your inference history will be maintained; if you delete the scheduler, you will lose the inference history.\n", "\n", "You are free to leave the trained model and your project in place, as they won't incur any further cost.\n", "\n", "Otherwise, feel free to delete the model, then the project, and finally empty your S3 bucket and delete it." ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 5 }