{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# End-to-end example to manage the lifecycle of ML models deployed on the edge using SageMaker Edge Manager\n", "\n", "**SageMaker Studio Kernel**: Data Science\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents \n", "\n", "* Use Case\n", "* Workflow\n", "* Setup\n", "* Building and Deploying the ML Model\n", "* Running the fleet of Virtual Wind Turbines and Edge Devices\n", "* Cleanup\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use Case\n", "\n", "The challenge we're trying to address here is to detect anomalies in the components of a wind turbine. Each wind turbine has many sensors that read data such as:\n", " - Internal & external temperature\n", " - Wind speed\n", " - Rotor speed\n", " - Air pressure\n", " - Voltage (or current) in the generator\n", " - Vibration in the gearbox (using an IMU -> Accelerometer + Gyroscope)\n", "\n", "Depending on the types of anomalies we want to detect, we need to select one or more features and then prepare a dataset that 'explains' those anomalies. We are interested in three types of anomalies:\n", " - Rotor speed (when the rotor is not at the expected speed)\n", " - Produced voltage (when the generator is not producing the expected voltage)\n", " - Gearbox vibration (when the vibration of the gearbox is far from the expected level)\n", " \n", "All three anomalies (or violations) depend on many variables while the turbine is operating. To address that, let's use an ML model called an [Autoencoder](https://en.wikipedia.org/wiki/Autoencoder), trained on correlated features. This model is unsupervised: it learns a latent representation of the dataset and tries to reconstruct (as a regression) the same tensor it was given as input. The strategy, then, is to train on a dataset collected from a normal turbine (without anomalies), so that the model learns **'what is a normal turbine'**. 
When the sensor readings of a malfunctioning turbine are used as input, the model will not be able to reconstruct them, and the resulting high reconstruction error is flagged as an anomaly.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Workflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "In this example, you will create a robust end-to-end solution that uses SageMaker Edge Manager to manage the lifecycle of ML models deployed to a wind turbine fleet, where they detect anomalies in the turbines' operation.\n", "\n", " - Prepare an ML model\n", " - download a pre-trained model;\n", " - compile the ML model with SageMaker Neo for Linux x86_64;\n", " - create a deployment package using SageMaker Edge Manager;\n", " - download/unpack the deployment package;\n", " - Download/unpack a package with the IoT certificates required by the agent; \n", " - Download/unpack **SageMaker Edge Agent** for Linux x86_64;\n", " - Generate the protobuf/gRPC stubs (.py scripts) - with these files we will send requests via unix:// sockets to the agent; \n", " - Using some helper functions, we're going to interact with the agent and do some tests.\n", "\n", "The following diagram shows the resources required to run this experiment and illustrates how the agent works and how to interact with it. 
\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 - Setup " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installing some required libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!apt-get -y update && apt-get -y install build-essential procps\n", "!pip install --quiet -U numpy sysv_ipc boto3 grpcio-tools grpcio protobuf sagemaker\n", "!pip install --quiet -U matplotlib==3.4.1 seaborn==0.11.1\n", "!pip install --quiet paho-mqtt\n", "!pip install --quiet ipywidgets" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import tarfile\n", "import os\n", "import stat\n", "import io\n", "import time\n", "import sagemaker\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from datetime import datetime\n", "import numpy as np\n", "import glob" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's take a look at the dataset and its features\n", "Download the dataset " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "%config InlineBackend.figure_format='retina'\n", "\n", "!mkdir -p data\n", "!curl https://aws-ml-blog.s3.amazonaws.com/artifacts/monitor-manage-anomaly-detection-model-wind-turbine-fleet-sagemaker-neo/dataset_wind_turbine.csv.gz -o data/dataset_wind.csv.gz\n", " \n", "parser = lambda date: datetime.strptime(date, '%Y-%m-%dT%H:%M:%S.%f+00:00')\n", "df = pd.read_csv('data/dataset_wind.csv.gz', compression=\"gzip\", sep=',', low_memory=False, parse_dates=[ 'eventTime'], date_parser=parser)\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Features:\n", " - **nanoId**: id of the edge device that collected the data\n", " - **turbineId**: id of the turbine that produced this data\n", " - 
**arduino_timestamp**: timestamp of the Arduino that was operating this turbine\n", " - **nanoFreemem**: amount of free memory in bytes\n", " - **eventTime**: timestamp of the row\n", " - **rps**: rotation of the rotor in rotations per second\n", " - **voltage**: voltage produced by the generator in millivolts\n", " - **qw, qx, qy, qz**: quaternion angular acceleration\n", " - **gx, gy, gz**: gravity acceleration\n", " - **ax, ay, az**: linear acceleration\n", " - **gearboxtemp**: internal temperature\n", " - **ambtemp**: external temperature\n", " - **humidity**: air humidity\n", " - **pressure**: air pressure\n", " - **gas**: air quality\n", " - **wind_speed_rps**: wind speed in rotations per second" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 - Deploying the pre-built ML Model\n", "\n", "\n", "In this section you will:\n", "\n", " - Compile/optimize your pre-trained model for your edge device (Linux x86_64) using [SageMaker Neo](https://docs.aws.amazon.com/sagemaker/latest/dg/neo.html)\n", " - Create a deployment package with a signed model + the runtime used by SageMaker Edge Agent to load and invoke the optimized model\n", " - Deploy the package using IoT Jobs\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "project_name='wind-turbine-farm'\n", "\n", "s3_client = boto3.client('s3')\n", "sm_client = boto3.client('sagemaker')\n", "\n", "project_id = sm_client.describe_project(ProjectName=project_name)['ProjectId']\n", "bucket_name = 'sagemaker-wind-turbine-farm-%s' % project_id\n", "\n", "prefix='wind_turbine_anomaly'\n", "sagemaker_session=sagemaker.Session(default_bucket=bucket_name)\n", "role = sagemaker.get_execution_role()\n", "print('Project name: %s' % project_name)\n", "print('Project id: %s' % project_id)\n", "print('Bucket name: %s' % bucket_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compiling/Packaging/Deploying our ML model to our edge 
devices\n", "\n", "We invoke SageMaker Neo to compile the pre-trained model. To see how this model was trained, refer to the training notebook [here](https://github.com/aws-samples/amazon-sagemaker-edge-manager-workshop/tree/main/lab/02-Training). \n", "\n", "First, upload the pre-trained model to the S3 bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use a context manager so the file handle is closed after the upload\n", "with open(\"model/model.tar.gz\", \"rb\") as model_file:\n", "    boto3.Session().resource(\"s3\").Bucket(bucket_name).Object('model/model.tar.gz').upload_fileobj(model_file)\n", "print(\"Model successfully uploaded!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell compiles the model for the target hardware and OS with the SageMaker Neo service. The compiled model package also includes the [deep learning runtime](https://github.com/neo-ai/neo-ai-dlr)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "compilation_job_name = 'wind-turbine-anomaly-%d' % int(time.time()*1000)\n", "sm_client.create_compilation_job(\n", "    CompilationJobName=compilation_job_name,\n", "    RoleArn=role,\n", "    InputConfig={\n", "        'S3Uri': 's3://%s/model/model.tar.gz' % sagemaker_session.default_bucket(),\n", "        'DataInputConfig': '{\"input0\":[1,6,10,10]}',\n", "        'Framework': 'PYTORCH'\n", "    },\n", "    OutputConfig={\n", "        'S3OutputLocation': 's3://%s/wind_turbine/optimized/' % sagemaker_session.default_bucket(),\n", "        'TargetPlatform': { 'Os': 'LINUX', 'Arch': 'X86_64' }\n", "    },\n", "    StoppingCondition={ 'MaxRuntimeInSeconds': 900 }\n", ")\n", "# Poll until the compilation job leaves the STARTING/INPROGRESS states\n", "while True:\n", "    resp = sm_client.describe_compilation_job(CompilationJobName=compilation_job_name)\n", "    if resp['CompilationJobStatus'] in ['STARTING', 'INPROGRESS']:\n", "        print('Running...')\n", "    else:\n", "        print(resp['CompilationJobStatus'], compilation_job_name)\n", "        break\n", "    time.sleep(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Building the Deployment Package 
with SageMaker Edge Manager\n", "SageMaker Edge Manager will sign the model and create a deployment package with:\n", " - The optimized model\n", " - Model Metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "model_version = '1.0'\n", "model_name = 'WindTurbineAnomalyDetection'\n", "edge_packaging_job_name='wind-turbine-anomaly-%d' % int(time.time()*1000)\n", "resp = sm_client.create_edge_packaging_job(\n", "    EdgePackagingJobName=edge_packaging_job_name,\n", "    CompilationJobName=compilation_job_name,\n", "    ModelName=model_name,\n", "    ModelVersion=model_version,\n", "    RoleArn=role,\n", "    OutputConfig={\n", "        'S3OutputLocation': 's3://%s/%s/model/' % (bucket_name, prefix)\n", "    }\n", ")\n", "# Poll until the edge packaging job leaves the STARTING/INPROGRESS states\n", "while True:\n", "    resp = sm_client.describe_edge_packaging_job(EdgePackagingJobName=edge_packaging_job_name)\n", "    if resp['EdgePackagingJobStatus'] in ['STARTING', 'INPROGRESS']:\n", "        print('Running...')\n", "    else:\n", "        # Report the packaging job's own name (not the compilation job's)\n", "        print(resp['EdgePackagingJobStatus'], edge_packaging_job_name)\n", "        break\n", "    time.sleep(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deploy the package\n", "Using IoT Jobs, we will notify the Python application running on the edge devices. 
The application will:\n", " - Download the deployment package\n", " - Unpack it\n", " - Load the new model (unloading previous versions, if any)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import json\n", "import sagemaker\n", "import uuid\n", "\n", "iot_client = boto3.client('iot')\n", "sts_client = boto3.client('sts')\n", "\n", "model_version = '1.0'\n", "model_name = 'WindTurbineAnomalyDetection'\n", "sagemaker_session=sagemaker.Session()\n", "region_name = sagemaker_session.boto_session.region_name\n", "account_id = sts_client.get_caller_identity()[\"Account\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "resp = iot_client.create_job(\n", " jobId=str(uuid.uuid4()),\n", " targets=[\n", " 'arn:aws:iot:%s:%s:thinggroup/WindTurbineFarm-%s' % (region_name, account_id, project_id), \n", " ],\n", " document=json.dumps({\n", " 'type': 'new_model',\n", " 'model_version': model_version,\n", " 'model_name': model_name,\n", " 'model_package_bucket': bucket_name,\n", " 'model_package_key': \"%s/model/%s-%s.tar.gz\" % (prefix, model_name, model_version) \n", " }),\n", " targetSelection='SNAPSHOT'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alright! Now, the deployment process will start on the connected edge devices!\n", "\n", "## Step 3 - Running the fleet of Virtual Wind Turbines and Edge Devices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section you will run a local application written in Python 3 that simulates 5 wind turbines and 5 edge devices. The SageMaker Edge Agent is deployed on the edge devices.\n", "\n", "Here you'll be the **Wind Turbine Farm Operator**. You can visualize the data flowing from the sensors to the ML model and analyze the anomalies. You can also inject noise into the data (by pressing some buttons) to simulate potential anomalies in the equipment.\n", "\n", "
Figures (images not available): ARCHITECTURE; PYTHON CLASS STRUCTURE in DEMO.\n",
"