{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# **Amazon Lookout for Equipment** - Getting started\n", "*Part 5 - Scheduling regular inference calls*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialization\n", "---\n", "This repository is structured as follow:\n", "\n", "```sh\n", ". lookout-equipment-demo\n", "|\n", "├── data/\n", "| ├── interim # Temporary intermediate data\n", "| ├── processed # Finalized datasets\n", "| └── raw # Immutable original data\n", "|\n", "├── getting_started/\n", "| ├── 1_data_preparation.ipynb\n", "| ├── 2_dataset_creation.ipynb\n", "| ├── 3_model_training.ipynb\n", "| ├── 4_model_evaluation.ipynb\n", "| ├── 5_inference_scheduling.ipynb <<< THIS NOTEBOOK <<<\n", "| ├── 6_visualization_with_quicksight.ipynb\n", "| └── 7_cleanup.ipynb\n", "|\n", "└── utils/\n", " └── lookout_equipment_utils.py\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Notebook configuration update" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m" ] } ], "source": [ "!pip install --quiet --upgrade sagemaker lookoutequipment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import config\n", "import datetime\n", "import matplotlib.pyplot as plt\n", "import matplotlib.ticker as mtick\n", "import numpy as np\n", "import os\n", "import pandas as pd\n", "import pytz\n", "import sagemaker\n", "import sys\n", "import time\n", "\n", "from matplotlib.gridspec import GridSpec\n", "\n", "# SDK / toolbox for managing Lookout for Equipment API calls:\n", "import lookoutequipment as lookout" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### AWS Look & Feel definition for Matplotlib" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from matplotlib import font_manager\n", "\n", "# Load style sheet:\n", "plt.style.use('../utils/aws_matplotlib_template.py')\n", "\n", "# Get colors from custom AWS palette:\n", "prop_cycle = plt.rcParams['axes.prop_cycle']\n", "colors = prop_cycle.by_key()['color']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "TMP_DATA = os.path.join('..', 'data', 'interim', 'getting-started')\n", "PROCESSED_DATA = os.path.join('..', 'data', 'processed', 'getting-started')\n", "INFERENCE_DATA = os.path.join(PROCESSED_DATA, 'inference-data')\n", "TRAIN_DATA = os.path.join(PROCESSED_DATA, 'training-data', 'centrifugal-pump')\n", "\n", "os.makedirs(INFERENCE_DATA, exist_ok=True)\n", "os.makedirs(os.path.join(INFERENCE_DATA, 'input'), exist_ok=True)\n", "os.makedirs(os.path.join(INFERENCE_DATA, 'output'), exist_ok=True)\n", "\n", "ROLE_ARN = sagemaker.get_execution_role()\n", "REGION_NAME = boto3.session.Session().region_name\n", "BUCKET = config.BUCKET\n", "PREFIX = config.PREFIX_INFERENCE\n", "INFERENCE_SCHEDULER_NAME = config.INFERENCE_SCHEDULER_NAME\n", "MODEL_NAME = config.MODEL_NAME\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "## Create an inference scheduler\n", "---\n", "While navigating to the model details part of the console, you will see that you have no inference scheduled yet:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scheduler configuration\n", "Let's create a new inference schedule: some parameters are mandatory, while others offer some added flexibility.\n", "\n", "#### Mandatory Parameters\n", "\n", "* Set `upload_frequency` at which the data will be uploaded for inference. Allowed values are `PT5M`, `PT10M`, `PT15M`, `PT30M` and `PT1H`.\n", " * This is both the frequency of the inference scheduler and how often data are uploaded to the source bucket.\n", " * **Note**: ***the upload frequency must be compatible with the sampling rate selected at training time.*** *For example, if a model was trained with a 30 minutes resampling, asking for 5 minutes won't work and you need to select either PT30M and PT1H for this parameter at inference time.*\n", "* Set `input_bucket` to the S3 bucket of your inference data\n", "* Set `input_prefix` to the S3 prefix of your inference data\n", "* Set `output_bucket` to the S3 bucket where you want inference results\n", "* Set `output_prefix` to the S3 prefix where you want inference results\n", "* Set `role_arn` to the role to be used to **read** data to infer on and **write** inference output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Time zone parameter (optional)\n", "\n", "You can set `INPUT_TIMEZONE_OFFSET` to the following allowed values: `+00:00`, `+00:30`, `+01:00`, ... `+11:30`, `+12:00`, `-00:00`, `-00:30`, `-01:00`, ... `-11:30`, `-12:00`.\n", "\n", "This is the timezone the scheduler will use to find the input files to run inference for. A timezone's offset refers to how many hours the timezone is from Coordinated Universal Time (UTC).\n", "\n", "Let's take an example:\n", "* The current date April 5th, 2021 and time is 1pm UTC\n", "* You're in India, which is 5 hour 30 ahead of UTC and you set the `INPUT_TIMEZONE_OFFSET` to `+05:30`\n", "* If the scheduler wakes up at 1pm UTC, A filename called 20210405**1830**00 will be found (1pm + 5H30 = 6.30pm)\n", "\n", "Use the following cell to convert time zone identifier (`Europe/Paris`, `US/Central`...) to a time zone offset. You can build a timezone object by leveraging the World Timezone Definition **[available here](https://gist.github.com/heyalexej/8bf688fd67d7199be4a1682b3eec7568)** or by listing the available ones using this code snippet:\n", "```python\n", "import pytz\n", "for tz in pytz.all_timezones:\n", " print tz\n", "```\n", "If you want to use universal time, replace the timezone string below (`Asia/Calcutta`) by `UTC`:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'+00:00'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utc_timezone = pytz.timezone(\"UTC\")\n", "\n", "# current_timezone = pytz.timezone(\"Asia/Calcutta\")\n", "current_timezone = pytz.timezone(\"UTC\")\n", "tz_offset = datetime.datetime.now(current_timezone).strftime('%z')\n", "tz_offset = tz_offset[:3] + ':' + tz_offset[3:]\n", "tz_offset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Other optional parameters\n", "\n", "* Set `delay_offset` to the number of minutes you expect the data to be delayed to upload. It's a time buffer to upload data.\n", "* Set `timestamp_format`. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Other optional parameters\n", "\n", "* Set `delay_offset` to the number of minutes by which you expect the data upload to be delayed: it acts as a time buffer for the upload.\n", "* Set `timestamp_format`. The allowed values are `EPOCH`, `yyyy-MM-dd-HH-mm-ss` or `yyyyMMddHHmmss`. This is the format of the timestamp suffix in the input data file names. Lookout for Equipment uses it to understand which files to run inference on (so that you don't need to remove previous files to let the scheduler find which ones to run on).\n", "* Set `component_delimiter`. The allowed values are `-`, `_` or ` `. This is the delimiter character used to separate the component name from the timestamp in the input filename." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the inference scheduler\n", "The CreateInferenceScheduler API creates a scheduler. The following code prepares the configuration but does not create the scheduler just yet:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "scheduler = lookout.LookoutEquipmentScheduler(\n", " scheduler_name=INFERENCE_SCHEDULER_NAME,\n", " model_name=MODEL_NAME\n", ")\n", "\n", "scheduler_params = {\n", " 'input_bucket': BUCKET,\n", " 'input_prefix': f'{PREFIX}/input/',\n", " 'output_bucket': BUCKET,\n", " 'output_prefix': f'{PREFIX}/output/',\n", " 'role_arn': ROLE_ARN,\n", " 'upload_frequency': 'PT5M',\n", " 'delay_offset': None,\n", " 'timezone_offset': tz_offset,\n", " 'component_delimiter': '_',\n", " 'timestamp_format': 'yyyyMMddHHmmss'\n", "}\n", "\n", "scheduler.set_parameters(**scheduler_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare the inference data\n", "---\n", "Let's prepare and send some data to the S3 input location our scheduler will monitor: we are going to extract 12 sequences of 5 minutes each (5 minutes being the minimum scheduler frequency). We assume that data are sampled at a rate of one data point per minute, meaning that each sequence will be a CSV file with 5 rows (to match the scheduler frequency). We have set aside a file we can use for inference: we need to update its timestamps to match the current date and time, and then split it into individual datasets of 5 rows each."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Load the original inference data:\n", "inference_fname = os.path.join(TMP_DATA, 'inference-data', 'inference.csv')\n", "inference_df = pd.read_csv(inference_fname)\n", "inference_df['Timestamp'] = pd.to_datetime(inference_df['Timestamp'])\n", "inference_df = inference_df.set_index('Timestamp')\n", "\n", "# How many sequences do we want to extract:\n", "num_sequences = 12\n", "\n", "# The scheduling frequency in minutes: this **MUST** match the\n", "# resampling rate used to train the model:\n", "frequency = 5\n", "start = inference_df.index.min()\n", "for i in range(num_sequences):\n", " end = start + datetime.timedelta(minutes=+frequency - 1)\n", " inference_input = inference_df.loc[start:end, :]\n", " start = start + datetime.timedelta(minutes=+frequency)\n", " \n", " # Rounding time to the previous X minutes \n", " # where X is the selected frequency:\n", " filename_tm = datetime.datetime.now(current_timezone)\n", " filename_tm = filename_tm - datetime.timedelta(\n", " minutes=filename_tm.minute % frequency,\n", " seconds=filename_tm.second,\n", " microseconds=filename_tm.microsecond\n", " )\n", " filename_tm = filename_tm + datetime.timedelta(minutes=+frequency * (i))\n", " current_timestamp = (filename_tm).strftime(format='%Y%m%d%H%M%S')\n", " \n", " # The timestamp inside the file are in UTC and are not linked to the current timezone:\n", " timestamp_tm = datetime.datetime.now(utc_timezone)\n", " timestamp_tm = timestamp_tm - datetime.timedelta(\n", " minutes=timestamp_tm.minute % frequency,\n", " seconds=timestamp_tm.second,\n", " microseconds=timestamp_tm.microsecond\n", " )\n", " timestamp_tm = timestamp_tm + datetime.timedelta(minutes=+frequency * (i))\n", " \n", " # We need to reset the index to match the time \n", " # at which the scheduler will run inference:\n", " new_index = pd.date_range(\n", " start=timestamp_tm,\n", " periods=inference_input.shape[0], \n", " freq='1min'\n", " )\n", " inference_input.index = new_index\n", " inference_input.index.name = 'Timestamp'\n", " inference_input = inference_input.reset_index()\n", " inference_input['Timestamp'] = inference_input['Timestamp'].dt.strftime('%Y-%m-%dT%H:%M:%S.%f')\n", " \n", " # Export this file in CSV format:\n", " scheduled_fname = os.path.join(INFERENCE_DATA, 'input', f'centrifugal-pump_{current_timestamp}.csv')\n", " inference_input.to_csv(scheduled_fname, index=None)\n", " \n", "# Upload the whole folder to S3, in the input location:\n", "!aws s3 cp --recursive --quiet $INFERENCE_DATA/input s3://$BUCKET/$PREFIX/input" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our S3 bucket is now in the following state: this emulates what you could expect if your industrial information system was sending a new sample of data every five minutes.\n", "\n", "Note how:\n", "* Every files are located in the same folder\n", "* Each file has the recorded timestamp in its name\n", "* The timestamps are rounding to the closest 5 minutes (as our scheduler is configured to wake up every 5 minutes)\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've prepared the data, we can create the scheduler by running:\n", "\n", "```python\n", "create_scheduler_response = lookout_client.create_inference_scheduler({\n", " 'ClientToken': uuid.uuid4().hex\n", "})\n", "```\n", "\n", "The following method encapsulates the call to the 
"The following method encapsulates the call to the [**CreateInferenceScheduler**](https://docs.aws.amazon.com/lookout-for-equipment/latest/ug/API_CreateInferenceScheduler.html) API:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This scheduler already exists. Try changing its name and retry, or try to start it.\n" ] } ], "source": [ "create_scheduler_response = scheduler.create()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our scheduler is now running and its inference history is currently empty:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get inference results\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List inference executions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's now wait for 5-15 minutes to give the scheduler some time to run its first inferences.** Once the wait is over, we can use the ListInferenceExecutions API for our current inference scheduler. The only mandatory parameter is the scheduler name.\n", "\n", "You can also choose a time period for which you want to query inference executions. If you don't specify it, then all executions for an inference scheduler will be listed. If you want to specify a time range, you can do this:\n", "\n", "```python\n", "START_TIME_FOR_INFERENCE_EXECUTIONS = datetime.datetime(2010,1,3,0,0,0)\n", "END_TIME_FOR_INFERENCE_EXECUTIONS = datetime.datetime(2010,1,5,0,0,0)\n", "```\n", "\n", "This means the executions after `2010-01-03 00:00:00` and before `2010-01-05 00:00:00` will be listed.\n", "\n", "You can also choose to query for executions in a particular status; the allowed statuses are `IN_PROGRESS`, `SUCCESS` and `FAILED`.\n", "\n", "The following cell uses `scheduler.list_inference_executions()` as a wrapper around the [**ListInferenceExecutions**](https://docs.aws.amazon.com/lookout-for-equipment/latest/ug/API_ListInferenceExecutions.html) API:\n", "\n", "```python\n", "list_executions_response = lookout_client.list_inference_executions(\n", "    MaxResults=50,\n", "    InferenceSchedulerName=INFERENCE_SCHEDULER_NAME,\n", "    Status=EXECUTION_STATUS,\n", "    DataStartTimeAfter=START_TIME_FOR_INFERENCE_EXECUTIONS,\n", "    DataEndTimeBefore=END_TIME_FOR_INFERENCE_EXECUTIONS\n", ")\n", "```" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FIRST INFERENCE EXECUTED\n", "\n", "26 inference execution(s) found.\n", "Displaying the first three:\n" ] }, { "data": { "text/plain": [ "[{'ModelName': 'getting-started-pump-model',\n", " 'ModelArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:model/getting-started-pump-model/07a0f7e9-0fb1-42e8-94d8-b4cd92778ebb',\n", " 'InferenceSchedulerName': 'getting-started-pump-scheduler',\n", " 'InferenceSchedulerArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:inference-scheduler/getting-started-pump-scheduler/8098a92f-9049-44d6-9c68-aa5cd0c69559',\n", " 'ScheduledStartTime': datetime.datetime(2022, 5, 13, 12, 15, tzinfo=tzlocal()),\n", " 'DataStartTime': datetime.datetime(2022, 5, 13, 12, 10, tzinfo=tzlocal()),\n", " 'DataEndTime': datetime.datetime(2022, 5, 13, 12, 15, tzinfo=tzlocal()),\n", " 'DataInputConfiguration': {'S3InputConfiguration': {'Bucket': 'lookout-equipment-poc',\n", " 'Prefix': 'getting_started/inference-data/input/'}},\n", " 'DataOutputConfiguration': {'S3OutputConfiguration': {'Bucket': 'lookout-equipment-poc',\n", " 
'Prefix': 'getting_started/inference-data/output/'}},\n", " 'Status': 'FAILED',\n", " 'FailedReason': \"Could not find data file for sensor centrifugal-pump within expected range of (2022-05-13T12:10:00Z, 2022-05-13T12:15:00Z]. Please ensure data is present for all sensors of the machine under prefix 's3://getting_started/inference-data/input/getting_started/inference-data/input/centrifugal-pump_20220513120500' ~ 's3://getting_started/inference-data/input/getting_started/inference-data/input/centrifugal-pump_20220513122000'\"},\n", " {'ModelName': 'getting-started-pump-model',\n", " 'ModelArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:model/getting-started-pump-model/07a0f7e9-0fb1-42e8-94d8-b4cd92778ebb',\n", " 'InferenceSchedulerName': 'getting-started-pump-scheduler',\n", " 'InferenceSchedulerArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:inference-scheduler/getting-started-pump-scheduler/8098a92f-9049-44d6-9c68-aa5cd0c69559',\n", " 'ScheduledStartTime': datetime.datetime(2022, 5, 13, 12, 10, tzinfo=tzlocal()),\n", " 'DataStartTime': datetime.datetime(2022, 5, 13, 12, 5, tzinfo=tzlocal()),\n", " 'DataEndTime': datetime.datetime(2022, 5, 13, 12, 10, tzinfo=tzlocal()),\n", " 'DataInputConfiguration': {'S3InputConfiguration': {'Bucket': 'lookout-equipment-poc',\n", " 'Prefix': 'getting_started/inference-data/input/'}},\n", " 'DataOutputConfiguration': {'S3OutputConfiguration': {'Bucket': 'lookout-equipment-poc',\n", " 'Prefix': 'getting_started/inference-data/output/'}},\n", " 'Status': 'FAILED',\n", " 'FailedReason': \"Could not find data file for sensor centrifugal-pump within expected range of (2022-05-13T12:05:00Z, 2022-05-13T12:10:00Z]. Please ensure data is present for all sensors of the machine under prefix 's3://getting_started/inference-data/input/getting_started/inference-data/input/centrifugal-pump_20220513120000' ~ 's3://getting_started/inference-data/input/getting_started/inference-data/input/centrifugal-pump_20220513121500'\"},\n", " {'ModelName': 'getting-started-pump-model',\n", " 'ModelArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:model/getting-started-pump-model/07a0f7e9-0fb1-42e8-94d8-b4cd92778ebb',\n", " 'InferenceSchedulerName': 'getting-started-pump-scheduler',\n", " 'InferenceSchedulerArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:inference-scheduler/getting-started-pump-scheduler/8098a92f-9049-44d6-9c68-aa5cd0c69559',\n", " 'ScheduledStartTime': datetime.datetime(2022, 5, 13, 12, 5, tzinfo=tzlocal()),\n", " 'DataStartTime': datetime.datetime(2022, 5, 13, 12, 0, tzinfo=tzlocal()),\n", " 'DataEndTime': datetime.datetime(2022, 5, 13, 12, 5, tzinfo=tzlocal()),\n", " 'DataInputConfiguration': {'S3InputConfiguration': {'Bucket': 'lookout-equipment-poc',\n", " 'Prefix': 'getting_started/inference-data/input/'}},\n", " 'DataOutputConfiguration': {'S3OutputConfiguration': {'Bucket': 'lookout-equipment-poc',\n", " 'Prefix': 'getting_started/inference-data/output/'}},\n", " 'Status': 'FAILED',\n", " 'FailedReason': \"Could not find data file for sensor centrifugal-pump within expected range of (2022-05-13T12:00:00Z, 2022-05-13T12:05:00Z]. 
Please ensure data is present for all sensors of the machine under prefix 's3://getting_started/inference-data/input/getting_started/inference-data/input/centrifugal-pump_20220513115500' ~ 's3://getting_started/inference-data/input/getting_started/inference-data/input/centrifugal-pump_20220513121000'\"}]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "START_TIME_FOR_INFERENCE_EXECUTIONS = None\n", "END_TIME_FOR_INFERENCE_EXECUTIONS = None\n", "EXECUTION_STATUS = None\n", "\n", "execution_summaries = []\n", "\n", "while len(execution_summaries) == 0:\n", " execution_summaries = scheduler.list_inference_executions(\n", " start_time=START_TIME_FOR_INFERENCE_EXECUTIONS,\n", " end_time=END_TIME_FOR_INFERENCE_EXECUTIONS,\n", " execution_status=EXECUTION_STATUS\n", " )\n", " if len(execution_summaries) == 0:\n", " print('WAITING FOR THE FIRST INFERENCE EXECUTION')\n", " time.sleep(60)\n", " \n", " else:\n", " print('FIRST INFERENCE EXECUTED\\n')\n", " break\n", " \n", "print(len(execution_summaries), 'inference execution(s) found.')\n", "print('Displaying the first three:')\n", "execution_summaries[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have configured this scheduler to run every five minutes. After at least 5 minutes, we can also see the history in the console populated with its first few executions. After an hour or so, we will see the last ones fail, as we only generated 12 files above:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When the scheduler starts (for example at `datetime.datetime(2021, 1, 27, 9, 15)`), it looks for **a single** CSV file located in the input location with a filename that contains a timestamp set to the previous time step. For example, a file named:\n", "\n", "* centrifugal-pump_2021012709**10**00.csv will be found and ingested\n", "* centrifugal-pump_2021012709**15**00.csv will **not** be ingested (it will be ingested at the next inference execution, however)\n", "\n", "In addition, when opening the file `centrifugal-pump_20210127091000.csv`, the scheduler will also open the files immediately before and after this execution time: it will then look for any row with a date between the `DataStartTime` and the `DataEndTime` of the inference execution. If it doesn't find such a row in any of these three files, an exception will be thrown and the execution will be marked `FAILED`." ] }, 
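{ "cell_type": "markdown", "metadata": {}, "source": [ "Here is a minimal sketch of this file-matching rule, reusing the wake-up time from the example above together with our 5-minute upload frequency:\n", "\n", "```python\n", "import datetime\n", "\n", "# Wake-up time from the example above, with a 5-minute upload frequency:\n", "wake_up_time = datetime.datetime(2021, 1, 27, 9, 15)\n", "frequency = 5\n", "\n", "# The scheduler looks for the file stamped with the previous time step;\n", "# this also defines the (DataStartTime, DataEndTime] window of the execution:\n", "data_start_time = wake_up_time - datetime.timedelta(minutes=frequency)\n", "data_end_time = wake_up_time\n", "print(data_start_time.strftime('centrifugal-pump_%Y%m%d%H%M%S.csv'))\n", "# centrifugal-pump_20210127091000.csv\n", "```" ] }, 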
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Download inference results\n", "Let's have a look at the content now available in the scheduler output location: each inference execution creates a subfolder in the output directory. The subfolder name is the timestamp (GMT) at which the inference was executed and it contains a single [JSON Lines](https://jsonlines.org/) file named `results.jsonl`:\n", "\n", "\n", "\n", "Each execution summary is a JSON document that has the following format:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'ModelName': 'getting-started-pump-model',\n", " 'ModelArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:model/getting-started-pump-model/07a0f7e9-0fb1-42e8-94d8-b4cd92778ebb',\n", " 'InferenceSchedulerName': 'getting-started-pump-scheduler',\n", " 'InferenceSchedulerArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:inference-scheduler/getting-started-pump-scheduler/8098a92f-9049-44d6-9c68-aa5cd0c69559',\n", " 'ScheduledStartTime': datetime.datetime(2022, 5, 13, 12, 15, tzinfo=tzlocal()),\n", " 'DataStartTime': datetime.datetime(2022, 5, 13, 12, 10, tzinfo=tzlocal()),\n", " 'DataEndTime': datetime.datetime(2022, 5, 13, 12, 15, tzinfo=tzlocal()),\n", " 'DataInputConfiguration': {'S3InputConfiguration': {'Bucket': 'lookout-equipment-poc',\n", " 'Prefix': 'getting_started/inference-data/input/'}},\n", " 'DataOutputConfiguration': {'S3OutputConfiguration': {'Bucket': 'lookout-equipment-poc',\n", " 'Prefix': 'getting_started/inference-data/output/'}},\n", " 'Status': 'FAILED',\n", " 'FailedReason': \"Could not find data file for sensor centrifugal-pump within expected range of (2022-05-13T12:10:00Z, 2022-05-13T12:15:00Z]. Please ensure data is present for all sensors of the machine under prefix 's3://getting_started/inference-data/input/getting_started/inference-data/input/centrifugal-pump_20220513120500' ~ 's3://getting_started/inference-data/input/getting_started/inference-data/input/centrifugal-pump_20220513122000'\"}" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "execution_summaries[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When the `Status` key from the previous JSON result is set to `SUCCESS`, you can collect the result location from the `CustomerResultObject` field. We are now going to loop through each execution result, download each JSON Lines file generated by the scheduler, and insert their results into an overall dataframe for further analysis." ] }, 
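{ "cell_type": "markdown", "metadata": {}, "source": [ "As an illustration, here is a minimal sketch of what downloading and parsing a single result file amounts to, assuming a successful execution whose summary exposes the S3 `Bucket` and `Key` of its `results.jsonl` file in the `CustomerResultObject` field:\n", "\n", "```python\n", "import boto3\n", "import os\n", "import pandas as pd\n", "\n", "s3 = boto3.client('s3')\n", "\n", "# Hypothetical: location of the results file for the first execution,\n", "# assuming that execution was successful:\n", "result_object = execution_summaries[0]['CustomerResultObject']\n", "local_fname = os.path.join(INFERENCE_DATA, 'output', 'results.jsonl')\n", "s3.download_file(result_object['Bucket'], result_object['Key'], local_fname)\n", "\n", "# Each line of the file is a standalone JSON document with the predictions:\n", "results_df = pd.read_json(local_fname, lines=True)\n", "```\n", "\n", "The next cell relies on the toolbox to do this for every execution and assemble an overall dataframe:" ] }, 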
\n", " | prediction | \n", "anomaly_score | \n", "centrifugal-pump\\Sensor0 | \n", "centrifugal-pump\\Sensor1 | \n", "centrifugal-pump\\Sensor2 | \n", "centrifugal-pump\\Sensor3 | \n", "centrifugal-pump\\Sensor4 | \n", "centrifugal-pump\\Sensor5 | \n", "centrifugal-pump\\Sensor6 | \n", "centrifugal-pump\\Sensor7 | \n", "... | \n", "centrifugal-pump\\Sensor14 | \n", "centrifugal-pump\\Sensor15 | \n", "centrifugal-pump\\Sensor16 | \n", "centrifugal-pump\\Sensor17 | \n", "centrifugal-pump\\Sensor18 | \n", "centrifugal-pump\\Sensor19 | \n", "centrifugal-pump\\Sensor20 | \n", "centrifugal-pump\\Sensor21 | \n", "centrifugal-pump\\Sensor22 | \n", "centrifugal-pump\\Sensor23 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
timestamp | \n", "\n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " |
2022-05-13 10:06:00 | \n", "1 | \n", "0.92249 | \n", "0.02000 | \n", "0.02000 | \n", "0.12000 | \n", "0.18000 | \n", "0.14000 | \n", "0.00000 | \n", "0.00000 | \n", "0.00000 | \n", "... | \n", "0.08000 | \n", "0.00000 | \n", "0.00000 | \n", "0.02000 | \n", "0.02000 | \n", "0.02000 | \n", "0.0 | \n", "0.00000 | \n", "0.00000 | \n", "0.00000 | \n", "
2022-05-13 10:07:00 | \n", "1 | \n", "0.94968 | \n", "0.04361 | \n", "0.00819 | \n", "0.04916 | \n", "0.07374 | \n", "0.05735 | \n", "0.02361 | \n", "0.12987 | \n", "0.08265 | \n", "... | \n", "0.04458 | \n", "0.00000 | \n", "0.00000 | \n", "0.04361 | \n", "0.02000 | \n", "0.04361 | \n", "0.0 | \n", "0.01181 | \n", "0.00000 | \n", "0.00000 | \n", "
2022-05-13 10:08:00 | \n", "1 | \n", "0.95342 | \n", "0.06348 | \n", "0.01940 | \n", "0.03184 | \n", "0.06185 | \n", "0.05124 | \n", "0.04348 | \n", "0.08410 | \n", "0.05352 | \n", "... | \n", "0.03592 | \n", "0.00000 | \n", "0.13392 | \n", "0.02824 | \n", "0.01295 | \n", "0.02824 | \n", "0.0 | \n", "0.00765 | \n", "0.00000 | \n", "0.00000 | \n", "
2022-05-13 10:09:00 | \n", "1 | \n", "0.93590 | \n", "0.05433 | \n", "0.01660 | \n", "0.04744 | \n", "0.05293 | \n", "0.04385 | \n", "0.05164 | \n", "0.07197 | \n", "0.04580 | \n", "... | \n", "0.04516 | \n", "0.00577 | \n", "0.13480 | \n", "0.02417 | \n", "0.01108 | \n", "0.02417 | \n", "0.0 | \n", "0.00654 | \n", "0.00000 | \n", "0.00577 | \n", "
2022-05-13 10:10:00 | \n", "1 | \n", "0.90920 | \n", "0.05577 | \n", "0.01734 | \n", "0.04974 | \n", "0.05125 | \n", "0.04246 | \n", "0.05000 | \n", "0.07159 | \n", "0.04815 | \n", "... | \n", "0.04437 | \n", "0.00622 | \n", "0.13053 | \n", "0.02530 | \n", "0.01073 | \n", "0.02340 | \n", "0.0 | \n", "0.00634 | \n", "0.00063 | \n", "0.00749 | \n", "
5 rows × 32 columns
\n", "