{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **Amazon Lookout for Equipment** - Getting started\n",
    "*Part 2 - Dataset creation*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialization\n",
    "---\n",
    "This repository is structured as follow:\n",
    "\n",
    "```sh\n",
    ". lookout-equipment-demo\n",
    "|\n",
    "├── data/\n",
    "|   ├── interim                          # Temporary intermediate data\n",
    "|   ├── processed                        # Finalized datasets\n",
    "|   └── raw                              # Immutable original data\n",
    "|\n",
    "├── getting_started/\n",
    "|   ├── 1_data_preparation.ipynb\n",
    "|   ├── 2_dataset_creation.ipynb               <<< THIS NOTEBOOK <<<\n",
    "|   ├── 3_model_training.ipynb\n",
    "|   ├── 4_model_evaluation.ipynb\n",
    "|   ├── 5_inference_scheduling.ipynb\n",
    "|   ├── 6_visualization_with_quicksight.ipynb\n",
    "|   └── 7_cleanup.ipynb\n",
    "|\n",
    "└── utils/\n",
    "    └── lookout_equipment_utils.py\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Notebook configuration update"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
      "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!pip install --quiet --upgrade pip\n",
    "!pip install --quiet --upgrade sagemaker lookoutequipment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import config\n",
    "import os\n",
    "import pandas as pd\n",
    "import pprint\n",
    "import sagemaker\n",
    "import sys\n",
    "import time\n",
    "\n",
    "from datetime import datetime\n",
    "\n",
    "# SDK / toolbox for managing Lookout for Equipment API calls:\n",
    "import lookoutequipment as lookout"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROCESSED_DATA = os.path.join('..', 'data', 'processed', 'getting-started')\n",
    "TRAIN_DATA     = os.path.join(PROCESSED_DATA, 'training-data')\n",
    "\n",
    "ROLE_ARN       = sagemaker.get_execution_role()\n",
    "REGION_NAME    = boto3.session.Session().region_name\n",
    "DATASET_NAME   = config.DATASET_NAME\n",
    "BUCKET         = config.BUCKET\n",
    "PREFIX         = config.PREFIX_TRAINING"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a dataset\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create data schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "lookout_dataset = lookout.LookoutEquipmentDataset(\n",
    "    dataset_name=DATASET_NAME,\n",
    "    component_root_dir=f's3://{BUCKET}/{PREFIX}',\n",
    "    access_role_arn=ROLE_ARN\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following method encapsulate the [**CreateDataset**](https://docs.aws.amazon.com/lookout-for-equipment/latest/ug/API_CreateDataset.html) API:\n",
    "\n",
    "```python\n",
    "lookout_client.create_dataset(\n",
    "    DatasetName=self.dataset_name,\n",
    "    \n",
    "    # Optional\n",
    "    DatasetSchema={\n",
    "        'InlineDataSchema': \"schema\"\n",
    "    }\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset \"getting-started-pump\" does not exist, creating it...\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'DatasetName': 'getting-started-pump',\n",
       " 'DatasetArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:dataset/getting-started-pump/9f3b8a45-fa09-4e23-971d-29e0b9e30498',\n",
       " 'Status': 'CREATED',\n",
       " 'ResponseMetadata': {'RequestId': 'b8c933f1-1e0d-43f6-94d8-855c8645a350',\n",
       "  'HTTPStatusCode': 200,\n",
       "  'HTTPHeaders': {'x-amzn-requestid': 'b8c933f1-1e0d-43f6-94d8-855c8645a350',\n",
       "   'content-type': 'application/x-amz-json-1.0',\n",
       "   'content-length': '186',\n",
       "   'date': 'Fri, 13 May 2022 09:01:10 GMT'},\n",
       "  'RetryAttempts': 0}}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lookout_dataset.create()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dataset is now created, but it is empty and ready to receive some timeseries data that we will ingest from the S3 location prepared in the previous notebook:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Dataset created](assets/dataset-created.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ingest data into a dataset\n",
    "---\n",
    "Let's double check the values of all the parameters that will be used to ingest some data into an existing Lookout for Equipment dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('arn:aws:iam::038552646228:role/AmazonSageMaker-LookoutEquipmentEnv',\n",
       " 'lookout-equipment-poc',\n",
       " 'getting_started/training-data/',\n",
       " 'getting-started-pump')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ROLE_ARN, BUCKET, PREFIX, DATASET_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Launch the ingestion job in the Lookout for Equipment dataset: the following method encapsulates the [**StartDataIngestionJob**](https://docs.aws.amazon.com/lookout-for-equipment/latest/ug/API_StartDataIngestionJob.html) API:\n",
    "\n",
    "```python\n",
    "lookout_client.start_data_ingestion_job(\n",
    "    DatasetName=DATASET_NAME,\n",
    "    RoleArn=ROLE_ARN, \n",
    "    IngestionInputConfiguration={ \n",
    "        'S3InputConfiguration': { \n",
    "            'Bucket': BUCKET,\n",
    "            'Prefix': PREFIX,\n",
    "            'KeyPattern': \"string\"\n",
    "        }\n",
    "    }\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = lookout_dataset.ingest_data(BUCKET, PREFIX)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ingestion is launched. With this amount of data (around 50 MB), it should take between less than 5 minutes:\n",
    "\n",
    "![dataset_schema](assets/dataset-ingestion-in-progress.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the following cell to monitor the ingestion process by calling the following method, which encapsulates the [**DescribeDataIngestionJob**](https://docs.aws.amazon.com/lookout-for-equipment/latest/ug/API_DescribeDataIngestionJob.html) API and runs it every 60 seconds:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2022-05-13 09:04:11 | Data ingestion: IN_PROGRESS\n",
      "2022-05-13 09:05:11 | Data ingestion: IN_PROGRESS\n",
      "2022-05-13 09:06:11 | Data ingestion: IN_PROGRESS\n",
      "2022-05-13 09:07:11 | Data ingestion: SUCCESS\n"
     ]
    }
   ],
   "source": [
    "lookout_dataset.poll_data_ingestion(sleep_time=60)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In case any issue arise, you can inspect the API response available as a JSON document:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'JobId': 'af28c7e6ea53ad88e43457d8ced8ded4',\n",
       " 'DatasetArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:dataset/getting-started-pump/9f3b8a45-fa09-4e23-971d-29e0b9e30498',\n",
       " 'IngestionInputConfiguration': {'S3InputConfiguration': {'Bucket': 'lookout-equipment-poc',\n",
       "   'Prefix': 'getting_started/training-data/'}},\n",
       " 'RoleArn': 'arn:aws:iam::038552646228:role/AmazonSageMaker-LookoutEquipmentEnv',\n",
       " 'CreatedAt': datetime.datetime(2022, 5, 13, 9, 3, 8, 123000, tzinfo=tzlocal()),\n",
       " 'Status': 'SUCCESS',\n",
       " 'DataQualitySummary': {'InsufficientSensorData': {'MissingCompleteSensorData': {'AffectedSensorCount': 0},\n",
       "   'SensorsWithShortDateRange': {'AffectedSensorCount': 0}},\n",
       "  'MissingSensorData': {'AffectedSensorCount': 0,\n",
       "   'TotalNumberOfMissingValues': 0},\n",
       "  'InvalidSensorData': {'AffectedSensorCount': 0,\n",
       "   'TotalNumberOfInvalidValues': 0},\n",
       "  'UnsupportedTimestamps': {'TotalNumberOfUnsupportedTimestamps': 0},\n",
       "  'DuplicateTimestamps': {'TotalNumberOfDuplicateTimestamps': 0}},\n",
       " 'IngestedFilesSummary': {'TotalNumberOfFiles': 1,\n",
       "  'IngestedNumberOfFiles': 1,\n",
       "  'DiscardedFiles': []},\n",
       " 'IngestedDataSize': 51535331,\n",
       " 'DataStartTime': datetime.datetime(2019, 1, 1, 0, 0, tzinfo=tzlocal()),\n",
       " 'DataEndTime': datetime.datetime(2019, 10, 27, 23, 55, tzinfo=tzlocal()),\n",
       " 'ResponseMetadata': {'RequestId': 'a292c2c7-ddbf-48b2-9e7f-3e331fa6fd77',\n",
       "  'HTTPStatusCode': 200,\n",
       "  'HTTPHeaders': {'x-amzn-requestid': 'a292c2c7-ddbf-48b2-9e7f-3e331fa6fd77',\n",
       "   'content-type': 'application/x-amz-json-1.0',\n",
       "   'content-length': '1046',\n",
       "   'date': 'Fri, 13 May 2022 09:07:11 GMT'},\n",
       "  'RetryAttempts': 0}}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lookout_dataset.ingestion_job_response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ingestion should now be complete as can be seen in the console:\n",
    "\n",
    "![Ingestion done](assets/dataset-ingestion-done.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspecting sensor data quality\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can now inspect the data quality of your dataset by clicking on `View dataset`. In this new screen, you will be able to visualize:\n",
    "* Your dataset details with a summary of their grade. In our case, 22 sensors are marked as **High quality* while 8 sensors are marked as **Medium quality**\n",
    "* The total number of sensors ingested\n",
    "* The overall date range\n",
    "* The location of the data source on S3\n",
    "\n",
    "You then have a table with a row for each sensor where you can see the overall date range, the number of days of available data and the sensor grade. Hovering your mouse over a given sensor grade will give you the explanations linked to this grading. In the example below, you can see that Sensor0 was graded as Medium because multiple operating modes are detected. You will be able to use every sensors ingested, but the Lookout for Equipment console gives you some pieces of advice and warns about situations where bad performance may arise further down the road. To read about all the sensor grades the service checks out, [follow this link](https://docs.aws.amazon.com//lookout-for-equipment/latest/ug/reading-details-by-sensor.html):\n",
    "\n",
    "![Ingestion done](assets/dataset-inspection.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can obtain these detailed information by querying the [ListSensorStatistics](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/lookoutequipment.html#LookoutEquipment.Client.list_sensor_statistics) API:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(30, 17)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ComponentName</th>\n",
       "      <th>SensorName</th>\n",
       "      <th>DataExists</th>\n",
       "      <th>DataStartTime</th>\n",
       "      <th>DataEndTime</th>\n",
       "      <th>MissingValues.Count</th>\n",
       "      <th>MissingValues.Percentage</th>\n",
       "      <th>InvalidValues.Count</th>\n",
       "      <th>InvalidValues.Percentage</th>\n",
       "      <th>InvalidDateEntries.Count</th>\n",
       "      <th>InvalidDateEntries.Percentage</th>\n",
       "      <th>DuplicateTimestamps.Count</th>\n",
       "      <th>DuplicateTimestamps.Percentage</th>\n",
       "      <th>CategoricalValues.Status</th>\n",
       "      <th>MultipleOperatingModes.Status</th>\n",
       "      <th>LargeTimestampGaps.Status</th>\n",
       "      <th>MonotonicValues.Status</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>centrifugal-pump</td>\n",
       "      <td>Sensor0</td>\n",
       "      <td>True</td>\n",
       "      <td>2019-01-01 00:00:00+00:00</td>\n",
       "      <td>2019-10-27 23:55:00+00:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>POTENTIAL_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>centrifugal-pump</td>\n",
       "      <td>Sensor1</td>\n",
       "      <td>True</td>\n",
       "      <td>2019-01-01 00:00:00+00:00</td>\n",
       "      <td>2019-10-27 23:55:00+00:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>centrifugal-pump</td>\n",
       "      <td>Sensor10</td>\n",
       "      <td>True</td>\n",
       "      <td>2019-01-01 00:00:00+00:00</td>\n",
       "      <td>2019-10-27 23:55:00+00:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>centrifugal-pump</td>\n",
       "      <td>Sensor11</td>\n",
       "      <td>True</td>\n",
       "      <td>2019-01-01 00:00:00+00:00</td>\n",
       "      <td>2019-10-27 23:55:00+00:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>centrifugal-pump</td>\n",
       "      <td>Sensor12</td>\n",
       "      <td>True</td>\n",
       "      <td>2019-01-01 00:00:00+00:00</td>\n",
       "      <td>2019-10-27 23:55:00+00:00</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "      <td>NO_ISSUE_DETECTED</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      ComponentName SensorName  DataExists             DataStartTime  \\\n",
       "0  centrifugal-pump    Sensor0        True 2019-01-01 00:00:00+00:00   \n",
       "1  centrifugal-pump    Sensor1        True 2019-01-01 00:00:00+00:00   \n",
       "2  centrifugal-pump   Sensor10        True 2019-01-01 00:00:00+00:00   \n",
       "3  centrifugal-pump   Sensor11        True 2019-01-01 00:00:00+00:00   \n",
       "4  centrifugal-pump   Sensor12        True 2019-01-01 00:00:00+00:00   \n",
       "\n",
       "                DataEndTime  MissingValues.Count  MissingValues.Percentage  \\\n",
       "0 2019-10-27 23:55:00+00:00                    0                       0.0   \n",
       "1 2019-10-27 23:55:00+00:00                    0                       0.0   \n",
       "2 2019-10-27 23:55:00+00:00                    0                       0.0   \n",
       "3 2019-10-27 23:55:00+00:00                    0                       0.0   \n",
       "4 2019-10-27 23:55:00+00:00                    0                       0.0   \n",
       "\n",
       "   InvalidValues.Count  InvalidValues.Percentage  InvalidDateEntries.Count  \\\n",
       "0                    0                       0.0                         0   \n",
       "1                    0                       0.0                         0   \n",
       "2                    0                       0.0                         0   \n",
       "3                    0                       0.0                         0   \n",
       "4                    0                       0.0                         0   \n",
       "\n",
       "   InvalidDateEntries.Percentage  DuplicateTimestamps.Count  \\\n",
       "0                            0.0                          0   \n",
       "1                            0.0                          0   \n",
       "2                            0.0                          0   \n",
       "3                            0.0                          0   \n",
       "4                            0.0                          0   \n",
       "\n",
       "   DuplicateTimestamps.Percentage CategoricalValues.Status  \\\n",
       "0                             0.0        NO_ISSUE_DETECTED   \n",
       "1                             0.0        NO_ISSUE_DETECTED   \n",
       "2                             0.0        NO_ISSUE_DETECTED   \n",
       "3                             0.0        NO_ISSUE_DETECTED   \n",
       "4                             0.0        NO_ISSUE_DETECTED   \n",
       "\n",
       "  MultipleOperatingModes.Status LargeTimestampGaps.Status  \\\n",
       "0      POTENTIAL_ISSUE_DETECTED         NO_ISSUE_DETECTED   \n",
       "1             NO_ISSUE_DETECTED         NO_ISSUE_DETECTED   \n",
       "2             NO_ISSUE_DETECTED         NO_ISSUE_DETECTED   \n",
       "3             NO_ISSUE_DETECTED         NO_ISSUE_DETECTED   \n",
       "4             NO_ISSUE_DETECTED         NO_ISSUE_DETECTED   \n",
       "\n",
       "  MonotonicValues.Status  \n",
       "0      NO_ISSUE_DETECTED  \n",
       "1      NO_ISSUE_DETECTED  \n",
       "2      NO_ISSUE_DETECTED  \n",
       "3      NO_ISSUE_DETECTED  \n",
       "4      NO_ISSUE_DETECTED  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "job_id = lookout_dataset.ingestion_job_response['JobId']\n",
    "\n",
    "response = lookout_dataset.client.list_sensor_statistics(DatasetName=DATASET_NAME, IngestionJobId=job_id)\n",
    "results = response['SensorStatisticsSummaries']\n",
    "while 'NextToken' in response:\n",
    "    response = l4e_client.list_sensor_statistics(DatasetName=DATASET_NAME, IngestionJobId=job_id, NextToken=response['NextToken'])\n",
    "    results.extend(response['SensorStatisticsSummaries'])\n",
    "    \n",
    "stats_df = pd.json_normalize(results, max_level=1)\n",
    "print(stats_df.shape)\n",
    "stats_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are all the characteristics you can get for each sensor:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ComponentName                              centrifugal-pump\n",
       "SensorName                                          Sensor0\n",
       "DataExists                                             True\n",
       "DataStartTime                     2019-01-01 00:00:00+00:00\n",
       "DataEndTime                       2019-10-27 23:55:00+00:00\n",
       "MissingValues.Count                                       0\n",
       "MissingValues.Percentage                                0.0\n",
       "InvalidValues.Count                                       0\n",
       "InvalidValues.Percentage                                0.0\n",
       "InvalidDateEntries.Count                                  0\n",
       "InvalidDateEntries.Percentage                           0.0\n",
       "DuplicateTimestamps.Count                                 0\n",
       "DuplicateTimestamps.Percentage                          0.0\n",
       "CategoricalValues.Status                  NO_ISSUE_DETECTED\n",
       "MultipleOperatingModes.Status      POTENTIAL_ISSUE_DETECTED\n",
       "LargeTimestampGaps.Status                 NO_ISSUE_DETECTED\n",
       "MonotonicValues.Status                    NO_ISSUE_DETECTED\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stats_df.iloc[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we created a **Lookout for Equipment dataset** and ingested the S3 data previously uploaded into this dataset. **Move now to the next notebook to train a model based on these data.**"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}