{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2f21e958",
   "metadata": {
    "papermill": {
     "duration": 0.021009,
     "end_time": "2022-04-18T00:13:14.040150",
     "exception": false,
     "start_time": "2022-04-18T00:13:14.019141",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Amazon SageMaker Model Monitor\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
    "\n",
    "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f21e958",
   "metadata": {
    "papermill": {
     "duration": 0.021009,
     "end_time": "2022-04-18T00:13:14.040150",
     "exception": false,
     "start_time": "2022-04-18T00:13:14.019141",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "This notebook shows how to:\n",
    "\n",
    "\u2022 Host a machine learning model in Amazon SageMaker and capture inference requests, results, and metadata \n",
    "\n",
    "\u2022 Analyze a training dataset to generate baseline constraints\n",
    "\n",
    "\u2022 Monitor a live endpoint for violations against constraints\n",
    "\n",
    "## Background\n",
    "\n",
    "Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. Amazon SageMaker is a fully-managed service that encompasses the entire machine learning workflow. You can label and prepare your data, choose an algorithm, train a model, and then tune and optimize it for deployment. You can deploy your models to production with Amazon SageMaker to make predictions and lower costs than was previously possible.\n",
    "\n",
    "In addition, Amazon SageMaker enables you to capture the input, output and metadata for invocations of the models that you deploy. It also enables you to analyze the data and monitor its quality. In this notebook, you learn how Amazon SageMaker enables these capabilities.\n",
    "\n",
    "## Runtime\n",
    "\n",
    "This notebook uses an hourly monitor, so it takes between 30-90 minutes to run.\n",
    "\n",
    "## Contents\n",
    "\n",
    "1. [PART A: Capturing real-time inference data from Amazon SageMaker endpoints](#PART-A:-Capturing-real-time-inference-data-from-Amazon-SageMaker-endpoints)\n",
    "1. [PART B: Model Monitor - Baselining and continuous monitoring](#PART-B:-Model-Monitor---Baselining-and-continuous-monitoring)\n",
    "    1. [Constraint suggestion with baseline/training dataset](#1.-Constraint-suggestion-with-baseline/training-dataset)\n",
    "    1. [Analyze collected data for data quality issues](#2.-Analyze-collected-data-for-data-quality-issues)\n",
    "\n",
    "## Setup\n",
    "\n",
    "To get started, make sure you have these prerequisites completed:\n",
    "\n",
    "* Specify an AWS Region to host your model.\n",
    "* An IAM role ARN exists that is used to give Amazon SageMaker access to your data in Amazon Simple Storage Service (Amazon S3).\n",
    "* Use the default S3 bucket to store the data used to train your model, any additional model data, and the data captured from model invocations. For demonstration purposes, you are using the same bucket for these. In reality, you might want to separate them with different security policies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "06059fc2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:13:14.089492Z",
     "iopub.status.busy": "2022-04-18T00:13:14.088656Z",
     "iopub.status.idle": "2022-04-18T00:13:15.821945Z",
     "shell.execute_reply": "2022-04-18T00:13:15.821505Z"
    },
    "isConfigCell": true,
    "papermill": {
     "duration": 1.760865,
     "end_time": "2022-04-18T00:13:15.822056",
     "exception": false,
     "start_time": "2022-04-18T00:13:14.061191",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Role ARN: arn:aws:iam::000000000000:role/ProdBuildSystemStack-ReleaseBuildRoleFB326D49-QK8LUA2UI1IC\n",
      "Demo Bucket: sagemaker-us-west-2-521695447989\n",
      "Capture path: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ModelMonitor/datacapture\n",
      "Report path: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ModelMonitor/reports\n",
      "Preproc Code path: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ModelMonitor/code/preprocessor.py\n",
      "Postproc Code path: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ModelMonitor/code/postprocessor.py\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import boto3\n",
    "import re\n",
    "import json\n",
    "import sagemaker\n",
    "from sagemaker import get_execution_role, session\n",
    "\n",
    "sm_session = sagemaker.Session()\n",
    "region = sm_session.boto_region_name\n",
    "\n",
    "role = get_execution_role()\n",
    "print(\"Role ARN: {}\".format(role))\n",
    "\n",
    "bucket = sm_session.default_bucket()\n",
    "print(\"Demo Bucket: {}\".format(bucket))\n",
    "prefix = \"sagemaker/DEMO-ModelMonitor\"\n",
    "\n",
    "data_capture_prefix = \"{}/datacapture\".format(prefix)\n",
    "s3_capture_upload_path = \"s3://{}/{}\".format(bucket, data_capture_prefix)\n",
    "reports_prefix = \"{}/reports\".format(prefix)\n",
    "s3_report_path = \"s3://{}/{}\".format(bucket, reports_prefix)\n",
    "code_prefix = \"{}/code\".format(prefix)\n",
    "s3_code_preprocessor_uri = \"s3://{}/{}/{}\".format(bucket, code_prefix, \"preprocessor.py\")\n",
    "s3_code_postprocessor_uri = \"s3://{}/{}/{}\".format(bucket, code_prefix, \"postprocessor.py\")\n",
    "\n",
    "print(\"Capture path: {}\".format(s3_capture_upload_path))\n",
    "print(\"Report path: {}\".format(s3_report_path))\n",
    "print(\"Preproc Code path: {}\".format(s3_code_preprocessor_uri))\n",
    "print(\"Postproc Code path: {}\".format(s3_code_postprocessor_uri))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c45e9a54",
   "metadata": {
    "papermill": {
     "duration": 0.021449,
     "end_time": "2022-04-18T00:13:15.865605",
     "exception": false,
     "start_time": "2022-04-18T00:13:15.844156",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## PART A: Capturing real-time inference data from Amazon SageMaker endpoints\n",
    "Create an endpoint to showcase the data capture capability in action.\n",
    "\n",
    "### Upload the pre-trained model to Amazon S3\n",
    "This code uploads a pre-trained XGBoost model that is ready for you to deploy. This model was trained using the XGB Churn Prediction Notebook in SageMaker. You can also use your own pre-trained model in this step. If you already have a pretrained model in Amazon S3, you can add it instead by specifying the s3_key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ee26d288",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:13:15.920803Z",
     "iopub.status.busy": "2022-04-18T00:13:15.912381Z",
     "iopub.status.idle": "2022-04-18T00:13:16.050565Z",
     "shell.execute_reply": "2022-04-18T00:13:16.050961Z"
    },
    "papermill": {
     "duration": 0.16406,
     "end_time": "2022-04-18T00:13:16.051105",
     "exception": false,
     "start_time": "2022-04-18T00:13:15.887045",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "model_file = open(\"model/xgb-churn-prediction-model.tar.gz\", \"rb\")\n",
    "s3_key = os.path.join(prefix, \"xgb-churn-prediction-model.tar.gz\")\n",
    "boto3.Session().resource(\"s3\").Bucket(bucket).Object(s3_key).upload_fileobj(model_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9aad4a66",
   "metadata": {
    "papermill": {
     "duration": 0.021593,
     "end_time": "2022-04-18T00:13:16.094614",
     "exception": false,
     "start_time": "2022-04-18T00:13:16.073021",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### Deploy the model to Amazon SageMaker\n",
    "Start with deploying a pre-trained churn prediction model. Here, you create the model object with the image and model data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "1ad19f39",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:13:16.143192Z",
     "iopub.status.busy": "2022-04-18T00:13:16.142700Z",
     "iopub.status.idle": "2022-04-18T00:13:16.156473Z",
     "shell.execute_reply": "2022-04-18T00:13:16.156847Z"
    },
    "papermill": {
     "duration": 0.040689,
     "end_time": "2022-04-18T00:13:16.156984",
     "exception": false,
     "start_time": "2022-04-18T00:13:16.116295",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from time import gmtime, strftime\n",
    "from sagemaker.model import Model\n",
    "from sagemaker.image_uris import retrieve\n",
    "\n",
    "model_name = \"DEMO-xgb-churn-pred-model-monitor-\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
    "model_url = \"https://{}.s3-{}.amazonaws.com/{}/xgb-churn-prediction-model.tar.gz\".format(\n",
    "    bucket, region, prefix\n",
    ")\n",
    "\n",
    "image_uri = retrieve(\"xgboost\", region, \"0.90-1\")\n",
    "\n",
    "model = Model(image_uri=image_uri, model_data=model_url, role=role)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fa7b61e",
   "metadata": {
    "papermill": {
     "duration": 0.02197,
     "end_time": "2022-04-18T00:13:16.200979",
     "exception": false,
     "start_time": "2022-04-18T00:13:16.179009",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "To enable data capture for monitoring the model data quality, you specify the new capture option called `DataCaptureConfig`. You can capture the request payload, the response payload or both with this configuration. The capture config applies to all variants. Go ahead with the deployment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "7a8f85c2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:13:16.258005Z",
     "iopub.status.busy": "2022-04-18T00:13:16.249135Z",
     "iopub.status.idle": "2022-04-18T00:20:48.394644Z",
     "shell.execute_reply": "2022-04-18T00:20:48.395055Z"
    },
    "papermill": {
     "duration": 452.17232,
     "end_time": "2022-04-18T00:20:48.395194",
     "exception": false,
     "start_time": "2022-04-18T00:13:16.222874",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "EndpointName=DEMO-xgb-churn-pred-model-monitor-2022-04-18-00-13-16\n",
      "---------------!"
     ]
    }
   ],
   "source": [
    "from sagemaker.model_monitor import DataCaptureConfig\n",
    "\n",
    "endpoint_name = \"DEMO-xgb-churn-pred-model-monitor-\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
    "print(\"EndpointName={}\".format(endpoint_name))\n",
    "\n",
    "data_capture_config = DataCaptureConfig(\n",
    "    enable_capture=True, sampling_percentage=100, destination_s3_uri=s3_capture_upload_path\n",
    ")\n",
    "\n",
    "predictor = model.deploy(\n",
    "    initial_instance_count=1,\n",
    "    instance_type=\"ml.m4.xlarge\",\n",
    "    endpoint_name=endpoint_name,\n",
    "    data_capture_config=data_capture_config,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2ed7979",
   "metadata": {
    "papermill": {
     "duration": 0.025376,
     "end_time": "2022-04-18T00:20:48.446074",
     "exception": false,
     "start_time": "2022-04-18T00:20:48.420698",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### Invoke the deployed model\n",
    "\n",
    "You can now send data to this endpoint to get inferences in real time. Because you enabled the data capture in the previous steps, the request and response payload, along with some additional metadata, is saved in the Amazon Simple Storage Service (Amazon S3) location you have specified in the DataCaptureConfig."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e848dd92",
   "metadata": {
    "papermill": {
     "duration": 0.025161,
     "end_time": "2022-04-18T00:20:48.496491",
     "exception": false,
     "start_time": "2022-04-18T00:20:48.471330",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "This step invokes the endpoint with included sample data for about 3 minutes. Data is captured based on the sampling percentage specified and the capture continues until the data capture option is turned off."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "3d376f93",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:20:48.562549Z",
     "iopub.status.busy": "2022-04-18T00:20:48.561781Z",
     "iopub.status.idle": "2022-04-18T00:23:51.673621Z",
     "shell.execute_reply": "2022-04-18T00:23:51.674029Z"
    },
    "papermill": {
     "duration": 183.152276,
     "end_time": "2022-04-18T00:23:51.674166",
     "exception": false,
     "start_time": "2022-04-18T00:20:48.521890",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sending test traffic to the endpoint DEMO-xgb-churn-pred-model-monitor-2022-04-18-00-13-16. \n",
      "Please wait...\n",
      "Done!\n"
     ]
    }
   ],
   "source": [
    "from sagemaker.predictor import Predictor\n",
    "from sagemaker.serializers import CSVSerializer\n",
    "import time\n",
    "\n",
    "predictor = Predictor(endpoint_name=endpoint_name, serializer=CSVSerializer())\n",
    "\n",
    "# Get a subset of test data for a quick test\n",
    "!head -180 test_data/test-dataset-input-cols.csv > test_data/test_sample.csv\n",
    "print(\"Sending test traffic to the endpoint {}. \\nPlease wait...\".format(endpoint_name))\n",
    "\n",
    "with open(\"test_data/test_sample.csv\", \"r\") as f:\n",
    "    for row in f:\n",
    "        payload = row.rstrip(\"\\n\")\n",
    "        response = predictor.predict(data=payload)\n",
    "        time.sleep(1)\n",
    "\n",
    "print(\"Done!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12963ac1",
   "metadata": {
    "papermill": {
     "duration": 0.025961,
     "end_time": "2022-04-18T00:23:51.726250",
     "exception": false,
     "start_time": "2022-04-18T00:23:51.700289",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### View captured data\n",
    "\n",
    "Now list the data capture files stored in Amazon S3. You should expect to see different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:\n",
    "\n",
    "`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "be89b97b",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:23:51.798870Z",
     "iopub.status.busy": "2022-04-18T00:23:51.786388Z",
     "iopub.status.idle": "2022-04-18T00:23:51.965951Z",
     "shell.execute_reply": "2022-04-18T00:23:51.965241Z"
    },
    "papermill": {
     "duration": 0.213801,
     "end_time": "2022-04-18T00:23:51.966129",
     "exception": false,
     "start_time": "2022-04-18T00:23:51.752328",
     "status": "completed"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found Capture Files:\n",
      "sagemaker/DEMO-ModelMonitor/datacapture/DEMO-xgb-churn-pred-model-monitor-2022-04-18-00-13-16/AllTraffic/2022/04/18/00/20-48-871-0b7bfd33-c8a7-4adf-a09a-98fd0af79974.jsonl\n",
      " sagemaker/DEMO-ModelMonitor/datacapture/DEMO-xgb-churn-pred-model-monitor-2022-04-18-00-13-16/AllTraffic/2022/04/18/00/21-48-903-028573f5-3215-431a-b3b8-a200be4d2d9a.jsonl\n"
     ]
    }
   ],
   "source": [
    "s3_client = boto3.Session().client(\"s3\")\n",
    "current_endpoint_capture_prefix = \"{}/{}\".format(data_capture_prefix, endpoint_name)\n",
    "result = s3_client.list_objects(Bucket=bucket, Prefix=current_endpoint_capture_prefix)\n",
    "capture_files = [capture_file.get(\"Key\") for capture_file in result.get(\"Contents\")]\n",
    "print(\"Found Capture Files:\")\n",
    "print(\"\\n \".join(capture_files))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b147b62",
   "metadata": {
    "papermill": {
     "duration": 0.027531,
     "end_time": "2022-04-18T00:23:52.029034",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.001503",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Next, view the contents of a single capture file. Here you should see all the data captured in an Amazon SageMaker specific JSON-line formatted file. Take a quick peek at the first few lines in the captured file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "16f20ef8",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:23:52.088844Z",
     "iopub.status.busy": "2022-04-18T00:23:52.088257Z",
     "iopub.status.idle": "2022-04-18T00:23:52.174071Z",
     "shell.execute_reply": "2022-04-18T00:23:52.173556Z"
    },
    "papermill": {
     "duration": 0.118865,
     "end_time": "2022-04-18T00:23:52.174189",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.055324",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"text/csv\",\"mode\":\"INPUT\",\"data\":\"132,10,182.9,54,292.4,68,142.3,116,11.5,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1\",\"encoding\":\"CSV\"},\"endpointOutput\":{\"observedContentType\":\"text/csv; charset=utf-8\",\"mode\":\"OUTPUT\",\"data\":\"0.036397725343704224\",\"encoding\":\"CSV\"}},\"eventMetadata\":{\"eventId\":\"f04a8538-696c-4648-8064-8b680b2e3318\",\"inferenceTime\":\"2022-04-18T00:21:48Z\"},\"eventVersion\":\"0\"}\n",
      "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"text/csv\",\"mode\":\"INPUT\",\"data\":\"130,0,263.7,113,186.5,103,195.3,99,18.3,6,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0\",\"encoding\":\"CSV\"},\"endpointOutput\":{\"observedContentType\":\"text/csv; charset=utf-8\",\"mode\":\"OUTPUT\",\"data\":\"0.3791300654411316\",\"encoding\":\"CSV\"}},\"eventMetadata\":{\"eventId\":\"4f301857-49c9-44d1-b275-ba0ab7767033\",\"inferenceTime\":\"2022-04-18T00:21:49Z\"},\"eventVersion\":\"0\"}\n",
      "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"text/csv\",\"mode\":\"INPUT\",\"data\":\"85,0,210.3,66,195.8,76,221.6,82,11.2,7,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0\",\"encoding\":\"CSV\"},\"endpointOutput\":{\"observedContentType\":\"text/csv; charset=utf-8\",\"mode\":\"OUTPUT\",\"data\":\"0.017852725461125374\",\"encoding\":\"CSV\"}},\"eventMetadata\":{\"eventId\":\"9827355a-ac22-40f4-9748-539e03db6c2a\",\"inferenceTime\":\"2022-04-18T00:21:50Z\"},\"eventVersion\":\"0\"}\n",
      "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"text/csv\",\"mode\":\"INPUT\",\"data\":\"146,0,160.1,63,208.4,112,177.6,98,9.2,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0\",\"encoding\":\"CSV\"},\"endpointOutput\":{\"observedContentType\":\"text/csv; charset=utf-8\",\"mode\":\"OUTPUT\",\"data\":\"0.13224346935749054\",\"encoding\":\"CSV\"}},\"eventMetadata\":{\"eventId\":\"7dd7b9\n"
     ]
    }
   ],
   "source": [
    "def get_obj_body(obj_key):\n",
    "    return s3_client.get_object(Bucket=bucket, Key=obj_key).get(\"Body\").read().decode(\"utf-8\")\n",
    "\n",
    "\n",
    "capture_file = get_obj_body(capture_files[-1])\n",
    "print(capture_file[:2000])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "427cf179",
   "metadata": {
    "papermill": {
     "duration": 0.026714,
     "end_time": "2022-04-18T00:23:52.227979",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.201265",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Finally, the contents of a single line is present below in a formatted JSON file so that you can observe a little better."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "edbca102",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:23:52.287189Z",
     "iopub.status.busy": "2022-04-18T00:23:52.286660Z",
     "iopub.status.idle": "2022-04-18T00:23:52.288913Z",
     "shell.execute_reply": "2022-04-18T00:23:52.289304Z"
    },
    "papermill": {
     "duration": 0.034824,
     "end_time": "2022-04-18T00:23:52.289438",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.254614",
     "status": "completed"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"captureData\": {\n",
      "    \"endpointInput\": {\n",
      "      \"observedContentType\": \"text/csv\",\n",
      "      \"mode\": \"INPUT\",\n",
      "      \"data\": \"132,10,182.9,54,292.4,68,142.3,116,11.5,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1\",\n",
      "      \"encoding\": \"CSV\"\n",
      "    },\n",
      "    \"endpointOutput\": {\n",
      "      \"observedContentType\": \"text/csv; charset=utf-8\",\n",
      "      \"mode\": \"OUTPUT\",\n",
      "      \"data\": \"0.036397725343704224\",\n",
      "      \"encoding\": \"CSV\"\n",
      "    }\n",
      "  },\n",
      "  \"eventMetadata\": {\n",
      "    \"eventId\": \"f04a8538-696c-4648-8064-8b680b2e3318\",\n",
      "    \"inferenceTime\": \"2022-04-18T00:21:48Z\"\n",
      "  },\n",
      "  \"eventVersion\": \"0\"\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "\n",
    "print(json.dumps(json.loads(capture_file.split(\"\\n\")[0]), indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba3b6245",
   "metadata": {
    "papermill": {
     "duration": 0.026948,
     "end_time": "2022-04-18T00:23:52.343438",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.316490",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "As you can see, each inference request is captured in one line in the jsonl file. The line contains both the input and output merged together. In the example, you provided the ContentType as `text/csv` which is reflected in the `observedContentType` value. Also, you expose the encoding that you used to encode the input and output payloads in the capture format with the `encoding` value.\n",
    "\n",
    "To recap, you observed how you can enable capturing the input or output payloads to an endpoint with a new parameter. You have also observed what the captured format looks like in Amazon S3. Next, continue to explore how Amazon SageMaker helps with monitoring the data collected in Amazon S3."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b25e43e3",
   "metadata": {
    "papermill": {
     "duration": 0.02743,
     "end_time": "2022-04-18T00:23:52.397993",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.370563",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## PART B: Model Monitor - Baselining and continuous monitoring"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cbd7dfd",
   "metadata": {
    "papermill": {
     "duration": 0.026767,
     "end_time": "2022-04-18T00:23:52.452009",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.425242",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "In addition to collecting the data, Amazon SageMaker provides the capability for you to monitor and evaluate the data observed by the endpoints. For this:\n",
    "1. Create a baseline with which you compare the realtime traffic. \n",
    "1. Once a baseline is ready, setup a schedule to continously evaluate and compare against the baseline."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "377e8855",
   "metadata": {
    "papermill": {
     "duration": 0.026868,
     "end_time": "2022-04-18T00:23:52.505930",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.479062",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### 1. Constraint suggestion with baseline/training dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c596ed4c",
   "metadata": {
    "papermill": {
     "duration": 0.026955,
     "end_time": "2022-04-18T00:23:52.559834",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.532879",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The training dataset with which you trained the model is usually a good baseline dataset. Note that the training dataset data schema and the inference dataset schema should exactly match (i.e. the number and order of the features).\n",
    "\n",
    "From the training dataset you can ask Amazon SageMaker to suggest a set of baseline `constraints` and generate descriptive `statistics` to explore the data. For this example, upload the training dataset that was used to train the pre-trained model included in this example. If you already have it in Amazon S3, you can directly point to it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "dce35f31",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:23:52.618548Z",
     "iopub.status.busy": "2022-04-18T00:23:52.617827Z",
     "iopub.status.idle": "2022-04-18T00:23:52.620899Z",
     "shell.execute_reply": "2022-04-18T00:23:52.621263Z"
    },
    "papermill": {
     "duration": 0.034674,
     "end_time": "2022-04-18T00:23:52.621397",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.586723",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Baseline data uri: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ModelMonitor/baselining/data\n",
      "Baseline results uri: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ModelMonitor/baselining/results\n"
     ]
    }
   ],
   "source": [
    "# copy over the training dataset to Amazon S3 (if you already have it in Amazon S3, you could reuse it)\n",
    "baseline_prefix = prefix + \"/baselining\"\n",
    "baseline_data_prefix = baseline_prefix + \"/data\"\n",
    "baseline_results_prefix = baseline_prefix + \"/results\"\n",
    "\n",
    "baseline_data_uri = \"s3://{}/{}\".format(bucket, baseline_data_prefix)\n",
    "baseline_results_uri = \"s3://{}/{}\".format(bucket, baseline_results_prefix)\n",
    "print(\"Baseline data uri: {}\".format(baseline_data_uri))\n",
    "print(\"Baseline results uri: {}\".format(baseline_results_uri))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "db1f8aca",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:23:52.697273Z",
     "iopub.status.busy": "2022-04-18T00:23:52.696706Z",
     "iopub.status.idle": "2022-04-18T00:23:52.860994Z",
     "shell.execute_reply": "2022-04-18T00:23:52.860548Z"
    },
    "papermill": {
     "duration": 0.212278,
     "end_time": "2022-04-18T00:23:52.861105",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.648827",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "training_data_file = open(\"test_data/training-dataset-with-header.csv\", \"rb\")\n",
    "s3_key = os.path.join(baseline_prefix, \"data\", \"training-dataset-with-header.csv\")\n",
    "boto3.Session().resource(\"s3\").Bucket(bucket).Object(s3_key).upload_fileobj(training_data_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3205c9d4",
   "metadata": {
    "papermill": {
     "duration": 0.03685,
     "end_time": "2022-04-18T00:23:52.925824",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.888974",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### Create a baselining job with training dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7751107f",
   "metadata": {
    "papermill": {
     "duration": 0.028188,
     "end_time": "2022-04-18T00:23:52.997676",
     "exception": false,
     "start_time": "2022-04-18T00:23:52.969488",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Now that you have the training data ready in Amazon S3, start a job to `suggest` constraints. `DefaultModelMonitor.suggest_baseline(..)` starts a `ProcessingJob` using an Amazon SageMaker provided Model Monitor container to generate the constraints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "63dc14e7",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:23:53.066384Z",
     "iopub.status.busy": "2022-04-18T00:23:53.065835Z",
     "iopub.status.idle": "2022-04-18T00:29:46.346123Z",
     "shell.execute_reply": "2022-04-18T00:29:46.346507Z"
    },
    "papermill": {
     "duration": 353.320851,
     "end_time": "2022-04-18T00:29:46.346647",
     "exception": false,
     "start_time": "2022-04-18T00:23:53.025796",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Job Name:  baseline-suggestion-job-2022-04-18-00-23-53-286\n",
      "Inputs:  [{'InputName': 'baseline_dataset_input', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ModelMonitor/baselining/data/training-dataset-with-header.csv', 'LocalPath': '/opt/ml/processing/input/baseline_dataset_input', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]\n",
      "Outputs:  [{'OutputName': 'monitoring_output', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ModelMonitor/baselining/results', 'LocalPath': '/opt/ml/processing/output', 'S3UploadMode': 'EndOfJob'}}]\n",
      ".........................\u001b[34m2022-04-18 00:27:57,964 - matplotlib.font_manager - INFO - Generating new fontManager, this may take some time...\u001b[0m\n",
      "\u001b[34m2022-04-18 00:27:58.544887: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory\u001b[0m\n",
      "...\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  FileUtil:29 - Write to file statistics.json at path /opt/ml/processing/output.\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  YarnClientSchedulerBackend:54 - Interrupting monitor thread\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  YarnClientSchedulerBackend:54 - Shutting down all executors\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  YarnSchedulerBackend$YarnDriverEndpoint:54 - Asking each executor to shut down\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices\u001b[0m\n",
      "\u001b[34m(serviceOption=None,\n",
      " services=List(),\n",
      " started=false)\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  YarnClientSchedulerBackend:54 - Stopped\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  MemoryStore:54 - MemoryStore cleared\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  BlockManager:54 - BlockManager stopped\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  SparkContext:54 - Successfully stopped SparkContext\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  Main:65 - Completed: Job completed successfully with no violations.\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  Main:141 - Write to file /opt/ml/output/message.\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  ShutdownHookManager:54 - Shutdown hook called\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-4d0fa5da-ce51-48bb-9fde-31f5cfd39c10\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:19 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-c4c4235c-d98b-42f7-8ebf-aaf29574aeeb\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:20,048 - DefaultDataAnalyzer - INFO - Completed spark-submit with return code : 0\u001b[0m\n",
      "\u001b[34m2022-04-18 00:29:20,048 - DefaultDataAnalyzer - INFO - Spark job completed.\u001b[0m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<sagemaker.processing.ProcessingJob at 0x7ff2badb7350>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sagemaker.model_monitor import DefaultModelMonitor\n",
    "from sagemaker.model_monitor.dataset_format import DatasetFormat\n",
    "\n",
    "my_default_monitor = DefaultModelMonitor(\n",
    "    role=role,\n",
    "    instance_count=1,\n",
    "    instance_type=\"ml.m5.xlarge\",\n",
    "    volume_size_in_gb=20,\n",
    "    max_runtime_in_seconds=3600,\n",
    ")\n",
    "\n",
    "my_default_monitor.suggest_baseline(\n",
    "    baseline_dataset=baseline_data_uri + \"/training-dataset-with-header.csv\",\n",
    "    dataset_format=DatasetFormat.csv(header=True),\n",
    "    output_s3_uri=baseline_results_uri,\n",
    "    wait=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a8b2a3d",
   "metadata": {
    "papermill": {
     "duration": 0.048105,
     "end_time": "2022-04-18T00:29:46.442994",
     "exception": false,
     "start_time": "2022-04-18T00:29:46.394889",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### Explore the generated constraints and statistics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "5298eea2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:29:46.551577Z",
     "iopub.status.busy": "2022-04-18T00:29:46.550580Z",
     "iopub.status.idle": "2022-04-18T00:29:46.717774Z",
     "shell.execute_reply": "2022-04-18T00:29:46.718169Z"
    },
    "papermill": {
     "duration": 0.227209,
     "end_time": "2022-04-18T00:29:46.718320",
     "exception": false,
     "start_time": "2022-04-18T00:29:46.491111",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found Files:\n",
      "sagemaker/DEMO-ModelMonitor/baselining/results/constraints.json\n",
      " sagemaker/DEMO-ModelMonitor/baselining/results/statistics.json\n"
     ]
    }
   ],
   "source": [
    "s3_client = boto3.Session().client(\"s3\")\n",
    "result = s3_client.list_objects(Bucket=bucket, Prefix=baseline_results_prefix)\n",
    "report_files = [report_file.get(\"Key\") for report_file in result.get(\"Contents\")]\n",
    "print(\"Found Files:\")\n",
    "print(\"\\n \".join(report_files))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "b6179805",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:29:46.819261Z",
     "iopub.status.busy": "2022-04-18T00:29:46.818492Z",
     "iopub.status.idle": "2022-04-18T00:29:47.210196Z",
     "shell.execute_reply": "2022-04-18T00:29:47.209794Z"
    },
    "papermill": {
     "duration": 0.444019,
     "end_time": "2022-04-18T00:29:47.210306",
     "exception": false,
     "start_time": "2022-04-18T00:29:46.766287",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead\n",
      "  after removing the cwd from sys.path.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>inferred_type</th>\n",
       "      <th>numerical_statistics.common.num_present</th>\n",
       "      <th>numerical_statistics.common.num_missing</th>\n",
       "      <th>numerical_statistics.mean</th>\n",
       "      <th>numerical_statistics.sum</th>\n",
       "      <th>numerical_statistics.std_dev</th>\n",
       "      <th>numerical_statistics.min</th>\n",
       "      <th>numerical_statistics.max</th>\n",
       "      <th>numerical_statistics.distribution.kll.buckets</th>\n",
       "      <th>numerical_statistics.distribution.kll.sketch.parameters.c</th>\n",
       "      <th>numerical_statistics.distribution.kll.sketch.parameters.k</th>\n",
       "      <th>numerical_statistics.distribution.kll.sketch.data</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Churn</td>\n",
       "      <td>Integral</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>0.139306</td>\n",
       "      <td>325.0</td>\n",
       "      <td>0.346265</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>[{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0,...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Account Length</td>\n",
       "      <td>Integral</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>101.276897</td>\n",
       "      <td>236279.0</td>\n",
       "      <td>39.552442</td>\n",
       "      <td>1.0</td>\n",
       "      <td>243.0</td>\n",
       "      <td>[{'lower_bound': 1.0, 'upper_bound': 25.2, 'co...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[119.0, 100.0, 111.0, 181.0, 95.0, 104.0, 70....</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>VMail Message</td>\n",
       "      <td>Integral</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>8.214316</td>\n",
       "      <td>19164.0</td>\n",
       "      <td>13.776908</td>\n",
       "      <td>0.0</td>\n",
       "      <td>51.0</td>\n",
       "      <td>[{'lower_bound': 0.0, 'upper_bound': 5.1, 'cou...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[19.0, 0.0, 0.0, 40.0, 36.0, 0.0, 0.0, 24.0, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Day Mins</td>\n",
       "      <td>Fractional</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>180.226489</td>\n",
       "      <td>420468.4</td>\n",
       "      <td>53.987179</td>\n",
       "      <td>0.0</td>\n",
       "      <td>350.8</td>\n",
       "      <td>[{'lower_bound': 0.0, 'upper_bound': 35.08, 'c...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[178.1, 160.3, 197.1, 105.2, 283.1, 113.6, 23...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Day Calls</td>\n",
       "      <td>Integral</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>100.259323</td>\n",
       "      <td>233905.0</td>\n",
       "      <td>20.165008</td>\n",
       "      <td>0.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>[{'lower_bound': 0.0, 'upper_bound': 16.5, 'co...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[110.0, 138.0, 117.0, 61.0, 112.0, 87.0, 122....</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Eve Mins</td>\n",
       "      <td>Fractional</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>200.050107</td>\n",
       "      <td>466716.9</td>\n",
       "      <td>50.015928</td>\n",
       "      <td>31.2</td>\n",
       "      <td>361.8</td>\n",
       "      <td>[{'lower_bound': 31.2, 'upper_bound': 64.26, '...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[212.8, 221.3, 227.8, 341.3, 286.2, 158.6, 29...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Eve Calls</td>\n",
       "      <td>Integral</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>99.573939</td>\n",
       "      <td>232306.0</td>\n",
       "      <td>19.675578</td>\n",
       "      <td>12.0</td>\n",
       "      <td>170.0</td>\n",
       "      <td>[{'lower_bound': 12.0, 'upper_bound': 27.8, 'c...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[100.0, 92.0, 128.0, 79.0, 86.0, 98.0, 112.0,...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Night Mins</td>\n",
       "      <td>Fractional</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>201.388598</td>\n",
       "      <td>469839.6</td>\n",
       "      <td>50.627961</td>\n",
       "      <td>23.2</td>\n",
       "      <td>395.0</td>\n",
       "      <td>[{'lower_bound': 23.2, 'upper_bound': 60.37999...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[226.3, 150.4, 214.0, 165.7, 261.7, 187.7, 20...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Night Calls</td>\n",
       "      <td>Integral</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>100.227175</td>\n",
       "      <td>233830.0</td>\n",
       "      <td>19.282029</td>\n",
       "      <td>42.0</td>\n",
       "      <td>175.0</td>\n",
       "      <td>[{'lower_bound': 42.0, 'upper_bound': 55.3, 'c...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[123.0, 120.0, 101.0, 97.0, 129.0, 87.0, 112....</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Intl Mins</td>\n",
       "      <td>Fractional</td>\n",
       "      <td>2333</td>\n",
       "      <td>0</td>\n",
       "      <td>10.253065</td>\n",
       "      <td>23920.4</td>\n",
       "      <td>2.778766</td>\n",
       "      <td>0.0</td>\n",
       "      <td>18.4</td>\n",
       "      <td>[{'lower_bound': 0.0, 'upper_bound': 1.8399999...</td>\n",
       "      <td>0.64</td>\n",
       "      <td>2048.0</td>\n",
       "      <td>[[10.0, 11.2, 9.3, 6.3, 11.3, 10.5, 0.0, 9.7, ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             name inferred_type  numerical_statistics.common.num_present  \\\n",
       "0           Churn      Integral                                     2333   \n",
       "1  Account Length      Integral                                     2333   \n",
       "2   VMail Message      Integral                                     2333   \n",
       "3        Day Mins    Fractional                                     2333   \n",
       "4       Day Calls      Integral                                     2333   \n",
       "5        Eve Mins    Fractional                                     2333   \n",
       "6       Eve Calls      Integral                                     2333   \n",
       "7      Night Mins    Fractional                                     2333   \n",
       "8     Night Calls      Integral                                     2333   \n",
       "9       Intl Mins    Fractional                                     2333   \n",
       "\n",
       "   numerical_statistics.common.num_missing  numerical_statistics.mean  \\\n",
       "0                                        0                   0.139306   \n",
       "1                                        0                 101.276897   \n",
       "2                                        0                   8.214316   \n",
       "3                                        0                 180.226489   \n",
       "4                                        0                 100.259323   \n",
       "5                                        0                 200.050107   \n",
       "6                                        0                  99.573939   \n",
       "7                                        0                 201.388598   \n",
       "8                                        0                 100.227175   \n",
       "9                                        0                  10.253065   \n",
       "\n",
       "   numerical_statistics.sum  numerical_statistics.std_dev  \\\n",
       "0                     325.0                      0.346265   \n",
       "1                  236279.0                     39.552442   \n",
       "2                   19164.0                     13.776908   \n",
       "3                  420468.4                     53.987179   \n",
       "4                  233905.0                     20.165008   \n",
       "5                  466716.9                     50.015928   \n",
       "6                  232306.0                     19.675578   \n",
       "7                  469839.6                     50.627961   \n",
       "8                  233830.0                     19.282029   \n",
       "9                   23920.4                      2.778766   \n",
       "\n",
       "   numerical_statistics.min  numerical_statistics.max  \\\n",
       "0                       0.0                       1.0   \n",
       "1                       1.0                     243.0   \n",
       "2                       0.0                      51.0   \n",
       "3                       0.0                     350.8   \n",
       "4                       0.0                     165.0   \n",
       "5                      31.2                     361.8   \n",
       "6                      12.0                     170.0   \n",
       "7                      23.2                     395.0   \n",
       "8                      42.0                     175.0   \n",
       "9                       0.0                      18.4   \n",
       "\n",
       "       numerical_statistics.distribution.kll.buckets  \\\n",
       "0  [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou...   \n",
       "1  [{'lower_bound': 1.0, 'upper_bound': 25.2, 'co...   \n",
       "2  [{'lower_bound': 0.0, 'upper_bound': 5.1, 'cou...   \n",
       "3  [{'lower_bound': 0.0, 'upper_bound': 35.08, 'c...   \n",
       "4  [{'lower_bound': 0.0, 'upper_bound': 16.5, 'co...   \n",
       "5  [{'lower_bound': 31.2, 'upper_bound': 64.26, '...   \n",
       "6  [{'lower_bound': 12.0, 'upper_bound': 27.8, 'c...   \n",
       "7  [{'lower_bound': 23.2, 'upper_bound': 60.37999...   \n",
       "8  [{'lower_bound': 42.0, 'upper_bound': 55.3, 'c...   \n",
       "9  [{'lower_bound': 0.0, 'upper_bound': 1.8399999...   \n",
       "\n",
       "   numerical_statistics.distribution.kll.sketch.parameters.c  \\\n",
       "0                                               0.64           \n",
       "1                                               0.64           \n",
       "2                                               0.64           \n",
       "3                                               0.64           \n",
       "4                                               0.64           \n",
       "5                                               0.64           \n",
       "6                                               0.64           \n",
       "7                                               0.64           \n",
       "8                                               0.64           \n",
       "9                                               0.64           \n",
       "\n",
       "   numerical_statistics.distribution.kll.sketch.parameters.k  \\\n",
       "0                                             2048.0           \n",
       "1                                             2048.0           \n",
       "2                                             2048.0           \n",
       "3                                             2048.0           \n",
       "4                                             2048.0           \n",
       "5                                             2048.0           \n",
       "6                                             2048.0           \n",
       "7                                             2048.0           \n",
       "8                                             2048.0           \n",
       "9                                             2048.0           \n",
       "\n",
       "   numerical_statistics.distribution.kll.sketch.data  \n",
       "0  [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0,...  \n",
       "1  [[119.0, 100.0, 111.0, 181.0, 95.0, 104.0, 70....  \n",
       "2  [[19.0, 0.0, 0.0, 40.0, 36.0, 0.0, 0.0, 24.0, ...  \n",
       "3  [[178.1, 160.3, 197.1, 105.2, 283.1, 113.6, 23...  \n",
       "4  [[110.0, 138.0, 117.0, 61.0, 112.0, 87.0, 122....  \n",
       "5  [[212.8, 221.3, 227.8, 341.3, 286.2, 158.6, 29...  \n",
       "6  [[100.0, 92.0, 128.0, 79.0, 86.0, 98.0, 112.0,...  \n",
       "7  [[226.3, 150.4, 214.0, 165.7, 261.7, 187.7, 20...  \n",
       "8  [[123.0, 120.0, 101.0, 97.0, 129.0, 87.0, 112....  \n",
       "9  [[10.0, 11.2, 9.3, 6.3, 11.3, 10.5, 0.0, 9.7, ...  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "baseline_job = my_default_monitor.latest_baselining_job\n",
    "schema_df = pd.io.json.json_normalize(baseline_job.baseline_statistics().body_dict[\"features\"])\n",
    "schema_df.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "d25691b9",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:29:47.317888Z",
     "iopub.status.busy": "2022-04-18T00:29:47.317350Z",
     "iopub.status.idle": "2022-04-18T00:29:47.392290Z",
     "shell.execute_reply": "2022-04-18T00:29:47.391919Z"
    },
    "papermill": {
     "duration": 0.132679,
     "end_time": "2022-04-18T00:29:47.392401",
     "exception": false,
     "start_time": "2022-04-18T00:29:47.259722",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:2: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead\n",
      "  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>inferred_type</th>\n",
       "      <th>completeness</th>\n",
       "      <th>num_constraints.is_non_negative</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Churn</td>\n",
       "      <td>Integral</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Account Length</td>\n",
       "      <td>Integral</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>VMail Message</td>\n",
       "      <td>Integral</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Day Mins</td>\n",
       "      <td>Fractional</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Day Calls</td>\n",
       "      <td>Integral</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Eve Mins</td>\n",
       "      <td>Fractional</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Eve Calls</td>\n",
       "      <td>Integral</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Night Mins</td>\n",
       "      <td>Fractional</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Night Calls</td>\n",
       "      <td>Integral</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Intl Mins</td>\n",
       "      <td>Fractional</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             name inferred_type  completeness  num_constraints.is_non_negative\n",
       "0           Churn      Integral           1.0                             True\n",
       "1  Account Length      Integral           1.0                             True\n",
       "2   VMail Message      Integral           1.0                             True\n",
       "3        Day Mins    Fractional           1.0                             True\n",
       "4       Day Calls      Integral           1.0                             True\n",
       "5        Eve Mins    Fractional           1.0                             True\n",
       "6       Eve Calls      Integral           1.0                             True\n",
       "7      Night Mins    Fractional           1.0                             True\n",
       "8     Night Calls      Integral           1.0                             True\n",
       "9       Intl Mins    Fractional           1.0                             True"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "constraints_df = pd.io.json.json_normalize(\n",
    "    baseline_job.suggested_constraints().body_dict[\"features\"]\n",
    ")\n",
    "constraints_df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1df1fea8",
   "metadata": {
    "papermill": {
     "duration": 0.049354,
     "end_time": "2022-04-18T00:29:47.491527",
     "exception": false,
     "start_time": "2022-04-18T00:29:47.442173",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### 2. Analyze collected data for data quality issues\n",
    "\n",
    "When you have collected the data above, analyze and monitor the data with Monitoring Schedules."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d61d11a9",
   "metadata": {
    "papermill": {
     "duration": 0.049273,
     "end_time": "2022-04-18T00:29:47.590172",
     "exception": false,
     "start_time": "2022-04-18T00:29:47.540899",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### Create a schedule"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "fc1c301f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:29:47.702041Z",
     "iopub.status.busy": "2022-04-18T00:29:47.701315Z",
     "iopub.status.idle": "2022-04-18T00:29:47.906167Z",
     "shell.execute_reply": "2022-04-18T00:29:47.906567Z"
    },
    "papermill": {
     "duration": 0.267475,
     "end_time": "2022-04-18T00:29:47.906709",
     "exception": false,
     "start_time": "2022-04-18T00:29:47.639234",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Upload some test scripts to the S3 bucket for pre- and post-processing\n",
    "bucket = boto3.Session().resource(\"s3\").Bucket(bucket)\n",
    "bucket.Object(code_prefix + \"/preprocessor.py\").upload_file(\"preprocessor.py\")\n",
    "bucket.Object(code_prefix + \"/postprocessor.py\").upload_file(\"postprocessor.py\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e142c2be",
   "metadata": {
    "papermill": {
     "duration": 0.050333,
     "end_time": "2022-04-18T00:29:48.008206",
     "exception": false,
     "start_time": "2022-04-18T00:29:47.957873",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "You can create a model monitoring schedule for the endpoint created earlier. Use the baseline resources (constraints and statistics) to compare against the realtime traffic."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "e7889de1",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:29:48.112871Z",
     "iopub.status.busy": "2022-04-18T00:29:48.112310Z",
     "iopub.status.idle": "2022-04-18T00:29:48.725136Z",
     "shell.execute_reply": "2022-04-18T00:29:48.725519Z"
    },
    "papermill": {
     "duration": 0.667743,
     "end_time": "2022-04-18T00:29:48.725674",
     "exception": false,
     "start_time": "2022-04-18T00:29:48.057931",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "The endpoint attribute has been renamed in sagemaker>=2.\n",
      "See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.\n"
     ]
    }
   ],
   "source": [
    "from sagemaker.model_monitor import CronExpressionGenerator\n",
    "\n",
    "mon_schedule_name = \"DEMO-xgb-churn-pred-model-monitor-schedule-\" + strftime(\n",
    "    \"%Y-%m-%d-%H-%M-%S\", gmtime()\n",
    ")\n",
    "my_default_monitor.create_monitoring_schedule(\n",
    "    monitor_schedule_name=mon_schedule_name,\n",
    "    endpoint_input=predictor.endpoint,\n",
    "    # record_preprocessor_script=pre_processor_script,\n",
    "    post_analytics_processor_script=s3_code_postprocessor_uri,\n",
    "    output_s3_uri=s3_report_path,\n",
    "    statistics=my_default_monitor.baseline_statistics(),\n",
    "    constraints=my_default_monitor.suggested_constraints(),\n",
    "    schedule_cron_expression=CronExpressionGenerator.hourly(),\n",
    "    enable_cloudwatch_metrics=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c556ecf8",
   "metadata": {
    "papermill": {
     "duration": 0.049771,
     "end_time": "2022-04-18T00:29:48.825958",
     "exception": false,
     "start_time": "2022-04-18T00:29:48.776187",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### Start generating some artificial traffic\n",
    "The cell below starts a thread to send some traffic to the endpoint. Note that you need to stop the kernel to terminate this thread. If there is no traffic, the monitoring jobs are marked as `Failed` since there is no data to process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "23e81c77",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:29:48.933021Z",
     "iopub.status.busy": "2022-04-18T00:29:48.932245Z",
     "iopub.status.idle": "2022-04-18T00:29:48.937734Z",
     "shell.execute_reply": "2022-04-18T00:29:48.938102Z"
    },
    "papermill": {
     "duration": 0.06214,
     "end_time": "2022-04-18T00:29:48.938258",
     "exception": false,
     "start_time": "2022-04-18T00:29:48.876118",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "The endpoint attribute has been renamed in sagemaker>=2.\n",
      "See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.\n"
     ]
    }
   ],
   "source": [
    "from threading import Thread\n",
    "from time import sleep\n",
    "\n",
    "endpoint_name = predictor.endpoint\n",
    "runtime_client = sm_session.sagemaker_runtime_client\n",
    "\n",
    "# (just repeating code from above for convenience/ able to run this section independently)\n",
    "def invoke_endpoint(ep_name, file_name, runtime_client):\n",
    "    with open(file_name, \"r\") as f:\n",
    "        for row in f:\n",
    "            payload = row.rstrip(\"\\n\")\n",
    "            response = runtime_client.invoke_endpoint(\n",
    "                EndpointName=ep_name, ContentType=\"text/csv\", Body=payload\n",
    "            )\n",
    "            response[\"Body\"].read()\n",
    "            time.sleep(1)\n",
    "\n",
    "\n",
    "def invoke_endpoint_forever():\n",
    "    while True:\n",
    "        try:\n",
    "            invoke_endpoint(endpoint_name, \"test_data/test-dataset-input-cols.csv\", runtime_client)\n",
    "        except runtime_client.exceptions.ValidationError:\n",
    "            pass\n",
    "\n",
    "\n",
    "thread = Thread(target=invoke_endpoint_forever)\n",
    "thread.start()\n",
    "\n",
    "# Note that you need to stop the kernel to stop the invocations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22785b69",
   "metadata": {
    "papermill": {
     "duration": 0.050394,
     "end_time": "2022-04-18T00:29:49.042351",
     "exception": false,
     "start_time": "2022-04-18T00:29:48.991957",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### Describe and inspect the schedule\n",
    "Once you describe, observe that the MonitoringScheduleStatus changes to Scheduled."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "7d272369",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:29:49.148331Z",
     "iopub.status.busy": "2022-04-18T00:29:49.147784Z",
     "iopub.status.idle": "2022-04-18T00:29:49.186670Z",
     "shell.execute_reply": "2022-04-18T00:29:49.187047Z"
    },
    "papermill": {
     "duration": 0.094778,
     "end_time": "2022-04-18T00:29:49.187187",
     "exception": false,
     "start_time": "2022-04-18T00:29:49.092409",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Schedule status: Pending\n"
     ]
    }
   ],
   "source": [
    "desc_schedule_result = my_default_monitor.describe_schedule()\n",
    "print(\"Schedule status: {}\".format(desc_schedule_result[\"MonitoringScheduleStatus\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc3fe71d",
   "metadata": {
    "papermill": {
     "duration": 0.050423,
     "end_time": "2022-04-18T00:29:49.289400",
     "exception": false,
     "start_time": "2022-04-18T00:29:49.238977",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### List executions\n",
    "The schedule starts jobs at the previously specified intervals. Here, you list the latest five executions. Note that if you are kicking this off after creating the hourly schedule, you might find the executions empty. You might have to wait until you cross the hour boundary (in UTC) to see executions kick off. The code below has the logic for waiting.\n",
    "\n",
    "Note: Even for an hourly schedule, Amazon SageMaker has a buffer period of 20 minutes to schedule your execution. You might see your execution start in anywhere from zero to ~20 minutes from the hour boundary. This is expected and done for load balancing in the backend."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "1a7494cf",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T00:29:49.396118Z",
     "iopub.status.busy": "2022-04-18T00:29:49.395608Z",
     "iopub.status.idle": "2022-04-18T01:10:54.022149Z",
     "shell.execute_reply": "2022-04-18T01:10:54.022560Z"
    },
    "papermill": {
     "duration": 2464.682976,
     "end_time": "2022-04-18T01:10:54.022704",
     "exception": false,
     "start_time": "2022-04-18T00:29:49.339728",
     "status": "completed"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "We created a hourly schedule above that begins executions ON the hour (plus 0-20 min buffer.\n",
      "We will have to wait till we hit the hour...\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "Waiting for the first execution to happen...\n",
      "Waiting for the first execution to happen...\n"
     ]
    }
   ],
   "source": [
    "mon_executions = my_default_monitor.list_executions()\n",
    "print(\n",
    "    \"We created a hourly schedule above that begins executions ON the hour (plus 0-20 min buffer.\\nWe will have to wait till we hit the hour...\"\n",
    ")\n",
    "\n",
    "while len(mon_executions) == 0:\n",
    "    print(\"Waiting for the first execution to happen...\")\n",
    "    time.sleep(60)\n",
    "    mon_executions = my_default_monitor.list_executions()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "708e5a91",
   "metadata": {
    "papermill": {
     "duration": 0.059309,
     "end_time": "2022-04-18T01:10:54.142040",
     "exception": false,
     "start_time": "2022-04-18T01:10:54.082731",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### Inspect a specific execution (latest execution)\n",
    "In the previous cell, you picked up the latest completed or failed scheduled execution. Here are the possible terminal states and what each of them mean: \n",
    "* `Completed` - The monitoring execution completed and no issues were found in the violations report.\n",
    "* `CompletedWithViolations` - The execution completed, but constraint violations were detected.\n",
    "* `Failed` - The monitoring execution failed, maybe due to client error (perhaps incorrect role premissions) or infrastructure issues. Further examination of `FailureReason` and `ExitMessage` is necessary to identify what exactly happened.\n",
    "* `Stopped` - The job exceeded max runtime or was manually stopped."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "4d550a8a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T01:10:54.266146Z",
     "iopub.status.busy": "2022-04-18T01:10:54.265580Z",
     "iopub.status.idle": "2022-04-18T01:15:40.304016Z",
     "shell.execute_reply": "2022-04-18T01:15:40.303589Z"
    },
    "papermill": {
     "duration": 286.102546,
     "end_time": "2022-04-18T01:15:40.304129",
     "exception": false,
     "start_time": "2022-04-18T01:10:54.201583",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ".............................................!Latest execution status: Completed\n",
      "Latest execution result: CompletedWithViolations: Job completed successfully with 60 violations.\n"
     ]
    }
   ],
   "source": [
    "latest_execution = mon_executions[-1]  # Latest execution's index is -1, second to last is -2, etc\n",
    "time.sleep(60)\n",
    "latest_execution.wait(logs=False)\n",
    "\n",
    "print(\"Latest execution status: {}\".format(latest_execution.describe()[\"ProcessingJobStatus\"]))\n",
    "print(\"Latest execution result: {}\".format(latest_execution.describe()[\"ExitMessage\"]))\n",
    "\n",
    "latest_job = latest_execution.describe()\n",
    "if latest_job[\"ProcessingJobStatus\"] != \"Completed\":\n",
    "    print(\n",
    "        \"====STOP==== \\n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures.\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "34b857e3",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T01:15:40.445435Z",
     "iopub.status.busy": "2022-04-18T01:15:40.444884Z",
     "iopub.status.idle": "2022-04-18T01:15:40.447659Z",
     "shell.execute_reply": "2022-04-18T01:15:40.447208Z"
    },
    "papermill": {
     "duration": 0.074909,
     "end_time": "2022-04-18T01:15:40.447784",
     "exception": false,
     "start_time": "2022-04-18T01:15:40.372875",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Report Uri: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2022-04-18-00-13-16/DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48/2022/04/18/01\n"
     ]
    }
   ],
   "source": [
    "report_uri = latest_execution.output.destination\n",
    "print(\"Report Uri: {}\".format(report_uri))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc4070b1",
   "metadata": {
    "papermill": {
     "duration": 0.068521,
     "end_time": "2022-04-18T01:15:40.585134",
     "exception": false,
     "start_time": "2022-04-18T01:15:40.516613",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### List the generated reports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "227c215b",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T01:15:40.736923Z",
     "iopub.status.busy": "2022-04-18T01:15:40.728273Z",
     "iopub.status.idle": "2022-04-18T01:15:40.987597Z",
     "shell.execute_reply": "2022-04-18T01:15:40.987996Z"
    },
    "papermill": {
     "duration": 0.33424,
     "end_time": "2022-04-18T01:15:40.988140",
     "exception": false,
     "start_time": "2022-04-18T01:15:40.653900",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Report bucket: sagemaker-us-west-2-521695447989\n",
      "Report key: sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2022-04-18-00-13-16/DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48/2022/04/18/01\n",
      "Found Report Files:\n",
      "sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2022-04-18-00-13-16/DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48/2022/04/18/01/constraint_violations.json\n",
      " sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2022-04-18-00-13-16/DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48/2022/04/18/01/constraints.json\n",
      " sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2022-04-18-00-13-16/DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48/2022/04/18/01/statistics.json\n"
     ]
    }
   ],
   "source": [
    "from urllib.parse import urlparse\n",
    "\n",
    "s3uri = urlparse(report_uri)\n",
    "report_bucket = s3uri.netloc\n",
    "report_key = s3uri.path.lstrip(\"/\")\n",
    "print(\"Report bucket: {}\".format(report_bucket))\n",
    "print(\"Report key: {}\".format(report_key))\n",
    "\n",
    "s3_client = boto3.Session().client(\"s3\")\n",
    "result = s3_client.list_objects(Bucket=report_bucket, Prefix=report_key)\n",
    "report_files = [report_file.get(\"Key\") for report_file in result.get(\"Contents\")]\n",
    "print(\"Found Report Files:\")\n",
    "print(\"\\n \".join(report_files))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38abe9e0",
   "metadata": {
    "papermill": {
     "duration": 0.069468,
     "end_time": "2022-04-18T01:15:41.127235",
     "exception": false,
     "start_time": "2022-04-18T01:15:41.057767",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### Violations report"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77403388",
   "metadata": {
    "papermill": {
     "duration": 0.070022,
     "end_time": "2022-04-18T01:15:41.266951",
     "exception": false,
     "start_time": "2022-04-18T01:15:41.196929",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Any violations compared to the baseline are listed below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "111944d0",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T01:15:41.411758Z",
     "iopub.status.busy": "2022-04-18T01:15:41.411186Z",
     "iopub.status.idle": "2022-04-18T01:15:41.543054Z",
     "shell.execute_reply": "2022-04-18T01:15:41.543448Z"
    },
    "papermill": {
     "duration": 0.206895,
     "end_time": "2022-04-18T01:15:41.543590",
     "exception": false,
     "start_time": "2022-04-18T01:15:41.336695",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:3: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead\n",
      "  This is separate from the ipykernel package so we can avoid doing imports until\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feature_name</th>\n",
       "      <th>constraint_check_type</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Area Code_408</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>VMail Plan_no</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>State_NV</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>State_NC</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>State_SC</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>State_UT</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>State_LA</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>State_CA</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>State_RI</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Churn</td>\n",
       "      <td>data_type_check</td>\n",
       "      <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    feature_name constraint_check_type  \\\n",
       "0  Area Code_408       data_type_check   \n",
       "1  VMail Plan_no       data_type_check   \n",
       "2       State_NV       data_type_check   \n",
       "3       State_NC       data_type_check   \n",
       "4       State_SC       data_type_check   \n",
       "5       State_UT       data_type_check   \n",
       "6       State_LA       data_type_check   \n",
       "7       State_CA       data_type_check   \n",
       "8       State_RI       data_type_check   \n",
       "9          Churn       data_type_check   \n",
       "\n",
       "                                                                                                                                            description  \n",
       "0  Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.  \n",
       "1  Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.  \n",
       "2  Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.  \n",
       "3  Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.  \n",
       "4  Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.  \n",
       "5  Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.  \n",
       "6  Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.  \n",
       "7  Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.  \n",
       "8  Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.64592817400101% of data is Integral.  \n",
       "9                Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral.  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "violations = my_default_monitor.latest_monitoring_constraint_violations()\n",
    "pd.set_option(\"display.max_colwidth\", None)\n",
    "constraints_df = pd.io.json.json_normalize(violations.body_dict[\"violations\"])\n",
    "constraints_df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de159706",
   "metadata": {
    "papermill": {
     "duration": 0.070228,
     "end_time": "2022-04-18T01:15:41.683659",
     "exception": false,
     "start_time": "2022-04-18T01:15:41.613431",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "#### Other commands\n",
    "We can also start and stop the monitoring schedules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "70749848",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T01:15:41.827114Z",
     "iopub.status.busy": "2022-04-18T01:15:41.826633Z",
     "iopub.status.idle": "2022-04-18T01:15:41.828553Z",
     "shell.execute_reply": "2022-04-18T01:15:41.828938Z"
    },
    "papermill": {
     "duration": 0.075182,
     "end_time": "2022-04-18T01:15:41.829067",
     "exception": false,
     "start_time": "2022-04-18T01:15:41.753885",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# my_default_monitor.stop_monitoring_schedule()\n",
    "# my_default_monitor.start_monitoring_schedule()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da8713a6",
   "metadata": {
    "papermill": {
     "duration": 0.070556,
     "end_time": "2022-04-18T01:15:41.969642",
     "exception": false,
     "start_time": "2022-04-18T01:15:41.899086",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Delete resources\n",
    "\n",
    "You can keep your endpoint running to continue capturing data. If you do not plan to collect more data or use this endpoint further, delete the endpoint to avoid incurring additional charges. Note that deleting your endpoint does not delete the data that was captured during the model invocations. That data persists in Amazon S3 until you delete it yourself.\n",
    "\n",
    "You need to delete the schedule before deleting the model and endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "1af984eb",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T01:15:42.144557Z",
     "iopub.status.busy": "2022-04-18T01:15:42.144042Z",
     "iopub.status.idle": "2022-04-18T01:17:02.831114Z",
     "shell.execute_reply": "2022-04-18T01:17:02.831629Z"
    },
    "papermill": {
     "duration": 80.76217,
     "end_time": "2022-04-18T01:17:02.831829",
     "exception": false,
     "start_time": "2022-04-18T01:15:42.069659",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Stopping Monitoring Schedule with name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n",
      "\n",
      "Deleting Monitoring Schedule with name: DEMO-xgb-churn-pred-model-monitor-schedule-2022-04-18-00-29-48\n"
     ]
    }
   ],
   "source": [
    "my_default_monitor.stop_monitoring_schedule()\n",
    "my_default_monitor.delete_monitoring_schedule()\n",
    "time.sleep(60)  # Wait for the deletion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "c5d93a04",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T01:17:02.995062Z",
     "iopub.status.busy": "2022-04-18T01:17:02.994531Z",
     "iopub.status.idle": "2022-04-18T01:17:03.202044Z",
     "shell.execute_reply": "2022-04-18T01:17:03.202448Z"
    },
    "papermill": {
     "duration": 0.291003,
     "end_time": "2022-04-18T01:17:03.202589",
     "exception": false,
     "start_time": "2022-04-18T01:17:02.911586",
     "status": "completed"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "predictor.delete_model()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "2fd0f570",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-18T01:17:03.360478Z",
     "iopub.status.busy": "2022-04-18T01:17:03.359925Z",
     "iopub.status.idle": "2022-04-18T01:17:03.565182Z",
     "shell.execute_reply": "2022-04-18T01:17:03.565688Z"
    },
    "papermill": {
     "duration": 0.290093,
     "end_time": "2022-04-18T01:17:03.565886",
     "exception": false,
     "start_time": "2022-04-18T01:17:03.275793",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "predictor.delete_endpoint()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook CI Test Results\n",
    "\n",
    "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
    "\n",
    "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n",
    "\n",
    "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker_model_monitor|introduction|SageMaker-ModelMonitoring_outputs.ipynb)\n"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.9"
  },
  "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.  Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
  "papermill": {
   "default_parameters": {},
   "duration": 3833.130815,
   "end_time": "2022-04-18T01:17:06.266500",
   "environment_variables": {},
   "exception": null,
   "input_path": "SageMaker-ModelMonitoring.ipynb",
   "output_path": "/opt/ml/processing/output/SageMaker-ModelMonitoring-2022-04-18-00-08-24.ipynb",
   "parameters": {
    "kms_key": "arn:aws:kms:us-west-2:000000000000:1234abcd-12ab-34cd-56ef-1234567890ab"
   },
   "start_time": "2022-04-18T00:13:13.135685",
   "version": "2.3.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}