{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Customer Churn Prediction with Amazon SageMaker Autopilot\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
    "\n",
    "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "---"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_**Using AutoPilot to Predict Mobile Customer Departure**_\n",
    "\n",
    "---\n",
    "\n",
    "---\n",
    "\n",
    "Kernel `Python 3 (Data Science)` works well with this notebook.\n",
    "\n",
    "## Contents\n",
    "\n",
    "1. [Introduction](#Introduction)\n",
    "1. [Setup](#Setup)\n",
    "1. [Data](#Data)\n",
    "1. [Train](#Settingup)\n",
    "1. [Autopilot Results](#Results)\n",
    "1. [Host](#Host)\n",
    "1. [Cleanup](#Cleanup)\n",
    "\n",
    "\n",
    "---\n",
    "\n",
    "## Introduction\n",
    "\n",
    "Amazon SageMaker Autopilot is an automated machine learning (commonly referred to as AutoML) solution for tabular datasets. You can use SageMaker Autopilot in different ways: on autopilot (hence the name) or with human guidance, without code through SageMaker Studio, or using the AWS SDKs. This notebook, as a first glimpse, will use the AWS SDKs to simply create and deploy a machine learning model.\n",
    "\n",
    "Losing customers is costly for any business.  Identifying unhappy customers early on gives you a chance to offer them incentives to stay.  This notebook describes using machine learning (ML) for the automated identification of unhappy customers, also known as customer churn prediction. ML models rarely give perfect predictions though, so this notebook is also about how to incorporate the relative costs of prediction mistakes when determining the financial outcome of using ML.\n",
    "\n",
    "We use an example of churn that is familiar to all of us–leaving a mobile phone operator.  Seems like I can always find fault with my provider du jour! And if my provider knows that I’m thinking of leaving, it can offer timely incentives–I can always use a phone upgrade or perhaps have a new feature activated–and I might just stick around. Incentives are often much more cost effective than losing and reacquiring a customer.\n",
    "\n",
    "---\n",
    "## Setup\n",
    "\n",
    "_This notebook was created and tested on an ml.m4.xlarge notebook instance._\n",
    "\n",
    "Let's start by specifying:\n",
    "\n",
    "- The S3 bucket and prefix that you want to use for training and model data.  This should be within the same region as the Notebook Instance, training, and hosting.\n",
    "- The IAM role arn used to give training and hosting access to your data. See the documentation for how to create these.  Note, if more than one role is required for notebook instances, training, and/or hosting, please replace the boto regexp with a the appropriate full IAM role arn string(s)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "isConfigCell": true,
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "from sagemaker import get_execution_role\n",
    "\n",
    "region = boto3.Session().region_name\n",
    "\n",
    "session = sagemaker.Session()\n",
    "\n",
    "# You can modify the following to use a bucket of your choosing\n",
    "bucket = session.default_bucket()\n",
    "prefix = \"sagemaker/DEMO-autopilot-churn\"\n",
    "\n",
    "role = get_execution_role()\n",
    "\n",
    "# This is the client we will use to interact with SageMaker AutoPilot\n",
    "sm = boto3.Session().client(service_name=\"sagemaker\", region_name=region)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll import the Python libraries we'll need for the remainder of the exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import io\n",
    "import os\n",
    "import sys\n",
    "import time\n",
    "import json\n",
    "from IPython.display import display\n",
    "from time import strftime, gmtime\n",
    "import boto3\n",
    "import sagemaker\n",
    "from sagemaker.predictor import csv_serializer"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Data\n",
    "\n",
    "Mobile operators have historical records on which customers ultimately ended up churning and which continued using the service. We can use this historical information to construct an ML model of one mobile operator’s churn using a process called training. After training the model, we can pass the profile information of an arbitrary customer (the same profile information that we used to train the model) to the model, and have the model predict whether this customer is going to churn. Of course, we expect the model to make mistakes–after all, predicting the future is tricky business! But I’ll also show how to deal with prediction errors.\n",
    "\n",
    "The dataset we will use is synthetically generated, but indictive of the types of features you'd see in this use case."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = boto3.client(\"s3\")\n",
    "s3.download_file(\n",
    "    f\"sagemaker-example-files-prod-{region}\", \"datasets/tabular/synthetic/churn.txt\", \"churn.txt\"\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Upload the dataset to S3\n",
    "\n",
    "Before you run Autopilot on the dataset, first perform a check of the dataset to make sure that it has no obvious errors. The Autopilot process can take long time, and it's generally a good practice to inspect the dataset before you start a job. This particular dataset is small, so you can inspect it in the notebook instance itself. If you have a larger dataset that will not fit in a notebook instance memory, inspect the dataset offline using a big data analytics tool like Apache Spark. [Deequ](https://github.com/awslabs/deequ) is a library built on top of Apache Spark that can be helpful for performing checks on large datasets. Autopilot is capable of handling datasets up to 5 GB.\n",
    "\n",
    "Read the data into a Pandas data frame and take a look."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account Length</th>\n",
       "      <th>Area Code</th>\n",
       "      <th>Phone</th>\n",
       "      <th>Int'l Plan</th>\n",
       "      <th>VMail Plan</th>\n",
       "      <th>VMail Message</th>\n",
       "      <th>Day Mins</th>\n",
       "      <th>Day Calls</th>\n",
       "      <th>Day Charge</th>\n",
       "      <th>Eve Mins</th>\n",
       "      <th>Eve Calls</th>\n",
       "      <th>Eve Charge</th>\n",
       "      <th>Night Mins</th>\n",
       "      <th>Night Calls</th>\n",
       "      <th>Night Charge</th>\n",
       "      <th>Intl Mins</th>\n",
       "      <th>Intl Calls</th>\n",
       "      <th>Intl Charge</th>\n",
       "      <th>CustServ Calls</th>\n",
       "      <th>Churn?</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "      <td>382-4657</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>25</td>\n",
       "      <td>265.1</td>\n",
       "      <td>110</td>\n",
       "      <td>45.07</td>\n",
       "      <td>197.4</td>\n",
       "      <td>99</td>\n",
       "      <td>16.78</td>\n",
       "      <td>244.7</td>\n",
       "      <td>91</td>\n",
       "      <td>11.01</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.70</td>\n",
       "      <td>1</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OH</td>\n",
       "      <td>107</td>\n",
       "      <td>415</td>\n",
       "      <td>371-7191</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>26</td>\n",
       "      <td>161.6</td>\n",
       "      <td>123</td>\n",
       "      <td>27.47</td>\n",
       "      <td>195.5</td>\n",
       "      <td>103</td>\n",
       "      <td>16.62</td>\n",
       "      <td>254.4</td>\n",
       "      <td>103</td>\n",
       "      <td>11.45</td>\n",
       "      <td>13.7</td>\n",
       "      <td>3</td>\n",
       "      <td>3.70</td>\n",
       "      <td>1</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NJ</td>\n",
       "      <td>137</td>\n",
       "      <td>415</td>\n",
       "      <td>358-1921</td>\n",
       "      <td>no</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>243.4</td>\n",
       "      <td>114</td>\n",
       "      <td>41.38</td>\n",
       "      <td>121.2</td>\n",
       "      <td>110</td>\n",
       "      <td>10.30</td>\n",
       "      <td>162.6</td>\n",
       "      <td>104</td>\n",
       "      <td>7.32</td>\n",
       "      <td>12.2</td>\n",
       "      <td>5</td>\n",
       "      <td>3.29</td>\n",
       "      <td>0</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "      <td>375-9999</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>299.4</td>\n",
       "      <td>71</td>\n",
       "      <td>50.90</td>\n",
       "      <td>61.9</td>\n",
       "      <td>88</td>\n",
       "      <td>5.26</td>\n",
       "      <td>196.9</td>\n",
       "      <td>89</td>\n",
       "      <td>8.86</td>\n",
       "      <td>6.6</td>\n",
       "      <td>7</td>\n",
       "      <td>1.78</td>\n",
       "      <td>2</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "      <td>330-6626</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>166.7</td>\n",
       "      <td>113</td>\n",
       "      <td>28.34</td>\n",
       "      <td>148.3</td>\n",
       "      <td>122</td>\n",
       "      <td>12.61</td>\n",
       "      <td>186.9</td>\n",
       "      <td>121</td>\n",
       "      <td>8.41</td>\n",
       "      <td>10.1</td>\n",
       "      <td>3</td>\n",
       "      <td>2.73</td>\n",
       "      <td>3</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3328</th>\n",
       "      <td>AZ</td>\n",
       "      <td>192</td>\n",
       "      <td>415</td>\n",
       "      <td>414-4276</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>36</td>\n",
       "      <td>156.2</td>\n",
       "      <td>77</td>\n",
       "      <td>26.55</td>\n",
       "      <td>215.5</td>\n",
       "      <td>126</td>\n",
       "      <td>18.32</td>\n",
       "      <td>279.1</td>\n",
       "      <td>83</td>\n",
       "      <td>12.56</td>\n",
       "      <td>9.9</td>\n",
       "      <td>6</td>\n",
       "      <td>2.67</td>\n",
       "      <td>2</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3329</th>\n",
       "      <td>WV</td>\n",
       "      <td>68</td>\n",
       "      <td>415</td>\n",
       "      <td>370-3271</td>\n",
       "      <td>no</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>231.1</td>\n",
       "      <td>57</td>\n",
       "      <td>39.29</td>\n",
       "      <td>153.4</td>\n",
       "      <td>55</td>\n",
       "      <td>13.04</td>\n",
       "      <td>191.3</td>\n",
       "      <td>123</td>\n",
       "      <td>8.61</td>\n",
       "      <td>9.6</td>\n",
       "      <td>4</td>\n",
       "      <td>2.59</td>\n",
       "      <td>3</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3330</th>\n",
       "      <td>RI</td>\n",
       "      <td>28</td>\n",
       "      <td>510</td>\n",
       "      <td>328-8230</td>\n",
       "      <td>no</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>180.8</td>\n",
       "      <td>109</td>\n",
       "      <td>30.74</td>\n",
       "      <td>288.8</td>\n",
       "      <td>58</td>\n",
       "      <td>24.55</td>\n",
       "      <td>191.9</td>\n",
       "      <td>91</td>\n",
       "      <td>8.64</td>\n",
       "      <td>14.1</td>\n",
       "      <td>6</td>\n",
       "      <td>3.81</td>\n",
       "      <td>2</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3331</th>\n",
       "      <td>CT</td>\n",
       "      <td>184</td>\n",
       "      <td>510</td>\n",
       "      <td>364-6381</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>213.8</td>\n",
       "      <td>105</td>\n",
       "      <td>36.35</td>\n",
       "      <td>159.6</td>\n",
       "      <td>84</td>\n",
       "      <td>13.57</td>\n",
       "      <td>139.2</td>\n",
       "      <td>137</td>\n",
       "      <td>6.26</td>\n",
       "      <td>5.0</td>\n",
       "      <td>10</td>\n",
       "      <td>1.35</td>\n",
       "      <td>2</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3332</th>\n",
       "      <td>TN</td>\n",
       "      <td>74</td>\n",
       "      <td>415</td>\n",
       "      <td>400-4344</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>25</td>\n",
       "      <td>234.4</td>\n",
       "      <td>113</td>\n",
       "      <td>39.85</td>\n",
       "      <td>265.9</td>\n",
       "      <td>82</td>\n",
       "      <td>22.60</td>\n",
       "      <td>241.4</td>\n",
       "      <td>77</td>\n",
       "      <td>10.86</td>\n",
       "      <td>13.7</td>\n",
       "      <td>4</td>\n",
       "      <td>3.70</td>\n",
       "      <td>0</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3333 rows × 21 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     State  Account Length  Area Code     Phone Int'l Plan VMail Plan  \\\n",
       "0       KS             128        415  382-4657         no        yes   \n",
       "1       OH             107        415  371-7191         no        yes   \n",
       "2       NJ             137        415  358-1921         no         no   \n",
       "3       OH              84        408  375-9999        yes         no   \n",
       "4       OK              75        415  330-6626        yes         no   \n",
       "...    ...             ...        ...       ...        ...        ...   \n",
       "3328    AZ             192        415  414-4276         no        yes   \n",
       "3329    WV              68        415  370-3271         no         no   \n",
       "3330    RI              28        510  328-8230         no         no   \n",
       "3331    CT             184        510  364-6381        yes         no   \n",
       "3332    TN              74        415  400-4344         no        yes   \n",
       "\n",
       "      VMail Message  Day Mins  Day Calls  Day Charge  Eve Mins  Eve Calls  \\\n",
       "0                25     265.1        110       45.07     197.4         99   \n",
       "1                26     161.6        123       27.47     195.5        103   \n",
       "2                 0     243.4        114       41.38     121.2        110   \n",
       "3                 0     299.4         71       50.90      61.9         88   \n",
       "4                 0     166.7        113       28.34     148.3        122   \n",
       "...             ...       ...        ...         ...       ...        ...   \n",
       "3328             36     156.2         77       26.55     215.5        126   \n",
       "3329              0     231.1         57       39.29     153.4         55   \n",
       "3330              0     180.8        109       30.74     288.8         58   \n",
       "3331              0     213.8        105       36.35     159.6         84   \n",
       "3332             25     234.4        113       39.85     265.9         82   \n",
       "\n",
       "      Eve Charge  Night Mins  Night Calls  Night Charge  Intl Mins  \\\n",
       "0          16.78       244.7           91         11.01       10.0   \n",
       "1          16.62       254.4          103         11.45       13.7   \n",
       "2          10.30       162.6          104          7.32       12.2   \n",
       "3           5.26       196.9           89          8.86        6.6   \n",
       "4          12.61       186.9          121          8.41       10.1   \n",
       "...          ...         ...          ...           ...        ...   \n",
       "3328       18.32       279.1           83         12.56        9.9   \n",
       "3329       13.04       191.3          123          8.61        9.6   \n",
       "3330       24.55       191.9           91          8.64       14.1   \n",
       "3331       13.57       139.2          137          6.26        5.0   \n",
       "3332       22.60       241.4           77         10.86       13.7   \n",
       "\n",
       "      Intl Calls  Intl Charge  CustServ Calls  Churn?  \n",
       "0              3         2.70               1  False.  \n",
       "1              3         3.70               1  False.  \n",
       "2              5         3.29               0  False.  \n",
       "3              7         1.78               2  False.  \n",
       "4              3         2.73               3  False.  \n",
       "...          ...          ...             ...     ...  \n",
       "3328           6         2.67               2  False.  \n",
       "3329           4         2.59               3  False.  \n",
       "3330           6         3.81               2  False.  \n",
       "3331          10         1.35               2  False.  \n",
       "3332           4         3.70               0  False.  \n",
       "\n",
       "[3333 rows x 21 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "churn = pd.read_csv(\"./churn.txt\")\n",
    "pd.set_option(\"display.max_columns\", 500)\n",
    "churn"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By modern standards, it’s a relatively small dataset, with only 5,000 records, where each record uses 21 attributes to describe the profile of a customer of an unknown US mobile operator. The attributes are:\n",
    "\n",
    "- `State`: the US state in which the customer resides, indicated by a two-letter abbreviation; for example, OH or NJ\n",
    "- `Account Length`: the number of days that this account has been active\n",
    "- `Area Code`: the three-digit area code of the corresponding customer’s phone number\n",
    "- `Phone`: the remaining seven-digit phone number\n",
    "- `Int’l Plan`: whether the customer has an international calling plan: yes/no\n",
    "- `VMail Plan`: whether the customer has a voice mail feature: yes/no\n",
    "- `VMail Message`: presumably the average number of voice mail messages per month\n",
    "- `Day Mins`: the total number of calling minutes used during the day\n",
    "- `Day Calls`: the total number of calls placed during the day\n",
    "- `Day Charge`: the billed cost of daytime calls\n",
    "- `Eve Mins, Eve Calls, Eve Charge`: the billed cost for calls placed during the evening\n",
    "- `Night Mins`, `Night Calls`, `Night Charge`: the billed cost for calls placed during nighttime\n",
    "- `Intl Mins`, `Intl Calls`, `Intl Charge`: the billed cost for international calls\n",
    "- `CustServ Calls`: the number of calls placed to Customer Service\n",
    "- `Churn?`: whether the customer left the service: true/false\n",
    "\n",
    "The last attribute, `Churn?`, is known as the target attribute–the attribute that we want the ML model to predict."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reserve some data for calling inference on the model\n",
    "\n",
    "Divide the data into training and testing splits. The training split is used by SageMaker Autopilot. The testing split is reserved to perform inference using the suggested model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_data = churn.sample(frac=0.8, random_state=200)\n",
    "\n",
    "test_data = churn.drop(train_data.index)\n",
    "\n",
    "test_data_no_target = test_data.drop(columns=[\"Churn?\"])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll upload these files to S3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train data uploaded to: s3://sagemaker-us-west-2-262002448484/sagemaker/DEMO-autopilot-churn/train/train_data.csv\n",
      "Test data uploaded to: s3://sagemaker-us-west-2-262002448484/sagemaker/DEMO-autopilot-churn/test/test_data.csv\n"
     ]
    }
   ],
   "source": [
    "train_file = \"train_data.csv\"\n",
    "train_data.to_csv(train_file, index=False, header=True)\n",
    "train_data_s3_path = session.upload_data(path=train_file, key_prefix=prefix + \"/train\")\n",
    "print(\"Train data uploaded to: \" + train_data_s3_path)\n",
    "\n",
    "test_file = \"test_data.csv\"\n",
    "test_data_no_target.to_csv(test_file, index=False, header=False)\n",
    "test_data_s3_path = session.upload_data(path=test_file, key_prefix=prefix + \"/test\")\n",
    "print(\"Test data uploaded to: \" + test_data_s3_path)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Setting up the SageMaker Autopilot Job<a name=\"Settingup\"></a>\n",
    "\n",
    "After uploading the dataset to Amazon S3, you can invoke Autopilot to find the best ML pipeline to train a model on this dataset. \n",
    "\n",
    "The required inputs for invoking a Autopilot job are:\n",
    "* Amazon S3 location for input dataset and for all output artifacts\n",
    "* Name of the column of the dataset you want to predict (`Churn?` in this case) \n",
    "* An IAM role\n",
    "\n",
    "Currently Autopilot supports only tabular datasets in CSV format. Either all files should have a header row, or the first file of the dataset, when sorted in alphabetical/lexical order by name, is expected to have a header row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_data_config = [\n",
    "    {\n",
    "        \"DataSource\": {\n",
    "            \"S3DataSource\": {\n",
    "                \"S3DataType\": \"S3Prefix\",\n",
    "                \"S3Uri\": \"s3://{}/{}/train\".format(bucket, prefix),\n",
    "            }\n",
    "        },\n",
    "        \"TargetAttributeName\": \"Churn?\",\n",
    "    }\n",
    "]\n",
    "\n",
    "output_data_config = {\"S3OutputPath\": \"s3://{}/{}/output\".format(bucket, prefix)}"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also specify the type of problem you want to solve with your dataset (`Regression, MulticlassClassification, BinaryClassification`). In case you are not sure, SageMaker Autopilot will infer the problem type based on statistics of the target column (the column you want to predict). \n",
    "\n",
    "Because the target attribute, ```Churn?```, is binary, our model will be performing binary prediction, also known as binary classification. In this example we will let AutoPilot infer the type of problem for us.\n",
    "\n",
    "You have the option to limit the running time of a SageMaker Autopilot job by providing either the maximum number of pipeline evaluations or candidates (one pipeline evaluation is called a `Candidate` because it generates a candidate model) or providing the total time allocated for the overall Autopilot job. Under default settings, this job takes about four hours to run. This varies between runs because of the nature of the exploratory process Autopilot uses to find optimal training parameters."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Launching the SageMaker Autopilot Job<a name=\"Launching\"></a>\n",
    "\n",
    "You can now launch the Autopilot job by calling the `create_auto_ml_job` API. We limit the number of candidates to 20 so that the job finishes in a few minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AutoMLJobName: automl-churn-03-17-50-14\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'AutoMLJobArn': 'arn:aws:sagemaker:us-west-2:262002448484:automl-job/automl-churn-03-17-50-14',\n",
       " 'ResponseMetadata': {'RequestId': '346f751d-4738-4c0f-86a4-513f15e2b2ea',\n",
       "  'HTTPStatusCode': 200,\n",
       "  'HTTPHeaders': {'x-amzn-requestid': '346f751d-4738-4c0f-86a4-513f15e2b2ea',\n",
       "   'content-type': 'application/x-amz-json-1.1',\n",
       "   'content-length': '95',\n",
       "   'date': 'Wed, 03 Feb 2021 17:50:14 GMT'},\n",
       "  'RetryAttempts': 0}}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from time import gmtime, strftime, sleep\n",
    "\n",
    "timestamp_suffix = strftime(\"%d-%H-%M-%S\", gmtime())\n",
    "\n",
    "auto_ml_job_name = \"automl-churn-\" + timestamp_suffix\n",
    "print(\"AutoMLJobName: \" + auto_ml_job_name)\n",
    "\n",
    "sm.create_auto_ml_job(\n",
    "    AutoMLJobName=auto_ml_job_name,\n",
    "    InputDataConfig=input_data_config,\n",
    "    OutputDataConfig=output_data_config,\n",
    "    AutoMLJobConfig={\"CompletionCriteria\": {\"MaxCandidates\": 20}},\n",
    "    RoleArn=role,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stored 'auto_ml_job_name' (str)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'automl-churn-03-17-50-14'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Store the AutoMLJobName name for use in subsequent notebooks\n",
    "%store auto_ml_job_name\n",
    "auto_ml_job_name"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tracking SageMaker Autopilot job progress<a name=\"Tracking\"></a>\n",
    "SageMaker Autopilot job consists of the following high-level steps : \n",
    "* Analyzing Data, where the dataset is analyzed and Autopilot comes up with a list of ML pipelines that should be tried out on the dataset. The dataset is also split into train and validation sets.\n",
    "* Feature Engineering, where Autopilot performs feature transformation on individual features of the dataset as well as at an aggregate level.\n",
    "* Model Tuning, where the top performing pipeline is selected along with the optimal hyperparameters for the training algorithm (the last stage of the pipeline). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "JobStatus - Secondary Status\n",
      "------------------------------\n",
      "InProgress - Starting\n",
      "InProgress - Starting\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - AnalyzingData\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - FeatureEngineering\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "InProgress - ModelTuning\n",
      "Completed - MaxCandidatesReached\n"
     ]
    }
   ],
   "source": [
    "print(\"JobStatus - Secondary Status\")\n",
    "print(\"------------------------------\")\n",
    "\n",
    "\n",
    "describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "print(describe_response[\"AutoMLJobStatus\"] + \" - \" + describe_response[\"AutoMLJobSecondaryStatus\"])\n",
    "job_run_status = describe_response[\"AutoMLJobStatus\"]\n",
    "\n",
    "while job_run_status not in (\"Failed\", \"Completed\", \"Stopped\"):\n",
    "    describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "    job_run_status = describe_response[\"AutoMLJobStatus\"]\n",
    "\n",
    "    print(\n",
    "        describe_response[\"AutoMLJobStatus\"] + \" - \" + describe_response[\"AutoMLJobSecondaryStatus\"]\n",
    "    )\n",
    "    sleep(30)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**AutoPilot automatically generates two executable Jupyter Notebooks:  <br> - SageMakerAutopilotDataExplorationNotebook.ipynb <br> and <br> - SageMakerAutopilotCandidateDefinitionNotebook.ipynb. <br> These notebooks are stored in S3. Let us download them onto our SageMaker Notebook instance so we could explore them later.** "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "s3://sagemaker-us-west-2-262002448484/sagemaker/DEMO-autopilot-churn/output/automl-churn-03-17-50-14/sagemaker-automl-candidates/pr-1-087be24846d8436faecf8de3c2d70c10cd7e5218bb62499f822657956b/notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb\n",
      "s3://sagemaker-us-west-2-262002448484/sagemaker/DEMO-autopilot-churn/output/automl-churn-03-17-50-14/sagemaker-automl-candidates/pr-1-087be24846d8436faecf8de3c2d70c10cd7e5218bb62499f822657956b/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb\n"
     ]
    }
   ],
   "source": [
    "# print(describe_response)\n",
    "print(describe_response[\"AutoMLJobArtifacts\"][\"CandidateDefinitionNotebookLocation\"])\n",
    "print(describe_response[\"AutoMLJobArtifacts\"][\"DataExplorationNotebookLocation\"])\n",
    "\n",
    "candidate_nbk = describe_response[\"AutoMLJobArtifacts\"][\"CandidateDefinitionNotebookLocation\"]\n",
    "data_explore_nbk = describe_response[\"AutoMLJobArtifacts\"][\"DataExplorationNotebookLocation\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sagemaker-us-west-2-262002448484 sagemaker/DEMO-autopilot-churn/output/automl-churn-03-17-50-14/sagemaker-automl-candidates/pr-1-087be24846d8436faecf8de3c2d70c10cd7e5218bb62499f822657956b/notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb sagemaker/DEMO-autopilot-churn/output/automl-churn-03-17-50-14/sagemaker-automl-candidates/pr-1-087be24846d8436faecf8de3c2d70c10cd7e5218bb62499f822657956b/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb\n"
     ]
    }
   ],
   "source": [
    "def split_s3_path(s3_path):\n",
    "    path_parts = s3_path.replace(\"s3://\", \"\").split(\"/\")\n",
    "    bucket = path_parts.pop(0)\n",
    "    key = \"/\".join(path_parts)\n",
    "    return bucket, key\n",
    "\n",
    "\n",
    "s3_bucket, candidate_nbk_key = split_s3_path(candidate_nbk)\n",
    "_, data_explore_nbk_key = split_s3_path(data_explore_nbk)\n",
    "\n",
    "print(s3_bucket, candidate_nbk_key, data_explore_nbk_key)\n",
    "\n",
    "session.download_data(path=\"./\", bucket=s3_bucket, key_prefix=candidate_nbk_key)\n",
    "\n",
    "session.download_data(path=\"./\", bucket=s3_bucket, key_prefix=data_explore_nbk_key)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "toc-hr-collapsed": true
   },
   "source": [
    "---\n",
    "## Results\n",
    "\n",
    "Now use the describe_auto_ml_job API to look up the best candidate selected by the SageMaker Autopilot job. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'CandidateName': 'tuning-job-1-3572d1c0effe4c2db9-017-ab2e9b77', 'FinalAutoMLJobObjectiveMetric': {'MetricName': 'validation:f1', 'Value': 0.923229992389679}, 'ObjectiveStatus': 'Succeeded', 'CandidateSteps': [{'CandidateStepType': 'AWS::SageMaker::ProcessingJob', 'CandidateStepArn': 'arn:aws:sagemaker:us-west-2:262002448484:processing-job/db-1-dc376427266845c88039d2bdb3410fdf23ec309a025e4f1eb84b5eb853', 'CandidateStepName': 'db-1-dc376427266845c88039d2bdb3410fdf23ec309a025e4f1eb84b5eb853'}, {'CandidateStepType': 'AWS::SageMaker::TrainingJob', 'CandidateStepArn': 'arn:aws:sagemaker:us-west-2:262002448484:training-job/automl-chu-dpp5-1-705bf26de0f842c3b3cd5e0e57f12bce7834aa5c88b04', 'CandidateStepName': 'automl-chu-dpp5-1-705bf26de0f842c3b3cd5e0e57f12bce7834aa5c88b04'}, {'CandidateStepType': 'AWS::SageMaker::TransformJob', 'CandidateStepArn': 'arn:aws:sagemaker:us-west-2:262002448484:transform-job/automl-chu-dpp5-rpb-1-61fcb42b552d4a4b8ab2655f68ff1f4e8048ada21', 'CandidateStepName': 'automl-chu-dpp5-rpb-1-61fcb42b552d4a4b8ab2655f68ff1f4e8048ada21'}, {'CandidateStepType': 'AWS::SageMaker::TrainingJob', 'CandidateStepArn': 'arn:aws:sagemaker:us-west-2:262002448484:training-job/tuning-job-1-3572d1c0effe4c2db9-017-ab2e9b77', 'CandidateStepName': 'tuning-job-1-3572d1c0effe4c2db9-017-ab2e9b77'}], 'CandidateStatus': 'Completed', 'InferenceContainers': [{'Image': '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sklearn-automl:0.2-1-cpu-py3', 'ModelDataUrl': 's3://sagemaker-us-west-2-262002448484/sagemaker/DEMO-autopilot-churn/output/automl-churn-03-17-50-14/data-processor-models/automl-chu-dpp5-1-705bf26de0f842c3b3cd5e0e57f12bce7834aa5c88b04/output/model.tar.gz', 'Environment': {'AUTOML_SPARSE_ENCODE_RECORDIO_PROTOBUF': '1', 'AUTOML_TRANSFORM_MODE': 'feature-transform', 'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'application/x-recordio-protobuf', 'SAGEMAKER_PROGRAM': 'sagemaker_serve', 'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/model/code'}}, {'Image': '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3', 'ModelDataUrl': 's3://sagemaker-us-west-2-262002448484/sagemaker/DEMO-autopilot-churn/output/automl-churn-03-17-50-14/tuning/automl-chu-dpp5-xgb/tuning-job-1-3572d1c0effe4c2db9-017-ab2e9b77/output/model.tar.gz', 'Environment': {'MAX_CONTENT_LENGTH': '20971520', 'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'text/csv', 'SAGEMAKER_INFERENCE_OUTPUT': 'predicted_label', 'SAGEMAKER_INFERENCE_SUPPORTED': 'predicted_label,probability,probabilities'}}, {'Image': '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sklearn-automl:0.2-1-cpu-py3', 'ModelDataUrl': 's3://sagemaker-us-west-2-262002448484/sagemaker/DEMO-autopilot-churn/output/automl-churn-03-17-50-14/data-processor-models/automl-chu-dpp5-1-705bf26de0f842c3b3cd5e0e57f12bce7834aa5c88b04/output/model.tar.gz', 'Environment': {'AUTOML_TRANSFORM_MODE': 'inverse-label-transform', 'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'text/csv', 'SAGEMAKER_INFERENCE_INPUT': 'predicted_label', 'SAGEMAKER_INFERENCE_OUTPUT': 'predicted_label', 'SAGEMAKER_INFERENCE_SUPPORTED': 'predicted_label,probability,labels,probabilities', 'SAGEMAKER_PROGRAM': 'sagemaker_serve', 'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/model/code'}}], 'CreationTime': datetime.datetime(2021, 2, 3, 18, 12, 19, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2021, 2, 3, 18, 13, 2, tzinfo=tzlocal()), 'LastModifiedTime': datetime.datetime(2021, 2, 3, 18, 15, 52, 159000, tzinfo=tzlocal())}\n",
      "\n",
      "\n",
      "CandidateName: tuning-job-1-3572d1c0effe4c2db9-017-ab2e9b77\n",
      "FinalAutoMLJobObjectiveMetricName: validation:f1\n",
      "FinalAutoMLJobObjectiveMetricValue: 0.923229992389679\n"
     ]
    }
   ],
   "source": [
    "best_candidate = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)[\"BestCandidate\"]\n",
    "best_candidate_name = best_candidate[\"CandidateName\"]\n",
    "print(best_candidate)\n",
    "print(\"\\n\")\n",
    "print(\"CandidateName: \" + best_candidate_name)\n",
    "print(\n",
    "    \"FinalAutoMLJobObjectiveMetricName: \"\n",
    "    + best_candidate[\"FinalAutoMLJobObjectiveMetric\"][\"MetricName\"]\n",
    ")\n",
    "print(\n",
    "    \"FinalAutoMLJobObjectiveMetricValue: \"\n",
    "    + str(best_candidate[\"FinalAutoMLJobObjectiveMetric\"][\"Value\"])\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Due to some randomness in the algorithms involved, different runs will provide slightly different results, but accuracy will be around or above $93\\%$, which is a good result."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**If you are curious to explore the performance of other algorithms that AutoPilot explored, they are enumerated for you below via list_candidates_for_auto_ml_job() API call**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)\n",
    "# sm.list_auto_ml_jobs()\n",
    "sm_dict = sm.list_candidates_for_auto_ml_job(AutoMLJobName=auto_ml_job_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tuning-job-1-3572d1c0effe4c2db9-020-e82606f5 {'MetricName': 'validation:binary_f_beta', 'Value': 0.35087719559669495}\n",
      "174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest \n",
      "\n",
      "tuning-job-1-3572d1c0effe4c2db9-019-343c7bdb {'MetricName': 'validation:f1', 'Value': 0.752780020236969}\n",
      "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3 \n",
      "\n",
      "tuning-job-1-3572d1c0effe4c2db9-018-7eb3911a {'MetricName': 'validation:f1', 'Value': 0.923229992389679}\n",
      "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3 \n",
      "\n",
      "tuning-job-1-3572d1c0effe4c2db9-017-ab2e9b77 {'MetricName': 'validation:f1', 'Value': 0.923229992389679}\n",
      "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3 \n",
      "\n",
      "tuning-job-1-3572d1c0effe4c2db9-016-5aa287c4 {'MetricName': 'validation:binary_f_beta', 'Value': 0.23999999463558197}\n",
      "174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest \n",
      "\n",
      "tuning-job-1-3572d1c0effe4c2db9-015-56ca82e5 {'MetricName': 'validation:binary_f_beta', 'Value': 0.23999999463558197}\n",
      "174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest \n",
      "\n",
      "tuning-job-1-3572d1c0effe4c2db9-014-e9ec7e8c {'MetricName': 'validation:f1', 'Value': 0.8602499961853027}\n",
      "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3 \n",
      "\n",
      "tuning-job-1-3572d1c0effe4c2db9-013-b86fdc91 {'MetricName': 'validation:f1', 'Value': 0.8618000149726868}\n",
      "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3 \n",
      "\n",
      "tuning-job-1-3572d1c0effe4c2db9-012-c8883e70 {'MetricName': 'validation:f1', 'Value': 0.7779200077056885}\n",
      "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3 \n",
      "\n",
      "tuning-job-1-3572d1c0effe4c2db9-011-c4159396 {'MetricName': 'validation:binary_f_beta', 'Value': 0.47926267981529236}\n",
      "174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest \n",
      "\n"
     ]
    }
   ],
   "source": [
    "for item in sm_dict[\"Candidates\"]:\n",
    "    print(item[\"CandidateName\"], item[\"FinalAutoMLJobObjectiveMetric\"])\n",
    "    print(item[\"InferenceContainers\"][1][\"Image\"], \"\\n\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Host\n",
    "\n",
    "Now that we've trained the algorithm, let's create a model and deploy it to a hosted endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "timestamp_suffix = strftime(\"%d-%H-%M-%S\", gmtime())\n",
    "model_name = best_candidate_name + timestamp_suffix + \"-model\"\n",
    "model_arn = sm.create_model(\n",
    "    Containers=best_candidate[\"InferenceContainers\"], ModelName=model_name, ExecutionRoleArn=role\n",
    ")\n",
    "\n",
    "epc_name = best_candidate_name + timestamp_suffix + \"-epc\"\n",
    "ep_config = sm.create_endpoint_config(\n",
    "    EndpointConfigName=epc_name,\n",
    "    ProductionVariants=[\n",
    "        {\n",
    "            \"InstanceType\": \"ml.m5.2xlarge\",\n",
    "            \"InitialInstanceCount\": 1,\n",
    "            \"ModelName\": model_name,\n",
    "            \"VariantName\": \"main\",\n",
    "        }\n",
    "    ],\n",
    ")\n",
    "\n",
    "ep_name = best_candidate_name + timestamp_suffix + \"-ep\"\n",
    "create_endpoint_response = sm.create_endpoint(EndpointName=ep_name, EndpointConfigName=epc_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "sm.get_waiter(\"endpoint_in_service\").wait(EndpointName=ep_name)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate\n",
    "\n",
    "Now that we have a hosted endpoint running, we can make real-time predictions from our model very easily, simply by making an http POST request.  But first, we'll need to setup serializers and deserializers for passing our `test_data` NumPy arrays to the model behind the endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy: 0.9610194902548725\n"
     ]
    }
   ],
   "source": [
    "from io import StringIO\n",
    "\n",
    "if sagemaker.__version__ < \"2\":\n",
    "    from sagemaker.predictor import RealTimePredictor\n",
    "    from sagemaker.content_types import CONTENT_TYPE_CSV\n",
    "\n",
    "    predictor = RealTimePredictor(\n",
    "        endpoint=ep_name,\n",
    "        sagemaker_session=session,\n",
    "        content_type=CONTENT_TYPE_CSV,\n",
    "        accept=CONTENT_TYPE_CSV,\n",
    "    )\n",
    "\n",
    "    # Remove the target column from the test data\n",
    "    test_data_inference = test_data.drop(\"Churn?\", axis=1)\n",
    "\n",
    "    # Obtain predictions from SageMaker endpoint\n",
    "    prediction = predictor.predict(\n",
    "        test_data_inference.to_csv(sep=\",\", header=False, index=False)\n",
    "    ).decode(\"utf-8\")\n",
    "\n",
    "    # Load prediction in pandas and compare to ground truth\n",
    "    prediction_df = pd.read_csv(StringIO(prediction), header=None)\n",
    "    accuracy = (test_data.reset_index()[\"Churn?\"] == prediction_df[0]).sum() / len(\n",
    "        test_data_inference\n",
    "    )\n",
    "    print(\"Accuracy: {}\".format(accuracy))\n",
    "\n",
    "else:\n",
    "    from sagemaker.predictor import Predictor\n",
    "    from sagemaker.serializers import CSVSerializer\n",
    "    from sagemaker.deserializers import CSVDeserializer\n",
    "\n",
    "    predictor = Predictor(\n",
    "        endpoint_name=ep_name,\n",
    "        sagemaker_session=session,\n",
    "        serializer=CSVSerializer(),\n",
    "        deserializer=CSVDeserializer(),\n",
    "    )\n",
    "\n",
    "    # Remove the target column from the test data\n",
    "    test_data_inference = test_data.drop(\"Churn?\", axis=1)\n",
    "\n",
    "    # Obtain predictions from SageMaker endpoint\n",
    "    prediction = predictor.predict(test_data_inference.to_csv(sep=\",\", header=False, index=False))\n",
    "\n",
    "    # Load prediction in pandas and compare to ground truth\n",
    "    prediction_df = pd.DataFrame(prediction)\n",
    "    accuracy = (test_data.reset_index()[\"Churn?\"] == prediction_df[0]).sum() / len(\n",
    "        test_data_inference\n",
    "    )\n",
    "    print(\"Accuracy: {}\".format(accuracy))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy: 0.9610194902548725\n",
      "Precision: 0.896551724137931\n",
      "Recall: 0.8210526315789474\n",
      "F1: 0.8571428571428572\n"
     ]
    }
   ],
   "source": [
    "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score\n",
    "\n",
    "accuracy = accuracy_score(test_data.reset_index()[\"Churn?\"], prediction_df[0])\n",
    "precision = precision_score(test_data.reset_index()[\"Churn?\"], prediction_df[0], pos_label=\"True.\")\n",
    "recall = recall_score(\n",
    "    test_data.reset_index()[\"Churn?\"], prediction_df[0], pos_label=\"True.\", average=\"binary\"\n",
    ")\n",
    "f1 = f1_score(test_data.reset_index()[\"Churn?\"], prediction_df[0], pos_label=\"True.\")\n",
    "\n",
    "print(\"Accuracy: {}\".format(accuracy))\n",
    "print(\"Precision: {}\".format(precision))\n",
    "print(\"Recall: {}\".format(recall))\n",
    "print(\"F1: {}\".format(f1))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Cleanup\n",
    "\n",
    "The Autopilot job creates many underlying artifacts such as dataset splits, preprocessing scripts, or preprocessed data, etc. This code, when un-commented, deletes them. This operation deletes all the generated models and the auto-generated notebooks as well. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "s3 = boto3.resource('s3')\n",
    "s3_bucket = s3.Bucket(bucket)\n",
    "\n",
    "print(s3_bucket)\n",
    "job_outputs_prefix = '{}/output/{}'.format(prefix, auto_ml_job_name)\n",
    "print(job_outputs_prefix)\n",
    "#s3_bucket.objects.filter(Prefix=job_outputs_prefix).delete()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we delete the endpoint and associated resources."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "sm.delete_endpoint(EndpointName=ep_name)\n",
    "sm.delete_endpoint_config(EndpointConfigName=epc_name)\n",
    "sm.delete_model(ModelName=model_name)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook CI Test Results\n",
    "\n",
    "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
    "\n",
    "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/autopilot|autopilot_customer_churn.ipynb)\n",
    "\n",
    "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/autopilot|autopilot_customer_churn.ipynb)\n"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science 3.0)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.  Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
 },
 "nbformat": 4,
 "nbformat_minor": 4
}