# Deploy Autopilot models to serverless inference endpoints

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

---

Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for customers to deploy and scale ML models. Serverless Inference is ideal for workloads which have idle periods between traffic spurts and can tolerate cold starts. Serverless endpoints also automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies.

In this notebook we use models generated with Amazon SageMaker Autopilot and then deploy these models to serverless endpoints.

We will be using the public [UCI Direct Marketing](https://archive.ics.uci.edu/ml/datasets/bank+marketing) dataset for this example.

**Notebook Settings:**

- **SageMaker Classic Notebook Instance:** `ml.t3.xlarge` Notebook Instance & `conda_python3` Kernel
- **SageMaker Studio:** `Python 3 (Data Science 2.0) Kernel`
- **Regions Available:** SageMaker Serverless Inference is currently available in the following regions: US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland), Asia Pacific (Tokyo), and Asia Pacific (Sydney)


---

## Prerequisites
Let's ensure we have the latest packages installed. This notebook needs the following minimum versions of the `sagemaker` and `boto3` packages:
1. `sagemaker` >= `2.110.0`
2. `boto3` >= `1.24.84`


In [None]:
!pip install -U awscli sagemaker boto3 --quiet

In [None]:
import boto3
import sagemaker
import sys

print(f"SageMaker Version: {sagemaker.__version__}")
print(f"Boto3 Version: {boto3.__version__}")

---

## Setup
Import packages, establish session and unique ID for job name suffix

In [None]:
# Import required libraries
import os
import json
import itertools
import numpy as np
import pandas as pd

from datetime import datetime
from time import gmtime, strftime, sleep
from uuid import uuid4
from IPython import display


# Define region, bucket
session = sagemaker.Session()
region = boto3.Session().region_name
bucket = session.default_bucket()
# use the below for default SageMaker execution role else replace with your own IAM Role ARN
role = sagemaker.get_execution_role()

prefix = "autopilot/bankadditional"

today = datetime.now().strftime("%d%b%Y")
timestamp_suffix = f"{str(uuid4())[:6]}-{today}"

# Define sagemaker client object to invoke Sagemaker services
sm_client = boto3.client("sagemaker", region_name=region)

# Set prefix for AutoML jobnames. Let's keep the prefix short. We use suffixes to distinguish job names.
automl_job_prefix = "bankmrkt"  # 6-8 chars max
model_prefix = automl_job_prefix

print(f"Bucket: s3://{bucket}/{prefix}")
print(f"Region: {region}")
print(f"Role: {role}")
print(f"Job and model prefix string: {automl_job_prefix}")
print(f"suffix string: {timestamp_suffix}")
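
The job names built later from `automl_job_prefix` and `timestamp_suffix` have to stay short, which is why the prefix is kept to 6-8 characters. The sketch below checks a candidate name before any job is launched. The 32-character limit and the character pattern reflect our reading of the `CreateAutoMLJob` API reference and should be verified against the current documentation; `is_valid_automl_job_name` is a hypothetical helper, not part of any SDK.

```python
import re

# Assumption: AutoML job names allow at most 32 characters, made of
# alphanumerics separated by hyphens (no leading or trailing hyphen).
AUTOML_JOB_NAME_MAX_LEN = 32
AUTOML_JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,31}")


def is_valid_automl_job_name(name: str) -> bool:
    """Return True if `name` fits the assumed AutoML job name constraints."""
    if len(name) > AUTOML_JOB_NAME_MAX_LEN:
        return False
    return AUTOML_JOB_NAME_PATTERN.fullmatch(name) is not None


# Example with the naming scheme used in this notebook: prefix-MODE-suffix
print(is_valid_automl_job_name("bankmrkt-ENS-a1b2c3-04Oct2022"))
```

Running this check right after the suffix is generated surfaces a naming problem before a job request is sent.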

## Dataset
This example uses the [UCI Direct Marketing dataset](https://archive.ics.uci.edu/ml/datasets/Bank+Marketing):

[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

Download the dataset from the `sagemaker-example-files-prod-{region}` S3 bucket:

In [None]:
from sagemaker.s3 import S3Downloader

s3uri = f"s3://sagemaker-example-files-prod-{region}/datasets/tabular/uci_bank_marketing/bank-additional-full.csv"

if not os.path.exists('data/bank-additional/bank-additional-full.csv'):
    print("Downloading bank-additional-full.csv...")
    !mkdir -p data/bank-additional
    S3Downloader.download(s3_uri=s3uri, local_path="data/bank-additional", sagemaker_session=session)
    print("Done")
else:
    print("Skipping download..dataset exists at ./data/bank-additional")

### Visualize dataset
The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable `y`).

Problem Type: **Binary Classification**

Ref: <https://archive.ics.uci.edu/ml/datasets/bank+marketing>


In [None]:
df_data = pd.read_csv("./data/bank-additional/bank-additional-full.csv")

pd.set_option("display.max_columns", None)  # None removes the column limit so all columns are shown
df_data  # show first 5 and last 5 rows of the dataframe

## Upload dataset to S3
We upload the `bank-additional-full.csv` to S3.

In [None]:
# Set this flag to False for subsequent runs of this notebook
upload_dataset = True

In [None]:
DATA_FILE = "data/bank-additional/bank-additional-full.csv"

if upload_dataset:
    print(f"Uploading data to s3...")
    dataset_s3uri = session.upload_data(DATA_FILE, key_prefix=f"{prefix}/raw")
    print(f"Data uploaded to : \n {dataset_s3uri}")
else:
    dataset_s3uri = f"s3://{bucket}/{prefix}/raw/bank-additional-full.csv"
    print(f"Skipping upload .. dataset is under: {dataset_s3uri}")

---

## Launch Autopilot jobs in `ENSEMBLING` and `HPO` modes


First we specify the AutoML job config constants
- `TargetAttributeName` (Target column `y` for your dataset)
- `Training Mode` - `Valid values: AUTO | ENSEMBLING | HYPERPARAMETER_TUNING`
- `ProblemType` (optional) `Valid values: BinaryClassification | MulticlassClassification | Regression`
- `ObjectiveMetric` (Optional) Valid values include `Accuracy | MSE | F1 | F1macro | AUC`; see [`AutoMLJobObjective`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobObjective.html)
- `Max_Candidates` (Optional) (set only for HPO Jobs)
- `OutputDataConfig` (Optional, set if you need to specify output location for artifacts generated)

In [None]:
# Autopilot job params
target_column = "y"
training_mode = "ENSEMBLING"

# Optional Parameters
problem_type = "BinaryClassification"
objective_metric = "F1"
max_job_runtime_seconds = 3600
max_runtime_per_job_seconds = 1200
max_candidates = 10

Next, we define the Autopilot job config values
- [`AutoMLJobConfig`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobConfig.html) (`Mode` = `AUTO | ENSEMBLING | HYPERPARAMETER_TUNING`)
- [`InputDataConfig`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJob.html#sagemaker-CreateAutoMLJob-request-InputDataConfig)
- [`AutoMLJobObjective`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobObjective.html) (Optional. `Accuracy | MSE | F1 | F1macro | AUC`)
- [`OutputDataConfig`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLOutputDataConfig.html) (Optional)

##### Define Autopilot job config values

In [None]:
automl_job_config = {
    "CompletionCriteria": {
        "MaxRuntimePerTrainingJobInSeconds": max_runtime_per_job_seconds,
        "MaxAutoMLJobRuntimeInSeconds": max_job_runtime_seconds,
    },
    "Mode": training_mode,
}

automl_job_objective = {"MetricName": objective_metric}

input_data_config = [
    {
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": dataset_s3uri}},
        "TargetAttributeName": target_column,
    }
]

output_data_config = {"S3OutputPath": f"s3://{bucket}/{prefix}/output"}

# Optional: Define a Tag
tags_config = [{"Key": "Project", "Value": "Autopilot-serverless"}]
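
Since `MaxCandidates` is set only for HPO jobs, it can help to build the job config from a small function instead of mutating one shared dict between launches. The following is a minimal sketch under that assumption; `build_automl_job_config` is a hypothetical helper that only assembles a plain dict with the same keys used above.

```python
VALID_MODES = ("AUTO", "ENSEMBLING", "HYPERPARAMETER_TUNING")


def build_automl_job_config(
    mode, max_runtime_per_job_seconds, max_job_runtime_seconds, max_candidates=None
):
    """Assemble an AutoMLJobConfig dict; MaxCandidates is added for HPO jobs only."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown training mode: {mode}")

    completion_criteria = {
        "MaxRuntimePerTrainingJobInSeconds": max_runtime_per_job_seconds,
        "MaxAutoMLJobRuntimeInSeconds": max_job_runtime_seconds,
    }
    # Following this notebook's convention, candidates are capped only in HPO mode
    if mode == "HYPERPARAMETER_TUNING" and max_candidates is not None:
        completion_criteria["MaxCandidates"] = max_candidates

    return {"CompletionCriteria": completion_criteria, "Mode": mode}


ens_config = build_automl_job_config("ENSEMBLING", 1200, 3600, max_candidates=15)
hpo_config = build_automl_job_config("HYPERPARAMETER_TUNING", 1200, 3600, max_candidates=15)
print(ens_config)
print(hpo_config)
```

Each call returns a fresh dict, so the config for one job cannot leak settings into the other.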

### Launch Autopilot job with training mode set to `ENSEMBLING`

In [None]:
try:
    ens_automl_job_name = f"{model_prefix}-ENS-{timestamp_suffix}"
    print(f"Launching AutoMLJob → {ens_automl_job_name} with mode set to {training_mode}")
    response = sm_client.create_auto_ml_job(
        AutoMLJobName=ens_automl_job_name,
        InputDataConfig=input_data_config,
        OutputDataConfig=output_data_config,
        AutoMLJobConfig=automl_job_config,
        ProblemType=problem_type,
        AutoMLJobObjective=automl_job_objective,
        RoleArn=role,
        Tags=tags_config,
    )
    print(response)
except Exception as e:
    print(f"Error launching ENSEMBLING Autopilot Job: {ens_automl_job_name}")
    print(f"{e}")
    pass

### Launch Autopilot job with training mode set to `HYPERPARAMETER_TUNING`

We update the `automl_job_config` dict, setting `Mode` to `HYPERPARAMETER_TUNING` and `MaxCandidates` to 15.

>NOTE: In `HPO` mode, the best model is found by tuning various hyperparameters. The default `MaxCandidates` is 250, but for demonstration purposes we set it to 15.

In [None]:
# We use the defined job prefix to construct model name(s) and later to construct endpoint config and endpoint names.
try:
    training_mode = "HYPERPARAMETER_TUNING"
    automl_job_config["Mode"] = training_mode
    automl_job_config["CompletionCriteria"]["MaxCandidates"] = 15
    hpo_automl_job_name = f"{model_prefix}-HPO-{timestamp_suffix}"
    print(f"Launching AutoMLJob → {hpo_automl_job_name} with mode set to {training_mode}")
    response = sm_client.create_auto_ml_job(
        AutoMLJobName=hpo_automl_job_name,
        InputDataConfig=input_data_config,
        OutputDataConfig=output_data_config,
        AutoMLJobConfig=automl_job_config,
        ProblemType=problem_type,
        AutoMLJobObjective=automl_job_objective,
        RoleArn=role,
        Tags=tags_config,
    )
    print(response)
except Exception as e:
    print(f"Error launching HPO Autopilot Job: {hpo_automl_job_name}")
    print(f"{e}")
    pass

### Monitor AutoML job completion status

>**NOTE:** Jobs in `ENSEMBLING` mode finish faster

In [None]:
def get_job_status(sm_client, job_name):
    resp = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
    p_status = resp["AutoMLJobStatus"]
    s_status = resp["AutoMLJobSecondaryStatus"]
    desc = f"{job_name}: {p_status} | {s_status} ..."
    return (p_status, desc)

In [None]:
# monitor job status launched in ensembling mode
(p_status, desc) = get_job_status(sm_client, ens_automl_job_name)

# Poll until the job reaches a terminal state ("Stopped" is terminal as well)
while p_status not in ("Completed", "Failed", "Stopped"):
    print(desc)
    sleep(60)
    (p_status, desc) = get_job_status(sm_client, ens_automl_job_name)

print(desc)
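
The same polling pattern is used for both AutoML jobs and endpoints in this notebook, so it can be factored into one helper with a timeout. The sketch below is one way to do that. `wait_for_status` and its parameters are hypothetical names, and the status source is passed in as a callable so the helper itself makes no AWS calls.

```python
import time


def wait_for_status(
    fetch_status,
    terminal_states,
    poll_seconds=60,
    timeout_seconds=7200,
    sleep_fn=time.sleep,
    clock=time.monotonic,
):
    """Call fetch_status() until it returns a terminal state or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in terminal_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Still '{status}' after {timeout_seconds} seconds")
        sleep_fn(poll_seconds)


# Usage sketch with this notebook's helper (requires a running AutoML job):
#   wait_for_status(
#       lambda: get_job_status(sm_client, ens_automl_job_name)[0],
#       terminal_states=("Completed", "Failed", "Stopped"),
#   )
```

The timeout guards against a notebook cell that would otherwise poll forever if a job never reaches one of the expected states.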

### Create model from the best candidate generated by Autopilot
- In `ENSEMBLING` training mode, Autopilot generates a single inference container.

![](./images/ap-jobprofile-ens-04Oct2022.png)


### Helper functions to create model(s), serverless endpoint config and endpoint

In [None]:
def create_autopilot_model(sm_client, model_name, role, model_container, index):
    try:
        transform_mode = model_container["Environment"]["AUTOML_TRANSFORM_MODE"]
        if transform_mode:
            model_name = f"{model_name}-datamodel-{index}"
    except KeyError:  # the container has no AUTOML_TRANSFORM_MODE entry
        model_name = f"{model_name}-Inf-{index}"

    if len(model_name) <= 63:
        print(f"Creating Model {index}: {model_name} ...")
        model_response = sm_client.create_model(
            ModelName=model_name, ExecutionRoleArn=role, Containers=[model_container]
        )
        status_code = model_response["ResponseMetadata"]["HTTPStatusCode"]
        model_arn = model_response["ModelArn"]
        return (status_code, model_arn)
    else:
        print(f"Model Name: {model_name} length exceeds max. allowed chars : 63")
        raise ValueError("Model name cannot exceed 63 chars.")


def create_serverless_endpoint_config(
    sm_client, endpoint_config_name, model_name, memory: int = 2048, max_concurrency: int = 20
):
    if len(endpoint_config_name) <= 63:
        print(f"Creating Endpoint Config: {endpoint_config_name} ...")
        try:
            epc_response = sm_client.create_endpoint_config(
                EndpointConfigName=endpoint_config_name,
                ProductionVariants=[
                    {
                        "ModelName": model_name,
                        "VariantName": "AllTraffic",
                        "ServerlessConfig": {
                            "MemorySizeInMB": memory,
                            "MaxConcurrency": max_concurrency,
                        },
                    }
                ],
            )
            status_code = epc_response["ResponseMetadata"]["HTTPStatusCode"]
            epc_arn = epc_response["EndpointConfigArn"]
            return (status_code, epc_arn)
        except Exception as e:
            print(f"Error creating EndpointConfig: {endpoint_config_name}")
            print(f"{e}")
            raise  # re-raise so callers never try to unpack an implicit None
    else:
        print(f"EndpointConfig name exceeds allowed 63 char limit")
        raise ValueError("EndpointConfig name cannot exceed 63 chars.")


def create_serverless_endpoint(sm_client, endpoint_name, endpoint_config_name):
    if len(endpoint_name) <= 63:
        print(f"Creating Serverless Endpoint: {endpoint_name} ...")
        try:
            ep_response = sm_client.create_endpoint(
                EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
            )
            status_code = ep_response["ResponseMetadata"]["HTTPStatusCode"]
            return status_code
        except Exception as e:
            print(f"Error creating Endpoint: {endpoint_name}")
            print(f"{e}")
    else:
        print(f"Endpoint name exceeds allowed 63 char limit")
        raise ValueError("Endpoint name cannot exceed 63 chars.")


def get_s3_objsize_in_MB(bucket, key):
    s3 = boto3.client("s3")
    resp = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    size = round(resp / (1024 * 1024))
    if size < 1:
        print(f"Model Size: ~ {round(resp / 1024)} KB")
    else:
        print(f"Model Size: ~ {size} MB")

    return size


def set_serverless_endpoint_memory(model_size: int):
    if model_size <= 1024:
        return 1024
    elif model_size > 1024 and model_size <= 2048:
        return 2048
    elif model_size > 2048 and model_size <= 3072:
        return 3072
    elif model_size > 3072 and model_size <= 4096:
        return 4096
    elif model_size > 4096 and model_size <= 5120:
        return 5120
    elif model_size > 5120 and model_size <= 6144:
        return 6144
    elif model_size > 6144:
        raise ValueError("Model size is greater than 6GB")

### Verify model size and create serverless endpoint configuration accordingly

>Serverless Inference auto-assigns compute resources proportional to the memory you select. 
If you choose a larger memory size, your container has access to more `vCPUs`. Choose your endpoint’s memory size according to your model size. 
Generally, the memory size should be at least as large as your model size. 

Ref: <https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints-create.html>

In [None]:
response = sm_client.describe_auto_ml_job(AutoMLJobName=ens_automl_job_name)
inference_container = response["BestCandidate"]["InferenceContainers"][0]
print(f"Inference Container for AutoML job: {ens_automl_job_name}")
print(inference_container)

# Verify generated model size before creating endpoint config.
# Extract s3 Key from ModelDataUrl
model_dataurl_key = inference_container["ModelDataUrl"].split(f"{bucket}")[1][1:]
ens_model_size = get_s3_objsize_in_MB(bucket, model_dataurl_key)
print(f"Ensemble Model Size: ~ {ens_model_size}MB")
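
Splitting `ModelDataUrl` on the bucket name works here, but it returns the wrong key if the bucket name also appears inside the object key. A more defensive alternative is to parse the URI with the standard library, as sketched below; `split_s3_uri` is a hypothetical helper and the URI in the example is made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri}")
    return parsed.netloc, parsed.path.lstrip("/")


# Illustrative URI only; in the notebook this would be inference_container["ModelDataUrl"]
example_bucket, example_key = split_s3_uri("s3://my-bucket/autopilot/output/model.tar.gz")
print(example_bucket, example_key)
```

The returned pair can be passed straight to `get_s3_objsize_in_MB(bucket, key)`.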

Set the serverless endpoint config's `MemorySizeInMB` and `MaxConcurrency`. Generally, the memory size should be **at least** as large as your model size. 

Set endpoint memory size to `4096` (4 GB) and `MaxConcurrency` to 10.

Your serverless endpoint has a minimum RAM size of **1024 MB (1 GB)**, and the maximum RAM size you can choose is **6144 MB (6 GB)**

If you don't specify a memory size, `2048` (2 GB) is chosen as the default. The memory sizes you can choose are 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.

Ref: <https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html#serverless-endpoints-how-it-works-memory>
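
The rule "memory at least as large as the model" can be expressed as picking the smallest allowed tier that fits. The sketch below is a compact variant of the `set_serverless_endpoint_memory` helper defined earlier. `pick_serverless_memory_mb` and its `headroom_factor` parameter are hypothetical additions for illustration; the tier list is the one quoted above.

```python
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def pick_serverless_memory_mb(model_size_mb, headroom_factor=1.0):
    """Return the smallest allowed memory tier that holds the model.

    headroom_factor is an illustrative knob (not an AWS setting) for reserving
    extra memory beyond the raw model size, e.g. 2.0 for twice the model size.
    """
    required_mb = model_size_mb * headroom_factor
    for tier in ALLOWED_MEMORY_MB:
        if required_mb <= tier:
            return tier
    raise ValueError(f"{required_mb} MB exceeds the 6144 MB maximum")


print(pick_serverless_memory_mb(1500))
print(pick_serverless_memory_mb(1500, headroom_factor=2.0))
```

Whether extra headroom is worthwhile depends on the workload, since a larger memory size also gives the container access to more vCPUs.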




In [None]:
models = list()
# create model
(status, model_arn) = create_autopilot_model(
    sm_client, ens_automl_job_name, role, inference_container, 0
)
model_name = model_arn.split("/")[1]
models.append(model_name)

endpoint_configs = list()
endpoint_config_name = f"epc-{model_name}"
memory = 4096
# create endpoint config
(status, epc_arn) = create_serverless_endpoint_config(
    sm_client, endpoint_config_name, model_name, memory=memory, max_concurrency=10
)
endpoint_configs.append(endpoint_config_name)

endpoints = list()
endpoint_name = endpoint_config_name.replace("epc-", "ep-")
# create serverless endpoint
create_serverless_endpoint(sm_client, endpoint_name, endpoint_config_name)
endpoints.append(endpoint_name)

Wait for endpoint status to be `InService`

In [None]:
def get_endpoint_status(sm_client, endpoint_name):
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    desc = f"{endpoint_name} | {status} ..."
    return (status, desc)

In [None]:
# monitor endpoint status
(status, desc) = get_endpoint_status(sm_client, endpoint_name)
print(desc)
while status not in ("InService", "Failed"):
    (status, desc) = get_endpoint_status(sm_client, endpoint_name)
    if status not in ("InService", "Failed"):
        print(desc)
        sleep(60)
        continue
    else:
        print(desc)
        break

### Send Inference request to serverless endpoint with ENSEMBLE model

>**NOTE:** Serverless endpoints are fully managed and provision compute resources on demand, so your endpoint may experience cold starts. Typically, you'll experience a cold start on the first inference request and after a brief period of inactivity.



In [None]:
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

endpoint = endpoints[0]

payload = "51,technician,married,professional.course,no,yes,no,cellular,apr,thu,687,1,0,1,success,-1.8,93.075,-47.1,1.365,5099.1"
# payload = "42,services,married,professional.course,no,yes,no,telephone,may,thu,813,1,999,0,nonexistent,1.1,93.994,-36.4,4.855,5191.0"
# payload = "37,services,married,high.school,no,yes,no,telephone,may,mon,226,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"
# payload = "55,admin.,married,high.school,no,no,no,telephone,may,thu,94,1,999,0,nonexistent,1.1,93.994,-36.4,4.855,5191.0"
# payload = "34,blue-collar,married,basic.4y,no,no,no,telephone,may,tue,800,4,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"

try:
    print(f"Invoking endpoint: {endpoint} with payload .. \n")
    print(payload)
    predictor = Predictor(
        endpoint_name=endpoint,
        sagemaker_session=session,
        serializer=CSVSerializer(),
        deserializer=CSVDeserializer(),
    )
    prediction = predictor.predict(payload)
    print(f"Predicted Label: {prediction[0][0]}")
except Exception as e:
    print(f"Error invoking Endpoint: {endpoint}")
    print(f"{e}")
    pass
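
To see the effect of a cold start, the endpoint can also be invoked through the low-level `sagemaker-runtime` client while timing each call. The following is a sketch under that assumption. `invoke_csv_endpoint` is a hypothetical helper, and the client is passed in as an argument so the function can be exercised without AWS access.

```python
import time


def invoke_csv_endpoint(runtime_client, endpoint_name, csv_row):
    """Send one CSV row to an endpoint and return (response_text, latency_seconds)."""
    start = time.perf_counter()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="text/csv",
        Body=csv_row.encode("utf-8"),
    )
    body = response["Body"].read().decode("utf-8").strip()
    return body, time.perf_counter() - start


# Usage sketch (needs AWS credentials and a deployed endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   for attempt in range(2):  # the first call is the likeliest to hit a cold start
#       label, latency = invoke_csv_endpoint(runtime, endpoint, payload)
#       print(f"attempt {attempt}: {label} ({latency:.2f}s)")
```

Comparing the latency of the first call with a later one gives a rough feel for the cold-start overhead of a given memory setting.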

### Cleanup (ensemble endpoint)
Delete endpoint, endpoint config and model in that order

In [None]:
# Replace only the leading prefix so a later "ep-" in the name is left untouched
epc_name = endpoint.replace("ep-", "epc-", 1)
model_name = endpoint.replace("ep-", "", 1)

print(f"Deleting endpoint : {endpoint}")
try:
    sm_client.delete_endpoint(EndpointName=endpoint)
except Exception as e:
    print(f"{e}")
    pass

print(f"Deleting EndpointConfig : {epc_name}")
try:
    sm_client.delete_endpoint_config(EndpointConfigName=epc_name)
except Exception as e:
    print(f"{e}")
    pass

print(f"Deleting Model : {model_name}")
try:
    sm_client.delete_model(ModelName=model_name)
except Exception as e:
    print(f"{e}")
    pass

## Deploy HPO models to serverless endpoints

Autopilot in HYPERPARAMETER_TUNING mode generates 3 inference containers for binary classification problem types.

Ref: <https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development-container-output.html#autopilot-problem-type-container-output>

![](./images/ap-jobprofile-hpo-04Oct2022.png)

#### Monitor HPO AutoML job completion status

In [None]:
# monitor job status launched in hpo mode
(p_status, desc) = get_job_status(sm_client, hpo_automl_job_name)
print(desc)
# Poll until the job reaches a terminal state ("Stopped" is terminal as well)
while p_status not in ("Completed", "Failed", "Stopped"):
    sleep(60)
    (p_status, desc) = get_job_status(sm_client, hpo_automl_job_name)
    print(desc)

We iterate over the `InferenceContainers` list of the HPO job's `BestCandidate` and create one endpoint per container:

- Step 1 : Create Model
- Step 2 : Create Endpoint Config with Model Name
- Step 3 : Create Endpoint with Endpoint Config

In [None]:
job_response = sm_client.describe_auto_ml_job(AutoMLJobName=hpo_automl_job_name)
inference_containers = job_response["BestCandidate"]["InferenceContainers"]
print(f"Inference Containers for AutoML job: {hpo_automl_job_name}")
print(inference_containers)

Get model sizes of generated inference containers

In [None]:
for idx, container in enumerate(inference_containers):
    print(f"calculating generated model_{idx} size")
    # Extract s3 Key from ModelDataUrl
    model_dataurl_key = container["ModelDataUrl"].split(f"{bucket}")[1][1:]
    # print(model_dataurl_key)
    model_size = get_s3_objsize_in_MB(bucket, model_dataurl_key)

All generated models are smaller than 1 MB. 
Let's set `MemorySizeInMB` to **2048 MB** and `MaxConcurrency` to **10**

In [None]:
models = list()
endpoint_configs = list()
endpoints = list()

memory = 2048
max_concurrency = 10

# Create model, endpoint_config, endpoint and store them in lists for easier access
for idx, container in enumerate(inference_containers):
    (status, model_arn) = create_autopilot_model(
        sm_client, hpo_automl_job_name, role, container, idx
    )
    model_name = model_arn.split("/")[1]
    print(f"\tcreated model: {model_name}...")
    models.append(model_name)

    endpoint_config_name = f"epc-{model_name}"
    endpoint_name = f"ep-{model_name}"

    (status, epc_arn) = create_serverless_endpoint_config(
        sm_client, endpoint_config_name, model_name, memory=memory, max_concurrency=max_concurrency
    )
    )
    print(f"\tcreated epc: {endpoint_config_name}")
    endpoint_configs.append(endpoint_config_name)

    res = create_serverless_endpoint(sm_client, endpoint_name, endpoint_config_name)
    print(f"\tcreated ep: {endpoint_name}")
    endpoints.append(endpoint_name)
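
Because the endpoint config and endpoint names add a prefix to the model name, a model name that passes the 63-character check can still produce a config name that fails it, after the model has already been created. The sketch below derives and validates all three names up front. `derive_resource_names` is a hypothetical helper that mirrors the `epc-` and `ep-` prefixes and the 63-character limit used by the helper functions in this notebook.

```python
def derive_resource_names(model_name, max_len=63):
    """Return model, endpoint config and endpoint names, validating every length first."""
    names = {
        "model": model_name,
        "endpoint_config": f"epc-{model_name}",
        "endpoint": f"ep-{model_name}",
    }
    too_long = [kind for kind, name in names.items() if len(name) > max_len]
    if too_long:
        raise ValueError(f"Names exceeding {max_len} chars: {', '.join(too_long)}")
    return names


names = derive_resource_names("bankmrkt-HPO-a1b2c3-04Oct2022-datamodel-0")
print(names["endpoint_config"])
print(names["endpoint"])
```

Validating before the first `create_model` call avoids leaving a half-created set of resources behind.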

#### Monitor Endpoint creation status
Wait till all Endpoints are in `InService` status

In [None]:
statuses = [get_endpoint_status(sm_client, ep)[0] for ep in endpoints]
print(statuses)

# Poll until every endpoint is InService; stop early if any endpoint fails
while not all(s == "InService" for s in statuses):
    if "Failed" in statuses:
        print("At least one endpoint failed to deploy")
        break
    sleep(60)
    statuses = [get_endpoint_status(sm_client, ep)[0] for ep in endpoints]
    print(statuses)

### Send inference request to get predictions from each endpoint

Inference request flow:

![](./images/ap-hpo-serverless-payloadflow.png)


<div class="alert alert-info">

**Note** : In `HPO` mode, the `feature-transform` container response is of type `application/x-recordio-protobuf`. Therefore, we use an [IdentitySerializer](https://sagemaker.readthedocs.io/en/stable/api/inference/serializers.html#sagemaker.serializers.IdentitySerializer) to pass the response from the `feature-transform` container to the `Inference Container` (container #2) without any modification. We then deserialize the output of the `Inference Container` using a `CSVDeserializer`.


Serverless endpoints are fully managed and provision compute resources on demand, so the endpoints may experience cold starts. Typically, you'll experience a cold start on the first inference request and after a brief period of inactivity.
</div>

#### Inference request with `Predicted Label` as output

In [None]:
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer, IdentitySerializer
from sagemaker.deserializers import CSVDeserializer

payload = "51,technician,married,professional.course,no,yes,no,cellular,apr,thu,687,1,0,1,success,-1.8,93.075,-47.1,1.365,5099.1"
# payload = "42,services,married,professional.course,no,yes,no,telephone,may,thu,813,1,999,0,nonexistent,1.1,93.994,-36.4,4.855,5191.0"
# payload = "37,services,married,high.school,no,yes,no,telephone,may,mon,226,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"
# payload = "55,admin,married,high.school,no,no,no,telephone,may,thu,94,1,999,0,nonexistent,1.1,93.994,-36.4,4.855,5191.0"
# payload = "34,blue-collar,married,basic.4y,no,no,no,telephone,may,tue,800,4,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"
# payload = "100,services,married,high.school,no,yes,no,cellular,apr,thu,483,2,999,0,nonexistent,-1.8,93.075,-47.1,1.41,5099.1"

for idx, ep in enumerate(endpoints):
    try:
        print(f"Payload: {payload}")
        if idx == 1:  # the second endpoint receives the feature-transform output unchanged
            predictor = Predictor(
                endpoint_name=ep,
                sagemaker_session=session,
                serializer=IdentitySerializer(content_type="application/x-recordio-protobuf"),
                deserializer=CSVDeserializer(),
            )
        else:
            predictor = Predictor(
                endpoint_name=ep, sagemaker_session=session, serializer=CSVSerializer()
            )
        prediction = predictor.predict(payload)
        print(f"Prediction: \n{prediction}")
        print("--" * 20)
        payload = prediction
    except Exception as e:
        print(f"Error invoking Endpoint: {ep} \n {e}")
        break

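# The last container returns bytes; fall back to the raw value if the chain stopped early
final_prediction = payload.decode("utf-8") if isinstance(payload, bytes) else payload
print(f"Final Prediction: {final_prediction}")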

**NOTE (Optional)**: To capture `probabilities` and `labels` along with `predicted_label`, we could update the `inference_containers` to configure the required inputs and outputs.

For example, to make an inference container output **`predicted_label`** and **`probabilities`**, we could update the `inference_containers` object defined earlier.

![Configure SM Inference Output](./images/configure-sm-inference-output.png)

After the update, recreate the models and endpoint configs, then re-deploy the endpoints with the updated `SAGEMAKER_INFERENCE_OUTPUT` configurations.

You can read more about configuring inference output in generated containers [here](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development-container-output.html#autopilot-problem-type-container-output)
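
As a rough sketch of such an update, the function below copies the container definitions and sets the inference input and output variables on the copies. It assumes a three-container HPO pipeline in which index 1 is the algorithm container and index 2 is the container that maps predictions back to labels; both the index mapping and the comma-separated value format are assumptions to check against the documentation linked above. `with_inference_output` is a hypothetical helper.

```python
import copy


def with_inference_output(containers, output_keys=("predicted_label", "probabilities")):
    """Return a copy of the container list with inference input/output keys configured."""
    if len(containers) != 3:
        raise ValueError("Expected the three containers generated in HPO mode")

    value = ", ".join(output_keys)
    updated = copy.deepcopy(containers)  # leave the original definitions untouched

    # Assumption: container 1 produces the prediction, container 2 post-processes it
    updated[1].setdefault("Environment", {})["SAGEMAKER_INFERENCE_OUTPUT"] = value
    updated[2].setdefault("Environment", {})["SAGEMAKER_INFERENCE_INPUT"] = value
    updated[2]["Environment"]["SAGEMAKER_INFERENCE_OUTPUT"] = value
    return updated


# Stand-in container definitions; in the notebook this would be inference_containers
sample_containers = [{"Environment": {}}, {"Environment": {"A": "1"}}, {"Environment": {}}]
updated_containers = with_inference_output(sample_containers)
print(updated_containers[2]["Environment"])
```

The updated list can then be fed through the same create-model, create-endpoint-config and create-endpoint steps used earlier.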

---

## Cleanup (HPO endpoints)

In [None]:
print("Deleting endpoints...")
for _, ep in enumerate(endpoints):
    try:
        print(f"\tDeleting {ep}...")
        sm_client.delete_endpoint(EndpointName=ep)
    except Exception as e:
        print(f"{e}")
        continue
print("--" * 15)
print("Deleting endpoint configs...")
for _, epc in enumerate(endpoint_configs):
    try:
        print(f"\tDeleting {epc} ...")
        sm_client.delete_endpoint_config(EndpointConfigName=epc)
    except Exception as e:
        print(f"{e}")
        continue
print("--" * 15)
print("Deleting models...")
for _, mdl in enumerate(models):
    try:
        print(f"\tDeleting {mdl}...")
        sm_client.delete_model(ModelName=mdl)
    except Exception as e:
        print(f"{e}")
        continue

print(f"Done")

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/autopilot|autopilot-serverless-inference|autopilot-models-serverless-inference.ipynb)
