# Targeting Direct Marketing with Amazon SageMaker XGBoost


---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

---

_**Deploy a trained Gradient Boosted Trees model in SageMaker: A Binary Prediction Problem With Unbalanced Classes**_

---
## Deployments with bring your own model and custom inference script
With Amazon SageMaker, you can deploy your machine learning (ML) models to make predictions, also known as inference. SageMaker provides a broad selection of ML infrastructure and model deployment options to help meet all your ML inference needs. It is a fully managed service and integrates with MLOps tools, so you can scale your model deployment, reduce inference costs, manage models more effectively in production, and reduce operational burden.

After you’ve built and trained a machine learning model, you can use SageMaker Inference to start getting predictions, or inferences, from your model. With SageMaker Inference, you can either set up an endpoint that returns inferences or run Asynchronous inference workloads.

To get started with SageMaker Inference, see the following sections and review the [Inference options](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html#deploy-model-options) to determine which feature best fits your use case.

#### Background for Model and data set
Direct marketing, whether through mail, email, phone, or other channels, is a common tactic to acquire customers. Because resources and a customer's attention are limited, the goal is to target only the subset of prospects who are likely to engage with a specific offer. Predicting those potential customers based on readily available information like demographics, past interactions, and environmental factors is a common machine learning problem.

The data set is available at https://sagemaker-sample-data-us-west-2.s3-us-west-2.amazonaws.com/autopilot/direct_marketing/bank-additional.zip

#### For the purpose of this notebook we will execute the following steps
* Visualize the data set used to train the model
* Upload the test data set to S3 for use during the inference test runs
* Set up the following endpoints
 * Real-time Inference endpoint
 * Serverless Inference endpoint
 * Asynchronous Inference endpoint
* Investigate the model latency times and pros and cons of the approach

Optional section:
* Scaling options and showcase how to scale endpoints in SageMaker

---

### Preparation

_This notebook was created and tested on an ml.m4.xlarge notebook instance._

Let's start by specifying:

- The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the Notebook Instance, training, and hosting.
- The IAM role arn used to give training and hosting access to your data. See the documentation for how to create these. Note, if more than one role is required for notebook instances, training, and/or hosting, please replace the boto regexp with the appropriate full IAM role arn string(s).

In [None]:
# cell 01
import sagemaker

bucket = sagemaker.Session().default_bucket()
prefix = "sagemaker/DEMO-xgboost-dm"

# Define IAM role
import boto3
import re
from sagemaker import get_execution_role

role = get_execution_role()

Now let's bring in the Python libraries that we'll use throughout the analysis.

In [None]:
# cell 02
import numpy as np # For matrix operations and numerical processing
import pandas as pd # For munging tabular data
import matplotlib.pyplot as plt # For charts and visualizations
from IPython.display import Image # For displaying images in the notebook
from IPython.display import display # For displaying outputs in the notebook
from time import gmtime, strftime # For labeling SageMaker models, endpoints, etc.
import sys # For writing outputs to notebook
import math # For ceiling function
import json # For parsing hosting outputs
import os # For manipulating filepath names
import sagemaker # Amazon SageMaker's Python SDK provides many helper functions
import zipfile # For extracting zip archives
from sagemaker.multidatamodel import MultiDataModel
import time

## Prerequisite

Upload the model and the data to S3 for deployments. We will simulate the bring-your-own-model concept, which assumes the model is already in S3.

Visualize the data we are going to run predictions on. The model will essentially predict the last column.

In [None]:
# cell 03
data = pd.read_csv("./data_xgb/bank-additional.csv")
pd.set_option("display.max_columns", 500) # Make sure we can see all of the columns
pd.set_option("display.max_rows", 20) # Keep the output on one page
data

### Upload the Model and test data artifacts into S3 for simulations

In [None]:
# Cell 04
import sagemaker

import boto3
import os
import time
import json
import re

In [None]:
# Cell 05

prefix = "sagemaker/DEMO-xgboost-dm"
role = sagemaker.get_execution_role() # execution role for the endpoint
sess = sagemaker.session.Session() # sagemaker session for interacting with different AWS APIs
def_bucket = sess.default_bucket() # bucket to house artifacts
model_bucket = sess.default_bucket() # bucket to house artifacts

region = sess.boto_region_name # public attribute, preferred over the private _region_name
account_id = sess.account_id()

s3_client = boto3.client("s3")
sm_client = boto3.client("sagemaker")
smr_client = boto3.client("sagemaker-runtime")

print(role)
print(region)

# - upload the model -- use this if you want to test with your own trained model
# s3_model_path = sess.upload_data(
# "./models_xgb/model.tar.gz",
# def_bucket,
# "sagemaker/DEMO-xgboost-dm/output/xgboost-2023-01-20-01-45-52-042/output",
# ) # - file, bucket, key_prefix

s3_model_path = (
 f"s3://sagemaker-examples-files-prod-{region}/models/xgboost/hosting/alloptions/model.tar.gz"
)

print(s3_model_path)
# os.remove(output_filename)

#### Upload the Testing data and the Ground Truth

The full data set is a 4119 × 58 matrix (4119 rows and 58 columns).

In [None]:
# Cell 06

# - Upload the Test Data and the Ground truth

s3_test_path = sess.upload_data(
 "./data_xgb/test_x.csv",
 def_bucket,
 "sagemaker/DEMO-xgboost-dm/output/xgboost-2023-01-20-01-45-52-042/data",
) # - file, bucket, key_prefix

s3_y_path = sess.upload_data(
 "./data_xgb/test_y.csv",
 def_bucket,
 "sagemaker/DEMO-xgboost-dm/output/xgboost-2023-01-20-01-45-52-042/data",
) # - file, bucket, key_prefix

orig_bank_data = sess.upload_data(
 "./data_xgb/bank-additional.csv",
 def_bucket,
 "sagemaker/DEMO-xgboost-dm/output/xgboost-2023-01-20-01-45-52-042/data",
) # - file, bucket, key_prefix
print(orig_bank_data)
print(s3_test_path)
print(s3_y_path)

### Bring in the container with the specific version which will be used to run the model
The container version needs to be the same as the one that was used to train the model.

In [None]:
# cell 07
# container = sagemaker.image_uris.retrieve(region=boto3.Session().region_name, framework='xgboost', version='latest')
container = sagemaker.image_uris.retrieve(
 region=boto3.Session().region_name, framework="xgboost", version="1.5-1"
)

container

#### Create the inference script. SageMaker allows you to combine a pre-trained model with your own inference script
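
As a rough illustration of how the four hooks in an inference script fit together, the sketch below chains them the way a model server conceptually would for a single request. `FakeModel` and `handle_request` are hypothetical stand-ins written only for this example, and the `/opt/ml/model` path is used here just as a placeholder argument; the SageMaker XGBoost container makes these calls internally and with more error handling than shown.

```python
class FakeModel:
    """Hypothetical stand-in for a trained booster: scores each row by its mean."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


def model_fn(model_dir):
    # A real script would load the model artifact found in model_dir.
    return FakeModel()


def input_fn(request_body, request_content_type):
    if request_content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    # One CSV line per row, comma-separated numeric features.
    lines = request_body.strip().splitlines()
    return [[float(value) for value in line.split(",")] for line in lines]


def predict_fn(input_data, model):
    return model.predict(input_data)


def output_fn(predictions, content_type):
    if content_type != "text/csv":
        raise ValueError(f"Unsupported accept type: {content_type}")
    return ",".join(str(p) for p in predictions)


def handle_request(body, content_type, accept, model):
    """Hypothetical driver showing the order in which the hooks run."""
    data = input_fn(body, content_type)
    predictions = predict_fn(data, model)
    return output_fn(predictions, accept)


model = model_fn("/opt/ml/model")  # loaded once, reused for every request
response = handle_request("1,2,3\n4,5,6\n", "text/csv", "text/csv", model)
print(response)
```

The model is loaded once and reused, while `input_fn`, `predict_fn`, and `output_fn` run on every request, which is why expensive setup belongs in `model_fn`.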

In [None]:
# Cell 08
!mkdir -p code_inference

In [None]:
%%writefile code_inference/model.py

import io
import json
import os
import pickle as pkl

import numpy as np
import sagemaker_xgboost_container.encoder as xgb_encoders
import xgboost as xgb


def model_fn(model_dir):
    """
    Deserialize and return fitted model.
    """
    model_file = "xgboost-model"
    # Use a context manager so the file handle is closed after loading
    with open(os.path.join(model_dir, model_file), "rb") as f:
        booster = pkl.load(f)
    print(f"Model:loaded={booster}:", flush=True)
    return booster


def input_fn(request_body, request_content_type):
    """
    The SageMaker XGBoost model server receives the request data body and the content type,
    and invokes the `input_fn`.
    Return a DMatrix (an object that can be passed to predict_fn).
    """
    if request_content_type == "text/libsvm":
        return xgb_encoders.libsvm_to_dmatrix(request_body)
    elif request_content_type == "text/csv":
        # The body is raw CSV text (assumed to arrive as str or bytes). Parse it into a
        # 2-D float array with one row per CSV line before building the DMatrix.
        if isinstance(request_body, (bytes, bytearray)):
            request_body = request_body.decode("utf-8")
        input_np = np.loadtxt(io.StringIO(request_body), delimiter=",", dtype=np.float32, ndmin=2)
        print(f"Model:input:text:shape={input_np.shape}::", flush=True)
        return xgb.DMatrix(input_np)
    else:
        raise ValueError("Input Content type {} is not supported.".format(request_content_type))


def predict_fn(input_data, model):
    """
    SageMaker XGBoost model server invokes `predict_fn` on the return value of `input_fn`.
    """
    prediction = model.predict(input_data)
    print(f"Model:prediction:shape={prediction.shape}:", flush=True)
    return prediction  # output


def output_fn(predictions, content_type):
    """
    After invoking predict_fn, the model server invokes `output_fn`.
    """
    if content_type == "text/csv":
        # Flatten so that 1-D and (1, n) prediction arrays both become one comma-separated line
        return ",".join(str(x) for x in np.asarray(predictions).ravel())
    else:
        raise ValueError("Return Content type {} is not supported.".format(content_type))

---

## Hosting

### Real-time Inference endpoint
Now that we have the pre-trained `xgboost` model in S3, let's deploy it behind a real-time endpoint.

In [None]:
# Cell 10
from sagemaker.xgboost.model import XGBoostModel

xgboost_model = XGBoostModel(
 model_data=s3_model_path,
 role=role,
 entry_point="model.py",
 source_dir="code_inference",
 framework_version="latest", # "1.0-1"
)

xgboost_model

#### This cell can take a couple of minutes; please be patient

In [None]:
# Cell 11
xgb_predictor = xgboost_model.deploy(instance_type="ml.m4.xlarge", initial_instance_count=1)

xgb_predictor

In [None]:
# Cell 12
print(xgb_predictor.serializer)
print(xgb_predictor.deserializer)

### Change the default serializer, which is libsvm, to CSV since we will be dealing with a CSV data set

In [None]:
# cell 13
xgb_predictor.serializer = (
 sagemaker.serializers.CSVSerializer()
) # sagemaker.serializers.LibSVMSerializer
xgb_predictor.deserializer = sagemaker.deserializers.CSVDeserializer()

---

## Evaluation
There are many ways to compare the performance of a machine learning model, but let's start by simply comparing actual to predicted values. In this case, we're simply predicting whether the customer subscribed to a term deposit (`1`) or not (`0`), which produces a simple confusion matrix.

First we'll need to determine how we pass data into and receive data from our endpoint. Our data is currently stored as NumPy arrays in memory of our notebook instance. To send it in an HTTP POST request, we'll serialize it as a CSV string and then decode the resulting CSV.

*Note: For inference with CSV format, SageMaker XGBoost requires that the data does NOT include the target variable.*

We have a couple of options here. We can:
1. Loop over our test dataset, split it into mini batches, and send each batch to the model.
1. Send the entire payload to the model in one request, as long as the payload size does not exceed 6 MB.

Since our payload is small, we could send the entire data set in one request. The number of rows sent is controlled by the `batch_size_to_run` variable; for the purpose of this lab we keep it at a small value.
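
The mini-batch option can be sketched roughly as follows. This is a minimal illustration using only the standard library: `rows_to_csv`, `iter_batches`, and the sample rows are hypothetical names and made-up data, and the 6 MB figure is simply the limit quoted above rather than a value read from the service.

```python
import csv
import io

MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # the 6 MB limit mentioned in the text


def rows_to_csv(rows):
    """Serialize rows (lists of numbers) to a CSV string without a header."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def iter_batches(rows, batch_size, max_bytes=MAX_PAYLOAD_BYTES):
    """Yield CSV payloads of at most batch_size rows, enforcing the size limit."""
    for start in range(0, len(rows), batch_size):
        payload = rows_to_csv(rows[start : start + batch_size])
        size = len(payload.encode("utf-8"))
        if size > max_bytes:
            raise ValueError(f"Batch starting at row {start} is {size} bytes; reduce batch_size")
        yield payload


# Made-up rows for illustration only; not the bank marketing data.
sample_rows = [[i, i + 0.5, 1] for i in range(45)]
payloads = list(iter_batches(sample_rows, batch_size=20))
print(f"{len(payloads)} payloads, first line of first payload: {payloads[0].splitlines()[0]}")
```

Each yielded string could then be sent as the request body of one invocation, so a failure affects only the rows in that batch.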

In [None]:
# cell 14
test_x = pd.read_csv("./data_xgb/test_x.csv", names=[f"{i}" for i in range(58)])
test_y = pd.read_csv("./data_xgb/test_y.csv", names=["y"])
print(test_x.shape)
print(test_y.shape)

In [None]:
# cell 15
batch_size_to_run = 20
# - create a batch_size_to_run row Test data to be used for predictions -- .drop(test_x.columns[0], axis=1)
test_array = test_x.iloc[:batch_size_to_run, :].to_numpy()
y_array = test_y["y"].values[:batch_size_to_run]
test_array.shape # columns in this test data

In [None]:
# cell 16
# xgb_predictor.predict(test_array[0].tolist())
# xgb_predictor.predict(test_array.tolist())

predictions = xgb_predictor.predict(test_array)

In [None]:
# cell 17

print(len(predictions[0]))

Now we'll check our confusion matrix to see how well we predicted versus actuals.

In [None]:
# cell 18
# pd.crosstab(index=test_y['y'].values, columns=np.round(predictions), rownames=['actuals'], colnames=['predictions'])
pd.crosstab(
 index=y_array,
 columns=np.round(np.array(predictions[0], dtype=np.float32)),
 rownames=["actuals"],
 colnames=["predictions"],
)

Because our batch size is 20, this confusion matrix covers only the first 20 potential customers in the test set. For more details on confusion matrices, see [here](https://docs.aws.amazon.com/machine-learning/latest/dg/multiclass-model-insights.html).
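
With unbalanced classes, precision and recall are often more informative than raw counts. The sketch below derives both from the four cells of a binary confusion matrix. The labels and scores are made up for illustration and are not results from the endpoint; the `>= 0.5` threshold is an assumption that differs from `np.round` only for a score of exactly 0.5.

```python
def confusion_counts(actuals, probabilities, threshold=0.5):
    """Return (tp, fp, fn, tn) after thresholding predicted probabilities."""
    tp = fp = fn = tn = 0
    for actual, probability in zip(actuals, probabilities):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision and recall, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels and scores for illustration only.
actuals = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.1, 0.7, 0.8, 0.2, 0.4, 0.05, 0.3, 0.9]

tp, fp, fn, tn = confusion_counts(actuals, scores)
precision, recall = precision_recall(tp, fp, fn)
print(f"tp={tp} fp={fp} fn={fn} tn={tn} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold usually trades precision for recall, which matters when missing a likely subscriber costs more than contacting an uninterested one.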

Run the cell below to get P95 latency numbers for our model.

In [None]:
# cell 19

# - get p95 numbers for Latency

import numpy as np
import time

print(f"Starting invocation for model::Real:Time please wait batch_size={batch_size_to_run}:.....")
results = []
for i in range(0, 10):
 start = time.time()
 xgb_predictor.predict(test_array)
 results.append((time.time() - start) * 1000)
print("\nPredictions for model latency: \n")
print("\nP95: " + str(np.percentile(results, 95)) + " ms\n")
print("P90: " + str(np.percentile(results, 90)) + " ms\n")
print("Average: " + str(np.average(results)) + " ms\n")
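
With only 10 samples, a P95 value is dominated by the slowest call or two. The sketch below reimplements a percentile with linear interpolation between the closest ranks, which is our understanding of the default behaviour of `np.percentile`, to show where the number comes from. The latency values are made up for illustration.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Made-up latencies in milliseconds, with one slow outlier.
latencies_ms = [12.0, 11.0, 13.0, 12.5, 11.5, 12.2, 11.8, 12.1, 14.0, 90.0]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"P50: {p50:.2f} ms, P95: {p95:.2f} ms")
```

For 10 samples the P95 position is 8.55, so the result lies 55% of the way between the ninth and tenth sorted values (about 55.8 ms here, although nine of the ten calls took 14 ms or less). Collecting more samples gives a more stable estimate.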

## Serverless Invocations

Create a Serverless Inference endpoint, observe the initial cold start, and then compare it with subsequent calls.

For serverless, we need to specify only the memory size and the maximum concurrency of the endpoint. For further reading, refer to https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html

In [None]:
# Cell 20
serverless_inf_config = sagemaker.serverless.ServerlessInferenceConfig(
 memory_size_in_mb=1024, # 68 KB is the size of the model
 max_concurrency=5, # max invocations concurrently
)
serverless_inf_config

In [None]:
# cell 21

xgb_serverless_predictor = xgboost_model.deploy(
 instance_type="ml.m4.xlarge",
 initial_instance_count=1,
 serverless_inference_config=serverless_inf_config,
)

xgb_serverless_predictor

In [None]:
# Cell 22
print(xgb_serverless_predictor.serializer)
print(xgb_serverless_predictor.deserializer)

#### As with real-time, we will change the serializers to match our data sets

In [None]:
# cell 23
xgb_serverless_predictor.serializer = (
 sagemaker.serializers.CSVSerializer()
) # sagemaker.serializers.LibSVMSerializer
xgb_serverless_predictor.deserializer = sagemaker.deserializers.CSVDeserializer()

#### Cold start for the serverless endpoint

The very first time a request is sent, the serverless endpoint spins up compute resources and then runs predictions on them. To save cost, serverless inference releases those resources after a period of inactivity. For more details, see https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints-monitoring.html

In [None]:
%%time
# Cell 24

# xgb_predictor.predict(test_array[0].tolist())
# xgb_predictor.predict(test_array.tolist())

predictions_serverless = xgb_serverless_predictor.predict(test_array)

#### Subsequent invocation will be a lot faster

In [None]:
%%time
# Cell 25

# xgb_predictor.predict(test_array[0].tolist())
# xgb_predictor.predict(test_array.tolist())

predictions_serverless = xgb_serverless_predictor.predict(test_array)

In [None]:
# Cell 26
print(len(predictions_serverless[0]))

In [None]:
# Cell 27
# pd.crosstab(index=test_y['y'].values, columns=np.round(predictions), rownames=['actuals'], colnames=['predictions'])
pd.crosstab(
 index=y_array,
 columns=np.round(np.array(predictions_serverless[0], dtype=np.float32)),
 rownames=["actuals"],
 colnames=["predictions"],
)

### Run invocations for the endpoint and get P95 numbers

In [None]:
%%time
# cell 28

# - get p95 numbers for Latency

import numpy as np
import time

print(f"Starting invocation for model::Serverless please wait batch_size={batch_size_to_run}:.....")
results = []
for i in range(0, 10):
 start = time.time()
 xgb_serverless_predictor.predict(test_array)
 results.append((time.time() - start) * 1000)
print("\nPredictions for model latency: \n")
print("\nP95: " + str(np.percentile(results, 95)) + " ms\n")
print("P90: " + str(np.percentile(results, 90)) + " ms\n")
print("Average: " + str(np.average(results)) + " ms\n")

### Start with the Asynchronous inference endpoint for deployment

Upload test data sets to multiple S3 prefixes. The idea is to leverage Asynchronous inference by splitting the data into smaller data sets that can then be fed into the system for predictions. This reduces the blast radius and makes failures easier to debug than in a single batch workload. Each payload can be up to about 1 GB. We can simulate loading the Asynchronous inference queue using these locations.


Once you have a model, create an Asynchronous inference configuration. Amazon SageMaker hosting services use this configuration to deploy models. In the configuration, you identify one or more models to deploy and the resources that you want Amazon SageMaker to provision. Specify the AsyncInferenceConfig object and provide an output Amazon S3 location for OutputConfig. You can optionally specify Amazon SNS topics on which to send notifications about prediction results.
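
To make the shape of that configuration concrete, the sketch below builds a plain dictionary that mirrors, as far as we understand it, the `AsyncInferenceConfig` structure accepted by the boto3 `create_endpoint_config` call. `build_async_inference_config` is a hypothetical helper, and the bucket name and SNS topic ARNs are placeholders, not real resources.

```python
def build_async_inference_config(
    output_path, max_concurrent=None, success_topic=None, error_topic=None
):
    """Build an AsyncInferenceConfig-style dictionary (hypothetical helper)."""
    output_config = {"S3OutputPath": output_path}

    # SNS notifications are optional; add only the topics that were provided.
    notification_config = {}
    if success_topic:
        notification_config["SuccessTopic"] = success_topic
    if error_topic:
        notification_config["ErrorTopic"] = error_topic
    if notification_config:
        output_config["NotificationConfig"] = notification_config

    config = {"OutputConfig": output_config}
    if max_concurrent is not None:
        config["ClientConfig"] = {"MaxConcurrentInvocationsPerInstance": max_concurrent}
    return config


# Placeholder bucket and topic ARNs for illustration only.
example_config = build_async_inference_config(
    output_path="s3://example-bucket/async/output",
    max_concurrent=2,
    success_topic="arn:aws:sns:us-west-2:111122223333:example-success",
    error_topic="arn:aws:sns:us-west-2:111122223333:example-error",
)
print(example_config)
```

The SageMaker Python SDK used in this notebook wraps the same settings in its own `AsyncInferenceConfig` class, so this dictionary form is mainly useful when calling boto3 directly.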

An Asynchronous inference endpoint can scale all the way down to 0 instances to save cost.

In [None]:
# Cell 29
import os

# -- upload a set of 10 test data into S3 for A-SYNC
s3_async_path_list = []
for index in range(10):
 s3_async_path = sess.upload_data(
 "./data_xgb/test_x.csv",
 def_bucket,
 f"sagemaker/DEMO-xgboost-dm/output/xgboost-2023-01-20-01-45-52-042/async/data{index}",
 extra_args={"ContentType": "text/csv"},
 )
 s3_async_path_list.append(s3_async_path)

print(len(s3_async_path_list))

#### Define where the output will go - it will be for each of the outputs we run through the system for prediction

In [None]:
# Cell 30
async_output_path = f"s3://{def_bucket}/sagemaker/DEMO-xgboost-dm/output/xgboost-2023-01-20-01-45-52-042/async/output"
print(async_output_path)

#### Now create the Asynchronous inference endpoint

In [None]:
# Cell 31
async_inf_config = sagemaker.async_inference.AsyncInferenceConfig(
 output_path=async_output_path,
 max_concurrent_invocations_per_instance=2, # max invocations concurrently
)
async_inf_config

In [None]:
# Cell 32

xgb_async_predictor = xgboost_model.deploy(
 instance_type="ml.m5.xlarge",
 initial_instance_count=1,
 async_inference_config=async_inf_config,
)

xgb_async_predictor

In [None]:
# Cell 33
xgb_async_predictor.endpoint_name

In [None]:
# Cell 34
print(xgb_async_predictor.serializer)
print(xgb_async_predictor.deserializer)

#### Change the serializers to match the model requirements

In [None]:
# cell 35
xgb_async_predictor.serializer = (
 sagemaker.serializers.CSVSerializer()
) # sagemaker.serializers.LibSVMSerializer
xgb_async_predictor.deserializer = sagemaker.deserializers.CSVDeserializer()

#### Input data for which you want the model to provide inference. 

Asynchronous inference needs the data to be in S3. The `predict_async` method used below takes the payload from memory, saves it to S3, and then runs the prediction on the Asynchronous inference endpoint.

If a serializer was specified in the Predictor object, the result of the serializer is sent as input data. Otherwise the data must be a sequence of bytes. The predict method then uploads the data to the S3 location.

The predictions will be stored in the S3 location at `async_output_path`.

In [None]:
%%time
# Cell 36

input_payload_location = s3_async_path_list[0]
predictions_async = xgb_async_predictor.predict_async(
 data=test_array,
)
predictions_async

In [None]:
# Cell 37
print(predictions_async.output_path)
!aws s3 ls $predictions_async.output_path
!echo "The Output Path files in S3"
!aws s3 ls $async_output_path/

#### We can use the SDK APIs to get the result or check the S3 location
Run the cells below only after the output appears in the `aws s3 ls` listing for the `async_output_path` location.

In [None]:
# Cell 38
print(len(predictions_async.get_result()[0]))
print("\n")
print(predictions_async.output_path)

In [None]:
# Cell 39
# -- now view the outputs and copy to the local location for viewing results

!aws s3 ls $predictions_async.output_path
!aws s3 cp $predictions_async.output_path ./data_xgb

#### View the results using Pandas

In [None]:
# Cell 40

file_name = predictions_async.output_path.split("/")[-1]
file_name = f"./data_xgb/{file_name}"

pred_np = np.loadtxt(file_name, delimiter=",", dtype=np.float32).reshape(-1, 1)
print(pred_np.shape)
pred_np

#### Run Predictions using the saved data sets

This sends the requests, and the endpoint processes them from its queue. It is not guaranteed that the results will be available right away.

The saved data sets contain the full 4119 rows of data, so the number of predictions will not match the batch size we specified earlier.

In [None]:
%%time
# Cell 41

output_path_list = []
for index in range(len(s3_async_path_list)):
 input_payload_location = s3_async_path_list[index]
 predictions_async = xgb_async_predictor.predict_async(
 # data=test_array,
 input_path=input_payload_location,
 )
 # print(len(predictions_async.get_result()[0]))
 print(predictions_async.output_path)
 output_path_list.append(predictions_async.output_path)

In [None]:
# Cell 42
!echo "The Output Path files in S3 $async_output_path"
!aws s3 ls $async_output_path/

### Check Output Location

Check the output location to see if the inference has been processed. We make multiple requests (beginning of the while True statement in the get_output function) every two seconds until there is an output of the inference request:
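
The polling pattern can also be sketched independently of S3. The example below is a generic variant, not the notebook's `get_output` function: it swaps the fixed two-second wait for an exponential backoff and takes the fetch and sleep functions as parameters so it can be exercised without an endpoint. `poll_until_ready` and `fake_fetch` are hypothetical names.

```python
import time


def poll_until_ready(
    fetch, timeout_seconds=30.0, initial_delay=0.5, max_delay=8.0, sleep=time.sleep
):
    """Call fetch() until it returns a non-None value or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    delay = initial_delay
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"No result after {timeout_seconds} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, but cap the wait


# Simulated output that becomes available on the third attempt.
attempts = {"count": 0}


def fake_fetch():
    attempts["count"] += 1
    return "0.1,0.9" if attempts["count"] >= 3 else None


waits = []  # record the requested delays instead of actually sleeping
result = poll_until_ready(fake_fetch, sleep=waits.append)
print(result, waits)
```

Backing off reduces the number of S3 requests when results take a while, at the cost of noticing a finished result slightly later.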


In [None]:
# Cell 43

import time
import urllib.parse

from botocore.exceptions import ClientError


def get_output(output_location, max_wait_seconds=30):
    """Poll S3 until the output object exists; return its contents, or None on timeout."""
    start_time = time.time()
    output_url = urllib.parse.urlparse(output_location)
    bucket = output_url.netloc
    key = output_url.path[1:]
    # 30 seconds max wait time - for now - ideally we have a 15 min SLA
    while time.time() - start_time < max_wait_seconds:
        try:
            return sess.read_s3_file(bucket=bucket, key_prefix=key)
        except ClientError as e:
            if e.response["Error"]["Code"] == "NoSuchKey":
                print(f"waiting for output...key={key}:")
                time.sleep(2)
                continue
            raise  # any other S3 error is unexpected, so surface it
    print(f"Timed out waiting for key={key}:")
    return None

In [None]:
# Cell 44
# the length of predictions will be 1 per row of input data so it should total to 4119
for output_location in output_path_list:
 output = get_output(output_location)
 print(f"Output: {len(output.split(',')) }")

In [None]:
# Cell 45
# - check the shape of the output -- to match input size of 4119
np.array(output.split(",")).shape

#### Cross tab for the predictions

In [None]:
# Cell 46 (a)
# - we can use the data and predictions we generated from the in-memory run
pd.crosstab(
 index=y_array,
 columns=np.round(pred_np.reshape(-1)),
 rownames=["actuals"],
 colnames=["predictions"],
)

In [None]:
# Cell 46 (b)
# - we can use the data from the stored results after Async finished the predictions.
pred_np_array = np.array(output.split(","), dtype=np.float32)[: len(y_array)]
pd.crosstab(
 index=y_array,
 columns=np.round(pred_np_array),
 rownames=["actuals"],
 colnames=["predictions"],
)

## Congratulations! End of the lab for deployment options

### The section below is optional and depends on the service limits in your account

#### Now we create an autoscaling policy for the endpoint. For this we will leverage the boto3 API

This section describes how to configure autoscaling on your asynchronous endpoint using Application Autoscaling. You need to first register your endpoint variant with Application Autoscaling, define a scaling policy, and then apply the scaling policy. In this configuration, we use a custom metric, CustomizedMetricSpecification, called ApproximateBacklogSizePerInstance. Please refer to the SageMaker Developer guide for a detailed list of metrics available with your asynchronous inference endpoint.
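
To build intuition for a backlog-based target, the sketch below estimates how many instances would bring the backlog per instance down to the target value. It is a deliberately simplified model and an assumption on our part: Application Auto Scaling drives the real decision through CloudWatch alarms and cooldown periods, so actual behaviour will differ in timing. `estimate_desired_instances` is a hypothetical helper.

```python
import math


def estimate_desired_instances(backlog_size, target_per_instance, min_capacity, max_capacity):
    """Estimate instances needed so that backlog per instance is at or below the target."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    needed = math.ceil(backlog_size / target_per_instance)
    # Clamp to the capacity range registered for the scalable target.
    return max(min_capacity, min(max_capacity, needed))


# Illustrative values: a target of 2.0 queued requests per instance, 1 to 3 instances.
for backlog in (0, 3, 5, 500):
    desired = estimate_desired_instances(backlog, 2.0, min_capacity=1, max_capacity=3)
    print(f"backlog={backlog} -> desired instances={desired}")
```

Under this model a large burst quickly reaches the maximum capacity, which is why a low target value makes scaling easy to observe in a lab setting.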

In [None]:
# Cell 47
sm_client = boto3.client("sagemaker")

In [None]:
# Cell 48
sm_client.describe_endpoint(EndpointName=xgb_async_predictor.endpoint_name)

In [None]:
# Cell 49
client = boto3.client(
 "application-autoscaling"
) # Common class representing Application Auto Scaling for SageMaker amongst other services

resource_id = (
 "endpoint/" + xgb_async_predictor.endpoint_name + "/variant/" + "AllTraffic"
) # This is the format in which application autoscaling references the endpoint

# Register the endpoint variant as a scalable target (1 to 3 instances here; asynchronous endpoints also allow MinCapacity=0)
response = client.register_scalable_target(
 ServiceNamespace="sagemaker",
 ResourceId=resource_id,
 ScalableDimension="sagemaker:variant:DesiredInstanceCount",
 MinCapacity=1,
 MaxCapacity=3,
)

# - use a lower value to simulate the autoscaling to kick in
response = client.put_scaling_policy(
 PolicyName="Invocations-ScalingPolicy",
 ServiceNamespace="sagemaker", # The namespace of the AWS service that provides the resource.
 ResourceId=resource_id, # Endpoint name
 ScalableDimension="sagemaker:variant:DesiredInstanceCount", # SageMaker supports only Instance Count
 PolicyType="TargetTrackingScaling", # 'StepScaling'|'TargetTrackingScaling'
 TargetTrackingScalingPolicyConfiguration={
 "TargetValue": 2.0, # The target value for the metric. - here the metric is - ApproximateBacklogSizePerInstance
 "CustomizedMetricSpecification": {
 "MetricName": "ApproximateBacklogSizePerInstance",
 "Namespace": "AWS/SageMaker",
 "Dimensions": [{"Name": "EndpointName", "Value": xgb_async_predictor.endpoint_name}],
 "Statistic": "Average",
 },
 "ScaleInCooldown": 600, # The cooldown period helps you prevent your Auto Scaling group from launching or terminating
 # additional instances before the effects of previous activities are visible.
 # You can configure the length of time based on your instance startup time or other application needs.
 # ScaleInCooldown - The amount of time, in seconds, after a scale in activity completes before another scale in activity can start.
 "ScaleOutCooldown": 1 # ScaleOutCooldown - The amount of time, in seconds, after a scale out activity completes before another scale out activity can start.
 # 'DisableScaleIn': True|False - Indicates whether scale in by the target tracking policy is disabled.
 # If the value is true , scale in is disabled and the target tracking policy won't remove capacity from the scalable resource.
 },
)
response

#### The endpoint is now ready for invocations with burst capacity

In [None]:
# Cell 50
# - clean the output path so we can be sure of the response after scaling
!aws s3 rm --recursive $async_output_path/ --quiet
!aws s3 ls $async_output_path/ | wc -l

In [None]:
%%time
# Cell 51
# - invoke n times again

output_path_list = []

for index in range(len(s3_async_path_list)):
    for top_i in range(50):
        # for index in range( len (s3_async_path_list) ):
        input_payload_location = s3_async_path_list[index]
        predictions_async = xgb_async_predictor.predict_async(
            # data=test_array,
            input_path=input_payload_location,
        )
        # print(len(predictions_async.get_result()[0]))
        # print(predictions_async.output_path)
        output_path_list.append(predictions_async.output_path)
        time.sleep(0.5)

print(f"No of requests sent to the ASync endpoint are {len(output_path_list)} ")

In [None]:
# Cell 52
!echo "The Output Path files in S3 $async_output_path"
!aws s3 ls $async_output_path/ | wc -l
# !aws s3 ls --summarize --human-readable --recursive $async_output_path/

In [None]:
# Cell 53 (a)
# Check the Scaling conditions
scale_response = sm_client.describe_endpoint(EndpointName=xgb_async_predictor.endpoint_name)
scale_response["EndpointName"]

In [None]:
# Cell 53 (b)

print(f"Scaling endpoint status --- > {scale_response['EndpointStatus']}")
print(
 f"Scaling endpoint instance count --- > {scale_response['ProductionVariants'][0]['CurrentInstanceCount']}"
)
print(
 f"Scaling endpoint Desired instance count --- > {scale_response['ProductionVariants'][0]['DesiredInstanceCount']}"
)

#### Run the cell below only after the scaling activities have finished - this can take a couple of minutes

The endpoint status should show as `InService`.

In [None]:
# Cell 53 (c)
# check one of the output files
for output_location in output_path_list:
 output = get_output(output_location)
 print(f"Output: {len(output.split(',')) }")
 break

### (Optional) Clean-up

If you are done with this notebook, please run the cell below. This will remove the hosted endpoint you created and avoid any charges from a stray instance being left on.

In [None]:
# Cell 54
# - run this cell only in case you had set up the Scaling options
response = client.deregister_scalable_target(
 ServiceNamespace="sagemaker",
 ResourceId=resource_id,
 ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)
response

In [None]:
# Cell 55

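# Delete each endpoint independently so that one failure does not skip the others
for name in ("xgb_predictor", "xgb_serverless_predictor", "xgb_async_predictor"):
    predictor = globals().get(name)  # None if that endpoint was never created in this session
    if predictor is None:
        continue
    try:
        predictor.delete_endpoint(delete_endpoint_config=True)
    except Exception as e:
        print(f"Could not delete endpoint for {name}: {e}")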

#### Clean the bucket and delete contents

In [None]:
# Cell 56
!aws s3 rm --recursive $async_output_path/ --quiet

In [None]:
# Cell 57
!aws s3 ls $async_output_path/

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/inference|structured|async|default_server|single_model|deploy_all_options_xgb.ipynb)
