## SageMaker Inference Recommender - XGBoost

### 1. Introduction
SageMaker Inference Recommender is a new capability of SageMaker that reduces the time required to get machine learning (ML) models in production by automating load tests and optimizing model performance across instance types. You can use Inference Recommender to select a real-time inference endpoint that delivers the best performance at the lowest cost.

You can get started with Inference Recommender in minutes when selecting an instance, and get an optimized endpoint configuration within hours, eliminating weeks of manual testing and tuning.

### 2. Setup
This notebook uses the conda_python3 kernel on a SageMaker notebook instance, which runs Python 3.6. To use the same setup, open the Amazon SageMaker console from the AWS Management Console, choose Notebook Instances, and create a new notebook instance. Upload this notebook and set the kernel. You can also run it in SageMaker Studio notebooks with the Python 3 (Data Science) kernel.

In the next steps, you'll import standard methods and libraries as well as set variables that will be used in this notebook. The get_execution_role function retrieves the AWS Identity and Access Management (IAM) role you created at the time of creating your notebook instance. 

This example uses the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk/tree/cc9b286f9977c4b793d16196ebd02570f8249bbd/src/sagemaker/inference_recommender) support for Inference Recommender. For the ML use case, we use the built-in [SageMaker XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) algorithm to solve a fraud detection classification problem, building on the following [example](https://github.com/aws-samples/amazon-sagemaker-fraud-detection/blob/master/notebooks/sagemaker_fraud_detection_xgb.ipynb).

In [None]:
!pip install -U sagemaker

### 3. Retrieve and Prepare Dataset

In [None]:
%%bash
wget https://s3-us-west-2.amazonaws.com/sagemaker-e2e-solutions/fraud-detection/creditcardfraud.zip
unzip creditcardfraud.zip

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('creditcard.csv', delimiter=',')

In [None]:
print(data.columns)
data[['Time', 'V1', 'V2', 'V27', 'V28', 'Amount', 'Class']].describe()
data.head(10)

In [None]:
nonfrauds, frauds = data.groupby('Class').size()
print('Number of frauds: ', frauds)
print('Number of non-frauds: ', nonfrauds)
print('Percentage of fraudulent data:', 100.*frauds/(frauds + nonfrauds))

In [None]:
feature_columns = data.columns[:-1]
label_column = data.columns[-1]

features = data[feature_columns].values.astype('float32')
labels = (data[label_column].values).astype('float32')

In [None]:
model_data = data
model_data.head()
model_data = pd.concat([model_data['Class'], model_data.drop(['Class'], axis=1)], axis=1)
model_data.head()

#### Upload Dataset to S3

In [None]:
import boto3
import os
import sagemaker
region = boto3.Session().region_name
role = sagemaker.get_execution_role()

session = sagemaker.Session()

bucket = session.default_bucket()
sagemaker_iam_role = sagemaker.get_execution_role()

prefix = 'sagemaker/DEMO-xgboost-fraud'

train_data, validation_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), 
                                                  [int(0.7 * len(model_data)), int(0.9 * len(model_data))])
train_data.to_csv('train.csv', header=False, index=False)
validation_data.to_csv('validation.csv', header=False, index=False)


boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')) \
                                .upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')) \
                                .upload_file('validation.csv')
s3_train_data = 's3://{}/{}/train/train.csv'.format(bucket, prefix)
s3_validation_data = 's3://{}/{}/validation/validation.csv'.format(bucket, prefix)
print('Uploaded training data location: {}'.format(s3_train_data))
print('Uploaded validation data location: {}'.format(s3_validation_data))

output_location = 's3://{}/{}/output'.format(bucket, prefix)
print('Training artifacts will be uploaded to: {}'.format(output_location))

#### Convert Dataset to Smaller Payload

The payload we pass to Inference Recommender must stay under the 6 MB payload limit for real-time inference, so we take a small subset of the test dataset created in the previous cell.

In [None]:
sub_test = test_data.iloc[:100,:]
df2 = sub_test.iloc[: , 1:]

In [None]:
df2.to_csv("payload.csv", header=False,index=False)
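
As a quick sanity check on the limit mentioned above, the size of the payload file can be compared against 6 MB before it is archived. The sketch below is illustrative only: the helper name payload_within_limit is hypothetical, and it assumes "6 MB" means 6 * 1024 * 1024 bytes.

```python
import os
import tempfile

# Assumption: the 6 MB limit is read as 6 * 1024 * 1024 bytes.
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024


def payload_within_limit(path, limit_bytes=MAX_PAYLOAD_BYTES):
    """Return True if the file at `path` is no larger than `limit_bytes`."""
    return os.path.getsize(path) <= limit_bytes


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("0.1,0.2,0.3\n" * 100)
    demo_path = tmp.name

demo_ok = payload_within_limit(demo_path)
os.remove(demo_path)
print(demo_ok)
```

In the notebook itself, the equivalent call would be payload_within_limit("payload.csv").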

### 5. SageMaker Training Job

In [None]:
container = sagemaker.image_uris.retrieve(framework="xgboost", region=region, version="1.0-1", py_version="py3", 
                                              image_scope='inference')

In [None]:
from sagemaker.inputs import TrainingInput
s3_input_train = TrainingInput(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')
s3_input_validation = TrainingInput(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')

In [None]:
# Generic Estimator around the XGBoost container, trained on a single ml.m4.xlarge instance.
xgb = sagemaker.estimator.Estimator(container,
                                    role=sagemaker_iam_role, 
                                    instance_count=1, 
                                    instance_type='ml.m4.xlarge',
                                    output_path=output_location,
                                    sagemaker_session=session)
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        silent=0,
                        objective='binary:logistic',
                        num_round=100)

In [None]:
xgb.fit({'train': s3_input_train, 'validation': s3_input_validation})

### 6. Upload model and payload data for Inference Recommender job

In [None]:
model_url = xgb.model_data
model_url

In [None]:
payload_archive_name = "payload.tar.gz"

In [None]:
!tar -cvzf {payload_archive_name} payload.csv
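
If the shell tar command is not available (for example, outside a notebook), the same archive can be built with Python's standard tarfile module. This is a minimal sketch; the helper name make_payload_archive is hypothetical, and the demo writes to a temporary directory rather than the notebook's working directory.

```python
import os
import tarfile
import tempfile


def make_payload_archive(csv_path, archive_path):
    """Pack one file into a gzip-compressed tar archive, stored under its base name."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(csv_path, arcname=os.path.basename(csv_path))
    return archive_path


# Demo in a temporary directory with a one-line stand-in for payload.csv.
workdir = tempfile.mkdtemp()
demo_csv = os.path.join(workdir, "payload.csv")
with open(demo_csv, "w") as f:
    f.write("1.0,2.0,3.0\n")

demo_archive = make_payload_archive(demo_csv, os.path.join(workdir, "payload.tar.gz"))
with tarfile.open(demo_archive, "r:gz") as tar:
    member_names = tar.getnames()
print(member_names)
```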

In [None]:
from sagemaker import get_execution_role, Session, image_uris
import boto3
import time

region = boto3.Session().region_name
role = get_execution_role()
sm_client = boto3.client("sagemaker", region_name=region)
sagemaker_session = Session()
print(region)

In [None]:
sample_payload_url = sagemaker_session.upload_data(
    path=payload_archive_name, key_prefix="final-fraud"
)

### 7. Create Model Package Group and SageMaker Model

In [None]:
model_package_group_name = "xgboost-fraud" + str(round(time.time()))

In [None]:
from sagemaker.model import Model
from sagemaker import image_uris

model = Model(
    model_data=model_url,
    role=role,
    image_uri = sagemaker.image_uris.retrieve(framework="xgboost", region=region, version="1.5-1", py_version="py3", 
                                              image_scope='inference'),
    sagemaker_session=sagemaker_session
    )

### 8. Register Model (Optional)

In [None]:
model_package = model.register(
    content_types=["text/csv"],
    response_types=["text/csv"],
    model_package_group_name=model_package_group_name,
    image_uri=model.image_uri,
    approval_status="Approved",
    framework="XGBOOST"
)

### 9. Run Default IR Job

The default IR job should take approximately 45 minutes to complete. You can also visualize the results in the SageMaker Studio UI.

In [None]:
default_job_name=f"credit-card-fraud-default-job-{str(round(time.time()))}"

model.right_size(
    sample_payload_url=sample_payload_url,
    supported_content_types=["text/csv"],
    supported_instance_types=["ml.m5.large", "ml.m5.xlarge", "ml.m5.2xlarge", "ml.m5.4xlarge", "ml.m5.12xlarge"],
    framework="XGBOOST",
    job_name=default_job_name
)

### 10. Run Advanced IR Job

Here you can define the environment variables for your container or framework that you want to test and iterate on. Inference Recommender automatically benchmarks the array of values you pass in and returns the results for each parameter.

In [None]:
from sagemaker.parameter import CategoricalParameter 
from sagemaker.inference_recommender.inference_recommender_mixin import (  
    Phase,  
    ModelLatencyThreshold 
) 

hyperparameter_ranges = [ 
    { 
        "instance_types": CategoricalParameter(["ml.m5.2xlarge", "ml.m5.4xlarge"]), 
        'OMP_NUM_THREADS': CategoricalParameter(['3','4','5']), 
    } 
] 

phases = [ 
    Phase(duration_in_seconds=120, initial_number_of_users=2, spawn_rate=2),
] 

model_latency_thresholds = [ 
    ModelLatencyThreshold(percentile="P95", value_in_milliseconds=100) 
]
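
To get a feel for the size of the search space defined in the cell above, the values can be expanded into explicit combinations. The sketch below uses plain Python lists in place of CategoricalParameter and assumes every combination of instance type and OMP_NUM_THREADS is a candidate; which combinations the service actually benchmarks is up to Inference Recommender. The helper name enumerate_configurations is hypothetical.

```python
from itertools import product

# Plain-Python mirror of the values passed to CategoricalParameter above.
search_space = {
    "instance_types": ["ml.m5.2xlarge", "ml.m5.4xlarge"],
    "OMP_NUM_THREADS": ["3", "4", "5"],
}


def enumerate_configurations(space):
    """Return every combination of values as a list of dicts (Cartesian product)."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]


configurations = enumerate_configurations(search_space)
for config in configurations:
    print(config)
print(len(configurations), "candidate configurations")
```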

In [None]:
advanced_job_name=f"credit-card-fraud-adv-job-{str(round(time.time()))}"

model.right_size( 
    sample_payload_url=sample_payload_url, 
    supported_content_types=["text/csv"], 
    framework="XGBOOST", 
    job_duration_in_seconds=7200, 
    hyperparameter_ranges=hyperparameter_ranges, 
    phases=phases, # TrafficPattern 
    max_invocations=30000, # StoppingConditions 
    model_latency_thresholds=model_latency_thresholds,
    job_name=advanced_job_name
)

### 11. Create Endpoint (Optional)

In [None]:
import time
from time import gmtime, strftime
endpoint_name = 'deployed-xgboost-fraud-prediction' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
xgb_predictor = xgb.deploy(initial_instance_count = 1, instance_type = 'ml.c5.xlarge',
                          endpoint_name=endpoint_name)

In [None]:
import boto3
smr = boto3.client('sagemaker-runtime')
resp = smr.invoke_endpoint(EndpointName=endpoint_name, Body=b'1.766913,0.251711,-0.501575,4.214333,0.152405,-0.054836,0.066733,-0.142544,0.823496,1.008849,-0.801094,-3.260263,0.372933,1.674254,-2.125822,0.499348,0.273020,-0.075144,-1.541981,-0.269055,-0.045084,0.070225,0.031831,-0.117100,0.049678,0.056044,-0.075564,-0.046625,84.22', 
                           ContentType='text/csv')

print(resp['Body'].read())

### 12. List Inference Recommender job steps

We recently introduced ListInferenceRecommendationsJobSteps, which lets you analyze the subtasks in an Inference Recommender job. The following code snippets show how to use the list_inference_recommendations_job_steps boto3 API to get the list of subtasks, which can help with debugging Inference Recommender job failures at the step level. This functionality is not yet supported in the SageMaker Python SDK.

In [None]:
region = boto3.Session().region_name
role = get_execution_role()
sm_client = boto3.client("sagemaker", region_name=region)

list_job_steps_response = sm_client.list_inference_recommendations_job_steps(JobName=default_job_name)
print(list_job_steps_response)

In [None]:
region = boto3.Session().region_name
role = get_execution_role()
sm_client = boto3.client("sagemaker", region_name=region)

list_job_steps_response = sm_client.list_inference_recommendations_job_steps(JobName=advanced_job_name)
print(list_job_steps_response)
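
Printing the raw response can be hard to scan. A small helper can count steps by status and pull out the failure reasons. This is a sketch under assumptions: it expects the response to carry a Steps list whose entries have a Status field and an InferenceBenchmark dict with EndpointConfiguration and FailureReason, which should be checked against the boto3 documentation. The sample response is hypothetical, with made-up values, and summarize_steps is not part of any SDK.

```python
from collections import Counter

# Hypothetical response with made-up values; the field names follow the
# shape assumed in the text above.
sample_response = {
    "Steps": [
        {
            "StepType": "BENCHMARK",
            "JobName": "demo-job",
            "Status": "COMPLETED",
            "InferenceBenchmark": {
                "EndpointConfiguration": {"InstanceType": "ml.m5.large"},
            },
        },
        {
            "StepType": "BENCHMARK",
            "JobName": "demo-job",
            "Status": "FAILED",
            "InferenceBenchmark": {
                "EndpointConfiguration": {"InstanceType": "ml.m5.xlarge"},
                "FailureReason": "example failure message",
            },
        },
    ]
}


def summarize_steps(response):
    """Return (status counts, list of (instance type, failure reason)) for a response."""
    steps = response.get("Steps", [])
    status_counts = Counter(step.get("Status", "UNKNOWN") for step in steps)
    failures = []
    for step in steps:
        if step.get("Status") == "FAILED":
            benchmark = step.get("InferenceBenchmark", {})
            instance = benchmark.get("EndpointConfiguration", {}).get("InstanceType")
            failures.append((instance, benchmark.get("FailureReason")))
    return dict(status_counts), failures


status_counts, failures = summarize_steps(sample_response)
print(status_counts)
print(failures)
```

In the notebook, summarize_steps(list_job_steps_response) would apply the same summary to a real response.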

Let's wait for both jobs to finish, starting with the default job, and get insights from their metrics.

In [None]:
import pprint
import pandas as pd

finished = False
while not finished:
    inference_recommender_job = sm_client.describe_inference_recommendations_job(JobName=default_job_name)
    if inference_recommender_job["Status"] in ["COMPLETED", "STOPPED", "FAILED"]:
        finished = True
    else:
        print("In progress")
        time.sleep(300)

if inference_recommender_job["Status"] == "FAILED":
    print("Inference recommender job failed")
    print("Failure Reason: {}".format(inference_recommender_job["FailureReason"]))
else:
    print("Inference recommender job completed")

In [None]:
data = [
    {**x["EndpointConfiguration"], **x["ModelConfiguration"], **x["Metrics"]}
    for x in inference_recommender_job["InferenceRecommendations"]
]
df = pd.DataFrame(data)
dropFilter = df.filter(["VariantName"])
df.drop(dropFilter, inplace=True, axis=1)
pd.set_option("max_colwidth", 400)
df.head()

In [None]:
finished = False
while not finished:
    inference_recommender_job = sm_client.describe_inference_recommendations_job(
        JobName=advanced_job_name
    )
    if inference_recommender_job["Status"] in ["COMPLETED", "STOPPED", "FAILED"]:
        finished = True
    else:
        print("In progress")
        time.sleep(300)

if inference_recommender_job["Status"] == "FAILED":
    print("Inference recommender job failed")
    print("Failure Reason: {}".format(inference_recommender_job["FailureReason"]))
else:
    print("Inference recommender job completed")
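
The polling loops for the default and advanced jobs share the same shape, so they could be factored into one helper. The sketch below is one possible refactoring, not part of the SDK: wait_for_job is a hypothetical name, and the describe call and sleep function are passed in so the helper can be exercised without AWS access.

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "STOPPED", "FAILED"}


def wait_for_job(describe_fn, job_name, poll_seconds=300, sleep_fn=time.sleep):
    """Call describe_fn(JobName=...) repeatedly until the job reaches a terminal status."""
    while True:
        job = describe_fn(JobName=job_name)
        if job["Status"] in TERMINAL_STATUSES:
            return job
        print("In progress")
        sleep_fn(poll_seconds)


# Stand-in for sm_client.describe_inference_recommendations_job, for the demo only.
_fake_statuses = iter(["PENDING", "IN_PROGRESS", "COMPLETED"])


def fake_describe(JobName):
    return {"JobName": JobName, "Status": next(_fake_statuses)}


demo_job = wait_for_job(fake_describe, "demo-job", poll_seconds=0, sleep_fn=lambda s: None)
print(demo_job["Status"])
```

In the notebook, the call would look like wait_for_job(sm_client.describe_inference_recommendations_job, advanced_job_name).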

In [None]:
data = [
    {**x["EndpointConfiguration"], **x["ModelConfiguration"], **x["Metrics"]}
    for x in inference_recommender_job["InferenceRecommendations"]
]
df = pd.DataFrame(data)
dropFilter = df.filter(["VariantName"])
df.drop(dropFilter, inplace=True, axis=1)
pd.set_option("max_colwidth", 400)
df.head(20)
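
Once the recommendations are flattened, one simple way to choose among them is to keep the rows under a latency ceiling and take the cheapest per inference. The sketch below works on plain dicts so it runs without AWS access. The sample recommendations use made-up numbers, cheapest_within_latency is a hypothetical helper, and the latency ceiling is assumed to be in the same units as the ModelLatency values the job reports.

```python
# Hypothetical recommendations with made-up metric values.
sample_recommendations = [
    {
        "EndpointConfiguration": {"InstanceType": "ml.m5.large", "InitialInstanceCount": 1},
        "ModelConfiguration": {"EnvironmentParameters": []},
        "Metrics": {"CostPerInference": 4.0e-07, "ModelLatency": 12},
    },
    {
        "EndpointConfiguration": {"InstanceType": "ml.m5.xlarge", "InitialInstanceCount": 1},
        "ModelConfiguration": {"EnvironmentParameters": []},
        "Metrics": {"CostPerInference": 3.0e-07, "ModelLatency": 8},
    },
    {
        "EndpointConfiguration": {"InstanceType": "ml.m5.2xlarge", "InitialInstanceCount": 1},
        "ModelConfiguration": {"EnvironmentParameters": []},
        "Metrics": {"CostPerInference": 2.0e-07, "ModelLatency": 150},
    },
]


def flatten(recommendation):
    """Merge the three nested dicts into one row, as the DataFrame cell above does."""
    return {
        **recommendation["EndpointConfiguration"],
        **recommendation["ModelConfiguration"],
        **recommendation["Metrics"],
    }


def cheapest_within_latency(recommendations, max_latency):
    """Return the lowest CostPerInference row with ModelLatency <= max_latency, or None."""
    rows = [flatten(r) for r in recommendations]
    eligible = [row for row in rows if row["ModelLatency"] <= max_latency]
    if not eligible:
        return None
    return min(eligible, key=lambda row: (row["CostPerInference"], row["ModelLatency"]))


best = cheapest_within_latency(sample_recommendations, max_latency=100)
print(best["InstanceType"])
```

Other rankings are equally reasonable, for example preferring the highest MaxInvocations among rows under a cost ceiling; the right choice depends on the workload.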

### 13. Cleanup

In [None]:
xgb_predictor.delete_endpoint()