# SageMaker Real-Time Inference
## XGBoost Regression Example

Amazon SageMaker Real-Time Inference is instance-based hosting. Use it for latency-sensitive, high-throughput workloads.

This notebook uses the SageMaker XGBoost algorithm: we upload a trained model artifact and deploy it to a real-time endpoint. The example uses the public Abalone regression dataset on S3.

## Table of Contents
- Setup
- Deployment
  - Model Creation
  - Endpoint Configuration Creation (production variants and instance setup)
  - Real-Time Endpoint Creation
  - Endpoint Invocation
- Clean Up

## Setup

For testing, the notebook's execution role needs SageMaker Full Access permissions (for example, the `AmazonSageMakerFullAccess` managed policy).

In [None]:
# Install or upgrade the SageMaker Python SDK, botocore, boto3 and the AWS CLI
! pip install sagemaker botocore boto3 awscli --upgrade

In [None]:
# Setup clients
import boto3

client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")

### SageMaker Setup
To begin, we import the AWS SDK for Python (Boto3) and the SageMaker Python SDK, then set up our environment: the IAM execution role and the S3 bucket that will store the model artifact.

In [None]:
import boto3
import sagemaker
from sagemaker.estimator import Estimator

boto_session = boto3.session.Session()
region = boto_session.region_name
print(region)

sagemaker_session = sagemaker.Session()
base_job_prefix = "xgboost-example"
role = sagemaker.get_execution_role()
print(role)

default_bucket = sagemaker_session.default_bucket()
s3_prefix = base_job_prefix

## Deployment

### Model Creation
Create a model by providing your model artifacts, the container image URI, environment variables for the container (if applicable), a model name, and the SageMaker IAM role.

In [None]:
model_s3_key = f"{s3_prefix}/model.tar.gz"
model_url = f"s3://{default_bucket}/{model_s3_key}"
print(f"Uploading Model to {model_url}")

with open("model/model.tar.gz", "rb") as model_file:
    boto_session.resource("s3").Bucket(default_bucket).Object(model_s3_key).upload_fileobj(model_file)
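
The upload cell above expects `model/model.tar.gz` to exist already. As a minimal sketch of how such an archive could be produced, assuming the artifact is a single model file placed at the root of the tarball, the helper below packs one file with the standard `tarfile` module. The function name `package_model` and the file name `xgboost-model` are hypothetical, and the demo uses placeholder bytes rather than a real model.

```python
import os
import tarfile
import tempfile


def package_model(model_path, archive_path):
    """Pack one model file into a gzipped tarball and return the member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops the local directory so the file sits at the archive root
        tar.add(model_path, arcname=os.path.basename(model_path))
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demo with a placeholder file standing in for a trained model
with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "xgboost-model")  # hypothetical file name
    with open(model_path, "wb") as f:
        f.write(b"placeholder bytes, not a real model")
    archive_path = os.path.join(workdir, "model.tar.gz")
    members = package_model(model_path, archive_path)

print(members)
```

Listing the members after writing is a cheap check that the file did not end up nested under a local directory path.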

In [None]:
from time import gmtime, strftime

model_name = "xgboost-realtime-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Model name: " + model_name)

# environment variables
byo_container_env_vars = {"SAGEMAKER_CONTAINER_LOG_LEVEL": "20"}

inference_instance_type = "ml.m5.xlarge"

# retrieve xgboost image
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type=inference_instance_type,
)

create_model_response = client.create_model(
    ModelName=model_name,
    Containers=[
        {
            "Image": image_uri,
            "Mode": "SingleModel",
            "ModelDataUrl": model_url,
            "Environment": byo_container_env_vars,
        }
    ],
    ExecutionRoleArn=role,
)

print("Model Arn: " + create_model_response["ModelArn"])

### Endpoint Configuration Creation

The endpoint configuration is where you set the instance type and instance count for your endpoint.

In [None]:
xgboost_epc_name = "xgboost-real-time-epc-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

endpoint_config_response = client.create_endpoint_config(
    EndpointConfigName=xgboost_epc_name,
    ProductionVariants=[
        {
            "VariantName": "byoVariant",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            # Same instance type that was used to retrieve the container image
            "InstanceType": inference_instance_type,
        },
    ],
)
print("Endpoint Configuration Arn: " + endpoint_config_response["EndpointConfigArn"])
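
An endpoint configuration can list more than one production variant, and SageMaker splits traffic between them in proportion to each variant's `InitialVariantWeight`. The sketch below is a hedged illustration of that arithmetic only: `build_variant` and `traffic_shares` are hypothetical helpers, the variant and model names are placeholders, and nothing here calls AWS.

```python
def build_variant(variant_name, model_name, instance_type="ml.m5.xlarge",
                  instance_count=1, weight=1.0):
    """Build one ProductionVariants entry for create_endpoint_config."""
    return {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InitialInstanceCount": instance_count,
        "InstanceType": instance_type,
        "InitialVariantWeight": weight,
    }


def traffic_shares(variants):
    """Return each variant's share of traffic: its weight divided by the total weight."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Placeholder names; in the notebook, ModelName would be a model created earlier
variants = [
    build_variant("variantA", "model-a", weight=3.0),
    build_variant("variantB", "model-b", weight=1.0),
]
shares = traffic_shares(variants)
print(shares)
```

The resulting list could be passed as `ProductionVariants` to `create_endpoint_config`, provided every `ModelName` refers to an existing model.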

### Real-Time Endpoint Creation
Now that we have an endpoint configuration, we can create a real-time endpoint and deploy our model to it. When creating the endpoint, provide the name of your endpoint configuration and a name for the new endpoint.

In [None]:
endpoint_name = "xgboost-realtime-ep-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=xgboost_epc_name,
)

print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

Wait until the endpoint status is InService before invoking the endpoint.

In [None]:
# Poll describe_endpoint until the endpoint leaves the "Creating" state.
# It ends up either InService (ready) or Failed.
import time

describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)

while describe_endpoint_response["EndpointStatus"] == "Creating":
    # Sleep before polling again so we do not wait an extra interval once the endpoint is ready
    time.sleep(15)
    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
    print(describe_endpoint_response["EndpointStatus"])

describe_endpoint_response

### Endpoint Invocation
Invoke the endpoint by sending it a request. The following sample data point is taken from the CSV file of the public Abalone dataset.

In [None]:
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
    ContentType="text/csv",
)

# The body is a byte stream; decode it to get the prediction as text
print(response["Body"].read().decode("utf-8"))
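
The request body above is a hand-written CSV row. As a small sketch of how the same row could be built from a list of feature values, and how a text response could be turned back into numbers, the helpers below do both conversions. `to_csv_payload` and `parse_predictions` are hypothetical names, the response format (numbers separated by commas or newlines) is an assumption, and the sample response bytes are made up for the demo.

```python
def to_csv_payload(features):
    """Serialize one row of numeric features as a text/csv request body."""
    return ",".join(str(value) for value in features).encode("utf-8")


def parse_predictions(body):
    """Parse a response body of comma- or newline-separated numbers into floats."""
    text = body.decode("utf-8")
    tokens = text.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]


features = [0.345, 0.224414, 0.131102, 0.042329, 0.279923, -0.110329, -0.099358, 0.0]
payload = to_csv_payload(features)
print(payload)

# Made-up response bytes, standing in for response["Body"].read()
predictions = parse_predictions(b"4.566\n")
print(predictions)
```

In the notebook, `payload` would be passed as `Body` to `runtime.invoke_endpoint` with `ContentType="text/csv"`.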

## Clean Up
Delete any resources you created in this notebook that you no longer wish to use.

In [None]:
# Delete the endpoint first, then the endpoint configuration and the model it depends on
# client.delete_endpoint(EndpointName=endpoint_name)
# client.delete_endpoint_config(EndpointConfigName=xgboost_epc_name)
# client.delete_model(ModelName=model_name)
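
The three delete calls can also be grouped into one function. The sketch below uses a hypothetical `cleanup` helper that removes the endpoint before the endpoint configuration and model it references; the demo runs against a recording stub rather than a real SageMaker client, so nothing is actually deleted.

```python
def cleanup(client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint, then its endpoint configuration, then the model."""
    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    client.delete_model(ModelName=model_name)


class RecordingClient:
    """Stub that records calls instead of talking to SageMaker."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))


stub = RecordingClient()
cleanup(stub, "demo-endpoint", "demo-config", "demo-model")
print(stub.calls)
```

With the real client from the setup cell, the call would be `cleanup(client, endpoint_name, xgboost_epc_name, model_name)`.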