# Deploy your model for inference

In this lab you will walk through the process of deploying an XGBoost model that has been approved in the Model Registry. We will create a SageMaker serverless endpoint. For more information on deployment options on SageMaker, visit the [SageMaker documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html).


In [None]:
!pip install "sagemaker>=2.123.0"

In [None]:
import sagemaker
import boto3
import numpy as np 
import pandas as pd 
import os 
from sagemaker import get_execution_role
from datetime import datetime

# Get default bucket
bucket = sagemaker.Session().default_bucket()
prefix = 'sagemaker/mlops-workshop'

# Get SageMaker Execution Role
role = get_execution_role()
region = boto3.Session().region_name

# SageMaker Session
sagemaker_session = sagemaker.session.Session()

# SageMaker client
sm_client = boto3.client('sagemaker')

### Retrieve variables

In [None]:
%store -r

In [None]:
print(model_package_arn)

In [None]:
from sagemaker.model import ModelPackage

model_package = ModelPackage(
 model_package_arn = model_package_arn, 
 role = role,
 sagemaker_session = sagemaker_session
)

## Deploy the model

Since we are still experimenting with our model, we will create a Serverless Endpoint to save on cost.
For more information on Serverless Endpoints, visit the [SageMaker documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html).

Ensure the model is approved before deploying it.

## Approve the Model


This can be done via the UI as shown in the following image or through SageMaker APIs. In this notebook, we will use the API to approve the model.

![](./imgs/mr-approval.png)

In [None]:
model_package_update_input_dict = {
 "ModelPackageArn" : model_package_arn,
 "ModelApprovalStatus" : "Approved"
}

model_package_update_response = sm_client.update_model_package(**model_package_update_input_dict)

In [None]:
from datetime import datetime
current_time = datetime.now().strftime("%d-%m-%Y-%H-%M-%S")

## Create a Serverless endpoint

This step should take 3-5 minutes to complete. 

In [None]:
from sagemaker.serverless import ServerlessInferenceConfig

endpoint_name = 'xgb-model-' + current_time

try:
    model_package.deploy(
        endpoint_name = endpoint_name,
        serverless_inference_config = ServerlessInferenceConfig(
            memory_size_in_mb = 4096,
            max_concurrency = 1
        )
    )
except Exception as e:
    # Deployment commonly fails here when the model package has not been approved.
    # Chain the original exception so the underlying error stays in the traceback.
    raise RuntimeError(
        "Make sure model is in an Approved state. "
        "Navigate to the model registry UI to approve the model"
    ) from e

In [None]:
from sagemaker.predictor import Predictor
xgb_predictor = Predictor(
 endpoint_name = endpoint_name, 
 serializer = sagemaker.serializers.CSVSerializer(),
 sagemaker_session = sagemaker_session
)

## Evaluation
Let us evaluate our model against the test dataset.

Our data is currently stored as NumPy arrays in the memory of our notebook instance. To send it in an HTTP POST request, we'll serialize it as a CSV string and then decode the CSV response that the endpoint returns.

*Note: For inference with CSV format, SageMaker XGBoost requires that the data does NOT include the target variable.*

The helper method below allows us to pass in our test data and make predictions against it. The following steps are performed in this helper method. 
1. Loop over our test dataset
1. Split it into mini-batches of rows 
1. Convert those mini-batches to CSV string payloads (notice, we drop the target variable from our dataset first)
1. Retrieve mini-batch predictions by invoking the XGBoost endpoint
1. Collect predictions and convert from the CSV output our model provides into a NumPy array
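
To make the payload format concrete, here is a pure-Python sketch of the batching and CSV steps listed above. It is an illustration only and is not the notebook's helper: the function names `to_csv_payloads` and `parse_csv_predictions`, the sample rows, and the sample response bytes are all made up, and the target is assumed to sit in column 0 as in this lab's test file.

```python
def to_csv_payloads(rows, batch_size=500, target_index=0):
    """Split rows into mini-batches and render each batch as a CSV string
    with the target column removed."""
    payloads = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        lines = []
        for row in batch:
            features = [value for i, value in enumerate(row) if i != target_index]
            lines.append(",".join(str(value) for value in features))
        payloads.append("\n".join(lines))
    return payloads


def parse_csv_predictions(response_bytes):
    """Decode a CSV response body into a list of floats."""
    text = response_bytes.decode("utf-8")
    return [float(value) for value in text.replace("\n", ",").split(",") if value.strip()]


# Made-up rows: the first column is the target, the rest are features.
rows = [[1, 0.5, 3.0], [0, 1.5, 2.0], [1, 2.5, 1.0]]
payloads = to_csv_payloads(rows, batch_size=2)
print(payloads)

# Made-up response body standing in for what an endpoint might return.
print(parse_csv_predictions(b"0.91,0.12"))
```

Each payload string is what would be sent as the body of one request; no payload contains the target value.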

In [None]:
!aws s3 cp {test_uri}/test.csv test.csv

In [None]:
test_data = pd.read_csv('test.csv', header = None)
test_data

In [None]:
y_true = test_data[0]
data = test_data.drop(0, axis = 1)

In [None]:
data.head()

In [None]:
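def predict(data, predictor, rows=500):
    # Split the data into mini-batches of roughly `rows` rows each
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        # Each response is a CSV string of predictions for one mini-batch
        predictions = ','.join([predictions, predictor.predict(array).decode('utf-8')])

    # Drop the leading comma and parse the combined CSV string into a NumPy array
    return np.fromstring(predictions[1:], sep=',')

predictions = predict(data.to_numpy(), xgb_predictor)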

A confusion matrix is a table that is often used to describe the performance of a classification model. Below, we build a confusion matrix to compare our rounded predictions against the actual labels.

In [None]:
print(predictions)

In [None]:
pd.crosstab(index=y_true, columns=np.round(predictions), rownames=['actuals'], colnames=['predictions'])
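
The four cells of a binary confusion matrix can be condensed into a few summary metrics. The sketch below assumes labels are 0 or 1 and that scores are rounded to the nearest label, as in the `pd.crosstab` call above. The helper names and the sample values are hypothetical and are not results from this lab's model.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Derive accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels and scores, for illustration only.
sample_true = [1, 0, 1, 1, 0, 0]
sample_scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
sample_pred = [round(score) for score in sample_scores]

counts = confusion_counts(sample_true, sample_pred)
print(counts)
print(summary_metrics(*counts))
```

In the notebook, you could pass `y_true` and `np.round(predictions)` to `confusion_counts` in place of the sample values.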

### (Optional) Clean-up

If you are done with this notebook, please run the cell below. This removes the hosted endpoint you created, along with its endpoint configuration, so that it does not incur further charges.

In [None]:
xgb_predictor.delete_endpoint(delete_endpoint_config=True)

#### Now move on to Module 2 in the workshop