# Amazon SageMaker Workshop
## _**Deployment**_

---

In this part of the workshop we will deploy the model we created in the previous lab to an endpoint for real-time inference, to predict mobile customer departure (churn).

---

## Contents

1. [Model hosting](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html)
 * Set up a persistent endpoint to get predictions from your model
 
2. [Exercise - Your turn: deploy to an endpoint and customize inference](#Exercise)
 
---

## Background

In the previous labs, [Modeling](../../2-Modeling/modeling.ipynb) and [Evaluation](../../3-Evaluation/evaluation.ipynb), we trained multiple models with multiple SageMaker training jobs and evaluated them.

Let's import the libraries for this lab:


In [None]:
# Suppress default INFO logging
import logging
logger = logging.getLogger()
logger.setLevel(logging.ERROR)

In [None]:
import time
import json
from time import strftime, gmtime

import boto3

import sagemaker
from sagemaker import get_execution_role
from sagemaker.model_monitor import DataCaptureConfig, DatasetFormat, DefaultModelMonitor
from sagemaker.s3 import S3Uploader, S3Downloader

In [None]:
sess = boto3.Session()
sm = sess.client('sagemaker')
role = sagemaker.get_execution_role()

In [None]:
%store -r bucket
%store -r prefix
%store -r region
%store -r docker_image_name
%store -r framework_version

In [None]:
bucket, prefix, region, docker_image_name, framework_version

---
### - If you _**skipped**_ lab `2-Modeling/`, follow these instructions:

 - **run this:**

In [None]:
# # Uncomment if you have not done Lab 2-Modeling

# from config.solution_lab2 import get_estimator_from_lab2
# xgb = get_estimator_from_lab2(docker_image_name, framework_version)

---
### - If you _**have done**_ lab `2-Modeling/`, follow these instructions:

 - **run this:**

In [None]:
# # Uncomment if you've done Lab 2-Modeling

#%store -r training_job_name
#xgb = sagemaker.estimator.Estimator.attach(training_job_name)

---
## Host the model

Now that we've trained the model, let's deploy it to a hosted endpoint. To monitor the model after it's hosted and serving requests, we'll also add configurations to capture data that is being sent to the endpoint.

In [None]:
data_capture_prefix = '{}/datacapture'.format(prefix)

endpoint_name = "workshop-xgboost-customer-churn-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("EndpointName = {}".format(endpoint_name))
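
SageMaker endpoint names have length and character restrictions, so appending a timestamp to a long prefix can push a name over the limit. The sketch below checks a candidate name locally before calling `deploy`. The 63-character limit and the regular expression are our reading of the `CreateEndpoint` API reference; treat them as assumptions and double-check the current documentation. `is_valid_endpoint_name` is a hypothetical helper, not part of the SageMaker SDK.

```python
import re
from time import gmtime, strftime

# Assumed constraint (CreateEndpoint API reference): at most 63 characters,
# alphanumerics joined by hyphens, with no leading or trailing hyphen.
ENDPOINT_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")

def is_valid_endpoint_name(name: str) -> bool:
    """Hypothetical helper: True if `name` looks like a legal endpoint name."""
    return len(name) <= 63 and ENDPOINT_NAME_PATTERN.fullmatch(name) is not None

# Same naming scheme as the cell above.
candidate = "workshop-xgboost-customer-churn-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(candidate, is_valid_endpoint_name(candidate))
```

With this prefix the generated name is 51 characters long, which leaves some room if you lengthen the prefix.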

In [None]:
xgb_predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    endpoint_name=endpoint_name,
    # Capture every request and response (100% sampling) to S3 for monitoring
    data_capture_config=DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri=f's3://{bucket}/{data_capture_prefix}',
    ),
)

OK, we have just trained a model with SageMaker and deployed it to a managed SageMaker endpoint.

In [None]:
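from IPython.display import display, HTML

# Console URL of the SageMaker endpoints page for the current region
sm_ep_placeholder = "https://{0}.console.aws.amazon.com/sagemaker/home?region={0}#/endpoints"

display(HTML(f"<b>Look at your endpoints <a target='_blank' href='{sm_ep_placeholder.format(region)}'>here</a></b>"))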

Or, inside the Studio UI, use the left sidebar and select "Endpoints":

![endpoints.png](media/endpoints.png)

#### Let's save the endpoint name for later (Monitoring lab)

In [None]:
%store endpoint_name

### Invoke the deployed model

Now that we have a hosted endpoint running, we can make real-time predictions from our model by sending HTTP POST requests to it. But first, we need to set up a serializer and a deserializer, so that our test rows are sent to the endpoint as CSV and the model's CSV responses are parsed back.

In [None]:
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

xgb_predictor.serializer = CSVSerializer()
xgb_predictor.deserializer = CSVDeserializer()
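
For intuition, the serializer/deserializer pair is a text round trip: a feature row goes out as one comma-separated line with content type `text/csv`, and the CSV response body is split back into rows and fields. The stdlib sketch below imitates that round trip; `to_csv_payload` and `parse_csv_body` are hypothetical helpers for illustration, not the SDK's implementation of `CSVSerializer` or `CSVDeserializer`.

```python
import csv
import io

def to_csv_payload(features) -> str:
    """Hypothetical helper: turn one feature row into a single CSV line (no trailing newline)."""
    if isinstance(features, str):
        # Already a CSV line, like the rows read from config/test_sample.csv
        return features.rstrip("\n")
    return ",".join(str(value) for value in features)

def parse_csv_body(body: bytes, encoding: str = "utf-8") -> list:
    """Hypothetical helper: split a CSV response body into rows (lists of strings)."""
    return list(csv.reader(io.StringIO(body.decode(encoding))))

print(to_csv_payload([186, 0.1, 137.8, 97]))    # 186,0.1,137.8,97
print(parse_csv_body(b"0.014719205908477306"))  # [['0.014719205908477306']]
```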

Now, we'll loop over our test dataset and collect predictions by invoking the XGBoost endpoint:

In [None]:
print("Sending test traffic to the endpoint {}. \nPlease wait for a minute...".format(endpoint_name))

count = 0

with open('config/test_sample.csv', 'r') as f:
    for row in f:
        if count == 10:  # only evaluate the first 10 rows
            break
        payload = row.rstrip('\n')
        response = xgb_predictor.predict(data=payload)
        print(response)
        time.sleep(0.5)
        count += 1

In [None]:
response

### Verify that data is captured in Amazon S3

Because data capture is enabled, the real-time requests we just sent to the endpoint (and the model's responses) should also have been captured for monitoring purposes.

Let's list the data capture files stored in Amazon S3. Expect to see different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:

`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`
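
Because the hour is encoded in the key, you can build the prefix for a specific hour, or recover when a capture file was written, with plain string handling. The sketch below assumes keys follow exactly the layout shown above and that the date components are in UTC. `capture_prefix_for_hour` and `parse_capture_key` are hypothetical helpers, and the bucket, endpoint, variant (such as `AllTraffic`), and file names in the usage lines are made up.

```python
from datetime import datetime, timezone

def capture_prefix_for_hour(destination: str, endpoint: str, variant: str, when: datetime) -> str:
    """Hypothetical helper: prefix holding the captures for the hour that contains `when` (UTC)."""
    when = when.astimezone(timezone.utc)
    return f"{destination.rstrip('/')}/{endpoint}/{variant}/{when:%Y/%m/%d/%H}/"

def parse_capture_key(uri: str) -> dict:
    """Hypothetical helper: split a capture file URI into endpoint, variant, hour, and file name."""
    # Assumes the layout {prefix}/{endpoint}/{variant}/yyyy/mm/dd/hh/{file}
    endpoint, variant, yyyy, mm, dd, hh, filename = uri.split("/")[-7:]
    hour = datetime(int(yyyy), int(mm), int(dd), int(hh), tzinfo=timezone.utc)
    return {"endpoint": endpoint, "variant": variant, "hour": hour, "file": filename}

# Made-up destination, endpoint, variant, and file names for illustration only.
dest = "s3://my-bucket/churn/datacapture"
print(capture_prefix_for_hour(dest, "my-endpoint", "AllTraffic",
                              datetime(2021, 3, 5, 14, 30, tzinfo=timezone.utc)))
# s3://my-bucket/churn/datacapture/my-endpoint/AllTraffic/2021/03/05/14/
info = parse_capture_key(dest + "/my-endpoint/AllTraffic/2021/03/05/14/example-capture.jsonl")
print(info["hour"].isoformat(), info["file"])
# 2021-03-05T14:00:00+00:00 example-capture.jsonl
```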

In [None]:
from time import sleep

current_endpoint_capture_prefix = '{}/{}'.format(data_capture_prefix, endpoint_name)
for _ in range(12):  # wait up to a minute to see captures in S3
    capture_files = S3Downloader.list("s3://{}/{}".format(bucket, current_endpoint_capture_prefix))
    if capture_files:
        break
    sleep(5)

print("Found Data Capture Files:")
print(capture_files)

All captured data is stored in SageMaker-specific JSON Lines (`.jsonl`) files. Next, let's take a quick peek at a single line, pretty-printed as JSON, so that we can observe the format a little better.

In [None]:
capture_file = S3Downloader.read_file(capture_files[-1])

print("=====Single Data Capture====")
print(json.dumps(json.loads(capture_file.split('\n')[0]), indent=2)[:2000])

As you can see, each inference request is captured in one line of the JSONL file. The line contains both the input and the output, merged together. In our example, we sent the ContentType as `text/csv`, which is reflected in the `observedContentType` value. The capture format also records the encoding used for the input and output payloads in the `encoding` value.
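
To work with many captured requests at once, you can parse each line and pull out the input and output payloads, decoding them according to the `encoding` field. The sketch below assumes the record layout printed above (`captureData` holding `endpointInput` and `endpointOutput`, each with `observedContentType`, `mode`, `data`, and `encoding`) and that a `BASE64` encoding means the `data` field is base64 text; verify both against your own capture files. `parse_capture_line` is a hypothetical helper, and the sample record is invented for illustration.

```python
import base64
import json

def _decode(payload: dict) -> str:
    """Return the payload as text, base64-decoding it when the capture says so."""
    if payload.get("encoding") == "BASE64":
        return base64.b64decode(payload["data"]).decode("utf-8")
    return payload["data"]

def parse_capture_line(line: str) -> dict:
    """Hypothetical helper: extract input/output text and metadata from one capture record."""
    record = json.loads(line)
    capture = record["captureData"]
    return {
        "input": _decode(capture["endpointInput"]),
        "output": _decode(capture["endpointOutput"]),
        "content_type": capture["endpointInput"]["observedContentType"],
        "event_id": record.get("eventMetadata", {}).get("eventId"),
    }

# Invented record with the same shape as a capture line (all values are made up).
sample = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "186,0.1,137.8", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv", "mode": "OUTPUT",
                           "data": base64.b64encode(b"0.0147").decode("ascii"),
                           "encoding": "BASE64"},
    },
    "eventMetadata": {"eventId": "example-event-id"},
})
print(parse_capture_line(sample))
```

In this notebook, something like `[parse_capture_line(line) for line in capture_file.splitlines() if line.strip()]` would parse every record in the file read above.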

To recap, we have seen how to enable capturing the input and/or output payloads of an endpoint with a new parameter (`data_capture_config`), and what the captured format looks like in S3. Let's continue to explore how SageMaker helps with monitoring the data collected in S3.

---
## _Alternative deployment_

OK, nice! We can train with SageMaker and then deploy to a managed endpoint with data capture enabled.

But:

#### - What if I already have a model that was trained outside of SageMaker? How do I deploy it in SageMaker without training it there first?

#### - What if I need to preprocess the request before performing inference and then post-process what my model predicted? How can I customize the inference logic with a custom inference script?

# Exercise
### _[Challenge] Your turn!_

Deploy another model in SageMaker. Remember that the output of each training job was an artifact (tar.gz file with the model and other configurations) that was saved in S3.
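
If you bring a model from your laptop, it has to be packaged the same way: a gzipped tar archive with the model file(s) at the root of the archive. The sketch below uses Python's `tarfile` to build such an archive. The in-archive name `xgboost-model` is an assumption based on what SageMaker's built-in XGBoost training jobs typically write, so check the name your container expects. `package_model` is a hypothetical helper; uploading the result (for example with `S3Uploader.upload`, imported at the top of this notebook) is left to you.

```python
import os
import tarfile
import tempfile

def package_model(model_path: str, output_dir: str, arcname: str = "xgboost-model") -> str:
    """Hypothetical helper: write model.tar.gz with the model file at the archive root."""
    archive_path = os.path.join(output_dir, "model.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops the local directory structure so the file sits at the root
        tar.add(model_path, arcname=arcname)
    return archive_path

# Demo with a throwaway file standing in for a real trained model.
with tempfile.TemporaryDirectory() as workdir:
    fake_model = os.path.join(workdir, "my_model.bin")
    with open(fake_model, "wb") as f:
        f.write(b"not a real model")
    archive = package_model(fake_model, workdir)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()

print(names)  # ['xgboost-model']
```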

1. Pick one of these models in S3, or upload another one from your laptop to S3. Then deploy it.
(If you haven't trained a model, pick the `model.tar.gz` in the `config` directory.)

2. Add a custom inference script to your endpoint.

To make things easier, you can add a simple post-processing function that adds a new value, `"hello from post-processing function!!!"`, to the output of the request.

So, if we send to our endpoint: 
```
186,0.1,137.8,97,187.7,118,146.4,85,8.7,6,1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.10,0.11,0.12,0.13,0.14,0.15,0.16,0.17,1.1,0.18,0.19,0.20,0.21,0.22,0.23,0.24,0.25,0.26,0.27,0.28,0.29,0.30,0.31,0.32,0.33,0.34,0.35,0.36,0.37,0.38,0.39,0.40,0.41,0.42,0.43,0.44,0.45,0.46,0.47,0.48,0.49,0.50,0.51,0.52,0.53,1.2,1.3,0.54,1.4,0.55
``` 

The output will be something like:
```
0.014719205908477306,"hello from post-processing"
```
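
As a starting point for step 2, here is only the post-processing piece, written as plain Python so it can be tested locally. It is modeled on the `output_fn`-style hook that SageMaker framework containers call after prediction; whether your container uses exactly this name and a `(prediction, accept)` signature is an assumption to verify (see the hint link or the container documentation). It formats the model's scores as CSV and appends the greeting as a quoted field.

```python
GREETING = "hello from post-processing function!!!"

def output_fn(prediction, accept="text/csv"):
    """Sketch of a post-processing hook: serialize scores as CSV and append a greeting."""
    # Assumption: `prediction` is an iterable of numeric scores, one per request row.
    if accept != "text/csv":
        raise ValueError(f"Unsupported accept type: {accept}")
    scores = [float(p) for p in prediction]
    # One line: the scores first, then the greeting as a quoted CSV field.
    return ",".join(str(s) for s in scores) + ',"' + GREETING + '"'

print(output_fn([0.5]))  # 0.5,"hello from post-processing function!!!"
```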

Want a hint? [Look here](./solutions/b-hint1.md)

In [None]:
# YOUR SOLUTION HERE


---
# [You can now go to the lab 5-Monitoring](../../5-Monitoring/monitoring.ipynb)