# Amazon SageMaker Workshop
## _**Batch Transform Deployment**_

---

In this part of the workshop, we will use the model created in the previous lab to run batch (offline) inference with SageMaker Batch Transform and predict mobile customer departure (churn).

Batch transform uses the same mechanics as real-time hosting to generate predictions. However, unlike real-time hosted endpoints, which run on persistent hardware (instances stay up until you shut them down), batch transform clusters are torn down when the job completes.

---

## Contents

1. [Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html)
 * Run a batch transform job to get predictions from your model
 
---

## Background

In the previous labs, [Modeling](../../2-Modeling/modeling.ipynb) and [Evaluation](../../3-Evaluation/evaluation.ipynb), we trained multiple models with SageMaker training jobs and evaluated them.

Let's import the libraries for this lab:


```python
# Suppress default INFO logging
import logging
logger = logging.getLogger()
logger.setLevel(logging.ERROR)
```

```python
import os
import time
import json
import tarfile
from time import strftime, gmtime

import boto3
import pandas as pd
import numpy as np
import pickle
import xgboost

import sagemaker
from sagemaker import get_execution_role
from sagemaker.s3 import S3Uploader, S3Downloader

from sklearn import metrics
```

```python
sess = boto3.Session()
sm = sess.client('sagemaker')
role = sagemaker.get_execution_role()
```

```python
%store -r bucket
%store -r prefix
%store -r region
%store -r docker_image_name
%store -r framework_version
%store -r s3uri_test
```

```python
bucket, prefix, region, docker_image_name, framework_version, s3uri_test
```

---
### - If you _**skipped**_ the `2-Modeling/` lab, follow these instructions:

 - **run this:**

```python
# # Uncomment if you have not done Lab 2-Modeling

#from config.solution_lab2 import get_estimator_from_lab2
#xgb = get_estimator_from_lab2(docker_image_name, framework_version)
```

---
### - If you _**have done**_ the `2-Modeling/` lab, follow these instructions:

 - **run this:**

```python
# # Uncomment if you've done Lab 2-Modeling

#%store -r training_job_name
#xgb = sagemaker.estimator.Estimator.attach(training_job_name)
```

---
## Batch Prediction

Batch Transform manages all necessary compute resources: it launches the instances that run the transform job and terminates them once the job finishes.
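To make the resource handling concrete, here is a minimal sketch of the request a batch transform job comes down to in the low-level boto3 `create_transform_job` API. The compute resources are just the `TransformResources` fields: SageMaker provisions those instances for the job and releases them when it ends, and no endpoint is created. The job and model names in the usage comment are hypothetical placeholders, and we assume the SageMaker Python SDK sends an equivalent request when `transformer.transform` is called later in this notebook.

```python
def build_transform_job_request(job_name, model_name, input_uri, output_uri,
                                instance_type="ml.m5.large", instance_count=1):
    """Build keyword arguments for boto3's SageMaker create_transform_job call.

    Mirrors this notebook's settings: CSV input read from S3, one record per line.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,  # an existing SageMaker model (hypothetical here)
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": input_uri,
                }
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # each line of the file is one record
        },
        "TransformOutput": {"S3OutputPath": output_uri},
        # Instances that exist only for the lifetime of the job
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


# Usage (needs AWS credentials and an existing model; names are placeholders):
# import boto3
# request = build_transform_job_request(
#     "churn-batch-demo", "my-churn-model", s3_batch_input, s3_batch_output)
# boto3.client("sagemaker").create_transform_job(**request)
```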

#### Download Test Dataset and Model

```python
# Download the trained model artifact and the test set into the current directory
S3Downloader.download(xgb.model_data, ".")
S3Downloader.download(s3uri_test, ".")
```
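`S3Downloader.download` copies the object at an S3 URI into the local directory under its original file name, which is why the following cells can open `model.tar.gz` and `test.csv` directly (assuming `xgb.model_data` points at a `model.tar.gz` object, as those cells expect). As a small illustration of how an S3 URI maps to the bucket and key that boto3 calls such as the later `upload_file` cell use, here is a hypothetical standard-library helper:

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def local_file_name(uri):
    """Name a single-object download keeps locally: the last key segment."""
    _, key = split_s3_uri(uri)
    return key.rsplit("/", 1)[-1]


print(split_s3_uri("s3://my-bucket/churn/test/test.csv"))
```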

#### Visualizing Test Data

```python
test_path = "test.csv"
df = pd.read_csv(test_path, header=None)
df
```

* **Batch input:** the dataset used for prediction (the test dataset) must not contain the target column and must be stored in S3.
* **Batch output:** we need to specify the S3 path where the batch output will be written.
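Because the transform job is configured with `split_type='Line'` and `content_type='text/csv'`, every line of the input file is treated as one record, so the file should hold only feature values: no header, no index, no target. The pandas cell that follows does this with `df.iloc[:, 1:]`; the hypothetical helper below is a standard-library sketch of the same step, assuming (as in this dataset) that the target is the first column.

```python
import csv
import io


def strip_target_column(csv_text, target_index=0):
    """Return headerless CSV text with the target column removed, one record per line."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines so they don't become empty records
            writer.writerow(row[:target_index] + row[target_index + 1:])
    return out.getvalue()


print(strip_target_column("1,0.5,3\n0,0.1,7\n"))
```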

```python
test_true_y = df.iloc[:, 0]  # target column (ground-truth labels)
test_data_batch = df.iloc[:, 1:]  # features only: drop the target column
# Batch transform input: no header, no index, one record per line
test_data_batch.to_csv('test_batch.csv', header=False, index=False)
test_data_batch
```

#### Upload to S3

```python
# Upload the feature-only CSV to s3://<bucket>/<prefix>/batch/test_batch.csv
s3_key = '{}/batch/test_batch.csv'.format(prefix)
boto3.Session().resource('s3').Bucket(bucket).Object(s3_key).upload_file('test_batch.csv')
```

```python
s3_batch_input = 's3://{}/{}/batch/test_batch.csv'.format(bucket, prefix)  # test data used for prediction
s3_batch_output = 's3://{}/{}/batch/batch-inference'.format(bucket, prefix)  # location of the batch output
```

```python
s3_batch_input, s3_batch_output
```

#### Load the Pickled Model Locally

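```python
model_path = "model.tar.gz"
# Unpack the model artifact produced by the training job
with tarfile.open(model_path) as tar:
    tar.extractall(path=".")

print("Loading xgboost model.")
# Only unpickle files you trust: pickle can execute arbitrary code on load
with open("xgboost-model", "rb") as f:
    model = pickle.load(f)
model
```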
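The member name `xgboost-model` is decided by the training script from Lab 2, not by SageMaker itself, so if the load above fails it can help to list what the archive actually contains. A minimal sketch using the standard `tarfile` module (the helper name is our own):

```python
import tarfile


def list_model_artifacts(model_path="model.tar.gz"):
    """Return the names of the files packed in a gzipped model archive."""
    with tarfile.open(model_path, "r:gz") as tar:
        return tar.getnames()


# Usage in this notebook: list_model_artifacts("model.tar.gz")
```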

#### Testing the Model Locally on a Random Sample

```python
print("Some random test data")
x = test_data_batch.sample(1)  # one random row of features
print(x)

print("Performing predictions against test data.")
X_test = xgboost.DMatrix(x.values)
predictions_probs = model.predict(X_test)  # churn probability
predictions = predictions_probs.round()  # 0/1 label

print(predictions)
```

### Create a Batch Transform Job and Make Batch Predictions

As we saw in the **2-Modeling** lab, we added custom inference logic to our script (the *input_fn* and *predict_fn* functions). So, just by reusing our previous estimator, we can create a transformer and run batch inference:
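We don't reproduce the Lab 2 script here, but as a rough, hypothetical sketch of the kind of handlers it contains: SageMaker framework containers call `input_fn(request_body, request_content_type)` to deserialize each request and `predict_fn(input_data, model)` to run the model on it. The version below is simplified to stay dependency-free. It parses CSV into Python lists, whereas the real script would most likely build an `xgboost.DMatrix` before calling `model.predict`. Each request may carry one or several CSV lines, depending on how the job batches records.

```python
import csv
import io


def input_fn(request_body, request_content_type):
    """Deserialize a text/csv payload into a list of float feature rows."""
    if request_content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")
    reader = csv.reader(io.StringIO(request_body))
    return [[float(value) for value in row] for row in reader if row]


def predict_fn(input_data, model):
    """Run the model; a real XGBoost handler would wrap input_data in a DMatrix first."""
    return model.predict(input_data)
```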

```python
# creates a transformer object from the trained model
transformer = xgb.transformer(
    instance_count=1,
    instance_type='ml.m5.large',
    output_path=s3_batch_output)

# calls that object's transform method to create a transform job
transformer.transform(data=s3_batch_input, data_type='S3Prefix', content_type='text/csv', split_type='Line')

transformer.wait()
```
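`transformer.wait()` blocks until the transform job finishes. Roughly the same behaviour can be sketched with the boto3 client `sm` created earlier by polling `describe_transform_job` until `TransformJobStatus` reaches a terminal value (`Completed`, `Failed` or `Stopped`). This is an illustrative helper, not the SDK's implementation; the `sleep` parameter only exists so the loop can be tested without actually waiting.

```python
import time

# Statuses after which the job will not change any more
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_transform_job(client, job_name, poll_seconds=30, sleep=time.sleep):
    """Poll DescribeTransformJob until the job ends; return (status, failure_reason)."""
    while True:
        description = client.describe_transform_job(TransformJobName=job_name)
        status = description["TransformJobStatus"]
        if status in TERMINAL_STATES:
            return status, description.get("FailureReason")
        sleep(poll_seconds)
```

For example, `wait_for_transform_job(sm, transformer.latest_transform_job.name)` should return `('Completed', None)` for a successful run, assuming your SDK version exposes the job name through `latest_transform_job`.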

### Track Results in SageMaker Experiments
If you open *Experiments and trials* again and select "Unassigned trial components", you should see that your SageMaker transform job executed successfully:

![batch_transform_result.png](./media/batch_transform_result.png)

#### Download Batch Results from S3

```python
# Batch transform names each output file after its input file, with ".out" appended
batch_output_uri = 's3://{}/{}/batch/batch-inference/test_batch.csv.out'.format(bucket, prefix)
S3Downloader.download(batch_output_uri, ".")
```
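The hard-coded output path above follows the Batch Transform naming rule: each input file produces an output object with the same name plus a `.out` suffix under the configured output path. A small sketch that derives the path from the input and output URIs instead of hard-coding it (it assumes the input is a single S3 object, not a prefix holding several files):

```python
import posixpath


def expected_output_uri(input_uri, output_prefix):
    """Return '<output_prefix>/<input file name>.out' for a single-object input."""
    file_name = posixpath.basename(input_uri.rstrip("/"))
    return posixpath.join(output_prefix.rstrip("/"), file_name + ".out")
```

With this notebook's variables, `expected_output_uri(s3_batch_input, s3_batch_output)` yields the same `.../batch/batch-inference/test_batch.csv.out` URI as the cell above.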

```python
batch_output = pd.read_csv('test_batch.csv.out', header=None)
# Each output line holds one churn probability; round it to a 0/1 label
pred_y = batch_output.iloc[:, 0].round().astype(int)
pred_y
```
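Rounding turns each probability into a label with an implicit 0.5 cut-off, but NumPy (like Python's built-in `round`) rounds halves to the nearest even number, so a probability of exactly 0.5 becomes 0. If you want the cut-off to be explicit, or want to tune it for churn, a comparison against a threshold is clearer. A minimal sketch (the function name and default threshold are our own choices):

```python
def probabilities_to_labels(probabilities, threshold=0.5):
    """Map churn probabilities to 0/1 labels; p >= threshold counts as churn (1)."""
    return [1 if p >= threshold else 0 for p in probabilities]


print(probabilities_to_labels([0.1, 0.5, 0.72]))  # 0.5 maps to 1 here, unlike round(0.5)
```

In the notebook, the equivalent pandas expression would be `(batch_output.iloc[:, 0] >= 0.5).astype(int)`.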

## Evaluating Results

The following code compares the batch transform output with the true labels to measure the accuracy of our model.

```python
def get_score(y_true, y_pred):
    f1 = metrics.f1_score(y_true, y_pred)
    precision = metrics.precision_score(y_true, y_pred)
    recall = metrics.recall_score(y_true, y_pred)
    accuracy = metrics.accuracy_score(y_true, y_pred)
    tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()
    return precision, recall, f1, accuracy, tn, fp, fn, tp
```
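`get_score` delegates to scikit-learn. To show what those numbers mean, here is a dependency-free sketch that counts the confusion-matrix cells in the same `(tn, fp, fn, tp)` order as `confusion_matrix(...).ravel()` and derives the four scores from them. It assumes binary 0/1 labels and returns 0.0 where a ratio is undefined, which is close to scikit-learn's default `zero_division` handling (scikit-learn also emits a warning).

```python
def confusion_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary 0/1 labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def scores_from_counts(tn, fp, fn, tp):
    """Return (precision, recall, f1, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return precision, recall, f1, accuracy
```

For instance, 86 true negatives, 2 false positives, 4 false negatives and 8 true positives give precision 0.8, recall of about 0.667 and accuracy 0.94.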

```python
# get scores
temp_precision, temp_recall, temp_f1, temp_accuracy, tn, fp, fn, tp = get_score(test_true_y, pred_y)
output = [temp_precision, temp_recall, temp_f1, temp_accuracy, tp, fp, tn, fn]
output = pd.Series(output, index=['precision', 'recall', 'f1', 'accuracy', 'tp', 'fp', 'tn', 'fn'])
print(output[['accuracy', 'tp', 'fp', 'tn', 'fn']])

from sklearn.metrics import classification_report
print(classification_report(test_true_y, pred_y))
```