# Amazon SageMaker Workshop
### _**Evaluation**_

---
In this part of the workshop, we take the model we trained earlier to predict mobile customer departure (churn) and evaluate its performance on a test dataset.

---

## Contents

1. [Background](#Background) - Getting the model trained in the previous lab.
2. [Evaluate](#Evaluate)
 * Creating a script to evaluate model
 * Using [SageMaker Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) jobs to automate evaluation of models
3. [Exercise](#a_Exercise) - customizing metrics and evaluation reports
4. [Wrap-up - end of Evaluation Lab](#Wrap-up)


---

## Background

In the previous [Modeling](../2-Modeling/modeling.ipynb) lab, we trained models by creating multiple SageMaker training jobs.

Import the packages we'll need for this lab:

In [None]:
import boto3
import sagemaker

from sagemaker.experiments.run import Run, load_run
from sagemaker.s3 import S3Uploader, S3Downloader

In [None]:
sm_sess = sagemaker.session.Session()
role = sagemaker.get_execution_role()

Get the variables from initial setup:

In [None]:
%store -r bucket
%store -r prefix
%store -r region
%store -r docker_image_name
%store -r s3uri_train

In [None]:
bucket, prefix, region, docker_image_name, s3uri_train

---
### - if you _**skipped**_ the data preparation lab, follow these instructions:

 - **run this [notebook](./config/pre_setup.ipynb)**
 - load the model S3 URI:

In [None]:
# # Uncomment if you skipped the data preparation lab

#%store -r s3uri_model
#!cp config/model.tar.gz ./

#%store -r s3uri_test

---
### - if you _**have done**_ the previous labs

Download the model and test data from S3:

In [None]:
# # Uncomment if you have done the previous labs

# # Get name of training job and other variables
#%store -r training_job_name

#training_job_name

In [None]:
# # Uncomment if you have done the previous labs
#estimator = sagemaker.estimator.Estimator.attach(training_job_name)
#s3uri_model = estimator.model_data
#print("\ns3uri_model =",s3uri_model)

#S3Downloader.download(s3uri_model, ".")

In [None]:
#%store -r s3uri_test
#S3Downloader.download(s3uri_test, ".")

---

Now you should have the `model.tar.gz` file in the 3-Evaluation directory 

(click the refresh button)

![refresh_dir.png](./media/refresh_dir.png)
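
Before loading the model, it can help to peek inside the archive. The sketch below is only an illustration: it builds a tiny stand-in archive in a temporary directory (the payload is placeholder bytes, not a real model) and lists its members without extracting anything. The helper name `list_archive_members` is hypothetical and not part of the workshop code.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the member names stored in a .tar.gz archive without extracting it."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Build a tiny stand-in archive so the sketch runs without the real model file.
workdir = Path(tempfile.mkdtemp())
demo_archive = workdir / "model.tar.gz"
payload = b"placeholder bytes, not a real model"
with tarfile.open(demo_archive, "w:gz") as tar:
    info = tarfile.TarInfo(name="xgboost-model")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

members = list_archive_members(demo_archive)
print(members)
```

In the notebook you could call `list_archive_members("model.tar.gz")` on the downloaded file to confirm that it contains the `xgboost-model` file the next cells expect.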

# Evaluate model

Let's create a simple evaluation using scikit-learn metrics such as [Area Under the Curve (AUC)](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html), computed below with `roc_auc_score`, and [Accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html).
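
To build some intuition for the two metrics, here is a minimal pure-Python sketch on made-up toy data. It is not how scikit-learn implements them: accuracy is computed on probabilities thresholded at 0.5 (an assumption that approximates rounding), while AUC is computed as the fraction of positive/negative pairs in which the positive example gets the higher score. The function and variable names are illustrative only.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of rows whose thresholded probability matches the label."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    hits = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def roc_auc(y_true, y_prob):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, unrelated to the churn dataset.
toy_labels = [0, 0, 1, 1]
toy_probs = [0.1, 0.4, 0.35, 0.8]
toy_acc = accuracy(toy_labels, toy_probs)
toy_auc = roc_auc(toy_labels, toy_probs)
print("accuracy =", toy_acc, "auc =", toy_auc)
```

The pairwise loop is quadratic in the number of rows, so it is meant for illustration only. It also shows why AUC needs the raw probabilities: it depends on how examples are ranked, not on a single cut-off.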

In [None]:
import json
import os
import tarfile
import logging
import pickle

import pandas as pd
import xgboost

from sklearn.metrics import classification_report, roc_auc_score, accuracy_score


model_path = "model.tar.gz"
with tarfile.open(model_path) as tar:
    tar.extractall(path=".")

print("Loading xgboost model.")
# Use a context manager so the file handle is closed after loading.
with open("xgboost-model", "rb") as f:
    model = pickle.load(f)
model

In [None]:
print("Loading test input data")
test_path = "config/test.csv"
df = pd.read_csv(test_path, header=None)
df

In [None]:
print("Reading test data. We should get a `DMatrix` object...")
y_test = df.iloc[:, 0].to_numpy()
df.drop(df.columns[0], axis=1, inplace=True)
X_test = xgboost.DMatrix(df.values)
X_test

In [None]:
print("Performing predictions against test data.")
predictions_probs = model.predict(X_test)
predictions = predictions_probs.round()

print("Creating classification evaluation report")
acc = accuracy_score(y_test, predictions)
auc = roc_auc_score(y_test, predictions_probs)

print("Accuracy =", acc)
print("AUC =", auc)

### Creating a classification report

Now, let's save the results in a JSON file, following the structure defined in SageMaker docs:
https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html

We'll use this logic later in [Lab 6-Pipelines](../6-Pipelines/pipelines.ipynb):

In [None]:
import pprint

# The metrics reported can change based on the model used - check the link for the documentation
report_dict = {
    "binary_classification_metrics": {
        "accuracy": {
            "value": acc,
            "standard_deviation": "NaN",
        },
        "auc": {"value": auc, "standard_deviation": "NaN"},
    },
}

print("Classification report:")
pprint.pprint(report_dict)

In [None]:
evaluation_output_path = os.path.join(".", "evaluation.json")
print("Saving classification report to {}".format(evaluation_output_path))

with open(evaluation_output_path, "w") as f:
    f.write(json.dumps(report_dict))

---

## Ok, now we have working code. Let's put it in a Python Script

In [None]:
%%writefile evaluate.py
"""Evaluation script for measuring model accuracy."""

import json
import os
import tarfile
import logging
import pickle

import pandas as pd
import xgboost

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

# May need to import additional metrics depending on what you are measuring.
# See https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html
from sklearn.metrics import classification_report, roc_auc_score, accuracy_score

def get_dataset(dir_path, dataset_name) -> pd.DataFrame:
    """Read every CSV file in dir_path and concatenate them into one DataFrame."""
    files = [os.path.join(dir_path, file) for file in os.listdir(dir_path)]
    if len(files) == 0:
        # Report the directory that was searched (the file list is empty here).
        raise ValueError(('There are no files in {}.\n' +
                          'This usually indicates that the channel ({}) was incorrectly specified,\n' +
                          'the data specification in S3 was incorrectly specified or the role specified\n' +
                          'does not have permission to access the data.').format(dir_path, dataset_name))
    raw_data = [pd.read_csv(file, header=None) for file in files]
    df = pd.concat(raw_data)
    return df

if __name__ == "__main__":
    model_path = "/opt/ml/processing/model/model.tar.gz"
    # Extract into the working directory, which is where the model file is loaded from below.
    with tarfile.open(model_path) as tar:
        tar.extractall(path=".")

    logger.debug("Loading xgboost model.")
    with open("xgboost-model", "rb") as f:
        model = pickle.load(f)

    logger.info("Loading test input data")
    test_path = "/opt/ml/processing/test"
    df = get_dataset(test_path, "test_set")

    logger.debug("Reading test data.")
    # The first column holds the label; the remaining columns are the features.
    y_test = df.iloc[:, 0].to_numpy()
    df.drop(df.columns[0], axis=1, inplace=True)
    X_test = xgboost.DMatrix(df.values)

    logger.info("Performing predictions against test data.")
    predictions_probs = model.predict(X_test)
    predictions = predictions_probs.round()

    logger.info("Creating classification evaluation report")
    acc = accuracy_score(y_test, predictions)
    auc = roc_auc_score(y_test, predictions_probs)

    # The metrics reported can change based on the model used, but the metric names must follow
    # https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html
    report_dict = {
        "binary_classification_metrics": {
            "accuracy": {
                "value": acc,
                "standard_deviation": "NaN",
            },
            "auc": {"value": auc, "standard_deviation": "NaN"},
        },
    }

    logger.info("Classification report:\n{}".format(report_dict))

    evaluation_output_path = os.path.join(
        "/opt/ml/processing/evaluation", "evaluation.json"
    )
    logger.info("Saving classification report to {}".format(evaluation_output_path))

    with open(evaluation_output_path, "w") as f:
        f.write(json.dumps(report_dict))


---

## Ok, now let's finally run this script with a single call to SageMaker Processing!

In [None]:
from sagemaker.processing import (
    ProcessingInput,
    ProcessingOutput,
    ScriptProcessor,
)

In [None]:
# Processing step for evaluation
processor = ScriptProcessor(
    image_uri=docker_image_name,
    command=["python3"],
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="CustomerChurn/eval-script",
    sagemaker_session=sm_sess,
    role=role,
)

In [None]:
entrypoint = "evaluate.py"

In [None]:
from time import strftime, gmtime
# Helper to create timestamps
create_date = lambda: strftime("%Y-%m-%d-%H-%M-%S", gmtime())
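
As a side note on the timestamp helper, the sketch below shows the kind of job name it produces and checks it against a name pattern. The pattern is an assumption based on the `CreateProcessingJob` API reference (letters, digits, and hyphens) and should be verified there before relying on it. `make_job_name` and `is_valid_job_name` are hypothetical helpers, not part of the SageMaker SDK.

```python
import re
from time import gmtime, strftime

# Assumed pattern for processing job names (letters, digits, hyphens).
# Verify against the CreateProcessingJob API reference before relying on it.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_job_name(prefix, when=None):
    """Append a UTC timestamp (YYYY-MM-DD-HH-MM-SS) to a prefix."""
    stamp = strftime("%Y-%m-%d-%H-%M-%S", when if when is not None else gmtime())
    return f"{prefix}-{stamp}"


def is_valid_job_name(name):
    """Return True when the name matches the assumed pattern."""
    return JOB_NAME_PATTERN.match(name) is not None


fixed_time = gmtime(0)  # the Unix epoch, used so the example is reproducible
example_name = make_job_name("CustomerChurnEval", fixed_time)
print(example_name, is_valid_job_name(example_name))
print(is_valid_job_name("CustomerChurn/eval-script"))
```

Under this assumed pattern a name containing `/` would not pass, which is one reason to pass an explicit `job_name` when running the job.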

In [None]:
processor.run(
    code=entrypoint,
    inputs=[
        ProcessingInput(
            source=s3uri_model,
            destination="/opt/ml/processing/model",
        ),
        ProcessingInput(
            source=s3uri_test,
            destination="/opt/ml/processing/test",
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="evaluation", source="/opt/ml/processing/evaluation"
        ),
    ],
    job_name=f"CustomerChurnEval-{create_date()}",
)

If everything went well, the SageMaker Processing job should have created the JSON file with our model's evaluation report and saved it in S3.
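
Once the report exists, a downstream step can read a metric from it and decide whether the model is good enough to continue. The following is a minimal sketch under stated assumptions: the report string is a made-up stand-in with the same structure as `evaluation.json`, the thresholds are arbitrary, and `metric_value` and `passes_gate` are hypothetical helper names.

```python
import json

# Stand-in for the contents of evaluation.json; the values are made up for illustration.
sample_report = json.dumps(
    {
        "binary_classification_metrics": {
            "accuracy": {"value": 0.9, "standard_deviation": "NaN"},
            "auc": {"value": 0.95, "standard_deviation": "NaN"},
        }
    }
)


def metric_value(report_json, metric_name):
    """Pull one metric value out of the binary_classification_metrics section."""
    metrics = json.loads(report_json)["binary_classification_metrics"]
    return float(metrics[metric_name]["value"])


def passes_gate(report_json, metric_name, minimum):
    """Return True when the metric meets a (hypothetical) minimum threshold."""
    return metric_value(report_json, metric_name) >= minimum


print(passes_gate(sample_report, "auc", 0.90))
print(passes_gate(sample_report, "accuracy", 0.95))
```

With the sample values, the AUC check passes and the accuracy check does not. The same lookup path works on the real report once it has been read from S3.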

In addition, under the hood, SageMaker Processing has uploaded our `evaluate.py` script to S3. Let's check where the script was saved:

In [None]:
for proc_in in processor.latest_job.inputs:
    if proc_in.input_name == "code":
        s3_evaluation_code_uri = proc_in.source

s3_evaluation_code_uri

#### Let's store the S3 URI where our evaluation script was saved for later

In [None]:
%store s3_evaluation_code_uri

### Let's check the evaluation report from S3!

In [None]:
out_s3_report_uri = processor.latest_job.outputs[0].destination
out_s3_report_uri

In [None]:
reports_list = S3Downloader.list(out_s3_report_uri)
reports_list

In [None]:
report = S3Downloader.read_file(reports_list[0])

print("=====Model Report====")
print(json.dumps(json.loads(report.split('\n')[0]), indent=2))

# Wrap-up

Now that we have finished the **evaluation lab**, let's make everything here reusable. It may come in handy later (spoiler alert: when creating Pipelines)...

In [None]:
%%writefile ../6-Pipelines/my_labs_solutions/evaluation_solution.py
import sagemaker
from sagemaker.processing import (
    ProcessingInput,
    ProcessingOutput,
    ScriptProcessor,
)

def get_evaluation_processor(docker_image_name) -> ScriptProcessor:

    role = sagemaker.get_execution_role()
    sm_sess = sagemaker.session.Session()

    # Processing step for evaluation
    processor = ScriptProcessor(
        image_uri=docker_image_name,
        command=["python3"],
        instance_type="ml.m5.xlarge",
        instance_count=1,
        base_job_name="CustomerChurn/eval-script",
        sagemaker_session=sm_sess,
        role=role,
    )

    return processor

---
# SageMaker Clarify

Amazon SageMaker Clarify helps improve your machine learning models by detecting potential bias and helping explain how these models make predictions.

### First, let's create our model and register it on SageMaker

In [None]:
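model_name = "xgboost-churn-1"  # change to any name

# `estimator` is created in the `Estimator.attach(training_job_name)` cell above;
# uncomment and run that cell first if you have not done so.
model = estimator.create_model(name=model_name)
container_def = model.prepare_container_def()
sm_sess.create_model(model_name, role, container_def)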

In [None]:
from sagemaker import clarify

clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    sagemaker_session=sm_sess,
)

In [None]:
columns_headers = ['Churn', 'Account Length', 'VMail Message', 'Day Mins', 'Day Calls',
 'Eve Mins', 'Eve Calls', 'Night Mins', 'Night Calls', 'Intl Mins',
 'Intl Calls', 'CustServ Calls', 'State_AK', 'State_AL', 'State_AR',
 'State_AZ', 'State_CA', 'State_CO', 'State_CT', 'State_DC', 'State_DE',
 'State_FL', 'State_GA', 'State_HI', 'State_IA', 'State_ID', 'State_IL',
 'State_IN', 'State_KS', 'State_KY', 'State_LA', 'State_MA', 'State_MD',
 'State_ME', 'State_MI', 'State_MN', 'State_MO', 'State_MS', 'State_MT',
 'State_NC', 'State_ND', 'State_NE', 'State_NH', 'State_NJ', 'State_NM',
 'State_NV', 'State_NY', 'State_OH', 'State_OK', 'State_OR', 'State_PA',
 'State_RI', 'State_SC', 'State_SD', 'State_TN', 'State_TX', 'State_UT',
 'State_VA', 'State_VT', 'State_WA', 'State_WI', 'State_WV', 'State_WY',
 'Area Code_408', 'Area Code_415', 'Area Code_510', "Int'l Plan_no",
 "Int'l Plan_yes", 'VMail Plan_no', 'VMail Plan_yes']

### Writing ModelConfig

A *ModelConfig* object communicates information about your trained model. To avoid additional traffic to your production models, SageMaker Clarify sets up and tears down a dedicated endpoint when processing.

In [None]:
model_config = clarify.ModelConfig(
    model_name=model_name,
    instance_type='ml.m5.xlarge',
    instance_count=1,
    accept_type='text/csv',
    content_type='text/csv',
)

## Explaining Predictions

Growing business needs and regulations increasingly require explanations of why a model made a given decision. SageMaker Clarify uses SHAP to explain the contribution that each input feature makes to the final prediction.
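
To make the idea of a per-feature contribution concrete, here is a minimal sketch that computes exact Shapley values for a toy scoring function by averaging each feature's marginal effect over every feature ordering. This is not what Clarify runs: Clarify relies on a Kernel SHAP approximation because enumerating all orderings is infeasible for real feature counts. `toy_model`, the instance, and the baseline are all made up; the baseline plays the same role as the `baseline` passed to `SHAPConfig`.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance by averaging over all feature orderings."""
    n = len(instance)
    contributions = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous_output = predict(current)
        for feature in order:
            # Switch this feature from its baseline value to its actual value.
            current[feature] = instance[feature]
            output = predict(current)
            contributions[feature] += output - previous_output
            previous_output = output
    return [c / factorial(n) for c in contributions]


def toy_model(x):
    # Hypothetical scoring function, not the churn model:
    # two linear terms plus one interaction between features 0 and 2.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


toy_instance = [1.0, 3.0, 2.0]
toy_baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, toy_instance, toy_baseline)
print(phi)
```

The values add up to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features involved.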

In [None]:
df.columns = columns_headers[1:]
df

In [None]:
shap_config = clarify.SHAPConfig(
    baseline=[df.iloc[0].values.tolist()],
    num_samples=20,
    agg_method='mean_abs',
    save_local_shap_values=False,
)

explainability_output_path = 's3://{}/{}/clarify-explainability'.format(bucket, prefix)

explainability_data_config = clarify.DataConfig(
    s3_data_input_path=s3uri_train,
    s3_output_path=explainability_output_path,
    label='Churn',
    headers=columns_headers,
    dataset_type='text/csv',
)

In [None]:
create_date = lambda: strftime("%Y-%m-%d-%H-%M-%S", gmtime())
experiment_name=f"customer-churn-explainability-{create_date()}"

In [None]:
with Run(
    experiment_name=experiment_name,
    run_name="explainabilit-run",  # create an experiment run that contains only the model explainability job
    sagemaker_session=sm_sess,
) as run:
    clarify_processor.run_explainability(
        data_config=explainability_data_config,
        model_config=model_config,
        explainability_config=shap_config,
    )

In [None]:
explainability_output_path

## Viewing Clarify reports on SageMaker Studio

There's an easy way to check the results inside SageMaker Studio instead of parsing raw files on S3.
After your Clarify job completes:

1. Go to the Experiments menu
2. Select the run you've just created:

![explainability_run.png](./media/10-explainability.png)

3. Click on "Explainability" on the left:

![explainability_details.png](./media/20-exp-det.png)

---
# [You can now go to Lab 4-Deployment](../4-Deployment/RealTime/deployment_hosting.ipynb)