# Using SageMaker Clarify and A2I to create transparent and reliable ML solutions

1. [Overview](#Overview)
2. [Prerequisites and Data](#Prerequisites-and-Data)
    1. [Initialize SageMaker](#Initialize-SageMaker)
    1. [Download data](#Download-data)
    1. [Loading the data: Adult Dataset](#Loading-the-data:-Adult-Dataset) 
    1. [Data inspection](#Data-inspection) 
    1. [Data encoding and upload to S3](#Encode-and-Upload-Training-Data) 
3. [Train and Deploy XGBoost Model](#Train-XGBoost-Model)
    1. [Train Model](#Train-Model)
    1. [Deploy Model to Endpoint](#Deploy-Model)
4. [Amazon SageMaker Clarify](#Amazon-SageMaker-Clarify)
    1. [Explaining Predictions](#Explaining-Predictions)
        1. [Viewing the Explainability Report](#Viewing-the-Explainability-Report)
5. [Create Control Plane Resources for A2I](#Create-Control-Plane-Resources)
    1. [Create Human Task UI](#Create-Human-Task-UI)
    2. [Create Flow Definition](#Create-Flow-Definition)
6. [Starting Human Loops](#Scenario-1-:-When-Activation-Conditions-are-met-,-and-HumanLoop-is-created)
    1. [Wait For Workers to Complete Task](#Wait-For-Workers-to-Complete-Task)
    2. [Check Status of Human Loop](#Check-Status-of-Human-Loop)
    3. [View Task Results](#View-Task-Results)
7. [Preparing new groundtruth data based on the reviewed results](#Merge-the-A2I-prediction-results-with-the-test-data-to-generate-GroundTruth)
8. [Clean Up](#Clean-Up)


## Overview

Customers looking to implement machine learning solutions in their line of business face two major challenges:
1. Machine learning models are becoming more complex and opaque, which makes their predictions harder to explain.
2. Machine learning decisions lack human understanding and collaboration.

These challenges prevent many customers in the financial and healthcare industries from adopting machine learning in their business-critical functions. Amazon SageMaker Clarify and Amazon Augmented AI (A2I) address these challenges from different perspectives.

Amazon SageMaker Clarify helps improve your machine learning models by detecting potential bias and helping explain how these models make predictions. The fairness and explainability functionality provided by SageMaker Clarify takes a step towards enabling AWS customers to build trustworthy and understandable machine learning models.

At the same time, Amazon A2I provides a way to introduce a human review step into the machine learning inference pipeline, which improves trust in the machine learning process and its reliability.

Based on this understanding, in this notebook we will look at an example of how to use SageMaker Clarify and Amazon A2I together in a single machine learning pipeline to improve transparency and introduce reliability in the inference workflows.

We will use the Adult population dataset, located at https://archive.ics.uci.edu/ml/machine-learning-databases/adult/, to predict whether a person's salary is greater than or less than $50,000.


Below are the steps we will perform as part of this notebook:  
1. Train and deploy an XGBoost model on the Adult population dataset to predict whether a person's salary is greater than $50,000.

1. Run batch inference with the trained model and run an explainability analysis on the same batch of records.

1. Filter the negative predictions, because we want to know why the model predicted a person's salary to be less than $50,000 and which features had the most impact on that prediction.

1. Plot the SHAP values computed by SageMaker Clarify for those negative outcomes to see which features contributed most to them.

1. Start an A2I human review workflow that gives the reviewer the prediction score and the SHAP plot, so that they can analyze the outcome and verify the model's feature attributions.

1. Use the reviewed data as ground truth for re-training.

## Prerequisites and Data



### Setup Amazon SageMaker Studio Notebook

1. Onboard to Amazon SageMaker Studio using the quick start (https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html). Please attach the [AmazonAugmentedAIFullAccess](https://console.aws.amazon.com/iam/home#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAmazonAugmentedAIFullAccess) permissions policy to the IAM role you create during Studio onboarding to run this notebook.
1. When the user is created and active, click Open Studio.
1. On the Studio landing page, choose File --> New --> Terminal.
1. In the terminal, enter the following command:
    * `git clone https://github.com/aws-samples/amazon-sagemaker-clarify-a2i-demo`
1. Open the notebook by choosing “sagemaker-clarify-a2i.ipynb” in the amazon-sagemaker-clarify-a2i-demo folder in the left pane of the Studio landing page.

### Install open source SHAP library

First, we need to install the [open source SHAP library](https://shap.readthedocs.io/en/latest/index.html), which we will use later in this notebook to plot the SHAP values computed by SageMaker Clarify.

There are two ways of installing the SHAP library:
1. If you are using SageMaker Notebook instances, then run `pip install shap` 
2. If you are using SageMaker Studio Notebooks, then run `conda install -c conda-forge shap`

#### If using SageMaker Studio notebook, execute the below cell, or else skip to the next cell.

In [None]:
conda install -c conda-forge shap

#### If using SageMaker Notebook Instances, execute the below cell.

In [None]:
pip install shap

##### NOTE
__You need to restart the kernel after installing the library for the changes to take effect.__

### Initialize SageMaker

In [None]:
from sagemaker import Session
from sagemaker import get_execution_role
import pandas as pd
import numpy as np
import urllib
import os

# Define IAM role
role = get_execution_role()

session = Session()
bucket = session.default_bucket()
prefix = 'sagemaker/clarify-a2i-demo'
region = session.boto_region_name

### Download data
Data Source: [https://archive.ics.uci.edu/ml/machine-learning-databases/adult/](https://archive.ics.uci.edu/ml/machine-learning-databases/adult/)

Let's __download__ the data from the UCI repository$^{[2]}$ and save it in the local folder as adult.data and adult.test.

$^{[2]}$Dua Dheeru, and Efi Karra Taniskidou. "[UCI Machine Learning Repository](http://archive.ics.uci.edu/ml)". Irvine, CA: University of California, School of Information and Computer Science (2017).

In [None]:
adult_columns = ["Age", "Workclass", "fnlwgt", "Education", "Education-Num", "Marital Status",
                 "Occupation", "Relationship", "Ethnic group", "Sex", "Capital Gain", "Capital Loss",
                 "Hours per week", "Country", "Target"]
if not os.path.isfile('adult.data'):
    urllib.request.urlretrieve('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',
                              'adult.data')
    print('adult.data saved!')
else:
    print('adult.data already on disk.')

if not os.path.isfile('adult.test'):
    urllib.request.urlretrieve('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test',
                              'adult.test')
    print('adult.test saved!')
else:
    print('adult.test already on disk.')

### Loading the data: Adult Dataset
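From the UCI repository of machine learning datasets, this database contains 14 features concerning demographic characteristics. The raw files hold 48,842 rows (32,561 for training and 16,281 for testing); 45,222 rows (30,162 for training and 15,060 for testing) remain once rows with missing values are dropped, as the loading cell below does. The task is to predict whether a person has a yearly income that is more or less than $50,000.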

Here are the features and their possible values:
1. **Age**: continuous.
1. **Workclass**: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
1. **Fnlwgt**: continuous (the number of people the census takers believe that observation represents).
1. **Education**: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
1. **Education-num**: continuous.
1. **Marital-status**: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
1. **Occupation**: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
1. **Relationship**: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
1. **Ethnic group**: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
1. **Sex**: Female, Male.
    * **Note**: this data is extracted from the 1994 Census and enforces a binary option on Sex
1. **Capital-gain**: continuous.
1. **Capital-loss**: continuous.
1. **Hours-per-week**: continuous.
1. **Native-country**: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.

Next we specify our binary prediction task:  
15. **Target**: <=50K, >50K (a yearly income of at most, or more than, $50,000).
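The raw files separate fields with a comma plus optional whitespace and mark missing values with `?`, which is why the loading cell uses `sep=r'\s*,\s*'` and `na_values="?"` followed by `dropna()`. The sketch below mimics that behavior with the standard library only. The two rows are synthetic, and `split_fields` is a hypothetical helper, not part of the notebook.

```python
import re

# Synthetic rows laid out like adult.data (the values are illustrative only).
raw_lines = [
    "41, Private, 120000, Bachelors, 13, Never-married, Sales, "
    "Not-in-family, White, Female, 0, 0, 40, United-States, <=50K\n",
    "52, ?, 98000, HS-grad, 9, Married-civ-spouse, ?, "
    "Husband, Black, Male, 0, 0, 45, United-States, >50K\n",
]

# Same pattern as the `sep` argument used when loading the data.
FIELD_SEPARATOR = re.compile(r"\s*,\s*")


def split_fields(line):
    """Split one raw line on commas, discarding whitespace around each comma."""
    return FIELD_SEPARATOR.split(line.strip())


rows = [split_fields(line) for line in raw_lines]

# Mirror na_values="?" followed by dropna(): discard rows with any missing field.
complete_rows = [row for row in rows if "?" not in row]

print(len(rows[0]), "fields per row;", len(complete_rows), "complete row(s)")
```

Each row splits into 15 fields, matching the 14 features plus the target, and the row containing `?` is discarded.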

In [None]:
training_data = pd.read_csv("adult.data",
                             names=adult_columns,
                             sep=r'\s*,\s*',
                             engine='python',
                             na_values="?").dropna()

testing_data = pd.read_csv("adult.test",
                            names=adult_columns,
                            sep=r'\s*,\s*',
                            engine='python',
                            na_values="?",
                            skiprows=1).dropna()

training_data.head()

### Data inspection
Plotting histograms for the distribution of the different features is a good way to visualize the data. Let's plot a few of the features that can be considered _sensitive_.  
Let's take a look specifically at the Sex feature of a census respondent. In the first plot we see that there are fewer Female respondents as a whole but especially in the positive outcomes, where they form ~$\frac{1}{7}$th of respondents.

In [None]:
training_data['Sex'].value_counts().sort_values().plot(kind='bar', title='Counts of Sex', rot=0)

In [None]:
training_data['Sex'].where(training_data['Target']=='>50K').value_counts().sort_values().plot(kind='bar', title='Counts of Sex earning >$50K', rot=0)

### Encode and Upload Training Data
Here we encode the training and test data. Encoding input data is not necessary for SageMaker Clarify, but is necessary for XGBoost models.
The below cell does the following:
- Prepare the training data for SageMaker training
- Prepare the test data
- Define the batch size, which we will use to create batch predictions
- Prepare the explainability config data to be used for running the explainability analysis using SageMaker Clarify
- Perform label encoding

To make this notebook run faster, we will be sending a batch of 100 records from the test dataset for prediction and using the same batch for generating explanations powered by SageMaker Clarify. 

Depending on your use case, you may increase the batch size or send the whole CSV for prediction. In a production-grade setup you generally do not need to create batches yourself, because batch transform can split a large CSV into multiple smaller ones. We use a small batch of records here only to keep the demonstration quick.


In [None]:
from sklearn import preprocessing
def number_encode_features(df):
    result = df.copy()
    encoders = {}
    for column in result.columns:
        if result.dtypes[column] == object:  # the np.object alias was removed in NumPy 1.24
            encoders[column] = preprocessing.LabelEncoder()
            result[column] = encoders[column].fit_transform(result[column].fillna('None'))
    return result, encoders

#preparing the training data with no headers and target columns being the first
training_data = pd.concat([training_data['Target'], training_data.drop(['Target'], axis=1)], axis=1)
training_data, _ = number_encode_features(training_data)
training_data.to_csv('train_data.csv', index=False, header=False)

#preparing the baseline dataset to be used by SageMaker Clarify for explainability analysis
baseline_data = training_data.drop(['Target'], axis = 1)
baseline_data.to_csv('baseline_data.csv', index=False, header=False)


# now preparing the testing data
testing_data, _ = number_encode_features(testing_data)

# defining the batch of records to be used for doing batch predictions and calculating SHAP values.
# You can change this number based on your use-case
batch_size=100

# preparing the explainability dataset CSV: the batch of records from testing_data, with the target column first
explanability_data_config = pd.concat([testing_data['Target'], testing_data.drop(['Target'], axis=1)], axis=1)
explanability_data_config = explanability_data_config[:batch_size]
explanability_data_config.to_csv('explanability_data_config.csv', index=False, header=False)


# setting up the entire test dataset to csv
test_features = testing_data.drop(['Target'], axis = 1)
test_features.to_csv('test_features.csv', index=False, header=False)


# prepare the batch of records for performing inference
test_features_mini_batch = test_features[:batch_size]
test_features_mini_batch.to_csv('test_features_mini_batch.csv', index=False, header=False)

A quick note about our encoding: the "Female" Sex value has been encoded as 0 and "Male" as 1.
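The reason "Female" maps to 0 and "Male" to 1 is that `LabelEncoder` assigns integer codes to the distinct values in sorted order. The sketch below reproduces that rule with plain Python. `label_encode` is a hypothetical stand-in for illustration, not the scikit-learn implementation.

```python
def label_encode(values):
    """Assign integer codes to distinct values in sorted order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[value] for value in values], mapping


codes, mapping = label_encode(["Male", "Female", "Female", "Male"])
print(mapping)
print(codes)
```

The same rule applies to every categorical column, so the codes depend on which values are present in the data being encoded.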

Lastly, let's upload the train, test, and explainability config data to S3.

In [None]:
from sagemaker.s3 import S3Uploader
from sagemaker.inputs import TrainingInput


train_uri = S3Uploader.upload('train_data.csv', 's3://{}/{}'.format(bucket, prefix))
train_input = TrainingInput(train_uri, content_type='csv')

test_mini_batch_uri = S3Uploader.upload('test_features_mini_batch.csv', 's3://{}/{}'.format(bucket, prefix))

explanability_data_config_uri = S3Uploader.upload('explanability_data_config.csv', 's3://{}/{}'.format(bucket, prefix))

## Train XGBoost Model
### Train Model
Since our focus is on understanding how to use SageMaker Clarify, we keep it simple by using a standard XGBoost model.

In [None]:
from sagemaker.image_uris import retrieve
from sagemaker.estimator import Estimator

container = retrieve('xgboost', region, version='1.2-1')
xgb = Estimator(container,
                role,
                instance_count=1,
                instance_type='ml.m4.xlarge',
                disable_profiler=True,
                sagemaker_session=session)

xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        objective='binary:logistic',
                        num_round=800)

xgb.fit({'train': train_input}, logs='None', wait=True)  # wait expects a boolean, not the string 'True'

### Deploy Model

Now, let us deploy the model. In this use case and others where model explainability is required, it is generally a backend team that runs nightly jobs to get the predictions and their explanations and sends them to the workforce for review. For such cases, a SageMaker Batch Transform job is more practical than a real-time endpoint, so we will set up a Batch Transform job for a small set of records from the test dataset to replicate this scenario.

For setting up the batch transform, we need to specify the following:

- `instance_count`: Number of EC2 instances to use.
- `instance_type`: Type of EC2 instance to use, for example, `ml.c5.xlarge`.
- `strategy`: The strategy used to decide how to batch records in a single request (default: None). Valid values: `MultiRecord` and `SingleRecord`.
- `assemble_with`: How the output is assembled (default: None). Valid values: `Line` or `None`.
- `output_path`: S3 location for saving the transform result. If not specified, results are stored in a default bucket. Output files are named after the input files with `.out` appended, so running the batch transform again overwrites the existing output unless you provide a different path each time.

You can also set up a CloudWatch event to trigger a batch prediction at a particular time of the day, week, or month.
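The output naming rule can be sketched as follows. The helper `expected_transform_output` and the bucket name are hypothetical and only show where to look for the results. This is not a SageMaker API.

```python
import posixpath


def expected_transform_output(input_uri, output_path):
    """Return the S3 URI where batch transform is expected to write its result.

    The output file keeps the input file name and gains a '.out' suffix.
    """
    file_name = posixpath.basename(input_uri)
    return output_path.rstrip("/") + "/" + file_name + ".out"


# Hypothetical bucket name; the prefix mirrors the one used in this notebook.
input_uri = "s3://my-bucket/sagemaker/clarify-a2i-demo/test_features_mini_batch.csv"
output_path = "s3://my-bucket/sagemaker/clarify-a2i-demo/predictions"
print(expected_transform_output(input_uri, output_path))
```

Because the name is derived only from the input file name and the output path, re-running the job with the same two values writes to the same object.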

In [None]:
transformer_s3_output_path ='s3://{}/{}/predictions'.format(bucket, prefix)

xgb_transformer = xgb.transformer(instance_count=1,
                                  instance_type='ml.c5.xlarge',
                                  strategy='MultiRecord',
                                  assemble_with='Line',
                                  output_path=transformer_s3_output_path)

### Run the Batch Predictions

Now it's time to run the batch predictions. To find out when the batch transform job has completed, choose one of the following options:
- Set up a CloudWatch event to send an SNS notification when the job is completed. (Recommended for any customer-facing project in production.)
- Call the `wait()` method on the transformer so that the notebook execution waits for the transform job to complete.

For demonstration purposes, we are using the second option.

In [None]:
xgb_transformer.transform(test_mini_batch_uri, content_type='text/csv', split_type='Line')
xgb_transformer.wait()

##### NOTE

**The output of the model is a prediction score between 0 and 1, which denotes the probability that the person's salary is greater than $50,000.**

**For example:** a prediction score of 0.3 means that the model sees a 30% probability that the person's salary is greater than \$50,000, which is quite low. Similarly, a prediction score of 0.9 means that the model sees a 90% probability that the person's salary is greater than \$50,000.
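As a small illustration of how such a score maps to a class label, the sketch below applies a 0.5 cut-off, the same threshold this notebook uses later when it builds the `Prediction` column. The helper name `to_label` and the sample scores are made up for the example.

```python
def to_label(score, threshold=0.5):
    """Map a probability score to 1 (salary > $50,000) or 0 (otherwise)."""
    return int(score > threshold)


sample_scores = [0.3, 0.9, 0.5]
sample_labels = [to_label(score) for score in sample_scores]
print(list(zip(sample_scores, sample_labels)))
```

Note that a score of exactly 0.5 is labeled 0, because the comparison is strict.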

## Amazon SageMaker Clarify
Now that the predictions have been made, let's set up a processor definition for SageMaker Clarify. To run the explainability analysis on the model, SageMaker Clarify uses SageMaker Processing jobs under the hood.

The first step is to set up a `SageMakerClarifyProcessor`.

In [None]:
from sagemaker import clarify
clarify_processor = clarify.SageMakerClarifyProcessor(role=role,
                                                      instance_count=1,
                                                      instance_type='ml.c5.xlarge',
                                                      sagemaker_session=session)

#### Writing ModelConfig

Now, set up the `ModelConfig` object. This object communicates information about your trained model.

**Note**: To avoid additional traffic to your production models, SageMaker Clarify sets up and tears down a temporary endpoint when processing. `ModelConfig` specifies the instance type and instance count that this temporary endpoint uses to run your model during Clarify's processing.

In [None]:
from sagemaker import clarify


model_config = clarify.ModelConfig(model_name=xgb_transformer.model_name,
                                   instance_type='ml.c5.xlarge',
                                   instance_count=1,
                                   accept_type='text/csv')

### Explaining Predictions
Growing business needs and legislative regulations require explanations of _why_ a model made the decision it did. SageMaker Clarify uses the [KernelSHAP](https://arxiv.org/abs/1705.07874) algorithm to explain the contribution that each input feature makes to the final decision.
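To make the idea of per-feature contributions concrete: SHAP values are additive, so the base (expected) value plus the sum of a record's SHAP values reproduces the model output for that record, up to the approximation error of KernelSHAP. With `use_logit=True`, as configured in this notebook, these quantities are in log-odds units. The sketch below uses made-up numbers for the base value and the SHAP values and is not output from Clarify.

```python
import math


def sigmoid(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def logit(probability):
    """Convert a probability to log-odds."""
    return math.log(probability / (1.0 - probability))


# Hypothetical numbers for one record, in log-odds units.
base_value = logit(0.25)
shap_values = {"Age": 0.4, "Education-Num": 0.9, "Capital Gain": -0.2}

# Additivity: model output (log-odds) = base value + sum of feature contributions.
log_odds = base_value + sum(shap_values.values())
probability = sigmoid(log_odds)

print("log-odds:", round(log_odds, 4), "probability:", round(probability, 4))
```

Positive SHAP values push the prediction toward the positive class and negative values push it away, which is what the reviewers later see in the SHAP plot.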

To do this, you need to provide the SHAP-related configuration, an S3 output path where the explainability results will be stored, and the data configuration for the explainability analysis. Note that we supply the same batch of records that we used for predictions, this time with the `Target` column included (`explanability_data_config_uri`). The below cell does the following:
- Calculates the baseline to be used in `shap_config` as the column-wise mean of the complete training dataset. The `baseline_data.csv` file is the training dataset without the target column.
- Creates the `SHAPConfig` with that baseline.
- Sets up the `DataConfig`, which specifies where the input data is located, where to store the results, and which column holds the label.
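As a rough illustration of a mean baseline (the code cell uses the mean of every training column as a single baseline row), the sketch below computes it with the standard library. The three feature rows are invented, and the nested-list shape mirrors the `[list(...)]` value the cell passes as `baseline`.

```python
from statistics import fmean

# Invented, already-encoded feature rows (target column excluded).
feature_rows = [
    [39.0, 1.0, 13.0],
    [50.0, 0.0, 9.0],
    [28.0, 1.0, 14.0],
]

# zip(*rows) iterates column by column; the result is one baseline row.
baseline = [[fmean(column) for column in zip(*feature_rows)]]
print(baseline)
```

A single averaged row keeps the analysis fast, at the cost of a baseline that may not correspond to any real record, especially for label-encoded categorical columns.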

__NOTE__: The value for `num_samples` is given for demonstration purpose only. To increase the fidelity of SHAP values, use a larger value for `num_samples`

In [None]:
# Use the column-wise mean of the training dataset as the SHAP baseline
shap_baseline_df = pd.read_csv("baseline_data.csv", header=None)
shap_baseline = [list(shap_baseline_df.mean())]

# create the SHAPConfig
shap_config = clarify.SHAPConfig(baseline=shap_baseline,
                                 num_samples=15,
                                 agg_method='mean_abs',
                                 use_logit=True)

explainability_output_path = 's3://{}/{}/explainability'.format(bucket, prefix)

# create the DataConfig
explainability_data_config = clarify.DataConfig(s3_data_input_path=explanability_data_config_uri,
                                s3_output_path=explainability_output_path,
                                label='Target',
                                headers=training_data.columns.to_list(),
                                dataset_type='text/csv')

### Run the explainability analysis

Now we are all set. Let us trigger the explainability analysis job. Once the job is finished, the result will be uploaded to the s3 output path set in the previous cell.

In [None]:
clarify_processor.run_explainability(data_config=explainability_data_config,
                                     model_config=model_config,
                                     explainability_config=shap_config)

### Download the explainability results and batch predictions

Now, download the explainability results and the batch prediction data to start preparing them for A2I. The below cell does the following:
- Downloads the CSV containing the SHAP values for the individual rows passed as part of `data_config` to the `run_explainability` method
- Downloads the `analysis.json` from the explainability results, containing the global SHAP values and the expected `base value`
- Downloads the batch transform prediction results
- Creates a single pandas dataframe containing the predictions and their corresponding SHAP values
- Creates a new column named `Prediction` in the same dataframe, with the value `1` for prediction scores greater than `0.5` and `0` otherwise, where `1` denotes a salary greater than $50,000 and `0` a salary below that
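The sketch below shows one way to read the base value and rank features from a report shaped like `analysis.json`. The path `explanations` > `kernel_shap` > `label0` > `expected_value` is the one the download cell reads. The `global_shap_values` key and all numbers are assumptions made for illustration, so check them against your own report.

```python
import json

# Hypothetical excerpt of an analysis.json report (numbers are invented).
analysis_text = """
{
  "explanations": {
    "kernel_shap": {
      "label0": {
        "global_shap_values": {"Age": 0.21, "Capital Gain": 0.47, "Sex": 0.05},
        "expected_value": -1.32
      }
    }
  }
}
"""

analysis = json.loads(analysis_text)
label_report = analysis["explanations"]["kernel_shap"]["label0"]

# Base value that the per-record SHAP values are measured against.
base_value = label_report["expected_value"]

# Rank features from most to least influential overall.
ranked_features = sorted(
    label_report["global_shap_values"].items(),
    key=lambda item: item[1],
    reverse=True,
)

print("base value:", base_value)
for name, value in ranked_features:
    print(name, value)
```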


In [None]:
from sagemaker.s3 import S3Downloader
import json

# read the shap values
S3Downloader.download(s3_uri=explainability_output_path+"/explanations_shap", local_path="output")
shap_values_df = pd.read_csv("output/out.csv")

# read the inference results
S3Downloader.download(s3_uri=transformer_s3_output_path, local_path="output")
predictions_df = pd.read_csv("output/test_features_mini_batch.csv.out", header=None)
predictions_df = predictions_df.round(5)

# get the base expected value to be used to plot SHAP values
S3Downloader.download(s3_uri=explainability_output_path+"/analysis.json", local_path="output")

with open('output/analysis.json') as json_file:
    data = json.load(json_file)
    base_value = data['explanations']['kernel_shap']['label0']['expected_value']

print("base value: ", base_value)

predictions_df.columns = ['Probability_Score']

# join the probability score and shap values together in a single data frame
prediction_shap_df = pd.concat([predictions_df,shap_values_df],axis=1)

#create a new column as 'Prediction' converting the prediction to either 1 or 0
prediction_shap_df.insert(0,'Prediction', (prediction_shap_df['Probability_Score'] > 0.5).astype(int))

# add an index column for the batch, to be used when merging the A2I results with the ground truth
prediction_shap_df['row_num'] = test_features_mini_batch.index

## Set up a human review loop using Amazon A2I

Amazon Augmented AI (Amazon A2I) makes it easy to build the workflows required for human review of ML predictions. Amazon A2I brings human review to all developers, removing the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers.

To incorporate Amazon A2I into your human review workflows, you need:

- A worker task template to create a worker UI. The worker UI displays your input data, such as documents or images, and instructions to workers. It also provides interactive tools that the worker uses to complete your tasks. For more information, see [A2I instructions overview](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-worker-template-console.html).
- A human review workflow, also referred to as a flow definition. You use the flow definition to configure your human workforce and provide information about how to accomplish the human review task. To learn more, see [create flow definition](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-flow-definition.html).

When using a custom task type, you start a human loop using the Amazon Augmented AI Runtime API. When you call `StartHumanLoop` in your custom application, a task is sent to human reviewers.
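A minimal sketch of assembling the arguments for such a call is shown below. It makes no AWS request. The `Pairs` list and its field names (`row`, `prediction`, `probability_score`, `shap_image_s3_uri`) follow the worker template used in this notebook. The helper `build_human_loop_request`, the flow definition ARN, and the S3 URI are hypothetical.

```python
import json
import uuid


def build_human_loop_request(flow_definition_arn, rows):
    """Assemble keyword arguments for the A2I runtime StartHumanLoop call."""
    input_content = {"Pairs": rows}
    return {
        # Loop names must be unique; a random hex suffix is one simple way.
        "HumanLoopName": "clarify-a2i-" + uuid.uuid4().hex,
        "FlowDefinitionArn": flow_definition_arn,
        # InputContent is a JSON string that the worker template renders.
        "HumanLoopInput": {"InputContent": json.dumps(input_content)},
    }


# Hypothetical values for illustration only.
review_rows = [
    {
        "row": 7,
        "prediction": 0,
        "probability_score": 0.12345,
        "shap_image_s3_uri": "s3://my-bucket/shap-plots/row-7.png",
    }
]
request = build_human_loop_request(
    "arn:aws:sagemaker:us-east-1:123456789012:flow-definition/example-flow",
    review_rows,
)
print(request["HumanLoopName"])
```

The resulting dictionary could then be passed to the runtime client as `a2i.start_human_loop(**request)`.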

In this section, you set up a human review loop in Amazon A2I. It includes the following steps:

* Create or choose your workforce
* Create a human task UI
* Create the flow definition
* Trigger conditions for human loop activation
* Check the human loop status and wait for reviewers to complete the task

Let's now initialize some variables that we need in the subsequent steps

In [None]:
import io
import uuid
import time
import boto3

timestamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
# Amazon SageMaker client
sagemaker_client = boto3.client('sagemaker')

# Amazon Augmented AI (A2I) client
a2i = boto3.client('sagemaker-a2i-runtime')

# Amazon S3 client 
s3 = boto3.client('s3')

# Flow definition name - this value must be unique per account and region. You can also provide your own value here.
flow_definition_name = 'flow-def-clarify-a2i-' + timestamp

# Task UI name - this value must be unique per account and region. You can also provide your own value here.
task_UI_name = 'task-ui-clarify-a2i-' + timestamp

# Flow definition outputs
flow_definition_output_path = f's3://{bucket}/{prefix}/clarify-a2i-results'

### Create your workforce

This step requires you to use the AWS Console. You will create a private workteam and add only one user (you) to it. To create a private team:

1. Go to AWS Console > Amazon SageMaker > Labeling workforces
1. Click "Private" and then "Create private team".
1. Enter the desired name for your private workteam.
1. Enter your own email address in the "Email addresses" section.
1. Enter the name of your organization and a contact email to administer the private workteam.
1. Click "Create Private Team".
1. The AWS Console should now return to AWS Console > Amazon SageMaker > Labeling workforces. Your newly created team should be visible under "Private teams". Next to it you will see an ARN, a long string of the form `arn:aws:sagemaker:<region>:<account-id>:workteam/private-crowd/<team-name>`. **Please enter this ARN in the cell below**
1. You should get an email from no-reply@verificationemail.com that contains your workforce username and password.
1. In AWS Console > Amazon SageMaker > Labeling workforces, click on the URL in Labeling portal sign-in URL. Use the email/password combination from Step 8 to log in (you will be asked to create a new, non-default password).
1. This is your private worker's interface. When you start a human loop later in this notebook, your task should appear in this window. You can invite your colleagues to participate in the labeling job by clicking the "Invite new workers" button.
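Before moving on, it can help to check that the placeholder in the next cell was really replaced with an ARN. The sketch below is a loose sanity check based on the general ARN layout (partition, service, region, 12-digit account ID, resource). The helper name and the sample ARN are hypothetical, and the pattern is an assumption rather than an official validation rule.

```python
import re

# Loose pattern for a private workteam ARN.
WORKTEAM_ARN_PATTERN = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:[0-9]{12}:workteam/private-crowd/.+$"
)


def looks_like_private_workteam_arn(value):
    """Return True if the value has the shape of a private workteam ARN."""
    return bool(WORKTEAM_ARN_PATTERN.match(value))


print(looks_like_private_workteam_arn(
    "arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/my-team"
))
print(looks_like_private_workteam_arn(
    "<enter the ARN of your private labeling workforce>"
))
```

A check like this only catches an unreplaced placeholder or an obvious typo. It does not confirm that the workteam exists.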

In [None]:
workteam_arn = "<enter the ARN of your private labeling workforce>"

### Create the human task UI

Create a human task UI resource from a UI template written in Liquid HTML. This template is rendered to the human workers whenever a human loop is required. For over 70 pre-built UIs, see: https://github.com/aws-samples/amazon-a2i-sample-task-uis

In [None]:
template = r"""
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<style>
  table, tr, th, td {
    border: 1px solid black;
    border-collapse: collapse;
    padding: 5px;
  }
</style>

<crowd-form>
    <div>
        <h1>Instructions</h1>
        <p>Please review the predictions in the Predictions table based on the input data table below, and make corrections where appropriate. </p>
        <p> Here are the labels: </p>
        <p> 0: Salary is less than $50K </p>
        <p> 1: Salary is greater than $50K </p>
         <p> NOTE: There is also a column showing the probability score, 
         which tells you how confident the model is that the person's salary would be greater than $50,000. 
         Currently every row with probability score greater than 0.5 shows the prediction as 1 
         and for rows with probability less than 0.5, the prediction is marked as 0</p>
       <p>Your task is to look at the prediction, probability score and the SHAP plot to understand which features contributed most to the model's prediction 
       and the probability of the model suggesting a positive outcome</p>  
    </div>
    <div>
      <h3> Adult Population dataset </h3>
     
   </div>
    <br>
    <h1> Predictions Table </h1>
    <table>
      <tr>
        <th>ROW NUMBER</th>
        <th>MODEL PREDICTION</th>
        <th>PROBABILITY SCORE</th>
        <th>SHAP VALUES</th>
        <th>AGREE/DISAGREE WITH ML RATING?</th>
        <th>YOUR PREDICTION</th>
        <th>CHANGE REASON </th>
      </tr>

      {% for pair in task.input.Pairs %}

        <tr>
        
          <td>{{ pair.row }}</td>
          
          <td><crowd-text-area name="predicted{{ forloop.index }}" value="{{ pair.prediction }}"></crowd-text-area></td>
          
          <td><crowd-text-area name="confidence{{ forloop.index }}" value="{{ pair.probability_score }}"></crowd-text-area></td>
        
          <td><img src="{{ pair.shap_image_s3_uri | grant_read_access }}" alt="shap value plot" style="width:auto; height:auto;"></td>
          
          <td>
            <p>
              <input type="radio" id="agree{{ forloop.index }}" name="rating{{ forloop.index }}" value="agree" required>
              <label for="agree{{ forloop.index }}">Agree</label>
            </p>
            <p>
              <input type="radio" id="disagree{{ forloop.index }}" name="rating{{ forloop.index }}" value="disagree" required>
              <label for="disagree{{ forloop.index }}">Disagree</label>       
            </p> 
          </td>
          
          <td>
            <p>
            <input type="text" name="True Prediction{{ forloop.index }}" placeholder="Enter your Prediction" />
            </p>
           </td>
           
           <td>
            <p>
            <input type="text" name="Change Reason{{ forloop.index }}" placeholder="Explain why you changed the prediction" />
            </p>
           </td>
           
        </tr>

      {% endfor %}

    </table>
</crowd-form>
"""



def create_task_ui():
    '''
    Creates a Human Task UI resource.

    Returns:
    dict: the CreateHumanTaskUi response, which contains HumanTaskUiArn
    '''
    response = sagemaker_client.create_human_task_ui(
        HumanTaskUiName=task_UI_name,
        UiTemplate={'Content': template})
    return response

In [None]:
# Create task UI
human_task_UI_response = create_task_ui()

human_task_Ui_arn = human_task_UI_response['HumanTaskUiArn']

print(human_task_Ui_arn)

### Create the Flow Definition
In this section, we create a flow definition. A flow definition lets us specify:
- The workforce that your tasks will be sent to. 
- The instructions that your workforce will receive. This is called a worker task template. 
- Where your output data will be stored.

This demo is going to use the API, but you can optionally create this workflow definition in the console as well. For more details and instructions, see: https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-flow-definition.html.

In [None]:
create_workflow_definition_response = sagemaker_client.create_flow_definition(
        FlowDefinitionName= flow_definition_name,
        RoleArn= role,
        HumanLoopConfig= {
            "WorkteamArn": workteam_arn,
            "HumanTaskUiArn": human_task_Ui_arn,
            "TaskCount": 1,
            "TaskDescription": "Review the model predictions and SHAP values and determine if you agree or disagree. Assign a label of 1 to indicate positive result or 0 to indicate a negative result based on your review of the prediction, probability and SHAP values",
            "TaskTitle": "Using Clarify and A2I"
        },
        OutputConfig={
            "S3OutputPath" : flow_definition_output_path
        }
    )

flow_definition_arn = create_workflow_definition_response['FlowDefinitionArn']

In [None]:
# Describe flow definition - status should be active
for x in range(60):
    describe_flow_definition_response = sagemaker_client.describe_flow_definition(FlowDefinitionName=flow_definition_name)
    print(describe_flow_definition_response['FlowDefinitionStatus'])
    if (describe_flow_definition_response['FlowDefinitionStatus'] == 'Active'):
        print("Flow Definition is active")
        break
    time.sleep(2)
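
The polling loop above gives up silently after 60 attempts and keeps polling if the flow definition fails. A minimal sketch of a reusable wait helper is shown below; `wait_for_status` and its parameters are hypothetical names introduced for illustration and are not part of the SageMaker SDK.

```python
import time


def wait_for_status(describe, status_key, target, failure_states=(),
                    timeout_s=120, poll_s=2, sleep=time.sleep):
    """Poll describe() until describe()[status_key] equals target.

    Raises RuntimeError when a failure state is reached and
    TimeoutError when timeout_s seconds have been spent waiting.
    """
    waited = 0
    while True:
        status = describe()[status_key]
        if status == target:
            return status
        if status in failure_states:
            raise RuntimeError(f"Reached failure state: {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"Still {status} after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s


# Usage in this notebook (requires AWS credentials, so it is left commented out):
# wait_for_status(
#     lambda: sagemaker_client.describe_flow_definition(FlowDefinitionName=flow_definition_name),
#     status_key="FlowDefinitionStatus",
#     target="Active",
#     failure_states=("Failed",),
# )
```

Passing the describe call as a function keeps the helper independent of any one resource type, so the same sketch could also wait on a human loop.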

### Trigger human loop for all predictions with a negative outcome

We send every prediction with a negative outcome to an Amazon A2I human loop. The goal is to check which features contributed to the model predicting a salary below \\$50,000, which can help reveal whether the model returns negative outcomes mainly for people of a particular gender or ethnic group. The probability scores are shown alongside the predictions and SHAP plots so that the reviewer can see how confident the model was in each prediction.

In [None]:
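# Copy the filtered rows so that columns can be added later without a SettingWithCopyWarning
negative_outcomes_df = prediction_shap_df[prediction_shap_df.iloc[:, 0] == 0].copy()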

### Plot the SHAP values computed by SageMaker Clarify for the negative outcomes

Next, plot the SHAP values for each negative outcome, export each plot as an image, and upload the images to an S3 location. These images are rendered in the task review template along with the predictions.

To make the S3 path of each image easy to look up, the S3 URIs are appended to the same dataframe that holds the predictions and SHAP values.

In [None]:
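import os

import shap
import matplotlib.pyplot as plt

column_list = list(test_features_mini_batch.columns)
os.makedirs('shap_images', exist_ok=True)  # plt.savefig does not create missing directories

s3_uris = []
for i in range(len(negative_outcomes_df)):
    # Assumption: the last column holds the row number, i.e. the index label of the record in
    # test_features_mini_batch. Look the features up by that label rather than by position i,
    # because position i in the filtered dataframe is not position i in the mini batch.
    row_label = negative_outcomes_df.iloc[i, -1]
    explanation_obj = shap.Explanation(
        values=negative_outcomes_df.iloc[i, 2:-1].to_numpy(),
        base_values=base_value,
        data=test_features_mini_batch.loc[row_label].to_numpy(),
        feature_names=column_list,
    )
    shap.plots.waterfall(shap_values=explanation_obj, max_display=4, show=False)
    img_name = 'shap-' + str(i) + '.png'
    plt.savefig('shap_images/' + img_name, bbox_inches='tight')
    plt.close()
    s3_uri = S3Uploader.upload('shap_images/' + img_name, 's3://{}/{}/shap_images'.format(bucket, prefix))
    s3_uris.append(s3_uri)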

    
negative_outcomes_df['shap_image_s3_uri'] = s3_uris
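
A waterfall plot starts at the base value and stacks the per-feature SHAP values until it reaches the model output for that record. The sketch below works through that arithmetic with made-up numbers; the values and the choice of feature names are assumptions for illustration and are not taken from the notebook's results.

```python
import numpy as np

# Hypothetical numbers for a single record (not from the notebook's data).
base_value = -1.2                                # average model output over the baseline
shap_values = np.array([0.8, -0.5, -0.3, 0.1])   # per-feature contributions
feature_names = ["Age", "Education-Num", "Capital Gain", "Hours per week"]

# SHAP additivity: base value + sum of contributions = model output for the record.
model_output = base_value + shap_values.sum()

# List features by absolute contribution, largest first, as a waterfall plot does.
order = np.argsort(-np.abs(shap_values))
for idx in order:
    print(f"{feature_names[idx]:>15}: {shap_values[idx]:+.2f}")
print(f"{'model output':>15}: {model_output:+.2f}")
```

Reading the plot this way shows a reviewer which features pushed a record toward the negative outcome and by how much.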

In [None]:
review_fraction = len(negative_outcomes_df) / len(predictions_df)
print(f"{len(negative_outcomes_df)} out of {len(predictions_df)} samples, or {review_fraction:.1%} of the predictions, will be sent for review.")

### Trigger the Human Review Loop

Everything is now in place to trigger the human review loop. The cell below will:
- Pick a set of negative-outcome records (for example, 3 records)
- Create a human review loop for them, showing all three records in a single template
- Wait until the reviewers have completed their tasks
- Append the details of each completed human review loop to a list
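
The batching step can be sketched on its own, without any AWS calls. The helper below groups records into payloads whose `Pairs` key matches the `task.input.Pairs` loop in the worker template; `build_review_payloads`, the bucket name, and the sample values are hypothetical and only for illustration.

```python
import json


def build_review_payloads(rows, predictions, scores, image_uris, batch_size=3):
    """Group records into A2I input payloads of at most batch_size records each."""
    records = [
        {"row": str(r), "prediction": p, "probability_score": s, "shap_image_s3_uri": u}
        for r, p, s, u in zip(rows, predictions, scores, image_uris)
    ]
    return [
        {"Pairs": records[start:start + batch_size]}
        for start in range(0, len(records), batch_size)
    ]


# Hypothetical values, for illustration only.
payloads = build_review_payloads(
    rows=[4, 9, 12, 17],
    predictions=[0, 0, 0, 0],
    scores=[0.12, 0.31, 0.08, 0.44],
    image_uris=[f"s3://example-bucket/shap_images/shap-{i}.png" for i in range(4)],
)
print(len(payloads))                        # 2: one batch of 3 and one batch of 1
print(json.dumps(payloads[-1], indent=2))   # the JSON string passed as InputContent
```

Slicing handles the final, shorter batch automatically, so no special case is needed when the number of records is not a multiple of the batch size.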

In [None]:
import json
import time
import uuid

# Each record carries a discrete prediction (0 = negative outcome, i.e. a predicted salary below $50K)
# and a probability score between 0 and 1 for the positive outcome.

prediction_list = negative_outcomes_df.iloc[:,:1].values.flatten().tolist()

probability_score_list = negative_outcomes_df.iloc[:,1:2].values.flatten().tolist()

row_num_list = negative_outcomes_df.iloc[:,-2:-1].values.flatten().tolist()

NUM_TO_REVIEW = len(negative_outcomes_df) # You can change this number as desired

completed_human_loops = []

step_size = 3

for i in range(0, NUM_TO_REVIEW, step_size):
    start_idx = i
    end_idx = min(i + step_size, NUM_TO_REVIEW)  # the last batch may hold fewer than step_size records
        
    item_list = [{'row': "{}".format(row_num_list[j]), 'prediction': prediction_list[j], 'probability_score': probability_score_list[j], 'shap_image_s3_uri': s3_uris[j]} for j in range(start_idx, end_idx)]

    ip_content = {'Pairs': item_list}    
    
    humanLoopName = str(uuid.uuid4())
    start_loop_response = a2i.start_human_loop(
            HumanLoopName=humanLoopName,
            FlowDefinitionArn=flow_definition_arn,
            HumanLoopInput={
                "InputContent": json.dumps(ip_content)
            }
        ) 
    
    print("Task - " + str(i) + " submitted. Now navigate to the private worker portal and perform the tasks. Make sure you've invited yourself to your workteam!")
    
    response = a2i.describe_human_loop(HumanLoopName=humanLoopName)
    status = response["HumanLoopStatus"]
    # Stop polling on any terminal status so that a failed or stopped loop cannot hang the cell
    while status not in ("Completed", "Failed", "Stopped"):
        print("Task still in progress, waiting 10 more seconds for reviewers to complete the task...")
        time.sleep(10)   
        response = a2i.describe_human_loop(HumanLoopName=humanLoopName)
        status = response["HumanLoopStatus"]
    
    if status == "Completed":
        print("Human Review Loop for the Task - " + str(i) + " completed")
        completed_human_loops.append(response)
    else:
        print("Human Review Loop for the Task - " + str(i) + " ended with status " + status)


Let's inspect the results of the human review tasks and start preparing the groundtruth labels.

In [None]:
import re
import pprint

pp = pprint.PrettyPrinter(indent=4)

groundtruth_labels = {}

for resp in completed_human_loops:
    # Strip the 's3://<bucket>/' prefix to obtain the object key of the loop output
    splitted_string = re.split('s3://' +  bucket + '/', resp['HumanLoopOutput']['OutputS3Uri'])
    output_bucket_key = splitted_string[1]

    response = s3.get_object(Bucket=bucket, Key=output_bucket_key)
    content = response["Body"].read()
    json_output = json.loads(content)
    
    answers = json_output['humanAnswers'][0]['answerContent']
    # Iterate over the records actually present, since the last batch may hold fewer than step_size.
    # The template's field names are 1-based (rating1, rating2, ...).
    for j, pair in enumerate(json_output['inputContent']['Pairs'], start=1):
        if answers['rating{}'.format(j)]['agree']:
            # The reviewer agrees with the negative prediction
            groundtruth_labels[pair['row']] = 0
        else:
            groundtruth_labels[pair['row']] = 1

json_output
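
To see the label logic in isolation, here is a minimal sketch that runs on a hand-written output document. The sample contains only the fields the cell above reads (`inputContent.Pairs`, `humanAnswers[0].answerContent`, and the `ratingN` entries); everything else about it, including the function name `labels_from_review`, is a hypothetical illustration.

```python
def labels_from_review(json_output, predicted_label=0):
    """Map each reviewed row to a label: keep the prediction on 'agree', flip it otherwise."""
    answers = json_output["humanAnswers"][0]["answerContent"]
    labels = {}
    # Template field names are 1-based: rating1 belongs to the first record, and so on.
    for position, pair in enumerate(json_output["inputContent"]["Pairs"], start=1):
        agreed = answers[f"rating{position}"]["agree"]
        labels[pair["row"]] = predicted_label if agreed else 1 - predicted_label
    return labels


# Hypothetical output document with only the fields used above.
sample_output = {
    "inputContent": {"Pairs": [{"row": "4"}, {"row": "9"}]},
    "humanAnswers": [{
        "answerContent": {
            "rating1": {"agree": True, "disagree": False},
            "rating2": {"agree": False, "disagree": True},
        }
    }],
}
print(labels_from_review(sample_output))  # {'4': 0, '9': 1}
```

Because every record sent for review had a predicted label of 0, a disagreement flips the label to 1.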

### Merge the A2I prediction results with the test data to generate GroundTruth    

Because the predictions have been reviewed by humans with the help of the analysis from SageMaker Clarify, we can treat the reviewed labels as groundtruth data for re-training.

Let us merge the A2I results with the batch of test data used earlier.


In [None]:
# Work on a copy so that adding and updating columns does not touch testing_data
new_training_data = testing_data[:batch_size].copy()

new_training_data['row_num'] = test_features_mini_batch.index


for row in groundtruth_labels:
    new_training_data.loc[(new_training_data.row_num == int(row)), 'Target'] = groundtruth_labels[row]


new_training_data.to_csv('new_training_data.csv', index=False, header=True)

S3Uploader.upload('new_training_data.csv', 's3://{}/{}'.format(bucket, prefix))
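
The merge step can be tried on a toy dataframe first. In the sketch below, only the `Target` and `row_num` column names come from the notebook; the `Age` column, the index values, and the reviewed labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data standing in for testing_data.
full = pd.DataFrame(
    {"Target": [0, 0, 1, 0, 1], "Age": [25, 41, 37, 52, 29]},
    index=[4, 9, 12, 17, 20],
)

# Take a copy of the slice so that the updates below do not touch the original frame.
batch = full[:4].copy()
batch["row_num"] = batch.index

# Row number (as a string, as stored in the A2I output) -> human-reviewed label.
reviewed = {"4": 0, "9": 1}

for row, label in reviewed.items():
    batch.loc[batch.row_num == int(row), "Target"] = label

print(batch["Target"].tolist())  # [0, 1, 1, 0]
```

Rows that were not reviewed keep their original `Target` value, which is why only row 9 changes here.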


### Clean Up
Finally, don't forget to clean up the resources we set up and used for this demo!

In [None]:
session.delete_model(model_name)
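
The A2I control plane resources created earlier can be removed as well, if they are not cleaned up elsewhere. This is a minimal sketch: `delete_a2i_resources` is a hypothetical helper name, and deleting the flow definition before the task UI is a cautious ordering choice rather than a documented requirement.

```python
def delete_a2i_resources(sm_client, flow_definition_name, task_ui_name):
    """Delete the A2I flow definition, then the human task UI it referenced."""
    sm_client.delete_flow_definition(FlowDefinitionName=flow_definition_name)
    sm_client.delete_human_task_ui(HumanTaskUiName=task_ui_name)


# Usage in this notebook (requires AWS credentials, so it is left commented out):
# delete_a2i_resources(sagemaker_client, flow_definition_name, task_UI_name)
```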