# Integrate Amazon SageMaker Clarify into AWS Step Functions
This repository shows how to use Amazon SageMaker Clarify with AWS Step Functions and the AWS Step Functions Data Science Python SDK to generate an explainability report for a trained SageMaker estimator.

## What is Clarify
Amazon SageMaker Clarify provides machine learning developers with greater visibility into their training data and models so they can identify and limit bias and explain predictions.
Clarify can:
- Detect bias in your data and model
    - Identify imbalances in data 
    - Check the trained model for bias 
    - Monitor the model for bias
- Explain model behavior 
    - Understand which features matter most when the model makes predictions
    - Monitor the model for changes in behavior
    - Explain individual model predictions

For details, see https://aws.amazon.com/sagemaker/clarify/

## Step Functions Integration
This repository shows how to integrate Amazon SageMaker Clarify into an AWS Step Functions workflow. The example uses the [Data Science Python SDK](https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/) for AWS Step Functions to build the workflow. One feature of this SDK is that it can run Amazon SageMaker processing containers as workflow steps. Because Amazon SageMaker Clarify runs as a SageMaker processing job, it can be added to the workflow in the same way.
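To make the "Clarify is a processing job" point concrete, here is a minimal sketch of the kind of Step Functions Task state that starts a SageMaker processing job. It is not the notebook's code: the notebook builds the step through the Data Science SDK, whereas this sketch writes the state definition by hand using only the standard library. The function name, the instance settings, and all argument values are assumptions for illustration. The container paths and channel names follow the conventions commonly used for Clarify processing jobs and should be checked against the notebook.

```python
import json


def clarify_processing_state(job_name, image_uri, role_arn,
                             config_s3_uri, data_s3_uri, output_s3_uri,
                             instance_type="ml.m5.xlarge"):
    """Return a Task state (as a dict) that runs Clarify as a processing job.

    All arguments are supplied by the caller; nothing here is taken
    from the repository's notebook.
    """

    def s3_input(name, s3_uri, local_path):
        # One input channel: files under s3_uri are copied to local_path.
        return {
            "InputName": name,
            "S3Input": {
                "S3Uri": s3_uri,
                "LocalPath": local_path,
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }

    return {
        "Type": "Task",
        # ".sync" makes the workflow wait until the processing job finishes.
        "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
        "Parameters": {
            "ProcessingJobName": job_name,
            "RoleArn": role_arn,
            "AppSpecification": {"ImageUri": image_uri},
            "ProcessingResources": {
                "ClusterConfig": {
                    "InstanceCount": 1,
                    "InstanceType": instance_type,  # assumed default
                    "VolumeSizeInGB": 30,           # assumed default
                }
            },
            "ProcessingInputs": [
                s3_input("analysis_config", config_s3_uri,
                         "/opt/ml/processing/input/config"),
                s3_input("dataset", data_s3_uri,
                         "/opt/ml/processing/input/data"),
            ],
            "ProcessingOutputConfig": {
                "Outputs": [{
                    "OutputName": "analysis_result",
                    "S3Output": {
                        "S3Uri": output_s3_uri,
                        "LocalPath": "/opt/ml/processing/output",
                        "S3UploadMode": "EndOfJob",
                    },
                }]
            },
        },
        "End": True,
    }
```

Calling `json.dumps(clarify_processing_state(...), indent=2)` with your own job name, Clarify image URI, role ARN, and S3 URIs prints a state definition that can be compared with what the SDK generates for its processing step.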

Overview of the required steps in the workflow:

1. Build a training pipeline with the following steps:
    1. Train an SKLearn model
    2. Save the SKLearn model
2. Add the Amazon SageMaker Clarify processing job to the pipeline. This requires:
    1. Two inputs:
        1. The Clarify config file `analysis_config.json`, which is created by an AWS Lambda function
        2. The input data to explain predictions for
    2. One output: the S3 destination where the explainability report is saved
    3. The Amazon SageMaker Clarify processor [Amazon ECR](https://aws.amazon.com/ecr/) container image
3. Create a SageMaker endpoint configuration and deploy a SageMaker endpoint with it
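As an illustration of step 2, the following is a minimal sketch of how a Lambda function might assemble `analysis_config.json` for a SHAP explainability analysis on CSV data. It is not the repository's Lambda code. The function names, the event keys (`model_name`, `headers`, `label`, `baseline`, `bucket`, `key`), and the default instance type and sample count are assumptions. The config keys follow the Clarify analysis configuration format as I understand it, so verify them against the notebook and the Clarify documentation before relying on them.

```python
import json


def build_analysis_config(model_name, headers, label, baseline,
                          instance_type="ml.m5.xlarge", num_samples=100):
    """Build a Clarify analysis config dict for SHAP on a CSV dataset.

    headers  : all column names, including the label column
    label    : name of the label column (must appear in headers)
    baseline : list of rows holding feature values only (no label)
    """
    if label not in headers:
        raise ValueError(f"label {label!r} is not one of the headers")
    n_features = len(headers) - 1
    for row in baseline:
        if len(row) != n_features:
            raise ValueError("each baseline row needs one value per feature")

    return {
        "dataset_type": "text/csv",
        "headers": headers,
        "label": label,
        "methods": {
            "shap": {
                "baseline": baseline,
                "num_samples": num_samples,  # assumed default
                "agg_method": "mean_abs",
            },
            "report": {"name": "report", "title": "Analysis Report"},
        },
        # Clarify spins up a temporary endpoint for this model to get predictions.
        "predictor": {
            "model_name": model_name,
            "instance_type": instance_type,  # assumed default
            "initial_instance_count": 1,
            "accept_type": "text/csv",
            "content_type": "text/csv",
        },
    }


def lambda_handler(event, context):
    """Hypothetical handler: write the config to S3 and return its URI."""
    import boto3  # imported lazily so build_analysis_config works without it

    config = build_analysis_config(
        event["model_name"], event["headers"], event["label"], event["baseline"]
    )
    boto3.client("s3").put_object(
        Bucket=event["bucket"],
        Key=event["key"],
        Body=json.dumps(config).encode("utf-8"),
    )
    return {"config_s3_uri": f"s3://{event['bucket']}/{event['key']}"}
```

Keeping the config construction in a pure function separate from the S3 upload makes it easy to inspect or unit test the generated JSON locally before wiring the Lambda function into the workflow.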

## How to use SageMaker Clarify within Step Functions
See the notebook `ModelPipeline-w-Clarify.ipynb` for an example of how to use Clarify with the Step Functions Data Science Python SDK. The example data needs to be downloaded from the official source, the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/abalone), as instructed in the notebook. Run the notebook and adapt it for your own use.
When applying it to your own data and model, keep in mind that the following conditions must be satisfied:
1. The model to be explained must be a supervised learning model, i.e. there must be actual labels for each input sample.
1. The model must be packaged into a SageMaker container, which can be one of the following:
    - a [built-in SageMaker algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html)
    - a [pre-built SageMaker Docker container](https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers-prebuilt.html)
    - a custom SageMaker Docker container

This example uses a pre-built SageMaker Docker container, namely SKLearn.