{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Amazon SageMaker Workshop\n",
    "### _**Evaluation**_\n",
    "\n",
    "---\n",
    "In this part of the workshop we will get the previous model we trained to Predict Mobile Customer Departure and evaluate its performance with a test dataset.\n",
    "\n",
    "---\n",
    "\n",
    "## Contents\n",
    "\n",
    "1. [Background](#Background) - Getting the model trained in the previous lab.\n",
    "2. [Evaluate](#Evaluate)\n",
    "    * Creating a script to evaluate model\n",
    "    * Using [SageMaker Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) jobs to automate evaluation of models\n",
    "3. [Exercise](#a_Exercise) - customizing metrics and evaluation reports\n",
    "4. [Wrap-up - end of Evaluation Lab](#Wrap-up)\n",
    "\n",
    "\n",
    "---\n",
    "\n",
    "## Background\n",
    "\n",
    "In the previous [Modeling](../2-Modeling/modeling.ipynb) lab we used SageMaker trained models by creating multiple SageMaker training jobs.\n",
    "\n",
    "Install and import some packages we'll need for this lab:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import boto3\n",
    "import sagemaker\n",
    "\n",
    "from sagemaker.experiments.run import Run, load_run\n",
    "from sagemaker.s3 import S3Uploader, S3Downloader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "sm_sess = sagemaker.session.Session()\n",
    "role = sagemaker.get_execution_role()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get the variables from initial setup:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%store -r bucket\n",
    "%store -r prefix\n",
    "%store -r region\n",
    "%store -r docker_image_name\n",
    "%store -r s3uri_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "bucket, prefix, region, docker_image_name, s3uri_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "### - if you _**skipped**_ the data preparation lab follow instructions:\n",
    "\n",
    "   - **run this [notebook](./config/pre_setup.ipynb)**\n",
    "   - load the model S3 URI:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Uncomment if you skipped the data preparation lab\n",
    "\n",
    "#%store -r s3uri_model\n",
    "#!cp config/model.tar.gz ./\n",
    "\n",
    "#%store -r s3uri_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "### - if you _**have done**_ the previous labs\n",
    "\n",
    "Download the model and test data from S3:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# # Uncomment if you have done the previous labs\n",
    "\n",
    "# # Get name of training job and other variables\n",
    "#%store -r training_job_name\n",
    "\n",
    "#training_job_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# # Uncomment if you have done the previous labs\n",
    "#estimator = sagemaker.estimator.Estimator.attach(training_job_name)\n",
    "#s3uri_model = estimator.model_data\n",
    "#print(\"\\ns3uri_model =\",s3uri_model)\n",
    "\n",
    "#S3Downloader.download(s3uri_model, \".\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#%store -r s3uri_test\n",
    "#S3Downloader.download(s3uri_test, \".\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now you should have the `model.tar.gz` file in the 3-Evaluation directory \n",
    "\n",
    "(click the refresh button)\n",
    "\n",
    "![refresh_dir.png](./media/refresh_dir.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluate model\n",
    "\n",
    "Let's create a simple evaluation with some Scikit-Learn Metrics like [Area Under the Curve (AUC)](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html) and [Accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "import tarfile\n",
    "import logging\n",
    "import pickle\n",
    "\n",
    "import pandas as pd\n",
    "import xgboost\n",
    "\n",
    "from sklearn.metrics import classification_report, roc_auc_score, accuracy_score\n",
    "\n",
    "\n",
    "model_path = \"model.tar.gz\"\n",
    "with tarfile.open(model_path) as tar:\n",
    "    tar.extractall(path=\".\")\n",
    "\n",
    "print(\"Loading xgboost model.\")\n",
    "model = pickle.load(open(\"xgboost-model\", \"rb\"))\n",
    "model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(\"Loading test input data\")\n",
    "test_path = \"config/test.csv\"\n",
    "df = pd.read_csv(test_path, header=None)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(\"Reading test data. We should get an `DMatrix` object...\")\n",
    "y_test = df.iloc[:, 0].to_numpy()\n",
    "df.drop(df.columns[0], axis=1, inplace=True)\n",
    "X_test = xgboost.DMatrix(df.values)\n",
    "X_test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(\"Performing predictions against test data.\")\n",
    "predictions_probs = model.predict(X_test)\n",
    "predictions = predictions_probs.round()\n",
    "\n",
    "print(\"Creating classification evaluation report\")\n",
    "acc = accuracy_score(y_test, predictions)\n",
    "auc = roc_auc_score(y_test, predictions_probs)\n",
    "\n",
    "print(\"Accuracy =\", acc)\n",
    "print(\"AUC =\", auc)"
   ]
  },
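  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two metrics above concrete, here is a minimal, dependency-free sketch of what they measure. The labels and probabilities are made-up toy values (not workshop data), and the helper names `accuracy` and `roc_auc` are hypothetical; in the lab itself we keep using Scikit-Learn's `accuracy_score` and `roc_auc_score`.\n",
    "\n",
    "```python\n",
    "def accuracy(y_true, y_prob, threshold=0.5):\n",
    "    # Fraction of rows whose thresholded prediction matches the label\n",
    "    y_pred = [1 if p > threshold else 0 for p in y_prob]\n",
    "    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)\n",
    "\n",
    "\n",
    "def roc_auc(y_true, y_prob):\n",
    "    # Probability that a random positive is scored above a random negative\n",
    "    # (ties count as half a win)\n",
    "    pos = [p for t, p in zip(y_true, y_prob) if t == 1]\n",
    "    neg = [p for t, p in zip(y_true, y_prob) if t == 0]\n",
    "    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)\n",
    "    return wins / (len(pos) * len(neg))\n",
    "\n",
    "\n",
    "# Toy data: two negatives and two positives with made-up scores\n",
    "toy_labels = [0, 0, 1, 1]\n",
    "toy_probs = [0.1, 0.6, 0.4, 0.9]\n",
    "\n",
    "print('accuracy =', accuracy(toy_labels, toy_probs))\n",
    "print('AUC =', roc_auc(toy_labels, toy_probs))\n",
    "```\n",
    "\n",
    "Accuracy depends on the chosen decision threshold, while AUC depends only on how the scores rank positives against negatives."
   ]
  },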
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a classification report\n",
    "\n",
    "Now, let's save the results in a JSON file, following the structure defined in SageMaker docs:\n",
    "https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html\n",
    "\n",
    "We'll use this logic later in [Lab 6-Pipelines](../6-Pipelines/pipelines.ipynb):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pprint\n",
    "# The metrics reported can change based on the model used - check the link for the documentation \n",
    "report_dict = {\n",
    "    \"binary_classification_metrics\": {\n",
    "        \"accuracy\": {\n",
    "            \"value\": acc,\n",
    "            \"standard_deviation\": \"NaN\",\n",
    "        },\n",
    "        \"auc\": {\"value\": auc, \"standard_deviation\": \"NaN\"},\n",
    "    },\n",
    "}\n",
    "\n",
    "print(\"Classification report:\")\n",
    "pprint.pprint(report_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "evaluation_output_path = os.path.join(\n",
    "    \".\", \"evaluation.json\"\n",
    ")\n",
    "print(\"Saving classification report to {}\".format(evaluation_output_path))\n",
    "\n",
    "with open(evaluation_output_path, \"w\") as f:\n",
    "    f.write(json.dumps(report_dict))"
   ]
  },
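  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you add more metrics later, it can help to build the report from a plain mapping of metric names to values. The sketch below is one way to do that; `build_report` is a hypothetical helper (not part of the SageMaker SDK) and the metric values are placeholders. It casts each value with `float()` as a guard against NumPy scalar types that `json` cannot serialize (for example `numpy.float32`).\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "\n",
    "def build_report(metrics):\n",
    "    # Wrap each metric value in the nested layout used by the report above\n",
    "    return {\n",
    "        'binary_classification_metrics': {\n",
    "            name: {'value': float(value), 'standard_deviation': 'NaN'}\n",
    "            for name, value in metrics.items()\n",
    "        }\n",
    "    }\n",
    "\n",
    "\n",
    "# Placeholder values, not results from the workshop model\n",
    "toy_report = build_report({'accuracy': 0.5, 'auc': 0.75})\n",
    "\n",
    "# Serialize and parse again to confirm the report survives a JSON round trip\n",
    "round_trip = json.loads(json.dumps(toy_report))\n",
    "print(json.dumps(round_trip, indent=2))\n",
    "```\n",
    "\n",
    "In the notebook you would call it with the real values, for example `build_report({'accuracy': acc, 'auc': auc})`."
   ]
  },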
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Ok, now we have working code. Let's put it in a Python Script"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%writefile evaluate.py\n",
    "\"\"\"Evaluation script for measuring model accuracy.\"\"\"\n",
    "\n",
    "import json\n",
    "import os\n",
    "import tarfile\n",
    "import logging\n",
    "import pickle\n",
    "\n",
    "import pandas as pd\n",
    "import xgboost\n",
    "\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.INFO)\n",
    "logger.addHandler(logging.StreamHandler())\n",
    "\n",
    "# May need to import additional metrics depending on what you are measuring.\n",
    "# See https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html\n",
    "from sklearn.metrics import classification_report, roc_auc_score, accuracy_score\n",
    "\n",
    "def get_dataset(dir_path, dataset_name) -> pd.DataFrame:\n",
    "    files = [ os.path.join(dir_path, file) for file in os.listdir(dir_path) ]\n",
    "    if len(files) == 0:\n",
    "        raise ValueError(('There are no files in {}.\\n' +\n",
    "                          'This usually indicates that the channel ({}) was incorrectly specified,\\n' +\n",
    "                          'the data specification in S3 was incorrectly specified or the role specified\\n' +\n",
    "                          'does not have permission to access the data.').format(files, dataset_name))\n",
    "    raw_data = [ pd.read_csv(file, header=None) for file in files ]\n",
    "    df = pd.concat(raw_data)\n",
    "    return df\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    model_path = \"/opt/ml/processing/model/model.tar.gz\"\n",
    "    with tarfile.open(model_path) as tar:\n",
    "        tar.extractall(path=\"..\")\n",
    "\n",
    "    logger.debug(\"Loading xgboost model.\")\n",
    "    model = pickle.load(open(\"xgboost-model\", \"rb\"))\n",
    "\n",
    "    logger.info(\"Loading test input data\")\n",
    "    test_path = \"/opt/ml/processing/test\"\n",
    "    df = get_dataset(test_path, \"test_set\")\n",
    "\n",
    "    logger.debug(\"Reading test data.\")\n",
    "    y_test = df.iloc[:, 0].to_numpy()\n",
    "    df.drop(df.columns[0], axis=1, inplace=True)\n",
    "    X_test = xgboost.DMatrix(df.values)\n",
    "\n",
    "    logger.info(\"Performing predictions against test data.\")\n",
    "    predictions_probs = model.predict(X_test)\n",
    "    predictions = predictions_probs.round()\n",
    "\n",
    "    logger.info(\"Creating classification evaluation report\")\n",
    "    acc = accuracy_score(y_test, predictions)\n",
    "    auc = roc_auc_score(y_test, predictions_probs)\n",
    "\n",
    "    # The metrics reported can change based on the model used, but it must be a specific name per (https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html)\n",
    "    report_dict = {\n",
    "        \"binary_classification_metrics\": {\n",
    "            \"accuracy\": {\n",
    "                \"value\": acc,\n",
    "                \"standard_deviation\": \"NaN\",\n",
    "            },\n",
    "            \"auc\": {\"value\": auc, \"standard_deviation\": \"NaN\"},\n",
    "        },\n",
    "    }\n",
    "\n",
    "    logger.info(\"Classification report:\\n{}\".format(report_dict))\n",
    "\n",
    "    evaluation_output_path = os.path.join(\n",
    "        \"/opt/ml/processing/evaluation\", \"evaluation.json\"\n",
    "    )\n",
    "    logger.info(\"Saving classification report to {}\".format(evaluation_output_path))\n",
    "\n",
    "    with open(evaluation_output_path, \"w\") as f:\n",
    "        f.write(json.dumps(report_dict))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Ok, now we are finally running this script with a simple call to SageMaker Processing!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from sagemaker.processing import (\n",
    "    ProcessingInput,\n",
    "    ProcessingOutput,\n",
    "    ScriptProcessor,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Processing step for evaluation\n",
    "processor = ScriptProcessor(\n",
    "    image_uri=docker_image_name,\n",
    "    command=[\"python3\"],\n",
    "    instance_type=\"ml.m5.xlarge\",\n",
    "    instance_count=1,\n",
    "    base_job_name=\"CustomerChurn/eval-script\",\n",
    "    sagemaker_session=sm_sess,\n",
    "    role=role,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "entrypoint = \"evaluate.py\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from time import strftime, gmtime\n",
    "# Helper to create timestamps\n",
    "create_date = lambda: strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "processor.run(\n",
    "    code=entrypoint,\n",
    "    inputs=[\n",
    "        sagemaker.processing.ProcessingInput(\n",
    "            source=s3uri_model,\n",
    "            destination=\"/opt/ml/processing/model\",\n",
    "        ),\n",
    "        sagemaker.processing.ProcessingInput(\n",
    "            source=s3uri_test,\n",
    "            destination=\"/opt/ml/processing/test\",\n",
    "        ),\n",
    "    ],\n",
    "    outputs=[\n",
    "        sagemaker.processing.ProcessingOutput(\n",
    "            output_name=\"evaluation\", source=\"/opt/ml/processing/evaluation\"\n",
    "        ),\n",
    "    ],\n",
    "    job_name=f\"CustomerChurnEval-{create_date()}\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If everything went well, the SageMaker Processing job must have created the JSON with the evaluation report of our model and saved it in S3.\n",
    "\n",
    "In addition, under the hood, SageMaker Processing has uploaded our `evaluate.py` script to S3. Let's check where the script was saved:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "for proc_in in processor.latest_job.inputs:\n",
    "    if proc_in.input_name == \"code\":\n",
    "        s3_evaluation_code_uri = proc_in.source \n",
    "        \n",
    "s3_evaluation_code_uri"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Let's store the S3 URI where our evaluation script was saved for later"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%store s3_evaluation_code_uri"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's check it the evaluation report from S3!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "out_s3_report_uri = processor.latest_job.outputs[0].destination\n",
    "out_s3_report_uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "reports_list = S3Downloader.list(out_s3_report_uri)\n",
    "reports_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "report = S3Downloader.read_file(reports_list[0])\n",
    "\n",
    "print(\"=====Model Report====\")\n",
    "print(json.dumps(json.loads(report.split('\\n')[0]), indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Wrap-up\n",
    "\n",
    "Now that we finished the **evaluation lab**, let's make everything here re-usable. It may come in handy later (spoiler alert - when creating Pipelines)..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%writefile ../6-Pipelines/my_labs_solutions/evaluation_solution.py\n",
    "import sagemaker\n",
    "from sagemaker.processing import (\n",
    "    ProcessingInput,\n",
    "    ProcessingOutput,\n",
    "    ScriptProcessor,\n",
    ")\n",
    "\n",
    "def get_evaluation_processor(docker_image_name) -> ScriptProcessor:\n",
    "    \n",
    "    role = sagemaker.get_execution_role()\n",
    "    sm_sess = sagemaker.session.Session()\n",
    "\n",
    "    # Processing step for evaluation\n",
    "    processor = ScriptProcessor(\n",
    "        image_uri=docker_image_name,\n",
    "        command=[\"python3\"],\n",
    "        instance_type=\"ml.m5.xlarge\",\n",
    "        instance_count=1,\n",
    "        base_job_name=\"CustomerChurn/eval-script\",\n",
    "        sagemaker_session=sm_sess,\n",
    "        role=role,\n",
    "    )\n",
    "    \n",
    "    return processor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "# SageMaker Clarify\n",
    "\n",
    "Amazon SageMaker Clarify helps improve your machine learning models by detecting potential bias and helping explain how these models make predictions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Firstly, let's create our model and register it on SageMaker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "model_name=\"xgboost-churn-1\" # change to any name\n",
    "\n",
    "model = estimator.create_model(name=model_name)\n",
    "container_def = model.prepare_container_def()\n",
    "sm_sess.create_model(model_name,\n",
    "                     role,\n",
    "                     container_def)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from sagemaker import clarify\n",
    "clarify_processor = clarify.SageMakerClarifyProcessor(role=role,\n",
    "                                                      instance_count=1,\n",
    "                                                      instance_type='ml.m5.xlarge',\n",
    "                                                      sagemaker_session=sm_sess)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "columns_headers = ['Churn', 'Account Length', 'VMail Message', 'Day Mins', 'Day Calls',\n",
    "       'Eve Mins', 'Eve Calls', 'Night Mins', 'Night Calls', 'Intl Mins',\n",
    "       'Intl Calls', 'CustServ Calls', 'State_AK', 'State_AL', 'State_AR',\n",
    "       'State_AZ', 'State_CA', 'State_CO', 'State_CT', 'State_DC', 'State_DE',\n",
    "       'State_FL', 'State_GA', 'State_HI', 'State_IA', 'State_ID', 'State_IL',\n",
    "       'State_IN', 'State_KS', 'State_KY', 'State_LA', 'State_MA', 'State_MD',\n",
    "       'State_ME', 'State_MI', 'State_MN', 'State_MO', 'State_MS', 'State_MT',\n",
    "       'State_NC', 'State_ND', 'State_NE', 'State_NH', 'State_NJ', 'State_NM',\n",
    "       'State_NV', 'State_NY', 'State_OH', 'State_OK', 'State_OR', 'State_PA',\n",
    "       'State_RI', 'State_SC', 'State_SD', 'State_TN', 'State_TX', 'State_UT',\n",
    "       'State_VA', 'State_VT', 'State_WA', 'State_WI', 'State_WV', 'State_WY',\n",
    "       'Area Code_408', 'Area Code_415', 'Area Code_510', \"Int'l Plan_no\",\n",
    "       \"Int'l Plan_yes\", 'VMail Plan_no', 'VMail Plan_yes']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Writing ModelConfig\n",
    "\n",
    "A *ModelConfig* object communicates information about your trained model. To avoid additional traffic to your production models, SageMaker Clarify sets up and tears down a dedicated endpoint when processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "model_config = clarify.ModelConfig(model_name=model_name,\n",
    "                                   instance_type='ml.m5.xlarge',\n",
    "                                   instance_count=1,\n",
    "                                   accept_type='text/csv',\n",
    "                                   content_type='text/csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explaining Predictions\n",
    "\n",
    "There are expanding business needs and legislative regulations that require explanations of why a model made the decision it did. SageMaker Clarify uses SHAP to explain the contribution that each input feature makes to the final decision."
   ]
  },
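  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To build some intuition for what a feature contribution means, the sketch below computes exact Shapley values by brute force for a tiny scoring function. Everything in it is an assumption made for illustration: `toy_model`, `shapley_values`, the input and the all-zeros baseline are hypothetical and unrelated to the workshop's XGBoost model. Clarify does not enumerate orderings like this; it uses a Kernel SHAP approximation that scales to many features.\n",
    "\n",
    "```python\n",
    "from itertools import permutations\n",
    "\n",
    "\n",
    "def toy_model(x):\n",
    "    # Hypothetical scoring function with one additive term and one interaction\n",
    "    return 2.0 * x[0] + 1.0 * x[1] * x[2]\n",
    "\n",
    "\n",
    "def shapley_values(model, x, baseline):\n",
    "    # Average each feature's marginal contribution over every feature ordering,\n",
    "    # switching features from their baseline value to their actual value\n",
    "    n = len(x)\n",
    "    phi = [0.0] * n\n",
    "    orderings = list(permutations(range(n)))\n",
    "    for order in orderings:\n",
    "        current = list(baseline)\n",
    "        previous_score = model(current)\n",
    "        for i in order:\n",
    "            current[i] = x[i]\n",
    "            score = model(current)\n",
    "            phi[i] += score - previous_score\n",
    "            previous_score = score\n",
    "    return [value / len(orderings) for value in phi]\n",
    "\n",
    "\n",
    "toy_input = [1.0, 2.0, 3.0]\n",
    "toy_baseline = [0.0, 0.0, 0.0]\n",
    "toy_phi = shapley_values(toy_model, toy_input, toy_baseline)\n",
    "print(toy_phi)\n",
    "```\n",
    "\n",
    "The contributions add up to the difference between the model output for the input and for the baseline, which is why the choice of `baseline` in `SHAPConfig` matters."
   ]
  },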
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df.columns = columns_headers[1:]\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "shap_config = clarify.SHAPConfig(baseline=[df.iloc[0].values.tolist()],\n",
    "                                 num_samples=20,\n",
    "                                 agg_method='mean_abs',\n",
    "                                 save_local_shap_values=False)\n",
    "\n",
    "explainability_output_path = 's3://{}/{}/clarify-explainability'.format(bucket, prefix)\n",
    "\n",
    "explainability_data_config = clarify.DataConfig(s3_data_input_path=s3uri_train,\n",
    "                                s3_output_path=explainability_output_path,\n",
    "                                label='Churn',\n",
    "                                headers=columns_headers,\n",
    "                                dataset_type='text/csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "create_date = lambda: strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
    "experiment_name=f\"customer-churn-explainability-{create_date()}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "with Run(\n",
    "    experiment_name=experiment_name,\n",
    "    run_name=\"explainabilit-run\",  # create a experiment run with only the model explainabilit on it\n",
    "    sagemaker_session=sm_sess,\n",
    ") as run:\n",
    "    clarify_processor.run_explainability(data_config=explainability_data_config,\n",
    "                                         model_config=model_config,\n",
    "                                         explainability_config=shap_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "explainability_output_path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Viewing Clarify reports on SageMaker Studio\n",
    "\n",
    "There's an easy way to check results inside SageMaker Studio instead of parsing raw file on S3. \n",
    "After your clarify job completes:\n",
    "\n",
    "1. Go to Experiments Menu\n",
    "2. Select the run you've just created:\n",
    "\n",
    "![explainability_run.png](./media/10-explainability.png)\n",
    "\n",
    "3. Click on \"Explainability\" on left:\n",
    "\n",
    "![explainability_details.png](./media/20-exp-det.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "# [You can now go to the lab 4-Deployment](../4-Deployment/RealTime/deployment_hosting.ipynb)"
   ]
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   }
  ],
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}