{ "cells": [ { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "# AutoGluon Tabular with SageMaker\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "\n", "[AutoGluon](https://github.com/awslabs/autogluon) automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy deep learning models on tabular, image, and text data.\n", "This notebook shows how to use AutoGluon-Tabular with Amazon SageMaker by creating custom containers." ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "## Prerequisites\n", "\n", "If using a SageMaker hosted notebook, select kernel `conda_mxnet_p36`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "import subprocess\n", "\n", "# Make sure docker compose is set up properly for local mode\n", "subprocess.run(\"./setup.sh\", shell=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# For Studio\n", "subprocess.run(\"apt-get update -y\", shell=True)\n", "subprocess.run(\"apt install unzip\", shell=True)\n", "subprocess.run(\"pip install ipywidgets\", shell=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "import os\n", "import sys\n", "import boto3\n", "import sagemaker\n", "from time import sleep\n", "from collections import Counter\n", "import numpy as np\n", "import pandas as pd\n", "from sagemaker import get_execution_role, local, Model, utils, s3\n", "from sagemaker.estimator import Estimator\n", "from sagemaker.predictor import Predictor\n", "from sagemaker.serializers import CSVSerializer\n", "from sagemaker.deserializers import StringDeserializer\n", "from sklearn.metrics import accuracy_score, classification_report\n", "from IPython.core.display import display, HTML\n", "from IPython.core.interactiveshell import InteractiveShell\n", "\n", "# Print settings\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "pd.set_option(\"display.max_columns\", 500)\n", "pd.set_option(\"display.max_rows\", 10)\n", "\n", "# Account/s3 setup\n", "session = sagemaker.Session()\n", "local_session = local.LocalSession()\n", "bucket = session.default_bucket()\n", "prefix = \"sagemaker/autogluon-tabular\"\n", "region = session.boto_region_name\n", "role = get_execution_role()\n", "client = session.boto_session.client(\n", " \"sts\", region_name=region, endpoint_url=utils.sts_regional_endpoint(region)\n", ")\n", "account = client.get_caller_identity()[\"Account\"]\n", "\n", "registry_uri_training = sagemaker.image_uris.retrieve(\n", " \"mxnet\",\n", " region,\n", " version=\"1.7.0\",\n", " py_version=\"py3\",\n", " instance_type=\"ml.m5.2xlarge\",\n", " image_scope=\"training\",\n", ")\n", "registry_uri_inference = sagemaker.image_uris.retrieve(\n", " 
\"mxnet\",\n", " region,\n", " version=\"1.7.0\",\n", " py_version=\"py3\",\n", " instance_type=\"ml.m5.2xlarge\",\n", " image_scope=\"inference\",\n", ")\n", "ecr_uri_prefix = account + \".\" + \".\".join(registry_uri_training.split(\"/\")[0].split(\".\")[1:])" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Build docker images" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "Build the training/inference image and push to ECR" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "training_algorithm_name = \"autogluon-sagemaker-training\"\n", "inference_algorithm_name = \"autogluon-sagemaker-inference\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, you may want to remove existing docker images to make a room to build autogluon containers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subprocess.run(\"docker system prune -af\", shell=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false", "scrolled": true }, "outputs": [], "source": [ "subprocess.run(\n", " f\"/bin/bash ./container-training/build_push_training.sh {account} {region} {training_algorithm_name} {ecr_uri_prefix} {registry_uri_training.split('/')[0].split('.')[0]} {registry_uri_training}\",\n", " shell=True,\n", ")\n", "subprocess.run(\"docker system prune -af\", shell=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subprocess.run(\n", " f\"/bin/bash ./container-inference/build_push_inference.sh {account} {region} {inference_algorithm_name} {ecr_uri_prefix} {registry_uri_training.split('/')[0].split('.')[0]} {registry_uri_inference}\",\n", " shell=True,\n", ")\n", "subprocess.run(\"docker system prune -af\", shell=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alternative way of building docker images using sm-docker" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The new Amazon SageMaker Studio Image Build convenience package allows data scientists and developers to easily build custom container images from your Studio notebooks via a new CLI. \n", "Newly built Docker images are tagged and pushed to Amazon ECR. \n", "\n", "To use the CLI, you need to ensure the Amazon SageMaker execution role used by your Studio notebook environment (or another AWS Identity and Access Management (IAM) role, if you prefer) has the required permissions to interact with the resources used by the CLI, including access to CodeBuild and Amazon ECR. Your role should have a trust policy with CodeBuild. \n", "\n", "You also need to make sure the appropriate permissions are included in your role to run the build in CodeBuild, create a repository in Amazon ECR, and push images to that repository. \n", "\n", "See also: https://aws.amazon.com/blogs/machine-learning/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# subprocess.run(\"pip install sagemaker-studio-image-build\", shell=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "training_repo_name = training_algorithm_name + ':latest'\n", "training_repo_name \n", "\n", "!sm-docker build . 
"--file ./container-training/Dockerfile.training --build-arg REGISTRY_URI={registry_uri_training}\n", "\n", "inference_repo_name = inference_algorithm_name + ':latest'\n", "inference_repo_name \n", "\n", "!sm-docker build . --repository {inference_repo_name} \\\n", "--file ./container-inference/Dockerfile.inference --build-arg REGISTRY_URI={registry_uri_inference}\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Get the data" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "In this example, we'll use the direct-marketing dataset to build a binary classification model that predicts whether customers will accept or decline a marketing offer. \n", "First, we'll download the data and split it into train and test sets. AutoGluon does not require a separate validation set (it uses bagged k-fold cross-validation)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "# Download the data\n", "subprocess.run(\"mkdir bank-additional\", shell=True)\n", "s3 = boto3.client(\"s3\")\n", "s3.download_file(\n", "    f\"sagemaker-example-files-prod-{region}\",\n", "    \"datasets/tabular/uci_bank_marketing/bank-additional-full.csv\",\n", "    \"bank-additional/bank-additional-full.csv\",\n", ")\n", "\n", "local_data_path = \"./bank-additional/bank-additional-full.csv\"\n", "data = pd.read_csv(local_data_path)\n", "\n", "# Split train/test data\n", "train = data.sample(frac=0.7, random_state=42)\n", "test = data.drop(train.index)\n", "\n", "# Split test X/y\n", "label = \"y\"\n", "y_test = test[label]\n", "X_test = test.drop(columns=[label])" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "##### Check the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "train.head(3)\n", "train.shape\n", "\n", "test.head(3)\n", "test.shape\n", "\n", "X_test.head(3)\n", "X_test.shape" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "Upload the data to S3" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "train_file = \"train.csv\"\n", "train.to_csv(train_file, index=False)\n", "train_s3_path = session.upload_data(train_file, key_prefix=\"{}/data\".format(prefix))\n", "\n", "test_file = \"test.csv\"\n", "test.to_csv(test_file, index=False)\n", "test_s3_path = session.upload_data(test_file, key_prefix=\"{}/data\".format(prefix))\n", "\n", "X_test_file = \"X_test.csv\"\n", "X_test.to_csv(X_test_file, index=False)\n", "X_test_s3_path = session.upload_data(X_test_file, key_prefix=\"{}/data\".format(prefix))" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "## Hyperparameter Selection\n", "\n", "The minimum required setting for training is the target label, `init_args['label']`.\n", "\n", "Additional optional hyperparameters can be passed to the `autogluon.tabular.TabularPredictor.fit` function via `fit_args`.\n", "\n", "Below is a more in-depth example of AutoGluon-Tabular hyperparameters from the tutorial [Predicting Columns in a Table - In Depth](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-indepth.html). Please see [fit parameters](https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html#TabularPredictor) for further information. Note that in order for hyperparameter ranges to work in SageMaker, values passed to the `fit_args['hyperparameters']` must be represented as strings.\n",
"\n", "```python\n", "nn_options = {\n", "    'num_epochs': \"10\",\n", "    'learning_rate': \"ag.space.Real(1e-4, 1e-2, default=5e-4, log=True)\",\n", "    'activation': \"ag.space.Categorical('relu', 'softrelu', 'tanh')\",\n", "    'layers': \"ag.space.Categorical([100],[1000],[200,100],[300,200,100])\",\n", "    'dropout_prob': \"ag.space.Real(0.0, 0.5, default=0.1)\"\n", "}\n", "\n", "gbm_options = {\n", "    'num_boost_round': \"100\",\n", "    'num_leaves': \"ag.space.Int(lower=26, upper=66, default=36)\"\n", "}\n", "\n", "model_hps = {'NN': nn_options, 'GBM': gbm_options}\n", "\n", "init_args = {\n", "    'eval_metric': 'roc_auc',\n", "    'label': 'y'\n", "}\n", "\n", "fit_args = {\n", "    'presets': ['best_quality', 'optimize_for_deployment'],\n", "    'time_limits': 60*10,\n", "    'hyperparameters': model_hps,\n", "    'hyperparameter_tune': True,\n", "    'search_strategy': 'skopt'\n", "}\n", "\n", "hyperparameters = {\n", "    'fit_args': fit_args,\n", "    'feature_importance': True\n", "}\n", "```\n", "**Note:** Your hyperparameter choices may affect the size of the model package, which could result in additional time taken to upload your model and complete training. Including `'optimize_for_deployment'` in the list of `fit_args['presets']` is recommended to greatly reduce upload times.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "# Define required label and optional additional parameters\n", "init_args = {\"label\": \"y\"}\n", "\n", "# Define additional parameters\n", "fit_args = {\n", " # Adding 'best_quality' to presets list will result in better performance (but longer runtime)\n", " \"presets\": [\"optimize_for_deployment\"],\n", "}\n", "\n", "# Pass fit_args to SageMaker estimator hyperparameters\n", "hyperparameters = {\"init_args\": init_args, \"fit_args\": fit_args, \"feature_importance\": True}\n", "\n", "tags = [{\"Key\": \"AlgorithmName\", \"Value\": \"AutoGluon-Tabular\"}]" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "## Train\n", "\n", "For local training set `train_instance_type` to `local` . \n", "For non-local training the recommended instance type is `ml.m5.2xlarge`. \n", "\n", "**Note:** Depending on how many underlying models are trained, `train_volume_size` may need to be increased so that they all fit on disk." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "%%time\n", "\n", "instance_type = \"ml.m5.2xlarge\"\n", "# instance_type = 'local'\n", "\n", "ecr_image = f\"{ecr_uri_prefix}/{training_algorithm_name}:latest\"\n", "\n", "estimator = Estimator(\n", " image_uri=ecr_image,\n", " role=role,\n", " instance_count=1,\n", " instance_type=instance_type,\n", " hyperparameters=hyperparameters,\n", " volume_size=100,\n", " tags=tags,\n", ")\n", "\n", "# Set inputs. Test data is optional, but requires a label column.\n", "inputs = {\"training\": train_s3_path, \"testing\": test_s3_path}\n", "\n", "estimator.fit(inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Review the performance of the trained model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from utils.ag_utils import launch_viewer\n", "\n", "launch_viewer(is_debug=False)" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Create Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "# Create predictor object\n", "class AutoGluonTabularPredictor(Predictor):\n", " def __init__(self, *args, **kwargs):\n", " super().__init__(\n", " *args, serializer=CSVSerializer(), deserializer=StringDeserializer(), **kwargs\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "ecr_image = f\"{ecr_uri_prefix}/{inference_algorithm_name}:latest\"\n", "\n", "if instance_type == \"local\":\n", " model = estimator.create_model(image_uri=ecr_image, role=role)\n", "else:\n", " # model_uri = os.path.join(estimator.output_path, estimator._current_job_name, \"output\", \"model.tar.gz\")\n", " model_uri = estimator.model_data\n", " model = Model(\n", " ecr_image,\n", " model_data=model_uri,\n", " role=role,\n", " sagemaker_session=session,\n", " predictor_cls=AutoGluonTabularPredictor,\n", " )" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Batch Transform" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "For local mode, either `s3:////output/` or `file:///` can be used as outputs.\n", "\n", "By including the label column in the test data, you can also evaluate prediction performance (In this case, passing `test_s3_path` 
instead of `X_test_s3_path`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false", "scrolled": true }, "outputs": [], "source": [ "output_path = f\"s3://{bucket}/{prefix}/output/\"\n", "# output_path = f'file://{os.getcwd()}'\n", "\n", "transformer = model.transformer(\n", " instance_count=1,\n", " instance_type=instance_type,\n", " strategy=\"MultiRecord\",\n", " max_payload=6,\n", " max_concurrent_transforms=1,\n", " output_path=output_path,\n", ")\n", "\n", "transformer.transform(test_s3_path, content_type=\"text/csv\", split_type=\"Line\")\n", "transformer.wait()" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Endpoint" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "##### Deploy remote or local endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "instance_type = \"ml.m5.2xlarge\"\n", "# instance_type = 'local'\n", "\n", "predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "##### Attach to endpoint (or reattach if kernel was restarted)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "# Select standard or local session based on instance_type\n", "if instance_type == \"local\":\n", " sess = local_session\n", "else:\n", " sess = session\n", "\n", "# Attach to endpoint\n", "predictor = AutoGluonTabularPredictor(predictor.endpoint_name, sagemaker_session=sess)" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "##### Predict on unlabeled test data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "results = predictor.predict(X_test.to_csv(index=False)).splitlines()\n", "\n", "# Check output\n", "threshold = 0.5\n", "y_results = np.array([\"yes\" if float(i.split(\",\")[1]) > threshold else \"no\" for i in results])\n", "\n", "print(Counter(y_results))" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "##### Predict on data that includes label column \n", "Prediction performance metrics will be printed to endpoint logs." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "results = predictor.predict(test.to_csv(index=False)).splitlines()\n", "\n", "# Check output\n", "threshold = 0.5\n", "y_results = np.array([\"yes\" if float(i.split(\",\")[1]) > threshold else \"no\" for i in results])\n", "\n", "print(Counter(y_results))" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "##### Check that classification performance metrics match evaluation printed to endpoint logs as expected" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "threshold = 0.5\n", "y_results = np.array([\"yes\" if float(i.split(\",\")[1]) > threshold else \"no\" for i in results])\n", "\n", "print(\"accuracy: {}\".format(accuracy_score(y_true=y_test, y_pred=y_results)))\n", "print(classification_report(y_true=y_test, y_pred=y_results, digits=6))" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "##### Clean up endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explainability with Amazon SageMaker Clarify\n", "\n", "There are growing business needs and legislative regulations that require explainations of why a model made a certain decision. SHAP (SHapley Additive exPlanations) is an approach to explain the output of machine learning models. SHAP values represent a feature's contribution to a change in the model output. SageMaker Clarify uses SHAP to explain the contribution that each input feature makes to the final decision." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Set parameters for SHAP calculation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "seed = 0\n", "num_rows = 500\n", "\n", "# Write a csv file used by SageMaker Clarify\n", "test_explainavility_file = \"test_explainavility.csv\"\n", "train.head(num_rows).to_csv(test_explainavility_file, index=False, header=False)\n", "test_explainavility_s3_path = session.upload_data(\n", " test_explainavility_file, key_prefix=\"{}/data\".format(prefix)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Specify computing resources" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker import clarify\n", "\n", "model_name = estimator.latest_training_job.job_name\n", "container_def = model.prepare_container_def()\n", "session.create_model(model_name, role, container_def)\n", "\n", "clarify_processor = clarify.SageMakerClarifyProcessor(\n", " role=role, instance_count=1, instance_type=\"ml.c4.xlarge\", sagemaker_session=session\n", ")\n", "model_config = clarify.ModelConfig(\n", " model_name=model_name, instance_type=\"ml.c5.xlarge\", instance_count=1, accept_type=\"text/csv\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Run a SageMaker Clarify job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "shap_config = clarify.SHAPConfig(\n", " baseline=X_test.sample(15, random_state=seed).values.tolist(),\n", " num_samples=100,\n", " agg_method=\"mean_abs\",\n", ")\n", "\n", "explainability_output_path = \"s3://{}/{}/{}/clarify-explainability\".format(\n", " bucket, prefix, model_name\n", ")\n", 
"explainability_data_config = clarify.DataConfig(\n", " s3_data_input_path=test_explainavility_s3_path,\n", " s3_output_path=explainability_output_path,\n", " label=\"y\",\n", " headers=train.columns.to_list(),\n", " dataset_type=\"text/csv\",\n", ")\n", "\n", "predictions_config = clarify.ModelPredictedLabelConfig(probability_threshold=0.5)\n", "\n", "clarify_processor.run_explainability(\n", " data_config=explainability_data_config,\n", " model_config=model_config,\n", " explainability_config=shap_config,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### View the Explainability Report" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can view the explainability report in Studio under the experiments tab. If you're not a Studio user yet, as with the Bias Report, you can access this report at the following S3 bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subprocess.run(f\"aws s3 cp {explainability_output_path} . --recursive\", shell=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Global explanatory methods allow understanding the model and its feature contributions in aggregate over multiple datapoints. Here we show an aggregate bar plot that plots the mean absolute SHAP value for each feature." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subprocess.run(f\"{sys.executable} -m pip install shap\", shell=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Compute global shap values out of out.csv" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "shap_values_ = pd.read_csv(\"explanations_shap/out.csv\")\n", "shap_values_.abs().mean().to_dict()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num_features = len(train.head(num_rows).drop([\"y\"], axis=1).columns)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import shap\n", "\n", "shap_values = [shap_values_.to_numpy()[:, :num_features], shap_values_.to_numpy()[:, num_features:]]\n", "shap.summary_plot(\n", " shap_values,\n", " plot_type=\"bar\",\n", " feature_names=train.head(num_rows).drop([\"y\"], axis=1).columns.tolist(),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The detailed summary plot below can provide more context over the above bar chart. It tells which features are most important and, in addition, their range of effects over the dataset. The color allows us to match how changes in the value of a feature effect the change in prediction. The 'red' indicates higher value of the feature and 'blue' indicates lower (normalized over the features)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "shap.summary_plot(\n", " shap_values_[shap_values_.columns[20:]].to_numpy(), train.head(num_rows).drop([\"y\"], axis=1)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/advanced_functionality|autogluon-tabular|AutoGluon_Tabular_SageMaker.ipynb)\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": 
"General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": 
false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { 
"_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "kernelspec": { "display_name": "Python 3 (MXNet 1.9 Python 3.8 CPU Optimized)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/mxnet-1.9-cpu-py38-ubuntu20.04-sagemaker-v1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }