{ "cells": [ { "cell_type": "markdown", "id": "39191856-3377-4476-b8d0-c49bb5b0741f", "metadata": { "tags": [] }, "source": [ "# JSON Support with SageMaker Clarify" ] }, { "cell_type": "markdown", "id": "96b4c525", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "5bfb5ea5-afe9-44a1-99eb-edd5602b8c72", "metadata": {}, "source": [ "## Contents\n", "\n", "1. [Overview](#Overview)\n", "1. [Prerequisites and Data](#Prerequisites-and-Data)\n", "   1. [Initialize SageMaker](#Initialize-SageMaker)\n", "   1. [Download Data](#Download-Data)\n", "   1. [Loading the data](#Loading-the-data)\n", "1. [Preprocessing](#Preprocessing)\n", "   1. [Encoding](#Encoding)\n", "   1. [JSON Dataset](#JSON-Dataset)\n", "   1. [Upload Dataset to s3](#Upload-Dataset-to-s3)\n", "1. [Training](#Training)\n", "1. [Deploy Model](#Deploy-Model)\n", "   1. [[Optional] Verifying JSON Inferences](#[Optional]-Verifying-JSON-Inferences)\n", "1. [SageMaker Clarify](#SageMaker-Clarify)\n", "   1. [DataConfig](#DataConfig)\n", "   1. [ModelConfig](#ModelConfig)\n", "   1. [ModelPredictedLabelConfig](#ModelPredictedLabelConfig)\n", "   1. [BiasConfig](#BiasConfig)\n", "   1. [ExplainabilityConfig](#ExplainabilityConfig)\n", "   1. [Run SageMaker Clarify Processing Job](#Run-SageMaker-Clarify-Processing-Job)\n", "1. [Viewing SageMaker Clarify Results](#Viewing-SageMaker-Clarify-Results)\n", "   1. [1. SageMaker Studio Experiments](#1.-SageMaker-Studio-Experiments)\n", "   1. [2. 
Downloading SageMaker Clarify Output Files](#2.-Downloading-SageMaker-Clarify-Output-Files)\n", "1. [Cleanup](#Cleanup)" ] }, { "cell_type": "markdown", "id": "d135d49b-ac98-476a-99a8-f28f30cc3d1a", "metadata": { "tags": [] }, "source": [ "## Overview\n", "\n", "Amazon SageMaker Clarify helps improve your machine learning models by detecting potential bias and helping explain how these models make predictions. The fairness and explainability functionality provided by SageMaker Clarify takes a step towards enabling AWS customers to build trustworthy and understandable machine learning models. It provides tools to help you with the following tasks:\n", "\n", "- Measure biases that can occur during each stage of the ML lifecycle (data collection, model training and tuning, and monitoring of ML models deployed for inference).\n", "- Generate model governance reports targeting risk and compliance teams and external regulators.\n", "- Provide explanations of the data, models, and monitoring used to assess predictions for input containing data of various modalities like numerical data, categorical data, text, and images.\n", "\n", "Learn more about SageMaker Clarify [here](https://aws.amazon.com/sagemaker/clarify/). This sample notebook walks you through:\n", "\n", "1. Key terms and concepts needed to understand SageMaker Clarify\n", "2. The incremental updates required to prepare a model for bias measurement and explainability\n", "   - Preprocessing, Training, Model Deployment\n", "3. Measuring the pre-training bias of a dataset and post-training bias of a model\n", "4. Explaining the importance of the input features for the model's decisions\n", "5. 
In particular, this will showcase:\n", " - Clarify's support for JSON input datasets, and a model with JSON inputs and outputs\n", "\n", "In doing so, the notebook will first train a [Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) model using a training dataset, then use SageMaker Clarify to analyze a test dataset with JSON as the model input/output." ] }, { "cell_type": "markdown", "id": "7bb2c1d5-aa5a-4132-ac08-519a6459ca7d", "metadata": { "tags": [] }, "source": [ "## Prerequisites and Data\n", "\n", "If you are using SageMaker Notebook Instances, please use the `conda_python3` kernel.\n", "\n", "### Initialize SageMaker" ] }, { "cell_type": "code", "execution_count": null, "id": "59782829-b702-4ae7-921d-be9e20f6970b", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Upgrade dependencies\n", "!pip install sagemaker botocore boto3 awscli --upgrade" ] }, { "cell_type": "code", "execution_count": null, "id": "73e74a91-795f-4339-a508-07f9a32e2576", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "import pandas as pd\n", "\n", "# SageMaker session bucket is used to upload the dataset, model and model training logs\n", "sess = sagemaker.Session()\n", "region = sess.boto_region_name\n", "bucket = sess.default_bucket()\n", "print(f\"Bucket: {bucket}\")\n", "\n", "# Define the IAM role\n", "role = sagemaker.get_execution_role()\n", "print(f\"Execution Role: {role}\")" ] }, { "cell_type": "markdown", "id": "e9ad980b-fe10-41b3-87df-0bc89dacfcb2", "metadata": {}, "source": [ "### Download Data\n", "\n", "We use the popular [Adult Census Dataset](http://archive.ics.uci.edu/ml/datasets/Adult) from the UCI Machine Learning Repository$^{[1]}$. 
The data is already split between a training dataset (adult.data) and test dataset (adult.test) in the [Data Folder](https://archive.ics.uci.edu/ml/machine-learning-databases/adult/).\n", "\n", "Both dataset files are in CSV format, and we download them below from a public S3 bucket.\n", "\n", "$^{[1]}$Dua, D. and Graff, C. (2019). [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml). Irvine, CA: University of California, School of Information and Computer Science." ] }, { "cell_type": "code", "execution_count": null, "id": "76cd878c-1fe9-4f25-861f-a55e52770424", "metadata": {}, "outputs": [], "source": [ "import os\n", "import boto3\n", "\n", "s3_client = boto3.client(\"s3\")\n", "\n", "# Training dataset\n", "if not os.path.isfile(\"adult.data\"):\n", "    s3_client.download_file(\n", "        f\"sagemaker-example-files-prod-{region}\",\n", "        \"datasets/tabular/uci_adult/adult.data\",\n", "        \"adult.data\",\n", "    )\n", "    print(\"adult.data saved!\")\n", "else:\n", "    print(\"adult.data already on disk.\")\n", "\n", "# Test dataset\n", "if not os.path.isfile(\"adult.test\"):\n", "    s3_client.download_file(\n", "        f\"sagemaker-example-files-prod-{region}\",\n", "        \"datasets/tabular/uci_adult/adult.test\",\n", "        \"adult.test\",\n", "    )\n", "    print(\"adult.test saved!\")\n", "else:\n", "    print(\"adult.test already on disk.\")" ] }, { "cell_type": "markdown", "id": "9d3dd1e8-b4e0-4233-864b-928ce93cc25f", "metadata": { "tags": [] }, "source": [ "### Loading the data\n", "\n", "This dataset from the UCI Machine Learning Repository contains 14 features describing demographic characteristics. It has 48,842 rows in total (32,561 for training and 16,281 for testing). After rows with missing values are dropped, as we do below, 45,222 rows remain (30,162 for training and 15,060 for testing). The task is to predict whether a person has a yearly income that is more or less than $50,000.\n", "\n", "Here are the features and their possible values. Categorical values are listed; continuous columns are noted as such:\n", "\n", "1. **Age**: continuous.\n", "2. 
**Workclass**: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.\n", "3. **Fnlwgt**: continuous (the number of people the census takers believe that observation represents).\n", "4. **Education**: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.\n", "5. **Education-num**: continuous.\n", "6. **Marital-status**: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.\n", "7. **Occupation**: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.\n", "8. **Relationship**: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.\n", "9. **Ethnic group**: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.\n", "10. **Sex**: Female, Male.\n", "    - **Note**: this data is extracted from the 1994 Census and enforces a binary option on Sex.\n", "11. **Capital-gain**: continuous.\n", "12. **Capital-loss**: continuous.\n", "13. **Hours-per-week**: continuous.\n", "14. **Native-country**: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.\n", "\n", "The target label is the last column in the CSV files:\n", "\n", "1. 
**Target**: <=50K, >50K (a yearly income of at most, or more than, 50,000 USD)." ] }, { "cell_type": "code", "execution_count": null, "id": "23da403b-426b-48ea-8d81-c55d88dc4577", "metadata": {}, "outputs": [], "source": [ "# Columns of the dataset, in order, including the Target label\n", "adult_columns = [\n", "    \"Age\",\n", "    \"Workclass\",\n", "    \"fnlwgt\",\n", "    \"Education\",\n", "    \"Education-Num\",\n", "    \"Marital Status\",\n", "    \"Occupation\",\n", "    \"Relationship\",\n", "    \"Ethnic group\",\n", "    \"Sex\",\n", "    \"Capital Gain\",\n", "    \"Capital Loss\",\n", "    \"Hours per week\",\n", "    \"Country\",\n", "    \"Target\",\n", "]\n", "\n", "# Load the CSV dataset files\n", "training_data = pd.read_csv(\n", "    \"adult.data\", names=adult_columns, sep=r\"\\s*,\\s*\", engine=\"python\", na_values=\"?\"\n", ").dropna()\n", "testing_data = pd.read_csv(\n", "    \"adult.test\", names=adult_columns, sep=r\"\\s*,\\s*\", engine=\"python\", na_values=\"?\", skiprows=1\n", ").dropna()\n", "\n", "display(training_data.head())\n", "display(testing_data.head())" ] }, { "cell_type": "markdown", "id": "edcfd68b-0e7d-4b7b-8642-d60e2fedc9af", "metadata": {}, "source": [ "## Preprocessing\n", "\n", "The data needs to be converted into a format our [Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) model can process.\n", "\n", "### Encoding\n", "\n", "Machine learning models that process categorical columns (e.g. Occupation) typically require them to be encoded as numerical values, because the model operates on numbers rather than on category strings.\n", "\n", "Below, we take all categorical columns of the training and test datasets and encode (map) them to integers from 0 to the number of categories minus 1 for the respective column. We also move the label column (`Target`) to the first position, because Linear Learner expects the label in the first column of CSV training data."
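, "\n",
 "\n",
 "Here is a small sketch of what the encoding does. It uses hypothetical toy data rather than the Adult dataset. `LabelEncoder` assigns integers in the sorted order of the category strings, so `Female` becomes `0` and `<=50K` becomes `0`. An encoder fitted on the training data can also be reused with `transform()`, so that every category keeps the same code in the test data.\n",
 "\n",
 "```python\n",
 "import pandas as pd\n",
 "from sklearn import preprocessing\n",
 "\n",
 "# Hypothetical toy data; the notebook itself encodes the Adult columns\n",
 "train = pd.DataFrame({\"Sex\": [\"Male\", \"Female\", \"Male\"], \"Target\": [\"<=50K\", \">50K\", \"<=50K\"]})\n",
 "test = pd.DataFrame({\"Sex\": [\"Female\", \"Male\"], \"Target\": [\">50K\", \"<=50K\"]})\n",
 "\n",
 "encoders = {}\n",
 "for column in train.columns:\n",
 "    # Fit on the training data only; classes_ holds the categories in sorted order\n",
 "    encoders[column] = preprocessing.LabelEncoder()\n",
 "    train[column] = encoders[column].fit_transform(train[column])\n",
 "    # Reuse the fitted encoder so the test data gets identical codes\n",
 "    test[column] = encoders[column].transform(test[column])\n",
 "\n",
 "print(encoders[\"Sex\"].classes_.tolist())\n",
 "print(train.values.tolist())\n",
 "print(test.values.tolist())\n",
 "```"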
] }, { "cell_type": "code", "execution_count": null, "id": "b3ba461a-e2ad-4c8f-b400-81c7e384047c", "metadata": {}, "outputs": [], "source": [ "from sklearn import preprocessing\n", "\n", "\n", "def number_encode_features(df, encoders=None):\n", "    \"\"\"Encodes non-numerical categorical columns to numerical labels, in place.\n", "\n", "    This is because our model expects encoded categories.\n", "\n", "    :param encoders: Optional mapping from column to an already fitted LabelEncoder. Pass the\n", "        encoders returned for the training data so that the test data gets the same mapping.\n", "    :return: Mapping from column to the LabelEncoder, which can be used to reverse transform the\n", "        encoded categories to original categories if needed, or to map a category to its encoding.\n", "    \"\"\"\n", "    fit = encoders is None\n", "    if fit:\n", "        encoders = {}\n", "    for column in df.columns:\n", "        if df.dtypes[column] == object:\n", "            values = df[column].fillna(\"None\")\n", "            if fit:\n", "                encoders[column] = preprocessing.LabelEncoder()\n", "                df[column] = encoders[column].fit_transform(values)\n", "            else:\n", "                df[column] = encoders[column].transform(values)\n", "    return encoders\n", "\n", "\n", "# Our Linear Learner model expects the label to be in the first column\n", "training_data = pd.concat([training_data[\"Target\"], training_data.drop([\"Target\"], axis=1)], axis=1)\n", "encoders = number_encode_features(training_data)\n", "training_data.to_csv(\"train_data.csv\", index=False, header=False)\n", "\n", "# We will use the testing_data dataset to run the Clarify job later on. This dataset includes\n", "# labels already for each record. Clarify would also work if labels are omitted from the\n", "# input dataset, and will instead invoke the model endpoint for inference requests to get the label.\n", "testing_data = pd.concat([testing_data[\"Target\"], testing_data.drop([\"Target\"], axis=1)], axis=1)\n", "# adult.test writes its labels with a trailing period (\"<=50K.\", \">50K.\"), so strip it, then\n", "# reuse the training encoders so that each category maps to the same integer in both datasets\n", "testing_data[\"Target\"] = testing_data[\"Target\"].str.rstrip(\".\")\n", "_ = number_encode_features(testing_data, encoders)\n", "testing_data.to_csv(\"test_data.csv\", index=False, header=False)\n", "test_features = testing_data.drop([\"Target\"], axis=1)\n", "\n", "display(training_data.head(15))\n", "display(testing_data.head())" ] }, { "cell_type": "markdown", "id": "58669bc8-1830-480b-ae3f-140d0ef40fcc", "metadata": {}, "source": [ "### JSON Dataset\n", "\n", "Because this notebook showcases Clarify's support for JSON input datasets, we convert each DataFrame into a JSON array that contains one object of column-value pairs per record." ] }, { "cell_type": "code", "execution_count": null, "id": "fa71aa8c-fb07-41de-af6d-25bae93dc90f", "metadata": {}, "outputs": [], "source": [ "import json\n", "import numpy as np\n", "\n", "\n", "class NpEncoder(json.JSONEncoder):\n", "    \"\"\"JSON encoder that converts NumPy scalars and arrays to native Python types.\"\"\"\n", "\n", "    def default(self, obj):\n", "        if isinstance(obj, np.integer):\n", "            return int(obj)\n", "        if isinstance(obj, np.floating):\n", "            return float(obj)\n", "        if isinstance(obj, np.ndarray):\n", "            return obj.tolist()\n", "        return super(NpEncoder, self).default(obj)\n", "\n", "\n", "def csv_to_json(df, output):\n", "    res = []\n", "    for i, row in df.iterrows():\n", "        r = {}\n", "        for col in adult_columns:\n", "            r[col] = row[col]\n", "        res.append(r)\n", "    with open(output, \"w\") as f:\n", "        json.dump(res, f, indent=4, cls=NpEncoder)\n", "    return res\n", "\n", "\n", "json_training_data = csv_to_json(training_data, \"adult.data.json\")\n", "json_test_data = csv_to_json(testing_data, \"adult.test.json\")\n", "print(json_training_data[:5])\n", "print(json_test_data[:5])" ] }, { "cell_type": "markdown", "id": "fe9c4a5c-3d49-4f64-9ea0-9dd5ea4ec6ee", "metadata": { "tags": [] }, "source": [ "### Upload Dataset to s3\n", "\n", "Let's upload the 
datasets we preprocessed above to S3, and specify the model artifacts output location as well." ] }, { "cell_type": "code", "execution_count": null, "id": "6284ce9c-510a-4a1a-88c9-2177a7c576f8", "metadata": {}, "outputs": [], "source": [ "from sagemaker.s3 import S3Uploader\n", "\n", "# S3 key prefix for the datasets\n", "prefix = \"sagemaker/DEMO-sagemaker-clarify-json-e2e\"\n", "\n", "s3_train_data = \"s3://{}/{}/train\".format(bucket, prefix)\n", "s3_test_data = \"s3://{}/{}/test\".format(bucket, prefix)\n", "\n", "# Linear Learner model can take CSV files as input for training\n", "train_uri = S3Uploader.upload(\"train_data.csv\", s3_train_data)\n", "test_data_uri = S3Uploader.upload(\"test_data.csv\", s3_test_data)\n", "\n", "# Linear Learner inference can use JSON, so we also upload the JSON datasets to showcase\n", "# using Clarify with both a JSON input dataset and JSON model input/output\n", "json_train_uri = S3Uploader.upload(\"adult.data.json\", s3_train_data)\n", "json_test_data_uri = S3Uploader.upload(\"adult.test.json\", s3_test_data)\n", "\n", "print(f\"Saved training data to {s3_train_data}\")\n", "print(f\"Saved test data to {s3_test_data}\")\n", "\n", "# Model artifacts output location\n", "s3_model_output_location = \"s3://{}/{}/output\".format(bucket, prefix)\n", "print(f\"Model artifacts will be saved to {s3_model_output_location}\")" ] }, { "cell_type": "markdown", "id": "f2730734-0754-4f17-b94b-0890b6d159ea", "metadata": {}, "source": [ "## Training\n", "\n", "Now that we have our datasets, we can train a Linear Learner model.\n", "\n", "First, we fetch the Linear Learner container image that will be used to train the model."
] }, { "cell_type": "code", "execution_count": null, "id": "e4c71115-ac16-4ae6-863b-f88bb25f8d31", "metadata": {}, "outputs": [], "source": [ "container = sagemaker.image_uris.retrieve(region=region, framework=\"linear-learner\")\n", "print(\"Using SageMaker Linear Learner container: {} ({})\".format(container, region))" ] }, { "cell_type": "markdown", "id": "f73221d0-46a9-4633-9381-82cc22dad7a5", "metadata": {}, "source": [ "Then we set up our [Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) with the Linear Learner container, the instance type and instance count for the training job, and the hyperparameters for the algorithm. See the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/ll_hyperparameters.html) for hyperparameter details and tuning.\n", "\n", "The training job reads the training dataset from the S3 URI we uploaded it to above. Optionally, Linear Learner can also take a validation dataset for calibration and a test dataset for a test score.\n", "\n", "Next, we trigger the training job, which may take a few minutes."
] }, { "cell_type": "code", "execution_count": null, "id": "b775a002-137b-49d8-9262-57cab74aa7b5", "metadata": { "tags": [] }, "outputs": [], "source": [ "linear = sagemaker.estimator.Estimator(\n", "    container,\n", "    role,\n", "    instance_count=1,\n", "    instance_type=\"ml.c4.xlarge\",\n", "    output_path=s3_model_output_location,\n", "    sagemaker_session=sess,\n", ")\n", "# See Linear Learner documentation for hyperparameters\n", "linear.set_hyperparameters(\n", "    feature_dim=len(adult_columns) - 1,\n", "    predictor_type=\"binary_classifier\",\n", "    mini_batch_size=200,\n", ")\n", "\n", "\n", "# Linear Learner uses CSV datasets for training\n", "train_data = sagemaker.inputs.TrainingInput(\n", "    train_uri,\n", "    distribution=\"FullyReplicated\",\n", "    content_type=\"text/csv\",\n", "    s3_data_type=\"S3Prefix\",\n", ")\n", "# Optional: add test dataset for test score\n", "test_data = sagemaker.inputs.TrainingInput(\n", "    test_data_uri,\n", "    distribution=\"FullyReplicated\",\n", "    content_type=\"text/csv\",\n", "    s3_data_type=\"S3Prefix\",\n", ")\n", "linear.fit({\"train\": train_data, \"test\": test_data})" ] }, { "cell_type": "markdown", "id": "493caeb0-0d0c-4677-ba90-cc59ed18f646", "metadata": { "tags": [] }, "source": [ "## Deploy Model\n", "\n", "Let's deploy the model and spin up an endpoint for inferences.\n", "\n", "**Note**: You could also just create the model with [`create_model()`](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase.create_model) and skip the inference validation steps below. When you run the SageMaker Clarify processing job, a shadow (temporary) endpoint can then be spun up for the duration of the job.\n", "\n", "**Note**: If you do not explicitly specify a `model_name`, the model name is a concatenation of the container name and a timestamp, which should be the same as the endpoint name."
] }, { "cell_type": "code", "execution_count": null, "id": "8d224a9d-47de-4cf3-8141-0a9252a54158", "metadata": { "tags": [] }, "outputs": [], "source": [ "predictor = linear.deploy(initial_instance_count=1, instance_type=\"ml.m4.xlarge\")\n", "print(f\"\\nEndpoint name: {predictor.endpoint_name}\")" ] }, { "cell_type": "markdown", "id": "327fe62b-a2e4-411c-b049-0479531c94e2", "metadata": {}, "source": [ "### [Optional] Verifying JSON Inferences\n", "\n", "This notebook showcases a model that takes JSON input and returns JSON output for inference requests. Below, we verify that our model can indeed handle JSON inference requests and return JSON responses. The inference request follows the [common structure for JSON](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html) and the response follows the [documented formats](https://docs.aws.amazon.com/sagemaker/latest/dg/LL-in-formats.html).\n", "\n", "You should see the prediction results below, containing a list of scores and predicted labels. We also display the ground-truth labels to sanity check the model's predictions. The predictions are not very accurate for this model and dataset, possibly due to data quality or other factors, but we won't dive into that here."
] }, { "cell_type": "code", "execution_count": null, "id": "05450d65-92fd-49ad-8463-1215d5391ddb", "metadata": {}, "outputs": [], "source": [ "import json\n", "from sagemaker.serializers import JSONSerializer\n", "from sagemaker.deserializers import JSONDeserializer\n", "\n", "# Serializer for model input, deserializer for model output\n", "predictor.serializer = JSONSerializer()\n", "predictor.deserializer = JSONDeserializer()\n", "\n", "instances = []\n", "for i, row in test_features[:20].iterrows():\n", "    instances.append({\"features\": row.tolist()})\n", "\n", "response = predictor.predict({\"instances\": instances})\n", "print(json.dumps(response, indent=2))\n", "display(testing_data.head(20))" ] }, { "cell_type": "markdown", "id": "51194d66-40ae-4833-959b-180b4ab8cabb", "metadata": { "tags": [] }, "source": [ "## SageMaker Clarify\n", "\n", "Now, let's use SageMaker Clarify to measure pre-training bias metrics on the dataset and post-training bias metrics on the model, and to explain the importance of the dataset's features for the model's decisions!\n", "\n", "#### Initialize SageMaker Clarify Processor\n", "\n", "We initialize a [`SageMakerClarifyProcessor`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.SageMakerClarifyProcessor) and specify the instance count and type for the Clarify job. (**Note**: increasing the instance count allows for [parallel processing with Spark](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-processing-job-run.html#clarify-processing-job-run-spark).)\n", "\n", "Then, we can specify various configurations for the Clarify job using Config objects from the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk). Alternatively, you can specify an [`analysis_config.json` via the `ProcessingInput` and `ProcessingOutput` APIs](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-processing-job-configure-parameters.html)."
] }, { "cell_type": "code", "execution_count": null, "id": "af18b303-a609-414f-8d08-29e3f124b86a", "metadata": {}, "outputs": [], "source": [ "from sagemaker import clarify\n", "\n", "clarify_processor = clarify.SageMakerClarifyProcessor(\n", "    role=role, instance_count=1, instance_type=\"ml.m5.xlarge\", sagemaker_session=sess\n", ")" ] }, { "cell_type": "markdown", "id": "3a4eb4e0-d020-4d7b-ba3f-bae1ca81cffa", "metadata": { "tags": [] }, "source": [ "#### DataConfig\n", "\n", "The [`DataConfig`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.DataConfig) communicates information about the data I/O to SageMaker Clarify. We specify where to find the input dataset, where to store the output, the target column (`label`), the header names, and the dataset type. Check the documentation for supported dataset types.\n", "\n", "Here we set the input dataset to the JSON dataset we created earlier. JSON datasets require `headers` to have the `label` as the last element, and a `features` JMESPath expression that extracts the feature values from the dataset. The result must be a 2D array of feature values; see the `DataConfig` documentation for more details."
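, "\n",
 "\n",
 "The sketch below makes the `features` and `label` expressions concrete. It computes in plain Python what they select from two hypothetical records with a reduced set of columns; it is an illustration only, not how Clarify evaluates JMESPath. It also shows that `json.dumps` already produces a double-quoted column list, an alternative to the single-quote replacement used in the next cell.\n",
 "\n",
 "```python\n",
 "import json\n",
 "\n",
 "# Hypothetical records shaped like adult.test.json (fewer columns, for brevity)\n",
 "records = [\n",
 "    {\"Age\": 25, \"Sex\": 1, \"Hours per week\": 40, \"Target\": 0},\n",
 "    {\"Age\": 38, \"Sex\": 0, \"Hours per week\": 50, \"Target\": 1},\n",
 "]\n",
 "feature_columns = [\"Age\", \"Sex\", \"Hours per week\"]\n",
 "\n",
 "# json.dumps emits double-quoted names, which JMESPath reads as quoted identifiers\n",
 "features_jmespath = \"[*]\" + json.dumps(feature_columns)\n",
 "print(features_jmespath)\n",
 "\n",
 "# What the [*][...] multiselect picks: one list of feature values per record (a 2D array)\n",
 "features = [[record[column] for column in feature_columns] for record in records]\n",
 "# What [*].Target picks: one label per record\n",
 "labels = [record[\"Target\"] for record in records]\n",
 "\n",
 "print(features)\n",
 "print(labels)\n",
 "```"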
] }, { "cell_type": "code", "execution_count": null, "id": "0ff656e3-9afc-421f-b746-8b3e8984b917", "metadata": {}, "outputs": [], "source": [ "# SageMaker Clarify analysis output location\n", "clarify_s3_output_location = \"s3://{}/{}/clarify/output\".format(bucket, prefix)\n", "print(f\"Clarify output will be saved to {clarify_s3_output_location}\")\n", "\n", "# For JSON/JSON Lines, the label header should be the last entry in the `headers` field\n", "headers = training_data.columns[1:].to_list() + [training_data.columns[0]]\n", "print(headers)\n", "\n", "# Note: in JMESPath, double quotes `\"\"` delimit field names (quoted identifiers), while single\n", "# quotes `''` create string literals. Python's list repr uses single quotes, so we convert them\n", "# to double quotes to get a valid JMESPath multiselect expression\n", "features_jmespath = f\"[*]{adult_columns[0:-1]}\".replace(\"'\", '\"')\n", "print(features_jmespath)\n", "\n", "data_config = clarify.DataConfig(\n", "    s3_data_input_path=json_test_data_uri,\n", "    s3_output_path=clarify_s3_output_location,\n", "    features=features_jmespath,\n", "    label=\"[*].Target\",\n", "    headers=headers,\n", "    dataset_type=\"application/json\",  # JSON input dataset\n", ")" ] }, { "cell_type": "markdown", "id": "c1152c09-925b-473a-a401-a67ca57756e7", "metadata": { "tags": [] }, "source": [ "#### ModelConfig\n", "\n", "The [`ModelConfig`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.ModelConfig) communicates information about your trained model. Here we specify the `endpoint_name` of the inference endpoint we deployed above. 
If you instead created the model with [`create_model()`](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase.create_model), you need to specify the `model_name`, `instance_count`, and `instance_type` so that SageMaker Clarify can spin up a shadow endpoint for the Clarify job (useful if you want to avoid sending additional traffic to a production endpoint).\n", "\n", "We also specify the model's input (`content_type`) and output (`accept_type`) formats. This is where SageMaker Clarify's support for JSON-based model I/O comes in. Check the AWS or [Python SDK](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.ModelConfig) documentation for all supported model input and output types.\n", "\n", "In addition, we set a `content_template`, which describes the outer structure of the JSON request sent to the model, and a `record_template`, which describes the JSON structure of each record. Again, see the AWS or [Python SDK documentation](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.ModelConfig) for more details on these parameters, including concrete examples. 
For our model, we will want to construct JSON that looks like this (the [common inference format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html)):\n", "\n", "```json\n", "{\n", " \"instances\": [\n", " // First record\n", " {\"features\": [ ]},\n", " // Second record\n", " {\"features\": [ ]}\n", " ...\n", " ]\n", "}\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "9108ebe9-d7b2-4f07-a5a5-76fa5f72815c", "metadata": {}, "outputs": [], "source": [ "model_config = clarify.ModelConfig(\n", " endpoint_name=predictor.endpoint_name, # Reuse model endpoint created above\n", " content_type=\"application/json\", # Model input format\n", " accept_type=\"application/json\", # Model output format\n", " content_template='{\"instances\":$records}', # Outer JSON structure\n", " record_template='{\"features\":$features}', # per record is just a list of features\n", ")" ] }, { "cell_type": "markdown", "id": "00ef76f3-c07f-4756-8674-5055aa820699", "metadata": {}, "source": [ "#### ModelPredictedLabelConfig\n", "\n", "The [`ModelPredictedLabelConfig`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.ModelPredictedLabelConfig) provides information on the format of the model prediction outputs. For our use case, we specify how to find the `label` and `probability` score in the predictions.\n", "\n", "If you ran the [[Optional] Verifying JSON Inferences](#[Optional]-Verifying-JSON-Inferences) section above, you can see for each inference, the model outputs a JSON structure with a `predicted_label` and `score`. Since it's a list of predictions, we use the expression below to fetch the fields for each prediction. See the [`ModelPredictedLabelConfig documentation`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.ModelPredictedLabelConfig) for more details." 
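, "\n",
 "\n",
 "To show how the templates and expressions fit together, the sketch below walks through one request/response round trip in plain Python. It only mimics the behavior they describe. The `build_request` helper and the response values are hypothetical; they are not Clarify's implementation or real model output. Each record's feature list fills the placeholder in `record_template`, and the joined records fill the placeholder in `content_template`. The two expressions then pick one label and one score per prediction from a response in the Linear Learner JSON output format.\n",
 "\n",
 "```python\n",
 "import json\n",
 "\n",
 "content_template = '{\"instances\":$records}'\n",
 "record_template = '{\"features\":$features}'\n",
 "\n",
 "\n",
 "def build_request(feature_rows):\n",
 "    \"\"\"Hypothetical helper: fills the templates the way the ModelConfig above describes.\"\"\"\n",
 "    records = [record_template.replace(\"$features\", json.dumps(row)) for row in feature_rows]\n",
 "    return content_template.replace(\"$records\", \"[\" + \",\".join(records) + \"]\")\n",
 "\n",
 "\n",
 "request_body = build_request([[25, 4, 40], [38, 2, 50]])\n",
 "print(request_body)\n",
 "\n",
 "# Hypothetical response in the documented Linear Learner JSON output format\n",
 "response = json.loads(\n",
 "    '{\"predictions\": [{\"score\": 0.12, \"predicted_label\": 0}, {\"score\": 0.81, \"predicted_label\": 1}]}'\n",
 ")\n",
 "# Plain-Python equivalents of predictions[*].predicted_label and predictions[*].score\n",
 "labels = [prediction[\"predicted_label\"] for prediction in response[\"predictions\"]]\n",
 "scores = [prediction[\"score\"] for prediction in response[\"predictions\"]]\n",
 "print(labels, scores)\n",
 "```"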
] }, { "cell_type": "code", "execution_count": null, "id": "d898660f-c68c-4875-874d-417a77859bdf", "metadata": {}, "outputs": [], "source": [ "predictions_config = clarify.ModelPredictedLabelConfig(\n", "    label=\"predictions[*].predicted_label\", probability=\"predictions[*].score\"\n", ")" ] }, { "cell_type": "markdown", "id": "2fb3c4f3-1368-4d87-a9cb-b8c9bc746bbd", "metadata": {}, "source": [ "#### BiasConfig\n", "\n", "To measure bias metrics, we specify a [`BiasConfig`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.BiasConfig). It describes the sensitive column (`facet_name`), the values of that column that identify the sensitive group (`facet_values_or_threshold`), and the desirable (positive) outcomes (`label_values_or_threshold`).\n", "\n", "For our dataset, the positive outcome is an income >$50K, which we encoded as `1`. `Sex` is the sensitive column and `Female` respondents (encoded as `0`) are the sensitive group. The optional `group_name` names the column used to form subgroups for the conditional demographic disparity metrics; here we use `Age`." ] }, { "cell_type": "code", "execution_count": null, "id": "3ea4f0bc-8f7b-4f89-9913-059ff67f84a9", "metadata": {}, "outputs": [], "source": [ "bias_config = clarify.BiasConfig(\n", "    label_values_or_threshold=[1], facet_name=\"Sex\", facet_values_or_threshold=[0], group_name=\"Age\"\n", ")" ] }, { "cell_type": "markdown", "id": "e6348249-908d-4895-9b9f-2e8a78910810", "metadata": { "tags": [] }, "source": [ "#### ExplainabilityConfig\n", "\n", "For explaining predictions, we specify a [`SHAPConfig`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.SHAPConfig) (Note: see documentation for other explanation methods). \n", "\n", "SageMaker Clarify uses the [Kernel SHAP](https://papers.nips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf) algorithm to explain the contribution that each input feature makes to the final decision. This requires a `baseline` (or background dataset). 
If not provided, SageMaker Clarify computes a baseline automatically by running K-means or K-prototypes on the input dataset. The baseline dataset type must be the same as the `dataset_type` of our `DataConfig`, and baseline samples must include only features. The `baseline` can be either an S3 URI to a baseline dataset file or an in-place list of samples. Here we choose the latter and use the first sample of the test dataset as the only entry in the list. " ] }, { "cell_type": "code", "execution_count": null, "id": "91c70d95-54da-425d-97f6-70b966e2858f", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Baseline should be in the same format as the dataset\n", "shap_config = clarify.SHAPConfig(\n", "    baseline=[dict(zip(test_features.columns, test_features.iloc[0].values.tolist()))],\n", "    num_samples=15,\n", "    agg_method=\"mean_abs\",\n", "    save_local_shap_values=True,\n", ")" ] }, { "cell_type": "markdown", "id": "18bde214-94a5-40a3-b18f-38fba9779bf3", "metadata": {}, "source": [ "### Run SageMaker Clarify Processing Job\n", "\n", "Using the configuration objects created above, let's trigger a SageMaker Clarify processing job to run the bias metrics measurements and the explainability analysis.\n", "\n", "**Note**: you can also run the pre-training bias, post-training bias, and explainability analyses as separate processing jobs via other methods of the `SageMakerClarifyProcessor` object, such as [`run_bias()`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.SageMakerClarifyProcessor.run_bias), [`run_pre_training_bias()`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.SageMakerClarifyProcessor.run_pre_training_bias), [`run_post_training_bias()`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.SageMakerClarifyProcessor.run_post_training_bias), 
[`run_explainability()`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.SageMakerClarifyProcessor.run_explainability).\n", "\n", "The SageMaker Clarify analysis output files will be in the `s3_output_path` we specified in `DataConfig`. If you are using SageMaker Studio, visual reports will also be available which we will walk through below." ] }, { "cell_type": "code", "execution_count": null, "id": "c95b89d2-7603-4ff9-b5be-cfbf5c2e58df", "metadata": { "tags": [] }, "outputs": [], "source": [ "clarify_processor.run_bias_and_explainability(\n", " data_config=data_config,\n", " model_config=model_config,\n", " explainability_config=shap_config,\n", " bias_config=bias_config,\n", " model_predicted_label_config=predictions_config,\n", ")" ] }, { "cell_type": "markdown", "id": "b886e880-f795-4a95-a752-3160660067db", "metadata": {}, "source": [ "## Viewing SageMaker Clarify Results\n", "\n", "You can view the SageMaker Clarify results with:\n", "\n", "1. SageMaker Studio Experiments tab\n", "2. Downloading the output files locally" ] }, { "cell_type": "markdown", "id": "2fd26629-ed5f-44ee-b9fb-d95ff058e869", "metadata": {}, "source": [ "### 1. SageMaker Studio Experiments\n", "\n", "Navigate to the **SageMaker Resources** tab and select `Experiments and trials`\n", "\n", "\n", "\n", "You should be able to find the Clarify job you just ran in the list of jobs and explore the various outputs such as the Explainability Report or Bias Report.\n", "\n", "" ] }, { "cell_type": "markdown", "id": "5197aca7-eca8-4b58-88e6-2f9fd8f58f1e", "metadata": {}, "source": [ "### 2. 
Downloading SageMaker Clarify Output Files\n", "\n", "Let's download the Clarify job output files from the S3 output location we specified in `DataConfig`." ] }, { "cell_type": "code", "execution_count": null, "id": "0be9ce70-96ef-4511-b16e-26b292e3f08c", "metadata": {}, "outputs": [], "source": [ "!mkdir -p clarify_output\n", "s3 = boto3.client(\"s3\")\n", "# Paginate so that all output objects are listed, not just the first 1000\n", "paginator = s3.get_paginator(\"list_objects_v2\")\n", "for page in paginator.paginate(Bucket=bucket, Prefix=prefix + \"/clarify/output\"):\n", "    for s3_key in page.get(\"Contents\", []):\n", "        s3_object = s3_key[\"Key\"]\n", "        # Skip folder placeholder keys; save each file directly under clarify_output/\n", "        if not s3_object.endswith(\"/\"):\n", "            s3.download_file(bucket, s3_object, \"clarify_output/\" + s3_object.split(\"/\")[-1])" ] }, { "cell_type": "markdown", "id": "85b959f6-6d24-4ba2-a02c-8aa7fc1634a6", "metadata": {}, "source": [ "You should then be able to view the analysis result files in `clarify_output/`:\n", "\n", "1. **explanations_shap**: local explanation results and the baseline. Because the download loop keeps only each file's name, these files (for example, `out.csv`) are saved directly in `clarify_output/`\n", "2. **analysis_config.json**: the [analysis configuration](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-processing-job-configure-analysis.html) used for the SageMaker Clarify job\n", "3. **analysis.json**: the analysis results in JSON format, which are also printed in the output of the SageMaker Clarify processing job\n", "4.
**report.html/report.ipynb/report.pdf**: a visual report of the analysis, including plots, graphs, and tables, in HTML, notebook, and PDF formats\n", "\n", "The local explanations output file contains local SHAP values for each combination of feature and model output label:" ] }, { "cell_type": "code", "execution_count": null, "id": "f516d43c-2ffb-48df-926e-e9ae6ce29dfd", "metadata": {}, "outputs": [], "source": [ "local_explanations = pd.read_csv(\"clarify_output/out.csv\")\n", "local_explanations.head(10)" ] }, { "cell_type": "markdown", "id": "6de42db0-491b-427e-90c7-962ba98659d8", "metadata": {}, "source": [ "We can also take a look at the `report.html` file, which provides a comprehensive analysis report.\n", "\n", "For programmatic access to the analysis results, `analysis.json` is usually more convenient." ] }, { "cell_type": "code", "execution_count": null, "id": "ccf729f5-d8c5-4f16-bae1-5c0514fd66f5", "metadata": {}, "outputs": [], "source": [ "from IPython.display import IFrame\n", "\n", "IFrame(src=\"clarify_output/report.html\", width=1000, height=1000)" ] }, { "cell_type": "markdown", "id": "ae235750-e394-401c-83aa-60e87ff1e075", "metadata": {}, "source": [ "## Cleanup\n", "\n", "Finally, don't forget to clean up the resources we created for this demo!" ] }, { "cell_type": "code", "execution_count": null, "id": "cd1a6321-817b-420f-af3d-2128f075ef84", "metadata": {}, "outputs": [], "source": [ "predictor.delete_model()\n", "predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "id": "e05b864c", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2, which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load.
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-clarify|fairness_and_explainability|fairness_and_explainability_json_format.ipynb)\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { 
"_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": 
false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, 
{ "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, 
"name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": 
"ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }