{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Item Level Explainability - Amazon Forecast \n", "\n", "Our goal is to train a forecasting model with Amazon Forecast and explain the resultant model in order to understand how different features are impacting the predictions using Forecast Explainability.\n", "\n", "Explainability helps you better understand how the attributes in your datasets impact your forecasts. Amazon Forecast uses a metric called Impact scores to quantify the relative impact of each attribute and determine whether they increase or decrease forecast values.\n", "\n", "To enable Forecast Explainability, your predictor must include at least one of the following: related time series, item metadata, or additional datasets like Holidays and the Weather Index.\n", "\n", "CreateExplainability accepts either a Predictor ARN or Forecast ARN. To receive aggregated Impact scores for all time series and time points in your datasets, provide a Predictor ARN. To receive Impact scores for specific time series and time points, provide a Forecast ARN.\n", "\n", "\n", "To do this, we will predict the order quantity for 20 musical instruments for US stores belonging to MyMusicCompany Inc, with monthly frequency for a 12 month forecast horizon. Time-series forecasting is important to avoid the costs related to under and over forecasting, in this case specifically for order quantities for different musical instruments. The data includes dates, instrument models and order quantities. The data contains related time-varying features including Loss Rate which represents items that get damaged during transportation, and Customer Request, which represents the number of customers on the wait list for an item. The data contains one static feature, Model Type, which represents the category the Model Id belongs to. We will train our model with the built-in holidays data provided by Amazon Forecast. 
We will then examine how the features in the data impact the order quantity using Explainability. \n", "\n", "Note that the impact scores, including those shown in this notebook, may differ between jobs due to some inherent randomness in how impact scores are computed.\n", "\n", "\n", "
\n", "\n", "Note: the data used in this notebook is a synthetic dataset generated for the purposes of educating you on how to use the feature.\n", "\n", "**This notebook covers generating explainability for forecasting models through Amazon Forecast.** \n", "
  • See the blog post: Understand drivers that influence your forecasts with explainability impact scores in Amazon Forecast.\n", "\n", "
    \n", "\n", "\n", "# Table of Contents\n", "\n", "* Step 0: [Setting up](#setup)\n", "* Step 1: [Importing the Data into Forecast](#import)\n", " * Step 1a: [Creating a Dataset Group](#createDSG)\n", " * Step 1b: [Creating a Target Dataset](#targetDS)\n", " * Step 1c: [Creating an RTS Dataset](#RTSDS)\n", " * Step 1d: [Creating an IM Dataset](#IMDS)\n", " * Step 1e: [Update the Dataset Group](#updateDSG)\n", " * Step 1f: [Creating a Target Time Series Dataset Import Job](#targetImport)\n", " * Step 1g: [Creating a Related Time Series Dataset Import Job](#RTSImport)\n", " * Step 1h: [Creating an Item Metadata Import Job](#IMImport)\n", "* Step 2a: [Train an AutoPredictor](#AutoPredictor)\n", "* Step 2b: [Export the model-level explainability](#export)\n", "* Step 2c: [Visualize the model-level explainability](#visualize)\n", "* Step 3: [Create a Forecast](#forecast)\n", "* Step 4a: [Create explainability for specific time-series](#itemLevelExplainability)\n", "* Step 4b: [Create explainability export for specific time-series](#itemLevelExplainabilityExport)\n", "* Step 4c: [Create explainability for specific time-series at time-points](#itemAndTimePointLevelExplainability)\n", "* Step 4d: [Create explainability export for specific time-series at time-points](#itemAndTimePointLevelExplainabilityExport)\n", "* Step 5: [Cleaning up your Resources](#cleanup)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 0: Setting up\n", "\n", "### First let us setup Amazon Forecast\n", "\n", "This section sets up the permissions and relevant endpoints." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "import shutil\n", "import datetime\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "# get region from boto3\n", "import boto3\n", "REGION = boto3.Session().region_name\n", "\n", "# importing forecast notebook utility from notebooks/common directory\n", "sys.path.insert( 0, os.path.abspath(\"../../common\") )\n", "import util\n", "\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline \n", "plt.rcParams['figure.figsize'] = (15.0, 5.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's define a helper function.\n", "This function will make it easier to read the exported files created as part of an explainability export into a single pandas DataFrame. We'll use this later in the notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def read_explainability_export(BUCKET_NAME, s3_path):\n", " \"\"\"Read explainability export files\n", " Inputs: \n", " BUCKET_NAME = S3 bucket name\n", " s3_path = S3 path to export files\n", " , everything after \"s3://BUCKET_NAME/\" in S3 URI path to your files\n", " Return: Pandas DataFrame with all files concatenated row-wise\n", " \"\"\"\n", " # set s3 path\n", " s3 = boto3.resource('s3')\n", " s3_bucket = s3.Bucket(BUCKET_NAME)\n", " s3_depth = s3_path.split(\"/\")\n", " s3_depth = len(s3_depth) - 1\n", " \n", " # set local path: start from a clean local directory each time\n", " local_write_path = \"explainability_exports\"\n", " if os.path.isdir(local_write_path):\n", " shutil.rmtree(local_write_path)\n", " os.makedirs(local_write_path)\n", " \n", " # concat part files\n", " part_filename = \"\"\n", " part_files = list(s3_bucket.objects.filter(Prefix=s3_path))\n", " print(f\"Number of part files found: 
{len(part_files)}\")\n", " for file in part_files:\n", " # The export is split across several CSV part files; collect them all\n", " if \"csv\" in file.key:\n", " part_filename = file.key.split('/')[s3_depth]\n", " window_object = s3.Object(BUCKET_NAME, file.key)\n", " file_size = window_object.content_length\n", " if file_size > 0:\n", " s3_bucket.download_file(file.key, local_write_path+\"/\"+part_filename)\n", " \n", " # Read from local dir and combine all the part files\n", " temp_dfs = []\n", " for entry in os.listdir(local_write_path):\n", " if os.path.isfile(os.path.join(local_write_path, entry)):\n", " df = pd.read_csv(os.path.join(local_write_path, entry), index_col=None, header=0)\n", " temp_dfs.append(df)\n", "\n", " # Return assembled part files as a single pandas DataFrame\n", " fcst_df = pd.concat(temp_dfs, axis=0, ignore_index=True, sort=False)\n", " return fcst_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Configure the S3 bucket name and region name for this lesson.\n", "\n", "- If you don't have an S3 bucket, create it first on S3.\n", "- The default region suggested below is taken from your boto3 session; you can choose any of the regions that the service is available in."
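Setting the S3 calls aside, the core of the helper defined above is simply reading every downloaded part file and concatenating the frames row-wise. Here is a minimal, purely local sketch of that final step (the file names and values below are made up for illustration):

```python
import os
import tempfile

import pandas as pd

# Write two fake "part" files to a temp directory (illustrative data only)
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({"item_id": ["Guitar_1"], "Impact": [0.5]}).to_csv(
    os.path.join(tmp_dir, "part0.csv"), index=False)
pd.DataFrame({"item_id": ["Drum_1"], "Impact": [0.1]}).to_csv(
    os.path.join(tmp_dir, "part1.csv"), index=False)

# Read each part file and combine row-wise, as the helper does after downloading
frames = [pd.read_csv(os.path.join(tmp_dir, name))
          for name in sorted(os.listdir(tmp_dir)) if name.endswith(".csv")]
combined = pd.concat(frames, axis=0, ignore_index=True, sort=False)
print(combined.shape)  # (2, 2)
```

Passing `ignore_index=True` renumbers the rows so the concatenated frame gets a clean index regardless of how many part files contributed.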
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bucket_name = input(\"\\nEnter S3 bucket name for uploading the data:\")\n", "default_region = REGION\n", "REGION = input(f\"region [enter to accept default]: {default_region} \") or default_region " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Connect API session" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "session = boto3.Session(region_name=REGION) \n", "forecast = session.client(service_name='forecast') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the role to provide to Amazon Forecast" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "role_name = \"ForecastNotebookRole-Explainability\"\n", "print(f\"Creating Role {role_name} ...\")\n", "default_role = util.get_or_create_iam_role( role_name = role_name )\n", "role_arn = default_role\n", "\n", "print(f\"Success! Created role with name = {role_arn.split('/')[1]}\")\n", "print(role_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Verify the steps above were successful by calling list_predictors()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.list_predictors()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1. Importing the Data\n", "\n", "In this step, we will create a **Dataset** and **Import** the dataset from S3 to Amazon Forecast. To train a Predictor we will need a **DatasetGroup** that groups the input **Datasets**. 
So, we will end this step by creating a **DatasetGroup** with the imported **Dataset**.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define a dataset group name and version number for naming purposes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "project = \"explainability_notebook\"\n", "idx = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1a. Creating a Dataset Group\n", "First let's create a dataset group and then update it later to add our datasets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset_group = f\"{project}_{idx}\"\n", "dataset_arns = []\n", "create_dataset_group_response = forecast.create_dataset_group(\n", " Domain=\"CUSTOM\",\n", " DatasetGroupName=dataset_group,\n", " DatasetArns=dataset_arns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, we specify key input data and forecast parameters.\n", "\n", "The forecast frequency for this data is monthly.\n", "The forecast horizon is 12 months, which is one year." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "freq = \"M\"\n", "forecast_horizon = 12\n", "timestamp_format = \"yyyy-MM-dd HH:mm:ss\"\n", "delimiter = ','" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'Creating dataset group {dataset_group}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset_group_arn = create_dataset_group_response['DatasetGroupArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1b. Creating a Target Time Series (TTS) Dataset\n", "In this example, we will define a target time series. 
This is a required dataset to use the service." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ts_dataset_name = f\"{project}_tts_{idx}\"\n", "print(ts_dataset_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we specify the schema of our dataset below. Make sure the order of the attributes (columns) matches the raw data in the files. The schema follows a three-attribute format: timestamp, item_id, and target_value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ts_schema_val = [\n", " {\"AttributeName\": \"timestamp\", \"AttributeType\": \"timestamp\"},\n", " {\"AttributeName\": \"item_id\", \"AttributeType\": \"string\"},\n", " {\"AttributeName\": \"target_value\", \"AttributeType\": \"float\"}]\n", "ts_schema = {\"Attributes\": ts_schema_val}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'Creating target dataset {ts_dataset_name}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = forecast.create_dataset(\n", " Domain=\"CUSTOM\",\n", " DatasetType='TARGET_TIME_SERIES',\n", " DatasetName=ts_dataset_name,\n", " DataFrequency=freq,\n", " Schema=ts_schema\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ts_dataset_arn = response['DatasetArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_dataset(DatasetArn=ts_dataset_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1c. Creating a Related Time Series (RTS) Dataset\n", "In this example, we will define a related time series dataset. The columns in the RTS are attributes whose impact can be explained. 
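For intuition about what the RTS file contains, here are a few synthetic rows laid out in the same column order the schema below declares (timestamp, item_id, Loss_Rate, Customer_Request); the values are made up:

```python
import pandas as pd

# Illustrative RTS rows; the column order must match the schema's attribute order
rts_sample = pd.DataFrame(
    [["2021-01-01 00:00:00", "Guitar_1", 0.02, 35.0],
     ["2021-02-01 00:00:00", "Guitar_1", 0.05, 41.0]],
    columns=["timestamp", "item_id", "Loss_Rate", "Customer_Request"])
print(rts_sample.shape)  # (2, 4)
```

Forecast reads the imported CSV positionally against the schema, so keeping the columns in this order matters.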
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rts_dataset_name = f\"{project}_rts_{idx}\"\n", "print(rts_dataset_name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rts_schema_val = [\n", " {\"AttributeName\": \"timestamp\", \"AttributeType\": \"timestamp\"},\n", " {\"AttributeName\": \"item_id\", \"AttributeType\": \"string\"},\n", " {\"AttributeName\": \"Loss_Rate\", \"AttributeType\": \"float\"},\n", " {\"AttributeName\": \"Customer_Request\", \"AttributeType\": \"float\"}]\n", "rts_schema = {\"Attributes\": rts_schema_val}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'Creating RTS dataset {rts_dataset_name}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = forecast.create_dataset(\n", " Domain=\"CUSTOM\",\n", " DatasetType='RELATED_TIME_SERIES',\n", " DataFrequency=freq,\n", " DatasetName=rts_dataset_name,\n", " Schema=rts_schema\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rts_dataset_arn = response['DatasetArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_dataset(DatasetArn=rts_dataset_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1d. Creating an Item Metadata (IM) Dataset\n", "In this example, we will define an Item Metadata dataset. This will be a feature whose impact can be explained. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "im_dataset_name = f\"{project}_im_{idx}\"\n", "print(im_dataset_name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "im_schema_val = [\n", " {\"AttributeName\": \"item_id\", \"AttributeType\": \"string\"},\n", " {\"AttributeName\": \"Model_Type\", \"AttributeType\": \"string\"}]\n", "im_schema = {\"Attributes\": im_schema_val}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'Creating IM dataset {im_dataset_name}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = forecast.create_dataset(\n", " Domain=\"CUSTOM\",\n", " DatasetType='ITEM_METADATA',\n", " DatasetName=im_dataset_name,\n", " Schema=im_schema\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "im_dataset_arn = response['DatasetArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_dataset(DatasetArn=im_dataset_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1e. Updating the dataset group with the datasets we created\n", "You can have multiple datasets under the same dataset group. Update it with the datasets we created before." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset_arns = []\n", "dataset_arns.append(ts_dataset_arn)\n", "dataset_arns.append(rts_dataset_arn)\n", "dataset_arns.append(im_dataset_arn)\n", "\n", "forecast.update_dataset_group(DatasetGroupArn=dataset_group_arn, DatasetArns=dataset_arns)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1f. 
Creating a Target Time Series Dataset Import Job\n", " \n", "Below, we save the Target Time Series to your bucket on S3, since Amazon Forecast expects to be able to import the data from S3." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "local_file = \"instrumentData/TTS.csv\"\n", "key = f\"{project}/{local_file}\"\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_file(local_file)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ts_s3_data_path = f\"s3://{bucket_name}/{project}/{local_file}\"\n", "print(ts_s3_data_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ts_dataset_import_job_response = forecast.create_dataset_import_job(\n", " DatasetImportJobName=dataset_group,\n", " DatasetArn=ts_dataset_arn,\n", " DataSource= {\n", " \"S3Config\" : {\n", " \"Path\": ts_s3_data_path,\n", " \"RoleArn\": role_arn\n", " } \n", " },\n", " TimestampFormat=timestamp_format\n", " )\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ts_dataset_import_job_arn=ts_dataset_import_job_response['DatasetImportJobArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))\n", "assert status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1g. Creating a Related Time Series Dataset Import Job\n", "Below, we save the Related Time Series to your bucket on S3, since Amazon Forecast expects to be able to import the data from S3." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "local_file = \"instrumentData/RTS.csv\"\n", "key = f\"{project}/{local_file}\"\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_file(local_file)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rts_s3_data_path = f\"s3://{bucket_name}/{project}/{local_file}\"\n", "print(rts_s3_data_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rts_dataset_import_job_response = forecast.create_dataset_import_job(\n", " DatasetImportJobName=dataset_group,\n", " DatasetArn=rts_dataset_arn,\n", " DataSource= {\n", " \"S3Config\" : {\n", " \"Path\": rts_s3_data_path,\n", " \"RoleArn\": role_arn\n", " } \n", " })" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rts_dataset_import_job_arn=rts_dataset_import_job_response['DatasetImportJobArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=rts_dataset_import_job_arn))\n", "assert status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1h. Creating an Item Metadata Dataset Import Job\n", "Below, we save the Item Metadata to your bucket on S3, since Amazon Forecast expects to be able to import the data from S3." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "local_file = \"instrumentData/IM.csv\"\n", "key = f\"{project}/{local_file}\"\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_file(local_file)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "im_s3_data_path = f\"s3://{bucket_name}/{project}/{local_file}\"\n", "print(im_s3_data_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "im_dataset_import_job_response = forecast.create_dataset_import_job(\n", " DatasetImportJobName=dataset_group,\n", " DatasetArn=im_dataset_arn,\n", " DataSource= {\n", " \"S3Config\" : {\n", " \"Path\": im_s3_data_path,\n", " \"RoleArn\": role_arn\n", " } \n", " })" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "im_dataset_import_job_arn=im_dataset_import_job_response['DatasetImportJobArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=im_dataset_import_job_arn))\n", "assert status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2a. Train an AutoPredictor with RTS, IM and Holidays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will train an AutoPredictor using the dataset group created in step 1, as well as US Holidays." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Explainability requires at least one dataset attribute other than the item_id and target_value attributes. So for the predictor we create, impact scores will be generated for RTS columns, IM and Holidays." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create Explainability for all forecasts generated from an AutoPredictor. 
\n", "In addition to this, at AutoPredictor creation, you have the option to generate model-level explainability. \n", "We will enable this option for predictor creation by setting:\n", "\n", "```python\n", "ExplainPredictor=True\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "auto_predictor_name = f'holidays_instrument_orders_auto_predictor_{idx}'\n", "\n", "print(f'[{auto_predictor_name}] Creating predictor {auto_predictor_name} ...')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_predictor_response = forecast.create_auto_predictor(\n", " PredictorName=auto_predictor_name,\n", " ForecastHorizon=forecast_horizon,\n", " ForecastFrequency=\"M\",\n", " DataConfig=\n", " {\"DatasetGroupArn\":dataset_group_arn,\n", " \"AdditionalDatasets\":\n", " [\n", " {\"Name\":\"holiday\",\n", " \"Configuration\":\n", " {\"CountryCode\":\n", " [\"US\"]\n", " }\n", " }\n", " ]\n", " },\n", " ExplainPredictor=True\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor_arn = create_predictor_response['PredictorArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn))\n", "assert status" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_auto_predictor(PredictorArn=predictor_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### When we created the AutoPredictor, we also created a model level explainability job\n", "We will wait for the explainability job to be Active, and then we can export it and view the results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the explainability arn from calling describe on the predictor." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "auto_predictor_response = forecast.describe_auto_predictor(PredictorArn=predictor_arn)\n", "explainability_model_level_arn = auto_predictor_response[\"ExplainabilityInfo\"][\"ExplainabilityArn\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_explainability(ExplainabilityArn=explainability_model_level_arn))\n", "assert status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the explainability is Active, we will export the results by creating an explainability export." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2b. Export the model-level explainability" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_name = f\"{project}_explainability_export_model_level_{idx}\"\n", "explainability_export_destination = f\"s3://{bucket_name}/{project}/{explainability_export_name}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_response = forecast.create_explainability_export(ExplainabilityExportName=explainability_export_name, \n", " ExplainabilityArn=explainability_model_level_arn, \n", " Destination=\n", " {\"S3Config\": \n", " {\"Path\": explainability_export_destination,\n", " \"RoleArn\": role_arn}\n", " }\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_model_level_arn = explainability_export_response['ExplainabilityExportArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_model_level_arn))\n", "assert status" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_model_level_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's load and view the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_data = read_explainability_export(bucket_name, project+\"/\"+explainability_export_name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_data.style.hide_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Impact scores come in two forms: Normalized impact scores and Raw impact scores. Raw impact scores are based on Shapley values and are not scaled or bounded. Normalized impact scores scale the raw scores to a value between -1 and 1 to make comparing scores within the Explainability job easier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the impact scores, including those shown in this notebook, may differ between jobs due to some inherent randomness in how impact scores are computed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we can see the aggregated scores across time-series for features in the model. \n", "\n", "From the scores, Customer_Request has the highest impact driving up the forecasted values, as the normalized impact score is closest to 1.\n", "\n", "Loss_Rate has a lower impact than Customer_Request does.\n", "\n", "Of the features explained, Holiday_US has the lowest impact on the forecasted values, as its normalized impact score is closest to 0 (no impact)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important to note that Impact scores measure the relative impact of attributes, not the absolute impact. Therefore, Impact scores cannot be used to conclude whether particular attributes improve model accuracy. 
If an attribute has a low Impact score, that does not necessarily mean that it has a low impact on forecast values; it means that it has a lower impact on forecast values than other attributes used by the predictor. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2c. Visualize the model-level explainability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Image View predictor explainability impact scores](https://github.com/aws-samples/amazon-forecast-samples/raw/main/notebooks/advanced/Item_Level_Explainability/images/ModelLevelScores.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also view these results on the Amazon Forecast console.\n", "\n", "For more details about using the Forecast console to create and view explainabilities, see: https://aws.amazon.com/blogs/machine-learning/understand-drivers-that-influence-your-forecasts-with-explainability-impact-scores-in-amazon-forecast/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3. Create Forecast \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast_name = f\"{project}_forecast_{idx}\"\n", "\n", "create_forecast_response = forecast.create_forecast(\n", " ForecastName=forecast_name,\n", " PredictorArn = predictor_arn\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast_arn = create_forecast_response['ForecastArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn))\n", "assert status" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_forecast(ForecastArn=forecast_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4a. 
Create Explainability for specific time-series \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We examined the model-level explainability generated during AutoPredictor creation. \n", "\n", "Next, we will generate explainability for a set of time-series of our choosing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To specify a list of time series, upload a CSV file identifying the time series by their item_id and dimension values. You can specify up to 50 time series. You must also define the attributes and attribute types of the time series in a schema.\n", "\n", "In this dataset, each time series is defined only by its item_id. \n", "\n", "We will load and view the item subset file stored locally. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "item_subset_file = \"InstrumentData/item_subset.csv\"\n", "item_subset_df = pd.read_csv(item_subset_file, names=['item_id'])\n", "item_subset_df.style.hide_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now save the local item subset file to S3, as Forecast expects to read the file from S3. 
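If you needed to assemble such a subset file yourself, it is a single-column, headerless CSV of item_ids (up to 50). A sketch, using an invented file name and ids:

```python
import pandas as pd

# Up to 50 item_ids whose forecasts we want explained; the ids are illustrative
subset = pd.DataFrame({"item_id": ["Guitar_1", "Drum_1", "Violin_1"]})
subset.to_csv("my_item_subset.csv", index=False, header=False)

# Read it back the same way the notebook does above (no header row)
check = pd.read_csv("my_item_subset.csv", names=["item_id"])
print(len(check))  # 3
```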
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "key = f\"{project}/InstrumentData/item_subset.csv\"\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_file(item_subset_file)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "item_subset_path = f\"s3://{bucket_name}/{key}\"\n", "explainability_name = f\"{project}_item_level_explainability_{idx}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To create the explainability using this subset of time-series, configure the following datatypes:\n", "\n", "* ExplainabilityConfig - set values for TimeSeriesGranularity to “SPECIFIC” and TimePointGranularity to “ALL”.\n", "```python\n", "ExplainabilityConfig={\"TimeSeriesGranularity\": \"SPECIFIC\", \"TimePointGranularity\": \"ALL\"}\n", "```\n", "* S3Config - set the values for “Path” to the S3 location of the CSV file and “RoleArn” to a role with access to the S3 bucket.\n", "```python\n", "\"S3Config\": {\"Path\": item_subset_path, \"RoleArn\": role_arn}\n", "```\n", "* Schema - define the “AttributeName” and “AttributeType” for item_id and the dimensions in the time series.\n", "```python\n", "Schema={\"Attributes\": \n", " [{\"AttributeName\": \"item_id\",\n", " \"AttributeType\": \"string\",\n", " \"AttributeCategory\": \"item_id\"}\n", " ]\n", " \n", " }\n", "```\n", "\n", "In order to view the explainability results on the console, we set EnableVisualiztion to True.\n", "```python\n", "EnableVisualization=True\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_expainability_response=forecast.create_explainability(ExplainabilityName=explainability_name, \n", " ResourceArn=forecast_arn,\n", " ExplainabilityConfig={\"TimeSeriesGranularity\": \"SPECIFIC\", \"TimePointGranularity\": \"ALL\"},\n", " DataSource= \n", " {\"S3Config\": \n", " {\"Path\": item_subset_path,\n", " 
\"RoleArn\": role_arn}\n", " },\n", " Schema= \n", " {\"Attributes\": \n", " [{\"AttributeName\": \"item_id\",\n", " \"AttributeType\": \"string\",\n", " \"AttributeCategory\": \"item_id\"}\n", " ]\n", " },\n", " EnableVisualization=True)\n", " \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_item_level_arn = create_expainability_response['ExplainabilityArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_explainability(ExplainabilityArn=explainability_item_level_arn))\n", "assert status" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_explainability(ExplainabilityArn=explainability_item_level_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also the results on the Amazon Forecast console.\n", "\n", "For more details about using the Forecast console to create and view explainabilities, see: https://aws.amazon.com/blogs/machine-learning/understand-drivers-that-influence-your-forecasts-with-explainability-impact-scores-in-amazon-forecast/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Image View explainability impact scores for specifc items](https://github.com/aws-samples/amazon-forecast-samples/raw/main/notebooks/advanced/Item_Level_Explainability/images/ItemLevelAggregateAllTimepoints.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the dropdown, selecting the aggregate impact score across all time-series and time-points in the explainability job shows that Model_Type has an impact score of 0.361, meaning overall Model_Type moderately drives up the forecasted order quantites. \n", "\n", "Customer_Request (the number of customers on the waitlist for an item) has slightly less impact, with a score of 0.2608. 
\n", "\n", "Loss_Rate (the items damaged during transportation) has an impact of 0.1003, less than half of that of Customer_Request. \n", "\n", "Holidays has almost no impact, with a score of almost 0." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's selected a specific time-series: Guitar_1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Image View explainability impact scores for Guitar 1](https://github.com/aws-samples/amazon-forecast-samples/raw/main/notebooks/advanced/Item_Level_Explainability/images/ItemLevelGuitar1AllTimepoints.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Guitar 1 across the timepoints explained in this job has a very high impact of 1 for Model_Type, meaning this attribute for Guitar 1 for the time-series in this job has a high impact that is increasing the forecasted values. \n", "\n", "Customer_Request has a much lower impact that still increases the forecast values.\n", "\n", "Holiday_US has no impact.\n", "\n", "Loss_Rate, represented by the bar in red, has an impact of 0.0419, but this impact decreases the forecasted values, driving them lower." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also view scores for the items in this job at specific time-points, by selecting a time-point from the drop-down." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4b. Create Explainability export for specific time-series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Forecast enables you to export a CSV file of Impact scores to an S3 location. These exports are more detailed than the Impact scores displayed in the console.\n", "\n", "If you use the “Specific time series” or “Specific time series and time points” scopes, Forecast will also export aggregated impact scores. 
Exports for the “Specific time series” scope include aggregated normalized scores for the specified time series, and exports for the “Specific time series and time points” scope include aggregated normalized scores for the specified time points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_name = f\"{project}_item_level_explainability_export_{idx}\"\n", "explainability_export_destination = f\"s3://{bucket_name}/{project}/{explainability_export_name}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_response = forecast.create_explainability_export(ExplainabilityExportName=explainability_export_name, \n", " ExplainabilityArn=explainability_item_level_arn, \n", " Destination=\n", " {\"S3Config\": \n", " {\"Path\": explainability_export_destination,\n", " \"RoleArn\": role_arn}\n", " })" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_item_level_arn = explainability_export_response['ExplainabilityExportArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_item_level_arn))\n", "assert status" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_item_level_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's load and view the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_data = read_explainability_export(bucket_name, project+\"/\"+explainability_export_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The export for the “Specific time series” scope contains raw and 
normalized impact scores for the specified time series, as well as a normalized aggregated impact score for all specified time series. There are no raw impact scores for the aggregate because, like with the “Entire forecast” scope, the aggregated scores are already representative of all specified time series." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Impact scores come in two forms: Normalized impact scores and Raw impact scores. Raw impact scores are based on Shapley values and are not scaled or bounded. Normalized impact scores scale the raw scores to a value between -1 and 1 to make comparing scores within the Explainability insight easier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The export file contains the aggregate impact scores across all time-series in the job across all time-points" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_data.loc[export_data['item_id'] == \"Aggregate\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The export file also contains the aggregate impact scores across all time-points for each time-series in the job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_data.loc[export_data['timestamp'] == \"Aggregate\"].loc[export_data[\"item_id\"] != \"Aggregate\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the normalized impact score, we find that for Guitar_1, Model_Type has a high impact score close to 1 (the maximum normalized impact score), meaning this attribute is driving up the forecasted values for Guitar_1.\n", "\n", "Guitar_4, on the other hand, has a normalized impact score close to 1 for Customer_Request, meaning this attribute has a higher impact on Guitar_4 than Loss_Rate does.\n", "\n", "Guitars 2 and 3 are overall not impacted by these features, with aggregate impact scores of 0. 
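\n", "\n", "To make the raw-versus-normalized relationship concrete, here is an illustrative sketch (the values and the scaling rule are assumptions for illustration only, not the service's exact computation) of how raw scores can be mapped into the [-1, 1] normalized range by dividing each by the largest absolute raw score:\n", "\n", "```python\n", "# Hypothetical raw Impact scores for one time-series (made-up numbers)\n", "raw_scores = {\"Model_Type\": 5.2, \"Customer_Request\": 1.9, \"Loss_Rate\": -0.4}\n", "max_abs = max(abs(v) for v in raw_scores.values())\n", "normalized = {k: v / max_abs for k, v in raw_scores.items()}\n", "# Model_Type maps to 1.0; Loss_Rate stays negative, i.e. a decreasing impact\n", "```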
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Aggregating impact scores\n", "\n", "Forecast imposes a limit of 50 time-series that can be explained per explainabilty job.\n", "If you have more than 50 items to explain, the explainability for all the time-series can be generated in multiple batches. \n", "\n", "From there, if you want to generate an aggregate score for all time-series across explainability jobs, this can be done by taking an average of the noramlized impact scores for each feature. \n", "\n", "We will create one more explainability job, this time with differenent set of items and aggregate the results with those from the first batch. \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "second_item_subset_file = \"InstrumentData/second_item_subset.csv\"\n", "second_item_subset_df = pd.read_csv(second_item_subset_file, names=['item_id'])\n", "second_item_subset_df.style.hide_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, save the local item subset to S3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "key = f\"{project}/InstrumentData/second_item_subset.csv\"\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_file(second_item_subset_file)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "item_subset_path = f\"s3://{bucket_name}/{key}\"\n", "explainability_name_second_batch = f\"{project}_item_level_explainability_2nd_batch_{idx}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_expainability_response=forecast.create_explainability(ExplainabilityName=explainability_name_second_batch, \n", " ResourceArn=forecast_arn,\n", " ExplainabilityConfig={\"TimeSeriesGranularity\": \"SPECIFIC\", \"TimePointGranularity\": \"ALL\"},\n", " DataSource= \n", " {\"S3Config\": \n", " {\"Path\": 
item_subset_path,\n", " \"RoleArn\": role_arn}\n", " },\n", " Schema= \n", " {\"Attributes\": \n", " [{\"AttributeName\": \"item_id\",\n", " \"AttributeType\": \"string\",\n", " \"AttributeCategory\": \"item_id\"}\n", " ]\n", " },\n", " EnableVisualization=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_item_level_batch2_arn = create_expainability_response['ExplainabilityArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_explainability(ExplainabilityArn=explainability_item_level_batch2_arn))\n", "assert status" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_explainability(ExplainabilityArn=explainability_item_level_batch2_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now export the explainability results for the second batch of items" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_name_second_batch = f\"{project}_item_level_export_batch2_{idx}\"\n", "explainability_export_destination = f\"s3://{bucket_name}/{project}/{explainability_export_name_second_batch}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_response = forecast.create_explainability_export(ExplainabilityExportName=explainability_export_name_second_batch, \n", " ExplainabilityArn=explainability_item_level_batch2_arn, \n", " Destination=\n", " {\"S3Config\": \n", " {\"Path\": explainability_export_destination,\n", " \"RoleArn\": role_arn}\n", " })" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_item_level_batch2_arn = explainability_export_response['ExplainabilityExportArn']" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_item_level_batch2_arn))\n", "assert status" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_item_level_batch2_arn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_data_second_batch = read_explainability_export(bucket_name, project+\"/\"+explainability_export_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Concatenate the explainability export results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_combined_data = pd.concat([export_data, export_data_second_batch])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the results from both explainability jobs, we take an average across over the normalized impact scores for each feature in the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "normalized_columns = ['Customer_Request-NormalizedImpactScore', 'Loss_Rate-NormalizedImpactScore','Model_Type-NormalizedImpactScore','Holiday_US-NormalizedImpactScore']\n", "aggregate_impact_scores = pd.DataFrame(export_combined_data[normalized_columns].mean(), columns=['Mean'])\n", "aggregate_impact_scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have the aggregate noramlized impact scores for all items in both batches." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4c. 
Create Explainability for specific time-series at specific time-points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you specify time points for Forecast Explainability, Amazon Forecast calculates Impact scores for attributes for that specific time range. You can specify up to 500 consecutive time points within the forecast horizon.\n", "\n", "The Impact scores can be interpreted as the impact attributes have on a specific time series at a given time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The modifications to create explainability for specific time-series at specific time-points are:\n", "* In ExplainabilityConfig, set values for TimeSeriesGranularity to “SPECIFIC” and TimePointGranularity to “SPECIFIC”.\n", "```python\n", "ExplainabilityConfig={\"TimeSeriesGranularity\": \"SPECIFIC\", \"TimePointGranularity\": \"SPECIFIC\"}\n", "```\n", "* Provide a StartDateTime and EndDateTime in the request. Impact scores will be generated for all time-points between the StartDateTime and EndDateTime. 
For example:\n", "```python\n", "EndDateTime=\"2022-11-30T09:00:00\",\n", "StartDateTime=\"2022-01-30T09:00:00\"\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, upload the item subset to S3 and set the explainability name" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "key = f\"{project}/InstrumentData/second_item_subset.csv\"\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_file(second_item_subset_file)\n", "item_subset_path = f\"s3://{bucket_name}/{key}\"\n", "\n", "explainability_name = f\"{project}_item_timepoint_level_explainability_{idx}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now create the explainability" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_expainability_response=forecast.create_explainability(ExplainabilityName=explainability_name, \n", " ResourceArn=forecast_arn,\n", " ExplainabilityConfig={\"TimeSeriesGranularity\": \"SPECIFIC\", \n", " \"TimePointGranularity\": \"SPECIFIC\"},\n", " DataSource= \n", " {\"S3Config\": \n", " {\"Path\": item_subset_path,\n", " \"RoleArn\": role_arn}\n", " },\n", " Schema= \n", " {\"Attributes\": \n", " [\n", " {\"AttributeName\": \"item_id\",\n", " \"AttributeType\": \"string\",\n", " \"AttributeCategory\": \"item_id\"}\n", " ]\n", " },\n", " EndDateTime=\"2022-11-30T09:00:00\",\n", " StartDateTime=\"2022-01-30T09:00:00\",\n", " EnableVisualization=True)\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_item_and_timepoint_level_arn = create_expainability_response['ExplainabilityArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_explainability(ExplainabilityArn=explainability_item_and_timepoint_level_arn))\n", "assert status" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_explainability(ExplainabilityArn=explainability_item_and_timepoint_level_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the explainability job is Active, we can view the results on the Forecast console.\n", "\n", "For more details about using the Forecast console to create and view explainabilities, see: https://aws.amazon.com/blogs/machine-learning/understand-drivers-that-influence-your-forecasts-with-explainability-impact-scores-in-amazon-forecast/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the console, let's look specifically at Guitar_5, by selecting this item from the drop-down. We'll compare the scores for Guitar_5 at two different time-points, to see how the impact scores can change for each forecasted time-point." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Image View time-series explainability](https://github.com/aws-samples/amazon-forecast-samples/raw/main/notebooks/advanced/Item_Level_Explainability/images/Guitar5TimePoint1Scores.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The forecasted value for Guitar_5 on Jan 30th 2022 is highly impacted by Customer_Request, which has a normalized impact score of 1. \n", "\n", "Holiday_US has the next highest impact score of 0.3789, followed by Loss_Rate and Model_Type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's select the next time-point in the forecast horizon for Guitar_5, on Feb 28th 2022." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Image View time-series explainability](https://github.com/aws-samples/amazon-forecast-samples/raw/main/notebooks/advanced/Item_Level_Explainability/images/Guitar5TimePoint2Scores.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the same guitar one month later, the Customer_Request impact score changes from 1 to 0.8054.\n", "\n", "The impact of Holiday_US drops from 0.3789 in January down to 0 (no impact) in February.\n", "\n", "Drilling down to specific time-points can paint a more detailed picture of how each attribute in the data is impacting each item over time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You still have the option of viewing the aggregate results for a specific item across time-points, or for all items in the explainability job across all time-points, by selecting 'Aggregate' and 'All' from the drop-down." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4d. 
Create Explainability export for specific time-series at specific time-points" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_name = f\"{project}_item_and_timepoints_level_export_{idx}\"\n", "explainability_export_destination = f\"s3://{bucket_name}/{project}/{explainability_export_name}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_response = forecast.create_explainability_export(ExplainabilityExportName=explainability_export_name, \n", " ExplainabilityArn=explainability_item_and_timepoint_level_arn, \n", " Destination=\n", " {\"S3Config\": \n", " {\"Path\": explainability_export_destination,\n", " \"RoleArn\": role_arn}\n", " }\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explainability_export_item_and_timepoint_level_arn = explainability_export_response['ExplainabilityExportArn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status = util.wait(lambda: forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_item_and_timepoint_level_arn))\n", "assert status" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_item_and_timepoint_level_arn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_data_specific_items_and_time_points = read_explainability_export(bucket_name, project+\"/\"+explainability_export_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The export for the “Specific time series and time points” scope contains raw and normalized impact scores for the specified time series and time points, as well as normalized and raw aggregated impact scores for all specified time 
points." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll take a look at the results for a specific item, Guitar_5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_data_specific_items_and_time_points\n", "export_data.loc[export_data['item_id'] == \"Guitar_5\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5. Cleaning up your Resources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we have completed the above steps, we can start to cleanup the resources we created. All delete jobs, except for `delete_dataset_group` are asynchronous, so we have added the helpful `wait_till_delete` function. \n", "Resource Limits documented here. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to clean up all the resources generated in this notebook, uncomment the lines in the cells below" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete explainability exports:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#util.wait_till_delete(lambda: forecast.delete_explainability_export(ExplainabilityExportArn = explainability_export_model_level_arn)\n", "#util.wait_till_delete(lambda: forecast.delete_explainability_export(ExplainabilityExportArn = explainability_export_item_level_arn)\n", "#util.wait_till_delete(lambda: forecast.delete_explainability_export(ExplainabilityExportArn = explainability_export_item_level_batch2_arn)\n", "#util.wait_till_delete(lambda: forecast.delete_explainability_export(ExplainabilityExportArn = explainability_export_item_and_timepoint_level_arn))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete explainabilities:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#util.wait_till_delete(lambda: forecast.delete_explainability(ExplainabilityArn = explainability_item_level_arn))\n", 
"#util.wait_till_delete(lambda: forecast.delete_explainability(ExplainabilityArn = explainability_item_level_batch2_arn))\n", "#util.wait_till_delete(lambda: forecast.delete_explainability(ExplainabilityArn = explainability_model_level_arn))\n", "#util.wait_till_delete(lambda: forecast.delete_explainability(ExplainabilityArn = explainability_item_and_timepoint_level_arn))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete forecast:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#util.wait_till_delete(lambda: forecast.delete_forecast(ForecastArn = forecast_arn))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete predictor:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#util.wait_till_delete(lambda: forecast.delete_predictor(PredictorArn = predictor_arn))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete dataset imports for TTS, RTS and IM:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))\n", "#util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=rts_dataset_import_job_arn))\n", "#util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=im_dataset_import_job_arn))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete the datasets for TTS, RTS and IM" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#util.wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=ts_dataset_arn))\n", "#util.wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=rts_dataset_arn))\n", "#util.wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=im_dataset_arn))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete the 
dataset group" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#forecast.delete_dataset_group(DatasetGroupArn=dataset_group_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete the IAM role" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#util.delete_iam_role( role_name )" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }