# Amazon SageMaker Monitoring Part 3 (Model Drift and Feature Attribution)

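Hints for running this notebook:

- The notebook was tested with the Python 3 (Data Science) kernel.
- By default it uses the SageMaker default bucket. You can change the bucket if needed.
- Cell outputs are kept so that you can review the results without running anything. To start from a clean state, choose "Clear All Outputs" from the right-click menu before you begin.
- You can also check the created schedules from the Endpoint menu under SageMaker resources (bottom of the left pane) in SageMaker Studio.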

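## Prerequisites
### Background on the data
This notebook uses the Annual Social and Economic Supplement (ASEC) of the Current Population Survey (CPS) from the U.S. Census Bureau. CPS is a monthly survey of income and other characteristics of groups that include the general workforce. CPS-ASEC is conducted every February, March, and April and provides information on work experience, income, noncash benefits, and migration for people aged 15 and older.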

https://health.gov/healthypeople/objectives-and-data/data-sources-and-methods/data-sources/current-population-survey-annual-social-and-economic-supplement-cps-asec

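### Data fields
- A_AGE: age (numeric)
- A_FTLF: currently working full time (categorical)
- A_HGA: educational attainment (categorical)
- A_HSCOL: currently enrolled in high school or college (categorical)
- A_MARITL: marital status (categorical)
- A_SEX: sex (categorical)
- A_UNMEM: labor union member (categorical)
- A_USLHRS: working hours (numeric)
- NOEMP: total number of employees at the employer's company (categorical)
- PENATVTY: country of birth (categorical)
- PRCITSHP: citizenship record (categorical)
- SEOTR: self-employed (categorical)
- WKSWORK: number of weeks worked (numeric)
- PTOTVAL: total income (numeric), the prediction target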

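### Operational challenges
The conditions under which a person earns 50,000 dollars or more are expected to change from year to year. The environment may also change in other ways; for example, a feature that mattered at training time may stop being important. The goal is to monitor these changes so that reviewing the training data, reviewing the training algorithm, and retraining can be automated.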

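### Data processing
1. Keep only records for people aged 18 or older.
2. Set the label to 1 if PTOTVAL is 50,000 dollars or more, and to 0 otherwise (the variable name is PTOTVAL_Over50000).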
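
The two rules above can be sketched in plain Python before they are applied with pandas later in the notebook. The records and the helper name `label_records` are hypothetical and exist only for illustration; the comparison uses `>=` to follow the "50,000 dollars or more" wording.

```python
# Minimal sketch of the two preprocessing rules (illustrative data, not CPS records).
def label_records(records, min_age=18, income_threshold=50000):
    """Keep adults and replace PTOTVAL with a binary PTOTVAL_Over50000 flag."""
    labeled = []
    for record in records:
        if record["A_AGE"] < min_age:
            continue  # rule 1: drop respondents younger than 18
        row = {k: v for k, v in record.items() if k != "PTOTVAL"}
        # rule 2: 1 if total income is 50,000 dollars or more, else 0
        row["PTOTVAL_Over50000"] = 1 if record["PTOTVAL"] >= income_threshold else 0
        labeled.append(row)
    return labeled


sample = [
    {"A_AGE": 17, "PTOTVAL": 80000},
    {"A_AGE": 30, "PTOTVAL": 50000},
    {"A_AGE": 45, "PTOTVAL": 12000},
]
labeled_sample = label_records(sample)
print(labeled_sample)
```

The 17-year-old is dropped by rule 1, and the record at exactly 50,000 dollars is labeled 1 by rule 2.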

## Step 0: Preparation before monitoring
### Environment setup

In [2]:
import boto3
import copy
import json
import os
import random
import requests
import time
import numpy as np
import pandas as pd

from datetime import datetime, timedelta

from sagemaker import get_execution_role, image_uris, Session
from sagemaker.clarify import (
 BiasConfig,
 DataConfig,
 ModelConfig,
 ModelPredictedLabelConfig,
 SHAPConfig,
)
from sagemaker.estimator import Estimator
from sagemaker.image_uris import retrieve
from sagemaker.inputs import TrainingInput
from sagemaker.model import Model
from sagemaker.model_monitor import (
 BiasAnalysisConfig,
 CronExpressionGenerator,
 DataCaptureConfig,
 EndpointInput,
 ExplainabilityAnalysisConfig,
 ModelBiasMonitor,
 ModelExplainabilityMonitor,
)
from sagemaker.s3 import S3Downloader, S3Uploader
from sagemaker.utils import name_from_base

In [3]:
role = get_execution_role()
print(f"RoleArn: {role}")

sagemaker_session = Session()
sagemaker_client = sagemaker_session.sagemaker_client
sagemaker_runtime_client = sagemaker_session.sagemaker_runtime_client

region = sagemaker_session.boto_region_name
print(f"AWS region: {region}")

bucket = Session().default_bucket()
print(f"Bucket: {bucket}")
prefix = "sagemaker/blackbelt-part3-sample"
s3_key = f"s3://{bucket}/{prefix}"
print(f"S3 key: {s3_key}")

s3_capture_upload_path = f"{s3_key}/datacapture"
ground_truth_upload_path = f"{s3_key}/ground_truth_data/{datetime.now():%Y-%m-%d-%H-%M-%S}"
s3_report_path = f"{s3_key}/reports"

print(f"Capture path: {s3_capture_upload_path}")
print(f"Ground truth path: {ground_truth_upload_path}")
print(f"Report path: {s3_report_path}")

baseline_prefix = 'baselining/baseline'
baseline_results_uri = f"{s3_key}/baselining"
print(f"Baseline results uri: {baseline_results_uri}")

endpoint_name = name_from_base(f"bb-example-xgboost-classification-model-")
print(f'Endpoint name: {endpoint_name}')
endpoint_instance_count = 1
endpoint_instance_type = "ml.m5.large"
schedule_expression = CronExpressionGenerator.hourly()

training_data_prefix = 'sample_data'
training_dataset_uri = f"{s3_key}/{training_data_prefix}"
print(f'Training dataset s3 path: {training_dataset_uri}')
training_instance_type = "ml.m5.xlarge"

RoleArn: arn:aws:iam::036661559124:role/service-role/AmazonSageMaker-ExecutionRole-20220417T231709
AWS region: ap-northeast-1
Bucket: sagemaker-ap-northeast-1-036661559124
S3 key: s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample
Capture path: s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/datacapture
Ground truth path: s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/ground_truth_data/2023-01-04-12-11-59
Report path: s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/reports
Baseline results uri: s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining
Endpoint name: bb-example-xgboost-classification-model-2023-01-04-12-11-59-829
Training dataset s3 path: s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/sample_data


### Data preparation
Download the data from the U.S. Census Bureau.
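
The download cell that follows builds a DataFrame from `response[1:]` and uses `response[0]` as the column names, so it treats the API response as a JSON array of rows with the header in the first row. The sketch below shows that shape with the standard library only. The sample rows are hypothetical, and representing every value as a string is an assumption.

```python
import csv
import io

# Hypothetical response in the shape the notebook assumes:
# a JSON array of rows, where the first row holds the column names.
response = [
    ["A_AGE", "A_SEX", "PTOTVAL"],
    ["34", "1", "52000"],
    ["61", "2", "18000"],
]

header, rows = response[0], response[1:]
records = [dict(zip(header, row)) for row in rows]

# Write the rows as CSV text, header first.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```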

In [4]:
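def get_csv_file(year):
    """Download the CPS ASEC data for one year and save it as cps_<year>.csv."""
    if year not in [2019, 2020, 2021, 2022]:
        # No URL is defined for other years, so skip instead of requesting an empty URL.
        print(f"Skipping {year}: no download URL is defined for this year.")
        return
    url = f"https://api.census.gov/data/{year}/cps/asec/mar?get=A_AGE,A_FTLF,A_HGA,A_HSCOL,A_MARITL,A_SEX,A_UNMEM,A_USLHRS,NOEMP,PENATVTY,PRCITSHP,SEOTR,WKSWORK,PTOTVAL"

    try:
        print(f"Downloading {year} file...")
        response = requests.get(url, timeout=(3.0, 3600)).json()
        # The first row holds the column names; the remaining rows are records.
        df = pd.DataFrame(response[1:], columns=response[0])
        df.to_csv(f'cps_{year}.csv')
    except json.JSONDecodeError:
        # The response body was not valid JSON, so no file is written.
        print(f"Could not parse the response for {year}.")


def get_multi_records(years_list):
    for year in years_list:
        get_csv_file(year)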

In [5]:
years = [2019, 2020, 2021, 2022]
get_multi_records(years)

Downloading 2019 file...
Downloading 2020 file...
Downloading 2021 file...
Downloading 2022 file...


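### Data processing
Extract records for people aged 18 or older and create a flag variable that indicates whether annual income is 50,000 dollars or more. Upload the processed data to S3.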

In [6]:
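def data_processing(csv_name):
    df = pd.read_csv(csv_name, index_col=0)
    # Keep only respondents aged 18 or older.
    df = df[df['A_AGE'] >= 18]
    # Label is 1 when total income is 50,000 dollars or more, otherwise 0.
    df['PTOTVAL_Over50000'] = np.select([df['PTOTVAL'] >= 50000], [1], default=0)
    df = df.drop(['PTOTVAL'], axis=1)
    # Move the label to the first column, as the built-in XGBoost CSV format expects.
    col_list = ['PTOTVAL_Over50000'] + df.columns.tolist()[:-1]
    df = df[col_list]

    return df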

In [7]:
cps_2019_df = data_processing('cps_2019.csv')
cps_2019_df.to_csv("train.csv", header=False, index=False)
cps_2020_df = data_processing('cps_2020.csv')
cps_2020_df.to_csv("validation.csv", header=False, index=False)

boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(f'{prefix}/{training_data_prefix}', 'train/data.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(f'{prefix}/{training_data_prefix}', 'validation/data.csv')).upload_file('validation.csv')

train_input = TrainingInput(f'{training_dataset_uri}/train/data.csv', content_type="csv")
validation_input = TrainingInput(f'{training_dataset_uri}/validation/data.csv', content_type="csv")

Create the data needed for the baseline and upload it to S3. Also read the header row.

In [8]:
baseline_file = 'baseline_with_header.csv'
baseline_dataset = cps_2020_df.sample(n=1000)
baseline_dataset.to_csv(baseline_file, header=True, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(f'{prefix}/{baseline_prefix}', baseline_file)).upload_file(baseline_file)
baseline_uri = f'{s3_key}/{baseline_prefix}/{baseline_file}'
print(f'baseline_uri: {baseline_uri}')

baseline_uri: s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining/baseline/baseline_with_header.csv


In [9]:
with open('baseline_with_header.csv') as f:
 headers_line = f.readline().rstrip()
all_headers = headers_line.split(",")
label_header = all_headers[0]

### Model preparation
Train a model with the built-in XGBoost algorithm.

In [10]:
container = retrieve("xgboost", region, version="1.5-1")
xgb = Estimator(
 container,
 role,
 instance_count=1,
 instance_type=training_instance_type,
 disable_profiler=True,
 sagemaker_session=sagemaker_session,
)

xgb.set_hyperparameters(
 max_depth=5,
 eta=0.2,
 gamma=4,
 min_child_weight=6,
 subsample=0.8,
 objective="binary:logistic",
 num_round=800,
)

xgb.fit({'train': train_input, 'validation': validation_input})

2023-01-04 12:14:15 Starting - Starting the training job...
2023-01-04 12:14:29 Starting - Preparing the instances for training............
2023-01-04 12:16:40 Downloading - Downloading input data
2023-01-04 12:16:40 Training - Training image download completed. Training in progress..
[2023-01-04 12:16:41.892 ip-10-0-203-120.ap-northeast-1.compute.internal:7 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2023-01-04:12:16:41:INFO] Imported framework sagemaker_xgboost_container.training
[2023-01-04:12:16:41:INFO] Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
[2023-01-04:12:16:41:INFO] No GPUs detected (normal if no gpus installed)
[2023-01-04:12:16:41:INFO] Running XGBoost Sagemaker in algorithm mode
[2023-01-04:12:16:41:INFO] Determined delimiter of CSV input is ','
[2023-01-04:12:16:41:INFO] Determined delimiter of CSV input is ','
[2023-01-04:12:16:4

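### Inference data preparation
Apply the same processing as for the training data, then split the result into inference data that contains only the explanatory variables and the ground truth labels.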

In [11]:
# test_data = data_processing('cps_2021.csv')
test_data = data_processing('cps_2022.csv')

num_examples, num_columns = test_data.shape
print(
 f"The test dataset contains {num_examples} examples and {num_columns} columns.\n"
)

ground_truth_label, features = test_data.iloc[:, :1], test_data.iloc[:, 1:]

features.to_csv('testdata.csv', header=False, index=False)

dataset_type = "text/csv"
test_dataset = 'testdata.csv'

The test dataset contains 114158 examples and 14 columns.



### Inference endpoint creation
Deploy an endpoint with the trained model.

In [12]:
model_url = xgb.model_data
print(f"Model file is stored in {model_url}")

Model file is stored in s3://sagemaker-ap-northeast-1-036661559124/sagemaker-xgboost-2023-01-04-12-14-14-700/output/model.tar.gz


In [13]:
model_name = f'xgb-{datetime.now():%Y-%m-%d-%H%M}'
print("Model name: ", model_name)
print("Endpoint name: ", endpoint_name)

Model name: xgb-2023-01-04-1218
Endpoint name: bb-example-xgboost-classification-model-2023-01-04-12-11-59-829


In [14]:
image_uri = image_uris.retrieve("xgboost", region, "1.5-1")

print(f"XGBoost image uri: {image_uri}")
model = Model(
 role=role,
 name=model_name,
 image_uri=image_uri,
 model_data=model_url,
 sagemaker_session=sagemaker_session,
)

data_capture_config = DataCaptureConfig(
 enable_capture=True,
 sampling_percentage=100,
 destination_s3_uri=s3_capture_upload_path,
)

print(f"Deploying model {model_name} to endpoint {endpoint_name}")
model.deploy(
 initial_instance_count=endpoint_instance_count,
 instance_type=endpoint_instance_type,
 endpoint_name=endpoint_name,
 data_capture_config=data_capture_config,
)

XGBoost image uri: 354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-xgboost:1.5-1
Deploying model xgb-2023-01-04-1218 to endpoint bb-example-xgboost-classification-model-2023-01-04-12-11-59-829
-----!

### Inference
Run inference against the endpoint you created. To detect drift in Feature Attribution, use the payload that is commented out.
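
The inference cell that follows sends only the rows whose zero-based index `i` satisfies `400 < i < 450`, and it uses that index as the `InferenceId`. The sketch below lists which indices that condition selects. The helper name `selected_indices` and the `total_rows` value are illustrative assumptions.

```python
# Which row indices does the condition `400 < i and i < 450` select?
def selected_indices(total_rows, lower=400, upper=450):
    """Return the zero-based row indices that would be sent to the endpoint."""
    return [i for i in range(total_rows) if lower < i < upper]


sent = selected_indices(total_rows=1000)  # total_rows is an illustrative value
print(len(sent), sent[0], sent[-1])
```

Both bounds are exclusive, so 49 requests are sent with inference IDs 401 through 449. The counter in the cell keeps increasing for every row, so after the loop it holds the total number of rows in the file, not the number of requests.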

In [15]:
print(f"Sending test traffic to the endpoint {endpoint_name}. \nPlease wait", end="")
test_dataset_size = 0 # record the number of rows in data we're sending for inference
with open(test_dataset, "r") as f:
    for row in f:
        if 400 < test_dataset_size < 450:
            payload = row.rstrip("\n")
            # Enable the line below to trigger Feature Attribution drift
            # payload = '100,100,100,100,100,100,100,100,100,100,100,100,100'
            response = sagemaker_runtime_client.invoke_endpoint(
                EndpointName=endpoint_name,
                Body=payload,
                ContentType=dataset_type,
                InferenceId=str(test_dataset_size),
            )
            prediction = response["Body"].read()
            print(".", end="", flush=True)
            time.sleep(0.5)
        test_dataset_size += 1

print()
print("Done!")

Sending test traffic to the endpoint bb-example-xgboost-classification-model-2023-01-04-12-11-59-829. 
Please wait.................................................
Done!


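### Checking the captured data
Check the data that was captured.

If 600 seconds is not enough and the cell fails, wait a while and run it again.

This cell only inspects the captured data, so you can move on even if it has not finished.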
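
The cell that follows polls S3 once per second, for up to 600 attempts, until at least one capture file appears. The same pattern can be written as a small reusable helper. In the sketch below, `wait_for`, `fake_lister`, and the file name are hypothetical; the lister is injected so the helper can be tried without S3, and it raises `TimeoutError` instead of continuing with an empty list.

```python
import time


def wait_for(list_items, timeout_seconds=600, interval_seconds=1.0, sleep=time.sleep):
    """Poll `list_items` until it returns a non-empty list or the attempts run out."""
    attempts = max(1, int(timeout_seconds / interval_seconds))
    for _ in range(attempts):
        items = list_items()
        if items:
            return sorted(items)
        sleep(interval_seconds)
    raise TimeoutError(f"nothing found after {timeout_seconds} seconds")


# Stand-in lister that finds a file on the third call (hypothetical file name).
calls = {"count": 0}


def fake_lister():
    calls["count"] += 1
    return ["capture-0001.jsonl"] if calls["count"] >= 3 else []


# The sleep function is replaced with a no-op so the example finishes immediately.
found = wait_for(fake_lister, timeout_seconds=5, interval_seconds=1, sleep=lambda _: None)
print(found)
```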

In [16]:
print("Waiting 600 seconds for captures to show up", end="")
for _ in range(600):
    capture_files = sorted(S3Downloader.list(f"{s3_capture_upload_path}/{endpoint_name}"))
    if capture_files:
        break
    print(".", end="", flush=True)
    time.sleep(1)
print()
print("Found Capture Files:")
print("\n ".join(capture_files[-5:]))

Waiting 600 seconds for captures to show up.................................................
Found Capture Files:
s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/datacapture/bb-example-xgboost-classification-model-2023-01-04-12-11-59-829/AllTraffic/2023/01/04/12/21-01-974-079e84cc-98a0-41fa-842a-f18c59da9013.jsonl


In [17]:
capture_file = S3Downloader.read_file(capture_files[-1]).split("\n")[-10:-1]
print(capture_file[-1])

{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"77,0,44,0,1,2,0,-1,0,57,1,0,0","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"0.2509244382381439\n","encoding":"CSV"}},"eventMetadata":{"eventId":"a608eba5-9aef-4580-8dba-a2b35aa94324","inferenceId":"449","inferenceTime":"2023-01-04T12:21:26Z"},"eventVersion":"0"}


In [18]:
print(json.dumps(json.loads(capture_file[-1]), indent=2))

{
 "captureData": {
 "endpointInput": {
 "observedContentType": "text/csv",
 "mode": "INPUT",
 "data": "77,0,44,0,1,2,0,-1,0,57,1,0,0",
 "encoding": "CSV"
 },
 "endpointOutput": {
 "observedContentType": "text/csv; charset=utf-8",
 "mode": "OUTPUT",
 "data": "0.2509244382381439\n",
 "encoding": "CSV"
 }
 },
 "eventMetadata": {
 "eventId": "a608eba5-9aef-4580-8dba-a2b35aa94324",
 "inferenceId": "449",
 "inferenceTime": "2023-01-04T12:21:26Z"
 },
 "eventVersion": "0"
}
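
Each line of a capture file is one JSON record like the one above. As a sketch of how the fields can be read back, the example below parses that record with the standard library and extracts the input features, the model score, and the inference ID. The variable names are illustrative.

```python
import json

# One capture record, copied from the output above (raw strings keep the \n escape).
line = (
    r'{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT",'
    r'"data":"77,0,44,0,1,2,0,-1,0,57,1,0,0","encoding":"CSV"},'
    r'"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT",'
    r'"data":"0.2509244382381439\n","encoding":"CSV"}},'
    r'"eventMetadata":{"eventId":"a608eba5-9aef-4580-8dba-a2b35aa94324",'
    r'"inferenceId":"449","inferenceTime":"2023-01-04T12:21:26Z"},"eventVersion":"0"}'
)

record = json.loads(line)
# The request body is a CSV row of 13 feature values.
features = [float(v) for v in record["captureData"]["endpointInput"]["data"].split(",")]
# The response body is the predicted probability followed by a newline.
score = float(record["captureData"]["endpointOutput"]["data"])
inference_id = record["eventMetadata"]["inferenceId"]
print(inference_id, len(features), score)
```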


### Uploading ground truth data
Upload the ground truth data. Without ground truth data, the scheduled bias job fails.
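
The upload cell that follows writes one JSON record per line, with `eventId` set to the same value that was sent as `InferenceId`, and stores the file under a key derived from the upload time. The sketch below reproduces the record layout and the key format with the standard library. The labels and the helper name `ground_truth_record` are hypothetical. That the monitoring job joins ground truth to captured requests through this ID is stated here as an assumption about how the two datasets are matched.

```python
import json
from datetime import datetime


def ground_truth_record(inference_id, label):
    """Build one ground truth record in the layout used by the upload cell."""
    return {
        "groundTruthData": {"data": str(label), "encoding": "CSV"},
        "eventMetadata": {"eventId": str(inference_id)},
        "eventVersion": "0",
    }


# Hypothetical labels keyed by inference ID.
labels = {401: 0, 402: 1}
body = "\n".join(json.dumps(ground_truth_record(i, y)) for i, y in labels.items())

# The object key encodes the upload time as YYYY/MM/DD/HH/MMSS.jsonl.
upload_time = datetime(2023, 1, 4, 12, 22, 31)
key_suffix = f"{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"
print(key_suffix)
print(body)
```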

In [19]:
# import random

def ground_truth_with_id(inference_id):
 # random.seed(inference_id)
 # rand = random.random()

 return {
 "groundTruthData": {
 # Sample code for using random data as the ground truth
 # "data": "1" if rand < 0.7 else "0",
 "data": str(ground_truth_label.iat[inference_id,0]),
 "encoding": "CSV",
 },
 "eventMetadata": {
 "eventId": str(inference_id),
 },
 "eventVersion": "0",
 }


def upload_ground_truth(upload_time):
 records = [ground_truth_with_id(i) for i in range(test_dataset_size)]
 fake_records = [json.dumps(r) for r in records]
 data_to_upload = "\n".join(fake_records)
 target_s3_uri = f"{ground_truth_upload_path}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"
 print(f"Uploading {len(fake_records)} records to", target_s3_uri)
 S3Uploader.upload_string_as_file_body(data_to_upload, target_s3_uri)

In [20]:
upload_ground_truth(datetime.now())

Uploading 114158 records to s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/ground_truth_data/2023-01-04-12-11-59/2023/01/04/12/2231.jsonl


## Step 1: Model bias
### Baseline creation
Creating the baseline takes about 10 minutes.

In [21]:
model_bias_monitor = ModelBiasMonitor(
 role=role,
 sagemaker_session=sagemaker_session,
 max_runtime_in_seconds=1800,
)

In [22]:
model_bias_baselining_job_result_uri = f"{baseline_results_uri}/model_bias"
model_bias_data_config = DataConfig(
 s3_data_input_path=baseline_uri,
 s3_output_path=model_bias_baselining_job_result_uri,
 label=label_header,
 headers=all_headers,
 dataset_type=dataset_type,
)

In [23]:
model_bias_config = BiasConfig(
 label_values_or_threshold=[1], 
 facet_name='A_SEX',
 group_name='A_HGA'
)

In [24]:
model_predicted_label_config = ModelPredictedLabelConfig(
 probability_threshold=0.7,
)
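
`ModelPredictedLabelConfig(probability_threshold=0.7)` describes how the model's probability output is turned into a 0/1 predicted label before bias metrics are computed per facet. The sketch below shows that step and two post-training metrics, the difference in positive proportions in predicted labels (DPPL) and disparate impact (DI), in plain Python. The scores, the group names `a` and `d`, the helper names, and the strict `>` comparison at the threshold are assumptions for illustration. The formulas follow the commonly cited definitions and are not code taken from SageMaker Clarify.

```python
def predicted_labels(scores, threshold=0.7):
    """Turn probability scores into 0/1 labels (strict '>' is an assumption)."""
    return [1 if s > threshold else 0 for s in scores]


def positive_rate(labels):
    """Share of records that received the positive label."""
    return sum(labels) / len(labels)


# Hypothetical model scores for two facet groups.
scores_a = [0.9, 0.8, 0.75, 0.2]   # group a
scores_d = [0.85, 0.4, 0.3, 0.1]   # group d

q_a = positive_rate(predicted_labels(scores_a))
q_d = positive_rate(predicted_labels(scores_d))

dppl = q_a - q_d   # difference in positive proportions in predicted labels
di = q_d / q_a     # disparate impact (ratio of positive proportions)
print(q_a, q_d, dppl, di)
```

A DPPL near 0 and a DI near 1 would indicate similar positive rates for both groups; in this made-up data group `a` receives the positive label three times as often as group `d`.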

In [25]:
model_config = ModelConfig(
 model_name=model_name,
 instance_count=endpoint_instance_count,
 instance_type=endpoint_instance_type,
 content_type=dataset_type,
 accept_type=dataset_type,
)

In [26]:
model_bias_monitor.suggest_baseline(
 model_config=model_config,
 data_config=model_bias_data_config,
 bias_config=model_bias_config,
 model_predicted_label_config=model_predicted_label_config,
)
print(f"ModelBiasMonitor baselining job: {model_bias_monitor.latest_baselining_job_name}")


Job Name: baseline-suggestion-job-2023-01-04-12-22-36-206
Inputs: [{'InputName': 'dataset', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining/baseline/baseline_with_header.csv', 'LocalPath': '/opt/ml/processing/input/data', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'analysis_config', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining/model_bias/analysis_config.json', 'LocalPath': '/opt/ml/processing/input/config', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs: [{'OutputName': 'analysis_result', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining/model_bias', 'LocalPath':

In [27]:
model_bias_monitor.latest_baselining_job.wait(logs=False)
model_bias_constraints = model_bias_monitor.suggested_constraints()
print()
print(f"ModelBiasMonitor suggested constraints: {model_bias_constraints.file_s3_uri}")
print(S3Downloader.read_file(model_bias_constraints.file_s3_uri))

...................................................................................................!
ModelBiasMonitor suggested constraints: s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining/model_bias/analysis.json
{
 "version": "1.0",
 "post_training_bias_metrics": {
 "label": "PTOTVAL_Over50000",
 "facets": {
 "A_SEX": [
 {
 "value_or_threshold": "1",
 "metrics": [
 {
 "name": "AD",
 "description": "Accuracy Difference (AD)",
 "value": 0.047995482778091514
 },
 {
 "name": "CDDPL",
 "description": "Conditional Demographic Disparity in Predicted Labels (CDDPL)",
 "value": -0.35173994883458676
 },
 {
 "name": "DAR",
 "description": "Difference in Acceptance Rates (DAR)",
 "value": -0.18103448275862066
 },
 {
 "name": "DCA",
 "description": "Difference in Conditional Acceptance (DCA)",
 "value": 0.6393678160919543
 },
 {
 "name": "DCR",
 "description": "Difference in Conditional Rejection (DCR)",
 "value": -0.10960185448777915
 },
 {
 "name": "DI",
 

### Scheduled job creation

In [28]:
model_bias_analysis_config = None
if not model_bias_monitor.latest_baselining_job:
 model_bias_analysis_config = BiasAnalysisConfig(
 model_bias_config,
 headers=all_headers,
 label=label_header,
 )
model_bias_monitor.create_monitoring_schedule(
 analysis_config=model_bias_analysis_config,
 output_s3_uri=s3_report_path,
 endpoint_input=EndpointInput(
 endpoint_name=endpoint_name,
 destination="/opt/ml/processing/input/endpoint",
 start_time_offset="-PT1H",
 end_time_offset="-PT0H",
 probability_threshold_attribute=0.7,
 ),
 ground_truth_input=ground_truth_upload_path,
 schedule_cron_expression=schedule_expression,
 enable_cloudwatch_metrics=True,
)
print(f"Model bias monitoring schedule: {model_bias_monitor.monitoring_schedule_name}")

Model bias monitoring schedule: monitoring-schedule-2023-01-04-12-30-54-730
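
The `start_time_offset="-PT1H"` and `end_time_offset="-PT0H"` arguments in the cell above are ISO 8601 durations. Assuming they are applied relative to the scheduled run time, they select the hour of captured traffic that ends at that time. The sketch below computes such a window. The helper names are hypothetical, the parser handles only the `-PT<n>H` form used in this notebook, and the scheduled time is an example value.

```python
import re
from datetime import datetime, timedelta


def parse_hour_offset(offset):
    """Parse offsets of the form '-PT<n>H' (only this subset of ISO 8601 is handled)."""
    match = re.fullmatch(r"-PT(\d+)H", offset)
    if match is None:
        raise ValueError(f"unsupported offset: {offset}")
    return -timedelta(hours=int(match.group(1)))


def analysis_window(scheduled_time, start_offset="-PT1H", end_offset="-PT0H"):
    """Return (start, end) of the data window relative to the scheduled time."""
    start = scheduled_time + parse_hour_offset(start_offset)
    end = scheduled_time + parse_hour_offset(end_offset)
    return start, end


# Example scheduled time for an hourly run.
start, end = analysis_window(datetime(2023, 1, 4, 13, 0, 0))
print(start, end)
```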


## Step 2: Feature Attribution
### Baseline creation
Creating the baseline takes about 10 minutes.

In [29]:
model_explainability_monitor = ModelExplainabilityMonitor(
 role=role,
 sagemaker_session=sagemaker_session,
 max_runtime_in_seconds=1800,
)

In [30]:
model_explainability_baselining_job_result_uri = f"{baseline_results_uri}/model_explainability"
model_explainability_data_config = DataConfig(
 s3_data_input_path=baseline_uri,
 s3_output_path=model_explainability_baselining_job_result_uri,
 label=label_header,
 headers=all_headers,
 dataset_type=dataset_type,
)

In [31]:
baseline_dataframe = pd.read_csv(baseline_uri).drop(label_header, axis=1)
shap_baseline = [list(baseline_dataframe.mean())]

shap_config = SHAPConfig(
 baseline=shap_baseline,
 num_samples=100,
 agg_method="mean_abs",
 save_local_shap_values=True,
)
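
The cell above uses the column means of the baseline dataset as the SHAP baseline and aggregates local values with `agg_method="mean_abs"`. To illustrate what those two choices mean, the sketch below computes exact Shapley values for a tiny hypothetical model by averaging marginal contributions over all feature orderings. This is not Kernel SHAP, which approximates these values by sampling; `toy_model`, the data, and the helper names are assumptions for illustration.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)  # start from the baseline row
        previous = model(current)
        for i in order:
            current[i] = instance[i]  # reveal feature i
            value = model(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def column_means(rows):
    """Mean of each column, mirroring how the notebook builds its SHAP baseline."""
    return [sum(col) / len(col) for col in zip(*rows)]


def toy_model(x):
    # Hypothetical linear scorer, standing in for the deployed model.
    return 0.5 * x[0] + 0.25 * x[1] - 0.25 * x[2]


rows = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 5.0, 2.0]]
baseline = column_means(rows)
local = [shapley_values(toy_model, row, baseline) for row in rows]
# agg_method="mean_abs": average the absolute local values per feature
global_values = [sum(abs(v) for v in col) / len(col) for col in zip(*local)]
print(baseline)
print(global_values)
```

For each row, the local values add up to the difference between the model output for that row and the output for the baseline, which is the property that makes the baseline choice matter.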

In [32]:
model_explainability_monitor.suggest_baseline(
 data_config=model_explainability_data_config,
 model_config=model_config,
 explainability_config=shap_config,
)
print(
 f"ModelExplainabilityMonitor baselining job: {model_explainability_monitor.latest_baselining_job_name}"
)


Job Name: baseline-suggestion-job-2023-01-04-12-30-55-986
Inputs: [{'InputName': 'dataset', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining/baseline/baseline_with_header.csv', 'LocalPath': '/opt/ml/processing/input/data', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'analysis_config', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining/model_explainability/analysis_config.json', 'LocalPath': '/opt/ml/processing/input/config', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs: [{'OutputName': 'analysis_result', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining/model_explainab

In [33]:
model_explainability_monitor.latest_baselining_job.wait(logs=False)
model_explainability_constraints = model_explainability_monitor.suggested_constraints()
print()
print(
 f"ModelExplainabilityMonitor suggested constraints: {model_explainability_constraints.file_s3_uri}"
)
print(S3Downloader.read_file(model_explainability_constraints.file_s3_uri))

.........................................................................................................!
ModelExplainabilityMonitor suggested constraints: s3://sagemaker-ap-northeast-1-036661559124/sagemaker/blackbelt-part3-sample/baselining/model_explainability/analysis.json
{
 "version": "1.0",
 "explanations": {
 "kernel_shap": {
 "label0": {
 "global_shap_values": {
 "A_AGE": 0.03900713880041894,
 "A_FTLF": 0.009864065651959351,
 "A_HGA": 0.08227859137445104,
 "A_HSCOL": 0.006078339682850803,
 "A_MARITL": 0.020923815390634877,
 "A_SEX": 0.0464811787751795,
 "A_UNMEM": 0.00656392329159415,
 "A_USLHRS": 0.09165896714888844,
 "NOEMP": 0.029412844725952524,
 "PENATVTY": 0.03568894808061845,
 "PRCITSHP": 0.008444406567974315,
 "SEOTR": 0.008066975367524291,
 "WKSWORK": 0.10033051463598301
 },
 "expected_value": 0.05636756867170334
 }
 }
 }
}
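
The global SHAP values above form the baseline ranking of features. One way to quantify feature attribution drift is to compare that ranking with the ranking observed on live traffic using the normalized discounted cumulative gain (NDCG). The sketch below computes NDCG from the baseline values (rounded to four decimals) and a hypothetical set of live attributions. The live values and the helper names are assumptions, and treating a score below 0.90 as drift is an assumption about the default alert threshold, not something shown in this notebook.

```python
import math

# Global SHAP values from the baseline output above (rounded to 4 decimals).
baseline_shap = {
    "A_AGE": 0.0390, "A_FTLF": 0.0099, "A_HGA": 0.0823, "A_HSCOL": 0.0061,
    "A_MARITL": 0.0209, "A_SEX": 0.0465, "A_UNMEM": 0.0066, "A_USLHRS": 0.0917,
    "NOEMP": 0.0294, "PENATVTY": 0.0357, "PRCITSHP": 0.0084, "SEOTR": 0.0081,
    "WKSWORK": 0.1003,
}


def ranking(attributions):
    """Feature names sorted from most to least important."""
    return sorted(attributions, key=attributions.get, reverse=True)


def dcg(order, baseline):
    """Discounted cumulative gain of an ordering, scored with baseline attributions."""
    return sum(baseline[f] / math.log2(i + 1) for i, f in enumerate(order, start=1))


def ndcg(baseline, live):
    """DCG of the live ranking divided by the DCG of the baseline ranking."""
    return dcg(ranking(live), baseline) / dcg(ranking(baseline), baseline)


# Hypothetical live attributions: WKSWORK loses almost all of its importance.
live_shap = dict(baseline_shap, WKSWORK=0.001)
score = ndcg(baseline_shap, live_shap)
print(ranking(baseline_shap)[:3])
print(score)
```

An unchanged ranking gives a score of exactly 1, and the score drops as features that were important in the baseline move down the live ranking.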


### Scheduled job creation

In [34]:
model_explainability_analysis_config = None
if not model_explainability_monitor.latest_baselining_job:
 headers_without_label_header = copy.deepcopy(all_headers)
 headers_without_label_header.remove(label_header)
 model_explainability_analysis_config = ExplainabilityAnalysisConfig(
 explainability_config=shap_config,
 model_config=model_config,
 headers=headers_without_label_header,
 )
model_explainability_monitor.create_monitoring_schedule(
 analysis_config = model_explainability_analysis_config,
 output_s3_uri=s3_report_path,
 endpoint_input=endpoint_name,
 schedule_cron_expression=schedule_expression,
 enable_cloudwatch_metrics=True,
)

## Step 3: Resource cleanup

In [39]:
model_bias_monitor.delete_monitoring_schedule()


Deleting Monitoring Schedule with name: monitoring-schedule-2023-01-04-12-30-54-730


In [40]:
model_explainability_monitor.delete_monitoring_schedule()


Deleting Monitoring Schedule with name: monitoring-schedule-2023-01-04-12-39-44-618


In [41]:
from sagemaker.predictor import Predictor

predictor = Predictor(endpoint_name, sagemaker_session=sagemaker_session)
predictor.delete_endpoint(delete_endpoint_config=False)

In [42]:
model.delete_model()