{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Quality Monitoring: Step B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tips for running this notebook:\n", "- This notebook loads a large volume of raw data, so run it on an instance with at least 8 GB of memory.\n", "- It has been tested with the Python 3 (Data Science) kernel.\n", "- By default it uses the SageMaker default bucket. You can change the bucket if needed.\n", "- Cell outputs are kept so that you can review the results without running anything. To start from a clean state, choose \"Clear All Outputs\" from the right-click menu before you begin.\n", "- You can also check the created schedule from the Endpoint menu under `SageMaker resource` (bottom of the left pane) in SageMaker Studio." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variables shared across multiple notebooks" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Name of the endpoint\n", "endpoint_name = 'nyctaxi-xgboost-endpoint'\n", "\n", "# Name of the endpoint config\n", "endpoint_config_name = f'{endpoint_name}-config'\n", "\n", "# Name of the model quality monitoring schedule\n", "model_quality_monitoring_schedule = f'{endpoint_name}-model-quality-schedule'\n", "\n", "# Use the SageMaker default bucket as the Model Monitor bucket\n", "# If you use a different bucket, specify it here\n", "import sagemaker\n", "bucket = sagemaker.Session().default_bucket()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the S3 prefixes for the baseline and the reports in which monitoring results are stored." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## B1 (Option A) Run inference and upload the Ground Truth to S3\n", "After running inference, you need to wait for the next monitoring job cycle." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Output prefix for the baseline\n", "baseline_prefix = 'model_monitor/model_quality_baseline'\n", "\n", "# Prefix shared by multiple reports, used for time-series visualization\n", "report_prefix = 'model_monitor/model_quality_monitoring_report'\n", "\n", "# Prefix to which the Ground Truth is uploaded\n", "ground_truth_prefix = 'model_monitor/model_quality_ground_truth'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run inference with an Inference ID" ] }, { "cell_type": "code", "execution_count": 6, "metadata": 
{}, "outputs": [], "source": [ "# Date for which to run inference\n", "prediction_target_date = '2021-09-15'\n", "\n", "# Data sampling rate (match the setting used when the model was built)\n", "sampling_rate = 20\n", "\n", "# Directory in which to save the inference results\n", "result_dir = 'prediction_results_model_quality'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import os\n", "import boto3\n", "import pandas as pd\n", "import time\n", "from datetime import datetime\n", "import model_utils" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def get_data_for_pred(target, sampling_rate):\n", "    # Load the raw data for the previous month and the target month\n", "    previous_year, previous_month = model_utils.get_previous_year_month(target.year, target.month)\n", "    df_previous_month = model_utils.get_raw_data(previous_year, previous_month, sampling_rate)\n", "    df_current_month = model_utils.get_raw_data(target.year, target.month, sampling_rate)\n", "    df_data = pd.concat([df_previous_month, df_current_month])\n", "    del df_previous_month\n", "    del df_current_month\n", "\n", "    # Extract features\n", "    df_features = model_utils.extract_features(df_data)\n", "    df_features = model_utils.filter_current_month(df_features, target.year, target.month)\n", "\n", "    return df_features" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading data for 2021-09\n", "Predicting 2021-09-15 00:00:00 nyctaxi-xgboost-endpoint\n", "Prediction completed. Result file: prediction_results_model_quality/prediction-result-2021-09-15.csv\n" ] } ], "source": [ "# Create the result directory if it does not exist\n", "if not os.path.exists(result_dir):\n", "    os.makedirs(result_dir)\n", "\n", "target_date = pd.to_datetime(prediction_target_date)\n", "print('Loading data for', target_date.strftime('%Y-%m'))\n", "df_features = get_data_for_pred(target_date, sampling_rate)\n", "\n", "# Run prediction for the target date\n", "print('Predicting', target_date, endpoint_name)\n", "df_pred = df_features[df_features.index == target_date].copy()\n", "df_pred[['pred', 'inference_id']] = model_utils.exec_prediction(endpoint_name, df_pred)\n", "\n", "# Save the prediction result\n", "result_file = f'{result_dir}/prediction-result-{prediction_target_date}.csv'\n", "df_pred.to_csv(result_file, index=False)\n", "print('Prediction completed. Result file:', result_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Merge the Inference IDs obtained at inference time with the Ground Truth and upload them to S3" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "import boto3\n", "import pandas as pd\n", "import io\n", "import json\n", "from sagemaker.s3 import S3Uploader\n", "from datetime import datetime\n", "\n", "s3r = boto3.resource('s3')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Get the file that contains the Ground Truth (here, the pickup_count column) and the inference ID (here, inference_id)\n", "# This assumes that one day of inference results is saved in a local CSV file, with the Ground Truth and the inference results in the same file.\n", "ground_truth_colname = 'pickup_count'\n", "inference_id_colname = 'inference_id'\n", "\n", "# Set the prefix to which the Ground Truth is uploaded\n", "# It must match the monitoring job setting used when create_monitoring_schedule was run\n", "bucket = sagemaker.Session().default_bucket()\n", "ground_truth_path = f's3://{bucket}/{ground_truth_prefix}'" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, 
"outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>pickup_count</th><th>history_12slots</th><th>history_16slots</th><th>history_20slots</th><th>history_24slots</th><th>history_28slots</th><th>history_32slots</th><th>history_36slots</th><th>history_40slots</th><th>history_44slots</th><th>...</th><th>tolls_amount_mean_20slot</th><th>tolls_amount_mean_96slot</th><th>tolls_amount_mean_100slot</th><th>tolls_amount_mean_104slot</th><th>tolls_amount_mean_192slot</th><th>tolls_amount_mean_196slot</th><th>tolls_amount_mean_200slot</th><th>time_slot</th><th>pred</th><th>inference_id</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>0</th><td>113</td><td>264.0</td><td>324.0</td><td>350.0</td><td>359.0</td><td>369.0</td><td>362.0</td><td>352.0</td><td>301.0</td><td>294.0</td><td>...</td><td>0.352771</td><td>0.440594</td><td>0.674265</td><td>0.203778</td><td>0.169022</td><td>0.544014</td><td>0.699157</td><td>0</td><td>88</td><td>f7642aa4-4496-43be-9185-89e9cfe14c53</td></tr>\n", "<tr><th>1</th><td>91</td><td>249.0</td><td>274.0</td><td>330.0</td><td>391.0</td><td>351.0</td><td>320.0</td><td>343.0</td><td>340.0</td><td>279.0</td><td>...</td><td>0.277879</td><td>0.786000</td><td>0.751639</td><td>0.463443</td><td>0.883146</td><td>0.681200</td><td>0.872543</td><td>1</td><td>71</td><td>a0911496-6e3e-4984-84e2-349152581687</td></tr>\n", "<tr><th>2</th><td>81</td><td>273.0</td><td>270.0</td><td>327.0</td><td>405.0</td><td>376.0</td><td>363.0</td><td>333.0</td><td>316.0</td><td>298.0</td><td>...</td><td>0.260398</td><td>0.777119</td><td>0.564286</td><td>0.536010</td><td>1.105844</td><td>0.661009</td><td>0.595752</td><td>2</td><td>72</td><td>bed71a18-4006-461d-863d-e9c1ec7e65d6</td></tr>\n", "</tbody>\n", "</table>\n", "<p>3 rows × 145 columns</p>\n", "</div>" ], "text/plain": [ " pickup_count history_12slots history_16slots history_20slots \\\n", "0 113 264.0 324.0 350.0 \n", "1 91 249.0 274.0 330.0 \n", "2 81 273.0 270.0 327.0 \n", "\n", " history_24slots history_28slots history_32slots history_36slots \\\n", "0 359.0 369.0 362.0 352.0 \n", "1 391.0 351.0 320.0 343.0 \n", "2 405.0 376.0 363.0 333.0 \n", "\n", " history_40slots history_44slots ... tolls_amount_mean_20slot \\\n", "0 301.0 294.0 ... 0.352771 \n", "1 340.0 279.0 ... 0.277879 \n", "2 316.0 298.0 ... 0.260398 \n", "\n", " tolls_amount_mean_96slot tolls_amount_mean_100slot \\\n", "0 0.440594 0.674265 \n", "1 0.786000 0.751639 \n", "2 0.777119 0.564286 \n", "\n", " tolls_amount_mean_104slot tolls_amount_mean_192slot \\\n", "0 0.203778 0.169022 \n", "1 0.463443 0.883146 \n", "2 0.536010 1.105844 \n", "\n", " tolls_amount_mean_196slot tolls_amount_mean_200slot time_slot pred \\\n", "0 0.544014 0.699157 0 88 \n", "1 0.681200 0.872543 1 71 \n", "2 0.661009 0.595752 2 72 \n", "\n", " inference_id \n", "0 f7642aa4-4496-43be-9185-89e9cfe14c53 \n", "1 a0911496-6e3e-4984-84e2-349152581687 \n", "2 bed71a18-4006-461d-863d-e9c1ec7e65d6 \n", "\n", "[3 rows x 145 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Collect the Ground Truth\n", "df_prediction = pd.read_csv(result_file)\n", "df_prediction.head(3)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Convert the Ground Truth and Inference ID to the format defined in the developer guide\n", "# https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-merge.html\n", "def ground_truth_with_id(ground_truth, inference_id):\n", "    return {\n", "        \"groundTruthData\": {\n", "            \"data\": str(ground_truth),\n", "            \"encoding\": \"CSV\",\n", "        },\n", "        \"eventMetadata\": {\n", "            \"eventId\": str(inference_id),\n", "        },\n", "        \"eventVersion\": \"0\",\n", "    }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assume that the Ground Truth for the first half of the inferred data became available immediately, and upload it." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'s3://sagemaker-ap-northeast-1-370828233696/model_monitor/model_quality_ground_truth/2022/12/16/10/ground_truth.jsonl'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert the prediction results to JSON and serialize them as a JSON Lines string\n", "gt_records = df_prediction.iloc[:int(df_prediction.shape[0]/2)].apply(lambda x: json.dumps(ground_truth_with_id(x[ground_truth_colname], x[inference_id_colname])), axis=1)\n", "gt_jsonl = '\\n'.join(gt_records)\n", "\n", "# Upload the string to the ground truth input location in S3\n", "upload_time = datetime.now().strftime('%Y/%m/%d/%H')\n", "target_s3_uri = f'{ground_truth_path}/{upload_time}/ground_truth.jsonl'\n", "S3Uploader.upload_string_as_file_body(gt_jsonl, target_s3_uri)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assume that the Ground Truth for the second half of the inferred data took one hour to become available, and upload it 3600 seconds later." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'s3://sagemaker-ap-northeast-1-370828233696/model_monitor/model_quality_ground_truth/2022/12/16/11/ground_truth.jsonl'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "time.sleep(3600)\n", "# Convert the prediction results to JSON and serialize them as a JSON Lines string\n", "gt_records = df_prediction.iloc[int(df_prediction.shape[0]/2):].apply(lambda x: json.dumps(ground_truth_with_id(x[ground_truth_colname], x[inference_id_colname])), axis=1)\n", "gt_jsonl = '\\n'.join(gt_records)\n", "\n", "# Upload the string to the ground truth input location in S3\n", "upload_time = datetime.now().strftime('%Y/%m/%d/%H')\n", "target_s3_uri = f'{ground_truth_path}/{upload_time}/ground_truth.jsonl'\n", "S3Uploader.upload_string_as_file_body(gt_jsonl, target_s3_uri)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- At this point, inference has been run and the Ground Truth has been uploaded.\n", "- 
The model quality monitoring job runs between minute 0 and minute 20 of the next hour and outputs a monitoring report." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## B1 (Option B) Try the analysis with sample reports without waiting for monitoring to run\n", "The analysis of monitoring results visualizes how accuracy changes across multiple monitoring cycles. Producing reports across multiple cycles takes time, so if you want to try the analysis and visualization with the reports included in the sample code, run the cell below to upload the sample reports to your S3 bucket.  \n", "If you are visualizing your own reports, skip this cell." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "sagemaker.s3.S3Uploader.upload('model_quality_samples', f's3://{bucket}/model_monitor/model_quality_samples')\n", "\n", "baseline_prefix = 'model_monitor/model_quality_samples/baseline'\n", "report_prefix = 'model_monitor/model_quality_samples/reports'\n", "specific_report_prefix = 'model_monitor/model_quality_samples/reports/2020/03/16/01'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# B2. Analyzing the monitoring results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the analysis" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker import model_monitor\n", "import boto3\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def get_reports(report_bucket, report_location, file_type):\n", "    s3 = boto3.client('s3')\n", "    assert file_type in ('statistics', 'constraints', 'violations')\n", "\n", "    # Collect the keys of all report files of the requested type under the prefix\n", "    resp = s3.list_objects(Bucket=report_bucket, Prefix=report_location)\n", "    report_files = [x['Key'] for x in resp['Contents'] if x['Key'].endswith(f'{file_type}.json')]\n", "\n", "    monitoring_reports = {}\n", "    for key in sorted(report_files):\n", "        report_s3uri = f's3://{report_bucket}/{key}'\n", "\n", "        if file_type == 'statistics':\n", "            body_dict = model_monitor.Statistics.from_s3_uri(report_s3uri).body_dict\n", "        elif file_type == 'constraints':\n", "            body_dict = model_monitor.Constraints.from_s3_uri(report_s3uri).body_dict\n", "        elif file_type == 'violations':\n", "            body_dict = model_monitor.ConstraintViolations.from_s3_uri(report_s3uri).body_dict\n", "        else:\n", "            print('Unexpected file type')\n", "            return\n", "\n", "        # The report time is encoded in the key as .../YYYY/MM/DD/HH/<file>\n", "        report_time = pd.to_datetime('-'.join(key.split('/')[-5:-1]), format='%Y-%m-%d-%H')\n", "        monitoring_reports[report_time] = body_dict\n", "\n", "    return monitoring_reports" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def extract_stats_value(statistics_reports, metric_type):\n", "\n", "    all_values = {}\n", "    for key, report in statistics_reports.items():\n", "        report_values = {}\n", "        for metric_name, metric_value_dict in report[metric_type].items():\n", "            report_values[metric_name] = metric_value_dict['value']\n", "\n", "        all_values[key] = report_values\n", "\n", "    # One row per report time, one column per metric\n", "    df = pd.DataFrame(all_values).transpose()\n", "    return df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Build a time-series DataFrame of the accuracy metrics" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>mae</th><th>mse</th><th>rmse</th><th>r2</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>2020-01-06 01:00:00</th><td>48.736082</td><td>3559.985828</td><td>59.665617</td><td>0.915222</td></tr>\n", "<tr><th>2020-01-13 01:00:00</th><td>33.640843</td><td>2203.995053</td><td>46.946726</td><td>0.959612</td></tr>\n", "<tr><th>2020-01-20 01:00:00</th><td>56.177981</td><td>5339.805945</td><td>73.073976</td><td>0.840637</td></tr>\n", "<tr><th>2020-01-27 01:00:00</th><td>32.058851</td><td>1598.723379</td><td>39.984039</td><td>0.966500</td></tr>\n", "<tr><th>2020-02-03 01:00:00</th><td>39.323701</td><td>2523.980451</td><td>50.239232</td><td>0.948893</td></tr>\n", "<tr><th>2020-02-10 01:00:00</th><td>41.485255</td><td>2958.223253</td><td>54.389551</td><td>0.949442</td></tr>\n", "<tr><th>2020-02-17 01:00:00</th><td>47.202256</td><td>3553.185067</td><td>59.608599</td><td>0.865158</td></tr>\n", "<tr><th>2020-02-24 01:00:00</th><td>39.585234</td><td>2842.260041</td><td>53.312851</td><td>0.942540</td></tr>\n", "<tr><th>2020-03-02 01:00:00</th><td>41.083948</td><td>2872.077381</td><td>53.591766</td><td>0.942076</td></tr>\n", "<tr><th>2020-03-09 01:00:00</th><td>61.858424</td><td>7275.347270</td><td>85.295646</td><td>0.812754</td></tr>\n", "<tr><th>2020-03-16 01:00:00</th><td>101.941843</td><td>14074.934422</td><td>118.637829</td><td>-1.424614</td></tr>\n", "<tr><th>2020-03-23 01:00:00</th><td>220.142144</td><td>57067.685159</td><td>238.888437</td><td>-176.484370</td></tr>\n", "<tr><th>2020-03-30 01:00:00</th><td>190.397829</td><td>45052.495289</td><td>212.255731</td><td>-274.076476</td></tr>\n", "<tr><th>2020-04-06 01:00:00</th><td>92.610319</td><td>9805.263248</td><td>99.021529</td><td>-85.943666</td></tr>\n", "<tr><th>2020-04-13 01:00:00</th><td>63.392568</td><td>4285.378399</td><td>65.462802</td><td>-37.234659</td></tr>\n", "<tr><th>2020-04-20 01:00:00</th><td>62.310475</td><td>4175.124192</td><td>64.615201</td><td>-34.572366</td></tr>\n", "<tr><th>2020-04-27 01:00:00</th><td>61.910881</td><td>4149.062384</td><td>64.413216</td><td>-25.887938</td></tr>\n", "<tr><th>2020-05-04 01:00:00</th><td>59.307794</td><td>3856.049834</td><td>62.097100</td><td>-18.118762</td></tr>\n", "<tr><th>2020-05-11 01:00:00</th><td>60.788436</td><td>4143.170986</td><td>64.367468</td><td>-18.026519</td></tr>\n", "<tr><th>2020-05-18 01:00:00</th><td>59.132732</td><td>3848.765692</td><td>62.038421</td><td>-16.037470</td></tr>\n", "</tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " mae mse rmse r2\n", "2020-01-06 01:00:00 48.736082 3559.985828 59.665617 0.915222\n", "2020-01-13 01:00:00 33.640843 2203.995053 46.946726 0.959612\n", "2020-01-20 01:00:00 56.177981 5339.805945 73.073976 0.840637\n", "2020-01-27 01:00:00 32.058851 1598.723379 39.984039 0.966500\n", "2020-02-03 01:00:00 39.323701 2523.980451 50.239232 0.948893\n", "2020-02-10 01:00:00 41.485255 2958.223253 54.389551 0.949442\n", "2020-02-17 01:00:00 47.202256 3553.185067 59.608599 0.865158\n", "2020-02-24 01:00:00 39.585234 2842.260041 53.312851 0.942540\n", "2020-03-02 01:00:00 41.083948 2872.077381 53.591766 0.942076\n", "2020-03-09 01:00:00 61.858424 7275.347270 85.295646 0.812754\n", "2020-03-16 01:00:00 101.941843 14074.934422 118.637829 -1.424614\n", "2020-03-23 01:00:00 220.142144 57067.685159 238.888437 -176.484370\n", "2020-03-30 01:00:00 190.397829 45052.495289 212.255731 -274.076476\n", "2020-04-06 01:00:00 92.610319 9805.263248 99.021529 -85.943666\n", "2020-04-13 01:00:00 63.392568 4285.378399 65.462802 -37.234659\n", "2020-04-20 01:00:00 62.310475 4175.124192 64.615201 -34.572366\n", "2020-04-27 01:00:00 61.910881 4149.062384 64.413216 -25.887938\n", "2020-05-04 01:00:00 59.307794 3856.049834 62.097100 -18.118762\n", "2020-05-11 01:00:00 60.788436 4143.170986 64.367468 -18.026519\n", "2020-05-18 01:00:00 59.132732 3848.765692 62.038421 -16.037470" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the statistics reports\n", "statistics_reports = get_reports(bucket, report_prefix, 'statistics')\n", "\n", "# Extract the metrics of a specific type\n", "df_regression_metrics = extract_stats_value(statistics_reports, 'regression_metrics')\n", "df_regression_metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot the retrieved accuracy metrics as a time series" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "[Figure: MAE over time, y-axis from 0 to 250]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "metric = 'mae'\n", "df_regression_metrics[metric].plot(ylim=(0,250), figsize=(10,3), title=metric.upper())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare the Ground Truth and the predictions for the same period\n", "Note: If you chose \"B1 (Option B)\", the inference results are not saved locally, so the cells below cannot be run. Skip them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import glob\n", "\n", "values = {}\n", "prediction_result_files = glob.glob(f'./{result_dir}/*.csv')\n", "for file in prediction_result_files:\n", "    df_pred = pd.read_csv(file)\n", "    # The file name ends with the prediction date: prediction-result-YYYY-MM-DD.csv\n", "    pred_date = pd.to_datetime('-'.join(file.split('.')[-2].split('-')[-3:]))\n", "    values[pred_date] = df_pred[['pickup_count', 'pred']].mean().to_dict()\n", "\n", "df_gt = pd.DataFrame(values).transpose().sort_index()\n", "df_regression_metrics = pd.merge(df_regression_metrics, df_gt, how='inner', left_index=True, right_index=True)\n", "df_regression_metrics.head(3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_regression_metrics[['pickup_count', 'pred']].plot(figsize=(10,3), title='Taxi Pickup Count Per Hour (mean)')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reference code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reference | Collect inferenceId values from the captured data\n", "If data capture is sampled, retrieve the captured records to identify which inference requests were captured." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bucket_name = sagemaker.Session().default_bucket()\n", "capture_data_key = 'model_monitor/endpoint-data-capture/nyctaxi-endpoint/AllTraffic/2022/12/11/07/27-10-902-c1277fdc-38a1-407e-b96d-3921ed423787.jsonl'" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>eventId</th><th>inferenceId</th><th>inferenceTime</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>0</th><td>dcbf4055-1196-4b71-ab54-e290b4b3530e</td><td>50f38bbc-1c15-4d81-ac2a-7a922d8adef6</td><td>2022-12-11T07:27:10Z</td></tr>\n", "<tr><th>1</th><td>53b77003-449e-4ee9-9eea-88ac0b66df40</td><td>b5d9a7ac-9eaf-4fa4-8549-c4b1e2c78492</td><td>2022-12-11T07:27:10Z</td></tr>\n", "<tr><th>2</th><td>07e28837-40c6-4731-a1fa-dcdfbf543ec6</td><td>20b5e6f6-0b84-44fc-8977-68da623391d1</td><td>2022-12-11T07:27:11Z</td></tr>\n", "<tr><th>3</th><td>3004ec50-6b30-4043-b149-76e3322016cd</td><td>37b11b6d-1087-4378-948d-247a86da68f6</td><td>2022-12-11T07:27:11Z</td></tr>\n", "<tr><th>4</th><td>9336b2ff-dea3-4095-9d9c-f882cab4e7b3</td><td>f6bed701-025d-4aac-9941-c90e9ede4509</td><td>2022-12-11T07:27:11Z</td></tr>\n", "<tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n", "<tr><th>83</th><td>eb42e34b-93e5-46e5-9f28-68807f93348c</td><td>db5843e2-a983-4a19-b6fe-ae6c8a321ce9</td><td>2022-12-11T07:27:15Z</td></tr>\n", "<tr><th>84</th><td>72846ebd-4f6b-4571-9f09-554d792f5609</td><td>190dbd03-4479-4706-bbe0-0b101e1a6fe2</td><td>2022-12-11T07:27:15Z</td></tr>\n", "<tr><th>85</th><td>bddc4048-e81c-4bbf-be19-d2359f5d8a0b</td><td>bf1842fd-2884-4f2b-9932-8a2acb9b754c</td><td>2022-12-11T07:27:15Z</td></tr>\n", "<tr><th>86</th><td>38c518b2-5d79-4cd6-94b6-6497bb9e992b</td><td>fb6c7636-cce2-4727-a79e-599ea8c1fd1d</td><td>2022-12-11T07:27:15Z</td></tr>\n", "<tr><th>87</th><td>1d62673c-fb3a-4dad-94cf-50497d76723a</td><td>ad8d3c31-64ef-4b0a-92b0-e34e870d9fbf</td><td>2022-12-11T07:27:15Z</td></tr>\n", "</tbody>\n", "</table>\n", "<p>88 rows × 3 columns</p>\n", "</div>
" ], "text/plain": [ " eventId \\\n", "0 dcbf4055-1196-4b71-ab54-e290b4b3530e \n", "1 53b77003-449e-4ee9-9eea-88ac0b66df40 \n", "2 07e28837-40c6-4731-a1fa-dcdfbf543ec6 \n", "3 3004ec50-6b30-4043-b149-76e3322016cd \n", "4 9336b2ff-dea3-4095-9d9c-f882cab4e7b3 \n", ".. ... \n", "83 eb42e34b-93e5-46e5-9f28-68807f93348c \n", "84 72846ebd-4f6b-4571-9f09-554d792f5609 \n", "85 bddc4048-e81c-4bbf-be19-d2359f5d8a0b \n", "86 38c518b2-5d79-4cd6-94b6-6497bb9e992b \n", "87 1d62673c-fb3a-4dad-94cf-50497d76723a \n", "\n", " inferenceId inferenceTime \n", "0 50f38bbc-1c15-4d81-ac2a-7a922d8adef6 2022-12-11T07:27:10Z \n", "1 b5d9a7ac-9eaf-4fa4-8549-c4b1e2c78492 2022-12-11T07:27:10Z \n", "2 20b5e6f6-0b84-44fc-8977-68da623391d1 2022-12-11T07:27:11Z \n", "3 37b11b6d-1087-4378-948d-247a86da68f6 2022-12-11T07:27:11Z \n", "4 f6bed701-025d-4aac-9941-c90e9ede4509 2022-12-11T07:27:11Z \n", ".. ... ... \n", "83 db5843e2-a983-4a19-b6fe-ae6c8a321ce9 2022-12-11T07:27:15Z \n", "84 190dbd03-4479-4706-bbe0-0b101e1a6fe2 2022-12-11T07:27:15Z \n", "85 bf1842fd-2884-4f2b-9932-8a2acb9b754c 2022-12-11T07:27:15Z \n", "86 fb6c7636-cce2-4727-a79e-599ea8c1fd1d 2022-12-11T07:27:15Z \n", "87 ad8d3c31-64ef-4b0a-92b0-e34e870d9fbf 2022-12-11T07:27:15Z \n", "\n", "[88 rows x 3 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3r_bucket = s3r.Bucket(bucket_name)\n", "capture_data = s3r_bucket.Object(capture_data_key).get()['Body'].read().decode('utf-8')\n", "\n", "df_capture = pd.read_json(io.StringIO(capture_data), lines=True)\n", "df_inference_id = pd.json_normalize(df_capture['eventMetadata'])\n", "df_inference_id" ] } ], "metadata": { "instance_type": "ml.m5.large", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:ap-northeast-1:102112518831:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", 
"mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }