{
"cells": [
{
"cell_type": "markdown",
"id": "a28fe40b-5e06-4cfd-af19-dc88a93e680d",
"metadata": {},
"source": [
"# Shadow Variant Experiments via API\n"
]
},
{
"cell_type": "markdown",
"id": "3b54014c",
"metadata": {},
"source": [
"---\n",
"\n",
"This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
"\n",
"\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "dea3e642",
"metadata": {},
"source": [
"\n",
"## Introduction\n",
"\n",
"In this notebook, we will go through the steps of deploying a pre-trained model and then deploying a possible replacement model alongside it as Shadow mode as an experiment to compare the models. We'll do this entirely in code, making use of the SageMaker API. These models are trained on network classification, tabular dataset, where they classify network traffic into 15 different classes. \n",
"\n",
"## Contents\n",
"\n",
"1) [Setup](#setup)\n",
"2) [Deploy Model](#deploy)\n",
"3) [Register the Models](#register)\n",
"4) [Create a Shadow Test](#shadow)\n",
"5) [Perform Inference](#infer)\n",
"6) [Evaluate](#eval)\n",
"7) [Clean up](#clean)\n",
"\n",
"We trained our models with the CSE-CIC-IDS2018 dataset by CIC and ISCX which is used for security testing and malware prevention.\n",
"This data includes a huge amount of raw network traffic logs, plus pre-processed data where network connections have been reconstructed and relevant features extracted using CICFlowMeter, a tool that outputs network connection features as CSV files. Each record is classified as benign or one of fourteen types of malicious traffic.\n",
"\n",
"\n",
"Class are represented and have been encoded as follows (train + validation):\n",
"\n",
"\n",
"| Label | Encoded | \n",
"|:-------------------------|:-------:|\n",
"| Benign | 0 | \n",
"| Bot | 1 | \n",
"| DoS attacks-GoldenEye | 2 | \n",
"| DoS attacks-Slowloris | 3 | \n",
"| DDoS attacks-LOIC-HTTP | 4 | \n",
"| Infilteration | 5 | \n",
"| DDOS attack-LOIC-UDP | 6 | \n",
"| DDOS attack-HOIC | 7 | \n",
"| Brute Force -Web | 8 | \n",
"| Brute Force -XSS | 9 | \n",
"| SQL Injection | 10 | \n",
"| DoS attacks-SlowHTTPTest | 11 | \n",
"| DoS attacks-Hulk | 12 | \n",
"| FTP-BruteForce | 13 | \n",
"| SSH-Bruteforce | 14 | \n",
"\n",
"The trained models been saved to a public Amazon S3 bucket for your convenience, and labeled data is included with this notebook.\n",
"\n",
"### Let's get started!\n",
"\n",
"First, we set some variables, including the AWS region we are working in, the IAM (Identity and Access Management) execution role of the notebook instance and the Amazon S3 bucket where we will store data, models, outputs, etc. We will use the Amazon SageMaker default bucket for the selected AWS region, and then define a key prefix to make sure all objects have share the same prefix for easier discoverability."
]
},
{
"cell_type": "markdown",
"id": "bf5ad690-1cf2-4fc8-8c74-2230742cfe2f",
"metadata": {},
"source": [
"\n",
"\n",
"## Set up"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2344d519-5394-4c5b-8df0-fc1d088115c2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install jsonlines --quiet\n",
"%pip install sagemaker --upgrade --quiet"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a0ce5e8a-b853-4b9d-bde2-e434ec68889a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# uncomment to reset kernel after installation\n",
"\n",
"# import IPython\n",
"# IPython.Application.instance().kernel.do_shutdown(True) # automatically restarts kernel"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e64222f-bb10-4ead-afb1-ff842fdd4360",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"import time\n",
"import glob\n",
"import json\n",
"import jsonlines\n",
"import base64\n",
"import io\n",
"import datetime\n",
"\n",
"import boto3\n",
"import sagemaker\n",
"from sagemaker.model_monitor import DataCaptureConfig\n",
"from sagemaker.sklearn.model import SKLearnModel\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import pandas as pd\n",
"import numpy as np\n",
"from time import sleep\n",
"from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score\n",
"from sklearn.model_selection import train_test_split\n",
"from IPython.display import display, clear_output\n",
"\n",
"pd.options.display.max_columns = 100\n",
"\n",
"region = boto3.Session().region_name\n",
"role = sagemaker.get_execution_role()\n",
"sagemaker_session = sagemaker.Session()\n",
"bucket_name = sagemaker.Session().default_bucket()\n",
"prefix = \"shadow-test\"\n",
"os.environ[\"AWS_REGION\"] = region\n",
"sm_client = boto3.Session().client(\"sagemaker\")\n",
"\n",
"print(f\"REGION: {region}\")\n",
"print(f\"ROLE: {role}\")\n",
"print(f\"BUCKET: {bucket_name}\")\n",
"\n",
"model_bucket = f\"s3://sagemaker-example-files-prod-{region}/models/shadow-test-models/\"\n",
"model_source_uri = f\"{model_bucket}sourcedir.tar.gz\"\n",
"model1_uri = f\"{model_bucket}hgb/model.tar.gz\"\n",
"model2_uri = f\"{model_bucket}rf/model.tar.gz\"\n",
"\n",
"# These are the clasifications that have been encoded as ints, we'll use these for analysis\n",
"class_list = [\n",
" \"Benign\",\n",
" \"Bot\",\n",
" \"DoS attacks-GoldenEye\",\n",
" \"DoS attacks-Slowloris\",\n",
" \"DDoS attacks-LOIC-HTTP\",\n",
" \"Infilteration\",\n",
" \"DDOS attack-LOIC-UDP\",\n",
" \"DDOS attack-HOIC\",\n",
" \"Brute Force-Web\",\n",
" \"Brute Force-XSS\",\n",
" \"SQL Injection\",\n",
" \"DoS attacks-SlowHTTPTest\",\n",
" \"DoS attacks-Hulk\",\n",
" \"FTP-BruteForce\",\n",
" \"SSH-Bruteforce\",\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "5234537e-3bb5-4675-8d1a-9f748615f179",
"metadata": {},
"source": [
"\n",
"### Create and Deploy the production model\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "294d83e8-b900-42dc-a1af-a38c647c2069",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"sklearn_model = SKLearnModel(\n",
" model_data=model1_uri,\n",
" role=role,\n",
" entry_point=\"histgradientboost.py\",\n",
" source_dir=\"./code\",\n",
" framework_version=\"1.0-1\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ab0f44b-be7a-4235-8152-605c62f7a480",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"data_capture_s3 = f\"s3://{bucket_name}/{prefix}/datacapture_test/\"\n",
"\n",
"data_capture_config = DataCaptureConfig(\n",
" enable_capture=True, sampling_percentage=100, destination_s3_uri=data_capture_s3\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de141ecd-b437-45c9-b147-abf7c8edc941",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"predictor = sklearn_model.deploy(\n",
" initial_instance_count=3, instance_type=\"ml.m5.2xlarge\", data_capture_config=data_capture_config\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f684c134-8e8a-4f81-9ddf-172753684480",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"predictor.endpoint_name"
]
},
{
"cell_type": "markdown",
"id": "d97dbeb1-9419-4de1-8de6-23f0056ebbf4",
"metadata": {},
"source": [
"## Predict\n",
"Here we verify our endpoint is working correctly by invoking the predictor."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72913f51-4ac1-4e65-be3f-5dcbfef5f8e6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# single prediction\n",
"# We expect 4 - DDoS attacks-LOIC-HTTP as the predicted class.\n",
"test_values = [\n",
" 80,\n",
" 1056736,\n",
" 3,\n",
" 4,\n",
" 20,\n",
" 964,\n",
" 20,\n",
" 0,\n",
" 6.666666667,\n",
" 11.54700538,\n",
" 964,\n",
" 0,\n",
" 241.0,\n",
" 482.0,\n",
" 931.1691850999999,\n",
" 6.6241710320000005,\n",
" 176122.6667,\n",
" 431204.4454,\n",
" 1056315,\n",
" 2,\n",
" 394,\n",
" 197.0,\n",
" 275.77164469999997,\n",
" 392,\n",
" 2,\n",
" 1056733,\n",
" 352244.3333,\n",
" 609743.1115,\n",
" 1056315,\n",
" 24,\n",
" 0,\n",
" 0,\n",
" 0,\n",
" 0,\n",
" 72,\n",
" 92,\n",
" 2.8389304419999997,\n",
" 3.78524059,\n",
" 0,\n",
" 964,\n",
" 123.0,\n",
" 339.8873763,\n",
" 115523.4286,\n",
" 0,\n",
" 0,\n",
" 1,\n",
" 1,\n",
" 0,\n",
" 0,\n",
" 0,\n",
" 1,\n",
" 1.0,\n",
" 140.5714286,\n",
" 6.666666667,\n",
" 241.0,\n",
" 0.0,\n",
" 0.0,\n",
" 0.0,\n",
" 0.0,\n",
" 0.0,\n",
" 0.0,\n",
" 3,\n",
" 20,\n",
" 4,\n",
" 964,\n",
" 8192,\n",
" 211,\n",
" 1,\n",
" 20,\n",
" 0.0,\n",
" 0.0,\n",
" 0,\n",
" 0,\n",
" 0.0,\n",
" 0.0,\n",
" 0,\n",
" 0,\n",
" 20,\n",
" 2,\n",
" 2018,\n",
" 1,\n",
" 0,\n",
" 1,\n",
" 0,\n",
"]\n",
"result = predictor.predict(np.array(test_values).reshape(1, -1))\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"id": "347de151-2ea7-4706-8850-12c6fcfa6830",
"metadata": {},
"source": [
"\n",
"### Register the models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "594a6247-6562-40c5-b4f2-b59e0f9ca498",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model1_script = \"histgradientboost.py\"\n",
"model2_script = \"randomforest.py\"\n",
"\n",
"image_uri = sagemaker.image_uris.retrieve(\"sklearn\", region, version=\"1.0-1\")\n",
"\n",
"model_name1 = \"PROD-HGB-{}\".format(datetime.datetime.now().strftime(\"%Y-%m-%d-%H%M%S\"))\n",
"model_name2 = \"SHADOW-RF-{}\".format(datetime.datetime.now().strftime(\"%Y-%m-%d-%H%M%S\"))\n",
"\n",
"print(f\"Prod model name: {model_name1}\")\n",
"print(f\"Shadow model name: {model_name2}\")\n",
"\n",
"resp = sm_client.create_model(\n",
" ModelName=model_name1,\n",
" ExecutionRoleArn=role,\n",
" PrimaryContainer={\n",
" \"Image\": image_uri,\n",
" \"Mode\": \"SingleModel\",\n",
" \"ModelDataUrl\": model1_uri,\n",
" \"Environment\": {\n",
" \"SAGEMAKER_CONTAINER_LOG_LEVEL\": \"20\",\n",
" \"SAGEMAKER_SUBMIT_DIRECTORY\": model_source_uri,\n",
" \"SAGEMAKER_PROGRAM\": model1_script,\n",
" },\n",
" },\n",
")\n",
"\n",
"resp = sm_client.create_model(\n",
" ModelName=model_name2,\n",
" ExecutionRoleArn=role,\n",
" PrimaryContainer={\n",
" \"Image\": image_uri,\n",
" \"Mode\": \"SingleModel\",\n",
" \"ModelDataUrl\": model2_uri,\n",
" \"Environment\": {\n",
" \"SAGEMAKER_CONTAINER_LOG_LEVEL\": \"20\",\n",
" \"SAGEMAKER_SUBMIT_DIRECTORY\": model_source_uri,\n",
" \"SAGEMAKER_PROGRAM\": model2_script,\n",
" },\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"id": "7bd499b6-8b41-4c20-b7de-a92697f60812",
"metadata": {},
"source": [
"\n",
"# Create a Shadow Test "
]
},
{
"cell_type": "markdown",
"id": "441814b5-9071-4f7d-8c93-949653e307b9",
"metadata": {},
"source": [
"## Create a Shadow Test using an Existing Endpoint\n",
"\n",
"Now we will create a shadow test using the existing production endpoint. We will pass the holdout data we set aside earlier to the endpoint. This holdout dataset simulates production traffic. \n",
"\n",
"We can stop the shadow variant test using the API later in the notebook. Note that we could also specify the test start and stop time when we create the inference experiements. If we don't provide the start and end times, then the experiment starts immediately and concludes after 7 days. We are using an existing production endpoint for this test. SageMaker will update that endpoint with the new model variants. The production endpoint will also update the inference compute instance type for the production variant if needed. \n",
"\n",
"Below is an example of a SageMaker Endpoint with a shadow variant. \n"
]
},
{
"cell_type": "markdown",
"id": "f13d2116-12a1-4194-8374-575e38c8ef88",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "e7d14999-8522-42c7-893c-5ce305dbdbd6",
"metadata": {},
"source": [
"A production variant consists of the ML model, Serving Container, and ML Instance. Since each variant is independent of others, you can have different models, containers, or instance types across variants. SageMaker lets you specify autoscaling policies on a per-variant basis so they can scale independently based on incoming load. SageMaker supports up to 10 production variants per endpoint. You can either configure a variant to receive a portion of the incoming traffic by setting variant weights or specify the target variant in the incoming request. The response from the production variant is forwarded back to the invoker.\n",
"\n",
"A shadow variant (new) has the same components as a production production variant. A user specified portion of the requests, known as the traffic sampling percentage (VariantWeight parameter in the ShadowProductionVariants object), is forwarded to the shadow variant. You can choose to log the response of the shadow variant in S3 or discard it. For an endpoint with a shadow variant, you can have a maximum of one production variant."
]
},
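{
"cell_type": "markdown",
"id": "b7f1a2c3-sketch-target-variant",
"metadata": {},
"source": [
"The \"specify the target variant in the incoming request\" option mentioned above uses the `TargetVariant` parameter of the SageMaker runtime `invoke_endpoint` API. The snippet below is a minimal, non-executed sketch of routing a single request explicitly to the production variant; it reuses `test_values` and `predictor.endpoint_name` from earlier cells, and it assumes the NumPy (`application/x-npy`) payload format that the SKLearn predictor uses by default.\n",
"\n",
"```python\n",
"# Sketch: route one request to a named production variant via TargetVariant.\n",
"# The payload format depends on the serving container; application/x-npy matches\n",
"# the default serializer of the SKLearn predictor used in this notebook.\n",
"import io\n",
"\n",
"import boto3\n",
"import numpy as np\n",
"\n",
"runtime = boto3.client(\"sagemaker-runtime\")\n",
"\n",
"buf = io.BytesIO()\n",
"np.save(buf, np.array(test_values).reshape(1, -1))  # test_values from the Predict cell\n",
"\n",
"response = runtime.invoke_endpoint(\n",
"    EndpointName=predictor.endpoint_name,\n",
"    TargetVariant=\"AllTraffic\",  # send this request to the production variant only\n",
"    ContentType=\"application/x-npy\",\n",
"    Body=buf.getvalue(),\n",
")\n",
"print(response[\"Body\"].read())\n",
"```"
]
},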
{
"cell_type": "code",
"execution_count": null,
"id": "9251079f-cabb-4949-82c8-ebcbd12befe6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"data_capture_s3 = f\"s3://{bucket_name}/{prefix}/datacapture_test/\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "95285014-09aa-4452-8093-967f69d3aaf4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"shadowtestname = \"ShadowInferenceTestExistingEP-{}\".format(\n",
" datetime.datetime.now().strftime(\"%Y-%m-%d-%H%M%S\")\n",
")\n",
"infexperimentarn = sm_client.create_inference_experiment(\n",
" Name=shadowtestname,\n",
" Type=\"ShadowMode\",\n",
" Description=\"Shadow inference test created via boto3 python API using an existing EP\",\n",
" RoleArn=role,\n",
" EndpointName=predictor.endpoint_name,\n",
" ModelVariants=[\n",
" {\n",
" \"ModelName\": model_name1,\n",
" \"VariantName\": \"AllTraffic\",\n",
" \"InfrastructureConfig\": {\n",
" \"InfrastructureType\": \"RealTimeInference\",\n",
" \"RealTimeInferenceConfig\": {\"InstanceType\": \"ml.m5.2xlarge\", \"InstanceCount\": 3},\n",
" },\n",
" },\n",
" {\n",
" \"ModelName\": model_name2,\n",
" \"VariantName\": \"Shadow-01\",\n",
" \"InfrastructureConfig\": {\n",
" \"InfrastructureType\": \"RealTimeInference\",\n",
" \"RealTimeInferenceConfig\": {\"InstanceType\": \"ml.m5.2xlarge\", \"InstanceCount\": 3},\n",
" },\n",
" },\n",
" ],\n",
" DataStorageConfig={\n",
" \"Destination\": data_capture_s3,\n",
" },\n",
" ShadowModeConfig={\n",
" \"SourceModelVariantName\": \"AllTraffic\",\n",
" \"ShadowModelVariants\": [\n",
" {\"ShadowModelVariantName\": \"Shadow-01\", \"SamplingPercentage\": 100},\n",
" ],\n",
" },\n",
")"
]
},
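{
"cell_type": "markdown",
"id": "c8e2b3d4-sketch-schedule",
"metadata": {},
"source": [
"By default the experiment above starts immediately and runs for up to 7 days. To schedule it instead, `create_inference_experiment` also accepts an optional `Schedule` parameter with `StartTime` and `EndTime`. The snippet below is a sketch only; the time window is a placeholder, and the remaining arguments would be the same as in the call above.\n",
"\n",
"```python\n",
"# Sketch: an optional Schedule block for create_inference_experiment.\n",
"# The window below is a placeholder -- adjust it to your own start and end times.\n",
"import datetime\n",
"\n",
"start = datetime.datetime.utcnow() + datetime.timedelta(hours=1)\n",
"end = start + datetime.timedelta(days=2)\n",
"\n",
"schedule = {\"StartTime\": start, \"EndTime\": end}\n",
"\n",
"# Passed alongside the other arguments, e.g.:\n",
"# sm_client.create_inference_experiment(..., Schedule=schedule, ...)\n",
"```"
]
},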
{
"cell_type": "code",
"execution_count": null,
"id": "6b9ed6ff-fede-4f6d-a58f-ffa617e09a16",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"shadowtestdescribe = sm_client.describe_inference_experiment(Name=shadowtestname)\n",
"shadowtestdescribe"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59187657-e740-41b7-be2e-63c6e9aa1d62",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def wait_until_test_complete(test_name):\n",
" print(f\"Waiting on shadow test: {test_name}\")\n",
" done = False\n",
" while not done:\n",
" shadowtestdescribe = sm_client.describe_inference_experiment(Name=shadowtestname)\n",
" status = shadowtestdescribe[\"Status\"].lower()\n",
" print(f\"Status: {status}\")\n",
" if status == \"failed\" or status == \"cancelled\":\n",
" print(\"Failure detected. Exiting Loop.\")\n",
" print(shadowtestdescribe)\n",
" return\n",
" elif shadowtestdescribe[\"Status\"].lower() == \"running\":\n",
" print(\"Shadow test is running! Exiting Loop.\")\n",
" return\n",
" sleep(60)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13b95f0e-e559-4ef7-9b87-9b60889d7714",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"wait_until_test_complete(shadowtestname)"
]
},
{
"cell_type": "markdown",
"id": "7d90ee2e-4376-446e-bbb8-8233313db995",
"metadata": {},
"source": [
"## Simulate Production Traffic\n",
"\n",
"We will now simulate the production traffic. We will loop over the production data. In a real production use case you won't need to do this since actual production data will be flowing to the production endpoint. Since our shadow test is now active the production variant and the shadow variant will recieve the inference input. Only the production output will be supplied in the response, however, since we have configured the test to capture data we will record both the production and shadow variant responses in s3. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a92f6255-d114-441a-8335-beccbde3205c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"holdout = pd.read_csv(\"./data/holdout.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aae508a4-8963-4713-9edb-7349ec0bb1d1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%%time\n",
"# this should take ~ 2 minutes to complete\n",
"indexes = []\n",
"actuals = []\n",
"i = 0\n",
"for index, row in holdout.iterrows():\n",
" vals = row.to_numpy()\n",
" prediction = predictor.predict(\n",
" vals[1::].reshape(1, -1), inference_id=f\"shadow test, index {index}\"\n",
" )\n",
" actuals.append(vals[0])\n",
" indexes.append(index)\n",
"\n",
" i += 1\n",
" if i % 1000 == 0:\n",
" print(i)"
]
},
{
"cell_type": "markdown",
"id": "77e7bed0-0560-405b-96e8-5314584c8cd1",
"metadata": {},
"source": [
"\n",
"## Now we can compare our two models\n",
"You could use an experiment like this to evaluate any aspect of model performance. Here we look at accuracy, but you might compare inference time or memory usage too. First lets grab the captured data. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9dfb234-41b0-4c27-98fa-6d3807fd4e7d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"storage = shadowtestdescribe[\"DataStorageConfig\"][\"Destination\"] + predictor.endpoint_name + \"/\"\n",
"storage"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5cfeeb91-bf04-4522-83cb-5e1169613820",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!aws s3 ls {storage}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b695eec7-92b9-4064-8c67-654b1fe8436f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!aws s3 cp {storage} ./data/datacapture/ --recursive"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec2c9dce-b639-4c98-96fb-fb4da6fbfa76",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"shadowfiles = glob.glob(\"./data/datacapture/Shadow-01/**/*.jsonl\", recursive=True)\n",
"prodfiles = glob.glob(\"./data/datacapture/AllTraffic/**/*.jsonl\", recursive=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3dac615b-1378-4d1b-b7b0-36892814b425",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"shadowin = []\n",
"shadowout = []\n",
"shadowid = []\n",
"\n",
"for f in shadowfiles:\n",
" print(f)\n",
" with jsonlines.open(f) as reader:\n",
" for obj in reader:\n",
" try:\n",
" infid = obj[\"eventMetadata\"][\"inferenceId\"].split(\" \")\n",
" shadowid.append(int(infid[-1]))\n",
"\n",
" # input to model\n",
" model_input = base64.b64decode(obj[\"captureData\"][\"endpointInput\"][\"data\"])\n",
" shadowin.append(np.load(io.BytesIO(model_input))[0].tolist())\n",
"\n",
" # output from model\n",
" model_output = base64.b64decode(obj[\"captureData\"][\"endpointOutput\"][\"data\"])\n",
" shadowout.append(np.load(io.BytesIO(model_output))[0])\n",
" except:\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e760047f-81df-4ee1-a4ed-2993d3b91995",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"shadowdf = pd.DataFrame(data=shadowout, index=shadowid, columns=[\"Shadow\"])\n",
"shadowdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4913248-07ae-4245-b1b0-b413cd6fe84a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"shadowdf[\"Shadow\"] = pd.to_numeric(shadowdf[\"Shadow\"])\n",
"shadowdf[\"Shadow\"] = shadowdf[\"Shadow\"].astype(int)\n",
"shadowdf = pd.merge(shadowdf, holdout[\"Target\"], left_index=True, right_index=True)\n",
"acc = accuracy_score(shadowdf[\"Target\"], shadowdf[\"Shadow\"])\n",
"wf1 = f1_score(shadowdf[\"Target\"], shadowdf[\"Shadow\"], average=\"weighted\")\n",
"print(acc, wf1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87290338-dd6a-4315-9a9b-c370a77484a6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"print(classification_report(shadowdf[\"Target\"], shadowdf[\"Shadow\"], zero_division=0))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7037ee8-ad8b-4a60-bbe8-10229c3fc1ee",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=(10, 8))\n",
"cm = confusion_matrix(shadowdf[\"Target\"], shadowdf[\"Shadow\"])\n",
"normalized_cm = cm.astype(\"float\") / cm.sum(axis=1)[:, np.newaxis]\n",
"clist = [class_list[i] for i in np.sort(shadowdf[\"Target\"].unique())]\n",
"sns.heatmap(normalized_cm, ax=ax, annot=cm, fmt=\"\", xticklabels=clist, yticklabels=clist)\n",
"plt.xlabel(\"Predicted\")\n",
"plt.ylabel(\"Actual\")\n",
"plt.title(\"Shadow Endpoint Confustion Matrix\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "30f80b9f-fab2-458b-b2d0-c6a2189f77b8",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%%time\n",
"\n",
"prodin = []\n",
"prodout = []\n",
"prodid = []\n",
"\n",
"for f in prodfiles:\n",
" print(f)\n",
" with jsonlines.open(f) as reader:\n",
" for obj in reader:\n",
" try:\n",
" infid = obj[\"eventMetadata\"][\"inferenceId\"].split(\" \")\n",
" prodid.append(int(infid[-1]))\n",
"\n",
" # input to model\n",
" model_input = base64.b64decode(obj[\"captureData\"][\"endpointInput\"][\"data\"])\n",
" prodin.append(np.load(io.BytesIO(model_input))[0].tolist())\n",
"\n",
" # output from model\n",
" model_output = base64.b64decode(obj[\"captureData\"][\"endpointOutput\"][\"data\"])\n",
" prodout.append(np.load(io.BytesIO(model_output))[0])\n",
"\n",
" except:\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9a2b377-dc2b-428d-92c1-7ae9c1da2d85",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"proddf = pd.DataFrame(data=prodout, index=prodid, columns=[\"Prod\"])\n",
"proddf"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4e94742-9ad1-485f-b987-d9e6d38dcf4c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Line up our production model predictions with the true value based on the index\n",
"proddf = pd.merge(proddf, holdout[\"Target\"], left_index=True, right_index=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf7fa322-f118-4860-ab7b-88111d14f94c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"acc = accuracy_score(proddf[\"Target\"], proddf[\"Prod\"])\n",
"wf1 = f1_score(proddf[\"Target\"], proddf[\"Prod\"], average=\"weighted\")\n",
"print(acc, wf1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f18d4cb8-c034-4444-94ff-3d1669da0534",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"print(classification_report(proddf[\"Target\"], proddf[\"Prod\"]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1c93cead-a773-43ff-bf9e-655a10b8c2c6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=(10, 8))\n",
"cm = confusion_matrix(proddf[\"Target\"], proddf[\"Prod\"])\n",
"normalized_cm = cm.astype(\"float\") / cm.sum(axis=1)[:, np.newaxis]\n",
"sns.heatmap(normalized_cm, ax=ax, annot=cm, fmt=\"\", xticklabels=class_list, yticklabels=class_list)\n",
"plt.xlabel(\"Predicted\")\n",
"plt.ylabel(\"Actual\")\n",
"plt.title(\"Shadow Endpoint Confustion Matrix\")\n",
"plt.show()"
]
},
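{
"cell_type": "markdown",
"id": "d9f3c4e5-sketch-latency",
"metadata": {},
"source": [
"We compared accuracy and weighted F1 above, but as noted earlier you might also compare latency between the variants. SageMaker publishes the `ModelLatency` metric (in microseconds) to Amazon CloudWatch with `EndpointName` and `VariantName` dimensions, so both variants can be compared directly. The snippet below is an optional sketch, assuming the variant names used in this notebook; it is not part of the evaluation flow above.\n",
"\n",
"```python\n",
"# Optional sketch: average ModelLatency (microseconds) per variant from CloudWatch.\n",
"import datetime\n",
"\n",
"import boto3\n",
"\n",
"cloudwatch = boto3.client(\"cloudwatch\")\n",
"\n",
"\n",
"def avg_model_latency(endpoint_name, variant_name, minutes=60):\n",
"    now = datetime.datetime.utcnow()\n",
"    resp = cloudwatch.get_metric_statistics(\n",
"        Namespace=\"AWS/SageMaker\",\n",
"        MetricName=\"ModelLatency\",\n",
"        Dimensions=[\n",
"            {\"Name\": \"EndpointName\", \"Value\": endpoint_name},\n",
"            {\"Name\": \"VariantName\", \"Value\": variant_name},\n",
"        ],\n",
"        StartTime=now - datetime.timedelta(minutes=minutes),\n",
"        EndTime=now,\n",
"        Period=300,\n",
"        Statistics=[\"Average\"],\n",
"    )\n",
"    points = resp[\"Datapoints\"]\n",
"    return sum(p[\"Average\"] for p in points) / len(points) if points else None\n",
"\n",
"\n",
"print(\"Prod latency (us):  \", avg_model_latency(predictor.endpoint_name, \"AllTraffic\"))\n",
"print(\"Shadow latency (us):\", avg_model_latency(predictor.endpoint_name, \"Shadow-01\"))\n",
"```"
]
},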
{
"cell_type": "markdown",
"id": "734dde1f-8a51-4d34-8f13-1d24843429d0",
"metadata": {},
"source": [
"## End the experiment and promote the shadow model to production\n",
"From the above evaluation we've decided that the shadow is ready for production. We will promote it to production as part of ending the experiment. You can also configure a similar experiment this to run automatically as part of a pipeline, and automatically promote a model if it met your criteria."
]
},
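{
"cell_type": "markdown",
"id": "e0a4d5f6-sketch-promotion",
"metadata": {},
"source": [
"If this experiment ran inside a pipeline, the promote-or-keep decision could be driven by the metrics computed above instead of a manual review. The snippet below is a rough sketch of such a rule; `shadow_wf1` and `prod_wf1` are hypothetical variable names for the weighted F1 scores from the shadow and production evaluation cells. The actual call we make in this notebook follows in the next cell.\n",
"\n",
"```python\n",
"# Sketch: automatic promotion rule based on weighted F1.\n",
"# shadow_wf1 and prod_wf1 are hypothetical names for the scores computed above.\n",
"if shadow_wf1 >= prod_wf1:\n",
"    actions = {\"Shadow-01\": \"Promote\", \"AllTraffic\": \"Remove\"}\n",
"    reason = \"Shadow variant matched or beat production on weighted F1\"\n",
"else:\n",
"    actions = {\"Shadow-01\": \"Remove\", \"AllTraffic\": \"Retain\"}\n",
"    reason = \"Shadow variant underperformed production; keeping the current model\"\n",
"\n",
"sm_client.stop_inference_experiment(\n",
"    Name=shadowtestname,\n",
"    ModelVariantActions=actions,\n",
"    DesiredState=\"Completed\",\n",
"    Reason=reason,\n",
")\n",
"```"
]
},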
{
"cell_type": "code",
"execution_count": null,
"id": "7aea5e40-46cd-4b1e-815e-d0585cf10cfc",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"sm_client.stop_inference_experiment(\n",
" Name=shadowtestname,\n",
" ModelVariantActions={\"Shadow-01\": \"Promote\", \"AllTraffic\": \"Remove\"},\n",
" DesiredState=\"Completed\",\n",
" Reason=\"Shadow variant performed better in validation\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "865c961a-801c-452a-ac99-936e105224c3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Here we show that the shadow model is now deployed to production\n",
"sm_client.describe_endpoint(EndpointName=predictor.endpoint_name)"
]
},
{
"cell_type": "markdown",
"id": "45bc4e2f-a59c-49e0-98ea-b49ef188d7ce",
"metadata": {},
"source": [
"## Clean Up"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6397c29-5552-4ab3-b094-8af3c20ab9c5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def wait_until_complete(test_name):\n",
" print(f\"Waiting on shadow test: {test_name}\")\n",
" done = False\n",
" while not done:\n",
" shadowtestdescribe = sm_client.describe_inference_experiment(Name=shadowtestname)\n",
" status = shadowtestdescribe[\"Status\"].lower()\n",
" print(f\"Status: {status}\")\n",
" if status == \"completed\":\n",
" print(\"Shadow test is stopped, ok to delete. Exiting Loop.\")\n",
" return\n",
" sleep(60)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "372e2af1-3622-4032-8e80-f5f025c08fea",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"wait_until_complete(shadowtestname)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4859e959-1737-48bc-bbf9-e48a19df3607",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# predictor.delete_endpoint()\n",
"sm_client.delete_inference_experiment(Name=shadowtestname)\n",
"sm_client.delete_endpoint(EndpointName=predictor.endpoint_name)"
]
},
{
"cell_type": "markdown",
"id": "3096fa20-3ebf-4304-ad30-fb72d83f0ecb",
"metadata": {},
"source": [
"# References\n",
"\n",
"* A Realistic Cyber Defense Dataset (CSE-CIC-IDS2018) - https://registry.opendata.aws/cse-cic-ids2018/\n",
"* AIM362 - Re:Invent 2019 SageMaker Debugger and Model Monitor - https://github.com/aws-samples/reinvent2019-aim362-sagemaker-debugger-model-monitor"
]
},
{
"cell_type": "markdown",
"id": "8f400bb1",
"metadata": {},
"source": [
"## Notebook CI Test Results\n",
"\n",
"This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"availableInstances": [
{
"_defaultOrder": 0,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 4,
"name": "ml.t3.medium",
"vcpuNum": 2
},
{
"_defaultOrder": 1,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.t3.large",
"vcpuNum": 2
},
{
"_defaultOrder": 2,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.t3.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 3,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.t3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 4,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.m5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 5,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.m5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 6,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.m5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 7,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.m5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 8,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.m5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 9,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.m5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 10,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.m5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 11,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.m5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 12,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.m5d.large",
"vcpuNum": 2
},
{
"_defaultOrder": 13,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.m5d.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 14,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.m5d.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 15,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.m5d.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 16,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.m5d.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 17,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.m5d.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 18,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.m5d.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 19,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.m5d.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 20,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": true,
"memoryGiB": 0,
"name": "ml.geospatial.interactive",
"supportedImageNames": [
"sagemaker-geospatial-v1-0"
],
"vcpuNum": 0
},
{
"_defaultOrder": 21,
"_isFastLaunch": true,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 4,
"name": "ml.c5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 22,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.c5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 23,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.c5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 24,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.c5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 25,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 72,
"name": "ml.c5.9xlarge",
"vcpuNum": 36
},
{
"_defaultOrder": 26,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 96,
"name": "ml.c5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 27,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 144,
"name": "ml.c5.18xlarge",
"vcpuNum": 72
},
{
"_defaultOrder": 28,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.c5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 29,
"_isFastLaunch": true,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.g4dn.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 30,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.g4dn.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 31,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.g4dn.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 32,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.g4dn.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 33,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.g4dn.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 34,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.g4dn.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 35,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 61,
"name": "ml.p3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 36,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 244,
"name": "ml.p3.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 37,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 488,
"name": "ml.p3.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 38,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.p3dn.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 39,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.r5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 40,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.r5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 41,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.r5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 42,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.r5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 43,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.r5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 44,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.r5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 45,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 512,
"name": "ml.r5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 46,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.r5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 47,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.g5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 48,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.g5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 49,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.g5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 50,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.g5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 51,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.g5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 52,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.g5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 53,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.g5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 54,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.g5.48xlarge",
"vcpuNum": 192
},
{
"_defaultOrder": 55,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 1152,
"name": "ml.p4d.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 56,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 1152,
"name": "ml.p4de.24xlarge",
"vcpuNum": 96
}
],
"kernelspec": {
"display_name": "Python 3 (Data Science 3.0)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}