{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Amazon SageMaker Multi-Model Endpoints using XGBoost\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With [Amazon SageMaker multi-model endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html), customers can create an endpoint that seamlessly hosts up to thousands of models. These endpoints are well suited to use cases where any one of a large number of models, which can be served from a common inference container to save inference costs, needs to be invokable on-demand and where it is acceptable for infrequently invoked models to incur some additional latency. For applications which require consistently low inference latency, an endpoint deploying a single model is still the best choice.\n",
    "\n",
    "At a high level, Amazon SageMaker manages the loading and unloading of models for a multi-model endpoint, as they are needed. When an invocation request is made for a particular model, Amazon SageMaker routes the request to an instance assigned to that model, downloads the model artifacts from S3 onto that instance, and initiates loading of the model into the memory of the container. As soon as the loading is complete, Amazon SageMaker performs the requested invocation and returns the result. If the model is already loaded in memory on the selected instance, the downloading and loading steps are skipped and the invocation is performed immediately.\n",
    "\n",
    "To demonstrate how multi-model endpoints are created and used, this notebook provides an example using a set of XGBoost models that each predict housing prices for a single location. This domain is used as a simple example to easily experiment with multi-model endpoints.\n",
    "\n",
    "The Amazon SageMaker multi-model endpoint capability is designed to work across with Mxnet, PyTorch and Scikit-Learn machine learning frameworks (TensorFlow coming soon), SageMaker XGBoost, KNN, and Linear Learner algorithms.\n",
    "\n",
    "In addition, Amazon SageMaker multi-model endpoints are also designed to work with cases where you bring your own container that integrates with the multi-model server library. An example of this can be found [here](https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/multi_model_bring_your_own) and documentation [here](https://docs.aws.amazon.com/sagemaker/latest/dg/build-multi-model-build-container.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 1: Initialize boto3 client and location of model artifact  \n",
    "Here you instantiate the S3 client object and the locations inside your default S3 bucket, where the metrics and model artifacts are uploaded. Notice that the default bucket sagemaker-<your-Region>-<your-account-id> is automatically created by the SageMaker session object. The model and datasets that was used for training exist in a public S3 bucket named sagemaker-sample-files. The location inside the bucket is specified through the read prefix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import boto3\n",
    "import sagemaker\n",
    "import time\n",
    "import json\n",
    "import io\n",
    "from io import StringIO\n",
    "import base64\n",
    "import pprint\n",
    "import re\n",
    "\n",
    "from sagemaker.image_uris import retrieve\n",
    "\n",
    "sess = sagemaker.Session()\n",
    "default_bucket = sess.default_bucket()\n",
    "prefix = \"fraud-detect-demo-mme\"\n",
    "\n",
    "region = sess.boto_region_name\n",
    "s3_client = boto3.client(\"s3\", region_name=region)\n",
    "sm_client = boto3.client(\"sagemaker\", region_name=region)\n",
    "sm_runtime_client = boto3.client(\"sagemaker-runtime\")\n",
    "\n",
    "sagemaker_role = sagemaker.get_execution_role()\n",
    "\n",
    "# S3 locations used for parameterizing the notebook run\n",
    "read_bucket = \"sagemaker-sample-files\"\n",
    "read_prefix = \"datasets/tabular/synthetic_automobile_claims\" \n",
    "model_prefix = \"models/xgb-fraud\"\n",
    "\n",
    "# S3 location of trained model artifact\n",
    "model_uri = f\"s3://{read_bucket}/{model_prefix}/fraud-det-xgb-model.tar.gz\"\n",
    "\n",
    "# S3 location of test data\n",
    "test_data_uri = f\"s3://{read_bucket}/{read_prefix}/test.csv\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Create 100 XGBoost model copies to host behind a single Endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = boto3.resource('s3')\n",
    "copy_source = {\n",
    "      'Bucket': read_bucket,\n",
    "      'Key': f\"{model_prefix}/fraud-det-xgb-model.tar.gz\"\n",
    "    }\n",
    "bucket = s3.Bucket(default_bucket)\n",
    "\n",
    "for i in range (0,100):\n",
    "    bucket.copy(copy_source, f\"{model_prefix}/fraud-det-xgb-model-{i}.tar.gz\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3: List the models and confirm they all reside in S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 ls s3://{default_bucket}/{model_prefix}/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4: Create a Real-Time Inference Multi-Model-Endpoint\n",
    "In SageMaker, there are multiple methods to deploy a trained model to a Real-Time Inference endpoint: SageMaker SDK, AWS SDK - Boto3, and SageMaker console. For more information, see Deploy Models for Inference in the Amazon SageMaker Developer Guide. SageMaker SDK has more abstractions compared to the AWS SDK - Boto3, with the latter exposing lower-level APIs for greater control over model deployment. In this tutorial, you deploy the model using the AWS SDK -Boto3. There are three steps you need to follow in sequence to deploy a model:\n",
    "\n",
    "1. Create a SageMaker model from the model artifact\n",
    "2. Create an endpoint configuration to specify properties, including instance type and count\n",
    "3. Create the endpoint using the endpoint configuration\n",
    "\n",
    "*For Multi-Model you set `\"Mode\": \"MultiModel\"`*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Retrieve the SageMaker managed XGBoost image\n",
    "training_image = retrieve(framework=\"xgboost\", region=region, version=\"1.3-1\")\n",
    "\n",
    "# Specify a unique model name that does not exist\n",
    "model_name = \"fraud-detect-xgb-multi-model\"\n",
    "primary_container = {\n",
    "                     \"Image\": training_image,\n",
    "                     \"ModelDataUrl\": f\"s3://{default_bucket}/{model_prefix}/\",\n",
    "                     \"Mode\": \"MultiModel\",\n",
    "                    }\n",
    "\n",
    "model_matches = sm_client.list_models(NameContains=model_name)[\"Models\"]\n",
    "if not model_matches:\n",
    "    model = sm_client.create_model(ModelName=model_name,\n",
    "                    \n",
    "                                   PrimaryContainer=primary_container,\n",
    "                                   ExecutionRoleArn=sagemaker_role)\n",
    "else:\n",
    "    print(f\"Model with name {model_name} already exists! Change model name to create new\")\n",
    "    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the SageMaker model is created, the following code is uses the Boto3 create_endpoint_config method to configure the endpoint. The main inputs to the create_endpoint_config method are the endpoint configuration name and variant information, such as inference instance type and count, the name of the model to be deployed, and the traffic share the endpoint should handle.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Endpoint Config name\n",
    "endpoint_config_name = f\"{model_name}-endpoint-config\"\n",
    "\n",
    "# Endpoint config parameters\n",
    "production_variant_dict = {\n",
    "                           \"VariantName\": \"Alltraffic\",\n",
    "                           \"ModelName\": model_name,\n",
    "                           \"InitialInstanceCount\": 1,\n",
    "                           \"InstanceType\": \"ml.m5.xlarge\",\n",
    "                           \"InitialVariantWeight\": 1\n",
    "                          }\n",
    "\n",
    "\n",
    "# Create endpoint config if one with the same name does not exist\n",
    "endpoint_config_matches = sm_client.list_endpoint_configs(NameContains=endpoint_config_name)[\"EndpointConfigs\"]\n",
    "if not endpoint_config_matches:\n",
    "    endpoint_config_response = sm_client.create_endpoint_config(\n",
    "                                                                EndpointConfigName=endpoint_config_name,\n",
    "                                                                ProductionVariants=[production_variant_dict],\n",
    "                                                          \n",
    "                                                               )\n",
    "else:\n",
    "    print(f\"Endpoint config with name {endpoint_config_name} already exists! Change endpoint config name to create new\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The create_endpoint method takes the endpoint configuration as a parameter, and deploys the model specified in the endpoint configuration to a compute instance. It takes about 6 minutes to deploy the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint_name = f\"{model_name}-endpoint\"\n",
    "\n",
    "endpoint_matches = sm_client.list_endpoints(NameContains=endpoint_name)[\"Endpoints\"]\n",
    "if not endpoint_matches:\n",
    "    endpoint_response = sm_client.create_endpoint(\n",
    "                                                  EndpointName=endpoint_name,\n",
    "                                                  EndpointConfigName=endpoint_config_name\n",
    "                                                 )\n",
    "else:\n",
    "    print(f\"Endpoint with name {endpoint_name} already exists! Change endpoint name to create new\")\n",
    "\n",
    "resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "status = resp[\"EndpointStatus\"]\n",
    "while status == \"Creating\":\n",
    "    print(f\"Endpoint Status: {status}...\")\n",
    "    time.sleep(60)\n",
    "    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "    status = resp[\"EndpointStatus\"]\n",
    "print(f\"Endpoint Status: {status}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 5: Invoke the inference Endpoint\n",
    "\n",
    "After the endpoint status changes to InService, you can invoke the endpoint using the REST API, AWS SDK - Boto3, SageMaker Studio, AWS CLI, or SageMaker Python SDK. In this tutorial, you use the AWS SDK - Boto3. Before calling an endpoint, it is important that the test data is formatted suitably for the endpoint using serialization and deserialization. Serialization is the process of converting raw data in a format such as .csv to byte streams that the endpoint can use. Deserialization is the reverse process of converting byte stream to human readable format. In this tutorial, you invoke the endpoint by sending the first five samples from a test dataset. To invoke the endpoint and get prediction results, copy and paste the following code. Since the request to the endpoint (test dataset) is in the .csv format, a csv serialization process is used to create the payload. The response is then deserialized to an array of predictions. After the execution completes, the cell returns the model predictions and the true labels for the test samples. Notice that the XGBoost model returns probabilities instead of actual class labels. The model has predicted a very low likelihood for the test samples to be fraudulent claims and the predictions are in line with the true labels. To invoke the model of your choice make use of `TargetModel=\"fraud-det-xgb-model-{i}.tar.gz\"` "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fetch test data to run predictions with the endpoint\n",
    "test_df = pd.read_csv(test_data_uri)\n",
    "\n",
    "# For content type text/csv, payload should be a string with commas separating the values for each feature\n",
    "# This is the inference request serialization step\n",
    "# CSV serialization\n",
    "csv_file = io.StringIO()\n",
    "test_sample = test_df.drop([\"fraud\"], axis=1).iloc[:5]\n",
    "test_sample.to_csv(csv_file, sep=\",\", header=False, index=False)\n",
    "payload = csv_file.getvalue()\n",
    "\n",
    "for i in range (0,5):\n",
    "    response = sm_runtime_client.invoke_endpoint(\n",
    "                                                 EndpointName=endpoint_name,\n",
    "                                                 Body=payload,\n",
    "                                                 TargetModel=f\"fraud-det-xgb-model-{i}.tar.gz\",\n",
    "                                                 ContentType=\"text/csv\",\n",
    "                                                 Accept=\"text/csv\"\n",
    "                                                )\n",
    "    # This is the inference response deserialization step\n",
    "    # This is a bytes object\n",
    "    result = response[\"Body\"].read()\n",
    "    # Decoding bytes to a string\n",
    "    result = result.decode(\"utf-8\")\n",
    "    # Converting to list of predictions\n",
    "    result = re.split(\",|\\n\",result)\n",
    "    prediction_df = pd.DataFrame()\n",
    "    prediction_df[\"Prediction\"] = result[:5]\n",
    "    prediction_df[\"Label\"] = test_df[\"fraud\"].iloc[:5].values\n",
    "    prediction_df\n",
    "    print(f\"\\nfraud-det-xgb-model-{i}.tar.gz prediction results:\")\n",
    "    print(prediction_df)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 6: Delete Endpoint\n",
    "Before leaving this exercise, it is a good practice to delete the resources created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sm_client.delete_endpoint(EndpointName=endpoint_name)"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}