{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Running multi-container endpoints on Amazon SageMaker\n", "\n", "SageMaker multi-container endpoints enable customers to deploy multiple containers to deploy different models on a SageMaker endpoint. The containers can be run in a sequence as an inference pipeline, or each container can be accessed individually by using direct invocation to improve endpoint utilization and optimize costs.\n", "\n", "\n", "This notebook shows how to create a multi-container endpoint which will host both the PyTorch(>=1.5) model and a TensorFlow(>=2.0) model, on a single endpoint. Here, `Direct` invocation behavior of multi-container endpoints is showcased where each model container can be invoked directly rather than being called in a sequence.\n", "\n", "This notebook is divided in the following sections:\n", "\n", "1. **Pre-requisites**\n", "1. **Setup Multi-container Endpoint with Direct Invocation**\n", "1. **Inference**\n", "1. **Clean up**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section 1: Pre-requisites" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, import some necessary libraries and variables. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "import time\n", "import random\n", "import numpy as np\n", "from utils.mnist import mnist_to_numpy, normalize\n", "import random\n", "import matplotlib.pyplot as plt\n", "\n", "import boto3\n", "import sagemaker\n", "from sagemaker.tensorflow import TensorFlow\n", "from sagemaker.pytorch import PyTorch\n", "from sagemaker import get_execution_role\n", "from sagemaker.s3 import S3Downloader\n", "from sagemaker.s3 import S3Uploader\n", "\n", "sess = sagemaker.Session()\n", "\n", "role = get_execution_role()\n", "\n", "bucket = sess.default_bucket()\n", "\n", "region = sess.boto_region_name\n", "\n", "sm_client = sess.sagemaker_client\n", "runtime_sm_client = sess.sagemaker_runtime_client\n", "s3_client = boto3.client(\"s3\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Models and Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook uses pretrained TensorFlow and PyTorch models trained on the `MNIST` dataset. `MNIST` is a widely used dataset for handwritten digit classification. It consists of 70,000 labeled `28x28` pixel grayscale images of hand-written digits. The dataset is split into 60,000 training images and 10,000 test images. There are 10 classes (one for each of the 10 digits). These models were trained on a SageMaker Training Job. However, the purpose of this notebook will focus on deployment. If you would like to see the script that was used to train these models, take a look under the `pytorch/code/train.py` `tensorflow/code/train` directory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Section 2: Set up Multi-container endpoint with Direct Invocation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section, a multi-container endpoint is set up.\n", "\n", "SageMaker multi-container endpoints enable customers to deploy multiple containers to deploy different models on the same SageMaker endpoint. 
"\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Create inference container definition for TensorFlow model\n", "\n", "To create a container definition, the following must be defined:\n", "\n", "- `ContainerHostname`: The value of this parameter uniquely identifies the container for the purposes of logging and metrics. The `ContainerHostname` parameter is required for each container in a multi-container endpoint with `direct` invocation. It can be omitted for a serial inference pipeline, because the pipeline assigns a unique name to each container automatically.\n", "\n", "- `Image`: The path where the inference code image is stored. This can be either in Amazon Elastic Container Registry (Amazon ECR) or in a Docker registry that is accessible from the same VPC that is configured for the endpoint. If a custom algorithm is used instead of an algorithm provided by Amazon SageMaker, the inference code must meet the SageMaker requirements for inference containers.\n", "\n", "- `ModelDataUrl`: The S3 path where the model artifacts, which result from model training, are stored. This path must point to a single GZIP compressed tar archive (`.tar.gz` suffix). The S3 path is required for Amazon SageMaker built-in algorithms/frameworks, but not if a custom algorithm (not provided by SageMaker) is used.\n", "\n",
"For the `Image` argument, supply the ECR path of the TensorFlow 2.3.1 inference image. For the deep learning images available in SageMaker, refer to [Available Deep Learning Containers Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md).\n", "\n", "First, upload the TensorFlow model tarball to S3:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws s3 cp tensorflow/model.tar.gz s3://{bucket}/tensorflow-mce-demo/model.tar.gz" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf_ecr_image_uri = sagemaker.image_uris.retrieve(\n", "    framework=\"tensorflow\",\n", "    region=region,\n", "    version=\"2.3.1\",\n", "    py_version=\"py37\",\n", "    instance_type=\"ml.c5.4xlarge\",\n", "    image_scope=\"inference\",\n", ")\n", "\n", "tensorflow_container = {\n", "    \"ContainerHostname\": \"tensorflow-mnist\",\n", "    \"Image\": tf_ecr_image_uri,\n", "    \"ModelDataUrl\": f\"s3://{bucket}/tensorflow-mce-demo/model.tar.gz\",\n", "}" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Create inference container definition for PyTorch model\n", "\n", "Now, similarly, create the container definition for the PyTorch model.\n", "\n", "Here, in addition to the arguments defined for the TensorFlow container, one more argument needs to be defined: `Environment`. This is because the PyTorch model server needs to know how to load the model and make predictions, as explained below.\n", "\n", "\n", "To tell the inference image how to load the model checkpoint, it needs to implement:\n", "\n", "- How to parse the incoming request\n", "- How to use the trained model to run inference\n", "- How to return the prediction to the caller of the service\n", "\n", "\n", "To achieve this, it needs to:\n", "\n", "- implement a function called `model_fn`, which returns a PyTorch model\n", "- implement a function called `input_fn`, which handles data decoding and returns an object that can be passed to `predict_fn`\n", "- implement a function called `predict_fn`, which performs the prediction and returns an object that can be passed to `output_fn`\n", "- implement a function called `output_fn`, which serializes the output returned by `predict_fn` into the response payload\n", "\n", "\n", "To achieve this, `inference.py` is created, which implements all of the above functions. The name of this file must be supplied to the container through the `SAGEMAKER_PROGRAM` environment variable.\n",
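"\n", "The `inference.py` used in this notebook ships with the example under `pytorch/code/`. Purely as an illustration of the four functions described above, a minimal handler might look like the following sketch; the `Net` architecture and the exact request format here are placeholder assumptions, not the notebook's real script:\n", "\n", "```python\n", "# Illustrative sketch only -- the notebook uses its own pytorch/code/inference.py.\n", "# The Net architecture and the request format below are placeholder assumptions.\n", "import json\n", "import os\n", "\n", "import torch\n", "\n", "\n", "class Net(torch.nn.Module):\n", "    # Placeholder network; the real architecture lives in the training script.\n", "    def __init__(self):\n", "        super().__init__()\n", "        self.fc = torch.nn.Linear(28 * 28, 10)\n", "\n", "    def forward(self, x):\n", "        return self.fc(x.reshape(x.shape[0], -1))\n", "\n", "\n", "def model_fn(model_dir):\n", "    # Rebuild the network and load the trained weights from model.pth.\n", "    model = Net()\n", "    model.load_state_dict(torch.load(os.path.join(model_dir, \"model.pth\"), map_location=\"cpu\"))\n", "    return model.eval()\n", "\n", "\n", "def input_fn(request_body, request_content_type):\n", "    # Decode the JSON payload ({\"inputs\": ...}) into a float tensor.\n", "    data = json.loads(request_body)[\"inputs\"]\n", "    return torch.tensor(data, dtype=torch.float32)\n", "\n", "\n", "def predict_fn(input_data, model):\n", "    # Run a forward pass without tracking gradients.\n", "    with torch.no_grad():\n", "        return model(input_data)\n", "\n", "\n", "def output_fn(prediction, response_content_type):\n", "    # Serialize the raw prediction scores back to JSON.\n", "    return json.dumps(prediction.cpu().numpy().tolist())\n", "```\n",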
"\n", "The model and `inference.py` also need to be wrapped together in a single `tar.gz`. The following steps package the inference script and the model file together:\n", "\n", "- Download the `model.tar.gz` containing the trained PyTorch model\n", "- Extract the `model.tar.gz`. The `model.pth` file is visible after extraction.\n", "- Package the model file (`model.pth`) and `inference.py` together in a new `tar.gz`\n", "- Upload the new `tar.gz` to an S3 location, to be referenced in the model container definition later\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# copy and extract the original tar.gz\n", "!cp pytorch/model.tar.gz ./model.tar.gz\n", "!tar -xvf model.tar.gz\n", "\n", "# after extracting, remove the original model.tar.gz\n", "!rm model.tar.gz\n", "\n", "# copy the pytorch inference script to the current dir\n", "!cp pytorch/code/inference.py .\n", "\n", "# package inference.py and the model file together in a new model.tar.gz\n", "!tar -czvf model.tar.gz model.pth inference.py\n", "\n", "# remove the residual files\n", "!rm inference.py model.pth\n", "\n", "# upload the new tar.gz to s3\n", "updated_pt_model_key = \"multi-container-endpoint/output/pytorch/updated\"\n", "pt_updated_model_uri = S3Uploader.upload(\n", "    \"model.tar.gz\", \"s3://{}/{}\".format(bucket, updated_pt_model_key)\n", ")\n", "\n", "# remove the new model.tar.gz from the current dir\n", "!rm model.tar.gz" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Now everything is ready to create the container definition for the PyTorch container.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pt_ecr_image_uri = sagemaker.image_uris.retrieve(\n", "    framework=\"pytorch\",\n", "    region=region,\n", "    version=\"1.8.1\",\n", "    py_version=\"py36\",\n", "    instance_type=\"ml.c5.4xlarge\",\n", "    image_scope=\"inference\",\n", ")\n", "\n", "pytorch_container = {\n", "    \"ContainerHostname\": \"pytorch-mnist\",\n", "    \"Image\": pt_ecr_image_uri,\n", "    \"ModelDataUrl\": pt_updated_model_uri,\n", "    \"Environment\": {\n", "        \"SAGEMAKER_PROGRAM\": \"inference.py\",\n", "        \"SAGEMAKER_SUBMIT_DIRECTORY\": pt_updated_model_uri,\n", "    },\n", "}" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Create a SageMaker Model\n", "\n", "In the cell below, call the `create_model` API to create a model that contains the definitions of both the PyTorch and TensorFlow containers created above. Supply both containers under the `Containers` argument. Also set the `Mode` parameter of the `InferenceExecutionConfig` field to `Direct` for direct invocation of each container, or to `Serial` to use the containers as an inference pipeline. The default mode is `Serial`. For more details, see [Deploy multi-container endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-container-endpoints.html).\n", "\n", "\n", "Since this notebook focuses on direct invocation, set the value to `Direct`.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_model_response = sm_client.create_model(\n", "    ModelName=\"mnist-multi-container\",\n", "    Containers=[pytorch_container, tensorflow_container],\n", "    InferenceExecutionConfig={\"Mode\": \"Direct\"},\n", "    ExecutionRoleArn=role,\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Create Endpoint Configuration\n", "\n", "Now, create an endpoint configuration by calling the `create_endpoint_config` API. Here, supply the same `ModelName` used in the `create_model` API call."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_config = sm_client.create_endpoint_config(\n", " EndpointConfigName=\"mnist-multi-container-ep-config\",\n", " ProductionVariants=[\n", " {\n", " \"VariantName\": \"prod\",\n", " \"ModelName\": \"mnist-multi-container\",\n", " \"InitialInstanceCount\": 1,\n", " \"InstanceType\": \"ml.c5.4xlarge\",\n", " },\n", " ],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a SageMaker Multi-container endpoint\n", "\n", "Now, the last step is to create a SageMaker multi-container endpoint. The `create_endpoint` API is used for this. The API behavior has no change compared to how a single container/model endpoint is deployed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint = sm_client.create_endpoint(\n", " EndpointName=\"mnist-multi-container-ep\", EndpointConfigName=\"mnist-multi-container-ep-config\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `create_endpoint` API is synchronous in nature and returns an immediate response with the endpoint status being in`Creating` state. It takes around ~8-10 minutes for multi-container endpoint to be `InService`.\n", "\n", "In the below cell, use the `describe_endpoint` API to check the status of endpoint creation. It runs a simple waiter loop calling the `describe_endpoint` API, for the endpoint to be `InService`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "describe_endpoint = sm_client.describe_endpoint(EndpointName=\"mnist-multi-container-ep\")\n", "\n", "endpoint_status = describe_endpoint[\"EndpointStatus\"]\n", "\n", "while endpoint_status != \"InService\":\n", " print(\"Current endpoint status is: {}\".format(endpoint_status))\n", " time.sleep(60)\n", " resp = sm_client.describe_endpoint(EndpointName=\"mnist-multi-container-ep\")\n", " endpoint_status = resp[\"EndpointStatus\"]\n", "\n", "print(\"Endpoint status changed to 'InService'\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section 3: Inference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the endpoint is set up it is time to perform inference on the endpoint by specifying one of the container host name. First, download the `MNIST` data and select a random sample of images. \n", "\n", "Use the helper functions defined in `code.utils` to download `MNIST` data set and normalize the input data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "data_dir = \"/tmp/data\"\n", "X, _ = mnist_to_numpy(data_dir, train=False)\n", "\n", "# randomly sample 16 images to inspect\n", "mask = random.sample(range(X.shape[0]), 16)\n", "samples = X[mask]\n", "\n", "# plot the images\n", "fig, axs = plt.subplots(nrows=1, ncols=16, figsize=(16, 1))\n", "\n", "for i, splt in enumerate(axs):\n", " splt.imshow(samples[i])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(samples.shape, samples.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Invoking the TensorFlow container\n", "\n", "Now invoke the TensorFlow container, on the same endpoint. First normalize the sample selected and then pass the sample to the `invoke_endpoint` API." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf_samples = normalize(samples, axis=(1, 2))\n", "\n", "tf_result = runtime_sm_client.invoke_endpoint(\n", " EndpointName=\"mnist-multi-container-ep\",\n", " ContentType=\"application/json\",\n", " Accept=\"application/json\",\n", " TargetContainerHostname=\"tensorflow-mnist\",\n", " Body=json.dumps({\"instances\": np.expand_dims(tf_samples, 3).tolist()}),\n", ")\n", "\n", "tf_body = tf_result[\"Body\"].read().decode(\"utf-8\")\n", "\n", "tf_json_predictions = json.loads(tf_body)[\"predictions\"]\n", "\n", "\n", "# softmax to logit\n", "tf_predictions = np.array(tf_json_predictions, dtype=np.float32)\n", "tf_predictions = np.argmax(tf_json_predictions, axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Predictions: \", tf_predictions.tolist())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Invoke PyTorch container\n", "\n", "Now, invoke the PyTorch Container. In `transform_fn`, of `inference.py` it is declared that the parsed data is a python dictionary with a key `inputs` and its value should be a 1D array of length 784. Hence, create a sample inference data in the cell below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we invoke the SageMaker PyTorch model server with `samples`, we need to do some pre-processing\n", "- convert its data type to 32 bit floating point\n", "- normalize each channel (only one channel for `MNIST`)\n", "- add a channel dimension" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pt_samples = normalize(samples.astype(np.float32), axis=(1, 2))\n", "\n", "pt_result = runtime_sm_client.invoke_endpoint(\n", " EndpointName=\"mnist-multi-container-ep\",\n", " ContentType=\"application/json\",\n", " Accept=\"application/json\",\n", " TargetContainerHostname=\"pytorch-mnist\",\n", " Body=json.dumps({\"inputs\": np.expand_dims(pt_samples, axis=1).tolist()}),\n", ")\n", "\n", "pt_body = pt_result[\"Body\"].read().decode(\"utf-8\")\n", "\n", "pt_predictions = np.argmax(np.array(json.loads(pt_body), dtype=np.float32), axis=1).tolist()\n", "print(\"Predicted digits: \", pt_predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Section 4: clean up\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before leaving this exercise, it is a good practice to delete the resources created." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sm_client.delete_endpoint(EndpointName=\"mnist-multi-container-ep\")\n", "# sm_client.delete_endpoint_config(EndpointConfigName=\"mnist-multi-container-ep-config\")\n", "# sm_client.delete_model(ModelName=\"mnist-multi-container\")" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }