{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "615b38ad-e8e5-49f4-a66a-036534c62798", "metadata": {}, "source": [ "# SageMaker Real-time Dynamic Batching Inference with Torchserve" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "---" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d1647aa0-0140-40fc-bf58-bd8cf786d7a4", "metadata": {}, "source": [ "This notebook demonstrates the use of dynamic batching on SageMaker with [torchserve](https://github.com/pytorch/serve/) as a model server. It demonstrates the following\n", "1. Batch inference using DLC i.e. SageMaker's default backend container. This is done by using SageMaker python sdk in script-mode.\n", "2. Specifying inference parameters for torchserve using environment variables.\n", "3. Option to use a custom container with config file for torchserve baked-in the container." ] }, { "attachments": {}, "cell_type": "markdown", "id": "beb7434c-2d73-41dc-a56c-7db10b9f552f", "metadata": {}, "source": [ "**Imports**" ] }, { "cell_type": "code", "execution_count": null, "id": "5db9333a", "metadata": {}, "outputs": [], "source": [ "! pip install --upgrade sagemaker" ] }, { "cell_type": "code", "execution_count": null, "id": "ce290e30-dadc-4fa8-bd15-7db4bafe4c90", "metadata": {}, "outputs": [], "source": [ "import base64\n", "import json\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from PIL import Image\n", "import os\n", "import boto3, time, json\n", "import sagemaker" ] }, { "attachments": {}, "cell_type": "markdown", "id": "32d5ef87-09c4-44ac-9dc4-748cc90960e4", "metadata": {}, "source": [ "**Initiate session and retrieve region, account details**" ] }, { "cell_type": "code", "execution_count": null, "id": "474ea986-41bc-413b-88e6-470f7258d749", "metadata": {}, "outputs": [], "source": [ "sm_sess = sagemaker.Session()\n", "role = sagemaker.get_execution_role()" ] }, { "cell_type": "code", "execution_count": null, "id": "3227d6a8-3db5-4938-86db-59f2deec2303", "metadata": {}, "outputs": [], "source": [ "sess = boto3.Session()\n", "region = sess.region_name\n", "account = boto3.client(\"sts\").get_caller_identity().get(\"Account\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "aa306e3b-0f01-46d3-b35e-4700ff7c38ad", "metadata": {}, "source": [ "**Prepare model**" ] }, { "cell_type": "code", "execution_count": null, "id": "9ac74bc9-8463-45fe-b090-3a3a4bff6f66", "metadata": {}, "outputs": [], "source": [ "bucket = sm_sess.default_bucket()\n", "prefix = \"ts-dynamic-batching\"\n", "model_file_name = \"BERTSeqClassification\"\n", "\n", "!aws s3 cp s3://torchserve/tar_gz_files/BERTSeqClassification.tar.gz .\n", "!aws s3 cp BERTSeqClassification.tar.gz s3://{bucket}/{prefix}/models/\n", "\n", "f\"s3://{bucket}/{prefix}/models/\"" ] }, { "cell_type": "code", "execution_count": null, "id": "9fbed613-c3cd-4d10-95a6-93eecc4a7318", "metadata": {}, "outputs": [], "source": [ "model_artifact = f\"s3://{bucket}/{prefix}/models/{model_file_name}.tar.gz\"" ] }, { "cell_type": "code", "execution_count": null, 
"id": "4cdb7075-f6d4-4d6e-8192-ebe436746d25", "metadata": {}, "outputs": [], "source": [ "model_name = \"hf-dynamic-torchserve-sagemaker\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d8119e33-fed7-4685-aa4c-caa468262e13", "metadata": {}, "source": [ "## Use AWS Deep Learning Container" ] }, { "cell_type": "code", "execution_count": null, "id": "930811ed-b740-4852-9671-37f797400cf4", "metadata": {}, "outputs": [], "source": [ "# We'll use a pytorch inference DLC image that ships with sagemaker-pytorch-inference-toolkit v2.0.6. This version includes support for Torchserve environment variables used below.\n", "image_uri = sagemaker.image_uris.retrieve(\n", " framework=\"pytorch\",\n", " region=region,\n", " py_version=\"py39\",\n", " image_scope=\"inference\",\n", " version=\"1.13.1\",\n", " instance_type=\"ml.c5.9xlarge\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "37a0b4de-34fb-41ef-8c40-d4db1215818c", "metadata": {}, "outputs": [], "source": [ "image_uri" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7e0d34fb-980e-402a-9773-e86d03c7bb1a", "metadata": {}, "source": [ "#### Create SageMaker model, deploy and predict" ] }, { "cell_type": "code", "execution_count": null, "id": "66d847dd-ab32-4313-b616-32f166bfacf4", "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch.model import PyTorchModel\n", "\n", "env_variables_dict = {\n", " \"SAGEMAKER_TS_BATCH_SIZE\": \"3\",\n", " \"SAGEMAKER_TS_MAX_BATCH_DELAY\": \"100000\",\n", " \"SAGEMAKER_TS_MIN_WORKERS\": \"1\",\n", " \"SAGEMAKER_TS_MAX_WORKERS\": \"1\",\n", "}\n", "\n", "pytorch_model = PyTorchModel(\n", " model_data=model_artifact,\n", " role=role,\n", " image_uri=image_uri,\n", " source_dir=\"code\",\n", " framework_version=\"1.13.1\",\n", " entry_point=\"inference.py\",\n", " env=env_variables_dict,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "3bff4c42-9b5f-43dc-80d9-6d0e96082cab", "metadata": {}, "outputs": [], "source": [ "# Change the instance type as necessary, or use 'local' for executing in Sagemaker local mode\n", "instance_type = \"ml.c5.18xlarge\"\n", "\n", "predictor = pytorch_model.deploy(\n", " initial_instance_count=1,\n", " instance_type=instance_type,\n", " serializer=sagemaker.serializers.JSONSerializer(),\n", " deserializer=sagemaker.deserializers.BytesDeserializer(),\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "274615f4-5247-4bff-accc-f66112730a16", "metadata": {}, "source": [ "## Predictions" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c4d63f34-f14a-4a2e-8db7-9bc2307df40a", "metadata": {}, "source": [ "#### By spawning a pool of 3 processes we're able to simulate requests from multiple clients and verify inference results" ] }, { "cell_type": "code", "execution_count": null, "id": "bd983997-038d-4e1e-8540-e34ba5ffd0c3", "metadata": {}, "outputs": [], "source": [ "import multiprocessing\n", "\n", "\n", "def invoke(endpoint_name):\n", " predictor = sagemaker.predictor.Predictor(\n", " endpoint_name,\n", " sm_sess,\n", " serializer=sagemaker.serializers.JSONSerializer(),\n", " deserializer=sagemaker.deserializers.BytesDeserializer(),\n", " )\n", " return predictor.predict(\n", " \"{Bloomberg has decided to publish a new report on global economic situation.}\"\n", " )\n", "\n", "\n", "endpoint_name = predictor.endpoint_name\n", "pool = multiprocessing.Pool(3)\n", "results = pool.map(invoke, 3 * [endpoint_name])\n", "pool.close()\n", "pool.join()\n", "print(results)" ] }, { "cell_type": "code", 
"execution_count": null, "id": "ad59b939-744a-45cf-82fa-efbbf1cac2f6", "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint(predictor.endpoint_name)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1c7d981a-46de-46af-9e96-fcd66e3f057a", "metadata": {}, "source": [ "## Conclusion\n", "\n", "Through this exercise, we were able to understand the basics of batch inference using torchserve on Amazon SageMaker. We learnt that we can have several inference requests from different processes/users batched together, and the results will be processed as a batch of inputs. We also learnt that we could either use SageMaker's default DLC container as the base environment, and supply an inference.py script with the model, or create a custom container that can be used with SageMaker for more involved workflows." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }