{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "615b38ad-e8e5-49f4-a66a-036534c62798", "metadata": {}, "source": [ "# SageMaker Real-time Dynamic Batching Inference with Torchserve" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "---" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d1647aa0-0140-40fc-bf58-bd8cf786d7a4", "metadata": {}, "source": [ "This notebook demonstrates the use of dynamic batching on SageMaker with [torchserve](https://github.com/pytorch/serve/) as a model server. It demonstrates the following\n", "1. Batch inference using DLC i.e. SageMaker's default backend container. This is done by using SageMaker python sdk in script-mode.\n", "2. Specifying inference parameters for torchserve using environment variables.\n", "3. Option to use a custom container with config file for torchserve baked-in the container." ] }, { "attachments": {}, "cell_type": "markdown", "id": "beb7434c-2d73-41dc-a56c-7db10b9f552f", "metadata": {}, "source": [ "**Imports**" ] }, { "cell_type": "code", "execution_count": null, "id": "5db9333a", "metadata": {}, "outputs": [], "source": [ "! pip install --upgrade sagemaker" ] }, { "cell_type": "code", "execution_count": null, "id": "ce290e30-dadc-4fa8-bd15-7db4bafe4c90", "metadata": {}, "outputs": [], "source": [ "import base64\n", "import json\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from PIL import Image\n", "import os\n", "import boto3, time, json\n", "import sagemaker" ] }, { "attachments": {}, "cell_type": "markdown", "id": "32d5ef87-09c4-44ac-9dc4-748cc90960e4", "metadata": {}, "source": [ "**Initiate session and retrieve region, account details**" ] }, { "cell_type": "code", "execution_count": null, "id": "474ea986-41bc-413b-88e6-470f7258d749", "metadata": {}, "outputs": [], "source": [ "sm_sess = sagemaker.Session()\n", "role = sagemaker.get_execution_role()" ] }, { "cell_type": "code", "execution_count": null, "id": "3227d6a8-3db5-4938-86db-59f2deec2303", "metadata": {}, "outputs": [], "source": [ "sess = boto3.Session()\n", "region = sess.region_name\n", "account = boto3.client(\"sts\").get_caller_identity().get(\"Account\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "aa306e3b-0f01-46d3-b35e-4700ff7c38ad", "metadata": {}, "source": [ "**Prepare model**" ] }, { "cell_type": "code", "execution_count": null, "id": "9ac74bc9-8463-45fe-b090-3a3a4bff6f66", "metadata": {}, "outputs": [], "source": [ "bucket = sm_sess.default_bucket()\n", "prefix = \"ts-dynamic-batching\"\n", "model_file_name = \"BERTSeqClassification\"\n", "\n", "!aws s3 cp s3://torchserve/tar_gz_files/BERTSeqClassification.tar.gz .\n", "!aws s3 cp BERTSeqClassification.tar.gz s3://{bucket}/{prefix}/models/\n", "\n", "f\"s3://{bucket}/{prefix}/models/\"" ] }, { "cell_type": "code", "execution_count": null, "id": "9fbed613-c3cd-4d10-95a6-93eecc4a7318", "metadata": {}, "outputs": [], "source": [ "model_artifact = f\"s3://{bucket}/{prefix}/models/{model_file_name}.tar.gz\"" ] }, { "cell_type": "code", "execution_count": null, 
"id": "4cdb7075-f6d4-4d6e-8192-ebe436746d25", "metadata": {}, "outputs": [], "source": [ "model_name = \"hf-dynamic-torchserve-sagemaker\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d8119e33-fed7-4685-aa4c-caa468262e13", "metadata": {}, "source": [ "## Use AWS Deep Learning Container" ] }, { "cell_type": "code", "execution_count": null, "id": "930811ed-b740-4852-9671-37f797400cf4", "metadata": {}, "outputs": [], "source": [ "# We'll use a pytorch inference DLC image that ships with sagemaker-pytorch-inference-toolkit v2.0.6. This version includes support for Torchserve environment variables used below.\n", "image_uri = sagemaker.image_uris.retrieve(\n", " framework=\"pytorch\",\n", " region=region,\n", " py_version=\"py39\",\n", " image_scope=\"inference\",\n", " version=\"1.13.1\",\n", " instance_type=\"ml.c5.9xlarge\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "37a0b4de-34fb-41ef-8c40-d4db1215818c", "metadata": {}, "outputs": [], "source": [ "image_uri" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7e0d34fb-980e-402a-9773-e86d03c7bb1a", "metadata": {}, "source": [ "#### Create SageMaker model, deploy and predict" ] }, { "cell_type": "code", "execution_count": null, "id": "66d847dd-ab32-4313-b616-32f166bfacf4", "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch.model import PyTorchModel\n", "\n", "env_variables_dict = {\n", " \"SAGEMAKER_TS_BATCH_SIZE\": \"3\",\n", " \"SAGEMAKER_TS_MAX_BATCH_DELAY\": \"100000\",\n", " \"SAGEMAKER_TS_MIN_WORKERS\": \"1\",\n", " \"SAGEMAKER_TS_MAX_WORKERS\": \"1\",\n", "}\n", "\n", "pytorch_model = PyTorchModel(\n", " model_data=model_artifact,\n", " role=role,\n", " image_uri=image_uri,\n", " source_dir=\"code\",\n", " framework_version=\"1.13.1\",\n", " entry_point=\"inference.py\",\n", " env=env_variables_dict,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "3bff4c42-9b5f-43dc-80d9-6d0e96082cab", "metadata": {}, "outputs": [], "source": [ "# Change the instance type as necessary, or use 'local' for executing in Sagemaker local mode\n", "instance_type = \"ml.c5.18xlarge\"\n", "\n", "predictor = pytorch_model.deploy(\n", " initial_instance_count=1,\n", " instance_type=instance_type,\n", " serializer=sagemaker.serializers.JSONSerializer(),\n", " deserializer=sagemaker.deserializers.BytesDeserializer(),\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "274615f4-5247-4bff-accc-f66112730a16", "metadata": {}, "source": [ "## Predictions" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c4d63f34-f14a-4a2e-8db7-9bc2307df40a", "metadata": {}, "source": [ "#### By spawning a pool of 3 processes we're able to simulate requests from multiple clients and verify inference results" ] }, { "cell_type": "code", "execution_count": null, "id": "bd983997-038d-4e1e-8540-e34ba5ffd0c3", "metadata": {}, "outputs": [], "source": [ "import multiprocessing\n", "\n", "\n", "def invoke(endpoint_name):\n", " predictor = sagemaker.predictor.Predictor(\n", " endpoint_name,\n", " sm_sess,\n", " serializer=sagemaker.serializers.JSONSerializer(),\n", " deserializer=sagemaker.deserializers.BytesDeserializer(),\n", " )\n", " return predictor.predict(\n", " \"{Bloomberg has decided to publish a new report on global economic situation.}\"\n", " )\n", "\n", "\n", "endpoint_name = predictor.endpoint_name\n", "pool = multiprocessing.Pool(3)\n", "results = pool.map(invoke, 3 * [endpoint_name])\n", "pool.close()\n", "pool.join()\n", "print(results)" ] }, { "cell_type": "code", 
"execution_count": null, "id": "ad59b939-744a-45cf-82fa-efbbf1cac2f6", "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint(predictor.endpoint_name)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1c7d981a-46de-46af-9e96-fcd66e3f057a", "metadata": {}, "source": [ "## Conclusion\n", "\n", "Through this exercise, we were able to understand the basics of batch inference using torchserve on Amazon SageMaker. We learnt that we can have several inference requests from different processes/users batched together, and the results will be processed as a batch of inputs. We also learnt that we could either use SageMaker's default DLC container as the base environment, and supply an inference.py script with the model, or create a custom container that can be used with SageMaker for more involved workflows." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }