# SageMaker Real-time Dynamic Batching Inference with TorchServe

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

---

This notebook demonstrates dynamic batching on SageMaker with [TorchServe](https://github.com/pytorch/serve/) as the model server. It covers the following:
1. Batch inference using an AWS Deep Learning Container (DLC), SageMaker's default backend container, through the SageMaker Python SDK in script mode.
2. Specifying TorchServe inference parameters through environment variables.
3. The option of using a custom container with a TorchServe config file baked into the container.
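
To make the idea of dynamic batching concrete, here is a minimal sketch of how a batcher might group incoming requests. It is a simplified model for illustration, not TorchServe's actual implementation: the function name `form_batches` and the millisecond timestamps are assumptions. A batch is dispatched as soon as it is full, or once the maximum delay has elapsed since its first request arrived.

```python
from typing import List, Tuple


def form_batches(
    arrivals_ms: List[int], batch_size: int, max_batch_delay_ms: int
) -> List[Tuple[int, List[int]]]:
    """Group request arrival times into batches (hypothetical helper).

    Returns (dispatch_time_ms, arrival_times_in_batch) pairs.
    """
    batches = []
    current: List[int] = []
    for t in sorted(arrivals_ms):
        # Flush a partial batch whose waiting window expired before this arrival.
        if current and t > current[0] + max_batch_delay_ms:
            batches.append((current[0] + max_batch_delay_ms, current))
            current = []
        current.append(t)
        # A full batch is dispatched immediately.
        if len(current) == batch_size:
            batches.append((t, current))
            current = []
    if current:
        # The last partial batch waits out its full delay window.
        batches.append((current[0] + max_batch_delay_ms, current))
    return batches


print(form_batches([0, 5, 10, 200, 450], batch_size=3, max_batch_delay_ms=100))
# → [(10, [0, 5, 10]), (300, [200]), (550, [450])]
```

The first three requests arrive close together and fill a batch at once, while the two later requests are each dispatched alone after waiting out the delay. This is the trade-off the batch size and maximum delay control: larger values favour throughput, smaller values favour latency.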

**Imports**

In [None]:
! pip install --upgrade sagemaker

In [None]:
import base64
import json
import os
import time

import boto3
import matplotlib.pyplot as plt
import numpy as np
import sagemaker
from PIL import Image

**Initiate session and retrieve region, account details**

In [None]:
sm_sess = sagemaker.Session()
role = sagemaker.get_execution_role()

In [None]:
sess = boto3.Session()
region = sess.region_name
account = boto3.client("sts").get_caller_identity().get("Account")

**Prepare model**

In [None]:
bucket = sm_sess.default_bucket()
prefix = "ts-dynamic-batching"
model_file_name = "BERTSeqClassification"

!aws s3 cp s3://torchserve/tar_gz_files/BERTSeqClassification.tar.gz .
!aws s3 cp BERTSeqClassification.tar.gz s3://{bucket}/{prefix}/models/

f"s3://{bucket}/{prefix}/models/"

In [None]:
model_artifact = f"s3://{bucket}/{prefix}/models/{model_file_name}.tar.gz"

In [None]:
model_name = "hf-dynamic-torchserve-sagemaker"

## Use AWS Deep Learning Container

In [None]:
# We'll use a PyTorch inference DLC image that ships with
# sagemaker-pytorch-inference-toolkit v2.0.6. This version includes support
# for the TorchServe environment variables used below.
image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=region,
    py_version="py39",
    image_scope="inference",
    version="1.13.1",
    instance_type="ml.c5.9xlarge",
)

In [None]:
image_uri

#### Create SageMaker model, deploy and predict

In [None]:
from sagemaker.pytorch.model import PyTorchModel

env_variables_dict = {
    "SAGEMAKER_TS_BATCH_SIZE": "3",
    "SAGEMAKER_TS_MAX_BATCH_DELAY": "100000",
    "SAGEMAKER_TS_MIN_WORKERS": "1",
    "SAGEMAKER_TS_MAX_WORKERS": "1",
}

pytorch_model = PyTorchModel(
    model_data=model_artifact,
    role=role,
    image_uri=image_uri,
    source_dir="code",
    framework_version="1.13.1",
    entry_point="inference.py",
    env=env_variables_dict,
)
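
Environment variable values passed to the container are strings, which is why the numbers above are quoted. As a small convenience, the dictionary could be built from integers with a helper like the one below. This is only a sketch: `torchserve_batch_env` is a hypothetical name, not part of the SageMaker SDK, and the validation rules are assumptions about sensible values.

```python
from typing import Dict


def torchserve_batch_env(
    batch_size: int, max_batch_delay: int, min_workers: int = 1, max_workers: int = 1
) -> Dict[str, str]:
    """Build the TorchServe batching env dict from integers (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if max_batch_delay < 0:
        raise ValueError("max_batch_delay must not be negative")
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    # Convert every value to a string, as required for environment variables.
    return {
        "SAGEMAKER_TS_BATCH_SIZE": str(batch_size),
        "SAGEMAKER_TS_MAX_BATCH_DELAY": str(max_batch_delay),
        "SAGEMAKER_TS_MIN_WORKERS": str(min_workers),
        "SAGEMAKER_TS_MAX_WORKERS": str(max_workers),
    }


print(torchserve_batch_env(batch_size=3, max_batch_delay=100000))
```

Called with the values used in this notebook, the helper returns the same dictionary as `env_variables_dict`.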

In [None]:
# Change the instance type as necessary, or use 'local' to run in SageMaker local mode
instance_type = "ml.c5.18xlarge"

predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.BytesDeserializer(),
)

## Predictions

#### By spawning a pool of 3 processes we're able to simulate requests from multiple clients and verify inference results

In [None]:
import multiprocessing


def invoke(endpoint_name):
    # Each worker process builds its own Predictor for the shared endpoint.
    predictor = sagemaker.predictor.Predictor(
        endpoint_name,
        sm_sess,
        serializer=sagemaker.serializers.JSONSerializer(),
        deserializer=sagemaker.deserializers.BytesDeserializer(),
    )
    return predictor.predict(
        "{Bloomberg has decided to publish a new report on global economic situation.}"
    )


endpoint_name = predictor.endpoint_name
pool = multiprocessing.Pool(3)
results = pool.map(invoke, 3 * [endpoint_name])
pool.close()
pool.join()
print(results)
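
Because the predictor uses `BytesDeserializer`, each entry in `results` is a raw `bytes` object. The sketch below shows one way to turn such responses into Python values. It assumes the response body is UTF-8 text that may or may not be JSON; `decode_responses` is a hypothetical helper, and the sample payloads are made up for illustration, not actual endpoint output.

```python
import json
from typing import Any, List


def decode_responses(raw_responses: List[bytes]) -> List[Any]:
    """Decode raw endpoint responses, parsing JSON where possible."""
    decoded = []
    for raw in raw_responses:
        text = raw.decode("utf-8")
        try:
            decoded.append(json.loads(text))
        except json.JSONDecodeError:
            # Not valid JSON: keep the plain text.
            decoded.append(text)
    return decoded


# Hypothetical payloads, used only to show the helper's behaviour.
sample = [b'["Not Accepted"]', b"Accepted"]
print(decode_responses(sample))
# → [['Not Accepted'], 'Accepted']
```

In the notebook, the same helper could be applied as `decode_responses(results)` once the pool has finished.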

In [None]:
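# delete_endpoint() takes no endpoint name; it removes the endpoint that this
# predictor is bound to (and, by default, its endpoint configuration).
predictor.delete_endpoint()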

## Conclusion

This exercise covered the basics of batch inference using TorchServe on Amazon SageMaker. We learnt that inference requests arriving from different processes or users can be batched together and processed by the model server as a single batch. We also learnt that we can either use SageMaker's default DLC as the base environment and supply an `inference.py` script with the model, or create a custom container for use with SageMaker in more involved workflows.

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-python-sdk|pytorch_batch_inference|sagemaker_batch_inference_torchserve.ipynb)
