# Deploy Yolov7 to SageMaker + Inferentia


We'll create a SageMaker real-time endpoint that serves a Yolov7 model capable of detecting people and predicting the pose of each person. To do that, we first need to obtain the model and prepare it for deployment on AWS Inferentia.

## 1) Install dependencies

In [None]:
# with this library we can build docker images and push them to ECR
%pip install sagemaker-studio-image-build

## 2) Compile a pre-trained model
Before you deploy a model to a SageMaker endpoint backed by an inf1 instance (AWS Inferentia), you first need to compile it with the Neuron SDK. We'll use a sample from the official AWS Neuron samples repository.

- Clone the repo: https://github.com/aws-neuron/aws-neuron-samples
- Load the jupyter notebook for Yolov7: https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuron/inference/yolov7
- Run the notebook, but in the model compilation section enable dynamic batching and NeuronCore Pipeline across 4 NeuronCores, as follows:

```python
import torch
import torch.neuron

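# `model` and `x` (the example input) come from the sample notebook
model_neuron = torch.neuron.trace(
    model, example_inputs=x,
    dynamic_batch_size=True,
    # spread the model across 4 NeuronCores as a pipeline
    compiler_args=['--neuroncore-pipeline-cores', '4']
)

# Export to a saved model
model_neuron.save("yolov7_neuron.pt")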
```

## 3) Pack and upload the model to S3
After compiling the model with the instructions above, **copy** the resulting `yolov7_neuron.pt` file into the same directory as this notebook.

In [None]:
import os
import io
import tarfile
import sagemaker

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
image_name='pytorch-inference-neuron'
image_tag="1.10.2h-neuron-py37-sdk1.19.0-ubuntu18.04"
model_s3_path="models/yolov7-pose/model.tar.gz"

In [None]:
with io.BytesIO() as tar_file:
    with tarfile.open(fileobj=tar_file, mode='w:gz') as tar:
        tar.add('yolov7_neuron.pt', 'model.pt')
        tar.list()
    tar_file.seek(0)
    s3_uri = sagemaker_session.upload_string_as_file_body(
        tar_file.read(), bucket=bucket, key=model_s3_path
    )
    print(s3_uri)

## 3) Build a custom docker container with additional libraries
**YOU DON'T NEED TO RUN** this section if you have already built and pushed the image.

We'll extend a pytorch-inference container to apply a patch that allows us to pass CustomAttributes to our code, and to install required libraries such as libjpeg-turbo.

In [None]:
!pygmentize container_01/Dockerfile

In [None]:
!sm-docker build container_01/ --repository $image_name:$image_tag

## 4) Inference Code executed by SageMaker Endpoint
We need to create a custom inference script to pass to SageMaker. This code invokes the model and also pre-processes the input JPEG image and post-processes the predictions.

- **input_fn()**: Receives the bytes of a .jpeg file. This file needs to be a mosaic, composed of multiple frames tiled into a single image. We use **CustomAttributes** to send metadata about the mosaic to the endpoint. From tile_width and tile_height, the code can compute how many images the mosaic contains, split it into tiles, and build a batch.
- **output_fn()**: Takes the predictions and converts them to a numpy blob.
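
The mosaic handling described above can be sketched as follows. This is a minimal illustration, not the contents of `code_01/inference.py`: the helper names `parse_custom_attributes` and `tile_boxes` and the default thresholds are hypothetical, and the sketch assumes the mosaic dimensions are exact multiples of the tile size.

```python
import json

# Hypothetical defaults; the real inference.py may use different values.
DEFAULTS = {"conf_thres": 0.25, "iou_thres": 0.45}


def parse_custom_attributes(raw):
    """Decode the JSON string sent in the CustomAttributes header."""
    attrs = dict(DEFAULTS)
    attrs.update(json.loads(raw) if raw else {})
    for key in ("tile_width", "tile_height"):
        if key not in attrs:
            raise ValueError(f"missing required attribute: {key}")
    return attrs


def tile_boxes(mosaic_width, mosaic_height, tile_width, tile_height):
    """Return (top, left, bottom, right) boxes for each tile, row by row."""
    if mosaic_width % tile_width or mosaic_height % tile_height:
        raise ValueError("mosaic size must be a multiple of the tile size")
    rows = mosaic_height // tile_height
    cols = mosaic_width // tile_width
    return [
        (r * tile_height, c * tile_width,
         (r + 1) * tile_height, (c + 1) * tile_width)
        for r in range(rows)
        for c in range(cols)
    ]


attrs = parse_custom_attributes('{"tile_width": 960, "tile_height": 540}')
# A 2x2 mosaic of 960x540 tiles is 1920x1080 pixels.
boxes = tile_boxes(1920, 1080, attrs["tile_width"], attrs["tile_height"])
print(len(boxes))  # number of images in the batch
```

Each box can then slice a decoded image array as `img[top:bottom, left:right]`, and stacking the slices yields the batch passed to the model.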

In [None]:
!pygmentize code_01/inference.py

## 5) Deploy our model to SageMaker

In [None]:
import boto3
import logging
from sagemaker.pytorch.model import PyTorchModel
from sagemaker.predictor import Predictor

sagemaker_session = sagemaker.Session()

account_id = boto3.client('sts').get_caller_identity().get('Account')
region_name = sagemaker_session.boto_session.region_name
bucket = sagemaker_session.default_bucket()
s3_uri=f"s3://{bucket}/{model_s3_path}"
role=sagemaker.get_execution_role()
print(f"Bucket: {bucket}\nAWS AccountID: {account_id}\nRegion: {region_name}")

# https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers
image_uri=f"{account_id}.dkr.ecr.{region_name}.amazonaws.com/{image_name}:{image_tag}"

print(image_uri)
sagemaker_model = PyTorchModel(
 image_uri=image_uri,
 model_data=s3_uri, 
 role=role, 
 name="yolov7-pose-inferentia",
 sagemaker_session=sagemaker_session,
 entry_point="code_01/inference.py",
 container_log_level=logging.DEBUG,
 model_server_workers=4, # keep 4 workers
 framework_version="1.10.0",
 # for production it is important to define vpc_config and use a vpc_endpoint
 #vpc_config={
 # 'Subnets': ['', ''],
 # 'SecurityGroupIds': ['', '']
 #}
)
sagemaker_model._is_compiled_model = True

In [None]:
predictor = sagemaker_model.deploy(
 endpoint_name="yolov7-pose-inferentia",
 instance_type="ml.inf1.6xlarge",
 initial_instance_count=1
)
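
The choice of four model server workers presumably follows from the hardware layout: an `ml.inf1.6xlarge` has 4 Inferentia chips with 4 NeuronCores each, and the model was compiled as a pipeline spanning 4 NeuronCores. The arithmetic, as a small sketch (the function name `max_model_copies` is hypothetical):

```python
def max_model_copies(total_neuron_cores, cores_per_model):
    """How many copies of a pipelined model fit on the instance at once."""
    if cores_per_model <= 0:
        raise ValueError("cores_per_model must be positive")
    return total_neuron_cores // cores_per_model


# ml.inf1.6xlarge: 4 Inferentia chips x 4 NeuronCores per chip
total_cores = 4 * 4
# the model was compiled as a 4-core pipeline
workers = max_model_copies(total_cores, cores_per_model=4)
print(workers)
```

If the pipeline size or the instance type changes, the worker count would likely need to change with it.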

## 6) Test the endpoint

In [None]:
%matplotlib inline
import os
import cv2
import numpy as np
import urllib.request
import matplotlib.pyplot as plt

if not os.path.isfile('zidane.jpg'):
 urllib.request.urlretrieve(
 'https://raw.githubusercontent.com/ultralytics/yolov5/master/data/images/zidane.jpg',
 'zidane.jpg'
 )
 
if not os.path.isfile('mosaic4.jpg'):
    img = cv2.imread('zidane.jpg')
    h, w, c = img.shape
    # resize so that each tile is 960 pixels wide
    factor = 960 / w
    new_h, new_w = int(h * factor), int(w * factor)
    img = cv2.resize(img, (new_w, new_h))
    # build a 2x2 mosaic by repeating the same image in every tile
    mosaic = np.zeros((new_h * 2, new_w * 2, c), dtype=np.uint8)
    for i in range(2):
        for j in range(2):
            ph, pw = i * new_h, j * new_w
            mosaic[ph:ph + new_h, pw:pw + new_w] = img[:]
    cv2.imwrite('mosaic4.jpg', mosaic)
plt.figure(figsize=(15,10))
plt.imshow(cv2.cvtColor(cv2.imread('mosaic4.jpg'), cv2.COLOR_BGR2RGB))

In [None]:
import json
import time
import sagemaker
import numpy as np
from sagemaker.predictor import Predictor
from sagemaker.serializers import DataSerializer
from sagemaker.deserializers import NumpyDeserializer

sagemaker_session = sagemaker.Session()

predictor = Predictor(endpoint_name="yolov7-pose-inferentia", sagemaker_session=sagemaker_session)
predictor.serializer = DataSerializer(content_type='image/jpeg')
predictor.deserializer = NumpyDeserializer()

mosaic_size=2
custom_attributes={
 'CustomAttributes': json.dumps({ 
 "tile_width": 960, 
 "tile_height": 540,
 "conf_thres": 0.15,
 "iou_thres": 0.45
 })
}
with open(f'mosaic{mosaic_size*mosaic_size}.jpg', 'rb') as f:
    data = f.read()
t = time.time()
y = predictor.predict(data, initial_args=custom_attributes)
elapsed = (time.time()-t) * 1000  # milliseconds
print(f"Elapsed: {elapsed:.1f} ms, Latency per image: {elapsed / (mosaic_size ** 2):.1f} ms")
y.shape
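
A single timed call includes connection setup and any warm-up cost, so it can overstate steady-state latency. A minimal sketch of a repeated measurement follows. The helper names `measure_latency_ms` and `summarize` are hypothetical, the sketch assumes Python 3.8+ for `statistics.quantiles`, and a short `time.sleep` stands in for the endpoint call.

```python
import statistics
import time


def measure_latency_ms(invoke, repeats=20, warmup=2):
    """Call invoke() repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):
        invoke()  # discard warm-up calls
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        invoke()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def summarize(latencies, images_per_call=1):
    """Median and 90th-percentile latency, plus median latency per image."""
    median = statistics.median(latencies)
    p90 = statistics.quantiles(latencies, n=10)[-1]
    return {
        "median_ms": median,
        "p90_ms": p90,
        "median_ms_per_image": median / images_per_call,
    }


# Stand-in workload; against the endpoint this would be something like
# lambda: predictor.predict(data, initial_args=custom_attributes)
latencies = measure_latency_ms(lambda: time.sleep(0.001), repeats=10)
stats = summarize(latencies, images_per_call=4)
print(stats)
```

With the 2x2 mosaic, `images_per_call=4` reports the median latency per frame.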