# Deploying pre-trained NGC PyTorch models with Amazon SageMaker Neo

Amazon SageMaker Neo is an API that compiles machine learning models and optimizes them for a chosen hardware target. In addition to the pre-trained TorchVision models demonstrated in this SageMaker Neo example [notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_neo_compilation_jobs/pytorch_torchvision/pytorch_torchvision_neo.ipynb), pre-trained NGC models can also be deployed with Neo, as this notebook demonstrates.

Before you get started, download the [NGC ResNet50 model](https://ngc.nvidia.com/catalog/models/nvidia:rnpyt_fp16) to the `NGC_assets` directory. The following cell does this for you.

In [None]:
# Create the target directory, then download the NGC model weights.
# We will later load the weights into a ResNet50.
!mkdir -p NGC_assets
!wget https://api.ngc.nvidia.com/v2/models/nvidia/rnpyt_fp16/versions/1/files/NVIDIA_ResNet50v15_FP16_PyT_20190225.pth -O NGC_assets/NVIDIA_ResNet50v15_FP16_PyT_20190225.pth

## Environment Setup

We'll import the required Python packages, along with a Python script from [NVIDIA GitHub](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Classification/ConvNets/image_classification/resnet.py)/[NGC model script](https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch) that is used to build the ResNet model.

For this notebook, you can use the `conda_pytorch_p36` kernel.

In [None]:
import torch
import tarfile

# a python file from DeepLearningExamples or NGC model scripts, to build the resnet model
import NGC_assets.image_classification_resnet as models 
!mkdir -p "NGC_assets"

In [None]:
torch.__version__
# 1.4.0 or 1.2.0 are both okay

In [None]:
# make sure the following shapes match
input_shape = [1,3,224,224]
data_shape = '{"input0":[1,3,224,224]}'

# make sure the compilation target and the endpoint instance type match
target_device = 'ml_p3'
endpoint_instance_type = 'ml.p3.2xlarge'
# target_device options: https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-compilation-job.html
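
The two "make sure these match" comments above can be turned into an explicit check. The following is a minimal sketch, not part of the original notebook: `check_neo_config` is a hypothetical helper, and it assumes an `ml_*` target whose endpoint instance family is obtained by replacing `_` with `.` (the same conversion the notebook applies later when it builds the container image URI).

```python
import json


def check_neo_config(input_shape, data_shape, target_device, endpoint_instance_type):
    """Raise ValueError if the shape or device settings disagree.

    Hypothetical helper; returns the name of the single model input on success.
    """
    shapes = json.loads(data_shape)
    if len(shapes) != 1:
        raise ValueError('expected exactly one model input, got {}'.format(sorted(shapes)))
    (name, shape), = shapes.items()
    if list(shape) != list(input_shape):
        raise ValueError('DataInputConfig shape {} for {!r} differs from input_shape {}'.format(
            shape, name, list(input_shape)))
    # Assumption: 'ml_p3' corresponds to instance types that start with 'ml.p3.'
    family = target_device.replace('_', '.')
    if not endpoint_instance_type.startswith(family + '.'):
        raise ValueError('{} is not in the {} family'.format(endpoint_instance_type, family))
    return name


input_name = check_neo_config([1, 3, 224, 224], '{"input0":[1,3,224,224]}',
                              'ml_p3', 'ml.p3.2xlarge')
print(input_name)
```

Running a check like this before submitting the compilation job surfaces a mismatch locally instead of after a failed job or endpoint deployment.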

## Build ResNet50 from NGC

This section builds a ResNet50 from `NGC_assets/image_classification_resnet.py` and loads the downloaded weights into it.

In [None]:
# First let's define parameters of the model to build. 
args_arch = "resnet50"
args_model_config = "fanin"
args_weights = "NGC_assets/NVIDIA_ResNet50v15_FP16_PyT_20190225.pth"
args_precision = "FP16"


The weights downloaded from NGC can then be loaded into the reconstructed model. For more details on the network, see `NGC_assets/image_classification_resnet.py`, downloaded from [NVIDIA GitHub](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Classification/ConvNets/image_classification/resnet.py)/[NGC model script](https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch).

In [None]:
# build a model 
model = models.build_resnet(args_arch, args_model_config, verbose=False)

# load NGC downloaded weights 
weights = torch.load(args_weights)
model.load_state_dict(weights)

model = model.cuda()

# if we want FP16 precision
if args_precision == "FP16":
    model = model.half()

model.eval()

Trace the reconstructed model, save the trace to a `.pth` file, and package that file in a gzipped tarball.

In [None]:

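# Trace the model with a dummy input of the expected shape.
# Note that model.float() casts the weights back to FP32 even if FP16 was selected above.
trace = torch.jit.trace(model.float().eval(), torch.zeros(input_shape).float().cuda())
trace.save('NGC_assets/ngc_model.pth')

# Package the traced model as a gzipped tarball for upload to S3.
with tarfile.open('NGC_assets/NGC_model.tar.gz', 'w:gz') as f:
    f.add('NGC_assets/ngc_model.pth')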
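
Before uploading, it can help to confirm what was actually packaged. The sketch below is illustrative only: it uses a placeholder file in a temporary directory instead of the real traced model, and `list_archive` is a hypothetical helper name. It also shows how `arcname` controls the path recorded inside the archive; without it, `tarfile` stores the path exactly as passed to `add`, so the cell above records `NGC_assets/ngc_model.pth`.

```python
import os
import tarfile
import tempfile


def list_archive(path):
    """Return the member names stored in a gzipped tarball."""
    with tarfile.open(path, 'r:gz') as tar:
        return tar.getnames()


# A placeholder file stands in for the traced model in this demo.
workdir = tempfile.mkdtemp()
model_file = os.path.join(workdir, 'ngc_model.pth')
with open(model_file, 'wb') as f:
    f.write(b'placeholder')

archive = os.path.join(workdir, 'NGC_model.tar.gz')
with tarfile.open(archive, 'w:gz') as tar:
    # arcname sets the path recorded inside the archive
    tar.add(model_file, arcname='ngc_model.pth')

members = list_archive(archive)
print(members)  # ['ngc_model.pth']
```

Calling `list_archive('NGC_assets/NGC_model.tar.gz')` on the real artifact is a quick sanity check that the `.pth` file is present.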

## Invoke Neo Compilation API

Next, upload the model artifact to S3 and submit it to the Neo compilation API:

In [None]:
import boto3
import sagemaker
import time
from sagemaker.utils import name_from_base

role = sagemaker.get_execution_role()
sess = sagemaker.Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

compilation_job_name = name_from_base('NGC-ResNet50-Neo')

model_key = '{}/model/NGC_model.tar.gz'.format(compilation_job_name)
model_path = 's3://{}/{}'.format(bucket, model_key)
boto3.resource('s3').Bucket(bucket).upload_file('NGC_assets/NGC_model.tar.gz', model_key)

sm_client = boto3.client('sagemaker')

framework = 'PYTORCH'
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)

In [None]:
response = sm_client.create_compilation_job(
    CompilationJobName=compilation_job_name,
    RoleArn=role,
    InputConfig={
        'S3Uri': model_path,
        'DataInputConfig': data_shape,
        'Framework': framework
    },
    OutputConfig={
        'S3OutputLocation': compiled_model_path,
        'TargetDevice': target_device
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 300
    }
)
print(response)

# Poll every 30 sec
while True:
    response = sm_client.describe_compilation_job(CompilationJobName=compilation_job_name)
    if response['CompilationJobStatus'] == 'COMPLETED':
        break
    elif response['CompilationJobStatus'] == 'FAILED':
        print(response)
        raise RuntimeError('Compilation failed')
    print('Compiling ...')
    time.sleep(30)
print('Done!')

# Extract compiled model artifact
compiled_model_path = response['ModelArtifacts']['S3ModelArtifacts']
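
The polling loop above keeps waiting for as long as the job is neither `COMPLETED` nor `FAILED`. A slightly more defensive variant is sketched below; `wait_for_compilation` is a hypothetical helper, not part of boto3. It also treats `STOPPED` as a terminal status and gives up after a local timeout. The `describe` argument is any zero-argument callable that returns a dict with a `CompilationJobStatus` key, which keeps the sketch runnable without AWS access.

```python
import time


def wait_for_compilation(describe, timeout=600, interval=30, sleep=time.sleep):
    """Poll describe() until the job completes, fails, stops, or times out."""
    waited = 0
    while True:
        response = describe()
        status = response['CompilationJobStatus']
        if status == 'COMPLETED':
            return response
        if status in ('FAILED', 'STOPPED'):
            reason = response.get('FailureReason', 'no reason given')
            raise RuntimeError('Compilation ended with status {}: {}'.format(status, reason))
        if waited >= timeout:
            raise TimeoutError('Compilation still {} after {} seconds'.format(status, waited))
        sleep(interval)
        waited += interval


# Demo with a fake job that moves through three states; no AWS calls are made.
_states = iter(['STARTING', 'INPROGRESS', 'COMPLETED'])


def fake_describe():
    return {'CompilationJobStatus': next(_states)}


result = wait_for_compilation(fake_describe, interval=0, sleep=lambda seconds: None)
print(result['CompilationJobStatus'])
```

In the notebook, `describe` could be `lambda: sm_client.describe_compilation_job(CompilationJobName=compilation_job_name)`.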

## Create prediction endpoint

To create a prediction endpoint, we first specify two additional functions, to be used with Neo Deep Learning Runtime:

* `neo_preprocess(payload, content_type)`: Function that takes in the payload and Content-Type of each incoming request and returns a NumPy array. Here, the payload is a byte-encoded NumPy array, so the function simply decodes the bytes to obtain the NumPy array.
* `neo_postprocess(result)`: Function that takes the prediction results produced by the Deep Learning Runtime and returns the response body.

Note: the script containing these functions, `resnet18.py`, is reused from the sample notebook that runs a torchvision ResNet18 model.
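
To make the two hooks concrete, here is a minimal sketch that follows the byte-encoded NumPy description above. It is not the contents of `resnet18.py`, which may differ. The `application/x-npy` content type, the softmax step, and returning the content type alongside the body are all assumptions made for illustration.

```python
import io
import json

import numpy as np


def neo_preprocess(payload, content_type):
    """Decode a request body holding a serialized NumPy array."""
    # Assumption: the client sends arrays produced by np.save
    if content_type != 'application/x-npy':
        raise RuntimeError('Unsupported content type: {}'.format(content_type))
    return np.load(io.BytesIO(payload), allow_pickle=False)


def neo_postprocess(result):
    """Turn raw scores into a JSON list of probabilities (batch size 1)."""
    scores = np.squeeze(np.asarray(result, dtype=np.float64))
    exp = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    probabilities = exp / exp.sum()
    # Assumption: the container accepts a (body, content type) pair
    return json.dumps(probabilities.tolist()), 'application/json'


# Round-trip demo with a dummy input and dummy scores
buffer = io.BytesIO()
np.save(buffer, np.zeros((1, 3, 224, 224), dtype=np.float32))
batch = neo_preprocess(buffer.getvalue(), 'application/x-npy')
body, body_type = neo_postprocess(np.array([[1.0, 2.0, 3.0]]))
print(batch.shape, body_type)
```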

In [None]:
!pygmentize resnet18.py

Upload the Python script containing the two functions to S3:

In [None]:
source_key = '{}/source/sourcedir.tar.gz'.format(compilation_job_name)
source_path = 's3://{}/{}'.format(bucket, source_key)

with tarfile.open('sourcedir.tar.gz', 'w:gz') as f:
    f.add('resnet18.py')

boto3.resource('s3').Bucket(bucket).upload_file('sourcedir.tar.gz', source_key)

We then create a SageMaker model record:

In [None]:
from sagemaker.model import NEO_IMAGE_ACCOUNT
from sagemaker.fw_utils import create_image_uri

model_name = name_from_base('NGC-ResNet50-Neo')

framework_version = "0.4.0"
image_uri = create_image_uri(region, 'neo-' + framework.lower(), target_device.replace('_', '.'),
                             framework_version, py_version='py3', account=NEO_IMAGE_ACCOUNT[region])

response = sm_client.create_model(
    ModelName=model_name,
    PrimaryContainer={
        'Image': image_uri,
        'ModelDataUrl': compiled_model_path,
        'Environment': {'SAGEMAKER_SUBMIT_DIRECTORY': source_path}
    },
    ExecutionRoleArn=role
)
print(response)

Then we create an Endpoint Configuration:

In [None]:
config_name = model_name

response = sm_client.create_endpoint_config(
    EndpointConfigName=config_name,
    ProductionVariants=[
        {
            'VariantName': 'default-variant-name',
            'ModelName': model_name,
            'InitialInstanceCount': 1,
            'InstanceType': endpoint_instance_type,
            'InitialVariantWeight': 1.0
        },
    ],
)
print(response)

Finally, we create an Endpoint:

In [None]:
endpoint_name = model_name + '-Endpoint'

response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=config_name,
)
print(response)

print('Creating endpoint ...')
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

response = sm_client.describe_endpoint(EndpointName=endpoint_name)
print(response)

## Send requests

Let's try to send a cat picture.

![title](cat.jpg)

In [None]:
import json
import numpy as np

sm_runtime = boto3.Session().client('sagemaker-runtime')

# Read the raw image bytes and send them to the endpoint
with open('cat.jpg', 'rb') as f:
    payload = f.read()

response = sm_runtime.invoke_endpoint(EndpointName=endpoint_name,
                                      ContentType='application/x-image',
                                      Body=payload)
print(response)
result = json.loads(response['Body'].read().decode())

print('Most likely class: {}'.format(np.argmax(result)))
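
`np.argmax` reports only the single best class. To look at several candidates, the scores can be ranked instead. The sketch below assumes, as the cell above implies, that the response body is a JSON list of per-class scores; `top_k` is a hypothetical helper, and the four-element score list is made-up demo data rather than real model output.

```python
import json


def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores."""
    # Accept either a flat list or a batch of size 1 such as [[...]]
    flat = scores[0] if scores and isinstance(scores[0], list) else scores
    ranked = sorted(enumerate(flat), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up response body standing in for response['Body'].read().decode()
body = json.dumps([0.05, 0.7, 0.1, 0.15])
best = top_k(json.loads(body), k=2)
print(best)  # [(1, 0.7), (3, 0.15)]
```

The indices are class IDs only; mapping them to human-readable labels requires a label file, which this notebook does not provide.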

## Clean up
Don't forget to delete the endpoint once you no longer need it. To check the status of your endpoints, open the SageMaker console and go to Inference -> Endpoints.

In [None]:
sess.delete_endpoint(endpoint_name)
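
The endpoint configuration and the model record created earlier are separate SageMaker resources. If you want to remove all three, something like the following sketch could be used. `delete_inference_resources` is a hypothetical helper; the three client methods it calls are standard boto3 SageMaker operations. The demo uses a recording stub in place of a real client, so no AWS calls are made.

```python
def delete_inference_resources(sm_client, endpoint_name, config_name, model_name):
    """Delete the endpoint, then its configuration, then the model record."""
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    sm_client.delete_model(ModelName=model_name)


class RecordingClient:
    """Stub that records method calls instead of contacting AWS (demo only)."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def call(**kwargs):
            self.calls.append((name, kwargs))
            return {}
        return call


stub = RecordingClient()
delete_inference_resources(stub, 'demo-Endpoint', 'demo-config', 'demo-model')
print([name for name, _ in stub.calls])
```

In the notebook, the call would be `delete_inference_resources(sm_client, endpoint_name, config_name, model_name)`, used instead of the `sess.delete_endpoint` call above rather than after it.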