{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Deploying pre-trained NGC PyTorch models with Amazon SageMaker Neo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Amazon SageMaker Neo is API to compile machine learning models to optimize them for our choice of hardward targets. In addition to the pre-trained models from TorchVision demonstrated in this SageMaker Neo example [notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_neo_compilation_jobs/pytorch_torchvision/pytorch_torchvision_neo.ipynb), NGC pre-trained models can also be deployed in Neo, as demonstrated in this notebook.\n", "\n", "Before you get started, make sure you have downloaded the [NGC ResNet50 model](https://ngc.nvidia.com/catalog/models/nvidia:rnpyt_fp16) to the `NGC_assets` directory. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# download the NGC model weights. We will later load the weights into a ResNet50.\n", "!wget https://api.ngc.nvidia.com/v2/models/nvidia/rnpyt_fp16/versions/1/files/NVIDIA_ResNet50v15_FP16_PyT_20190225.pth -O NGC_assets/NVIDIA_ResNet50v15_FP16_PyT_20190225.pth" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Environment Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll import appropriate Python packages as well as a python script from [NVIDIA Github](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Classification/ConvNets/image_classification/resnet.py)/[NGC model script](https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch).\n", "\n", "For this notebook, you can use the kernel conda_pytorch_p36." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!mkdir -p \"NGC_assets\"\n", "\n", "import torch\n", "import tarfile\n", "\n", "# a Python file from DeepLearningExamples or NGC model scripts, used to build the ResNet model\n", "import NGC_assets.image_classification_resnet as models" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "torch.__version__\n", "# 1.4.0 and 1.2.0 are both okay" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# make sure the following shapes match\n", "input_shape = [1,3,224,224]\n", "data_shape = '{\"input0\":[1,3,224,224]}'\n", "\n", "# make sure the following instance types match\n", "target_device = 'ml_p3' \n", "endpoint_instance_type = 'ml.p3.2xlarge' \n", "# target_device options: https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-compilation-job.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build ResNet50 from NGC\n", "The next section builds a ResNet50 from NGC_assets/image_classification_resnet.py and loads the downloaded weights into it. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# First, let's define the parameters of the model to build. \n", "args_arch = \"resnet50\"\n", "args_model_config = \"fanin\"\n", "args_weights = \"NGC_assets/NVIDIA_ResNet50v15_FP16_PyT_20190225.pth\"\n", "args_precision = \"FP16\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The weights downloaded from NGC can then be loaded into the reconstructed model. For more details of the network, see NGC_assets/image_classification_resnet.py, downloaded from [NVIDIA Github](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Classification/ConvNets/image_classification/resnet.py)/[NGC model script](https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch)."
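] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an aside, the `data_shape` string defined in the setup cell is the JSON `DataInputConfig` passed to Neo later, and it must agree with `input_shape` used for tracing. An illustrative check (restating the two values, not part of the original sample):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "# restating the values from the setup cell above, for an illustrative check\n", "input_shape = [1, 3, 224, 224]\n", "data_shape = '{\"input0\":[1,3,224,224]}'\n", "\n", "# Neo's DataInputConfig maps the traced model's input name to its shape,\n", "# and must match the shape used for torch.jit.trace\n", "assert json.loads(data_shape)['input0'] == input_shape\n", "print('data_shape and input_shape agree')"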
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# build the model \n", "model = models.build_resnet(args_arch, args_model_config, verbose=False)\n", "\n", "# load the weights downloaded from NGC \n", "weights = torch.load(args_weights)\n", "model.load_state_dict(weights)\n", "\n", "model = model.cuda()\n", "\n", "# if we want FP16 precision\n", "if args_precision == \"FP16\":\n", "    model = model.half()\n", "\n", "model.eval()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Trace the reconstructed model, save it to a `.pth` file, and package it into a tar archive." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# trace with an FP32 input (cast the model back to float for tracing)\n", "trace = torch.jit.trace(model.float().eval(), torch.zeros(input_shape).float().cuda())\n", "trace.save('NGC_assets/ngc_model.pth')\n", "\n", "with tarfile.open('NGC_assets/NGC_model.tar.gz', 'w:gz') as f:\n", "    f.add('NGC_assets/ngc_model.pth')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Invoke Neo Compilation API" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, forward the model artifact to the Neo Compilation API:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "import time\n", "from sagemaker.utils import name_from_base\n", "\n", "role = sagemaker.get_execution_role()\n", "sess = sagemaker.Session()\n", "region = sess.boto_region_name\n", "bucket = sess.default_bucket()\n", "\n", "compilation_job_name = name_from_base('NGC-ResNet50-Neo')\n", "\n", "model_key = '{}/model/NGC_model.tar.gz'.format(compilation_job_name)\n", "model_path = 's3://{}/{}'.format(bucket, model_key)\n", "boto3.resource('s3').Bucket(bucket).upload_file('NGC_assets/NGC_model.tar.gz', model_key)\n", "\n", "sm_client = boto3.client('sagemaker')\n", "\n", "framework = 'PYTORCH'\n", "compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = sm_client.create_compilation_job(\n", " CompilationJobName=compilation_job_name,\n", " RoleArn=role,\n", " InputConfig={\n", " 'S3Uri': model_path,\n", " 'DataInputConfig': data_shape,\n", " 'Framework': framework\n", " },\n", " OutputConfig={\n", " 'S3OutputLocation': compiled_model_path,\n", " 'TargetDevice': target_device\n", " },\n", " StoppingCondition={\n", " 'MaxRuntimeInSeconds': 300\n", " }\n", ")\n", "print(response)\n", "\n", "# Poll every 30 sec\n", "while True:\n", " response = sm_client.describe_compilation_job(CompilationJobName=compilation_job_name)\n", " if response['CompilationJobStatus'] == 'COMPLETED':\n", " break\n", " elif response['CompilationJobStatus'] == 'FAILED':\n", " print(response)\n", " raise RuntimeError('Compilation failed')\n", " print('Compiling ...')\n", " time.sleep(30)\n", "print('Done!')\n", "\n", "# Extract compiled model artifact\n", "compiled_model_path = response['ModelArtifacts']['S3ModelArtifacts']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create prediction endpoint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To create a prediction endpoint, we first specify two additional functions, to be used with Neo Deep Learning Runtime:\n", "\n", "* `neo_preprocess(payload, content_type)`: Function that takes in the payload and Content-Type of each incoming request and returns a NumPy array. Here, the payload is byte-encoded NumPy array, so the function simply decodes the bytes to obtain the NumPy array.\n", "* `neo_postprocess(result)`: Function that takes the prediction results produced by Deep Learining Runtime and returns the response body\n", "\n", "Note: this file is reused from the sample notebook which runs a torchvision ResNet18 model." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize resnet18.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upload the Python script containing the two functions to S3:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "source_key = '{}/source/sourcedir.tar.gz'.format(compilation_job_name)\n", "source_path = 's3://{}/{}'.format(bucket, source_key)\n", "\n", "with tarfile.open('sourcedir.tar.gz', 'w:gz') as f:\n", "    f.add('resnet18.py')\n", "\n", "boto3.resource('s3').Bucket(bucket).upload_file('sourcedir.tar.gz', source_key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then create a SageMaker model record:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.model import NEO_IMAGE_ACCOUNT\n", "from sagemaker.fw_utils import create_image_uri\n", "\n", "model_name = name_from_base('NGC-ResNet50-Neo')\n", "\n", "framework_version = \"0.4.0\"\n", "image_uri = create_image_uri(region, 'neo-' + framework.lower(), target_device.replace('_', '.'),\n", "                             framework_version, py_version='py3', account=NEO_IMAGE_ACCOUNT[region])\n", "\n", "response = sm_client.create_model(\n", "    ModelName=model_name,\n", "    PrimaryContainer={\n", "        'Image': image_uri,\n", "        'ModelDataUrl': compiled_model_path,\n", "        'Environment': { 'SAGEMAKER_SUBMIT_DIRECTORY': source_path }\n", "    },\n", "    ExecutionRoleArn=role\n", ")\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we create an Endpoint Configuration:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "config_name = model_name\n", "\n", "response = sm_client.create_endpoint_config(\n", "    EndpointConfigName=config_name,\n", "    ProductionVariants=[\n", "        {\n", "            'VariantName': 'default-variant-name',\n", "            'ModelName': model_name,\n", "            'InitialInstanceCount': 
1,\n", "            'InstanceType': endpoint_instance_type,\n", "            'InitialVariantWeight': 1.0\n", "        },\n", "    ],\n", ")\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we create an Endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_name = model_name + '-Endpoint'\n", "\n", "response = sm_client.create_endpoint(\n", "    EndpointName=endpoint_name,\n", "    EndpointConfigName=config_name,\n", ")\n", "print(response)\n", "\n", "print('Creating endpoint ...')\n", "waiter = sm_client.get_waiter('endpoint_in_service')\n", "waiter.wait(EndpointName=endpoint_name)\n", "\n", "response = sm_client.describe_endpoint(EndpointName=endpoint_name)\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Send requests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's send a picture of a cat.\n", "\n", "![title](cat.jpg)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import numpy as np\n", "\n", "sm_runtime = boto3.Session().client('sagemaker-runtime')\n", "\n", "with open('cat.jpg', 'rb') as f:\n", "    payload = f.read()\n", "\n", "response = sm_runtime.invoke_endpoint(EndpointName=endpoint_name,\n", "                                      ContentType='application/x-image',\n", "                                      Body=payload)\n", "print(response)\n", "result = json.loads(response['Body'].read().decode())\n", "\n", "print('Most likely class: {}'.format(np.argmax(result)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean up\n", "Don't forget to delete the endpoint when you no longer need it. To check the status of your endpoints, go to the SageMaker console and choose Inference -> Endpoints."
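] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to the endpoint, the endpoint configuration and model record created above can also be deleted. Below is an illustrative helper (assuming the `sm_client`, `endpoint_name`, `config_name`, and `model_name` variables defined earlier); the original sample only deletes the endpoint itself, as shown in the next cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# illustrative helper: delete every SageMaker resource created by this notebook\n", "def cleanup(client, endpoint, config, model):\n", "    client.delete_endpoint(EndpointName=endpoint)\n", "    client.delete_endpoint_config(EndpointConfigName=config)\n", "    client.delete_model(ModelName=model)\n", "\n", "# cleanup(sm_client, endpoint_name, config_name, model_name)"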
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sess.delete_endpoint(endpoint_name)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_p36", "language": "python", "name": "conda_pytorch_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }