{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Faster Inference on Raspberry Pi with Amazon SageMaker Neo\n", "\n", "\n", "Amazon SageMaker Neo enables developers to train machine learning models once and run them anywhere in the cloud and at the edge. Amazon SageMaker Neo optimizes models to run up to twice as fast, with less than a tenth of the memory footprint, with no loss in accuracy.\n", "\n", "This notebook demonstrates how a MobileNet-based object detection model created in TensorFlow Lite can be compiled using Amazon SageMaker Neo for use on a Raspberry Pi 3B, a common edge platform for experimentation and development. It also lets us see the performance benefits of SageMaker Neo on our own hardware." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "The general flow of this tutorial is as follows:\n", "1. Set up our device to access Amazon Web Services\n", "1. Prepare our model for compilation with SageMaker Neo\n", "1. Compile with SageMaker Neo\n", "1. Compare the performance of the model compiled with SageMaker Neo against that of the original model executing with TensorFlow Lite\n", "\n", "> This notebook is intended to be run on a Raspberry Pi 3B, but everything except the inference section can be run on a PC. When running on a PC, it will not be possible to directly observe the performance benefits of Neo. Running on a Raspberry Pi allows the full notebook to execute." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "This tutorial leverages Amazon Web Services (AWS) capabilities and assumes we have a Raspberry Pi to use for inference. 
As such, the below are required:\n", "\n", "#### Required\n", "* An AWS account\n", " * AWS provides [instructions for creating an account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/) if you do not already have one.\n", "* A Raspberry Pi 3B running Raspberry Pi OS 10 to run the benchmarks on a personal device (support for Raspberry Pi 4B running Raspberry Pi OS 10 not tested)\n", "\n", "#### Recommended\n", "* A fresh [Python virtual environment](https://docs.python.org/3/tutorial/venv.html) to avoid conflicts when installing Python packages \n", "\n", "\n", "## Get Started with Amazon Web Services and Boto3\n", "\n", "Amazon SageMaker Neo is a service provided by Amazon Web Services, and using it requires that we do some basic configuration to set up our account on our device, set up access to our resources, and specify which [AWS Region](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/) we want to execute in. While it is possible to access SageMaker Neo [via a web GUI](https://console.aws.amazon.com/sagemaker/home), this tutorial will leverage Boto3 to programmatically interact with SageMaker Neo and other AWS services.\n", "\n", "### Install Packages\n", "\n", "Before we can get started, we need to make sure that we have all of the necessary Python libraries installed on our machine. Specifically, we will be relying on [Boto3](https://github.com/boto/boto3) to access AWS services. If Boto3 is not already installed, it can be installed by running the below cell." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple\n", "Requirement already satisfied: boto3 in /home/pi/.local/lib/python3.7/site-packages (1.12.6)\n", "Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/pi/.local/lib/python3.7/site-packages (from boto3) (0.9.5)\n", "Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /home/pi/.local/lib/python3.7/site-packages (from boto3) (0.3.3)\n", "Requirement already satisfied: botocore<1.16.0,>=1.15.6 in /home/pi/.local/lib/python3.7/site-packages (from boto3) (1.15.6)\n", "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/pi/.local/lib/python3.7/site-packages (from botocore<1.16.0,>=1.15.6->boto3) (2.8.1)\n", "Requirement already satisfied: urllib3<1.26,>=1.20; python_version != \"3.4\" in /usr/lib/python3/dist-packages (from botocore<1.16.0,>=1.15.6->boto3) (1.24.1)\n", "Requirement already satisfied: docutils<0.16,>=0.10 in /usr/lib/python3/dist-packages (from botocore<1.16.0,>=1.15.6->boto3) (0.14)\n", "Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.16.0,>=1.15.6->boto3) (1.12.0)\n" ] } ], "source": [ "!pip3 install boto3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set Up Credentials\n", "Running Boto3 requires Amazon Web Services credentials be set up on our device. [The Boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration) has instructions for how to do this, but it is also easy to configure the credentials ourselves. 
By default, credentials should be stored in the file `~/.aws/credentials` and look as follows:\n", "\n", "```\n", "[default]\n", "aws_access_key_id = YOUR_ACCESS_KEY\n", "aws_secret_access_key = YOUR_SECRET_KEY\n", "```\n", "\n", "The [AWS documentation](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) has instructions for how to get the necessary `aws_access_key_id` and `aws_secret_access_key`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set Up Clients for Interaction with Amazon Web Services\n", "\n", "The first step to get started with Boto3 is to import it and set up clients to interact with AWS services. We will be leveraging two AWS services in this tutorial:\n", "1. Amazon SageMaker Neo, for deep learning model compilation\n", "2. Amazon S3, for storage of our model before and after compilation\n", "\n", "We can set up clients for interaction with these services as follows:\n", "\n", "> Note: each S3 bucket resides in a single AWS Region. As such, the SageMaker client and S3 client must be configured to use the same AWS Region." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "\n", "AWS_REGION = 'us-west-2'\n", "sagemaker_client = boto3.client('sagemaker', region_name=AWS_REGION)\n", "s3_client = boto3.client('s3', region_name=AWS_REGION)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set Up IAM Role\n", "\n", "In order to use SageMaker Neo we will need to set up an [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). 
This role will allow SageMaker to run under our account and will also give it access to the S3 bucket where we upload our pretrained model.\n", "\n", "The process to programmatically set up an IAM role is a bit complicated (it can be done more easily using the web GUI), but it looks as follows.\n", "\n", "First, set up the IAM client and name the role:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "iam_client = boto3.client('iam', region_name=AWS_REGION)\n", "role_name = 'new-pi-demo-test-role'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, create a dictionary describing the trust policy that allows SageMaker to assume the role. This policy is used to create a new IAM role." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "policy = {\n", "    'Statement': [\n", "        {\n", "            'Action': 'sts:AssumeRole',\n", "            'Effect': 'Allow',\n", "            'Principal': {'Service': 'sagemaker.amazonaws.com'},\n", "        }],\n", "    'Version': '2012-10-17'\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now look to see if a role of name `role_name` already exists. If it does, we note its [ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-arns), the name by which the role is referred to in AWS systems. If the role does not already exist, we create the role and record the new ARN.\n", "\n", "> If you see an error stating `An error occurred (EntityAlreadyExists) when calling the CreateRole operation: Role with name new-pi-demo-test-role already exists`, you may be able to resolve it by using credentials from a different AWS account." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "# list_roles is paginated, so walk every page when looking for the role\n", "role_arn = None\n", "for page in iam_client.get_paginator('list_roles').paginate():\n", "    for role in page['Roles']:\n", "        if role['RoleName'] == role_name:\n", "            role_arn = role['Arn']\n", "\n", "if role_arn is None:\n", "    new_role = iam_client.create_role(\n", "        AssumeRolePolicyDocument=json.dumps(policy),\n", "        Path='/',\n", "        RoleName=role_name,\n", "    )\n", "    role_arn = new_role['Role']['Arn']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With our role created, we now need to give it the necessary permissions to run SageMaker Neo and access data we store in S3. To give our role those permissions, we make the below function calls:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "iam_client.attach_role_policy(\n", "    RoleName=role_name,\n", "    PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess'\n", ")\n", "\n", "iam_client.attach_role_policy(\n", "    RoleName=role_name,\n", "    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'\n", ");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare Model for Compilation\n", "\n", "Before we can compile our pretrained model for on-device inference using SageMaker Neo, we need to put it in a location accessible to SageMaker Neo. As SageMaker Neo only accepts inputs from Amazon S3, this means that we will need to upload our pretrained model to S3.\n", "\n", "### Get Local Copy of Model\n", "First, we need to get a local copy of our pretrained model. As ours is hosted on the web, we can get a copy using curl." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 2741k 100 2741k 0 0 6496k 0 --:--:-- --:--:-- --:--:-- 6496k\n" ] } ], "source": [ "model_zip_filename = './coco_ssd_mobilenet_v1_1.0.zip'\n", "!curl http://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip \\\n", "    --output {model_zip_filename}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As our model was stored in a zip file, we need to unzip it to get the model out." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Archive: ./coco_ssd_mobilenet_v1_1.0.zip\r\n", " inflating: detect.tflite \r\n", " inflating: labelmap.txt \r\n" ] } ], "source": [ "!unzip -u {model_zip_filename}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that `detect.tflite` was extracted from the zip file, perfect! This is the file we want to pass into SageMaker Neo for compilation." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "model_filename = 'detect.tflite'\n", "model_name = model_filename.split('.')[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Upload Model to S3\n", "\n", "Before we can upload our model to S3, we need to create a [\"bucket\"](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html) to store it in. We can do this using Boto3 as follows:\n", "\n", "First, we need to give our bucket a unique name:\n", "\n", "> **NOTE**: This bucket *must* be renamed, as S3 bucket names must be globally unique across all AWS accounts. As the default bucket name below has been used before, bucket creation will fail unless the name is changed." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# TODO: manually rename this bucket\n", "bucket_name = 'dvisnty-bucket-party'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we need to create a bucket of this name using Boto3:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Bucket dvisnty-bucket-party already exists. No action needed.\n" ] } ], "source": [ "if boto3.resource('s3').Bucket(bucket_name) not in boto3.resource('s3').buckets.all():\n", "    s3_client.create_bucket(\n", "        Bucket=bucket_name,\n", "        CreateBucketConfiguration={\n", "            'LocationConstraint': AWS_REGION\n", "        }\n", "    )\n", "else:\n", "    print(f'Bucket {bucket_name} already exists. No action needed.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've created our bucket, we just need to upload our model, `detect.tflite`, to S3. SageMaker Neo expects the model to be uploaded as a `.tar.gz` file, so we first need to compress our model." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "model_tar = model_name + '.tar.gz'\n", "!tar -czf {model_tar} {model_filename}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now it is time to upload our pretrained model." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "s3_client.upload_file(model_tar, bucket_name, model_tar)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our model should now be accessible at" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'s3://dvisnty-bucket-party/detect.tar.gz'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3_input_location = f's3://{bucket_name}/{model_tar}'\n", "s3_input_location" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile with SageMaker Neo\n", "\n", "Now that we have our model uploaded to S3, we just need to invoke SageMaker Neo on our input.\n", "\n", "SageMaker Neo requires the following information in order to run:\n", "1. The location of the input model (in S3)\n", "2. Where to save the compiled model (in S3)\n", "3. The framework of the input model (TensorFlow Lite, MXNet, etc.)\n", "4. The shapes of the model's inputs\n", "5. The name of the target device to compile for, or the general details of the hardware platform" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set S3 Output Location\n", "The finished model will be output to an S3 bucket of our choosing. Here we choose to output to the same bucket as the input model resides in." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "s3_output_location = f's3://{bucket_name}/output'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set Input Tensor Shapes\n", "\n", "SageMaker Neo requires the name and shape of every input tensor. These are passed in as `key: value` pairs where `value` is a list of the integer dimensions of an input tensor and `key` is the exact name of an input tensor in the model. 
\n", "\n", "> [Netron](https://lutzroeder.github.io/netron/) can be used to visualize a model and find the names/dimensions of the input tensors." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Note: if the model being compiled is different from the default\n", "# for this tutorial, the key in this dictionary will need to be changed\n", "# to the new input tensor's name\n", "data_shape = '{\"normalized_input_image_tensor\":[1, 300, 300, 3]}'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set Framework\n", "We are using a TensorFlow Lite model for this tutorial." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "framework = 'tflite'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set Target Device\n", "\n", "This tutorial targets the Raspberry Pi 3B running Raspberry Pi OS 10. SageMaker comes with a saved configuration for this device that takes into account the capabilities and limitations of the Raspberry Pi 3B, so we use this configuration for ease of use. It is possible to target additional devices, such as the Raspberry Pi 4B, but that requires manually specifying the operating system, instruction set, and capabilities of the processor, e.g. ARM NEON." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "target_device = 'rasp3b'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit Model to SageMaker Neo\n", "\n", "With all of the inputs ready for SageMaker Neo, we now make a call to SageMaker Neo using our previously created Boto3 client." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Compilation job for pi-demo1602700548 started\n", "Compiling ...\n", "Compiling ...\n", "Done!\n" ] } ], "source": [ "import time\n", "\n", "# Append the current Unix time so each job name is unique\n", "compilation_job_name = 'pi-demo' + str(int(time.time()))\n", "print(f'Compilation job for {compilation_job_name} started')\n", "\n", "response = sagemaker_client.create_compilation_job(\n", "    CompilationJobName=compilation_job_name,\n", "    RoleArn=role_arn,\n", "    InputConfig={\n", "        'S3Uri': s3_input_location,\n", "        'DataInputConfig': data_shape,\n", "        'Framework': framework.upper()\n", "    },\n", "    OutputConfig={\n", "        'S3OutputLocation': s3_output_location,\n", "        'TargetDevice': target_device\n", "    },\n", "    StoppingCondition={\n", "        'MaxRuntimeInSeconds': 900\n", "    }\n", ")\n", "\n", "# Uncomment this for additional debug information\n", "# print(response)\n", "\n", "# Poll every 30 sec to check completion status\n", "while True:\n", "    response = sagemaker_client.describe_compilation_job(CompilationJobName=compilation_job_name)\n", "    if response['CompilationJobStatus'] == 'COMPLETED':\n", "        break\n", "    elif response['CompilationJobStatus'] == 'FAILED':\n", "        # Surface the reason reported by SageMaker, if any\n", "        raise RuntimeError('Compilation failed: ' + response.get('FailureReason', 'no reason given'))\n", "    print('Compiling ...')\n", "    time.sleep(30)\n", "print('Done!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Success! At this point compilation is done, and our compiled model specifically targeting the Raspberry Pi 3B is accessible in S3. 
Let's download our compiled model, which is packaged as a `.tar.gz` archive containing multiple files, back to our local machine.\n", "\n", "> If you instead see a failure with the failure reason in the SageMaker console `ClientError: Failed to assume role: An error occurred (AccessDenied) when calling the AssumeRole operation...`, try compiling using a different AWS account's credentials." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "object_path = f'output/{model_name}-{target_device}.tar.gz'\n", "neo_compiled_model = f'compiled-{model_name}.tar.gz'\n", "s3_client.download_file(bucket_name, object_path, neo_compiled_model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Measure On-device Inference Latency\n", "\n", "Now that we have our Neo-compiled model, we can run it on a Raspberry Pi. As our model was compiled targeting Raspberry Pi 3B, these instructions will **only** work on a Raspberry Pi 3B running Raspberry Pi OS 10. There will be errors if we try to run inference with this model on either x86_64 Linux or a Raspberry Pi 4B running 64-bit Ubuntu server.\n", "\n", "From here on out, this tutorial is written with the assumption that we are executing on a Raspberry Pi.\n", "\n", "> The below steps will not run properly on any platform but a Raspberry Pi 3B running Raspberry Pi OS 10." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set Up the Raspberry Pi\n", "\n", "We need a few more libraries installed on the Raspberry Pi in order to run inference. Specifically, we will want [Neo-AI-DLR](https://github.com/neo-ai/neo-ai-dlr) to run our Neo-compiled model, NumPy for data manipulation and statistics, and PIL to load images. We also want a copy of TensorFlow so we can gauge the impact of compilation using SageMaker Neo versus a baseline. 
Lastly, we want Matplotlib to generate charts so we can compare performance.\n", "\n", "> Rather than install the full TensorFlow, it is also possible to install just the TensorFlow Lite runtime. This can be done by following the instructions in the [official TensorFlow Lite documentation](https://www.tensorflow.org/lite/guide/python)\n", "\n", "> TensorFlow does not provide a current (v2.x) package for Raspberry Pi OS 10 (see https://github.com/tensorflow/tensorflow/issues/29704 for discussion). As such, this tutorial may install an older version of TensorFlow. We can load a more current version of TensorFlow onto a Raspberry Pi, but this requires more work and is out of scope for this tutorial." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple\n", "Requirement already satisfied: dlr in /usr/local/lib/python3.7/dist-packages/dlr-1.3.0-py3.7.egg (1.3.0)\n", "Requirement already satisfied: numpy in /home/pi/.local/lib/python3.7/site-packages (1.19.0)\n", "Requirement already satisfied: pillow in /usr/lib/python3/dist-packages (5.4.1)\n", "Requirement already satisfied: tensorflow in /home/pi/.local/lib/python3.7/site-packages (1.14.0)\n", "Requirement already satisfied: matplotlib in /home/pi/.local/lib/python3.7/site-packages (3.3.2)\n", "Requirement already satisfied: termcolor>=1.1.0 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (1.1.0)\n", "Requirement already satisfied: six>=1.10.0 in /usr/lib/python3/dist-packages (from tensorflow) (1.12.0)\n", "Requirement already satisfied: astor>=0.6.0 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (0.8.1)\n", "Requirement already satisfied: tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (1.14.0)\n", "Requirement already satisfied: wrapt>=1.11.1 in 
/home/pi/.local/lib/python3.7/site-packages (from tensorflow) (1.12.0)\n", "Requirement already satisfied: tensorboard<2.1.0,>=2.0.0 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (2.0.2)\n", "Requirement already satisfied: keras-preprocessing>=1.0.5 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (1.1.0)\n", "Requirement already satisfied: protobuf>=3.6.1 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (3.11.3)\n", "Requirement already satisfied: grpcio>=1.8.6 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (1.27.2)\n", "Requirement already satisfied: wheel>=0.26; python_version >= \"3\" in /usr/lib/python3/dist-packages (from tensorflow) (0.32.3)\n", "Requirement already satisfied: google-pasta>=0.1.6 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (0.1.8)\n", "Requirement already satisfied: keras-applications>=1.0.8 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (1.0.8)\n", "Requirement already satisfied: absl-py>=0.7.0 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (0.9.0)\n", "Requirement already satisfied: gast==0.2.2 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (0.2.2)\n", "Requirement already satisfied: opt-einsum>=2.3.2 in /home/pi/.local/lib/python3.7/site-packages (from tensorflow) (3.1.0)\n", "Requirement already satisfied: certifi>=2020.06.20 in /home/pi/.local/lib/python3.7/site-packages (from matplotlib) (2020.6.20)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /home/pi/.local/lib/python3.7/site-packages (from matplotlib) (1.2.0)\n", "Requirement already satisfied: cycler>=0.10 in /home/pi/.local/lib/python3.7/site-packages (from matplotlib) (0.10.0)\n", "Requirement already satisfied: python-dateutil>=2.1 in /home/pi/.local/lib/python3.7/site-packages (from matplotlib) (2.8.1)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /home/pi/.local/lib/python3.7/site-packages 
(from matplotlib) (2.4.7)\n", "Requirement already satisfied: google-auth<2,>=1.6.3 in /home/pi/.local/lib/python3.7/site-packages (from tensorboard<2.1.0,>=2.0.0->tensorflow) (1.11.2)\n", "Requirement already satisfied: werkzeug>=0.11.15 in /usr/lib/python3/dist-packages (from tensorboard<2.1.0,>=2.0.0->tensorflow) (0.14.1)\n", "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /home/pi/.local/lib/python3.7/site-packages (from tensorboard<2.1.0,>=2.0.0->tensorflow) (0.4.1)\n", "Requirement already satisfied: markdown>=2.6.8 in /home/pi/.local/lib/python3.7/site-packages (from tensorboard<2.1.0,>=2.0.0->tensorflow) (3.2.1)\n", "Requirement already satisfied: setuptools>=41.0.0 in /home/pi/.local/lib/python3.7/site-packages (from tensorboard<2.1.0,>=2.0.0->tensorflow) (45.2.0)\n", "Requirement already satisfied: requests<3,>=2.21.0 in /usr/lib/python3/dist-packages (from tensorboard<2.1.0,>=2.0.0->tensorflow) (2.21.0)\n", "Requirement already satisfied: h5py in /home/pi/.local/lib/python3.7/site-packages (from keras-applications>=1.0.8->tensorflow) (2.10.0)\n", "Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/pi/.local/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.1.0,>=2.0.0->tensorflow) (0.2.8)\n", "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/pi/.local/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.1.0,>=2.0.0->tensorflow) (4.0.0)\n", "Requirement already satisfied: rsa<4.1,>=3.1.4 in /home/pi/.local/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.1.0,>=2.0.0->tensorflow) (4.0)\n", "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/lib/python3/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.1.0,>=2.0.0->tensorflow) (1.0.0)\n", "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/pi/.local/lib/python3.7/site-packages (from 
pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard<2.1.0,>=2.0.0->tensorflow) (0.4.8)\n" ] } ], "source": [ "!pip3 install numpy pillow tensorflow matplotlib https://neo-ai-dlr-release.s3-us-west-2.amazonaws.com/v1.3.0/pi-armv7l-raspbian4.14.71-glibc2_24-libstdcpp3_4/dlr-1.3.0-py3-none-any.whl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to perform inference, we will need to have an image to pass into our object detection model. As our model was trained on the [COCO dataset](https://cocodataset.org/#home), we pass an arbitrary image from the dataset into our model." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 48109 0 48109 0 0 319k 0 --:--:-- --:--:-- --:--:-- 321k\n" ] } ], "source": [ "input_image_filename = './input_image.jpg'\n", "!curl https://farm9.staticflickr.com/8325/8077197378_79efb4805e_z.jpg --output {input_image_filename}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here is how our chosen image looks:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from matplotlib.pyplot import imshow\n", "from PIL import Image\n", "import numpy as np\n", "\n", "image_to_show = Image.open(input_image_filename)\n", "imshow(np.asarray(image_to_show))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Measure Inference Latency\n", "\n", "We will be measuring the inference latency of the SageMaker Neo-compiled model and comparing it to that of the original TensorFlow Lite model. The measurement process will be as follows:\n", "\n", "1. Load the image used for inference into a usable input format\n", "2. Perform 25 inferences in a tight loop in order to warm up the Raspberry Pi\n", "3. Perform 50 inferences in a tight loop, recording the time required for each\n", "4. Repeat for the next framework\n", "\n", "We first import all necessary packages:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "import time\n", "import tensorflow as tf\n", "import dlr\n", "import numpy as np\n", "from PIL import Image\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take a moment to note the versions of DLR (which executes the Neo-compiled model) and TensorFlow being used.\n", "\n", "> Models compiled using SageMaker Neo include a `libdlr.so` responsible for executing the model, so performance can vary even with the same DLR version, depending on the latest SageMaker Neo updates" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TensorFlow version: 1.14.0\n", "DLR version: 1.3.0\n" ] } ], "source": [ "print('TensorFlow version:', tf.__version__)\n", "print('DLR version:', dlr.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now read in the image, as it will be used in both benchmarks, and set the values of constants." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "NUM_WARMUP = 25\n", "NUM_ACTUAL = 50\n", "\n", "image = Image.open(input_image_filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our model expects an input image of size 300x300, so we need to resize our input image." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "resized_image = image.resize((300, 300))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the image read into memory and resized, it is now time to benchmark. \n", "\n", "#### Benchmark TensorFlow Lite\n", "\n", "We first measure latency when executing using TensorFlow Lite to get a baseline level of performance. " ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "95th percentile latency: 413.3474111557007 ms\n", "99th percentile latency: 422.1384835243225 ms\n" ] } ], "source": [ "def benchmark_tensorflow(image):\n", "    \"\"\"Measure the p99 and p95 latencies of the TensorFlow Lite model.\"\"\"\n", "    # Load in the model\n", "    interpreter = tf.lite.Interpreter(model_path='./detect.tflite')\n", "    # Required setup\n", "    interpreter.allocate_tensors()\n", "\n", "    # Look up the input tensor so we know where to put the image\n", "    input_details = interpreter.get_input_details()\n", "\n", "    # Our model is quantized, so we must convert the image to uint8\n", "    x = np.array(image).astype('uint8')\n", "\n", "    # TensorFlow Lite expects a leading 4th dimension for batch size\n", "    x = np.expand_dims(x, axis=0)\n", "\n", "    # Copy the image into the interpreter's input tensor\n", "    interpreter.set_tensor(input_details[0]['index'], x)\n", "\n", "    # Warmup runs\n", "    for _ in range(NUM_WARMUP):\n", "        interpreter.invoke()\n", "\n", "    latencies = []\n", "\n", "    # Actual measurements\n", "    for _ in range(NUM_ACTUAL):\n", "        start_time = time.time()\n", "        interpreter.invoke()\n", "        elapsed_seconds = time.time() - start_time\n", "        elapsed_ms = 
elapsed_seconds * 1000\n", "        latencies.append(elapsed_ms)\n", "\n", "    p99_latency = np.percentile(latencies, 99)\n", "    p95_latency = np.percentile(latencies, 95)\n", "    print(f'95th percentile latency: {p95_latency} ms')\n", "    print(f'99th percentile latency: {p99_latency} ms')\n", "\n", "    return (p99_latency, p95_latency)\n", "\n", "tf_p99_latency, tf_p95_latency = benchmark_tensorflow(resized_image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Benchmark DLR/Neo\n", "\n", "Before we can run the model we compiled with Neo, we first need to extract it from the `.tar.gz` archive." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "!mkdir -p ./dlr_model" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "compiled.meta\n", "libdlr.so\n", "dlr.h\n", "compiled.params\n", "compiled_model.json\n", "compiled.so\n" ] } ], "source": [ "!tar -xzvf ./compiled-detect.tar.gz --directory ./dlr_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we measure the performance of our Neo-compiled model." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2020-10-14 11:43:12,383 INFO Found libdlr.so in model artifact. Using dlr from ./dlr_model/libdlr.so\n", "2020-10-14 11:43:12,383 INFO Found libdlr.so in model artifact. 
Using dlr from ./dlr_model/libdlr.so\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "95th percentile latency: 252.2238850593567 ms\n", "99th percentile latency: 252.3707127571106 ms\n" ] } ], "source": [ "def benchmark_dlr(image):\n", "    \"\"\"Measure the p99 and p95 latencies of the Neo/DLR model.\"\"\"\n", "    # Load in the model\n", "    model = dlr.DLRModel('./dlr_model', 'cpu', 0)\n", "    x = np.array(image).astype('uint8')\n", "\n", "    # Warmup runs\n", "    for _ in range(NUM_WARMUP):\n", "        model.run(x)\n", "\n", "    latencies = []\n", "\n", "    # Actual measurements\n", "    for _ in range(NUM_ACTUAL):\n", "        start_time = time.time()\n", "        model.run(x)\n", "        elapsed_seconds = time.time() - start_time\n", "        elapsed_ms = elapsed_seconds * 1000\n", "        latencies.append(elapsed_ms)\n", "\n", "    p99_latency = np.percentile(latencies, 99)\n", "    p95_latency = np.percentile(latencies, 95)\n", "    print(f'95th percentile latency: {p95_latency} ms')\n", "    print(f'99th percentile latency: {p99_latency} ms')\n", "\n", "    return (p99_latency, p95_latency)\n", "neo_p99_latency, neo_p95_latency = benchmark_dlr(resized_image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculate Speedup" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "We see a 1.64x speedup at the 95th percentile after compiling with SageMaker Neo.\n" ] } ], "source": [ "print('We see a {0:1.2f}x speedup at the 95th percentile after compiling with SageMaker Neo.'\\\n", "      .format(tf_p95_latency / neo_p95_latency))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Note: While Neo/DLR uses multithreading by default, TensorFlow Lite typically does not use multiple threads unless explicitly specified. 
The ability to set `num_threads` through the TensorFlow Lite Python API requires TensorFlow >= 2.3.0. Because we only have TensorFlow Lite 1.14.0, we cannot use multithreaded TensorFlow Lite here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot Results\n", "\n", "With our benchmarking done, it is now time to plot our results. \n", "\n", "Pull together our results:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# TODO(dvisnty): Show speedup instead of raw latency?\n", "labels = ['TensorFlow Lite', 'Neo/DLR']\n", "p95s = np.array([tf_p95_latency, neo_p95_latency]).astype('int')\n", "p99s = np.array([tf_p99_latency, neo_p99_latency]).astype('int')\n", "x = np.arange(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot the results:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "width = 0.35  # the width of the bars\n", "fig, ax = plt.subplots()\n", "rects1 = ax.bar(x - width/2, p95s, width, label='95th Percentile Latency')\n", "rects2 = ax.bar(x + width/2, p99s, width, label='99th Percentile Latency')\n", "\n", "# Add some text for labels, title and custom x-axis tick labels, etc.\n", "ax.set_ylabel('Latency (ms)')\n", "ax.set_title('p95 and p99 Latency by Framework')\n", "ax.set_xticks(x)\n", "ax.set_xticklabels(labels)\n", "ax.legend()\n", "\n", "\n", "def autolabel(rects):\n", "    \"\"\"Attach a text label above each bar in *rects*, displaying its height.\"\"\"\n", "    for rect in rects:\n", "        height = rect.get_height()\n", "        ax.annotate(\n", "            '{}'.format(height),\n", "            xy=(rect.get_x() + rect.get_width() / 2, height),\n", "            xytext=(0, 3),  # 3 points vertical offset\n", "            textcoords=\"offset points\",\n", "            ha='center',\n", "            va='bottom',\n", "        )\n", "\n", "\n", "autolabel(rects1)\n", "autolabel(rects2)\n", "\n", "fig.tight_layout()\n", "\n", "# Save before plt.show(); saving afterwards can write an empty figure\n", "fig.savefig('latency_results.png')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "This tutorial showed how to compile an object detection model built in TensorFlow Lite for inference on a Raspberry Pi 3B using SageMaker Neo. It also walked through how to measure the performance benefits of compiling with SageMaker Neo, showing a speedup of approximately 1.6x at the 95th percentile.\n", "\n", "This notebook is only a brief introduction to the capabilities of SageMaker Neo, and it did not touch on other benefits such as memory usage reduction, portability, or the optimized use of accelerators. To learn more about these features, please visit the [SageMaker Neo documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/neo.html)."
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 4 }