{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Install dependencies\n",
"!apt-get update\n",
"!apt-get install ffmpeg libsm6 libxext6 -y\n",
"!pip install -q smdebug\n",
"!pip install -q seaborn\n",
"!pip install -q plotly\n",
"!pip install -q opencv-python\n",
"!pip install -q shap\n",
"!pip install -q bokeh\n",
"!pip install -q imageio"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3S0urcTM-uxM"
},
"source": [
"# Using SageMaker Neo to Compile a Tensorflow U-Net Model\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
"\n",
"\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3S0urcTM-uxM"
},
"source": [
"\n",
"[SageMaker Neo](https://aws.amazon.com/sagemaker/neo/) makes it easy to compile pre-trained TensorFlow models and build an inference optimized container without the need for any custom model serving or inference code.\n",
"\n",
"
\n",
"\n",
"[U-Net](https://paperswithcode.com/method/u-net) is an architecture for semantic segmentation. It's a popular model for biological images including Ultrasound, Microscopy, CT, MRI and more. \n",
"\n",
"In this example, we will show how deploy a pre-trained U-Net model to a SageMaker Endpoint with Neo compilation using the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk), and then use the models to perform inference requests. We also provide a performance comparison so you can see the benefits of model compilation."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8zMksa4C-uxX"
},
"source": [
"## Setup\n",
"\n",
"First, we need to ensure we have SageMaker Python SDK 1.x and Tensorflow 1.15.x. Then, import necessary Python packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "AcGt7hZX-uxY",
"outputId": "bc1c8c6e-f204-4d04-eef5-4abbf18024d6",
"scrolled": true
},
"outputs": [],
"source": [
"!pip install -U --quiet --upgrade \"sagemaker\"\n",
"!pip install -U --quiet \"tensorflow==1.15.3\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tarfile\n",
"import numpy as np\n",
"import sagemaker\n",
"import time\n",
"from sagemaker.utils import name_from_base"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iWnz80a2-uxZ"
},
"source": [
"Next, we'll get the IAM execution role and a few other SageMaker specific variables from our notebook environment, so that SageMaker can access resources in your AWS account later in the example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2O3slomA-uxa"
},
"outputs": [],
"source": [
"from sagemaker import get_execution_role\n",
"from sagemaker.session import Session\n",
"\n",
"role = get_execution_role()\n",
"sess = Session()\n",
"region = sess.boto_region_name\n",
"bucket = sess.default_bucket()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ij91QgHSAH-m"
},
"source": [
"SageMaker [Neo supports Tensorflow 1.15.x](https://docs.amazonaws.cn/en_us/sagemaker/latest/dg/neo-supported-cloud.html). Check your version of Tensorflow to prevent downstream framework errors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HTMoVu8M_-M3",
"outputId": "22fbf21b-254d-45ca-a2c3-b693dc9ef68d"
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"print(tf.__version__) # This notebook runs on TensorFlow 1.15.x or earlier"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RSxGrG4j-uxa"
},
"source": [
"## Download U-Net Model\n",
"\n",
"The SageMaker Neo TensorFlow Serving Container works with any model stored in TensorFlow's [SavedModel format](https://www.tensorflow.org/guide/saved_model). This could be the output of your own training job or a model trained elsewhere. For this example, we will use a pre-trained version of the U-Net model based on this [repo](https://github.com/kamalkraj/DATA-SCIENCE-BOWL-2018)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "eqFeogKS-uxa",
"outputId": "ad9663e6-afb9-4706-a4a7-c2725e483f82"
},
"outputs": [],
"source": [
"model_name = \"unet_medical\"\n",
"export_path = \"export\"\n",
"model_archive_name = \"unet-medical.tar.gz\"\n",
"model_archive_url = \"https://sagemaker-neo-artifacts.s3.us-east-2.amazonaws.com/{}\".format(\n",
" model_archive_name\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!wget {model_archive_url}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pre-trained model and its artifacts are saved in a compressed tar file (.tar.gz) so unzip first with:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!tar -xvzf unet-medical.tar.gz"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nJETn9oY-uxb"
},
"source": [
"After downloading the model, we can inspect it using TensorFlow's ``saved_model_cli`` command. In the command output, you should see \n",
"\n",
"```\n",
"MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:\n",
"\n",
"signature_def['serving_default']:\n",
"...\n",
"```\n",
"\n",
"The command output should also show details of the model inputs and outputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "J9wSkL4F-uxb",
"outputId": "c32a97b9-a312-4edf-ec86-6c8a401867d5"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"model_path = os.path.join(export_path, \"Servo/1\")\n",
"!saved_model_cli show --all --dir {model_path}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5n6jhmSg-uxc"
},
"source": [
"Next we need to create a model archive file containing the exported model."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "z9kutpTP-uxd"
},
"source": [
"## Upload the model archive file to S3\n",
"\n",
"We now have a suitable model archive ready in our notebook. We need to upload it to S3 before we can create a SageMaker Model that. We'll use the SageMaker Python SDK to handle the upload."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "TocwZSw4-uxd"
},
"outputs": [],
"source": [
"model_data = Session().upload_data(path=model_archive_name, key_prefix=\"model\")\n",
"print(\"model uploaded to: {}\".format(model_data))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AuDTHa9v-uxd"
},
"source": [
"## Create a SageMaker Model and Endpoint\n",
"\n",
"Now that the model archive is in S3, we can create an unoptimized Model and deploy it to an \n",
"Endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "J3LQzaj5-uxd"
},
"outputs": [],
"source": [
"from sagemaker.tensorflow.serving import Model\n",
"\n",
"instance_type = \"ml.c4.xlarge\"\n",
"framework = \"TENSORFLOW\"\n",
"framework_version = \"1.15.3\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sm_model = Model(model_data=model_data, framework_version=framework_version, role=role)\n",
"uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type=instance_type)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ovafH7uH-uxe"
},
"source": [
"## Make predictions using the endpoint\n",
"\n",
"The endpoint is now up and running, and ready to handle inference requests. The `deploy` call above returned a `predictor` object. The `predict` method of this object handles sending requests to the endpoint. It also automatically handles JSON serialization of our input arguments, and JSON deserialization of the prediction results.\n",
"\n",
"We'll use this sample image:\n",
"\n",
"
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sample_img_fname = \"cell-4.png\"\n",
"sample_img_url = \"https://sagemaker-neo-artifacts.s3.us-east-2.amazonaws.com/{}\".format(\n",
" sample_img_fname\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!wget {sample_img_url}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "uBkEWKrP-uxe"
},
"outputs": [],
"source": [
"# read the image file into a tensor (numpy array)\n",
"import cv2\n",
"\n",
"image = cv2.imread(sample_img_fname)\n",
"original_shape = image.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.imshow(image, cmap=\"gray\", interpolation=\"none\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image = np.resize(image, (256, 256, 3))\n",
"image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n",
"image = np.asarray(image)\n",
"image = np.expand_dims(image, axis=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"start_time = time.time()\n",
"\n",
"# get a prediction from the endpoint\n",
"# the image input is automatically converted to a JSON request.\n",
"# the JSON response from the endpoint is returned as a python dict\n",
"result = uncompiled_predictor.predict(image)\n",
"print(\"Prediction took %.2f seconds\" % (time.time() - start_time))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# show the predicted segmentation image\n",
"\n",
"cutoff = 0.4\n",
"segmentation_img = np.squeeze(np.asarray(result[\"predictions\"])) > cutoff\n",
"segmentation_img = segmentation_img.astype(np.uint8)\n",
"segmentation_img = np.resize(segmentation_img, (original_shape[0], original_shape[1]))\n",
"plt.imshow(segmentation_img, \"gray\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uncompiled Predictor Performance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"shape_input = np.random.rand(1, 256, 256, 3)\n",
"uncompiled_results = []\n",
"\n",
"for _ in range(100):\n",
" start = time.time()\n",
" uncompiled_predictor.predict(image)\n",
" uncompiled_results.append((time.time() - start) * 1000)\n",
"\n",
"print(\"\\nPredictions for un-compiled model: \\n\")\n",
"print(\"\\nP95: \" + str(np.percentile(uncompiled_results, 95)) + \" ms\\n\")\n",
"print(\"P90: \" + str(np.percentile(uncompiled_results, 90)) + \" ms\\n\")\n",
"print(\"P50: \" + str(np.percentile(uncompiled_results, 50)) + \" ms\\n\")\n",
"print(\"Average: \" + str(np.average(uncompiled_results)) + \" ms\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compile model using SageMaker Neo"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Replace the value of data_shape below and\n",
"# specify the name & shape of the expected inputs for your trained model in JSON\n",
"# Note that -1 is replaced with 1 for the batch size placeholder\n",
"data_shape = {\"inputs\": [1, 224, 224, 3]}\n",
"\n",
"instance_family = \"ml_c4\"\n",
"\n",
"compilation_job_name = name_from_base(\"medical-tf-Neo\")\n",
"# output path for compiled model artifact\n",
"compiled_model_path = \"s3://{}/{}/output\".format(bucket, compilation_job_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"optimized_estimator = sm_model.compile(\n",
" target_instance_family=instance_family,\n",
" input_shape=data_shape,\n",
" job_name=compilation_job_name,\n",
" role=role,\n",
" framework=framework.lower(),\n",
" framework_version=framework_version,\n",
" output_path=compiled_model_path,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Optimized Endpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"optimized_predictor = optimized_estimator.deploy(\n",
" initial_instance_count=1, instance_type=instance_type\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"start_time = time.time()\n",
"\n",
"# get a prediction from the endpoint\n",
"# the image input is automatically converted to a JSON request.\n",
"# the JSON response from the endpoint is returned as a python dict\n",
"result = optimized_predictor.predict(image)\n",
"print(\"Prediction took %.2f seconds\" % (time.time() - start_time))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compiled Predictor Performance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"compiled_results = []\n",
"test_input = {\"instances\": np.asarray(shape_input).tolist()}\n",
"# Warmup inference.\n",
"optimized_predictor.predict(image)\n",
"# Inferencing 100 times.\n",
"for _ in range(100):\n",
" start = time.time()\n",
" optimized_predictor.predict(image)\n",
" compiled_results.append((time.time() - start) * 1000)\n",
"\n",
"print(\"\\nPredictions for compiled model: \\n\")\n",
"print(\"\\nP95: \" + str(np.percentile(compiled_results, 95)) + \" ms\\n\")\n",
"print(\"P90: \" + str(np.percentile(compiled_results, 90)) + \" ms\\n\")\n",
"print(\"P50: \" + str(np.percentile(compiled_results, 50)) + \" ms\\n\")\n",
"print(\"Average: \" + str(np.average(compiled_results)) + \" ms\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Performance Comparison"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we compare inference speed up provided by SageMaker Neo. P90 is 90th percentile latency. We add this because it represents the tail of the latency distribution (worst case). More information on latency percentiles [here](https://blog.bramp.net/post/2018/01/16/measuring-percentile-latency/). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p90 = np.percentile(uncompiled_results, 90) / np.percentile(compiled_results, 90)\n",
"p50 = np.percentile(uncompiled_results, 50) / np.percentile(compiled_results, 50)\n",
"avg = np.average(uncompiled_results) / np.average(compiled_results)\n",
"\n",
"print(\"P90 Speedup: %.2f\" % p90)\n",
"print(\"P50 Speedup: %.2f\" % p50)\n",
"print(\"Average Speedup: %.2f\" % avg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WK615wgU-uxg"
},
"source": [
"## Additional Information\n",
"\n",
"## Cleaning up\n",
"\n",
"To avoid incurring charges to your AWS account for the resources used in this tutorial, you need to delete the SageMaker Endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "kTSKcyHW-uxg"
},
"outputs": [],
"source": [
"uncompiled_predictor.delete_endpoint()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"optimized_predictor.delete_endpoint()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notebook CI Test Results\n",
"\n",
"This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"colab": {
"name": "sagemaker-neo-tf-hub-example.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (TensorFlow 2.10.0 Python 3.9 CPU Optimized)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/tensorflow-2.10.1-cpu-py39-ubuntu20.04-sagemaker-v1.2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 4
}