{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Image classification at low latency with TensorFlow serving on Amazon SageMaker. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we walkthrough 3 ways of serving an image classification model with TensorFlow serving on SageMaker endpoints. \n",
    "\n",
    "1. Default with no custom inference script. This is adapted from https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/tensorflow_serving_container and expects preprocessing to be run at the client side. \n",
    "2. Custom inference script for preprocessing and triggering TFS internally via REST. This includes the flexibility for custom preprocessing of image byte stream, but is comparatively slower than option #3. \n",
    "3. Custom inference script for preprocessing and triggering TFS internally via gRPC. This includes the flexibility for custom preprocessing of image byte stream and 75% reduction of latency over option #2. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "First, we need to ensure we have an up-to-date version of the SageMaker Python SDK, and install a few\n",
    "additional python packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2.33.0'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sagemaker.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install tensorflow==2.4.1 -U "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll get the IAM execution role from our notebook environment, so that SageMaker can access resources in your AWS account later in the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker import get_execution_role\n",
    "\n",
    "sagemaker_role = get_execution_role()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download and prepare a model from TensorFlow Hub\n",
    "\n",
    "The TensorFlow Serving Container works with any model stored in TensorFlow's [SavedModel format](https://www.tensorflow.org/guide/saved_model). This could be the output of your own training job or a model trained elsewhere. For this example, we will use a pre-trained version of the MobileNet V2 image classification model. There are 2 options to retrieve the pre-trained model, 1) [TensorFlow Hub](https://tfhub.dev/) and 2) [Keras applications](https://keras.io/api/applications/mobilenet/#mobilenetv2-function). You can refer the differences in [stackoverflow.](https://stackoverflow.com/questions/60251715/difference-between-keras-and-tensorflow-hub-version-of-mobilenetv2)\n",
    "We will use option 2 and get the model from keras applications.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2.4.1'"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "#2 options : get the model from tensorflow hub  and get the model from keras applications\n",
    "#refer differences here : https://stackoverflow.com/questions/60251715/difference-between-keras-and-tensorflow-hub-version-of-mobilenetv2\n",
    "\n",
    "#option 1 ((logit outputs need to add softmax))\n",
    "#hub_url = 'https://tfhub.dev/google/imagenet/mobilenet_v2_140_224/classification/4'\n",
    "#model = tf.keras.Sequential([\n",
    "#    hub.KerasLayer(hub_url)\n",
    "#])\n",
    "#model.build([None, 224, 224, 3])  # Batch input shape.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5\n",
      "14540800/14536120 [==============================] - 0s 0us/step\n",
      "INFO:tensorflow:Assets written to: model/1/assets\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: model/1/assets\n"
     ]
    }
   ],
   "source": [
    "#option 2 \n",
    "from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input\n",
    "model = MobileNetV2()\n",
    "model.save('model/1/')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_path = 'model/1/'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After exporting the model, we can inspect it using TensorFlow's ``saved_model_cli`` command. In the command output, you should see \n",
    "\n",
    "```\n",
    "MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:\n",
    "\n",
    "signature_def['serving_default']:\n",
    "...\n",
    "```\n",
    "\n",
    "The command output should also show details of the model inputs and outputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!saved_model_cli show --all --dir {model_path}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we need to create a model archive file containing the exported model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a model archive file\n",
    "\n",
    "SageMaker models need to be packaged in `.tar.gz` files. When your endpoint is provisioned, the files in the archive will be extracted and put in `/opt/ml/model/` on the endpoint. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "!tar -C \"$PWD\" -czf model.tar.gz model/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Upload the model archive file to S3\n",
    "\n",
    "We now have a suitable model archive ready in our notebook. We need to upload it to S3 before we can create a SageMaker Model that. We'll use the SageMaker Python SDK to handle the upload."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.session import Session\n",
    "\n",
    "model_data = Session().upload_data(path='model.tar.gz', key_prefix='mobilenet')\n",
    "print('model uploaded to: {}'.format(model_data))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a SageMaker Model and Endpoint\n",
    "\n",
    "Now that the model archive is in S3, we can create a Model and deploy it to an \n",
    "Endpoint with a few lines of python code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:sagemaker.deprecations:The class sagemaker.tensorflow.serving.Model has been renamed in sagemaker>=2.\n",
      "See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.\n",
      "WARNING:sagemaker.deprecations:update_endpoint is a no-op in sagemaker>=2.\n",
      "See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-----------------!"
     ]
    }
   ],
   "source": [
    "from sagemaker.tensorflow.serving import Model\n",
    "\n",
    "model = Model(model_data=model_data, role=sagemaker_role, framework_version='2.4.1')\n",
    "predictor = model.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make predictions using the endpoint\n",
    "\n",
    "The endpoint is now up and running, and ready to handle inference requests. The `deploy` call above returned a `predictor` object. The `predict` method of this object handles sending requests to the endpoint. It also automatically handles JSON serialization of our input arguments, and JSON deserialization of the prediction results.\n",
    "\n",
    "We'll use these sample images:\n",
    "\n",
    "<img src=\"kitten.jpg\" align=\"left\" style=\"padding: 8px;\">\n",
    "<img src=\"bee.jpg\" style=\"padding: 8px;\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.preprocessing import image\n",
    "from PIL import Image\n",
    "HEIGHT = 224\n",
    "WIDTH  = 224\n",
    "def image_file_to_tensor(path):\n",
    "    img = Image.open(path).convert('RGB')\n",
    "    img = img.resize((WIDTH, HEIGHT))\n",
    "    img_array = image.img_to_array(img) #, data_format = \"channels_first\")\n",
    "    # the image is now in an array of shape (224, 224, 3) or (3, 224, 224) based on data_format\n",
    "    # need to expand it to add dim for num samples, e.g. (1, 224, 224, 3)\n",
    "    x = np.expand_dims(img_array, axis=0)\n",
    "    instance = preprocess_input(x)\n",
    "    return instance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read the image files into a tensor (numpy array)\n",
    "kitten_image = image_file_to_tensor('kitten.jpg')\n",
    "print(kitten_image.shape)\n",
    "# get a prediction from the endpoint\n",
    "# the image input is automatically converted to a JSON request.\n",
    "# the JSON response from the endpoint is returned as a python dict\n",
    "result = predictor.predict(kitten_image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Predictions for TF2 serving default: \n",
      "\n",
      "\n",
      "P95: 360.88467836380005 ms\n",
      "\n",
      "P90: 355.2844762802124 ms\n",
      "\n",
      "P50: 310.48285961151123 ms\n",
      "\n",
      "Average: 310.48285961151123 ms\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "results = []\n",
    "for i in range(1,1000):\n",
    "    start = time.time()\n",
    "    kitten_image = image_file_to_tensor('kitten.jpg')\n",
    "    predictor.predict(kitten_image)\n",
    "    results.append((time.time() - start) * 1000)\n",
    "print(\"\\nPredictions for TF2 serving default: \\n\")\n",
    "print('\\nP95: ' + str(np.percentile(results, 95)) + ' ms\\n')    \n",
    "print('P90: ' + str(np.percentile(results, 90)) + ' ms\\n')\n",
    "print('Average: ' + str(np.average(results)) + ' ms\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "s3://sagemaker-us-east-1-436518610213/mobilenet/model.tar.gz\n"
     ]
    }
   ],
   "source": [
    "print(model_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Custom inference script for preprocessing and REST communication with TensorFlow serving"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "update_endpoint is a no-op in sagemaker>=2.\n",
      "See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-------------------!"
     ]
    }
   ],
   "source": [
    "from sagemaker.tensorflow.serving import TensorFlowModel\n",
    "\n",
    "model2 = TensorFlowModel(source_dir='code',entry_point='inference.py',model_data=model_data, role=sagemaker_role, framework_version='2.4.1', env = {'PREDICT_USING_GRPC' : 'false'})\n",
    "#'ml.g4dn.xlarge'\n",
    "predictor2 = model2.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import numpy as np\n",
    "kitten_image = open('kitten.jpg', 'rb').read()\n",
    "endpoint_name = predictor2.endpoint_name\n",
    "runtime_client = boto3.client('runtime.sagemaker')\n",
    "response = runtime_client.invoke_endpoint(EndpointName=endpoint_name, \n",
    "                                   ContentType='application/x-image', \n",
    "                                   Body=kitten_image)\n",
    "result = response['Body'].read().decode('ascii')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Predictions for TF2 serving REST API: \n",
      "\n",
      "\n",
      "P95: 279.7132968902588 ms\n",
      "\n",
      "P90: 278.2435894012451 ms\n",
      "\n",
      "Average: 266.48592948913574 ms\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "results = []\n",
    "for i in range(1,1000):\n",
    "    start = time.time()\n",
    "    kitten_image = open('kitten.jpg', 'rb').read()\n",
    "    response = runtime_client.invoke_endpoint(EndpointName=endpoint_name, \n",
    "                                   ContentType='application/x-image', \n",
    "                                   Body=kitten_image)\n",
    "    results.append((time.time() - start) * 1000)\n",
    "print(\"\\nPredictions for TF2 serving REST API: \\n\")\n",
    "print('\\nP95: ' + str(np.percentile(results, 95)) + ' ms\\n')    \n",
    "print('P90: ' + str(np.percentile(results, 90)) + ' ms\\n')\n",
    "print('Average: ' + str(np.average(results)) + ' ms\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Custom inference script for preprocessing and gRPC communication with TensorFlow serving"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "update_endpoint is a no-op in sagemaker>=2.\n",
      "See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-----------------!"
     ]
    }
   ],
   "source": [
    "from sagemaker.tensorflow.serving import TensorFlowModel\n",
    "\n",
    "model2 = TensorFlowModel(source_dir='code',entry_point='inference.py',model_data=model_data, role=sagemaker_role, framework_version='2.4.1', env = {'PREDICT_USING_GRPC' : 'true'})\n",
    "#'ml.g4dn.xlarge'\n",
    "predictor2 = model2.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "kitten_image = open('kitten.jpg', 'rb').read()\n",
    "endpoint_name = predictor2.endpoint_name\n",
    "runtime_client = boto3.client('runtime.sagemaker')\n",
    "response = runtime_client.invoke_endpoint(EndpointName=endpoint_name, \n",
    "                                   ContentType='application/x-image', \n",
    "                                   Body=kitten_image)\n",
    "result = response['Body'].read().decode('ascii')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Predictions for TF2 serving with gRPC : \n",
      "\n",
      "\n",
      "P95: 76.57002210617065 ms\n",
      "\n",
      "P90: 74.57253932952881 ms\n",
      "\n",
      "P50: 58.59267711639404 ms\n",
      "\n",
      "Average: 58.59267711639404 ms\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "results = []\n",
    "for i in range(1,1000):\n",
    "    start = time.time()\n",
    "    kitten_image = open('kitten.jpg', 'rb').read()\n",
    "    response = runtime_client.invoke_endpoint(EndpointName=endpoint_name, \n",
    "                                   ContentType='application/x-image', \n",
    "                                   Body=kitten_image)\n",
    "    results.append((time.time() - start) * 1000)\n",
    "print(\"\\nPredictions for TF2 serving with gRPC : \\n\")\n",
    "print('\\nP95: ' + str(np.percentile(results, 95)) + ' ms\\n')    \n",
    "print('P90: ' + str(np.percentile(results, 90)) + ' ms\\n')\n",
    "print('Average: ' + str(np.average(results)) + ' ms\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Additional Information\n",
    "\n",
    "The TensorFlow Serving Container supports additional features not covered in this notebook, including support for:\n",
    "\n",
    "- TensorFlow Serving REST API requests:classify and regress requests\n",
    "- CSV input\n",
    "- Other JSON formats\n",
    "\n",
    "For information on how to use these features, refer to the documentation in the \n",
    "[SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst).\n",
    "\n",
    "## Cleaning up\n",
    "\n",
    "To avoid incurring charges to your AWS account for the resources used in this tutorial, you need to delete the SageMaker Endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "range(1, 100)"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "range(1,100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "instance_type": "ml.m5.large",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}