{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy a Trained TensorFlow V2 Model\n",
"\n",
"In this notebook, we walk through the process of deploying a trained model to a SageMaker endpoint. If you recently ran [the notebook for training](get_started_mnist_train.ipynb) with the `%store` magic, the model artifact location (`tf_mnist_model_data`) can be restored. Otherwise, we retrieve the \n",
"model artifact from a public S3 bucket."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Setup\n",
"\n",
"import os\n",
"import json\n",
"\n",
"import sagemaker\n",
"from sagemaker.tensorflow import TensorFlowModel\n",
"from sagemaker import get_execution_role, Session\n",
"import boto3\n",
"\n",
"# Get global config\n",
"with open(\"code/config.json\", \"r\") as f:\n",
" CONFIG = json.load(f)\n",
"\n",
"sess = Session()\n",
"role = get_execution_role()\n",
"\n",
"# Restore the model artifact URI saved by the training notebook, if any\n",
"%store -r tf_mnist_model_data\n",
"\n",
"\n",
"try:\n",
" tf_mnist_model_data\n",
"except NameError:\n",
" # copy a pretrained model from a public bucket to your default bucket\n",
" s3 = boto3.client(\"s3\")\n",
" bucket = CONFIG[\"public_bucket\"]\n",
" key = \"datasets/image/MNIST/model/tensorflow-training-2020-11-20-23-57-13-077/model.tar.gz\"\n",
" s3.download_file(bucket, key, \"model.tar.gz\")\n",
" tf_mnist_model_data = sess.upload_data(\n",
" path=\"model.tar.gz\", bucket=sess.default_bucket(), key_prefix=\"model/tensorflow\"\n",
" )\n",
" os.remove(\"model.tar.gz\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"s3://sagemaker-ap-southeast-2-431579215499/tensorflow/mnist/tensorflow-training-2022-06-06-03-53-29-103/output/model.tar.gz\n"
]
}
],
"source": [
"print(tf_mnist_model_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TensorFlow Model Object\n",
"\n",
"The `TensorFlowModel` class allows you to define an environment for running inference with your\n",
"model artifact. Like the `TensorFlow` estimator class we discussed \n",
"[in the notebook for training a TensorFlow model](get_started_mnist_train.ipynb), it is a high-level API used to set up a Docker image for your model hosting service.\n",
"\n",
"Once it is properly configured, it can be used to create a SageMaker\n",
"endpoint on an EC2 instance. The SageMaker endpoint is a containerized environment that uses your trained model \n",
"to run inference on incoming data via RESTful API calls. \n",
"\n",
"Some common parameters used to instantiate the `TensorFlowModel` class are:\n",
"- role: an IAM role used to make AWS service requests\n",
"- model_data: the S3 URI of the compressed model artifact. It can be a path to a local file if the endpoint \n",
"is to be deployed on the SageMaker instance you are using to run this notebook (local mode)\n",
"- framework_version: version of the TensorFlow package to be used\n",
"- py_version: Python version to be used"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"model = TensorFlowModel(\n",
" role=role,\n",
" model_data=tf_mnist_model_data,\n",
" framework_version=\"2.3.1\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Execute the Inference Container\n",
"Once the `TensorFlowModel` object is instantiated, we can call its `deploy` method to run the container for the hosting\n",
"service. Some common parameters of the `deploy` method are:\n",
"\n",
"- initial_instance_count: the number of SageMaker instances used to run the hosting service.\n",
"- instance_type: the type of SageMaker instance that runs the hosting service. Set it to `local` if you want to run the hosting service on the local SageMaker instance. Local mode is typically used for debugging. \n",
"\n",
" Note: local mode is not supported in SageMaker Studio. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"update_endpoint is a no-op in sagemaker>=2.\n",
"See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"----!"
]
}
],
"source": [
"from sagemaker.serializers import JSONSerializer\n",
"from sagemaker.deserializers import JSONDeserializer\n",
"\n",
"# set local_mode to False if you want to deploy on a remote\n",
"# SageMaker instance\n",
"\n",
"local_mode = False\n",
"\n",
"if local_mode:\n",
" instance_type = \"local\"\n",
"else:\n",
" instance_type = \"ml.c4.xlarge\"\n",
"\n",
"predictor = model.deploy(\n",
" initial_instance_count=1,\n",
" instance_type=instance_type,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making Predictions Against a SageMaker Endpoint\n",
"\n",
"Once you have the `Predictor` instance returned by `model.deploy(...)`, you can send prediction requests to your endpoint. In this case, the model accepts normalized \n",
"batches of images in depth-minor (channels-last) convention, that is, arrays of shape `(batch, height, width, channel)`. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'predictions': [[-3.07496858, -1.92263961, 0.40784055, 2.68889284, -4.51269293, 1.26628399, -0.371436179, -1.60043943, -1.73814678, -1.44925392], [-2.78310299, -2.50181174, 0.196516275, 2.38543034, -4.89259768, 1.30999255, -0.287920415, -1.23828626, -1.47772789, -2.17345548], [-2.62833285, -2.2909236, 0.589692771, 2.63945913, -4.41077375, 0.810409307, -0.695943296, -1.64216757, -1.58362, -1.63197041], [-2.6050365, -2.26944327, 0.382466316, 2.50133753, -4.92548084, 1.26441634, 0.0326526649, -1.47540545, -1.75987828, -1.9871254]]}\n"
]
}
],
"source": [
"# use some dummy inputs\n",
"import numpy as np\n",
"\n",
"dummy_inputs = {\"instances\": np.random.rand(4, 28, 28, 1)}\n",
"\n",
"res = predictor.predict(dummy_inputs)\n",
"print(res)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The formats of the input and output data correspond directly to the request and response\n",
"formats of the `Predict` method in the [TensorFlow Serving REST API](https://www.tensorflow.org/tfx/serving/api_rest). For example, the key of the array to be \n",
"passed to the model in `dummy_inputs` must be called `instances`. Moreover, the input data must have a batch dimension. "
]
},
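{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of that request format, the sketch below builds a JSON body by hand, assuming the endpoint accepts the TensorFlow Serving row format. The names `image`, `batch`, `payload`, and `decoded` are illustrative only, and nothing is sent to the endpoint; the `Predictor` normally takes care of serialization for you.\n",
"\n",
"```python\n",
"import json\n",
"\n",
"import numpy as np\n",
"\n",
"# Hypothetical single image: height x width x channel, no batch dimension yet\n",
"image = np.random.rand(28, 28, 1)\n",
"\n",
"# Add a leading batch dimension so the shape becomes (1, 28, 28, 1)\n",
"batch = np.expand_dims(image, axis=0)\n",
"\n",
"# The row format expects the whole batch under the \"instances\" key\n",
"payload = json.dumps({\"instances\": batch.tolist()})\n",
"\n",
"# Round trip to inspect the structure a server would receive\n",
"decoded = json.loads(payload)\n",
"print(list(decoded.keys()), np.array(decoded[\"instances\"]).shape)\n",
"```\n",
"\n",
"The round trip is only there to make the nesting visible: one top-level key, and a leading axis whose length is the batch size."
]
},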
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment the following lines to see an example that cannot be processed by the endpoint\n",
"# (the input lacks a batch dimension)\n",
"\n",
"# dummy_data = {\n",
"# 'instances': np.random.rand(28, 28, 1).tolist()\n",
"# }\n",
"# print(predictor.predict(dummy_data))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's use real MNIST test data to test the endpoint. We use helper functions defined in `utils.mnist` to \n",
"download the MNIST data set and normalize the input data."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from utils.mnist import mnist_to_numpy, normalize\n",
"import random\n",
"import matplotlib.pyplot as plt\n",
"\n",
"%matplotlib inline\n",
"\n",
"data_dir = \"/tmp/data\"\n",
"X, _ = mnist_to_numpy(data_dir, train=False)\n",
"\n",
"# randomly sample 16 images to inspect\n",
"mask = random.sample(range(X.shape[0]), 16)\n",
"samples = X[mask]\n",
"\n",
"# plot the images\n",
"fig, axs = plt.subplots(nrows=1, ncols=16, figsize=(16, 1))\n",
"\n",
"for i, splt in enumerate(axs):\n",
" splt.imshow(samples[i])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the model accepts normalized input, you need to normalize the samples before \n",
"sending them to the endpoint. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"samples = normalize(samples, axis=(1, 2))\n",
"predictions = predictor.predict(np.expand_dims(samples, 3))[\"predictions\"] # add channel dim\n",
"\n",
"# take the argmax of the raw scores to get the predicted digit\n",
"predictions = np.array(predictions, dtype=np.float32)\n",
"predictions = np.argmax(predictions, axis=1)"
]
},
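{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why an `argmax` over the raw scores is enough to pick a label, here is a small offline sketch. It assumes the endpoint returns unnormalized scores (logits), which is what the dummy-input response printed earlier in this notebook suggests; the two rows are copied from that output, and `logits`, `probs`, and `labels` are illustrative names.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Two rows of raw scores copied from the dummy-input response above\n",
"logits = np.array(\n",
"    [\n",
"        [-3.07496858, -1.92263961, 0.40784055, 2.68889284, -4.51269293,\n",
"         1.26628399, -0.371436179, -1.60043943, -1.73814678, -1.44925392],\n",
"        [-2.78310299, -2.50181174, 0.196516275, 2.38543034, -4.89259768,\n",
"         1.30999255, -0.287920415, -1.23828626, -1.47772789, -2.17345548],\n",
"    ],\n",
"    dtype=np.float32,\n",
")\n",
"\n",
"# Numerically stable softmax: subtract each row's maximum before exponentiating\n",
"shifted = logits - logits.max(axis=1, keepdims=True)\n",
"probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)\n",
"\n",
"# Softmax preserves ordering, so argmax over probabilities matches argmax over logits\n",
"labels = probs.argmax(axis=1)\n",
"print(labels.tolist())\n",
"```\n",
"\n",
"Because softmax does not change which entry is largest, the softmax step can be skipped when only the predicted class is needed; it is useful mainly when you also want a confidence value."
]
},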
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predictions: [8, 2, 2, 3, 3, 8, 4, 8, 9, 0, 1, 8, 3, 6, 1, 9]\n"
]
}
],
"source": [
"print(\"Predictions: \", predictions.tolist())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## (Optional) Clean up \n",
"\n",
"If you do not plan to use the endpoint, you should delete it to free up compute \n",
"resources. If you use local mode, you need to manually delete the Docker container bound\n",
"to port 8080 (the port that listens for incoming requests).\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"if not local_mode:\n",
" predictor.delete_endpoint()\n",
"else:\n",
" os.system(\"docker container ls | grep 8080 | awk '{print $1}' | xargs docker container rm -f\")"
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}