{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy Models for Inference\n", "\n", "1. [Introduction](#Introduction)\n", "2. [Prerequisites](#Prerequisites)\n", "3. [Setup](#Setup)\n", "4. [Deploy pretrained model to SageMaker Endpoint](#Deploy-pretrained-model-to-SageMaker-Endpoint)\n", " 1. [Model Config](#Model-Config)\n", " 2. [Option 1: Serverless Inference](#Option-1:-Serverless-Inference)\n", " 1. [Serverless Inference Deploy Config](#Serverless-Inference-Deploy-Config)\n", " 2. [Serverless Inference Deployment](#Serverless-Inference-Deployment)\n", " 3. [Option 2: Real-time Inference](#Option-2:-Real-time-Inference)\n", " 1. [Real-time Inference Deployment](#Real-time-Inference-Deployment)\n", "5. [Run Inference on Deployed Endpoint](#Run-Inference-on-Deployed-Endpoint)\n", " 1. [Create Predictor from Inference Endpoint](#Create-Predictor-from-Inference-Endpoint)\n", " 2. [Get trained Classes Info](#Get-trained-Classes-Info)\n", " 3. [Download sample test images from S3 for inference](#Download-sample-test-images-from-S3-for-inference)\n", " 4. [Get Predictions from local images](#Get-Predictions-from-local-images)\n", "6. [Clean Up](#Clean-Up)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "After you build and train your models, you can deploy them to get predictions. SageMaker supports multiple deployment types for customers to choose from, based on the requirements, like Real-time inference, Serverless inference, Asynchronous inference, and Batch transform. To learn more about deploying models for inference using SageMaker refer [here](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html). \n", "\n", "** Note: This Notebook was tested on Data Science Kernel for SageMaker Studio**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "To run this notebook, you can simply execute each cell in order. To understand what's happening, you'll need:\n", "\n", "- Prepare our model for deployment (either use the model from the previous modules or run the optional-prepare-data-and-model.ipynb notebook to create a new model)\n", "- Familiarity with Python and numpy\n", "- Basic familiarity with AWS S3.\n", "- Basic understanding of AWS Sagemaker.\n", "- Basic familiarity with AWS Command Line Interface (CLI) -- ideally, you should have it set up with credentials to access the AWS account you're running this notebook from.\n", "- SageMaker Studio is preferred for the full UI integration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "Setting up the environment, load the libraries, and define the parameters for the entire notebook.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker import get_execution_role\n", "import boto3\n", "import botocore\n", "import json\n", "\n", "role = get_execution_role()\n", "sess = sagemaker.Session()\n", "region = boto3.Session().region_name\n", "\n", "# Feel free to specify different bucket/folders here if you wish.\n", "bucket = sess.default_bucket()\n", "prefix = \"BIRD-Sagemaker-Deployment\"\n", "\n", "TF_FRAMEWORK_VERSION = '2.4.1'\n", "ENDPOINT_INSTANCE_TYPE = 'ml.c5.4xlarge'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy pretrained model to SageMaker Endpoint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Config\n", "\n", "You can use the bird model use created from the previous modules or run the `optional-prepare-data-and-model.ipynb` notebook to create a new model. Update the path to your model below as necessary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# INPUT REQUIRED - Please enter the S3 URI of the model artifact\n", "# eg: s3://sagemaker-us-east-1-xxxxxxxx/BIRD-Sagemaker-Deployment/BIRD-Sagemaker-Deployment-2022-07-27-03-33-54-592/output/model.tar.gz\n", "bird_model_path = ''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Choose the deployment option for Inference\n", "\n", "Run either Option 1 or Option 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Option 1: Serverless Inference\n", "\n", "Amazon SageMaker Serverless Inference is a new inference option that enables you to easily deploy machine learning models for inference without having to configure or manage the underlying infrastructure. Simply select the serverless option when deploying your machine learning model, and Amazon SageMaker automatically provisions, scales, and turns off compute capacity based on the volume of inference requests. With SageMaker Serverless Inference, you pay only for the duration of running the inference code and the amount of data processed, not for idle time. For more information on how Serverless Inference works visit [here](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Serverless Inference Deploy Config\n", "\n", "This object specifies configuration related to serverless endpoint. Use this configuration when trying to create serverless endpoint and make serverless inference\n", "\n", "Initialize a ServerlessInferenceConfig object for serverless inference configuration.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker.serverless as Serverless\n", "\n", "serverless_inf_config = Serverless.ServerlessInferenceConfig(memory_size_in_mb=4096, max_concurrency=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Serverless Inference Deployment\n", "\n", "Deploy the Model to a serverless endpoint and return a Predictor object to make serverless inference" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.tensorflow import TensorFlowModel\n", "\n", "model = TensorFlowModel(\n", " model_data=bird_model_path, \n", " role=role,\n", " framework_version=TF_FRAMEWORK_VERSION)\n", "\n", "\n", "predictor = model.deploy(serverless_inference_config=serverless_inf_config)\n", "tf_endpoint_name = str(predictor.endpoint_name)\n", "print(f\"Endpoint [{predictor.endpoint_name}] deployed\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The endpoint name will be displayed in the previous cell output when it's active and can also be seen under the SageMaker Resources option which is on the left side bar of the Studio as well. You will need it in the next section to create predictor from Inference endpoint\n", "\n", "![Active Endpoint](statics/active-sagemaker-endpoints.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Option 2: Real-time Inference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Real-time Inference Deployment\n", "\n", "Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support autoscaling.\n", "\n", "Deploy the Model to a real-time endpoint and return a Predictor object to make inference" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.tensorflow import TensorFlowModel\n", "\n", "model = TensorFlowModel(\n", " model_data=bird_model_path,\n", " role=role,\n", " framework_version=TF_FRAMEWORK_VERSION)\n", "\n", "\n", "predictor = model.deploy(initial_instance_count=1, instance_type=ENDPOINT_INSTANCE_TYPE)\n", "tf_endpoint_name = str(predictor.endpoint_name)\n", "print(f\"Endpoint [{predictor.endpoint_name}] deployed\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The endpoint name will be displayed in the previous cell output when it's active and can also be seen under the SageMaker Resources option which is on the left side bar of the Studio as well. You will need it in the next section to create predictor from Inference endpoint\n", "\n", "![Active Endpoint](statics/active-sagemaker-endpoints.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run Inference on Deployed Endpoint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create Predictor from Inference Endpoint\n", "\n", "After the deployment is complete in the above step, capture the endpoint name from SageMaker console and input below in the Predictor config. We could have reused the predictor from above step that is returned after deploy is complete, but this section shows how you can create a predictor from an existing endpoint for inference." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker import Predictor\n", "from sagemaker.serializers import IdentitySerializer\n", "from sagemaker.deserializers import JSONDeserializer\n", "\n", "serializer = IdentitySerializer(content_type=\"application/x-image\")\n", "deserializer = JSONDeserializer(accept='application/json')\n", "\n", "predictor = Predictor(endpoint_name=predictor.endpoint_name,serializer = serializer,deserializer = deserializer )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get trained Classes Info" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import cv_utils\n", "\n", "classes_file = f\"s3://{bucket}/{prefix}/full/data/classes.txt\"\n", "classes = [13, 17, 35, 36, 47, 68, 73, 87]\n", "\n", "possible_classes= cv_utils.get_classes_as_list(classes_file,classes)\n", "\n", "possible_classes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download sample test images from S3 for inference\n", "\n", "This cell downloads a random number of images (specified by value of 'n') from 'test' data set and use them for running inferences using our model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sample_images = cv_utils.get_n_random_images(bucket,prefix=f'{prefix}/outputs/test',n=2)\n", "\n", "local_paths = cv_utils.download_images_locally(bucket,sample_images)\n", "print(local_paths)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get Predictions from local images" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for inputfile in local_paths:\n", " print(inputfile)\n", " cv_utils.predict_bird_from_file(inputfile,predictor,possible_classes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }