{ "cells": [ { "cell_type": "markdown", "id": "6d6ee543", "metadata": {}, "source": [ "# Implement a SageMaker Multi-Model Endpoint for TensorFlow Vision models on a Triton Server from NVIDIA" ] }, { "cell_type": "markdown", "id": "36910094", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "7634d547", "metadata": {}, "source": [ "Amazon SageMaker Multi-Model Endpoint (MME) is a cost-effective way of running multiple models behind a single endpoint. SageMaker manages the process of loading the target model into memory when needed, which leads to better utilization of the container resources and reduces cost. \n", "\n", "Multi-model endpoints are ideal when you have infrequently used models that can handle minor delays introduced by an occasional cold start. \n", "\n", "NVIDIA Triton Inference Server is an open source software that provides high performance inference on a wide variety of CPU and GPU hardware and supports all the major ML frameworks. It has many built-in features to improve inference throughput and achieves better utilization of the resources. \n", "\n", "Now the NVIDIA Triton Inference Server can be deployed on GPU based SageMaker ML instances. It supports the SageMaker MME API to for dynamic loading and unloading of models for implementing SageMaker multi-model endpoints. \n", "\n", "This notebook shows how to deploy multiple TensorFlow models trained on the MNIST dataset to a SageMaker MME using the NVIDIA Triton Server.\n", "\n", "Here we use two different instances of an existing model artifact. The model used here was pre-trained on the MNIST dataset. If you want to learn how to train the model, please See [TensorFlow script mode training and serving](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/tensorflow_script_mode_training_and_serving/tensorflow_script_mode_training_and_serving.ipynb). \n", "\n", "## Contents\n", "1. [Introduction to NVIDIA Triton Server](#section1)\n", "1. [Set up the environment](#section2)\n", "1. [Transform TensorFlow Model structure](#section3)\n", " 1. [Inspect the model using a CLI command](#section3a)\n", " 1. [Create the model configuration file](#section3b)\n", " 1. [Create the tar ball in the required Triton structure](#section3c)\n", " 1. [Upload model artifact to S3](#section3d)\n", " 1. [Create additional instances of the model in S3 for testing MME](#section3e)\n", "1. [Deploy model to SageMaker Triton Server MME](#section4)\n", "1. [Test the SageMaker Triton Server MME](#section5)\n", "1. [Clean up](#section6)" ] }, { "cell_type": "markdown", "id": "51ab88ee", "metadata": {}, "source": [ "\n", "\n", "## Introduction to NVIDIA Triton Server\n", "\n", "[NVIDIA Triton Inference Server](https://github.com/triton-inference-server/server/) was developed specifically to enable scalable, cost-effective, and easy deployment of models in production. 
NVIDIA Triton Inference Server is open-source inference serving software that simplifies the inference serving process and provides high inference performance.\n", "\n", "Some key features of Triton are:\n", "* **Support for multiple frameworks**: Triton can be used to deploy models from all major frameworks. Triton supports TensorFlow, ONNX, PyTorch, and many other model formats. \n", "* **Model pipelines**: A Triton model ensemble represents a pipeline of one or more models or pre-/post-processing logic and the connection of input and output tensors between them. A single inference request to an ensemble triggers the execution of the entire pipeline.\n", "* **Concurrent model execution**: Multiple models (or multiple instances of the same model) can run simultaneously on the same GPU or on multiple GPUs for different model management needs.\n", "* **Dynamic batching**: For models that support batching, Triton has multiple built-in scheduling and batching algorithms that combine individual inference requests together to improve inference throughput. These scheduling and batching decisions are transparent to the client requesting inference.\n", "* **Support for diverse CPUs and GPUs**: Models can be executed on CPUs or GPUs for maximum flexibility and to support heterogeneous computing requirements.\n", "\n" ] }, { "cell_type": "markdown", "id": "5cf5f5fc", "metadata": {}, "source": [ "<a id=\"section2\"></a>\n", "\n", "## Set up the environment\n", "\n", "This notebook uses the Python 3 (Data Science) kernel. \n", "\n" ] }, { "cell_type": "markdown", "id": "c10317c2-8d0d-4f2e-9dd4-f1c9f047da81", "metadata": { "tags": [] }, "source": [ "#### Install TensorFlow. This notebook is tested with version 2.11." ] }, { "cell_type": "code", "execution_count": null, "id": "947d6ae3-62de-4a7b-ba5d-2fe0ddcae9d8", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "import sys\n", "\n", "!{sys.executable} -m pip install \"tensorflow>=2.11,<2.12\"" ] }, { "cell_type": "code", "execution_count": null, "id": "009da43c-5a3a-49ab-880c-716fbf895d9f", "metadata": { "tags": [] }, "outputs": [], "source": [ "# imports\n", "\n", "import gzip\n", "import json\n", "import time\n", "\n", "import boto3\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import sagemaker" ] }, { "cell_type": "markdown", "id": "421ba117-dc81-4afb-811e-e06de60baf2d", "metadata": { "tags": [] }, "source": [ "#### For this exercise we download a TensorFlow model pre-trained on the MNIST dataset from an Amazon S3 bucket. The model artifact is saved locally." ] }, { "cell_type": "code", "execution_count": null, "id": "bc94f271-62e2-4e7d-8c36-43dee13a3bb6", "metadata": { "tags": [] }, "outputs": [], "source": [ "!mkdir -p models/SavedModel/\n", "s3 = boto3.client(\"s3\")\n", "s3.download_file(\n", "    f\"sagemaker-example-files-prod-{boto3.session.Session().region_name}\",\n", "    \"datasets/image/MNIST/model/tensorflow-training-2020-11-20-23-57-13-077/model.tar.gz\",\n", "    \"models/SavedModel/model.tar.gz\",\n", ")" ] }, { "cell_type": "markdown", "id": "e6029992-a7bc-410f-9fb4-b1d277830f14", "metadata": { "tags": [] }, "source": [ "#### You should have already configured the default IAM role for running this notebook with access to the model artifacts and the NVIDIA Triton Server image in Amazon Elastic Container Registry (ECR)."
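, "\n", "\n", "As an optional sanity check (a minimal sketch, not required for the walkthrough), the next cell prints the identity this notebook runs as using AWS STS. It does not verify individual S3 or ECR permissions; if the role lacks them, later cells will fail with `AccessDenied` errors." ] }, { "cell_type": "code", "execution_count": null, "id": "a0b1c2d3", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Optional sanity check (assumes boto3 credentials are already configured).\n", "# Prints the ARN of the identity executing this notebook.\n", "import boto3\n", "\n", "print(boto3.client(\"sts\").get_caller_identity()[\"Arn\"])"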
] }, { "cell_type": "code", "execution_count": null, "id": "7b779b59-cde6-4b85-9449-22aa64a0e291", "metadata": { "tags": [] }, "outputs": [], "source": [ "sm_session = sagemaker.Session()\n", "role = sagemaker.get_execution_role()\n", "bucket_name = sm_session.default_bucket()\n", "region = boto3.Session().region_name\n", "\n", "print(f\"Default IAM Role: {role}\")\n", "print(f\"Default S3 Bucket: {bucket_name}\")\n", "print(f\"AWS Region: {region}\")" ] }, { "cell_type": "markdown", "id": "ffbaab37-f9f7-46e4-8c6e-47c72492d12e", "metadata": {}, "source": [ "#### Download the Triton Server image from Amazon ECR." ] }, { "cell_type": "code", "execution_count": null, "id": "218bf56e-9549-469e-b76a-60b280cdd7c9", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Amazon ECR images are region specific\n", "triton_server_version = \"23.02\"\n", "\n", "account_id_map = {\n", " \"us-east-1\": \"785573368785\",\n", " \"us-east-2\": \"007439368137\",\n", " \"us-west-1\": \"710691900526\",\n", " \"us-west-2\": \"301217895009\",\n", " \"eu-west-1\": \"802834080501\",\n", " \"eu-west-2\": \"205493899709\",\n", " \"eu-west-3\": \"254080097072\",\n", " \"eu-north-1\": \"601324751636\",\n", " \"eu-south-1\": \"966458181534\",\n", " \"eu-central-1\": \"746233611703\",\n", " \"ap-east-1\": \"110948597952\",\n", " \"ap-south-1\": \"763008648453\",\n", " \"ap-northeast-1\": \"941853720454\",\n", " \"ap-northeast-2\": \"151534178276\",\n", " \"ap-southeast-1\": \"324986816169\",\n", " \"ap-southeast-2\": \"355873309152\",\n", " \"cn-northwest-1\": \"474822919863\",\n", " \"cn-north-1\": \"472730292857\",\n", " \"sa-east-1\": \"756306329178\",\n", " \"ca-central-1\": \"464438896020\",\n", " \"me-south-1\": \"836785723513\",\n", " \"af-south-1\": \"774647643957\",\n", "}\n", "\n", "if region not in account_id_map.keys():\n", " raise (\"UNSUPPORTED REGION\")\n", "\n", "base = \"amazonaws.com.cn\" if region.startswith(\"cn-\") else \"amazonaws.com\"\n", "\n", "mme_triton_image_uri = \"{account_id}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:{triton_server_version}-py3\".format(\n", " account_id=account_id_map[region],\n", " region=region,\n", " base=base,\n", " triton_server_version=triton_server_version,\n", ")\n", "\n", "print(f\"Triton server image: {mme_triton_image_uri}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "39414be7", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Extract the model into a local folder\n", "\n", "!tar -xf models/SavedModel/model.tar.gz -C models/SavedModel/ --no-same-owner" ] }, { "cell_type": "markdown", "id": "a1e5faca", "metadata": {}, "source": [ "\n", "\n", "## Transform TensorFlow Model structure\n", "\n", "\n", "The model that we want to deploy currently has the following structure:\n", "\n", "```\n", "00000000\n", " ├── saved_model.pb\n", " ├── assets/\n", " └── variables/\n", " ├── variables.data-00000-of-00001\n", " └── variables.index\n", "```\n", "For Triton, the model needs to have the following structure:\n", "```\n", "\n", "├── config.pbtxt\n", "└── 1\n", " └── model.savedmodel\n", " ├── saved_model.pb\n", " ├── assets/\n", " └── variables/\n", " ├── variables.data-00000-of-00001\n", " └── variables.index\n", " \n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "392b33db", "metadata": { "tags": [] }, "outputs": [], "source": [ "prefix = \"triton-mme\"\n", "\n", "! mkdir -p models/$prefix/MNIST1/1\n", "! 
cp -r models/SavedModel/00000000 ./models/$prefix/MNIST1/1/model.savedmodel/" ] }, { "cell_type": "markdown", "id": "4c21b8be", "metadata": {}, "source": [ "<a id=\"section3a\"></a>\n", "\n", "### Inspect the model using a CLI command\n", "\n", "To create the `config.pbtxt` we need to confirm the model's inputs and outputs (its signature).\n", "We use a CLI command to inspect the model and take note of the input and output shapes." ] }, { "cell_type": "code", "execution_count": null, "id": "42b58467", "metadata": { "tags": [] }, "outputs": [], "source": [ "!saved_model_cli show --all --dir models/SavedModel/00000000" ] }, { "cell_type": "markdown", "id": "1332701d", "metadata": {}, "source": [ "<a id=\"section3b\"></a>\n", "\n", "### Create the `config.pbtxt` file\n", "\n", "Triton requires a [Model Configuration file](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md) called `config.pbtxt`. \n", "\n", "We create one below in the local folder for uploading with the model artifact. Because `max_batch_size` is 0, the batch dimension is carried explicitly as the leading -1 in the input and output dims.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "0f843f41", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%writefile models/triton-mme/MNIST1/config.pbtxt\n", "platform: \"tensorflow_savedmodel\"\n", "max_batch_size: 0\n", "\n", "instance_group {\n", "  count: 1\n", "  kind: KIND_GPU\n", "}\n", "\n", "input [\n", "  {\n", "    name: \"input_1\"\n", "    data_type: TYPE_FP32\n", "    dims: [-1, 28, 28, 1]\n", "  }\n", "]\n", "output [\n", "  {\n", "    name: \"output_1\"\n", "    data_type: TYPE_FP32\n", "    dims: [-1, 10]\n", "  }\n", "]" ] }, { "cell_type": "markdown", "id": "311b6185", "metadata": {}, "source": [ "<a id=\"section3c\"></a>\n", "\n", "### Create a tarball of the model in the required folder structure for the Triton Server" ] }, { "cell_type": "code", "execution_count": null, "id": "119fa6bc-0637-4986-a677-28f21be6d7a8", "metadata": { "tags": [] }, "outputs": [], "source": [ "!tar -C models/triton-mme -czvf models/triton-mme/TritonModel.tar.gz MNIST1/" ] }, { "cell_type": "markdown", "id": "a5169d20-1118-442a-8947-384b19a2e18d", "metadata": {}, "source": [ "<a id=\"section3d\"></a>\n", "\n", "### Upload the model artifact to S3" ] }, { "cell_type": "code", "execution_count": null, "id": "a38082c2-d928-4285-a39f-d597f97ba4ff", "metadata": { "tags": [] }, "outputs": [], "source": [ "model_file = \"TritonModel.tar.gz\"\n", "\n", "s3_client = boto3.client(\"s3\")\n", "\n", "# upload the first model to S3\n", "s3_client.upload_file(\n", "    Filename=f\"models/triton-mme/{model_file}\",\n", "    Bucket=bucket_name,\n", "    Key=f\"{prefix}/{model_file}\",\n", ")" ] }, { "cell_type": "markdown", "id": "68e7c26b-1c29-4f18-b7ae-bf77c41038e8", "metadata": {}, "source": [ "<a id=\"section3e\"></a>\n", "\n", "### Create additional instances of the model in S3 for inference using SageMaker MME\n", "For testing the MME we create two copies of the compressed model artifact under different names in the S3 folder. In the cell below, you can specify how many model instances to create."
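, "\n", "\n", "As a quick sanity check (an optional sketch, assuming the `bucket_name` and `prefix` variables defined above), the next cell lists the artifacts currently under the S3 prefix. Re-run it after the copy loop to see the copies as well." ] }, { "cell_type": "code", "execution_count": null, "id": "b1c2d3e4", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Optional: list the model artifacts under the S3 prefix.\n", "listing = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=f\"{prefix}/\")\n", "for obj in listing.get(\"Contents\", []):\n", "    print(obj[\"Key\"], obj[\"Size\"])"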
] }, { "cell_type": "code", "execution_count": null, "id": "8b64745d-0e0d-434c-abc4-d010e60ad7f7", "metadata": {}, "outputs": [], "source": [ "# Create additonal instances of the model\n", "tf_model_count = 2 # create 2 instances\n", "\n", "# Save the model names in an array\n", "tf_models_mnist = []\n", "\n", "# Make copies of the original model in S3\n", "for i in range(tf_model_count):\n", " tf_models_mnist.append(f\"TritonModel{i}.tar.gz\") # append model name to array\n", "\n", " response = s3_client.copy_object(\n", " CopySource=f\"{bucket_name}/{prefix}/{model_file}\",\n", " Bucket=f\"{bucket_name}\", # Destination bucket\n", " Key=f\"{prefix}/{tf_models_mnist[i]}\", # Destination path/filename\n", " )\n", " print(f\"Added model {tf_models_mnist[i]} in S3\")" ] }, { "cell_type": "markdown", "id": "f9fb51ac-b002-4195-80bb-a81663639310", "metadata": {}, "source": [ "\n", "## Deploy TensorFlow models to a Multi-Model Endpoint for Triton Server " ] }, { "cell_type": "markdown", "id": "4d0888d9-0105-4d3d-ae9f-013905604471", "metadata": {}, "source": [ "### Define the serving container\n", "\n", "In the container definition below, we need to pass in the following parameters.\n", "- Image: Triton server image URI that supports deploying multi-model endpoints with GPUs.\n", "- URI to S3 folder that contains all the models that SageMaker multi-model endpoint will use to load and serve predictions. \n", "- Mode: Set to MultiModel " ] }, { "cell_type": "code", "execution_count": null, "id": "b7fe881c-7eda-4dfe-b0e3-dbeb76e737ff", "metadata": { "tags": [] }, "outputs": [], "source": [ "model_data_url = f\"s3://{bucket_name}/{prefix}/\"\n", "\n", "container = {\"Image\": mme_triton_image_uri, \"ModelDataUrl\": model_data_url, \"Mode\": \"MultiModel\"}" ] }, { "cell_type": "markdown", "id": "9bd551fb-d2a1-4781-b669-0bdb95397862", "metadata": {}, "source": [ "### Create a model object using the container defined above\n", "\n", "Create the model object using the Boto3 create_model API. We pass the container definition to the create model API along with the model name and execution role." ] }, { "cell_type": "code", "execution_count": null, "id": "6f6d0ecf-8568-4e75-afa4-3a2e068126e9", "metadata": { "tags": [] }, "outputs": [], "source": [ "ts = time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())\n", "\n", "sm_model_name = f\"{prefix}-mdl-{ts}\"\n", "\n", "sm_client = boto3.client(service_name=\"sagemaker\")\n", "\n", "create_model_response = sm_client.create_model(\n", " ModelName=sm_model_name, ExecutionRoleArn=role, PrimaryContainer=container\n", ")\n", "\n", "print(\"Model Arn: \" + create_model_response[\"ModelArn\"])" ] }, { "cell_type": "markdown", "id": "e0f17525-ebcd-4f95-ac53-cb9b1890a964", "metadata": {}, "source": [ "## Deploy and test the Multi-Model endpoint\n", "\n", "Create a multi-model endpoint configurations using the create_endpoint_config Boto3 API. \n", "We specify an accelerated GPU computing instance as the instance type. For testing we specify a single instance. In real scenarios we recommend the value of initial instance count to be two or higher for high availability. 
" ] }, { "cell_type": "markdown", "id": "e1d3be28-d399-4b6b-939e-9d5b4d5ed3aa", "metadata": { "tags": [] }, "source": [ "### Create endpoint configuration" ] }, { "cell_type": "code", "execution_count": null, "id": "6eb8f2d6-39b2-4920-8726-1d9f5ca1bc62", "metadata": { "tags": [] }, "outputs": [], "source": [ "endpoint_config_name = f\"{prefix}-epc-{ts}\"\n", "\n", "create_endpoint_config_response = sm_client.create_endpoint_config(\n", " EndpointConfigName=endpoint_config_name,\n", " ProductionVariants=[\n", " {\n", " \"InstanceType\": \"ml.g4dn.xlarge\",\n", " \"InitialVariantWeight\": 1,\n", " \"InitialInstanceCount\": 1,\n", " \"ModelName\": sm_model_name,\n", " \"VariantName\": \"AllTraffic\",\n", " }\n", " ],\n", ")\n", "\n", "print(\"Endpoint Config Arn: \" + create_endpoint_config_response[\"EndpointConfigArn\"])" ] }, { "cell_type": "markdown", "id": "1b6acdf5-11e5-45ef-9b5b-51685ef517dc", "metadata": {}, "source": [ "### Create Multi-Model endpoint\n", "\n", "Using the above endpoint configuration we create a new SageMaker endpoint and wait for the deployment to finish. The status will change to *In Service* once the deployment is successful." ] }, { "cell_type": "code", "execution_count": null, "id": "3e245eb2-27f2-4dad-ae3e-4a5aac829baf", "metadata": { "tags": [] }, "outputs": [], "source": [ "endpoint_name = f\"{prefix}-ep-{ts}\"\n", "\n", "create_endpoint_response = sm_client.create_endpoint(\n", " EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n", ")\n", "\n", "print(\"Endpoint Arn: \" + create_endpoint_response[\"EndpointArn\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "0abf6d28-b8f0-46f5-9ff2-c2e797ce5859", "metadata": { "tags": [] }, "outputs": [], "source": [ "resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n", "status = resp[\"EndpointStatus\"]\n", "print(\"Status: \" + status)\n", "\n", "while status == \"Creating\":\n", " time.sleep(60)\n", " resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n", " status = resp[\"EndpointStatus\"]\n", " print(\"Status: \" + status)\n", "\n", "print(\"Arn: \" + resp[\"EndpointArn\"])\n", "print(\"Status: \" + status)" ] }, { "cell_type": "markdown", "id": "2d0a4507-6ef6-4505-8823-4906e17cface", "metadata": {}, "source": [ "\n", "## Invoke target models behind the Multi-Model endpoint\n", "\n", "Once the endpoint is successfully created, we can send inference requests to the multi-model endpoint using invoke_endpoint API. We specify the target model in the invocation call and pass in the payload for each model type." 
] }, { "cell_type": "markdown", "id": "591928b2-40df-41fa-918e-75bba1ebe487", "metadata": {}, "source": [ "### Let's download some test data" ] }, { "cell_type": "code", "execution_count": null, "id": "0f0f4903-ee28-4087-ac20-b06caf184160", "metadata": { "tags": [] }, "outputs": [], "source": [ "s3.download_file(\n", " f\"sagemaker-example-files-prod-{boto3.session.Session().region_name}\",\n", " \"datasets/image/MNIST/t10k-images-idx3-ubyte.gz\",\n", " \"t10k-images-idx3-ubyte.gz\",\n", ")\n", "s3.download_file(\n", " f\"sagemaker-example-files-prod-{boto3.session.Session().region_name}\",\n", " \"datasets/image/MNIST/t10k-labels-idx1-ubyte.gz\",\n", " \"t10k-labels-idx1-ubyte.gz\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "5ba7ba5f-c365-4aa5-a091-4e81d19b228b", "metadata": { "tags": [] }, "outputs": [], "source": [ "# open the images file and extract the first 10 images\n", "file = gzip.open(\"t10k-images-idx3-ubyte.gz\", \"r\")\n", "\n", "record_count = 10\n", "\n", "file.read(16) # skip first 16 bytes of metadata\n", "buf = file.read(28 * 28 * record_count)\n", "train_data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)\n", "train_data = train_data.reshape(record_count, 28, 28, 1)\n", "\n", "# open the labels file and extract the first 10 labels\n", "file = gzip.open(\"t10k-labels-idx1-ubyte.gz\", \"r\")\n", "\n", "train_labels = np.array([])\n", "file.read(8) # skip first 8 bytes of metadata\n", "for i in range(0, record_count):\n", " buf = file.read(1)\n", " # label = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)\n", " label = np.frombuffer(buf, dtype=np.int8)\n", " train_labels = np.append(train_labels, label)\n", "\n", "plt.imshow(np.asarray(train_data[0]).squeeze())\n", "plt.show()\n", "\n", "print(f\"Label: {train_labels[0]}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "827b29f1-2a6f-4a47-9c8a-755216375dd4", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "%%time\n", "\n", "runtime_sm_client = boto3.client(\"sagemaker-runtime\")\n", "\n", "# Run through all the models for inference twice\n", "# In the first invocation models will be loaded from memory\n", "# In subsequent invocations they should be found in the cache\n", "\n", "for iter in range(2):\n", " for tf_model in tf_models_mnist: # Invoke the models for inference\n", " print(f\"\\nModel invoked: {tf_model}\")\n", "\n", " for i in range(record_count):\n", " payload = {\n", " \"inputs\": [\n", " {\n", " \"name\": \"input_1\",\n", " \"shape\": [1, 28, 28, 1],\n", " \"datatype\": \"FP32\",\n", " \"data\": train_data[i].tolist(),\n", " }\n", " ]\n", " }\n", "\n", " response = runtime_sm_client.invoke_endpoint(\n", " EndpointName=endpoint_name,\n", " ContentType=\"application/octet-stream\",\n", " Body=json.dumps(payload),\n", " TargetModel=tf_model,\n", " )\n", "\n", " predictions = json.loads(response[\"Body\"].read())[\"outputs\"][0][\"data\"]\n", " predictions = np.array(predictions, dtype=np.float32)\n", " predictions = np.argmax(predictions)\n", " print(f\"Predicted value: {predictions},\\tActual value: {int(train_labels[i])}\")" ] }, { "cell_type": "markdown", "id": "f0f2a7d2", "metadata": {}, "source": [ "\n", "## Clean up\n", "We strongly recommend deleting the real-time endpoint created to stop incurring cost when finished with the example." 
] }, { "cell_type": "code", "execution_count": null, "id": "8f9e893f", "metadata": {}, "outputs": [], "source": [ "# sm_client.delete_endpoint(EndpointName=endpoint_name)\n", "# sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n", "# sm_client.delete_model(ModelName=sm_model_name)\n", "\n", "print(f\"Deleted Endpoint: {endpoint_name}\")\n", "print(f\"Deleted Endpoint Config: {endpoint_config_name}\")\n", "print(f\"Deleted Model: {sm_model_name}\")" ] }, { "cell_type": "markdown", "id": "0c385eec", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/inference|cv|realtime|Triton|multi-model|tensorflow-backend|triton-cv-mme-tensorflow-backend.ipynb)\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, 
"category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated 
computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": 
"ml.p4de.24xlarge", "vcpuNum": 96 } ], "instance_type": "ml.m5.large", "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }