{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c7b9f124",
   "metadata": {},
   "source": [
    "## Ensemble model inference with NVIDIA Triton Inference Server and NVIDIA DALI on Amazon SageMaker\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52d1c323",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
    "\n",
    "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07afd6f4",
   "metadata": {},
   "source": [
    "\n",
    "Deep learning applications are often complex, requiring multi-stage data loading and pre-processing pipelines. Optimizing these pre-processing steps are critical to achieve best performing inference workloads. In a computer vision application, pre-processing pipelines may include steps like image loading, cropping, image decoding, image resizing and other image augmentations. These data processing pipelines can be a bottleneck, limiting the performance and scalability of deep learning inference. Additionally, these pre-processing implementations can result in challenges like portability of inference workloads and code maintainability.\n",
    "\n",
    "In this notebook, we will deep dive into NVIDIA DALI pre-processing pipeline implementation for Inception V3 model. Pipeline implements image pre-processing steps like resize, decoder and crop. Serialize the pipeline and create a model configuration to be deployed with NVIDIA Triton Inference server. Finally, we deploy the Inception V3 model to an Amazon SageMaker real time endpoint using Triton Inference Deep Learning containers.\n",
    "\n",
    "### NVIDIA DALI\n",
    "\n",
    "The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provides a collection of highly optimized building blocks for loading and processing image, video and audio data. It can be used as a portable drop-in replacement for built in data loaders and data iterators in popular deep learning frameworks.\n",
    "\n",
    "DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline. Features such as prefetching, parallel execution, and batch processing are handled transparently for the user. Data processing pipelines implemented using DALI are portable because they can easily be retargeted to TensorFlow, PyTorch, MXNet and PaddlePaddle.\n",
    "\n",
    "<img src=\"images/dali.png\" alt=\"DALI\" width=\"700\" />\n",
    "\n",
    "\n",
    "\n",
    "#### Highlights\n",
    "\n",
    "- Easy integration with NVIDIA Triton Inference Serve\n",
    "\n",
    "- Multiple data formats support - RecordIO, TFRecord, COCO, JPEG etc\n",
    "\n",
    "- Portable across popular deep learning frameworks: TensorFlow, PyTorch, MXNet.\n",
    "\n",
    "- Supports CPU and GPU execution.\n",
    "\n",
    "- Scalable across multiple GPUs.\n",
    "\n",
    "- Flexible graphs let developers create custom pipelines.\n",
    "\n",
    "## Triton Model Ensembles\n",
    "\n",
    "Triton Inference Server greatly simplifies the deployment of AI models at scale in production. Triton Server comes with a convenient solution that simplifies building pre-processing and post-processing pipelines. Triton Server platform provides the ensemble scheduler, which is responsible for pipelining models participating in the inference process while ensuring efficiency and optimizing throughput. \n",
    "\n",
    "<img src=\"images/triton-ensemble.png\" alt=\"triton-ensemble\" width=\"500\" align=\"left\"/>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "727c0012",
   "metadata": {},
   "source": [
    "## Set up\n",
    "\n",
    "Install the dependencies required to package the model and run inferences using SageMaker Triton server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cf0162f",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU pip awscli boto3 sagemaker --quiet\n",
    "!pip install nvidia-pyindex --quiet\n",
    "!pip install tritonclient[http] --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2429f562",
   "metadata": {},
   "source": [
    "Execute the following command to install the latest DALI for specified CUDA version"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65cfbf80",
   "metadata": {},
   "source": [
    "### Note: We are installing NVIDIA DALI Cuda in the below step. You need to execute this notebook on a GPU based instance. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f97a014",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda110"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93cd6e9a",
   "metadata": {},
   "source": [
    "### Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28776121",
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3, json, sagemaker, time\n",
    "from sagemaker import get_execution_role\n",
    "import nvidia.dali as dali\n",
    "import nvidia.dali.types as types"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc23403c",
   "metadata": {},
   "source": [
    "### Variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72e3041e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# SageMaker varaibles\n",
    "sm_client = boto3.client(service_name=\"sagemaker\")\n",
    "runtime_sm_client = boto3.client(\"sagemaker-runtime\")\n",
    "sagemaker_session = sagemaker.Session(boto_session=boto3.Session())\n",
    "role = get_execution_role()\n",
    "\n",
    "# Other Variables\n",
    "instance_type = \"ml.g4dn.4xlarge\"\n",
    "sm_model_name = \"triton-tf-dali-ensemble-\" + time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())\n",
    "endpoint_config_name = \"triton-tf-dali-ensemble-\" + time.strftime(\n",
    "    \"%Y-%m-%d-%H-%M-%S\", time.gmtime()\n",
    ")\n",
    "endpoint_name = \"triton-tf-dali-ensemble-\" + time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "695222e0",
   "metadata": {},
   "source": [
    "## Download models and set up pre-processing pipeline with DALI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f65a493",
   "metadata": {},
   "source": [
    "Create directories to host DALI ensemble models into the model repository. The following example shows the model repository directory structure, containing a DALI preprocessing model, TensorFlow Inception v3 model, and the model ensemble "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36d5d485",
   "metadata": {},
   "source": [
    "<img src=\"images/model-repo.png\" alt=\"model-repo\" width=\"300\" align=\"left\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e9c872a",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p model_repository/inception_graphdef/1\n",
    "!mkdir -p model_repository/dali/1\n",
    "!mkdir -p model_repository/ensemble_dali_inception/1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39beb5c3",
   "metadata": {},
   "source": [
    "Next, we will download Inception V3 model, this is an image classification neural network model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4507387",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget -O /tmp/inception_v3_2016_08_28_frozen.pb.tar.gz \\\n",
    "     https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c34fbaa8",
   "metadata": {},
   "source": [
    "Place the downloaded Inception V3 model in model repository under `inception_graphdef` folder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9a9dce0",
   "metadata": {},
   "outputs": [],
   "source": [
    "!(cd /tmp && tar xzf inception_v3_2016_08_28_frozen.pb.tar.gz)\n",
    "!mv /tmp/inception_v3_2016_08_28_frozen.pb model_repository/inception_graphdef/1/model.graphdef"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66808fb7",
   "metadata": {},
   "source": [
    "Model configuration of ensemble model for image classification and dali pre-processing is shown below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e24a11fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile model_repository/ensemble_dali_inception/config.pbtxt\n",
    "name: \"ensemble_dali_inception\"\n",
    "platform: \"ensemble\"\n",
    "max_batch_size: 256\n",
    "input [\n",
    "  {\n",
    "    name: \"INPUT\"\n",
    "    data_type: TYPE_UINT8\n",
    "    dims: [ -1 ]\n",
    "  }\n",
    "]\n",
    "output [\n",
    "  {\n",
    "    name: \"OUTPUT\"\n",
    "    data_type: TYPE_FP32\n",
    "    dims: [ 1001 ]\n",
    "  }\n",
    "]\n",
    "ensemble_scheduling {\n",
    "  step [\n",
    "    {\n",
    "      model_name: \"dali\"\n",
    "      model_version: -1\n",
    "      input_map {\n",
    "        key: \"DALI_INPUT_0\"\n",
    "        value: \"INPUT\"\n",
    "      }\n",
    "      output_map {\n",
    "        key: \"DALI_OUTPUT_0\"\n",
    "        value: \"preprocessed_image\"\n",
    "      }\n",
    "    },\n",
    "    {\n",
    "      model_name: \"inception_graphdef\"\n",
    "      model_version: -1\n",
    "      input_map {\n",
    "        key: \"input\"\n",
    "        value: \"preprocessed_image\"\n",
    "      }\n",
    "      output_map {\n",
    "        key: \"InceptionV3/Predictions/Softmax\"\n",
    "        value: \"OUTPUT\"\n",
    "      }\n",
    "    }\n",
    "  ]\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80620d3e",
   "metadata": {},
   "source": [
    "Model configuration for dali backend"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7221dc8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile model_repository/dali/config.pbtxt\n",
    "name: \"dali\"\n",
    "backend: \"dali\"\n",
    "max_batch_size: 256\n",
    "input [\n",
    "  {\n",
    "    name: \"DALI_INPUT_0\"\n",
    "    data_type: TYPE_UINT8\n",
    "    dims: [ -1 ]\n",
    "  }\n",
    "]\n",
    "output [\n",
    "  {\n",
    "    name: \"DALI_OUTPUT_0\"\n",
    "    data_type: TYPE_FP32\n",
    "    dims: [ 299, 299, 3 ]\n",
    "  }\n",
    "]\n",
    "parameters: [\n",
    "  {\n",
    "    key: \"num_threads\"\n",
    "    value: { string_value: \"12\" }\n",
    "  }\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e47fed2e",
   "metadata": {},
   "source": [
    "Model configurations containing inception model graph definition"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8967564",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile model_repository/inception_graphdef/config.pbtxt\n",
    "name: \"inception_graphdef\"\n",
    "platform: \"tensorflow_graphdef\"\n",
    "max_batch_size: 256\n",
    "input [\n",
    "  {\n",
    "    name: \"input\"\n",
    "    data_type: TYPE_FP32\n",
    "    format: FORMAT_NHWC\n",
    "    dims: [ 299, 299, 3 ]\n",
    "  }\n",
    "]\n",
    "output [\n",
    "  {\n",
    "    name: \"InceptionV3/Predictions/Softmax\"\n",
    "    data_type: TYPE_FP32\n",
    "    dims: [ 1001 ]\n",
    "    label_filename: \"inception_labels.txt\"\n",
    "  }\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff02314f",
   "metadata": {},
   "source": [
    "We will copy the inception classification model labels to `inception_graphdef` directory in model repository. The labels file contain 1000 class labels of [ImageNet](https://image-net.org/download.php) classification dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd875e78",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p model_repository/inception_graphdef\n",
    "s3_client = boto3.client(\"s3\")\n",
    "s3_client.download_file(\n",
    "    f\"sagemaker-example-files-prod-{sagemaker_session.boto_region_name}\",\n",
    "    \"datasets/labels/inception_labels.txt\",\n",
    "    \"model_repository/inception_graphdef/inception_labels.txt\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "750412ab",
   "metadata": {},
   "source": [
    "### DALI Pipeline\n",
    "\n",
    "In DALI, any data processing task has a central object called Pipeline. Pipeline object is an instance of `nvidia.dali.Pipeline`. Pipeline encapsulates the data processing graph and the execution engine. You can define a DALI pipeline by implementing a function that uses DALI operators inside and decorating it with the `pipeline_def()` decorator. \n",
    "\n",
    "DALI pipelines are executed in stages. The stages correspond to the device parameter that can be specified for the operator, and are executed in following order:\n",
    "\n",
    "1. 'cpu' - operators that accept CPU inputs and produce CPU outputs.\n",
    "\n",
    "2. 'mixed' - operators that accept CPU inputs and produce GPU outputs, for example nvidia.dali.fn.decoders.image().\n",
    "\n",
    "3. 'gpu' - operators that accept GPU inputs and produce GPU outputs.\n",
    "\n",
    "#### Parameters\n",
    "1. batch_size - Maximum batch size of the pipeline\n",
    "2. num_threads - Number of CPU threads used by the pipeline\n",
    "3. device_id - Id of GPU used by the pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5bfbcb55",
   "metadata": {},
   "outputs": [],
   "source": [
    "@dali.pipeline_def(batch_size=3, num_threads=1, device_id=0)\n",
    "def pipe():\n",
    "    \"\"\"Create a pipeline which reads images and masks, decodes the images and returns them.\"\"\"\n",
    "    images = dali.fn.external_source(device=\"cpu\", name=\"DALI_INPUT_0\")\n",
    "    images = dali.fn.decoders.image(images, device=\"mixed\", output_type=types.RGB)\n",
    "    images = dali.fn.resize(\n",
    "        images, resize_x=299, resize_y=299\n",
    "    )  # resize image to the default 299x299 size\n",
    "    images = dali.fn.crop_mirror_normalize(\n",
    "        images,\n",
    "        dtype=types.FLOAT,\n",
    "        output_layout=\"HWC\",\n",
    "        crop=(299, 299),  # crop image to the default 299x299 size\n",
    "        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],  # crop a central region of the image\n",
    "        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],  # crop a central region of the image\n",
    "    )\n",
    "    return images"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ed2c0bf",
   "metadata": {},
   "source": [
    "Serialize the pipeline to a Protobuf string, `filename` is the File where serialized pipeline will be written"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e81f4674",
   "metadata": {},
   "outputs": [],
   "source": [
    "pipe().serialize(filename=\"model_repository/dali/1/model.dali\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "016cab14",
   "metadata": {},
   "source": [
    "## Get Triton Inference Server Container image"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31e61f7c",
   "metadata": {},
   "source": [
    "Now that we have set up the DALI pipelines, we will get the SageMaker Triton image from ECR and use it to deploy the Inception V3 model to Amazon SageMaker real time endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ffec6e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "account_id_map = {\n",
    "    \"us-east-1\": \"785573368785\",\n",
    "    \"us-east-2\": \"007439368137\",\n",
    "    \"us-west-1\": \"710691900526\",\n",
    "    \"us-west-2\": \"301217895009\",\n",
    "    \"eu-west-1\": \"802834080501\",\n",
    "    \"eu-west-2\": \"205493899709\",\n",
    "    \"eu-west-3\": \"254080097072\",\n",
    "    \"eu-north-1\": \"601324751636\",\n",
    "    \"eu-south-1\": \"966458181534\",\n",
    "    \"eu-central-1\": \"746233611703\",\n",
    "    \"ap-east-1\": \"110948597952\",\n",
    "    \"ap-south-1\": \"763008648453\",\n",
    "    \"ap-northeast-1\": \"941853720454\",\n",
    "    \"ap-northeast-2\": \"151534178276\",\n",
    "    \"ap-southeast-1\": \"324986816169\",\n",
    "    \"ap-southeast-2\": \"355873309152\",\n",
    "    \"cn-northwest-1\": \"474822919863\",\n",
    "    \"cn-north-1\": \"472730292857\",\n",
    "    \"sa-east-1\": \"756306329178\",\n",
    "    \"ca-central-1\": \"464438896020\",\n",
    "    \"me-south-1\": \"836785723513\",\n",
    "    \"af-south-1\": \"774647643957\",\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f12f62b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "region = boto3.Session().region_name\n",
    "if region not in account_id_map.keys():\n",
    "    raise (\"UNSUPPORTED REGION\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "929cdca9",
   "metadata": {},
   "outputs": [],
   "source": [
    "base = \"amazonaws.com.cn\" if region.startswith(\"cn-\") else \"amazonaws.com\"\n",
    "triton_image_uri = \"{account_id}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:21.08-py3\".format(\n",
    "    account_id=account_id_map[region], region=region, base=base\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55391e4f",
   "metadata": {},
   "source": [
    "Let's create the model artifact "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad987290",
   "metadata": {},
   "outputs": [],
   "source": [
    "!tar -cvzf model.tar.gz -C model_repository ."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "961e173a",
   "metadata": {},
   "source": [
    "Once the content of the model repository directory tar'd to `model.tar.gz` file, we will upload the model artifacts to model_uri S3 location"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0492cff",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_uri = sagemaker_session.upload_data(\n",
    "    path=\"model.tar.gz\", key_prefix=\"triton-serve-tf-dali-ensemble\"\n",
    ")\n",
    "print(model_uri)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ae6474b",
   "metadata": {},
   "source": [
    "### Create SageMaker Endpoint\n",
    "\n",
    "We start off by creating a [SageMaker model](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) from the model artifacts we uploaded to s3 in the previous step.\n",
    "\n",
    "In this step we also provide an additional Environment Variable i.e. `SAGEMAKER_TRITON_DEFAULT_MODEL_NAME` which specifies the name of the model to be loaded by Triton. **The value of this key should match the folder name in the model package uploaded to s3**. This variable is optional in case of a single model. In case of ensemble models, this key **has to be** specified for Triton to startup in SageMaker.\n",
    "\n",
    "Additionally, customers can set `SAGEMAKER_TRITON_BUFFER_MANAGER_THREAD_COUNT` and `SAGEMAKER_TRITON_THREAD_COUNT` for optimizing the thread counts.\n",
    "\n",
    "**Note**: The current release of Triton (21.08-py3) on SageMaker doesn't support running instances of different models on the same server, except in case of [ensembles](https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#ensemble-models). Only multiple model instances of the same model are supported, which can be specified under the [instance-groups](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#instance-groups) section of the config.pbtxt file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15a8b4a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "container = {\n",
    "    \"Image\": triton_image_uri,\n",
    "    \"ModelDataUrl\": model_uri,\n",
    "    \"Environment\": {\"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME\": \"ensemble_dali_inception\"},\n",
    "}\n",
    "\n",
    "create_model_response = sm_client.create_model(\n",
    "    ModelName=sm_model_name, ExecutionRoleArn=role, PrimaryContainer=container\n",
    ")\n",
    "\n",
    "model_arn = create_model_response[\"ModelArn\"]\n",
    "\n",
    "print(f\"Model Arn: {model_arn}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d49f1fb1",
   "metadata": {},
   "source": [
    "Using the model above, we create an [endpoint configuration](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) where we can specify the type and number of instances we want in the endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b19e8cdd",
   "metadata": {},
   "outputs": [],
   "source": [
    "create_endpoint_config_response = sm_client.create_endpoint_config(\n",
    "    EndpointConfigName=endpoint_config_name,\n",
    "    ProductionVariants=[\n",
    "        {\n",
    "            \"InstanceType\": instance_type,\n",
    "            \"InitialVariantWeight\": 1,\n",
    "            \"InitialInstanceCount\": 1,\n",
    "            \"ModelName\": sm_model_name,\n",
    "            \"VariantName\": \"AllTraffic\",\n",
    "        }\n",
    "    ],\n",
    ")\n",
    "\n",
    "endpoint_config_arn = create_endpoint_config_response[\"EndpointConfigArn\"]\n",
    "\n",
    "print(f\"Endpoint Config Arn: {endpoint_config_arn}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fbbaf8b",
   "metadata": {},
   "source": [
    "Using the above endpoint configuration we create a new sagemaker endpoint and wait for the deployment to finish. The status will change to **InService** once the deployment is successful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a39b967",
   "metadata": {},
   "outputs": [],
   "source": [
    "create_endpoint_response = sm_client.create_endpoint(\n",
    "    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n",
    ")\n",
    "\n",
    "endpoint_arn = create_endpoint_response[\"EndpointArn\"]\n",
    "\n",
    "print(f\"Endpoint Arn: {endpoint_arn}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7822a573",
   "metadata": {},
   "outputs": [],
   "source": [
    "rv = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "status = rv[\"EndpointStatus\"]\n",
    "print(f\"Endpoint Creation Status: {status}\")\n",
    "\n",
    "while status == \"Creating\":\n",
    "    time.sleep(60)\n",
    "    rv = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "    status = rv[\"EndpointStatus\"]\n",
    "    print(f\"Endpoint Creation Status: {status}\")\n",
    "\n",
    "endpoint_arn = rv[\"EndpointArn\"]\n",
    "\n",
    "print(f\"Endpoint Arn: {endpoint_arn}\")\n",
    "print(f\"Endpoint Status: {status}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f78a6f01",
   "metadata": {},
   "source": [
    "### Prepare inference payload"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95911423",
   "metadata": {},
   "source": [
    "Let's download an image from SageMaker S3 bucket to be used for Inception V3 model inference. This image will go through pre-processing DALI pipeline and used in ensemble scheduler provided by Triton Inference server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "489f0851",
   "metadata": {},
   "outputs": [],
   "source": [
    "sample_img_fname = \"shiba_inu_dog.jpg\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d37e44dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "s3_client.download_file(\n",
    "    f\"sagemaker-example-files-prod-{sagemaker_session.boto_region_name}\",\n",
    "    \"datasets/image/pets/shiba_inu_dog.jpg\",\n",
    "    sample_img_fname,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e5aec52",
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_image(img_path):\n",
    "    \"\"\"\n",
    "    Loads image as an encoded array of bytes.\n",
    "    This is a typical approach you want to use in DALI backend\n",
    "    \"\"\"\n",
    "    with open(img_path, \"rb\") as f:\n",
    "        img = f.read()\n",
    "        return np.array(list(img)).astype(np.uint8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03f20b0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "rv = load_image(sample_img_fname)\n",
    "print(f\"Shape of image {rv.shape}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4125c2f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "rv2 = np.expand_dims(rv, 0)\n",
    "print(f\"Shape of expanded image array {rv2.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcef1211",
   "metadata": {},
   "source": [
    "Prepare input payload with the name, shape, datatype and the data as list. This payload will be used to invoke the endpoint to get the prediction results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2287574f",
   "metadata": {},
   "outputs": [],
   "source": [
    "payload = {\n",
    "    \"inputs\": [\n",
    "        {\n",
    "            \"name\": \"INPUT\",\n",
    "            \"shape\": rv2.shape,\n",
    "            \"datatype\": \"UINT8\",\n",
    "            \"data\": rv2.tolist(),\n",
    "        }\n",
    "    ]\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d428334",
   "metadata": {},
   "source": [
    "### Run inference\n",
    "\n",
    "Once we have the endpoint running we can use the sample image downloaded to do an inference using json as the payload format. For inference request format, Triton uses the KFServing community standard [inference protocols](https://github.com/triton-inference-server/server/blob/main/docs/protocol/README.md)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb4809e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%timeit\n",
    "\n",
    "response = runtime_sm_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name, ContentType=\"application/octet-stream\", Body=json.dumps(payload)\n",
    ")\n",
    "\n",
    "print(json.loads(response[\"Body\"].read().decode(\"utf8\")))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f7971d8",
   "metadata": {},
   "source": [
    "We can also use binary+json as the payload format to get better performance for the inference call. The specification of this format is provided [here](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_binary_data.md).\n",
    "\n",
    "**Note:** With the `binary+json` format, we have to specify the length of the request metadata in the header to allow Triton to correctly parse the binary payload. This is done using a custom Content-Type header `application/vnd.sagemaker-triton.binary+json;json-header-size={}`.\n",
    "\n",
    "Please not, this is different from using `Inference-Header-Content-Length` header on a stand-alone Triton server since custom headers are not allowed in SageMaker.\n",
    "\n",
    "The `tritonclient` package provides utility methods to generate the payload without having to know the details of the specification. We'll use the following methods to convert our inference request into a binary format which provides lower latencies for inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "248952b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import tritonclient.http as httpclient\n",
    "\n",
    "\n",
    "def get_sample_image_binary(img_path, input_name, output_name):\n",
    "    inputs = []\n",
    "    outputs = []\n",
    "    input_data = load_image(img_path)\n",
    "    input_data = np.expand_dims(input_data, axis=0)\n",
    "    inputs.append(httpclient.InferInput(input_name, input_data.shape, \"UINT8\"))\n",
    "    inputs[0].set_data_from_numpy(input_data, binary_data=True)\n",
    "    outputs.append(httpclient.InferRequestedOutput(output_name, binary_data=True))\n",
    "    request_body, header_length = httpclient.InferenceServerClient.generate_request_body(\n",
    "        inputs, outputs=outputs\n",
    "    )\n",
    "    return request_body, header_length"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6897d68b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "request_body, header_length = get_sample_image_binary(sample_img_fname, \"INPUT\", \"OUTPUT\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "be152539",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%timeit\n",
    "\n",
    "response = runtime_sm_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    ContentType=\"application/vnd.sagemaker-triton.binary+json;json-header-size={}\".format(\n",
    "        header_length\n",
    "    ),\n",
    "    Body=request_body,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "035d14af",
   "metadata": {},
   "source": [
    "We use `invoke_endpoint` to pass in the payload in binary json format to the endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "639afbbc",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = runtime_sm_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    ContentType=\"application/vnd.sagemaker-triton.binary+json;json-header-size={}\".format(\n",
    "        header_length\n",
    "    ),\n",
    "    Body=request_body,\n",
    ")\n",
    "\n",
    "# Parse json header size length from the response\n",
    "header_length_prefix = \"application/vnd.sagemaker-triton.binary+json;json-header-size=\"\n",
    "header_length_str = response[\"ContentType\"][len(header_length_prefix) :]\n",
    "\n",
    "# Read response body\n",
    "result = httpclient.InferenceServerClient.parse_response_body(\n",
    "    response[\"Body\"].read(), header_length=int(header_length_str)\n",
    ")\n",
    "output0_data = result.as_numpy(\"OUTPUT\")\n",
    "print(output0_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75c16977",
   "metadata": {},
   "source": [
    "### Delete endpoint and model artifacts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1486da15",
   "metadata": {},
   "source": [
    "Finally, we clean up the model artifacts i.e. SageMaker model, endpoint configuration and the endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a502fc5c",
   "metadata": {},
   "outputs": [],
   "source": [
    "sm_client.delete_model(ModelName=sm_model_name)\n",
    "sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n",
    "sm_client.delete_endpoint(EndpointName=endpoint_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97050d40",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "\n",
    "In this notebook, we implemented a model ensemble using NIVIDA Triton inference server and pre-processed images using NVIDIA DALI pipelines. This significantly accelerates model inference in terms of overall latency and throughput. Try it out! "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cc682b4",
   "metadata": {},
   "source": [
    "## Notebook CI Test Results\n",
    "\n",
    "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
    "\n",
    "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n",
    "\n",
    "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-triton|ensemble|dali-tf-inception|tf-dali-ensemble-cv.ipynb)\n"
   ]
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": true,
    "memoryGiB": 0,
    "name": "ml.geospatial.interactive",
    "supportedImageNames": [
     "sagemaker-geospatial-v1-0"
    ],
    "vcpuNum": 0
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 54,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   },
   {
    "_defaultOrder": 55,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 56,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4de.24xlarge",
    "vcpuNum": 96
   }
  ],
  "instance_type": "ml.g4dn.xlarge",
  "kernelspec": {
   "display_name": "Python 3 (TensorFlow 2.10.0 Python 3.9 GPU Optimized)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/tensorflow-2.10.1-gpu-py39-cu112-ubuntu20.04-sagemaker-v1.2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}