{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2054403f",
   "metadata": {},
   "source": [
    "# Serve large models on SageMaker with DeepSpeed Container\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cde88da4",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
    "\n",
    "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d4f068c",
   "metadata": {},
   "source": [
    "\n",
    "In this notebook, we explore how to host a large language model on SageMaker using the latest container launched using from DeepSpeed and DJL. DJL provides for the serving framework while DeepSpeed is the key sharding library we leverage to enable hosting of large models.We use DJLServing as the model serving solution in this example. DJLServing is a high-performance universal model serving solution powered by the Deep Java Library (DJL) that is programming language agnostic. To learn more about DJL and DJLServing, you can refer to our recent blog post (https://aws.amazon.com/blogs/machine-learning/deploy-large-models-on-amazon-sagemaker-using-djlserving-and-deepspeed-model-parallel-inference/).\n",
    "\n",
    "Language models have recently exploded in both size and popularity. In 2018, BERT-large entered the scene and, with its 340M parameters and novel transformer architecture, set the standard on NLP task accuracy. Within just a few years, state-of-the-art NLP model size has grown by more than 500x with models such as OpenAI’s 175 billion parameter GPT-3 and similarly sized open source Bloom 176B raising the bar on NLP accuracy. This increase in the number of parameters is driven by the simple and empirically-demonstrated positive relationship between model size and accuracy: more is better. With easy access from models zoos such as Hugging Face and improved accuracy in NLP tasks such as classification and text generation, practitioners are increasingly reaching for these large models. However, deploying them can be a challenge because of their size.\n",
    "\n",
    "Model parallelism can help deploy large models that would normally be too large for a single GPU. With model parallelism, we partition and distribute a model across multiple GPUs. Each GPU holds a different part of the model, resolving the memory capacity issue for the largest deep learning models with billions of parameters. This notebook uses tensor parallelism techniques which allow GPUs to work simultaneously on the same layer of a model and achieve low latency inference relative to a pipeline parallel solution.\n",
    "\n",
    "SageMaker has rolled out DeepSpeed container which now provides users with the ability to leverage the managed serving capabilities and help to provide the un-differentiated heavy lifting.\n",
    "\n",
    "In this notebook, we deploy the open source OPT 30B model across GPU's on a ml.g5.48xlarge instance. DeepSpeed is used for tensor parallelism inference while DJLServing handles inference requests and the distributed workers. For further reading on DeepSpeed you can refer to https://arxiv.org/pdf/2207.00032.pdf \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2829ec6b",
   "metadata": {},
   "source": [
    "## Licence agreement\n",
    "View license information https://huggingface.co/facebook/opt-30b/blob/main/LICENSE.md for this model including the restrictions in Section 2 before using the model.  \n",
    " \n",
    "OPT-175B is licensed under the OPT-175B license, Copyright (c) Meta Platforms, Inc. All Rights Reserved.\n",
    " \n",
    " \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c780d5fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Instal boto3 library to create model and run inference workloads\n",
    "%pip install -Uqq boto3 awscli"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0d4c85a",
   "metadata": {},
   "source": [
    "## Section to Download Model from Hugging Face Hub\n",
    "\n",
    "Use this section for downloading the model directly from Huggingface hub and storing in your own S3 bucket. \n",
    "\n",
    "The below step to download and then upload to S3 can take several minutes since the model size is extremely large"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f5a06ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "\n",
    "bucket = sagemaker.session.Session().default_bucket()\n",
    "print(bucket)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f73b659",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install huggingface-hub -Uqq"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0394c4a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import snapshot_download\n",
    "from pathlib import Path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e643130e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# - This will download the model into the ./model directory where ever the jupyter file is running\n",
    "local_model_path = Path(\"./model\")\n",
    "local_model_path.mkdir(exist_ok=True)\n",
    "model_name = \"facebook/opt-30b\"\n",
    "commit_hash = \"463007d7da4e87fe962909a027811a8c0b32ede8\"\n",
    "# Only download pytorch checkpoint files\n",
    "ignore_patterns = [\"*.msgpack\", \"*.h5\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "330033d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# - Leverage the snapshot library to donload the model since the model is stored in repository using LFS\n",
    "snapshot_download(\n",
    "    repo_id=model_name,\n",
    "    revision=commit_hash,\n",
    "    cache_dir=local_model_path,\n",
    "    ignore_patterns=ignore_patterns,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c467b82a",
   "metadata": {},
   "source": [
    "#### Upload to S3 using the awscli "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8bf90e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "s3_model_prefix = \"hf-large-model-djl-opt30b/model\"  # folder where model checkpoint will go\n",
    "model_snapshot_path = list(local_model_path.glob(\"**/snapshots/*\"))[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "156d5002",
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 cp --recursive {model_snapshot_path} s3://{bucket}/{s3_model_prefix}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d7baf67",
   "metadata": {},
   "source": [
    "## Create SageMaker compatible Model artifact and Upload Model to S3\n",
    "\n",
    "SageMaker needs the model to be in a Tarball format. In this notebook we are going to create the model with the Inference code to shorten the end point creation time. In the Inference code we kick of a multi threaded approach to download the model weights into the container using awscli\n",
    "\n",
    "The tarball is in the following format\n",
    "\n",
    "```\n",
    "code\n",
    "├──── \n",
    "│   └── model.py\n",
    "│   └── requirements.txt\n",
    "│   └── serving.properties\n",
    "\n",
    "```\n",
    "\n",
    "The actual model is stored in S3 location and will be downloaded into the container directly when the endpoint is created. For that we will pass in two environment variables\n",
    "\n",
    "1.  \"MODEL_S3_BUCKET\" Specify the S3 Bucket where the model artifact is\n",
    "2.  \"MODEL_S3_PREFIX\" Specify the S3 prefix for where the model artifacts file are actually located\n",
    "\n",
    "This will be used in the model.py file to read in the actual model artifacts. \n",
    "\n",
    "- `model.py` is the key file which will handle any requests for serving. It is also responsible for loading the model from S3\n",
    "- `requirements.txt` has the awscli library needed to be installed when the container starts up.\n",
    "- `serving.properties` is the script that will have environment variables which can be used to customize model.py at run time.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0815dd34",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p code_opt30"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "efb2b568",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile ./code_opt30/model.py\n",
    "from djl_python import Input, Output\n",
    "import os\n",
    "import deepspeed\n",
    "import torch\n",
    "import torch.distributed as dist\n",
    "import sys\n",
    "import subprocess\n",
    "import time\n",
    "from glob import glob\n",
    "from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer\n",
    "from transformers.models.opt.modeling_opt import OPTDecoderLayer\n",
    "\n",
    "predictor = None\n",
    "\n",
    "\n",
    "def check_config():\n",
    "    local_rank = os.getenv(\"LOCAL_RANK\")\n",
    "    curr_pid = os.getpid()\n",
    "    print(\n",
    "        f\"__Number CUDA Devices:{torch.cuda.device_count()}:::local_rank={local_rank}::curr_pid={curr_pid}::\"\n",
    "    )\n",
    "\n",
    "    if not local_rank:\n",
    "        return False\n",
    "    return True\n",
    "\n",
    "\n",
    "def get_model():\n",
    "    if not check_config():\n",
    "        raise Exception(\n",
    "            \"DJL:DeepSpeed configurations are not default. This code does not support non default configurations\"\n",
    "        )\n",
    "\n",
    "    deepspeed.init_distributed(\"nccl\")\n",
    "    tensor_parallel = int(os.getenv(\"TENSOR_PARALLEL_DEGREE\", \"1\"))\n",
    "    local_rank = int(os.getenv(\"LOCAL_RANK\", \"0\"))\n",
    "    model_dir = \"/tmp/model\"\n",
    "    bucket = os.environ.get(\"MODEL_S3_BUCKET\")\n",
    "    key_prefix = os.environ.get(\"MODEL_S3_PREFIX\")\n",
    "    curr_pid = os.getpid()\n",
    "    print(\n",
    "        f\"__Number CUDA Devices:{torch.cuda.device_count()}:tensor_parallel={tensor_parallel}::curr_pid={curr_pid}::\"\n",
    "    )\n",
    "\n",
    "    print(f\"rank: {local_rank} pid={curr_pid}::\")\n",
    "    if local_rank == 0:\n",
    "        if f\"{model_dir}/DONE\" not in glob(f\"{model_dir}/*\"):\n",
    "            print(f\"Starting Model downloading files pid={curr_pid}::\")\n",
    "            try:\n",
    "                proc_run = subprocess.run(\n",
    "                    [\"aws\", \"s3\", \"cp\", \"--recursive\", f\"s3://{bucket}/{key_prefix}\", model_dir],\n",
    "                    capture_output=True,\n",
    "                    text=True,\n",
    "                )  # python 7 onwards\n",
    "                print(f\"Model download finished pid={curr_pid}::\")\n",
    "\n",
    "                # write file when download complete. Could use dist.barrier() but this makes it easier to check if model is downloaded in case of retry\n",
    "                with open(f\"{model_dir}/DONE\", \"w\") as f:\n",
    "                    f.write(\"download_complete\")\n",
    "\n",
    "                print(\n",
    "                    f\"Model download:DONE checkmark written pid={curr_pid}::return_code:{proc_run.returncode}:stderr:{proc_run.stderr}:\"\n",
    "                )\n",
    "                proc_run.check_returncode()  # to throw the error in case there was one\n",
    "\n",
    "            except subprocess.CalledProcessError as e:\n",
    "                print(\n",
    "                    \"Model download failed: Error:\\nreturn code: \",\n",
    "                    e.returncode,\n",
    "                    \"\\nOutput: \",\n",
    "                    e.stderr,\n",
    "                )\n",
    "                raise  # FAIL FAST\n",
    "\n",
    "    dist.barrier()\n",
    "\n",
    "    print(f\"Load the Model pid={curr_pid}::\")\n",
    "\n",
    "    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)\n",
    "    tokenizer = AutoTokenizer.from_pretrained(\"facebook/opt-30b\", use_fast=False)\n",
    "\n",
    "    model = deepspeed.init_inference(\n",
    "        model,\n",
    "        mp_size=tensor_parallel,\n",
    "        dtype=model.dtype,\n",
    "        replace_method=\"auto\",\n",
    "        replace_with_kernel_inject=True,\n",
    "    )\n",
    "    generator = pipeline(\n",
    "        task=\"text-generation\", model=model, tokenizer=tokenizer, device=local_rank\n",
    "    )\n",
    "    return generator\n",
    "\n",
    "\n",
    "def handle(inputs: Input) -> None:\n",
    "    global predictor\n",
    "    if not predictor:\n",
    "        predictor = get_model()\n",
    "\n",
    "    if inputs.is_empty():\n",
    "        # Model server makes an empty call to warmup the model on startup\n",
    "        return None\n",
    "\n",
    "    data = inputs.get_as_string()\n",
    "    result = predictor(data, do_sample=True)\n",
    "    return Output().add(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05e4fb15",
   "metadata": {},
   "source": [
    "#### Serving.properties has engine parameter which tells the DJL model server to use the DeepSpeed engine to load the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "578b411c",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile ./code_opt30/serving.properties\n",
    "engine = DeepSpeed\n",
    "# gpu.minWorkers=2\n",
    "# gpu.maxWorkers=3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1251d271",
   "metadata": {},
   "source": [
    "#### Requirements will tell the container to load these additional libraries into the container. We need these to download the model into the container"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "92c81569",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile ./code_opt30/requirements.txt\n",
    "boto3\n",
    "awscli"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c55a7a09",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "from sagemaker import image_uris\n",
    "import boto3\n",
    "import os\n",
    "import time\n",
    "import json\n",
    "from pathlib import Path"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bffa9ed0",
   "metadata": {},
   "source": [
    "#### Create required variables and initialize them to create the endpoint, we leverage boto3 for this"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36eaa614",
   "metadata": {},
   "outputs": [],
   "source": [
    "role = sagemaker.get_execution_role()  # execution role for the endpoint\n",
    "sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs\n",
    "bucket = sess.default_bucket()  # bucket to house artifacts\n",
    "model_bucket = sess.default_bucket()  # bucket to house artifacts\n",
    "s3_code_prefix = (\n",
    "    \"hf-large-model-djl-opt30b/code_opt30\"  # folder within bucket where code artifact will go\n",
    ")\n",
    "s3_model_prefix = \"hf-large-model-djl-opt30b/model\"  # folder where model checkpoint will go\n",
    "\n",
    "region = sess._region_name\n",
    "account_id = sess.account_id()\n",
    "\n",
    "s3_client = boto3.client(\"s3\")\n",
    "sm_client = boto3.client(\"sagemaker\")\n",
    "smr_client = boto3.client(\"sagemaker-runtime\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57eea2c7",
   "metadata": {},
   "source": [
    "**Image URI for the DJL container is being used here**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d40b586",
   "metadata": {},
   "outputs": [],
   "source": [
    "# inference_image_uri = f\"{account_id}.dkr.ecr.{region}.amazonaws.com/djl-ds:latest\"\n",
    "inference_image_uri = (\n",
    "    f\"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0.19.0-deepspeed0.7.3-cu113\"\n",
    ")\n",
    "print(f\"Image going to be used is ---- > {inference_image_uri}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be1276ee",
   "metadata": {},
   "source": [
    "**Create the Tarball and then upload to S3 location**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8e257e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm model.tar.gz\n",
    "!tar czvf model.tar.gz code_opt30"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c4c9762",
   "metadata": {},
   "outputs": [],
   "source": [
    "s3_code_artifact = sess.upload_data(\"model.tar.gz\", bucket, s3_code_prefix)\n",
    "print(f\"S3 Code or Model tar ball uploaded to --- > {s3_code_artifact}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70a34329",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"S3 Model Prefix where the model files are -- > {s3_model_prefix}\")\n",
    "print(f\"S3 Model Bucket is -- > {model_bucket}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b164dde6",
   "metadata": {},
   "source": [
    "### This is optional in case you want to use VpcConfig to specify when creating the end points\n",
    "\n",
    "For more details you can refer to this link https://docs.aws.amazon.com/sagemaker/latest/dg/host-vpc.html\n",
    "\n",
    "The below is just an example to extract information about Security Groups and Subnets needed to configure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c48d23bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws ec2 describe-security-groups --filter Name=vpc-id,Values=<use vpcId> | python3 -c \"import sys, json; print(json.load(sys.stdin)['SecurityGroups'])\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15dfdb2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# - provide networking configs if needed.\n",
    "security_group_ids = []  # add the security group id's\n",
    "subnets = []  # add the subnet id for this vpc\n",
    "privateVpcConfig = {\"SecurityGroupIds\": security_group_ids, \"Subnets\": subnets}\n",
    "print(privateVpcConfig)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4fb13db",
   "metadata": {},
   "source": [
    "### To create the end point the steps are:\n",
    "\n",
    "1. Create the Model using the Image container and the Model Tarball uploaded earlier\n",
    "2. Create the endpoint config using the following key parameters\n",
    "\n",
    "    a) Instance Type is ml.p4d.24xlarge \n",
    "    \n",
    "    b) ModelDataDownloadTimeoutInSeconds is 2400 which is needed to ensure the Model downloads from S3 successfully,\n",
    "    \n",
    "    c) ContainerStartupHealthCheckTimeoutInSeconds is 2400 to ensure health check starts after the model is ready\n",
    "    \n",
    "3. Create the end point using the endpoint config created    \n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8a3da0a",
   "metadata": {},
   "source": [
    "One of the key parameters here is **TENSOR_PARALLEL_DEGREE** which essentially tells the DeepSpeed library to partition the models along 8 GPU's. This is a tunable and configurable parameter. For the purpose of this notebook we would like to leave these as **default settings**.\n",
    "\n",
    "This parameter also controls the no of workers per model which will be started up when DJL serving runs. As an example if we have a 8 GPU machine and we are creating 8 partitions then we will have 1 worker per model to serve the requests. For further reading on DeepSpeedyou can follow the link https://www.deepspeed.ai/tutorials/inference-tutorial/#initializing-for-inference. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f85b504",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.utils import name_from_base\n",
    "\n",
    "model_name = name_from_base(f\"opt30b-djl-ds\")\n",
    "print(model_name)\n",
    "\n",
    "create_model_response = sm_client.create_model(\n",
    "    ModelName=model_name,\n",
    "    ExecutionRoleArn=role,\n",
    "    PrimaryContainer={\n",
    "        \"Image\": inference_image_uri,\n",
    "        \"ModelDataUrl\": s3_code_artifact,\n",
    "        \"Environment\": {\n",
    "            \"MODEL_S3_BUCKET\": bucket,\n",
    "            \"MODEL_S3_PREFIX\": s3_model_prefix,\n",
    "            \"TENSOR_PARALLEL_DEGREE\": \"8\",\n",
    "        },\n",
    "    },\n",
    "    # Uncomment if providing networking configs\n",
    "    # VpcConfig=privateVpcConfig\n",
    ")\n",
    "model_arn = create_model_response[\"ModelArn\"]\n",
    "\n",
    "print(f\"Created Model: {model_arn}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "282cbf26",
   "metadata": {},
   "source": [
    "VolumnSizeInGB has been left as commented out. You should use this value for Instance types which support EBS volume mounts. The current instance we are using comes with a pre configured space and does not support additional volume mounts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90a4bfd6",
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint_config_name = f\"{model_name}-config\"\n",
    "endpoint_name = f\"{model_name}-endpoint\"\n",
    "\n",
    "endpoint_config_response = sm_client.create_endpoint_config(\n",
    "    EndpointConfigName=endpoint_config_name,\n",
    "    ProductionVariants=[\n",
    "        {\n",
    "            \"VariantName\": \"variant1\",\n",
    "            \"ModelName\": model_name,\n",
    "            \"InstanceType\": \"ml.g5.48xlarge\",\n",
    "            \"InitialInstanceCount\": 1,\n",
    "            # \"VolumeSizeInGB\" : 200\n",
    "            \"ModelDataDownloadTimeoutInSeconds\": 2400,\n",
    "            \"ContainerStartupHealthCheckTimeoutInSeconds\": 2400,\n",
    "        },\n",
    "    ],\n",
    ")\n",
    "endpoint_config_response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41cc3a0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "create_endpoint_response = sm_client.create_endpoint(\n",
    "    EndpointName=f\"{endpoint_name}\", EndpointConfigName=endpoint_config_name\n",
    ")\n",
    "print(f\"Created Endpoint: {create_endpoint_response['EndpointArn']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ed3a103",
   "metadata": {},
   "source": [
    "#### Wait for the end point to be created.\n",
    "However while that happens, let us look at the critical areas of the helper files we are using to load the model\n",
    "1. We will look at the code snippets for model.py to see the model downloading mechanism\n",
    "2. Requirements.txt to see the required libraries to be loaded\n",
    "3. Serving.properties to see the environment related properties"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "386ce773",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is the code snippet which is responsible to load the model from S3\n",
    "! sed -n '40,60p' code_opt30/model.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b15adf7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is the code snippet which loads the libraries into the container needed for run\n",
    "! sed -n '1,3p' code_opt30/requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0675ff6e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is the code snippet which shows the environment variables being used to customize runtime\n",
    "! sed -n '1,3p' code_opt30/serving.properties"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d34898a",
   "metadata": {},
   "source": [
    "### This step can take ~ 15 min or longer so please be patient"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a5046f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "status = resp[\"EndpointStatus\"]\n",
    "print(\"Status: \" + status)\n",
    "\n",
    "while status == \"Creating\":\n",
    "    time.sleep(60)\n",
    "    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "    status = resp[\"EndpointStatus\"]\n",
    "    print(\"Status: \" + status)\n",
    "\n",
    "print(\"Arn: \" + resp[\"EndpointArn\"])\n",
    "print(\"Status: \" + status)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60bc1e6f",
   "metadata": {},
   "source": [
    "#### Leverage the Boto3 to invoke the endpoint. \n",
    "\n",
    "This is a generative model so we pass in a Text as a prompt and Model will complete the sentence and return the results\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07d94353",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "smr_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name, Body=\"Amazon.com is the best\", ContentType=\"text/plain\"\n",
    ")[\"Body\"].read().decode(\"utf8\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d97af6b",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "In this post, we demonstrated how to use SageMaker large model inference containers to host two large language models, BLOOM-176B and OPT-30B. We used DeepSpeed’s model parallel techniques with multiple GPUs on a single SageMaker machine learning instance. For more details about Amazon SageMaker and its large model inference capabilities, refer to the following:\n",
    "\n",
    "* Amazon SageMaker now supports deploying large models through configurable volume size and timeout quotas (https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-sagemaker-deploying-large-models-volume-size-timeout-quotas/)\n",
    "* Real-time inference – Amazon SageMake (https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce7bf02c",
   "metadata": {},
   "source": [
    "## Clean Up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c65d444",
   "metadata": {},
   "outputs": [],
   "source": [
    "# - Delete the end point\n",
    "sm_client.delete_endpoint(EndpointName=endpoint_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bca9e21c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# - In case the end point failed we still want to delete the model\n",
    "sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n",
    "sm_client.delete_model(ModelName=model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d2bbe68",
   "metadata": {},
   "source": [
    "#### Optionally delete the model checkpoint from S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e054fa98",
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 rm --recursive s3://{bucket}/{s3_model_prefix}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38b0007e",
   "metadata": {},
   "source": [
    "## Notebook CI Test Results\n",
    "\n",
    "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
    "\n",
    "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n",
    "\n",
    "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/inference|nlp|realtime|llm|opt30b|djl_deepspeed_deploy_opt30b.ipynb)\n"
   ]
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": true,
    "memoryGiB": 0,
    "name": "ml.geospatial.interactive",
    "supportedImageNames": [
     "sagemaker-geospatial-v1-0"
    ],
    "vcpuNum": 0
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 54,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   },
   {
    "_defaultOrder": 55,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 56,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4de.24xlarge",
    "vcpuNum": 96
   }
  ],
  "kernelspec": {
   "display_name": "Python 3 (Data Science 3.0)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}