{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0e48316a",
   "metadata": {},
   "source": [
    "# Deploy FLAN-XXL Model with high performance on SageMaker using NVIDIA FasterTransformer\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3193872",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
    "\n",
    "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25463bb2",
   "metadata": {},
   "source": [
    "\n",
    "In this notebook, we explore how to host a FLAN-T5-XXL large language model with FP32 precision on SageMaker using the NVIDIA Faster Transformer. We use DJLServing as the model serving solution in this example. DJLServing is a high-performance universal model serving solution powered by the Deep Java Library (DJL) that is programming language agnostic. To learn more about DJL and DJLServing, you can refer to our recent blog post (https://aws.amazon.com/blogs/machine-learning/deploy-bloom-176b-and-opt-30b-on-amazon-sagemaker-with-large-model-inference-deep-learning-containers-and-deepspeed/).\n",
    "\n",
    "\n",
    "Model parallelism can help deploy large models that would normally be too large for a single GPU. With model parallelism, we partition and distribute a model across multiple GPUs. Each GPU holds a different part of the model, resolving the memory capacity issue for the largest deep learning models with billions of parameters. This notebook uses tensor parallelism techniques which allow GPUs to work simultaneously on the same layer of a model and achieve low latency inference relative to a pipeline parallel solution.\n",
    "\n",
    "SageMaker has rolled out DeepSpeed container which now provides users with the ability to leverage the managed serving capabilities and help to provide the un-differentiated heavy lifting.\n",
    "\n",
    "In this notebook, we deploy the open source [FLAN-T5-XXL](https://huggingface.co/google/flan-t5-xxl) model across GPUs on a ml.g5.12xlarge instance. \n",
    "\n",
    "The Flan-T5 are T5 models trained on the Flan collection of datasets which include: taskmaster2, djaym7/wiki_dialog, deepmind/code_contests, lambada, gsm8k, aqua_rat, esnli, quasc and qed.\n",
    "\n",
    "[FasterTransformer](https://github.com/NVIDIA/FasterTransformer) implements a highly optimized transformer layer for both the encoder and decoder for inference. On Volta, Turing and Ampere GPUs, the computing power of Tensor Cores are used automatically when the precision of the data and weights are FP16.\n",
    "\n",
    "FasterTransformer is built on top of CUDA, cuBLAS, cuBLASLt and C++. We provide at least one API of the following frameworks: TensorFlow, PyTorch and Triton backend. Users can integrate FasterTransformer into these frameworks directly. For supporting frameworks, we also provide example codes to demonstrate how to use, and show the performance on these frameworks\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff136863",
   "metadata": {},
   "source": [
    "## Licence agreement\n",
    " - View license information https://huggingface.co/models?license=license:apache-2.0 before using the model.\n",
    " - This notebook is a sample notebook and not intended for production use. Please refer to the licence at https://github.com/aws/mit-0. \n",
    "\n",
    " \n",
    " \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76fd81e4-d17a-44c1-9659-1b86cd6165ac",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install sagemaker boto3 huggingface_hub --upgrade #--quiet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ea19605",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import jinja2\n",
    "from sagemaker import image_uris\n",
    "import boto3\n",
    "import os\n",
    "import time\n",
    "import json\n",
    "from pathlib import Path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "361b75a6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "role = sagemaker.get_execution_role()  # execution role for the endpoint\n",
    "sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs\n",
    "bucket = sess.default_bucket()  # bucket to house artifacts\n",
    "\n",
    "model_bucket = sess.default_bucket()  # bucket to house artifacts\n",
    "s3_code_prefix = (\n",
    "    \"hf-large-model-djl/code_flan_xxl_ft\"  # folder within bucket where code artifact will go\n",
    ")\n",
    "\n",
    "s3_model_prefix = (\n",
    "    \"hf-large-model-djl/model_flan_xxl_ft\"  # folder within bucket where code artifact will go\n",
    ")\n",
    "region = sess._region_name\n",
    "account_id = sess.account_id()\n",
    "\n",
    "s3_client = boto3.client(\"s3\")\n",
    "sm_client = boto3.client(\"sagemaker\")\n",
    "smr_client = boto3.client(\"sagemaker-runtime\")\n",
    "\n",
    "jinja_env = jinja2.Environment()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b6318d1-2194-4b49-9ab4-d310aa7780c3",
   "metadata": {},
   "source": [
    "### [OPTIONAL] Download the model from Hugging Face and upload the model artifacts on Amazon S3\n",
    "\n",
    "If you intend to download your copy of the model and upload it to a s3 location in your AWS account, please follow the below steps, else you can skip to the next step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "148ac115-641c-4c50-b57d-b6daec5b15d9",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from huggingface_hub import snapshot_download\n",
    "from pathlib import Path\n",
    "import os\n",
    "\n",
    "# - This will download the model into the current directory where ever the jupyter notebook is running\n",
    "local_model_path = Path(\".\")\n",
    "local_model_path.mkdir(exist_ok=True)\n",
    "model_name = \"google/flan-t5-xxl\"\n",
    "# Only download pytorch checkpoint files\n",
    "allow_patterns = [\"*.json\", \"*.pt\", \"*.bin\", \"*.txt\", \"*.model\"]\n",
    "\n",
    "# - Leverage the snapshot library to donload the model since the model is stored in repository using LFS\n",
    "model_download_path = snapshot_download(\n",
    "    repo_id=model_name,\n",
    "    cache_dir=local_model_path,\n",
    "    allow_patterns=allow_patterns,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf7d151b-78c2-45f5-981f-d27ae45e2e88",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# define a variable to contain the s3url of the location that has the model\n",
    "pretrained_model_location = f\"s3://{model_bucket}/{s3_model_prefix}/\"\n",
    "print(f\"Pretrained model will be uploaded to ---- > {pretrained_model_location}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11c92d33-e16b-4432-b4e9-31b0082595ec",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "model_artifact = sess.upload_data(path=model_download_path, key_prefix=s3_model_prefix)\n",
    "print(f\"Model uploaded to --- > {model_artifact}\")\n",
    "print(f\"We will set option.s3url={model_artifact}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7fa3e96-118d-4bf8-9378-1d75564d4826",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!rm -rf {model_download_path}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf3bc118-bf09-48aa-96ea-01e9039051e1",
   "metadata": {},
   "source": [
    "### Define a variable to contain the s3url of the location that has the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e1b6780-22b3-4f42-878b-41157fca76ca",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pretrained_model_location = (\n",
    "    f\"s3://sagemaker-example-files-prod-{region}/models/hf-large-model-djl-ds/flant5-xxl/\"\n",
    ")\n",
    "print(f\"Pretrained model will be downloaded from ---- > {pretrained_model_location}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c6d6f88",
   "metadata": {},
   "source": [
    "## Create SageMaker compatible Model artifact,  upload Model to S3 and bring your own inference script.\n",
    "\n",
    "SageMaker Large Model Inference containers can be used to host models without providing your own inference code. This is extremely useful when there is no custom pre-processing of the input data or post-processing of the model's predictions.\n",
    "\n",
    "However, in this notebook, we demonstrate how to deploy a model with custom inference code.\n",
    "\n",
    "SageMaker needs the model artifacts to be in a Tarball format. In this example, we provide the following files - `serving.properties` and `model.py`.\n",
    "\n",
    "The tarball is in the following format\n",
    "\n",
    "```\n",
    "code\n",
    "├──── \n",
    "│   └── serving.properties\n",
    "│   └── model.py\n",
    "   \n",
    "\n",
    "```\n",
    "\n",
    "- `serving.properties` is the configuration file that can be used to configure the model server.\n",
    "- `model.py` is the script handles any requests for serving.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b200bd54",
   "metadata": {},
   "source": [
    "#### Create serving.properties \n",
    "\n",
    "This is a configuration file to indicate to DJL Serving which model parallelization and inference optimization libraries you would like to use. Depending on your need, you can set the appropriate configuration.\n",
    "\n",
    "Here is a list of settings that we use in this configuration file -\n",
    "- `engine`: The engine for DJL to use. In this case, it is **FasterTransformer**.\n",
    "- `option.entryPoint`: The entrypoint python file or module. This should align with the engine that is being used. \n",
    "- `option.s3url`: Set this to the URI of the Amazon S3 bucket that contains the model. \n",
    "\n",
    "If you want to download the model from huggingface.co, you can set `option.modelid`. The model id of a pretrained model hosted inside a model repository on huggingface.co (https://huggingface.co/models). The container uses this model id to download the corresponding model repository on huggingface.co. \n",
    "- `option.tensor_parallel_degree`: Set to the number of GPU devices over which the model needs to be partitioned. This parameter also controls the number of workers per model which will be started up when DJL serving runs. As an example if we have a 8 GPU machine, and we are creating 8 partitions then we will have 1 worker per model to serve the requests.\n",
    "\n",
    "For more details on the configuration options and an exhaustive list, you can refer the documentation - https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-large-model-configuration.html.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11909e63-abbb-4c72-a514-84548ceea58a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!rm -r code_fasterformer_flant5\n",
    "!mkdir -p code_fasterformer_flant5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c295397f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%writefile code_fasterformer_flant5/serving.properties\n",
    "engine = FasterTransformer\n",
    "option.tensor_parallel_degree = 4\n",
    "option.s3url = {{s3url}}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9019c06a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# we plug in the appropriate model location into our `serving.properties` file based on the region in which this notebook is running\n",
    "template = jinja_env.from_string(Path(\"code_fasterformer_flant5/serving.properties\").open().read())\n",
    "Path(\"code_fasterformer_flant5/serving.properties\").open(\"w\").write(\n",
    "    template.render(s3url=pretrained_model_location)\n",
    ")\n",
    "!pygmentize code_fasterformer_flant5/serving.properties | cat -n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cf68e17-d145-4083-832c-2d83cfaa1746",
   "metadata": {},
   "source": [
    "#### Create a model.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "581a558b-d73a-4a7b-8526-d9edaeccd299",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%writefile code_fasterformer_flant5/model.py\n",
    "import fastertransformer as ft\n",
    "from djl_python import Input, Output\n",
    "from transformers import (\n",
    "    AutoModelForCausalLM,\n",
    "    AutoTokenizer,\n",
    "    T5Tokenizer,\n",
    "    T5ForConditionalGeneration,\n",
    ")\n",
    "import os\n",
    "import logging\n",
    "import math\n",
    "import torch\n",
    "\n",
    "\n",
    "def load_model(properties):\n",
    "    model_name = \"google/flan-t5-xxl\"\n",
    "    tensor_parallel_degree = properties[\"tensor_parallel_degree\"]\n",
    "    pipeline_parallel_degree = 1\n",
    "    model_location = properties[\"model_dir\"]\n",
    "    if \"model_id\" in properties:\n",
    "        model_location = properties[\"model_id\"]\n",
    "    logging.info(f\"Loading model in {model_location}\")\n",
    "\n",
    "    tokenizer = T5Tokenizer.from_pretrained(model_location)\n",
    "    dtype = \"fp32\"\n",
    "    model = ft.init_inference(\n",
    "        model_location, tensor_parallel_degree, pipeline_parallel_degree, dtype\n",
    "    )\n",
    "    return model, tokenizer\n",
    "\n",
    "\n",
    "model = None\n",
    "tokenizer = None\n",
    "\n",
    "\n",
    "def handle(inputs: Input):\n",
    "    \"\"\"\n",
    "    inputs: Contains the configurations from serving.properties\n",
    "    \"\"\"\n",
    "    global model, tokenizer\n",
    "\n",
    "    if not model:\n",
    "        model, tokenizer = load_model(inputs.get_properties())\n",
    "\n",
    "    if inputs.is_empty():\n",
    "        # Model server makes an empty call to warmup the model on startup\n",
    "        return None\n",
    "\n",
    "    data = inputs.get_as_json()\n",
    "\n",
    "    input_sentences = data[\"inputs\"]\n",
    "    params = data[\"parameters\"]\n",
    "\n",
    "    outputs = model.pipeline_generate(input_sentences, **params)\n",
    "    result = {\"outputs\": outputs}\n",
    "\n",
    "    return Output().add_as_json(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5dd60a29",
   "metadata": {},
   "source": [
    "**Image URI for the DJL container is being used here**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e099e75-3a3e-4e60-8abf-aca672e083ee",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "inference_image_uri = image_uris.retrieve(\n",
    "    framework=\"djl-fastertransformer\", region=sess.boto_session.region_name, version=\"0.21.0\"\n",
    ")\n",
    "print(f\"Image going to be used is ---- > {inference_image_uri}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ddce346",
   "metadata": {},
   "source": [
    "**Create the Tarball and then upload to S3 location**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e9c7652-cf9c-41ea-b8ef-968cf2cdeefc",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!rm model.tar.gz\n",
    "!tar czvf model.tar.gz code_fasterformer_flant5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f388dd32",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "s3_code_artifact = sess.upload_data(\"model.tar.gz\", bucket, s3_code_prefix)\n",
    "print(f\"S3 Code or Model tar ball uploaded to --- > {s3_code_artifact}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e60ecd16",
   "metadata": {},
   "source": [
    "### To create the end point the steps are:\n",
    "\n",
    "1. Create the Model using the Image container and the Model Tarball uploaded earlier\n",
    "2. Create the endpoint config using the following key parameters\n",
    "\n",
    "    a) Instance Type is ml.g5.24xlarge \n",
    "    \n",
    "    b) ContainerStartupHealthCheckTimeoutInSeconds is 2400 to ensure health check starts after the model is ready    \n",
    "3. Create the end point using the endpoint config created    \n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "649cdd53",
   "metadata": {},
   "source": [
    "#### Create the Model\n",
    "Use the image URI for the DJL container and the s3 location to which the tarball was uploaded.\n",
    "\n",
    "The container downloads the model into the `/tmp` space on the instance because SageMaker maps the `/tmp` to the Amazon Elastic Block Store (Amazon EBS) volume that is mounted when we specify the endpoint creation parameter VolumeSizeInGB. It leverages `s5cmd`(https://github.com/peak/s5cmd) which offers a very fast download speed and hence extremely useful when downloading large models.\n",
    "\n",
    "For instances like p4dn, which come pre-built with the volume instance, we can continue to leverage the `/tmp` on the container. The size of this mount is large enough to hold the model.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72f58be6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from sagemaker.utils import name_from_base\n",
    "\n",
    "model_name = name_from_base(f\"flan-xxl-fastertransformer\")\n",
    "print(model_name)\n",
    "\n",
    "create_model_response = sm_client.create_model(\n",
    "    ModelName=model_name,\n",
    "    ExecutionRoleArn=role,\n",
    "    PrimaryContainer={\"Image\": inference_image_uri, \"ModelDataUrl\": s3_code_artifact},\n",
    ")\n",
    "model_arn = create_model_response[\"ModelArn\"]\n",
    "\n",
    "print(f\"Created Model: {model_arn}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38025fec",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "endpoint_config_name = f\"{model_name}-config\"\n",
    "endpoint_name = f\"{model_name}-endpoint\"\n",
    "\n",
    "endpoint_config_response = sm_client.create_endpoint_config(\n",
    "    EndpointConfigName=endpoint_config_name,\n",
    "    ProductionVariants=[\n",
    "        {\n",
    "            \"VariantName\": \"variant1\",\n",
    "            \"ModelName\": model_name,\n",
    "            \"InstanceType\": \"ml.g5.12xlarge\",\n",
    "            \"InitialInstanceCount\": 1,\n",
    "            \"ContainerStartupHealthCheckTimeoutInSeconds\": 600,\n",
    "        },\n",
    "    ],\n",
    ")\n",
    "endpoint_config_response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53d84ba6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "create_endpoint_response = sm_client.create_endpoint(\n",
    "    EndpointName=f\"{endpoint_name}\", EndpointConfigName=endpoint_config_name\n",
    ")\n",
    "print(f\"Created Endpoint: {create_endpoint_response['EndpointArn']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7454013",
   "metadata": {},
   "source": [
    "### This step can take ~ 10 min or longer so please be patient"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0de9b6eb",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "status = resp[\"EndpointStatus\"]\n",
    "print(\"Status: \" + status)\n",
    "\n",
    "while status == \"Creating\":\n",
    "    time.sleep(60)\n",
    "    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
    "    status = resp[\"EndpointStatus\"]\n",
    "    print(\"Status: \" + status)\n",
    "\n",
    "print(\"Arn: \" + resp[\"EndpointArn\"])\n",
    "print(\"Status: \" + status)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "932d8421",
   "metadata": {},
   "source": [
    "#### While you wait for the endpoint to be created, you can read more about:\n",
    "- [Deep Learning containers for large model inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-large-model-dlc.html)\n",
    "- [FasterTransformer](https://github.com/NVIDIA/FasterTransformer)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12f1fa1b",
   "metadata": {},
   "source": [
    "#### Leverage the Boto3 to invoke the endpoint. \n",
    "\n",
    "This is a generative model so we pass in a Text as a prompt and the Model will complete the sentence and return the results.\n",
    "\n",
    "You can pass a batch of prompts as input to the model. This is done by setting `inputs` to the list of prompts. The model then returns a result for each prompt. The text generation can be configured using appropriate parameters. These `parameters` need to be passed to the endpoint as a dictionary of `kwargs`.\n",
    "\n",
    "The below code sample illustrates the invocation of the endpoint using prompts and also sets some parameters.\n",
    "\n",
    "Here's a list of default arguments that's used by the model for inference. You can pass specific values based on the use case - \n",
    "```\n",
    "default_args = dict(\n",
    "            inputs_embeds=None,\n",
    "            beam_width=1,\n",
    "            max_seq_len=200,\n",
    "            top_k=1,\n",
    "            top_p=0.0,\n",
    "            beam_search_diversity_rate=0.0,\n",
    "            temperature=1.0,\n",
    "            len_penalty=0.0,\n",
    "            repetition_penalty=1.0,\n",
    "            presence_penalty=None,\n",
    "            min_length=0,\n",
    "            random_seed=0,\n",
    "            is_return_output_log_probs=False,\n",
    "            is_return_cum_log_probs=False,\n",
    "            is_return_cross_attentions=False,\n",
    "            bad_words_list=None,\n",
    "            stop_words_list=None\n",
    "        )\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43c903b9-dff7-48cf-8dd3-d8f2b6d9d670",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "smr_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    Body=json.dumps(\n",
    "        {\n",
    "            \"inputs\": [\"Amazon.com is the best \", \"Large model inference is\"],\n",
    "            \"parameters\": {\"min_length\": 20, \"max_seq_len\": 100, \"temperature\": 0.1},\n",
    "        }\n",
    "    ),\n",
    "    ContentType=\"application/json\",\n",
    ")[\"Body\"].read().decode(\"utf8\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "427c3bfc-4eee-4ab1-9d64-77b713193732",
   "metadata": {},
   "source": [
    "### Examples of prompt engineering for 'Zero Shot' NLP tasks\n",
    "Here are some examples of how you can use the deployed model for zero-shot NLP tasks."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91b6419b-e6a7-4a94-812c-3fdb0ba8bcb8",
   "metadata": {},
   "source": [
    "#### Common Sense reasoning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b646948-c05a-486e-a3bb-3c6052c6a5d2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "# -- Greedy generation\n",
    "smr_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    Body=json.dumps(\n",
    "        {\n",
    "            \"inputs\": [\n",
    "                \"The world cup has kicked off in Los Angeles, United States.\\n\\nBased on the paragraph above can we conclude that \\”The world cup takes place in United States.\\”?\\n\\n[\\”yes\\”, \\”no\\”]\"\n",
    "            ],\n",
    "            \"parameters\": {\"temperature\": 0.7},\n",
    "            \"padding\": True,\n",
    "        }\n",
    "    ),\n",
    "    ContentType=\"application/json\",\n",
    ")[\"Body\"].read().decode(\"utf8\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d562804-016a-467a-bcdf-6469f7a362ca",
   "metadata": {
    "tags": []
   },
   "source": [
    "#### Text / Sentiment classification"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b818cb73-83e5-464b-9bb9-eb80a74556cf",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "# -- Greedy generation\n",
    "smr_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    Body=json.dumps(\n",
    "        {\n",
    "            \"inputs\": [\n",
    "                \"Review:\\nThis moive is so great and once again dazzles and delights us\\nIs this movie review sentence negative or positive?\\nOPTIONS:\\n-positive \\n-negative\"\n",
    "            ],\n",
    "            \"parameters\": {},\n",
    "            \"padding\": True,\n",
    "        }\n",
    "    ),\n",
    "    ContentType=\"application/json\",\n",
    ")[\"Body\"].read().decode(\"utf8\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "120f9e4f-7530-44b9-a15f-c9ef68dab14f",
   "metadata": {},
   "source": [
    "#### Translation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5e4d23a-0213-4c14-b4e0-bb756a850bdc",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "smr_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    Body=json.dumps(\n",
    "        {\n",
    "            \"inputs\": [\"My name is Arthur\\n\\nTranslate to German\"],\n",
    "            \"parameters\": {\"temperature\": 0.7},\n",
    "            \"padding\": True,\n",
    "        }\n",
    "    ),\n",
    "    ContentType=\"application/json\",\n",
    ")[\"Body\"].read().decode(\"utf8\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b4e2b2c-91e6-48c2-a777-17aaed72eb0a",
   "metadata": {},
   "source": [
    "#### Article generation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3034879b-79a5-490e-b26f-a49449201f5b",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "smr_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    Body=json.dumps(\n",
    "        {\n",
    "            \"inputs\": [\n",
    "                \"Title: \\”University has new facility coming up“\\\\nGiven the above title of an imaginary article, imagine the article.\\n\"\n",
    "            ],\n",
    "            \"parameters\": {\"max_seq_len\": 200, \"temperature\": 0.7},\n",
    "            \"padding\": True,\n",
    "        }\n",
    "    ),\n",
    "    ContentType=\"application/json\",\n",
    ")[\"Body\"].read().decode(\"utf8\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dcf121b2-92a7-4f3c-b97a-d1f0771af0fa",
   "metadata": {},
   "source": [
    "#### Abstractive Question Answering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94295a8f-bf68-454e-9177-c6bea818a462",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "context = \"\"\"\n",
    "Customer: Hi there, I'm having a problem with my iPhone.\n",
    "Agent: Hi! I'm sorry to hear that. What's happening?\n",
    "Customer: The phone is not charging properly, and the battery seems to be draining very quickly. I've tried different charging cables and power adapters, but the issue persists.\n",
    "Agent: Hmm, that's not good. Let's try some troubleshooting steps. Can you go to Settings, then Battery, and see if there are any apps that are using up a lot of battery life?\n",
    "Customer: Yes, there are some apps that are using up a lot of battery.\n",
    "Agent: Okay, try force quitting those apps by swiping up from the bottom of the screen and then swiping up on the app to close it.\n",
    "Customer: I did that, but the issue is still there.\n",
    "Agent: Alright, let's try resetting your iPhone's settings to their default values. This won't delete any of your data. Go to Settings, then General, then Reset, and then choose Reset All Settings.\n",
    "Customer: Okay, I did that. What's next?\n",
    "Agent: Now, let's try restarting your iPhone. Press and hold the power button until you see the \"slide to power off\" option. Slide to power off, wait a few seconds, and then turn your iPhone back on.\n",
    "Customer: Alright, I restarted it, but it's still not charging properly.\n",
    "Agent: I see. It looks like we need to run a diagnostic test on your iPhone. Please visit the nearest Apple Store or authorized service provider to get your iPhone checked out.\n",
    "Customer: Do I need to make an appointment?\n",
    "Agent: Yes, it's always best to make an appointment beforehand so you don't have to wait in line. You can make an appointment online or by calling the Apple Store or authorized service provider.\n",
    "Customer: Okay, will I have to pay for the repairs?\n",
    "Agent: That depends on whether your iPhone is covered under warranty or not. If it is, you won't have to pay anything. However, if it's not covered under warranty, you will have to pay for the repairs.\n",
    "Customer: How long will it take to get my iPhone back?\n",
    "Agent: It depends on the severity of the issue, but it usually takes 1-2 business days.\n",
    "Customer: Can I track the repair status online?\n",
    "Agent: Yes, you can track the repair status online or by calling the Apple Store or authorized service provider.\n",
    "Customer: Alright, thanks for your help.\n",
    "Agent: No problem, happy to help. Is there anything else I can assist you with?\n",
    "Customer: No, that's all for now.\n",
    "Agent: Alright, have a great day and good luck with your iPhone!\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d47e8c76-aae0-48c8-9d80-84ae62fa1346",
   "metadata": {},
   "source": [
    "#### Q1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "961d4666-bd93-4b9e-bdd1-e406b1951a4a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "query = (\n",
    "    \"What troubleshooting steps were suggested to the customer to fix their iPhone charging issue?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54a34be3-70c4-44ee-8e7d-2a32598ec4aa",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "smr_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    Body=json.dumps(\n",
    "        {\n",
    "            \"inputs\": [f\"{context}\\n{query}\"],\n",
    "            \"parameters\": {\"max_seq_len\": 256, \"temperature\": 0.4, \"top_k\": 0, \"top_p\": 0.7},\n",
    "            \"padding\": True,\n",
    "        }\n",
    "    ),\n",
    "    ContentType=\"application/json\",\n",
    ")[\"Body\"].read().decode(\"utf8\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fce1d44-0bbe-4e29-a0bc-4805675ea068",
   "metadata": {},
   "source": [
    "#### Q2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17a156fa-a844-447b-b5aa-ee845a597352",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "query = \"Was resetting the iPhone to its default settings able to solve the charging issue and battery drain problem?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5d2f91c-7ffb-45ec-8ae9-9801fb109700",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "smr_client.invoke_endpoint(\n",
    "    EndpointName=endpoint_name,\n",
    "    Body=json.dumps(\n",
    "        {\n",
    "            \"inputs\": [f\"{context}\\n{query}\"],\n",
    "            \"parameters\": {\"max_seq_len\": 256, \"temperature\": 0.4, \"top_k\": 0, \"top_p\": 0.7},\n",
    "            \"padding\": True,\n",
    "        }\n",
    "    ),\n",
    "    ContentType=\"application/json\",\n",
    ")[\"Body\"].read().decode(\"utf8\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b11c344c",
   "metadata": {},
   "source": [
    "## Clean Up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9947d080",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# - Delete the end point\n",
    "sm_client.delete_endpoint(EndpointName=endpoint_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f92e2391",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# - In case the end point failed we still want to delete the model\n",
    "sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n",
    "sm_client.delete_model(ModelName=model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e82b2d76",
   "metadata": {},
   "source": [
    "## Notebook CI Test Results\n",
    "\n",
    "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
    "\n",
    "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n",
    "\n",
    "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/inference|generativeai|llm-workshop|lab5-flan-t5-xxl|flan-xxl-sagemaker-fastertransformer-s5cmd.ipynb)\n"
   ]
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": true,
    "memoryGiB": 0,
    "name": "ml.geospatial.interactive",
    "supportedImageNames": [
     "sagemaker-geospatial-v1-0"
    ],
    "vcpuNum": 0
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 54,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   },
   {
    "_defaultOrder": 55,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 56,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4de.24xlarge",
    "vcpuNum": 96
   }
  ],
  "kernelspec": {
   "display_name": "Python 3 (Data Science 3.0)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}