{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "82fce004-0da0-4337-becb-a7ff859502ca", "metadata": {}, "source": [ "# Serve Falcon 40B model with Amazon SageMaker Hosting" ] }, { "attachments": {}, "cell_type": "markdown", "id": "51897f0b-1059-4131-bc2b-165707a984d5", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.\n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "---" ] }, { "attachments": {}, "cell_type": "markdown", "id": "20b832e6-641a-443c-b533-2396e19b01bc", "metadata": {}, "source": [ "In this example we walk through how to deploy and perform inference on the **Falcon 40B model** using the **Large Model Inference(LMI)** container provided by AWS using **DJL Serving** and **DeepSpeed**. The Falcon 40B model is a casual decoder model that was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances. Because this is a large language model (LLM) that does not fit on a single GPU, we will use an 'ml.g5.24xlarge\" instance which has **4** GPUs.\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9aad5b38-cb9c-4df9-ada8-84d0bed0c91a", "metadata": {}, "source": [ "## Setup" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4abbea2a-b6c8-4106-ae69-17c877a00f88", "metadata": {}, "source": [ "Installs the dependencies required to package the model and run inferences using Amazon SageMaker. Update SageMaker, boto3 etc" ] }, { "cell_type": "code", "execution_count": null, "id": "bd50d8cb-1e06-480a-afb9-a5cc770e3c5f", "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install sagemaker boto3 --upgrade --quiet" ] }, { "cell_type": "code", "execution_count": null, "id": "67b43806-e267-4259-8dd8-617d4feb839d", "metadata": { "tags": [] }, "outputs": [], "source": [ "import sagemaker\n", "import jinja2\n", "from sagemaker import image_uris\n", "import boto3\n", "import os\n", "import time\n", "import json\n", "from pathlib import Path\n", "from sagemaker.utils import name_from_base" ] }, { "attachments": {}, "cell_type": "markdown", "id": "dc80a142-7f21-48ca-9a44-1462cd82004c", "metadata": {}, "source": [ "## Imports and variables" ] }, { "cell_type": "code", "execution_count": null, "id": "ac2cb0e7-e646-4a8b-b9b5-f25c08ec6b9c", "metadata": { "tags": [] }, "outputs": [], "source": [ "role = sagemaker.get_execution_role() # execution role for the endpoint\n", "sess = sagemaker.session.Session() # sagemaker session for interacting with different AWS APIs\n", "bucket = sess.default_bucket() # bucket to house artifacts\n", "model_bucket = sess.default_bucket() # bucket to house artifacts\n", "s3_code_prefix_deepspeed = \"hf-large-model-djl-/code_falcon40b/deepspeed\" # folder within bucket where code artifact will go\n", "\n", "region = sess._region_name\n", "account_id = sess.account_id()\n", "\n", "s3_client = boto3.client(\"s3\")\n", "sm_client = boto3.client(\"sagemaker\")\n", "smr_client = boto3.client(\"sagemaker-runtime\")\n", "\n", "jinja_env = jinja2.Environment()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8a6bdc38-8552-4d2a-b1fe-aeacc229b85c", "metadata": { "tags": [] }, "source": [ "### 1. 
{ "attachments": {}, "cell_type": "markdown", "id": "8a6bdc38-8552-4d2a-b1fe-aeacc229b85c", "metadata": { "tags": [] }, "source": [ "### 1. Create SageMaker compatible model artifacts" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1a7f3f1f-d82e-403c-b966-a1a495a26d5d", "metadata": {}, "source": [ "To prepare our model for deployment to a SageMaker endpoint for hosting, we need to prepare a few things for SageMaker and our container. We will use a local folder as the location of these files, including **serving.properties**, which defines parameters for the LMI container, and **requirements.txt**, which details the dependencies to install." ] }, { "cell_type": "code", "execution_count": null, "id": "ac8d9604", "metadata": { "tags": [] }, "outputs": [], "source": [ "!mkdir -p code_falcon40b_deepspeed" ] }, { "cell_type": "code", "execution_count": null, "id": "e9f79b23-fd1d-4923-8179-50ad4ee60b52", "metadata": { "tags": [] }, "outputs": [], "source": [ "# define a variable to contain the s3url of the location that has the model\n", "pretrained_model_location = f\"s3://{bucket}/models/falcon40b/\"\n", "print(f\"Pretrained model will be downloaded from ---- > {pretrained_model_location}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "466c1b6f-213b-4540-b65c-c2c2e029e283", "metadata": {}, "source": [ "In the **serving.properties** file, we define the **engine** to use and the **model** to host. Note the **tensor_parallel_degree** parameter, which is also required in this scenario. We will use **tensor parallelism** to divide the model into multiple parts because no single GPU has enough memory for the entire model. In this case we will use an `ml.g5.24xlarge` instance, which provides **4** GPUs. Be careful not to specify a value larger than the number of GPUs the instance provides, or your deployment will fail. The commented-out `option.s3url` line shows how you would instead point the container at weights staged in your own S3 bucket (such as the `pretrained_model_location` defined above)." ] }, { "cell_type": "code", "execution_count": null, "id": "212382fc", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%writefile ./code_falcon40b_deepspeed/serving.properties\n", "engine=DeepSpeed\n", "option.model_id=tiiuae/falcon-40b\n", "option.tensor_parallel_degree=4\n", "#option.s3url = {{s3url}}" ] }, { "cell_type": "code", "execution_count": null, "id": "8db0e5dc", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%writefile ./code_falcon40b_deepspeed/requirements.txt\n", "einops\n", "git+https://github.com/lanking520/DeepSpeed.git@falcon" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ba7a2900-ef24-4426-bc5c-fd85c5ad269a", "metadata": {}, "source": [ "### 2. Create a model.py with custom inference code" ] }, { "attachments": {}, "cell_type": "markdown", "id": "62b8fbf7-608d-4223-b943-83dcd87ef335", "metadata": {}, "source": [ "SageMaker allows you to bring your own script for inference. Here we create our **model.py** file with the appropriate code for the Falcon 40B model."
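, "\n", "\n", "DJL Serving invokes the `handle` function for each request. From the handler below, the request body is JSON with a `text` prompt and a `properties` dictionary of generation kwargs; a minimal sketch of the contract (values are illustrative):\n", "\n", "```python\n", "# Sketch of the request/response shape implemented by model.py\n", "request = {\n", "    \"text\": \"What is the purpose of life?\",  # prompt for the text-generation pipeline\n", "    \"properties\": {\"max_length\": 150, \"do_sample\": True},  # forwarded as generation kwargs\n", "}\n", "# The handler returns {\"outputs\": ...} containing the pipeline's generations.\n", "```"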
] }, { "cell_type": "code", "execution_count": null, "id": "b7b228ee", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%writefile ./code_falcon40b_deepspeed/model.py\n", "from djl_python import Input, Output\n", "import os\n", "import torch\n", "from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer\n", "from typing import Any, Dict, Tuple\n", "import deepspeed\n", "import warnings\n", "\n", "predictor = None\n", "\n", "\n", "def get_model(properties):\n", " model_name = properties[\"model_id\"]\n", " local_rank = int(os.getenv(\"LOCAL_RANK\", \"0\"))\n", " model = AutoModelForCausalLM.from_pretrained(\n", " model_name, low_cpu_mem_usage=True, trust_remote_code=True, torch_dtype=torch.bfloat16\n", " )\n", " model = deepspeed.init_inference(model, mp_size=properties[\"tensor_parallel_degree\"])\n", " tokenizer = AutoTokenizer.from_pretrained(model_name)\n", " generator = pipeline(\n", " task=\"text-generation\", model=model, tokenizer=tokenizer, device=local_rank\n", " )\n", " return generator\n", "\n", "\n", "def handle(inputs: Input) -> None:\n", " global predictor\n", " if not predictor:\n", " predictor = get_model(inputs.get_properties())\n", "\n", " if inputs.is_empty():\n", " # Model server makes an empty call to warmup the model on startup\n", " return None\n", " data = inputs.get_as_json()\n", " text = data[\"text\"]\n", " generation_kwargs = data[\"properties\"]\n", " outputs = predictor(text, **generation_kwargs)\n", " result = {\"outputs\": outputs}\n", " return Output().add(result)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c53f3ac7-dd27-4e4b-b5a5-d71cb8bc0733", "metadata": {}, "source": [ "### 3. Create the Tarball and then upload to S3 location\n", "Next, we will package our artifacts as `*.tar.gz` files for uploading to S3 for SageMaker to use for deployment" ] }, { "cell_type": "code", "execution_count": null, "id": "0ae6f030", "metadata": { "tags": [] }, "outputs": [], "source": [ "!rm -f model.tar.gz\n", "!rm -rf code_falcon40b_deepspeed/.ipynb_checkpoints\n", "!tar czvf model.tar.gz -C code_falcon40b_deepspeed .\n", "s3_code_artifact_deepspeed = sess.upload_data(\"model.tar.gz\", bucket, s3_code_prefix_deepspeed)\n", "print(f\"S3 Code or Model tar for deepspeed uploaded to --- > {s3_code_artifact_deepspeed}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c9cbc98f-2991-4fed-ae7b-5d801b15ac18", "metadata": {}, "source": [ "### 4. Define a serving container, SageMaker Model and SageMaker endpoint\n", "Now that we have uploaded the model artifacts to S3, we can create a SageMaker endpoint.\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "81489ba8-c6cc-4c44-a9d7-eda16f49394b", "metadata": { "tags": [] }, "source": [ "#### Define the serving container\n", "Here we define the container to use for the model for inference. We will be using SageMaker's Large Model Inference(LMI) container using DeepSpeed. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "ae731378-f75d-41bc-ba67-cabdf4ca6cde", "metadata": { "tags": [] }, "outputs": [], "source": [ "# inference_image_uri = f\"{account_id}.dkr.ecr.{region}.amazonaws.com/djl-ds:latest\"\n", "inference_image_uri = (\n", " f\"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0.22.1-deepspeed0.9.2-cu118\"\n", ")\n", "print(f\"Image going to be used is ---- > {inference_image_uri}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7648de8f-351d-4f1b-96a7-899b413ff8f4", "metadata": {}, "source": [ "#### Create SageMaker model, endpoint configuration and endpoint.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "327a864e", "metadata": { "tags": [] }, "outputs": [], "source": [ "model_name_ds = name_from_base(f\"falcon40b-model-ds\")\n", "print(model_name_ds)" ] }, { "cell_type": "code", "execution_count": null, "id": "178c74e9", "metadata": { "tags": [] }, "outputs": [], "source": [ "create_model_response = sm_client.create_model(\n", " ModelName=model_name_ds,\n", " ExecutionRoleArn=role,\n", " PrimaryContainer={\"Image\": inference_image_uri, \"ModelDataUrl\": s3_code_artifact_deepspeed},\n", ")\n", "model_arn = create_model_response[\"ModelArn\"]\n", "\n", "print(f\"Created Model: {model_arn}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "f37baaaa", "metadata": { "tags": [] }, "outputs": [], "source": [ "model_name = model_name_ds\n", "print(f\"Building EndpointConfig and Endpoint for: {model_name}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "3a6a0a9c-d2be-4548-a78c-4ad2cbab9087", "metadata": { "tags": [] }, "outputs": [], "source": [ "endpoint_config_name = f\"{model_name}-config\"\n", "endpoint_name = f\"{model_name}-endpoint\"\n", "\n", "endpoint_config_response = sm_client.create_endpoint_config(\n", " EndpointConfigName=endpoint_config_name,\n", " ProductionVariants=[\n", " {\n", " \"VariantName\": \"variant1\",\n", " \"ModelName\": model_name,\n", " \"InstanceType\": \"ml.g5.24xlarge\",\n", " \"InitialInstanceCount\": 1,\n", " \"ModelDataDownloadTimeoutInSeconds\": 3600,\n", " \"ContainerStartupHealthCheckTimeoutInSeconds\": 3600,\n", " # \"VolumeSizeInGB\": 512\n", " },\n", " ],\n", ")\n", "endpoint_config_response" ] }, { "cell_type": "code", "execution_count": null, "id": "f34c6563-75bc-4441-b14a-0ea3694ce61f", "metadata": { "tags": [] }, "outputs": [], "source": [ "create_endpoint_response = sm_client.create_endpoint(\n", " EndpointName=f\"{endpoint_name}\", EndpointConfigName=endpoint_config_name\n", ")\n", "print(f\"Created Endpoint: {create_endpoint_response['EndpointArn']}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "11e348ac-b9b4-4998-b9bd-51d0d07f51c6", "metadata": { "tags": [] }, "outputs": [], "source": [ "import time\n", "\n", "resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n", "status = resp[\"EndpointStatus\"]\n", "print(\"Status: \" + status)\n", "\n", "while status == \"Creating\":\n", " time.sleep(60)\n", " resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n", " status = resp[\"EndpointStatus\"]\n", " print(\"Status: \" + status)\n", "\n", "print(\"Arn: \" + resp[\"EndpointArn\"])\n", "print(\"Status: \" + status)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e5e6750d-d680-42f3-b879-005c4c006add", "metadata": {}, "source": [ "### Run Inference\n", "\n", "Large models such as Falcon have very high accelerator memory footprint. 
{ "attachments": {}, "cell_type": "markdown", "id": "e5e6750d-d680-42f3-b879-005c4c006add", "metadata": {}, "source": [ "### Run Inference\n", "\n", "Large models such as Falcon have a very high accelerator memory footprint. Thus, a very large input payload or generating a long output can cause out-of-memory errors. The inference examples below are calibrated such that they will work on the `ml.g5.24xlarge` instance within the\n", "SageMaker response time limit of 60 seconds. If you find that increasing the input length or generation length leads to CUDA out-of-memory errors, we recommend that you try one of the following solutions:\n", "* Use 8-bit quantization. In the `model.py` script, you can enable `dtype=torch.int8` in the call to `deepspeed.init_inference`. This will reduce the memory footprint of the model on the GPUs, allowing for larger input and generation sizes.\n", "  * Using 8-bit quantization may result in lower quality generated output.\n", "* Deploy to an instance with more GPUs, and/or GPUs with more memory. The `ml.g5.48xlarge`, `ml.p4d.24xlarge`, and `ml.p4de.24xlarge` instances are all good options here.\n", "\n", "When attempting to generate more tokens, you might run into issues with the SageMaker runtime client timing out after 60 seconds. To get around this issue, we recommend that you check out our example for Large Language Models with streaming via pagination.\n", "You can find that example [here](../lab6-stream-with-pagination/stream_pagination_lmi.ipynb). When using streaming, you still have to be conscious of memory constraints.\n", "\n", "In the following example inference requests, we limit the sequence length such that we return a response within 60 seconds and don't exceed the available GPU memory.\n", "\n", "You can pass additional generation arguments as part of the properties dictionary in the request (e.g. temperature, top_k, etc.)." ] }, { "cell_type": "code", "execution_count": null, "id": "401387f9-1f8e-440d-bd4c-11aa6714b975", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%time\n", "\n", "data = {\n", "    \"text\": \"What is the purpose of life?\",\n", "    \"properties\": {\n", "        \"min_length\": 100,\n", "        \"max_length\": 150,\n", "        \"do_sample\": True,\n", "    },\n", "}\n", "\n", "response_model = smr_client.invoke_endpoint(\n", "    EndpointName=endpoint_name,\n", "    Body=json.dumps(data),\n", "    ContentType=\"application/json\",\n", ")\n", "\n", "response_model[\"Body\"].read().decode(\"utf8\")" ] }, { "cell_type": "code", "execution_count": null, "id": "c4484728", "metadata": {}, "outputs": [], "source": [ "start_time = time.time()\n", "\n", "data = {\n", "    \"text\": \"What is Love?\",\n", "    \"properties\": {\n", "        \"min_length\": 100,\n", "        \"max_length\": 150,\n", "        \"do_sample\": True,\n", "    },\n", "}\n", "\n", "# Repeatedly invoke the endpoint for 5 minutes to exercise the deployment\n", "while (time.time() - start_time) < 300:  # 300 seconds = 5 minutes\n", "    response_model = smr_client.invoke_endpoint(\n", "        EndpointName=endpoint_name,\n", "        Body=json.dumps(data),\n", "        ContentType=\"application/json\",\n", "    )\n", "\n", "    print(\"Loop restarting - answer: \" + response_model[\"Body\"].read().decode(\"utf8\"))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2aeec2f9-51f5-406b-b8cb-615ac67a709c", "metadata": {}, "source": [ "### Clean Up" ] }, { "cell_type": "code", "execution_count": null, "id": "b3744ca9-d6cf-4aa4-8693-4fd43a71e4fe", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Delete the endpoint\n", "sm_client.delete_endpoint(EndpointName=endpoint_name)" ] }, { "cell_type": "code", "execution_count": null, "id": "c05cfff2-a56d-4fe4-8915-74e44b68715e", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Even if the endpoint failed, we still want to delete the endpoint config and the model\n",
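"# Note: deleting an endpoint does not automatically delete its endpoint config or model\n",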
"sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n", "sm_client.delete_model(ModelName=model_name)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6da96321-2a5b-4482-8dd8-ba403dc4bb0a", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/inference%7Cgenerativeai%7Cllm-workshop%7Clab10-falcon-40b-and-7b%7Cfalcon-40b-deepspeed.ipynb)\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, 
"category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, 
"memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 
48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science 2.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-38" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 5 }