{ "cells": [ { "cell_type": "markdown", "id": "14a93ca6", "metadata": {}, "source": [ "# Amazon SageMaker Multi-Model Endpoints using TorchServe\n" ] }, { "cell_type": "markdown", "id": "a801d076", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "446701de", "metadata": {}, "source": [ "## Contents\n", "\n", "With Amazon SageMaker multi-model endpoints, customers can create an endpoint that seamlessly hosts up to thousands of models. These endpoints are well suited to use cases where any one of many models, which can be served from a common inference container, needs to be called on-demand and where it is acceptable for infrequently invoked models to incur some additional latency. For applications which require consistently low inference latency, a traditional endpoint is still the best choice.\n", "\n", "At a high level, Amazon SageMaker manages the loading and unloading of models for a multi-model endpoint, as they are needed. When an invocation request is made for a particular model, Amazon SageMaker routes the request to an instance assigned to that model, downloads the model artifacts from S3 onto that instance, and initiates loading of the model into the memory of the container. As soon as the loading is complete, Amazon SageMaker performs the requested invocation and returns the result. If the model is already loaded in memory on the selected instance, the downloading and loading steps are skipped, and the invocation is performed immediately.\n", "\n", "This notebook uses SageMaker notebook instance conda_python3 kernel, demonstrates how to use TorchServe on SageMaker MME. In this example, there are 3 distinct models, each with its own set of dependencies, handler implementation and model configuration." 
] }, { "cell_type": "code", "execution_count": null, "id": "9dab330f", "metadata": {}, "outputs": [], "source": [ "!python --version" ] }, { "cell_type": "code", "execution_count": null, "id": "fc665186", "metadata": {}, "outputs": [], "source": [ "!pip install numpy\n", "!pip install pillow\n", "!pip install -U sagemaker" ] }, { "cell_type": "code", "execution_count": null, "id": "c7c6f24b", "metadata": {}, "outputs": [], "source": [ "# Python Built-Ins:\n", "from datetime import datetime\n", "import os\n", "import json\n", "import logging\n", "import time\n", "\n", "# External Dependencies:\n", "import boto3\n", "from botocore.exceptions import ClientError\n", "import sagemaker\n", "from sagemaker.multidatamodel import MultiDataModel\n", "from sagemaker.model import Model\n", "\n", "sess = boto3.Session()\n", "sm = sess.client(\"sagemaker\")\n", "region = sess.region_name\n", "account = boto3.client(\"sts\").get_caller_identity().get(\"Account\")\n", "\n", "smsess = sagemaker.Session(boto_session=sess)\n", "role = sagemaker.get_execution_role()\n", "\n", "# Configuration:\n", "bucket_name = smsess.default_bucket()\n", "prefix = \"torchserve\"\n", "output_path = f\"s3://{bucket_name}/{prefix}/mme\"\n", "print(f\"account={account}, region={region}, role={role}\")" ] }, { "cell_type": "markdown", "id": "62863c41", "metadata": {}, "source": [ "## Build a BYOD TorchServe Docker container and push it to Amazon ECR\n", "You can follow this [instruction](https://medium.com/@samx81/how-to-move-docker-root-dir-to-ebs-on-aws-sagemaker-7d2560f7347d) to move Docker Root Dir to EBS if you got error \"no space left\" during building docker image." ] }, { "cell_type": "code", "execution_count": null, "id": "c20294af", "metadata": {}, "outputs": [], "source": [ "# Use SageMaker PyTorch DLC as base image\n", "baseimage = sagemaker.image_uris.retrieve(\n", " framework=\"pytorch\",\n", " region=region,\n", " py_version=\"py310\",\n", " image_scope=\"inference\",\n", " version=\"2.0.0\",\n", " instance_type=\"ml.g5.2xlarge\",\n", ")\n", "print(baseimage)" ] }, { "cell_type": "code", "execution_count": null, "id": "959fe086", "metadata": {}, "outputs": [], "source": [ "# Install our own dependencies\n", "!cat workspace/docker/Dockerfile" ] }, { "cell_type": "code", "execution_count": null, "id": "52c33c1d", "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%capture build_output\n", "\n", "reponame = \"torchserve-mme-demo\"\n", "versiontag = \"genai-0.1\"\n", "\n", "# Build our own docker image\n", "!cd workspace/docker && ./build_and_push.sh {reponame} {versiontag} {baseimage} {region} {account}" ] }, { "cell_type": "code", "execution_count": null, "id": "678bc01e", "metadata": {}, "outputs": [], "source": [ "if \"Error response from daemon\" in str(build_output):\n", " print(build_output)\n", " raise SystemExit(\"\\n\\n!!There was an error with the container build!!\")\n", "else:\n", " container = str(build_output).strip().split(\"\\n\")[-1]\n", "\n", "print(container)" ] }, { "cell_type": "markdown", "id": "761c4028", "metadata": {}, "source": [ "## Create Model Artifacts\n", "This example creates a TorchServe model artifact for each model.\n", "### Install torch-model-archiver" ] }, { "cell_type": "code", "execution_count": null, "id": "c2ced165", "metadata": {}, "outputs": [], "source": [ "!pip install torch-model-archiver" ] }, { "cell_type": "markdown", "id": "a16063ea", "metadata": {}, "source": [ "### Model 1: Segment Anything Model(SAM)\n", "A new AI model from Meta that can segment any 
object in any image with a single click; no additional training is needed. We download one of the published checkpoints below.\n", "#### Download the Segment Anything Model (SAM)" ] }, { "cell_type": "code", "execution_count": null, "id": "5375cc2d", "metadata": {}, "outputs": [], "source": [ "model_file_name = \"sam_vit_h_4b8939.pth\"\n", "download_path = f\"https://huggingface.co/spaces/abhishek/StableSAM/resolve/main/{model_file_name}\"\n", "\n", "!wget $download_path -P workspace/sam" ] }, { "cell_type": "markdown", "id": "a41318bc", "metadata": {}, "source": [ "#### Implement customized handler\n", "This step can be skipped if your model uses a [TorchServe default handler](https://github.com/pytorch/serve/blob/ffa6847393cb7c36ae0122598152ca4614fe21f1/docs/default_handlers.md?plain=1#L1). Here we follow the [TorchServe instructions](https://github.com/pytorch/serve/blob/ffa6847393cb7c36ae0122598152ca4614fe21f1/docs/custom_service.md?plain=1#L10) to create a customized handler for this model." ] }, { "cell_type": "code", "execution_count": null, "id": "5edd0a80", "metadata": {}, "outputs": [], "source": [ "!cat workspace/sam/custom_handler.py" ] }, { "cell_type": "markdown", "id": "ab2439db", "metadata": {}, "source": [ "#### Configure the model" ] }, { "cell_type": "code", "execution_count": null, "id": "2118c1fc", "metadata": {}, "outputs": [], "source": [ "!cat workspace/sam/model-config.yaml" ] }, { "cell_type": "markdown", "id": "042828b0", "metadata": {}, "source": [ "#### Create and upload the `sam.tar.gz` file" ] }, { "cell_type": "code", "execution_count": null, "id": "57652cd5", "metadata": {}, "outputs": [], "source": [ "!torch-model-archiver --model-name sam --version 1.0 --serialized-file workspace/sam/sam_vit_h_4b8939.pth --handler workspace/sam/custom_handler.py --config-file workspace/sam/model-config.yaml --archive-format no-archive\n", "!cd sam && tar cvzf sam.tar.gz ." ] }, { "cell_type": "code", "execution_count": null, "id": "11924275", "metadata": {}, "outputs": [], "source": [ "!cd sam && aws s3 cp sam.tar.gz {output_path}/sam.tar.gz" ] }, { "cell_type": "markdown", "id": "88840dd0", "metadata": {}, "source": [ "### Model 2: Stable Diffusion Inpainting (SD)\n", "#### Import and Save Stable Diffusion Model" ] }, { "cell_type": "code", "execution_count": null, "id": "aa1bdc87", "metadata": {}, "outputs": [], "source": [ "!pip install -U torch diffusers==0.13.0 transformers" ] }, { "cell_type": "code", "execution_count": null, "id": "d002bc26", "metadata": {}, "outputs": [], "source": [ "import diffusers\n", "import torch\n", "import transformers\n", "\n", "pipeline = diffusers.StableDiffusionInpaintPipeline.from_pretrained(\n", "    \"stabilityai/stable-diffusion-2-inpainting\", torch_dtype=torch.float16\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "407d31d1", "metadata": {}, "outputs": [], "source": [ "sd_dir = \"workspace/sd/model\"\n", "pipeline.save_pretrained(sd_dir)" ] }, { "cell_type": "markdown", "id": "6015e45a", "metadata": {}, "source": [ "#### Implement customized handler\n", "This step can be skipped if your model uses a [TorchServe default handler](https://github.com/pytorch/serve/blob/ffa6847393cb7c36ae0122598152ca4614fe21f1/docs/default_handlers.md?plain=1#L1). Here we follow the [TorchServe instructions](https://github.com/pytorch/serve/blob/ffa6847393cb7c36ae0122598152ca4614fe21f1/docs/custom_service.md?plain=1#L10) to create a customized handler for this model."
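, "\n", "\n", "The next cell shows the actual handler for this model. As a general shape (a minimal, illustrative sketch - not this model's exact implementation), a TorchServe custom handler derived from `BaseHandler` looks roughly like:\n", "\n", "```python\n", "# Minimal sketch of a TorchServe custom handler; names and logic are illustrative.\n", "from ts.torch_handler.base_handler import BaseHandler\n", "\n", "\n", "class MyHandler(BaseHandler):\n", "    def initialize(self, context):\n", "        # Called once when the model is loaded; context points at the unpacked model directory.\n", "        self.model_dir = context.system_properties.get('model_dir')\n", "        self.initialized = True\n", "\n", "    def preprocess(self, data):\n", "        # 'data' is a list of requests; decode each payload here.\n", "        return [row.get('body') for row in data]\n", "\n", "    def inference(self, inputs):\n", "        # Run the model on the preprocessed inputs.\n", "        return inputs\n", "\n", "    def postprocess(self, outputs):\n", "        # Must return a list with one entry per request.\n", "        return outputs\n", "```"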
] }, { "cell_type": "code", "execution_count": null, "id": "af93d51a", "metadata": {}, "outputs": [], "source": [ "!cat workspace/sd/custom_handler.py" ] }, { "cell_type": "markdown", "id": "9c60a0bc", "metadata": {}, "source": [ "#### Config model" ] }, { "cell_type": "code", "execution_count": null, "id": "dd969347", "metadata": {}, "outputs": [], "source": [ "!cat workspace/sd/model-config.yaml" ] }, { "cell_type": "markdown", "id": "f1800fed", "metadata": {}, "source": [ "#### Create `sd.tar.gz` file" ] }, { "cell_type": "code", "execution_count": null, "id": "c1f534a7", "metadata": { "scrolled": true }, "outputs": [], "source": [ "!torch-model-archiver --model-name sd --version 1.0 --handler workspace/sd/custom_handler.py --extra-files workspace/sd/model --config-file workspace/sam/model-config.yaml --archive-format no-archive\n", "!cd sd && tar cvzf sd.tar.gz ." ] }, { "cell_type": "markdown", "id": "b505ed13", "metadata": {}, "source": [ "### Model 3: Large Mask In Painting Model (Lama)\n", "#### Download Pre-Trained Model" ] }, { "cell_type": "code", "execution_count": null, "id": "7baf9342", "metadata": {}, "outputs": [], "source": [ "!pip install wldhx.yadisk-direct --quiet" ] }, { "cell_type": "code", "execution_count": null, "id": "8d850797", "metadata": {}, "outputs": [], "source": [ "!cd workspace/lama && curl -L $(yadisk-direct https://disk.yandex.ru/d/ouP6l8VJ0HpMZg) -o big-lama.zip && unzip big-lama.zip -d model" ] }, { "cell_type": "markdown", "id": "010d6868", "metadata": {}, "source": [ "#### Clone Lama Repo" ] }, { "cell_type": "code", "execution_count": null, "id": "4375ae10", "metadata": {}, "outputs": [], "source": [ "!cd workspace/lama && git clone https://github.com/advimman/lama.git lama-repo" ] }, { "cell_type": "markdown", "id": "4b3606ca", "metadata": {}, "source": [ "#### Implement customized handler\n", "This step can be skipped if your model uses [TorchServe default handler](https://github.com/pytorch/serve/blob/ffa6847393cb7c36ae0122598152ca4614fe21f1/docs/default_handlers.md?plain=1#L1). Here we follow [TorchServe instruction](https://github.com/pytorch/serve/blob/ffa6847393cb7c36ae0122598152ca4614fe21f1/docs/custom_service.md?plain=1#L10) to create a customized handler for this model." ] }, { "cell_type": "code", "execution_count": null, "id": "c1b0fef6", "metadata": {}, "outputs": [], "source": [ "!cat workspace/lama/custom_handler.py" ] }, { "cell_type": "markdown", "id": "ac36ff91", "metadata": {}, "source": [ "#### Config model" ] }, { "cell_type": "code", "execution_count": null, "id": "62adbd17", "metadata": {}, "outputs": [], "source": [ "!cat workspace/lama/model-config.yaml" ] }, { "cell_type": "markdown", "id": "cc342d1c", "metadata": {}, "source": [ "#### Create `lama.tar.gz` file" ] }, { "cell_type": "code", "execution_count": null, "id": "70af91b6", "metadata": {}, "outputs": [], "source": [ "!torch-model-archiver --model-name lama --version 1.0 --handler workspace/lama/custom_handler.py --extra-files workspace/lama/model,workspace/lama/lama-repo --config-file workspace/sam/model-config.yaml --archive-format no-archive\n", "!cd lama && tar cvzf lama.tar.gz ." 
] }, { "cell_type": "markdown", "id": "9acd33e1", "metadata": {}, "source": [ "## Create the Multi-Model Endpoint with the SageMaker SDK" ] }, { "cell_type": "markdown", "id": "7d7c9f75", "metadata": {}, "source": [ "### Create the Amazon SageMaker MultiDataModel entity\n", "\n", "We create the multi-model endpoint using the [```MultiDataModel```](https://sagemaker.readthedocs.io/en/stable/api/inference/multi_data_model.html) class.\n", "\n", "You can create a MultiDataModel by directly passing in a `sagemaker.model.Model` object - in which case, the Endpoint will inherit information about the image to use, as well as any environmental variables, network isolation, etc., once the MultiDataModel is deployed.\n", "\n", "In addition, a MultiDataModel can also be created without explicitly passing a `sagemaker.model.Model` object. Please refer to the documentation for additional details." ] }, { "cell_type": "code", "execution_count": null, "id": "f50efe26", "metadata": {}, "outputs": [], "source": [ "# This is where our MME will read models from on S3.\n", "multi_model_s3uri = output_path\n", "\n", "print(multi_model_s3uri)\n", "model = Model(\n", " model_data=f\"{multi_model_s3uri}/sam.tar.gz\",\n", " image_uri=container,\n", " role=role,\n", " sagemaker_session=smsess,\n", " env={\"TF_ENABLE_ONEDNN_OPTS\": \"0\"},\n", ")\n", "\n", "mme = MultiDataModel(\n", " name=\"torchserve-mme-genai-\" + datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\"),\n", " model_data_prefix=multi_model_s3uri,\n", " model=model,\n", " sagemaker_session=smsess,\n", ")\n", "print(mme)" ] }, { "cell_type": "markdown", "id": "f6abb202", "metadata": {}, "source": [ "### Deploy the Multi-Model Endpoint\n", "\n", "You need to consider the appropriate instance type and number of instances for the projected prediction workload across all the models you plan to host behind your multi-model endpoint. The number and size of the individual models will also drive memory requirements." ] }, { "cell_type": "code", "execution_count": null, "id": "a86e8be2", "metadata": {}, "outputs": [], "source": [ "try:\n", " predictor.delete_endpoint(delete_endpoint_config=True)\n", " print(\"Deleting previous endpoint...\")\n", " time.sleep(10)\n", "except (NameError, ClientError):\n", " pass\n", "\n", "mme.deploy(\n", " initial_instance_count=1,\n", " instance_type=\"ml.g5.2xlarge\",\n", " serializer=sagemaker.serializers.JSONSerializer(),\n", " deserializer=sagemaker.deserializers.JSONDeserializer(),\n", ")" ] }, { "cell_type": "markdown", "id": "fbe8e6dc", "metadata": {}, "source": [ "### Our endpoint has launched! Let's look at what models are available to the endpoint!\n", "\n", "By 'available', what we mean is, what model artifacts are currently stored under the S3 prefix we defined when setting up the `MultiDataModel` above i.e. `model_data_prefix`.\n", "\n", "Currently, since we only have one artifact (i.e. `sam.tar.gz` files) stored under our defined S3 prefix." 
] }, { "cell_type": "code", "execution_count": null, "id": "1a174381", "metadata": {}, "outputs": [], "source": [ "# Only sam.tar.gz visible!\n", "list(mme.list_models())" ] }, { "cell_type": "markdown", "id": "e42ae3e9", "metadata": {}, "source": [ "### Dynamically deploying models to the endpoint\n", "\n", "The `.add_model()` method of the `MultiDataModel` will copy over our model artifacts from where they were initially stored, by training, to where our endpoint will source model artifacts for inference requests.\n", "\n", "Note that we can continue using this method, as shown below, to dynamically deploy more models to our live endpoint as required!\n", "\n", "`model_data_source` refers to the location of our model artifact (i.e. where it was deposited on S3 after training completed)\n", "\n", "`model_data_path` is the **relative** path to the S3 prefix we specified above (i.e. `model_data_prefix`) where our endpoint will source models for inference requests. Since this is a **relative** path, we can simply pass the name of what we wish to call the model artifact at inference time." ] }, { "cell_type": "code", "execution_count": null, "id": "52079e38", "metadata": {}, "outputs": [], "source": [ "models = [\"sd/sd.tar.gz\", \"lama/lama.tar.gz\"]\n", "for model in models:\n", " mme.add_model(model_data_source=model)" ] }, { "cell_type": "markdown", "id": "4d1470f7", "metadata": {}, "source": [ "### Our models are ready to invoke!\n", "\n", "We can see that the S3 prefix we specified when setting up `MultiDataModel` now has model artifacts listed. As such, the endpoint can now serve up inference requests for these models." ] }, { "cell_type": "code", "execution_count": null, "id": "26261ce4", "metadata": {}, "outputs": [], "source": [ "list(mme.list_models())" ] }, { "cell_type": "markdown", "id": "289d28e7", "metadata": {}, "source": [ "## Get predictions from the endpoint\n", "\n", "Recall that `mme.deploy()` returns a [Real Time Predictor](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/predictor.py#L35) that we saved in a variable called `predictor`.\n", "\n", "That `predictor` can now be used as usual to request inference - but specifying which model to call:" ] }, { "cell_type": "code", "execution_count": null, "id": "f24a0bfb", "metadata": {}, "outputs": [], "source": [ "predictor = sagemaker.predictor.Predictor(endpoint_name=mme.endpoint_name, sagemaker_session=smsess)\n", "print(predictor)" ] }, { "cell_type": "markdown", "id": "c83ea4e5", "metadata": {}, "source": [ "### Model Segment Anything Inference Request" ] }, { "cell_type": "code", "execution_count": null, "id": "a3593456", "metadata": {}, "outputs": [], "source": [ "# sam payload\n", "import base64\n", "import json\n", "import io\n", "import numpy as np\n", "from PIL import Image\n", "\n", "\n", "def encode_image(img):\n", " # Convert the image to bytes\n", " with io.BytesIO() as output:\n", " img.save(output, format=\"JPEG\")\n", " img_bytes = output.getvalue()\n", "\n", " return base64.b64encode(img_bytes).decode(\"utf-8\")\n", "\n", "\n", "img_file = \"workspace/test_data/sample1.png\"\n", "img_bytes = None\n", "with Image.open(img_file) as f:\n", " img_bytes = encode_image(f)\n", "\n", "gen_args = json.dumps(dict(point_coords=[750, 500], point_labels=1, dilate_kernel_size=15))\n", "\n", "payload = json.dumps({\"image\": img_bytes, \"gen_args\": gen_args}).encode(\"utf-8\")\n", "\n", "response = predictor.predict(data=payload, target_model=\"/sam.tar.gz\")\n", "encoded_masks_string = 
json.loads(response.decode(\"utf-8\"))[\"generated_image\"]\n", "base64_bytes_masks = base64.b64decode(encoded_masks_string)\n", "\n", "with Image.open(io.BytesIO(base64_bytes_masks)) as f:\n", "    generated_image_rgb = f.convert(\"RGB\")\n", "    generated_image_rgb.show()" ] }, { "cell_type": "markdown", "id": "7c8c69b9", "metadata": {}, "source": [ "### Stable Diffusion Inpainting Model Inference Request" ] }, { "cell_type": "code", "execution_count": null, "id": "b8a07f6b", "metadata": {}, "outputs": [], "source": [ "# sd payload\n", "import base64\n", "import json\n", "import io\n", "import numpy as np\n", "from PIL import Image\n", "\n", "\n", "def encode_image(img):\n", "    # Convert the image to bytes\n", "    with io.BytesIO() as output:\n", "        img.save(output, format=\"JPEG\")\n", "        img_bytes = output.getvalue()\n", "\n", "    return base64.b64encode(img_bytes).decode(\"utf-8\")\n", "\n", "\n", "img_file = \"workspace/test_data/sample1.png\"\n", "img_bytes = None\n", "with Image.open(img_file) as f:\n", "    img_bytes = encode_image(f)\n", "\n", "mask_file = \"workspace/test_data/sample1_mask.jpg\"\n", "mask_bytes = None\n", "with Image.open(mask_file) as f:\n", "    mask_bytes = encode_image(f)\n", "\n", "prompt = \"a teddy bear on a bench\"\n", "nprompt = \"ugly\"\n", "gen_args = json.dumps(dict(num_inference_steps=50, guidance_scale=10, seed=1))\n", "\n", "payload = json.dumps(\n", "    {\n", "        \"image\": img_bytes,\n", "        \"mask_image\": mask_bytes,\n", "        \"prompt\": prompt,\n", "        \"negative_prompt\": nprompt,\n", "        \"gen_args\": gen_args,\n", "    }\n", ").encode(\"utf-8\")\n", "\n", "response = predictor.predict(data=payload, target_model=\"/sd.tar.gz\")\n", "encoded_masks_string = json.loads(response.decode(\"utf-8\"))[\"generated_image\"]\n", "base64_bytes_masks = base64.b64decode(encoded_masks_string)\n", "with Image.open(io.BytesIO(base64_bytes_masks)) as f:\n", "    generated_image_rgb = f.convert(\"RGB\")\n", "    generated_image_rgb.show()" ] }, { "cell_type": "markdown", "id": "29c220b2", "metadata": {}, "source": [ "### Large Mask Inpainting Model (LaMa) Inference Request" ] }, { "cell_type": "code", "execution_count": null, "id": "ac04de2d", "metadata": {}, "outputs": [], "source": [ "# lama payload (reuses the prompt, nprompt and gen_args variables defined in the SD request above)\n", "import base64\n", "import json\n", "import io\n", "import numpy as np\n", "from PIL import Image\n", "\n", "\n", "def encode_image(img):\n", "    # Convert the image to bytes\n", "    with io.BytesIO() as output:\n", "        img.save(output, format=\"JPEG\")\n", "        img_bytes = output.getvalue()\n", "\n", "    return base64.b64encode(img_bytes).decode(\"utf-8\")\n", "\n", "\n", "img_file = \"workspace/test_data/sample1.png\"\n", "img_bytes = None\n", "with Image.open(img_file) as f:\n", "    img_bytes = encode_image(f)\n", "\n", "mask_file = \"workspace/test_data/sample1_mask.jpg\"\n", "mask_bytes = None\n", "with Image.open(mask_file) as f:\n", "    mask_bytes = encode_image(f)\n", "\n", "payload = json.dumps(\n", "    {\n", "        \"image\": img_bytes,\n", "        \"mask_image\": mask_bytes,\n", "        \"prompt\": prompt,\n", "        \"negative_prompt\": nprompt,\n", "        \"gen_args\": gen_args,\n", "    }\n", ").encode(\"utf-8\")\n", "\n", "response = predictor.predict(data=payload, target_model=\"/lama.tar.gz\")\n", "encoded_masks_string = json.loads(response.decode(\"utf-8\"))[\"generated_image\"]\n", "base64_bytes_masks = base64.b64decode(encoded_masks_string)\n", "with Image.open(io.BytesIO(base64_bytes_masks)) as f:\n", "    generated_image_rgb = f.convert(\"RGB\")\n", "    generated_image_rgb.show()" ] }, { "cell_type": "markdown", "id": "2880e078", 
"metadata": {}, "source": [ "## Updating a model\n", "\n", "To update a model, you would follow the same approach as above and add it as a new model. For example, `ModelA-2`.\n", "\n", "You should avoid overwriting model artifacts in Amazon S3, because the old version of the model might still be loaded in the endpoint's running container(s) or on the storage volume of instances on the endpoint: This would lead invocations to still use the old version of the model.\n", "\n", "Alternatively, you could stop the endpoint and re-deploy a fresh set of models." ] }, { "cell_type": "markdown", "id": "5bb05ed2", "metadata": {}, "source": [ "## Clean up\n", "\n", "Endpoints should be deleted when no longer in use, since (per the [SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/)) they're billed by time deployed. Here we'll also delete the endpoint configuration - to keep things tidy." ] }, { "cell_type": "code", "execution_count": null, "id": "deee6c61", "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint(delete_endpoint_config=True)" ] }, { "cell_type": "markdown", "id": "623248bc", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/inference|torchserve|mme-gpu|torchserve_multi_model_endpoint.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" } }, "nbformat": 4, "nbformat_minor": 5 }