# Deploy a pretrained HuggingFace InstructPix2Pix (Diffuser) Model into a SageMaker Endpoint

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

---

HuggingFace Diffusers provides pretrained vision and audio diffusion models and serves as a modular toolbox for inference and training.

Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models. The SageMaker Python SDK provides open source APIs and containers that make it easy to train and deploy models in Amazon SageMaker with several machine learning and deep learning frameworks.

HuggingFace models that have a corresponding HuggingFace task can be deployed to SageMaker as a `HuggingFaceModel` object, as described in [the documentation](https://huggingface.co/docs/sagemaker/inference). However, not every model has a corresponding HuggingFace task.

In this notebook, we deploy a model from the HuggingFace hub by taking its files directly from the Git repository, an approach that works for any model on the hub. As an example, we use a pretrained instruction-based image editing model (https://huggingface.co/timbrooks/instruct-pix2pix). To do this, we implement a custom entry point, `inference.py`, that loads the model in a SageMaker container and handles inference requests containing multimodal input (a text prompt and image data).

We will execute the following steps:

- Download a HuggingFace model to the local file system with Git LFS.
- Tar-gzip the model and config files, and upload `model.tar.gz` to an S3 bucket.
- Deploy the model to a SageMaker Endpoint and make an inference request.
- Optionally, use inference recommender to check how different instance types perform as an endpoint.
- Optionally, cleanup.

Note that this notebook was adapted from another SageMaker example, [Pretrained PyTorch BERT model for sentiment analysis](https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/pytorch_deploy_pretrained_bert_model/pytorch_deploy_pretrained_bert_model.ipynb). Our example differs in that the model does not have to be loaded into memory with the HuggingFace library before it is saved and sent to a SageMaker Endpoint for inference.

For a step by step, hands-on learning experience about deploying other large generative AI models, please check: https://catalog.us-east-1.prod.workshops.aws/workshops/bb62b5d7-313f-4733-88cd-9c1aa41c724d/en-US/

Let's start by creating a SageMaker session and specifying:

- The S3 bucket and prefix that you want to use for the model data. This should be within the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give hosting access to your data. See the documentation for how to create these. Note that if you want to use another role, replace `sagemaker.get_execution_role()` with the full ARN string of that IAM role.

In [None]:
import sagemaker

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
model_prefix = "timbrooks/instruct-pix2pix"
role = sagemaker.get_execution_role()

## Download the `InstructPix2Pix` model artifacts from HuggingFace hub

We install Git LFS to handle large files in the git repository. Then we clone the repository locally to save the pre-trained model on the file system. 

In [None]:
!apt-get -qq update
!apt-get -qq install -y curl git-lfs

In [None]:
!git lfs install

We skip downloading `safetensors` and `ckpt` files as they are heavyweight and not needed to deploy the model into SageMaker.

In [None]:
!GIT_LFS_SKIP_SMUDGE=1 git clone --depth 1 https://huggingface.co/timbrooks/instruct-pix2pix
!git -C instruct-pix2pix/ lfs pull --exclude='*.safetensors,*.ckpt'

*Note:* At the time of writing, the `git clone` command with LFS had [memory constraints](https://github.com/git-lfs/git-lfs/issues/3524). To be able to run this example on an `ml.t3.medium` instance, which has only 4 GB of RAM, we split the clone into two commands: a clone that skips the LFS downloads, followed by an `lfs pull`.

Clean up the LFS pointers to the skipped files.

In [None]:
!find ./instruct-pix2pix -name "*.ckpt" -type f -delete
!find ./instruct-pix2pix -name "*.safetensors" -type f -delete
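As a quick sanity check before packaging, the directory can be scanned for any Git LFS pointer files that are still left. The sketch below is not part of the original notebook. It assumes that a pointer file is a small text file whose first line starts with `version https://git-lfs.github.com/spec/v1`, and the helper name `find_lfs_pointers` is hypothetical.

```python
import os

# First line of a Git LFS pointer file (assumption: LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root):
    """Return sorted paths under `root` whose content starts like an LFS pointer."""
    pointers = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    head = handle.read(len(LFS_POINTER_PREFIX))
            except OSError:
                continue  # unreadable file: skip it rather than fail the scan
            if head == LFS_POINTER_PREFIX:
                pointers.append(path)
    return sorted(pointers)


if __name__ == "__main__":
    leftovers = find_lfs_pointers("./instruct-pix2pix")
    print(f"{len(leftovers)} LFS pointer file(s) left: {leftovers}")
```

An empty result suggests that every remaining file holds real content rather than a pointer, so nothing unusable ends up in `model.tar.gz`.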

SageMaker does not need the Git metadata. Clean up the Git files to save disk space.

In [None]:
!rm -r ./instruct-pix2pix/.git
!rm ./instruct-pix2pix/.gitattributes

## Package the pre-trained model and upload it to S3

The pretrained HuggingFace model is now under the `instruct-pix2pix/` directory. Copy the inference code into it, list the files to check the layout, then package everything as `model.tar.gz` and upload it to S3.

In [None]:
!mkdir -p ./instruct-pix2pix/code/
!cp -v ./code/* ./instruct-pix2pix/code/

In [None]:
!find ./instruct-pix2pix/

In [None]:
!tar --use-compress-program='gzip --fast' -cvf ./model.tar.gz -C ./instruct-pix2pix/ .
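Before uploading, it can help to confirm that the archive has the layout the container will look for. The following is a minimal sketch, not part of the original notebook: it assumes the entry point must sit at `code/inference.py` inside the archive (matching the `cp` step above) and that the Diffusers repository ships a `model_index.json` at its root. The helper names `archive_members` and `missing_entries` are hypothetical.

```python
import posixpath
import tarfile


def archive_members(archive_path):
    """Return normalized member names, so './code/x' and 'code/x' compare equal."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return {posixpath.normpath(name) for name in archive.getnames()}


def missing_entries(archive_path, required=("code/inference.py", "model_index.json")):
    """Return the required paths that are absent from the archive."""
    members = archive_members(archive_path)
    return [path for path in required if path not in members]


# Usage in the notebook (assumes model.tar.gz was created by the cell above):
# print(missing_entries("model.tar.gz"))
```

Names are normalized because `tar -C dir .` stores members with a leading `./`.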

In [None]:
pretrained_model_data = sagemaker_session.upload_data(
    path="model.tar.gz", bucket=bucket, key_prefix=model_prefix
)

In [None]:
!pygmentize instruct-pix2pix/code/inference.py

In [None]:
!pygmentize instruct-pix2pix/code/requirements.txt

In [None]:
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data=pretrained_model_data,
    role=role,
    framework_version="1.12",
    py_version="py38",
    source_dir="instruct-pix2pix/code",
    entry_point="inference.py",
)

In [None]:
predictor = pytorch_model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")

Because `input_fn` declares that incoming requests are JSON-encoded, we need to use a `JSONSerializer`.

The endpoint returns a base64 string, so we use a `StringDeserializer` to parse the response.

In [None]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import StringDeserializer

predictor.serializer = JSONSerializer()
predictor.deserializer = StringDeserializer(accept="text/plain")
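The actual `inference.py` lives in the `code/` directory and is not reproduced in this notebook. For orientation only, here is a minimal sketch of what such an entry point might look like. It assumes the handler names used by the SageMaker PyTorch serving container (`model_fn`, `input_fn`, `predict_fn`, `output_fn`), the `StableDiffusionInstructPix2PixPipeline` class from `diffusers`, and a GPU instance. The shipped file may differ in all of these details.

```python
import base64
import json
from io import BytesIO


def model_fn(model_dir):
    """Load the pipeline from the unpacked model.tar.gz (assumed Diffusers layout)."""
    # Heavy imports are kept local so the module can be imported without a GPU stack.
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        model_dir, torch_dtype=torch.float16
    )
    return pipe.to("cuda")


def input_fn(request_body, request_content_type):
    """Parse the JSON request into a prompt and raw image bytes."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    inputs = json.loads(request_body)["inputs"]
    return {"prompt": inputs["prompt"], "image": base64.b64decode(inputs["image"])}


def predict_fn(input_data, model):
    """Run the edit instruction against the decoded image."""
    from PIL import Image

    image = Image.open(BytesIO(input_data["image"])).convert("RGB")
    return model(input_data["prompt"], image=image).images[0]


def output_fn(prediction, accept):
    """Serialize the resulting image as a base64-encoded PNG string."""
    buffered = BytesIO()
    prediction.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")
```

In this sketch, `input_fn` and `output_fn` mirror the serializer and deserializer configured on the predictor: JSON in, base64 text out.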

## Test the model

Using a sample image, you can now invoke the SageMaker endpoint to get a prediction. Note that the input is multimodal: the model expects a prompt message as well as an image.
So, we pack the prompt and the image into a JSON object. To do that, we need to base64-encode the image.

In [None]:
from PIL import Image, ImageOps
import requests

url = "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg"


def download_image(image_url):
    result = Image.open(requests.get(image_url, stream=True).raw)
    result = ImageOps.exif_transpose(result)
    result = result.convert("RGB")
    return result


image = download_image(url)
image

In [None]:
import base64
from io import BytesIO

buffered = BytesIO()
image.save(buffered, format="PNG")
img_str = base64.b64encode(buffered.getvalue())
base64_string = img_str.decode("latin1")

In [None]:
data = {"inputs": {"prompt": "turn him into cyborg", "image": base64_string}}
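Base64 encoding turns every 3 bytes into 4 characters, so the request body is roughly one third larger than the image itself. Since real-time endpoints limit the request size, it can be worth measuring the payload before sending it. The helpers below are a hedged sketch of the request and response handling used in this section; the function names are hypothetical, and the size limit to compare against should be taken from the current SageMaker quotas.

```python
import base64
import json


def build_payload(prompt, image_bytes):
    """Pack a prompt and raw image bytes into the JSON structure used above."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"inputs": {"prompt": prompt, "image": encoded}}


def payload_size_bytes(payload):
    """Size of the JSON-serialized request body in bytes."""
    return len(json.dumps(payload).encode("utf-8"))


def decode_response(response_text):
    """Turn the endpoint's base64 string back into raw image bytes."""
    return base64.b64decode(response_text)


# Usage in the notebook (assumes `buffered` holds the PNG bytes from the cell above):
# payload = build_payload("turn him into cyborg", buffered.getvalue())
# print(payload_size_bytes(payload))
```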

In [None]:
prediction_result = predictor.predict(data)

In [None]:
f = BytesIO(base64.b64decode(prediction_result))
img = Image.open(f)
img

## Optional: Use Inference Recommender to Select the Best Instance

Inference Recommender uses information about your ML model to recommend the best instance types and endpoint configurations for deployment. 

As a first step, we fetch the images that Inference Recommender should use in the multimodal input.

In [None]:
# Save image(s) to use for payload generation
from sagemaker.s3 import S3Downloader, S3Uploader

!mkdir -p ./images/pets
S3Downloader().download(s3_uri=f"s3://sagemaker-example-files-{region}/datasets/image/pets", local_path="./images/pets")

Now we can create the multimodal payloads from the downloaded images and a prompt. Inference Recommender randomly samples from these payloads during its load tests. Note that we resize the images to 512x512 so that every load test request has the same image size. Additionally, we use only one prompt message in this example. Make sure that the sample payloads you give to Inference Recommender have a distribution similar to the payloads you expect in production.

In [None]:
import base64
import json
import os
from io import BytesIO

from PIL import Image


def create_sample_payload(images_directory):
    # Write one JSON payload file per image into ./sample-payload
    for image_path in os.listdir(images_directory):
        input_path = os.path.join(images_directory, image_path)
        print(input_path)
        img = Image.open(input_path)
        img = img.resize((512, 512))  # resizing to load test based on the same size

        # Base64-encode the image as PNG, as in the single-request test above
        buffered = BytesIO()
        img.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue())
        base64_string = img_str.decode("latin1")
        payload = {"inputs": {"prompt": "turn him into cyborg", "image": base64_string}}
        output_path = os.path.join("sample-payload", image_path.replace(".", "_") + "_payload.txt")
        with open(output_path, "w") as file:
            file.write(json.dumps(payload))

In [None]:
!mkdir ./sample-payload
import os

create_sample_payload("./images/pets")

In [None]:
payload_archive_name = "payload.tar.gz"
!cd ./sample-payload/ && tar czvf ../{payload_archive_name} *

In [None]:
sample_payload_url = sagemaker.Session().upload_data(
    payload_archive_name, bucket=bucket, key_prefix=model_prefix + "/inference-recommender"
)

Provide domain and model related information to Inference Recommender:

- Example ML Domains: COMPUTER_VISION, NATURAL_LANGUAGE_PROCESSING, MACHINE_LEARNING
- Example ML Tasks: CLASSIFICATION, REGRESSION, OBJECT_DETECTION, OTHER

Note: Select the task that is the closest match to your model. Choose OTHER if none apply.

In [None]:
ml_domain = "MACHINE_LEARNING"
ml_task = "OTHER"
ml_framework = "PYTORCH"
framework_version = "1.12"

In [None]:
from sagemaker import image_uris

inference_image = image_uris.retrieve(
    framework="pytorch",
    region=region,
    version=framework_version,
    py_version="py38",
    instance_type="ml.p3.8xlarge",
    image_scope="inference",
)

print(inference_image)

In [None]:
import time
import boto3

client = boto3.client("sagemaker", region)

model_package_group_name = "instruct-pix2pix-" + str(round(time.time()))
print(model_package_group_name)
model_package_group_response = client.create_model_package_group(
    ModelPackageGroupName=str(model_package_group_name),
    ModelPackageGroupDescription="instruct-pix2pix model group",
)

In [None]:
model_package_version_response = client.create_model_package(
    ModelPackageGroupName=str(model_package_group_name),
    ModelPackageDescription="instruct-pix2pix Inference Recommender Demo",
    Domain=ml_domain,
    Task=ml_task,
    SamplePayloadUrl=sample_payload_url,
    InferenceSpecification={
        "Containers": [
            {
                "ContainerHostname": "pytorch",
                "Image": inference_image,
                "ModelDataUrl": pretrained_model_data,
                "Framework": ml_framework,
                "FrameworkVersion": framework_version,
                "Environment": {
                    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
                    "SAGEMAKER_PROGRAM": "inference.py",
                    "SAGEMAKER_REGION": region,
                    "SAGEMAKER_SUBMIT_DIRECTORY": pretrained_model_data,
                },
            },
        ],
        "SupportedRealtimeInferenceInstanceTypes": [
            "ml.g4dn.xlarge",
            "ml.g4dn.12xlarge",
            "ml.g5.xlarge",
            "ml.p3.2xlarge",
            "ml.p3.8xlarge",
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["text/plain"],
    },
)

In [None]:
import datetime

default_job = "instrpx2px-" + datetime.datetime.now().strftime("%d-%H-%M-%S")
default_response = client.create_inference_recommendations_job(
    JobName=str(default_job),
    JobDescription="instruct-pix2pix Inference Basic Recommender Job",
    JobType="Default",
    RoleArn=role,
    InputConfig={"ModelPackageVersionArn": model_package_version_response["ModelPackageArn"]},
)

The Inference Recommender job returns multiple endpoint recommendations in its result. Each recommendation includes `InstanceType`, `InitialInstanceCount`, and `EnvironmentParameters`, which holds parameters tuned for better performance. It also includes benchmarking results such as `MaxInvocations`, `ModelLatency`, `CostPerHour`, and `CostPerInference` for deeper analysis. This information helps you narrow down to a specific endpoint configuration that suits your use case.

In [None]:
%%time

import boto3

client = boto3.client("sagemaker", region)

ended = False
while not ended:
    inference_recommender_job = client.describe_inference_recommendations_job(
        JobName=str(default_job)
    )
    if inference_recommender_job["Status"] in ["COMPLETED", "STOPPED", "FAILED"]:
        ended = True
    else:
        print("Inference recommender job in progress")
        time.sleep(60)

if inference_recommender_job["Status"] == "FAILED":
    print("Inference recommender job failed")
    # The failure message is reported under the "FailureReason" key of the response
    print("Failure reason: {}".format(inference_recommender_job.get("FailureReason")))
else:
    print("Inference recommender job ended with status: {}".format(inference_recommender_job["Status"]))
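The polling loop above waits indefinitely. One possible refinement is a reusable helper that also gives up after a deadline. This is a sketch rather than part of the original notebook: the function name `wait_for_terminal_status` and its parameters are hypothetical, and it only assumes that the describe call returns a dictionary with a `Status` key.

```python
import time


def wait_for_terminal_status(
    describe,
    terminal_statuses,
    poll_seconds=60,
    timeout_seconds=7200,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call `describe()` until its "Status" is terminal; raise TimeoutError on deadline."""
    deadline = clock() + timeout_seconds
    while True:
        response = describe()
        if response["Status"] in terminal_statuses:
            return response
        if clock() >= deadline:
            raise TimeoutError(f"Still {response['Status']} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Usage in the notebook (assumes `client` and `default_job` from the cells above):
# job = wait_for_terminal_status(
#     lambda: client.describe_inference_recommendations_job(JobName=default_job),
#     {"COMPLETED", "STOPPED", "FAILED"},
# )
```

Passing `sleep` and `clock` as parameters keeps the helper easy to exercise without waiting in real time.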

In [None]:
import pandas as pd

data = [
    {**x["EndpointConfiguration"], **x["ModelConfiguration"], **x["Metrics"]}
    for x in inference_recommender_job["InferenceRecommendations"]
]
df = pd.DataFrame(data)
dropFilter = df.filter(["VariantName"])
df.drop(dropFilter, inplace=True, axis=1)
pd.set_option("max_colwidth", 400)

In [None]:
df[
    [
        "EndpointName",
        "InstanceType",
        "CostPerHour",
        "CostPerInference",
        "MaxInvocations",
        "ModelLatency",
    ]
].sort_values(by=["MaxInvocations"], ascending=False).head()
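Sorting by `MaxInvocations` shows the highest-throughput options, but the best fit often depends on cost and latency constraints as well. The sketch below picks the cheapest recommendation per inference among those that satisfy optional bounds. It assumes each row is a flattened dictionary like the entries of `data` built above; the helper name `pick_cheapest` and its parameters are hypothetical.

```python
def pick_cheapest(recommendations, max_latency=None, min_invocations=None):
    """Return the row with the lowest CostPerInference that meets the bounds, or None.

    `max_latency` is compared against ModelLatency in whatever unit the job reports;
    `min_invocations` is compared against MaxInvocations.
    """
    candidates = [
        row
        for row in recommendations
        if (max_latency is None or row["ModelLatency"] <= max_latency)
        and (min_invocations is None or row["MaxInvocations"] >= min_invocations)
    ]
    return min(candidates, key=lambda row: row["CostPerInference"], default=None)


# Usage in the notebook (assumes `data` from the cell above; the bound is illustrative):
# best = pick_cheapest(data, min_invocations=10)
# print(best["InstanceType"] if best else "No recommendation meets the constraints")
```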

Note how newer-generation instances such as G5 outperform the previous-generation G4 instances.

Clean up Inference Recommender related artifacts.

In [None]:
!rm -rf ./sample-payload
!rm -rf ./images
!rm payload.tar.gz
!aws s3 rm --quiet $sample_payload_url

## Clean up deployment related artifacts and the endpoint

Endpoints should be deleted when no longer in use, since they're billed by time deployed, according to the [SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/).

In [None]:
predictor.delete_endpoint()

Also remove the cloned directory and the model object.

In [None]:
!rm -rf ./instruct-pix2pix

In [None]:
!rm model.tar.gz

In [None]:
!aws s3 rm --quiet $pretrained_model_data

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.


![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/advanced_functionality|huggingface_deploy_instructpix2pix|deploy-instructpix2pix-to-sagemaker-ir.ipynb)
