{ "cells": [ { "cell_type": "markdown", "id": "16c61f54", "metadata": {}, "source": [ "# Introduction to JumpStart - Text to Image\n", "\n", "Note: This notebook is originally from [SageMaker JumpStart Notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_text_to_image/Amazon_JumpStart_Text_To_Image.ipynb)\n", "\n", "Note: This notebook requires AWS account and charge for AWS resources. Running this notebook take approximately \\\\$1.0. (\\\\$0.4 for training, $0.94/hour for inference) for Oregon Region. For more information about pricing of SageMaker, visit [pricing page](https://aws.amazon.com/sagemaker/pricing/)." ] }, { "cell_type": "markdown", "id": "bdc23bae", "metadata": {}, "source": [ "***\n", "Welcome to Amazon [SageMaker JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html)! You can use JumpStart to solve many Machine Learning tasks through one-click in SageMaker Studio, or through [SageMaker JumpStart API](https://sagemaker.readthedocs.io/en/stable/overview.html#use-prebuilt-models-with-sagemaker-jumpstart). In this demo notebook, we demonstrate how to use the JumpStart API to generate images from text using state-of-the-art Stable Diffusion models. Furthermore, we show how to fine-tune the model to your dataset.\n", "\n", "Stable Diffusion is a text-to-image model that enables you to create photorealistic images from just a text prompt. A diffusion model trains by learning to remove noise that was added to a real image. This de-noising process generates a realistic image. These models can also generate images from text alone by conditioning the generation process on the text. For instance, Stable Diffusion is a latent diffusion where the model learns to recognize shapes in a pure noise image and gradually brings these shapes into focus if the shapes match the words in the input text.\n", "\n", "Training and deploying large models and running inference on models such as Stable Diffusion is often challenging and include issues such as cuda out of memory, payload size limit exceeded and so on. JumpStart simplifies this process by providing ready-to-use scripts that have been robustly tested. Furthermore, it provides guidance on each step of the process including the recommended instance types, how to select parameters to guide image generation process, prompt engineering etc. Moreover, you can deploy and run inference on any of the 80+ Diffusion models from JumpStart without having to write any piece of your own code.\n", "\n", "In the first part of this notebook, you will learn how to use JumpStart to generate highly realistic and artistic images of any subject/object/environment/scene. This may be as simple as an image of a cute dog or as detailed as a hyper-realistic image of a beautifully decoraded cozy kitchen by pixer in the style of greg rutkowski with dramatic sunset lighting and long shadows with cinematic atmosphere. This can be used to design products and build catalogs for ecommerce business needs or to generate realistic art pieces or stock images.\n", "\n", "In the second part of this notebook, you will learn how to use JumpStart to fine-tune the Stable Diffusion model to your dataset. 
 "\n", "\n", "Model license: By using this model, you agree to the [CreativeML Open RAIL-M++ license](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL).\n", "\n", "***" ] }, { "cell_type": "markdown", "id": "5db28351", "metadata": {}, "source": [ "1. [Set Up](#1.-Set-Up)\n", "2. [Run inference on the pre-trained model](#2.-Run-inference-on-the-pre-trained-model)\n", " * [Select a model](#2.1.-Select-a-Model)\n", " * [Retrieve JumpStart Artifacts & Deploy an Endpoint](#2.2.-Retrieve-JumpStart-Artifacts-&-Deploy-an-Endpoint)\n", " * [Query endpoint and parse response](#2.3.-Query-endpoint-and-parse-response)\n", " * [Supported Inference parameters](#2.4.-Supported-Inference-parameters)\n", " * [Compressed Image Output](#2.5.-Compressed-Image-Output)\n", " * [Prompt Engineering](#2.6.-Prompt-Engineering)\n", " * [Negative Prompting](#2.7.-Negative-Prompting)\n", " * [Clean up the endpoint](#2.8.-Clean-up-the-endpoint)\n", "\n", "3. [Fine-tune the pre-trained model on a custom dataset](#3.-Fine-tune-the-pre-trained-model-on-a-custom-dataset)\n", " * [Retrieve Training Artifacts](#3.1.-Retrieve-Training-Artifacts)\n", " * [Set Training parameters](#3.2.-Set-Training-parameters)\n", " * [Start Training](#3.3.-Start-Training)\n", " * [Deploy and run inference on the fine-tuned model](#3.4.-Deploy-and-run-inference-on-the-fine-tuned-model)\n", "\n", "4. [Conclusion](#4.-Conclusion)\n" ] }, { "cell_type": "markdown", "id": "ce462973", "metadata": {}, "source": [ "Note: This notebook was tested on an ml.t3.medium instance in Amazon SageMaker Studio with the Python 3 (Data Science) kernel and in an Amazon SageMaker Notebook instance with the conda_python3 kernel.\n", "\n", "Note: To deploy the pre-trained or fine-tuned model, you can use the `ml.p3.2xlarge` or `ml.g4dn.2xlarge` instance types. If `ml.g5.2xlarge` is available in your region, we recommend using that instance type for deployment. For fine-tuning the model on your dataset, you need the `ml.g4dn.2xlarge` instance type available in your account." ] }, { "cell_type": "markdown", "id": "9ea47727", "metadata": {}, "source": [ "### 1. Set Up" ] }, { "cell_type": "markdown", "id": "35b91e81", "metadata": {}, "source": [ "***\n", "Before executing the notebook, there are some initial steps required for setup.\n", "\n", "1. You need to run this notebook with a custom conda environment.\n", " 1. Right-click `environment.yaml` and select `Build Conda Environment`.\n", " 2. After building the environment, select it from the dropdown at the top-right corner of this page.\n", "2. You need to set up AWS credentials with an existing AWS account.\n", " 1. Open a terminal and run `aws configure` (We recommend using a region that supports `ml.g5.2xlarge`, for example `us-west-2`.)\n", "***" ] }, { "cell_type": "markdown", "id": "48370155", "metadata": {}, "source": [ "#### Permissions and environment variables\n", "\n", "***\n", "To host on Amazon SageMaker, we need to set up and authenticate the use of AWS services.\n", "\n", "Here, we use the execution role for SageMaker.\n", "\n", "1. If you already use SageMaker within your own AWS account, please copy and paste the RoleName of your execution role below.\n",
 "2. If you are new to this, follow the steps to create one [here](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).\n", "\n", "Please note that you need to have already created this SageMaker IAM execution role in order to complete this step.\n", "\n", "***" ] }, { "cell_type": "code", "execution_count": null, "id": "90518e45", "metadata": { "tags": [] }, "outputs": [], "source": [ "import sagemaker, boto3, json\n", "from sagemaker import get_execution_role\n", "\n", "try:\n", " aws_role = sagemaker.get_execution_role()\n", "except Exception:\n", " iam = boto3.client(\"iam\")\n", " # TODO: replace with your role name (e.g. \"AmazonSageMaker-ExecutionRole-20211014T154824\")\n", " aws_role = iam.get_role(RoleName=\"\")[\"Role\"][\"Arn\"]\n", "\n", "boto_session = boto3.Session()\n", "aws_region = boto_session.region_name\n", "sess = sagemaker.Session(boto_session=boto_session)\n", "\n", "print(aws_role)\n", "print(aws_region)\n", "print(sess.boto_region_name)" ] }, { "cell_type": "markdown", "id": "310fca48", "metadata": {}, "source": [ "## 2. Run inference on the pre-trained model\n", "\n", "***\n", "\n", "Using JumpStart, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset.\n", "***" ] }, { "cell_type": "markdown", "id": "0e072e72-8bb4-4a8d-b887-2e9658dc3672", "metadata": {}, "source": [ "### 2.1. Select a Model\n", "***\n", "You can continue with the default model, or choose a different model from the dropdown generated upon running the next cell. A complete list of SageMaker pre-trained models can also be accessed at [SageMaker pre-trained Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html#).\n", "\n", "***" ] }, { "cell_type": "code", "execution_count": null, "id": "4fda3e9c-e59f-4a2a-96bf-60c0750b9ad0", "metadata": { "tags": [] }, "outputs": [], "source": [ "from ipywidgets import Dropdown\n", "from sagemaker.jumpstart.notebook_utils import list_jumpstart_models\n", "\n", "# Retrieves all Text-to-Image generation models.\n", "filter_value = \"task == txt2img\"\n", "txt2img_models = list_jumpstart_models(filter=filter_value)\n", "\n", "# display the model-ids in a dropdown to select a model for inference.\n", "model_dropdown = Dropdown(\n", " options=txt2img_models,\n", " value=\"model-txt2img-stabilityai-stable-diffusion-v2-1-base\",\n", " description=\"Select a model\",\n", " style={\"description_width\": \"initial\"},\n", " layout={\"width\": \"max-content\"},\n", ")\n", "display(model_dropdown)" ] }, { "cell_type": "code", "execution_count": null, "id": "3e4f77d3-bd76-4d0c-b3a2-3ae3fe9e52f8", "metadata": { "tags": [] }, "outputs": [], "source": [ "# model_version=\"*\" fetches the latest version of the model\n", "model_id, model_version = model_dropdown.value, \"*\"" ] }, { "cell_type": "markdown", "id": "282e37a1-e379-4bd3-af2c-02d02fd41d78", "metadata": {}, "source": [ "### 2.2. Retrieve JumpStart Artifacts & Deploy an Endpoint\n", "\n", "***\n", "\n", "Using JumpStart, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the `deploy_image_uri`, `deploy_source_uri`, and `model_uri` for the pre-trained model. To host the pre-trained model, we create an instance of [sagemaker.model.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it.\n", "\n", "### This may take up to 10 minutes. Please do not kill the kernel while you wait.\n",
 "\n", "While you wait, you can check out the [Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/generate-images-from-text-with-the-stable-diffusion-model-on-amazon-sagemaker-jumpstart/) blog to learn more about the Stable Diffusion model and JumpStart.\n", "\n", "\n", "***" ] }, { "cell_type": "code", "execution_count": null, "id": "a8a79ec9", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker import image_uris, model_uris, script_uris, hyperparameters\n", "from sagemaker.model import Model\n", "from sagemaker.predictor import Predictor\n", "from sagemaker.utils import name_from_base\n", "\n", "\n", "endpoint_name = name_from_base(f\"jumpstart-example-infer-{model_id}\")\n", "\n", "# Please use the ml.g5.2xlarge instance type if it is available in your region. ml.g5.2xlarge has 24GB of GPU memory compared to 16GB in ml.p3.2xlarge and supports generation of larger and better quality images.\n", "inference_instance_type = \"ml.p3.2xlarge\"\n", "\n", "# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.\n", "deploy_image_uri = image_uris.retrieve(\n", " region=None,\n", " framework=None, # automatically inferred from model_id\n", " image_scope=\"inference\",\n", " model_id=model_id,\n", " model_version=model_version,\n", " instance_type=inference_instance_type,\n", ")\n", "\n", "# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.\n", "deploy_source_uri = script_uris.retrieve(\n", " model_id=model_id, model_version=model_version, script_scope=\"inference\"\n", ")\n", "\n", "\n", "# Retrieve the model uri. This includes the pre-trained Stable Diffusion model and parameters.\n", "model_uri = model_uris.retrieve(\n", " model_id=model_id, model_version=model_version, model_scope=\"inference\"\n", ")\n", "\n", "# To increase the maximum response size from the endpoint.\n", "env = {\n", " \"MMS_MAX_RESPONSE_SIZE\": \"20000000\",\n", "}\n", "\n", "# Create the SageMaker model instance\n", "model = Model(\n", " image_uri=deploy_image_uri,\n", " source_dir=deploy_source_uri,\n", " model_data=model_uri,\n", " entry_point=\"inference.py\", # entry point file in source_dir and present in deploy_source_uri\n", " role=aws_role,\n", " predictor_cls=Predictor,\n", " name=endpoint_name,\n", " env=env,\n", ")\n", "\n", "# Deploy the Model. Note that we need to pass the Predictor class when we deploy the model through the Model class\n", "# to be able to run inference through the SageMaker API.\n", "model_predictor = model.deploy(\n", " initial_instance_count=1,\n", " instance_type=inference_instance_type,\n", " predictor_cls=Predictor,\n", " endpoint_name=endpoint_name,\n", ")" ] }, { "cell_type": "markdown", "id": "b2e0fd36", "metadata": {}, "source": [ "### 2.3. Query endpoint and parse response\n", "\n", "***\n", "Input to the endpoint is any string of text encoded in `utf-8` format. Output of the endpoint is a `json` object containing the generated image and the prompt.\n",
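 "\n", "For example (illustrative only), the parsed response has roughly the following shape, where the image is a nested array of RGB pixel values:\n", "\n", "    {\"generated_image\": [[[R, G, B], ...], ...], \"prompt\": \"cottage in impressionist style\"}\n",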
 "\n", "***" ] }, { "cell_type": "code", "execution_count": null, "id": "84fb30d0", "metadata": { "tags": [] }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "\n", "def query(model_predictor, text):\n", " \"\"\"Query the model predictor.\"\"\"\n", "\n", " encoded_text = text.encode(\"utf-8\")\n", "\n", " query_response = model_predictor.predict(\n", " encoded_text,\n", " {\n", " \"ContentType\": \"application/x-text\",\n", " \"Accept\": \"application/json\",\n", " },\n", " )\n", " return query_response\n", "\n", "\n", "def parse_response(query_response):\n", " \"\"\"Parse response and return the generated image and the prompt\"\"\"\n", "\n", " response_dict = json.loads(query_response)\n", " return response_dict[\"generated_image\"], response_dict[\"prompt\"]\n", "\n", "\n", "def display_img_and_prompt(img, prmpt):\n", " \"\"\"Display the generated image with its prompt as the title.\"\"\"\n", " plt.figure(figsize=(12, 12))\n", " plt.imshow(np.array(img))\n", " plt.axis(\"off\")\n", " plt.title(prmpt)\n", " plt.show()" ] }, { "cell_type": "markdown", "id": "aea0434b", "metadata": {}, "source": [ "***\n", "Below, we put in some example input text. You can put in any text and the model generates an image corresponding to that text.\n", "\n", "***" ] }, { "cell_type": "code", "execution_count": null, "id": "a5a12e3e-c269-432a-8e41-7e0903c975af", "metadata": { "pycharm": { "is_executing": true }, "tags": [] }, "outputs": [], "source": [ "text = \"cottage in impressionist style\"\n", "query_response = query(model_predictor, text)\n", "img, prmpt = parse_response(query_response)\n", "display_img_and_prompt(img, prmpt)" ] }, { "cell_type": "markdown", "id": "7d591919-1be0-4e9f-b7ff-0aa6e0959053", "metadata": { "pycharm": { "is_executing": true } }, "source": [ "### 2.4. Supported Inference parameters\n", "\n", "***\n", "This model also supports many advanced parameters while performing inference. They include:\n", "\n", "* **prompt**: prompt to guide the image generation. Must be specified and can be a string or a list of strings.\n", "* **width**: width of the generated image. If specified, it must be a positive integer divisible by 8.\n", "* **height**: height of the generated image. If specified, it must be a positive integer divisible by 8.\n", "* **num_inference_steps**: number of denoising steps during image generation. More steps lead to a higher quality image. If specified, it must be a positive integer.\n", "* **guidance_scale**: a higher guidance scale results in an image more closely related to the prompt, at the expense of image quality. If specified, it must be a float. guidance_scale<=1 is ignored.\n", "* **negative_prompt**: guide the image generation against this prompt. If specified, it must be a string or a list of strings and is used together with guidance_scale. If guidance_scale is disabled, this is also disabled. Moreover, if prompt is a list of strings then negative_prompt must also be a list of strings.\n", "* **num_images_per_prompt**: number of images returned per prompt. If specified, it must be a positive integer.\n", "* **seed**: fix the randomized state for reproducibility. If specified, it must be an integer.\n",
 "\n", "***" ] }, { "cell_type": "code", "execution_count": null, "id": "4fee71b1-5584-4916-bd78-5b895be08d41", "metadata": { "pycharm": { "is_executing": true }, "tags": [] }, "outputs": [], "source": [ "import json\n", "\n", "# Training data for different models had different image sizes, and the model is often observed to perform best when the generated image\n", "# has the same dimensions as the training data. Dimensions that do not match the default dimensions may result in a black image.\n", "# Stable Diffusion v1-4 was trained on 512x512 images and Stable Diffusion v2 was trained on 768x768 images.\n", "payload = {\n", " \"prompt\": \"astronaut on a horse\",\n", " \"width\": 512,\n", " \"height\": 512,\n", " \"num_images_per_prompt\": 1,\n", " \"num_inference_steps\": 50,\n", " \"guidance_scale\": 7.5,\n", " \"seed\": 1,\n", "}\n", "\n", "\n", "def query_endpoint_with_json_payload(model_predictor, payload, content_type, accept):\n", " \"\"\"Query the model predictor with a json payload.\"\"\"\n", "\n", " encoded_payload = json.dumps(payload).encode(\"utf-8\")\n", "\n", " query_response = model_predictor.predict(\n", " encoded_payload,\n", " {\n", " \"ContentType\": content_type,\n", " \"Accept\": accept,\n", " },\n", " )\n", " return query_response\n", "\n", "\n", "def parse_response_multiple_images(query_response):\n", " \"\"\"Parse response and return the generated images and the prompt\"\"\"\n", "\n", " response_dict = json.loads(query_response)\n", " return response_dict[\"generated_images\"], response_dict[\"prompt\"]\n", "\n", "\n", "query_response = query_endpoint_with_json_payload(\n", " model_predictor, payload, \"application/json\", \"application/json\"\n", ")\n", "generated_images, prompt = parse_response_multiple_images(query_response)\n", "\n", "for img in generated_images:\n", " display_img_and_prompt(img, prompt)" ] }, { "cell_type": "markdown", "id": "62857efd-e53d-4730-a3d2-b7a9bcd03771", "metadata": { "pycharm": { "is_executing": true }, "tags": [] }, "source": [ "### 2.5. Compressed Image Output\n", "\n", "---\n", "\n", "The default response type above is a nested array of RGB values, and if the generated image is large, this may hit the endpoint's response size limit. To address this, we also support an endpoint response where each image is returned as JPEG bytes. To do this, please set `Accept = 'application/json;jpeg'`.\n",
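 "\n", "For example (illustrative only), the parsed response then has roughly the following shape, where each entry of `generated_images` is a base64-encoded JPEG:\n", "\n", "    {\"generated_images\": [\"<base64-encoded JPEG bytes>\", ...], \"prompt\": \"astronaut on a horse\"}\n",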
 "\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "bfdf0bd9-37a6-4401-afbd-34388a4ecbe8", "metadata": { "pycharm": { "is_executing": true }, "tags": [] }, "outputs": [], "source": [ "from PIL import Image\n", "from io import BytesIO\n", "import base64\n", "import json\n", "\n", "\n", "def display_encoded_images(generated_images, title):\n", " \"\"\"Decode the images, convert them to RGB format and display them.\n", "\n", " Args:\n", " generated_images: a list of jpeg images as base64-encoded bytes.\n", " \"\"\"\n", "\n", " for generated_image in generated_images:\n", " generated_image_decoded = BytesIO(base64.b64decode(generated_image.encode()))\n", " generated_image_rgb = Image.open(generated_image_decoded).convert(\"RGB\")\n", " display_img_and_prompt(generated_image_rgb, title)\n", "\n", "\n", "def compressed_output_query_and_display(payload, title):\n", " query_response = query_endpoint_with_json_payload(\n", " model_predictor, payload, \"application/json\", \"application/json;jpeg\"\n", " )\n", " generated_images, prompt = parse_response_multiple_images(query_response)\n", "\n", " display_encoded_images(generated_images, title)\n", "\n", "\n", "payload = {\n", " \"prompt\": \"astronaut on a horse\",\n", " \"width\": 512,\n", " \"height\": 512,\n", " \"num_images_per_prompt\": 1,\n", " \"num_inference_steps\": 50,\n", " \"guidance_scale\": 7.5,\n", " \"seed\": 1,\n", "}\n", "compressed_output_query_and_display(payload, \"generated image with compressed response type\")" ] }, { "cell_type": "markdown", "id": "3569c635", "metadata": {}, "source": [ "### 2.6. Prompt Engineering\n", "---\n", "Writing a good prompt can sometimes be an art. It is often difficult to predict whether a certain prompt will yield a satisfactory image with a given model. However, there are certain templates that have been observed to work. Broadly, a prompt can be roughly broken down into three pieces: (i) the type of image (photograph/sketch/painting etc.), (ii) the description (subject/object/environment/scene etc.) and (iii) the style of the image (realistic/artistic/type of art etc.). You can change each of the three parts individually to generate variations of an image. Adjectives have been known to play a significant role in the image generation process. Also, adding more details helps in the generation process.\n", "\n", "To generate a realistic image, you can use phrases such as “a photo of”, “a photograph of”, “realistic” or “hyper realistic”. To generate images in the style of particular artists, you can use phrases like “by Pablo Picasso”, “oil painting by Rembrandt”, “landscape art by Frederic Edwin Church” or “pencil drawing by Albrecht Dürer”. You can also combine different artists. To generate an artistic image by category, you can add the art category to the prompt, such as “lion on a beach, abstract”. Some other categories include “oil painting”, “pencil drawing”, “pop art”, “digital art”, “anime”, “cartoon”, “futurism”, “watercolor”, “manga” etc. You can also include details such as the lighting or the camera lens (for example, a 35mm or 85mm wide lens) and details about the framing (portrait/landscape/close-up etc.).\n", "\n", "Note that the model generates different images even if the same prompt is given multiple times. So, you can generate multiple images and select the image that suits your application best.\n",
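 "\n", "For example, an illustrative prompt assembled from the three pieces (type of image + description + style) might be:\n", "\n", "    a photograph of a beautifully decorated cozy kitchen, dramatic sunset lighting, hyper realistic, 35mm wide lens\n",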
 "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "c889119a-246d-4cc3-8f47-07df32c19243", "metadata": { "tags": [] }, "outputs": [], "source": [ "prompts = [\n", " \"An african woman with turban smiles at the camera, pexels contest winner, smiling young woman, face is wrapped in black scarf, a cute young woman, loosely cropped, beautiful girl, acting headshot\",\n", " \"Ancient Japanese Samurai, a person standing on a ledge in a city at night, cyberpunk art, trending on shutterstock, batman mecha, stylized cyberpunk minotaur logo, cinematic, cyberpunk\",\n", " \"Character design of a robot warrior, concept art, contest winner, diverse medical cybersuits, Football armor, triade color scheme, black shirt underneath armor, in golden armor, clothes in military armor, high resolution render, octane\",\n", " \"A croissant sitting on top of a yellow plate, a portait, trending on unsplash, sitting on a mocha-coloured table, magazine, woodfired, bakery, great composition, amber\",\n", " \"symmetry!! portrait of vanessa hudgens in the style of horizon zero dawn, machine face, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k\",\n", " \"landscape of the beautiful city of paris rebuilt near the pacific ocean in sunny california, amazing weather, sandy beach, palm trees, splendid haussmann architecture, digital painting, highly detailed, intricate, without duplication, art by craig mullins, greg rutkwowski, concept art, matte painting, trending on artstation\",\n", "]\n", "for prompt in prompts:\n", " payload = {\"prompt\": prompt, \"width\": 512, \"height\": 512, \"seed\": 1}\n", " compressed_output_query_and_display(payload, \"generated image with detailed prompt\")" ] }, { "cell_type": "markdown", "id": "159008ec-d6c0-4934-8d37-acd0a5718a5b", "metadata": { "tags": [] }, "source": [ "### 2.7. Negative Prompting\n", "---\n", "\n", "The negative prompt is an important parameter when generating images with Stable Diffusion models. It provides additional control over the image generation process and lets you direct the model to avoid certain objects, colors, styles, attributes and more in the generated images.\n",
\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "c2fcd9ea-95e6-4847-bcb7-df0275fe7572", "metadata": { "tags": [] }, "outputs": [], "source": [ "prompt = \"emma watson as nature magic celestial, top down pose, long hair, soft pink and white transparent cloth, space, D&D, shiny background, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, artgerm, bouguereau\"\n", "payload = {\"prompt\": prompt, \"seed\": 0}\n", "compressed_output_query_and_display(payload, \"generated image with no negative prompt\")\n", "\n", "\n", "negative_prompt = \"windy\"\n", "payload = {\"prompt\": prompt, \"negative_prompt\": negative_prompt, \"seed\": 0}\n", "compressed_output_query_and_display(\n", " payload, f\"generated image with negative prompt: `{negative_prompt}`\"\n", ")" ] }, { "cell_type": "markdown", "id": "affbd634-c56c-4f7d-bba5-56dc2988d406", "metadata": {}, "source": [ "---\n", "\n", "Even though, you can specify many of these concepts in the original prompt by specifying negative words “without”, “except”, “no” and “not”, Stable Diffusion models have been observed to not understand the negative words very well. Thus, you should use negative prompt parameter when tailoring the image to your use case. \n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "1dc72f04-d225-4e67-a5a2-dc702ec8c264", "metadata": {}, "outputs": [], "source": [ "prompt = \"a portrait of a man without beard\"\n", "payload = {\"prompt\": prompt, \"seed\": 0}\n", "compressed_output_query_and_display(payload, f\"prompt: `{prompt}`, negative prompt: None\")\n", "\n", "prompt, negative_prompt = \"a portrait of a man\", \"beard\"\n", "payload = {\"prompt\": prompt, \"negative_prompt\": negative_prompt, \"seed\": 0}\n", "compressed_output_query_and_display(\n", " payload, f\"prompt: `{prompt}`, negative prompt: `{negative_prompt}`\"\n", ")" ] }, { "cell_type": "markdown", "id": "4127dc77-6f1a-446b-bb32-edfcdb07258d", "metadata": {}, "source": [ "---\n", "While trying to generate images, we recommend starting with prompt and progressively building negative prompt to exclude the subjects/styles that you do not want in the image.\n", "\n", "---\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a65babac-1f3f-41dd-a9a0-f72406e3df66", "metadata": {}, "outputs": [], "source": [ "prompt = \"cyberpunk forest by Salvador Dali\"\n", "payload = {\"prompt\": prompt, \"seed\": 1}\n", "compressed_output_query_and_display(payload, f\"prompt: `{prompt}`, negative prompt: None\")\n", "\n", "negative_prompt = \"trees, green\"\n", "payload = {\"prompt\": prompt, \"negative_prompt\": negative_prompt, \"seed\": 1}\n", "compressed_output_query_and_display(\n", " payload, f\"prompt: `{prompt}`, negative prompt: `{negative_prompt}`\"\n", ")" ] }, { "cell_type": "markdown", "id": "a17de3f4-0b56-4232-9706-09bf36411af6", "metadata": {}, "source": [ "---\n", "Some of the helpful keywords while constructing negative prompts are: duplicate, blurry, Missing legs, mutation, morbid, deformed, malformed limbs, missing legs, bad anatomy, extra fingers, cloned face, too many fingers. \n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "be527519-59c3-4093-9e23-aa36040dbebd", "metadata": {}, "outputs": [], "source": [ "prompt = \"a fantasy style portrait painting of rachel lane / alison brie / sally kellerman hybrid in the style of francois boucher oil painting unreal 5 daz. 
 "payload = {\"prompt\": prompt, \"seed\": 1}\n", "compressed_output_query_and_display(payload, \"No negative prompt\")\n", "\n", "\n", "negative_prompt = \"duplicate\"\n", "payload = {\"prompt\": prompt, \"negative_prompt\": negative_prompt, \"seed\": 1}\n", "compressed_output_query_and_display(payload, f\"negative prompt: `{negative_prompt}`\")" ] }, { "cell_type": "markdown", "id": "50712011-66a9-4200-9932-ed49c66ab03d", "metadata": {}, "source": [ "---\n", "\n", "You can also use negative prompts to substitute parts of the prompt. For instance, instead of using “sharp”/“focused” in the prompt, you can use “blurry” in the negative prompt.\n", "\n", "Negative prompts have been observed to be critical, especially for Stable Diffusion V2 (identified by model_id `model-txt2img-stabilityai-stable-diffusion-v2`, `model-txt2img-stabilityai-stable-diffusion-v2-fp16`, `model-txt2img-stabilityai-stable-diffusion-v2-1-base`). Thus, we recommend the usage of negative prompts especially when using version 2.x. To learn more about negative prompting, please see [How to use negative prompts?](https://stable-diffusion-art.com/how-to-use-negative-prompts/) and [How does negative prompt work?](https://stable-diffusion-art.com/how-negative-prompt-work/)\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "870d1173", "metadata": {}, "source": [ "### 2.8. Clean up the endpoint" ] }, { "cell_type": "code", "execution_count": null, "id": "63cb143b", "metadata": {}, "outputs": [], "source": [ "# Delete the SageMaker endpoint\n", "model_predictor.delete_model()\n", "model_predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "id": "2c8edfc4", "metadata": {}, "source": [ "## 3. Fine-tune the pre-trained model on a custom dataset\n", "\n", "---\n", "Previously, we saw how to run inference on a pre-trained model. Next, we discuss how the model can be fine-tuned to a custom dataset.\n", "\n", "The model can be fine-tuned to any dataset of images. It works very well even with as few as five training images.\n", "\n", "The fine-tuning script is built on the script from [dreambooth](https://dreambooth.github.io/). The model returned by fine-tuning can be further deployed for inference. Below are the instructions for how the training data should be formatted.\n", "\n", "- **Input:** A directory containing the instance images, `dataset_info.json` and (optional) directory `class_data_dir`.\n", " - Images may be of `.png` or `.jpg` or `.jpeg` format.\n", " - `dataset_info.json` file must be of the format {'instance_prompt':<>,'class_prompt':<>} (see the example below).\n", " - If with_prior_preservation = False, you may choose to ignore 'class_prompt'.\n", " - `class_data_dir` directory must have class images. If with_prior_preservation = True and class_data_dir is not present or there are not enough images already present in class_data_dir, additional images will be sampled with class_prompt.\n", "- **Output:** A trained model that can be deployed for inference.\n", "\n", "The s3 path should look like `s3://bucket_name/input_directory/`. Note the trailing `/` is required.\n",
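 "\n", "For example, with the default dog dataset described below, a minimal `dataset_info.json` could look like this (illustrative values):\n", "\n", "    {\"instance_prompt\": \"a photo of a Doppler dog\", \"class_prompt\": \"a photo of a dog\"}\n",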
 "\n", "Here is an example format of the training data.\n", "\n", " input_directory\n", " |---instance_image_1.png\n", " |---instance_image_2.png\n", " |---instance_image_3.png\n", " |---instance_image_4.png\n", " |---instance_image_5.png\n", " |---dataset_info.json\n", " |---class_data_dir\n", " |---class_image_1.png\n", " |---class_image_2.png\n", " |---class_image_3.png\n", " |---class_image_4.png\n", "\n", "**Prior preservation, instance prompt and class prompt:** Prior preservation is a technique that uses additional images of the same class that we are trying to train on. For instance, if the training data consists of images of a particular dog, with prior preservation, we incorporate class images of generic dogs. It tries to avoid overfitting by showing images of different dogs while training for a particular dog. The tag indicating the specific dog is present in the instance prompt but missing in the class prompt. For instance, the instance prompt may be \"a photo of a Doppler dog\" and the class prompt may be \"a photo of a dog\". You can enable prior preservation by setting the hyper-parameter with_prior_preservation = True.\n", "\n", "\n", "\n", "We provide a default dataset of dog images. It consists of images (instance images corresponding to the instance prompt) of a single dog with no class images. If using the default dataset, try the prompt \"a photo of a Doppler dog\" while doing inference in the demo notebook.\n", "\n", "\n", "License: [MIT](https://github.com/marshmellow77/dreambooth-sm/blob/main/LICENSE)." ] }, { "cell_type": "markdown", "id": "b8bfaa4d", "metadata": {}, "source": [ "### 3.1. Retrieve Training Artifacts\n", "\n", "---\n", "Here, we retrieve the training docker container, the training algorithm source, and the pre-trained base model. Note that model_version=\"*\" fetches the latest model.\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "f11ff722", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker import image_uris, model_uris, script_uris\n", "\n", "# Currently, not all the Stable Diffusion models in JumpStart support fine-tuning. Thus, we manually select a model\n", "# that supports fine-tuning.\n", "train_model_id, train_model_version, train_scope = (\n", " \"model-txt2img-stabilityai-stable-diffusion-v2-1-base\",\n", " \"*\",\n", " \"training\",\n", ")\n", "\n", "# Tested with ml.g4dn.2xlarge (16GB GPU memory) and ml.g5.2xlarge (24GB GPU memory) instances. Other instances may work as well.\n", "# If the ml.g5.2xlarge instance type is available, please change the following instance type to speed up training.\n", "training_instance_type = \"ml.g4dn.2xlarge\"\n", "\n", "# Retrieve the docker image\n", "train_image_uri = image_uris.retrieve(\n", " region=None,\n", " framework=None, # automatically inferred from model_id\n", " model_id=train_model_id,\n", " model_version=train_model_version,\n", " image_scope=train_scope,\n", " instance_type=training_instance_type,\n", ")\n", "\n", "# Retrieve the training script. This contains all the necessary files, including data processing, model training, etc.\n",
 "train_source_uri = script_uris.retrieve(\n", " model_id=train_model_id, model_version=train_model_version, script_scope=train_scope\n", ")\n", "# Retrieve the pre-trained model tarball to further fine-tune\n", "train_model_uri = model_uris.retrieve(\n", " model_id=train_model_id, model_version=train_model_version, model_scope=train_scope\n", ")" ] }, { "cell_type": "markdown", "id": "6e266289", "metadata": {}, "source": [ "### 3.2. Set Training parameters\n", "\n", "---\n", "Now that we are done with all the setup that is needed, we are ready to train our Stable Diffusion model. To begin, let us create a [``sagemaker.estimator.Estimator``](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) object. This estimator will launch the training job.\n", "\n", "There are two kinds of parameters that need to be set for training. The first kind are the parameters for the training job. These include: (i) Training data path: the S3 folder in which the input data is stored, (ii) Output path: the S3 folder in which the training output is stored, and (iii) Training instance type: the type of machine on which to run the training. We defined the training instance type above to fetch the correct train_image_uri.\n", "\n", "The second kind are the algorithm-specific training hyper-parameters.\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "e21c709f", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Sample training data is available in this bucket\n", "training_data_bucket = f\"jumpstart-cache-prod-{aws_region}\"\n", "training_data_prefix = \"training-datasets/dogs_sd_finetuning/\"\n", "\n", "training_dataset_s3_path = f\"s3://{training_data_bucket}/{training_data_prefix}\"\n", "\n", "output_bucket = sess.default_bucket()\n", "output_prefix = \"jumpstart-example-sd-training\"\n", "\n", "s3_output_location = f\"s3://{output_bucket}/{output_prefix}/output\"" ] }, { "cell_type": "markdown", "id": "adda2a1e", "metadata": {}, "source": [ "---\n", "For the algorithm-specific hyper-parameters, we start by fetching a Python dictionary of the training hyper-parameters that the algorithm accepts, along with their default values. These can then be overridden with custom values.\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "aa371787", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker import hyperparameters\n", "\n", "# Retrieve the default hyper-parameters for fine-tuning the model\n", "hyperparameters = hyperparameters.retrieve_default(\n", " model_id=train_model_id, model_version=train_model_version\n", ")\n", "\n", "# [Optional] Override default hyperparameters with custom values\n", "hyperparameters[\"max_steps\"] = \"400\"\n", "print(hyperparameters)" ] }, { "cell_type": "markdown", "id": "d102c884", "metadata": {}, "source": [ "---\n", "If setting `with_prior_preservation=True`, please use the ml.g5.2xlarge instance type, as more memory is required to generate class images. Currently, training on the ml.g4dn.2xlarge instance type runs into a CUDA out-of-memory issue when setting `with_prior_preservation=True`.\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "7cda2854", "metadata": {}, "source": [ "### 3.3. Start Training\n", "---\n", "We start by creating the estimator object with all the required assets and then launch the training job. It takes less than 10 minutes on the default dataset.\n",
 "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "76bdbb83", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.estimator import Estimator\n", "from sagemaker.utils import name_from_base\n", "\n", "training_job_name = name_from_base(f\"jumpstart-example-{train_model_id}-transfer-learning\")\n", "\n", "# Create SageMaker Estimator instance\n", "sd_estimator = Estimator(\n", " role=aws_role,\n", " image_uri=train_image_uri,\n", " source_dir=train_source_uri,\n", " model_uri=train_model_uri,\n", " entry_point=\"transfer_learning.py\", # Entry-point file in source_dir and present in train_source_uri.\n", " instance_count=1,\n", " instance_type=training_instance_type,\n", " max_run=360000,\n", " hyperparameters=hyperparameters,\n", " output_path=s3_output_location,\n", " base_job_name=training_job_name,\n", ")\n", "\n", "# Launch a SageMaker Training job by passing the s3 path of the training data\n", "sd_estimator.fit({\"training\": training_dataset_s3_path}, logs=True)" ] }, { "cell_type": "markdown", "id": "6fadc21e", "metadata": {}, "source": [ "### 3.4. Deploy and run inference on the fine-tuned model\n", "\n", "---\n", "\n", "A trained model does nothing on its own. We now want to use the model to perform inference. For this example, that means generating images from a text prompt. We follow the same steps as in [2. Run inference on the pre-trained model](#2.-Run-inference-on-the-pre-trained-model). We start by retrieving the JumpStart artifacts for deploying an endpoint. However, instead of the pre-trained `model_predictor`, we deploy the `sd_estimator` that we fine-tuned.\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "fcf25c1e", "metadata": { "tags": [] }, "outputs": [], "source": [ "inference_instance_type = \"ml.g4dn.2xlarge\"\n", "\n", "# Retrieve the inference docker container uri\n", "deploy_image_uri = image_uris.retrieve(\n", " region=None,\n", " framework=None, # automatically inferred from model_id\n", " image_scope=\"inference\",\n", " model_id=train_model_id,\n", " model_version=train_model_version,\n", " instance_type=inference_instance_type,\n", ")\n", "# Retrieve the inference script uri. This includes scripts for model loading, inference handling etc.\n", "deploy_source_uri = script_uris.retrieve(\n", " model_id=train_model_id, model_version=train_model_version, script_scope=\"inference\"\n", ")\n", "\n", "endpoint_name = name_from_base(f\"jumpstart-example-FT-{train_model_id}-\")\n", "\n", "# Use the estimator from the previous step to deploy to a SageMaker endpoint\n", "finetuned_predictor = sd_estimator.deploy(\n", " initial_instance_count=1,\n", " instance_type=inference_instance_type,\n", " entry_point=\"inference.py\", # entry point file in source_dir and present in deploy_source_uri\n", " image_uri=deploy_image_uri,\n", " source_dir=deploy_source_uri,\n", " endpoint_name=endpoint_name,\n", ")" ] }, { "cell_type": "markdown", "id": "76b76663", "metadata": {}, "source": [ "Next, we query the fine-tuned model, parse the response and display the generated image. The functions for this (`query`, `parse_response` and `display_img_and_prompt`) are implemented in section [2.3. Query endpoint and parse response](#2.3.-Query-endpoint-and-parse-response). Please execute those cells first."
] }, { "cell_type": "code", "execution_count": null, "id": "6a4bf38f", "metadata": { "pycharm": { "is_executing": true }, "tags": [] }, "outputs": [], "source": [ "text = \"a photo of a Doppler dog with a hat\"\n", "query_response = query(finetuned_predictor, text)\n", "img, prmpt = parse_response(query_response)\n", "display_img_and_prompt(img, prmpt)" ] }, { "cell_type": "markdown", "id": "944a6f0f", "metadata": {}, "source": [ "All the parameters mentioned in [2.4. Supported Inference parameters](#2.4.-Supported-Inference-parameters) are supported with finetuned model as well. You may also receive compressed image output as in [2.5. Compressed Image Output](#2.5.-Compressed-Image-Output) by changing `accept`." ] }, { "cell_type": "markdown", "id": "f3381a2c", "metadata": {}, "source": [ "---\n", "Next, we delete the endpoint corresponding to the finetuned model.\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "b03c8594", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Delete the SageMaker endpoint\n", "finetuned_predictor.delete_model()\n", "finetuned_predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "id": "a504c9ac", "metadata": {}, "source": [ "### 4. Conclusion\n", "---\n", "In this tutorial, we learnt how to deploy a pre-trained Stable Diffusion model on SageMaker using JumpStart. We saw that Stable Diffusion models can generate highly photo-realistic images from text. JumpStart provides both Stable Diffusion 1 and Stable Diffusion 2 and their FP16 revisions. JumpStart also provides additional 84 diffusion models which have been trained to generate images from different themes and different languages. You can deploy any of these models without writing any code of your own. To deploy a specific model, you can select a `model_id` in the dropdown menu in [2.1. Select a Model](#2.1.-Select-a-Model).\n", "\n", "You can tweak the image generation process by selecting the appropriate parameters during inference. Guidance on how to set these parameters is provided in [2.4. Supported Inference parameters](#2.4.-Supported-Inference-parameters). We also saw how returning a large image payload can lead to response size limit issues. JumpStart handles it by encoding the image at the endpoint and decoding it in the notebook before displaying. Finally, we saw how prompt engineering is a crucial step in generating high quality images. We discussed how to set your own prompts and saw a some examples of good prompts.\n", "\n", "To learn more about Inference on pre-trained Stable Diffusion models, please check out the blog [Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/generate-images-from-text-with-the-stable-diffusion-model-on-amazon-sagemaker-jumpstart/)\n", "\n", "Although creating impressive images can find use in industries ranging from art to NFTs and beyond, today we also expect AI to be personalizable. JumpStart provides fine-tuning capability to the pre-trained models so that you can adapt the model to your own use case with as little as five training images. This can be useful when creating art, logos, custom designs, NFTs, and so on, or fun stuff such as generating custom AI images of your pets or avatars of yourself. 
To learn more about Stable Diffusion fine-tuning, please check out the blog [Fine-tune text-to-image Stable Diffusion models with Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/fine-tune-text-to-image-stable-diffusion-models-with-amazon-sagemaker-jumpstart/)." ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "sagemaker:Python", "language": "python", "name": "conda-env-sagemaker-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.15" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 5 }