# Text2Image, Visual Grounding, Image Caption, and Visual Question and Answer using OFA
## Based on the [OFA model](https://huggingface.co/OFA-Sys/OFA-large)
### Code by Suresh Poopandi, October 2022

From [OFA GitHub](https://github.com/OFA-Sys/OFA):
OFA is a unified sequence-to-sequence pre-trained model (support English and Chinese) that unifies modalities (i.e., cross-modality, vision, language) and tasks (finetuning and prompt tuning are supported): image captioning (1st at the MSCOCO Leaderboard), VQA , visual grounding, text-to-image generation, text classification, text generation, image classification, etc. We provide step-by-step instructions for pretraining and fine tuning and corresponding checkpoints (check official ckpt [EN|CN] or huggingface ckpt).

This notebook walks through the process of deploying the OFA model to an AWS SageMaker endpoint. We start by downloading the model locally to test it out; a GPU is strongly recommended for this step. Next, we create a custom Docker container that contains the model. In this use case, the endpoint puts generated images directly in an S3 bucket and returns the location of the image.

Note that this model is large. Use an instance type with at least 15GB of storage so there is space to download the model and package the container. This notebook was tested on an ml.g4dn.2xlarge with 25GB of EBS storage. You can also run it on CPU-based instances, but performance may suffer.

The notebook follows these basic steps:
1. Install Dependencies (for local testing)
2. Test the model locally
3. Create a custom inference script
4. Create a unit test file to test inference script
5. Create a custom Docker container for the model and inference script
6. Test the Docker container locally
7. Define and Deploy the model
8. Test the new endpoint
9. Clean up resources

References:
 * [OFA Model on Hugging Face](https://huggingface.co/OFA-Sys/OFA-large)

Container Structure: This should be the directory structure locally, in order to pack everything correctly into your container ([reference](https://sagemaker-workshop.com/custom/containers.html)). You will already have this structure if you cloned the git repo. If not, following the directions in this notebook rebuilds this structure and all required files.
- This Notebook
- container
    - OFA
        - predictor.py: Flask app for inference, our custom inference code
        - wsgi.py: Wrapper around predictor
        - nginx.conf: Config for nginx front-end
        - serve: Launches gunicorn server
        - OFA model downloaded from git
        - test_predictor.py: test methods to test predictor
    - Dockerfile

# 1. Install Dependencies

### 1(a) Setup Local Environment
This step creates the required folders and downloads the ofa-large pre-trained model.
Make sure you start in the same directory as this notebook.

In [None]:
%%sh

mkdir container

cd container

git clone https://github.com/OFA-Sys/OFA

cd OFA

pip install -r requirements.txt

pip install --upgrade gradio

cd fairseq

pip install ./

cd ../..

wget https://ofa-silicon.oss-us-west-1.aliyuncs.com/checkpoints/ofa_large_clean.pt

mkdir -p checkpoints

mv ofa_large_clean.pt checkpoints/ofa_large_clean.pt


### 1(b) Set up SageMaker environment
This gives us access to basic information and functionality for our SageMaker environment, including the IAM role we are going to use in the next setup step.

In [None]:
import sagemaker
import boto3
sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

#used later on when deploying a model:
sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')
account_id = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

### 1(c) Set up IAM permissions
In addition to the default IAM permissions, you need to add S3 access (because the model stores images in S3) and ECR access (because we push our custom Docker container to ECR for use by SageMaker).

Add this policy for ECR access. Note that it grants permission to create a new repository as well as to push images to it.
`
{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "ecr:CompleteLayerUpload",
 "ecr:GetAuthorizationToken",
 "ecr:UploadLayerPart",
 "ecr:InitiateLayerUpload",
 "ecr:BatchCheckLayerAvailability",
 "ecr:PutImage",
 "ecr:CreateRepository"
 ],
 "Resource": "*"
 }
 ]
}
`

For S3 access, attach AWS Managed IAM Policy **AmazonS3FullAccess** to the role used by AWS SageMaker
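
As an alternative to pasting the ECR policy into the console, the same document can be built in Python and attached with boto3. The following is a minimal sketch: the function name `build_ecr_push_policy` and the role and policy names in the commented call are placeholders, not part of the original notebook.

```python
import json

# Actions copied from the ECR policy shown in this section.
ECR_PUSH_ACTIONS = [
    "ecr:CompleteLayerUpload",
    "ecr:GetAuthorizationToken",
    "ecr:UploadLayerPart",
    "ecr:InitiateLayerUpload",
    "ecr:BatchCheckLayerAvailability",
    "ecr:PutImage",
    "ecr:CreateRepository",
]


def build_ecr_push_policy(resource="*"):
    """Return the ECR policy document as a Python dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(ECR_PUSH_ACTIONS), "Resource": resource}
        ],
    }


policy_json = json.dumps(build_ecr_push_policy(), indent=2)
print(policy_json)

# To attach it as an inline policy (requires IAM permissions; names are placeholders):
# import boto3
# boto3.client("iam").put_role_policy(
#     RoleName="my-sagemaker-role",   # hypothetical role name
#     PolicyName="ofa-ecr-push",      # hypothetical policy name
#     PolicyDocument=policy_json,
# )
```

Keeping the document in code makes it easy to review which actions are granted before attaching it to the SageMaker execution role.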



# 2. Test the model locally
This allows a user to get a feel for what the model can do, and how to use the hyperparameters. Feel free to skip this step if you are already comfortable with the model.

The following fix works around a version incompatibility between PyTorch and Horovod.

In [None]:
!pip uninstall -y horovod
# Each `!` line runs in its own shell, so the variable must be set on the same line as the install
!HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod

In [None]:
import sys
import os
print(os.getcwd())
sys.path.append(os.path.normpath(os.path.join(os.getcwd(), 'container/OFA')))

from flask import Flask
import flask
import json
import logging

import boto3
import io

import torch
import numpy as np
from fairseq import checkpoint_utils
from fairseq import options, tasks, utils
from fairseq.dataclass.utils import convert_namespace_to_omegaconf
from tasks.mm_tasks.refcoco import RefcocoTask
from PIL import Image
from torchvision import transforms
import cv2
import gradio as gr



OFA_TASK_IMAGE_CAPTION="OFA_TASK_IMAGE_CAPTION"
OFA_TASK_VISUAL_QA="OFA_TASK_VISUAL_QA"
OFA_TASK_VISUAL_GROUNDING="OFA_TASK_VISUAL_GROUNDING"
OFA_TASK_TEXT2IMAGE="OFA_TASK_TEXT2IMAGE"

# Register
tasks.register_task('refcoco', RefcocoTask)

# turn on cuda if GPU is available
use_cuda = torch.cuda.is_available()
# use fp16 only when GPU is available
use_fp16 = False

# specify some options for evaluation
parser = options.get_generation_parser()
input_args = ["", "--task=refcoco", "--beam=10", "--path=container/checkpoints/ofa_large_clean.pt", "--bpe-dir=container/OFA/utils/BPE"]
args = options.parse_args_and_arch(parser, input_args)
cfg = convert_namespace_to_omegaconf(args)

# Load pretrained ckpt & config
task = tasks.setup_task(cfg.task)
models, cfg = checkpoint_utils.load_model_ensemble(
    utils.split_paths(cfg.common_eval.path),
    task=task
)

# Move models to GPU
for model in models:
    model.eval()
    if use_fp16:
        model.half()
    if use_cuda and not cfg.distributed_training.pipeline_model_parallel:
        model.cuda()
    model.prepare_for_inference_(cfg)

# Initialize generator
generator = task.build_generator(models, cfg.generation)

mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]

patch_resize_transform = transforms.Compose([
    lambda image: image.convert("RGB"),
    transforms.Resize((task.cfg.patch_image_size, task.cfg.patch_image_size), interpolation=Image.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=mean, std=std),
])

# Text preprocess
bos_item = torch.LongTensor([task.src_dict.bos()])
eos_item = torch.LongTensor([task.src_dict.eos()])
pad_idx = task.src_dict.pad()


def get_symbols_to_strip_from_output(generator):
    if hasattr(generator, "symbols_to_strip_from_output"):
        return generator.symbols_to_strip_from_output
    else:
        return {generator.bos, generator.eos}


def decode_fn(x, tgt_dict, bpe, generator, tokenizer=None):
    x = tgt_dict.string(x.int().cpu(), extra_symbols_to_ignore=get_symbols_to_strip_from_output(generator))
    token_result = []
    bin_result = []
    img_result = []
    # Assumption: the branch bodies below and the head of coord2bin follow the upstream OFA demo code
    for token in x.strip().split():
        if token.startswith('<bin_'):
            # location tokens produced by visual grounding
            bin_result.append(token)
        elif token.startswith('<code_'):
            # image code tokens produced by text-to-image generation
            img_result.append(token)
        else:
            if bpe is not None:
                token = bpe.decode('{}'.format(token))
            if tokenizer is not None:
                token = tokenizer.decode(token)
            if token.startswith(' ') or len(token_result) == 0:
                token_result.append(token.strip())
            else:
                token_result[-1] += token

    return ' '.join(token_result), ' '.join(bin_result), ' '.join(img_result)


def coord2bin(coords, w_resize_ratio, h_resize_ratio):
    coord_list = [float(coord) for coord in coords.strip().split()]
    bin_list = []
    bin_list += [
        "<bin_{}>".format(int((coord_list[0] * w_resize_ratio / task.cfg.max_image_size * (task.cfg.num_bins - 1))))]
    bin_list += [
        "<bin_{}>".format(int((coord_list[1] * h_resize_ratio / task.cfg.max_image_size * (task.cfg.num_bins - 1))))]
    bin_list += [
        "<bin_{}>".format(int((coord_list[2] * w_resize_ratio / task.cfg.max_image_size * (task.cfg.num_bins - 1))))]
    bin_list += [
        "<bin_{}>".format(int((coord_list[3] * h_resize_ratio / task.cfg.max_image_size * (task.cfg.num_bins - 1))))]
    return ' '.join(bin_list)


def bin2coord(bins, w_resize_ratio, h_resize_ratio):
    # bin[5:-1] strips the leading "<bin_" and the trailing ">"
    bin_list = [int(bin[5:-1]) for bin in bins.strip().split()]
    coord_list = []
    coord_list += [bin_list[0] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / w_resize_ratio]
    coord_list += [bin_list[1] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / h_resize_ratio]
    coord_list += [bin_list[2] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / w_resize_ratio]
    coord_list += [bin_list[3] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / h_resize_ratio]
    return coord_list


def encode_text(text, length=None, append_bos=False, append_eos=False):
    # Assumption: everything after the first condition follows the upstream OFA demo code
    line = [
        task.bpe.encode(' {}'.format(word.strip()))
        if not word.startswith('<code_') and not word.startswith('<bin_') else word
        for word in text.strip().split()
    ]
    line = ' '.join(line)
    s = task.tgt_dict.encode_line(
        line=line,
        add_if_not_exist=False,
        append_eos=False
    ).long()
    if length is not None:
        s = s[:length]
    if append_bos:
        s = torch.cat([bos_item, s])
    if append_eos:
        s = torch.cat([s, eos_item])
    return s


# ... (the remainder of this cell and the beginning of predictor.py, including
# general_interface and the start of the /invocations handler, are not shown)
    print(f"Instruction:{instruction}")
    print(f"New request:{bucket_name}:{key_name}")

    if ofa_task in [OFA_TASK_VISUAL_QA, OFA_TASK_VISUAL_GROUNDING, OFA_TASK_IMAGE_CAPTION]:
        # download the image from S3 URL
        s3 = boto3.client('s3')
        s3_response_object = s3.get_object(Bucket=bucket_name, Key=key_name)
        image_data = s3_response_object['Body'].read()
        image = Image.open(io.BytesIO(image_data))

    output_img, tokens = general_interface(image, instruction)

    if output_img:
        print("uploading file")
        s3 = boto3.client('s3')
        s3.upload_fileobj(output_img, bucket_name, instruction+key_name)

    result = json.dumps({'output': tokens})

    return flask.Response(response=result, status=200, mimetype='application/json')
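
The `coord2bin` and `bin2coord` helpers used for visual grounding quantize pixel coordinates into a fixed number of location bins and map them back. The following is a minimal standalone sketch of that round trip. The constants `NUM_BINS` and `MAX_IMAGE_SIZE` are assumed values chosen for illustration (the notebook reads them from `task.cfg`), and the function names are hypothetical.

```python
# Assumed values for illustration only; the notebook reads them from task.cfg.
NUM_BINS = 1000
MAX_IMAGE_SIZE = 512


def coord_to_bin(coord, resize_ratio):
    """Quantize a pixel coordinate of the original image into a bin index."""
    return int(coord * resize_ratio / MAX_IMAGE_SIZE * (NUM_BINS - 1))


def bin_to_coord(bin_index, resize_ratio):
    """Map a bin index back to an approximate pixel coordinate."""
    return bin_index / (NUM_BINS - 1) * MAX_IMAGE_SIZE / resize_ratio


# Example: a 1024-pixel-wide image resized to 512 pixels gives a ratio of 0.5.
w_ratio = 512 / 1024
x = 300.0
b = coord_to_bin(x, w_ratio)
x_back = bin_to_coord(b, w_ratio)
print(b, round(x_back, 2))
```

Because `int()` truncates, the recovered coordinate can be off by up to one bin width, which is `MAX_IMAGE_SIZE / (NUM_BINS - 1) / resize_ratio` pixels in the original image.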




Now let's make a small modification to the serve file. Normally, it would spawn one gunicorn worker for each CPU core, but this doesn't work in our case because the process is GPU bound. The original serve file lines are all here, but we add `model_server_workers = 1` to limit the endpoint to loading a single copy of the model into the GPU. Additional logic may be required if using an instance type with multiple GPUs.

In [None]:
%%writefile container/OFA/serve
#!/usr/bin/env python

# This file implements the scoring service shell. You don't necessarily need to modify it for various
# algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until
# gunicorn exits.
#
# The flask server is specified to be the app object in wsgi.py
#
# We set the following parameters:
#
# Parameter                Environment Variable              Default Value
# ---------                --------------------              -------------
# number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
# timeout                  MODEL_SERVER_TIMEOUT              60 seconds

import multiprocessing
import os
import signal
import subprocess
import sys

cpu_count = multiprocessing.cpu_count()

model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)
model_server_workers = int(os.environ.get('MODEL_SERVER_WORKERS', cpu_count))

#for our GPU based inference, set to one. This process is GPU bound, and the GPU may run out of space if more than one model is loaded.
model_server_workers = 1

def sigterm_handler(nginx_pid, gunicorn_pid):
    try:
        os.kill(nginx_pid, signal.SIGQUIT)
    except OSError:
        pass
    try:
        os.kill(gunicorn_pid, signal.SIGTERM)
    except OSError:
        pass

    sys.exit(0)

def start_server():
    print('Starting the inference server with {} workers.'.format(model_server_workers))

    # link the log streams to stdout/err so they will be logged to the container logs
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    nginx = subprocess.Popen(['nginx', '-c', '/OFA/nginx.conf'])
    gunicorn = subprocess.Popen(['gunicorn',
                                 '--timeout', str(model_server_timeout),
                                 '-k', 'sync',
                                 '-b', 'unix:/tmp/gunicorn.sock',
                                 '-w', str(model_server_workers),
                                 'wsgi:app'])

    signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))

    # If either subprocess exits, so do we.
    pids = set([nginx.pid, gunicorn.pid])
    while True:
        pid, _ = os.wait()
        if pid in pids:
            break

    sigterm_handler(nginx.pid, gunicorn.pid)
    print('Inference server exiting')

# The main routine just invokes the start function.

if __name__ == '__main__':
    start_server()
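
The worker-count decision in `serve` can be isolated into a small function, which makes the GPU override explicit instead of overwriting the variable after it is read. This is only a sketch: `resolve_worker_count` and its `gpu_bound` flag are hypothetical names, not part of the original serve file.

```python
import multiprocessing


def resolve_worker_count(env, gpu_bound=True):
    """Pick the number of gunicorn workers.

    env: mapping of environment variables (os.environ in the real script).
    gpu_bound: when True, force one worker so only one model copy is loaded.
    """
    if gpu_bound:
        return 1
    return int(env.get("MODEL_SERVER_WORKERS", multiprocessing.cpu_count()))


print(resolve_worker_count({"MODEL_SERVER_WORKERS": "8"}))                   # 1
print(resolve_worker_count({"MODEL_SERVER_WORKERS": "8"}, gpu_bound=False))  # 8
```

On a multi-GPU instance, the same function could be extended to return one worker per GPU, provided each worker is pinned to its own device.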


Next, we write an nginx configuration file. This is standard for most custom containers, and we don't make any changes for this model.

In [None]:
%%writefile container/OFA/nginx.conf

worker_processes 1;
daemon off; # Prevent forking


pid /tmp/nginx.pid;
error_log /var/log/nginx/error.log;

events {
    # defaults
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    access_log /var/log/nginx/access.log combined;

    upstream gunicorn {
        server unix:/tmp/gunicorn.sock;
    }

    server {
        listen 8080 deferred;
        client_max_body_size 5m;

        keepalive_timeout 5;
        proxy_read_timeout 1200s;

        location ~ ^/(ping|invocations) {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            proxy_pass http://gunicorn;
        }

        location / {
            return 404 "{}";
        }
    }
}




Finally, we write a quick wrapper for wsgi, so that the web server can find and load our inference script.

In [None]:
%%writefile container/OFA/wsgi.py
import predictor as myapp

# This is just a simple wrapper for gunicorn to find your app.
# If you want to change the algorithm file, simply change "predictor" above to the
# new file.

app = myapp.app


Now that we have all the files we need for our container, we create the Dockerfile. These are the instructions Docker follows to build the container. Most of this file is the same for any custom container, but note the `RUN pip install` lines, which install the libraries we need for our model (the OFA requirements, fairseq, torchvision, and gradio). These are the same ones we used for local testing, plus boto3 to handle the upload of generated images to S3.

In [None]:
%%writefile container/Dockerfile
FROM python:3.8

RUN apt-get -y update && apt-get install -y --no-install-recommends \
 wget \
 python3 \
 nginx \
 ca-certificates \
 git \
 ffmpeg \
 libsm6 \
 libxext6 \
 && rm -rf /var/lib/apt/lists/*

RUN wget https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py && \
 pip install flask gevent gunicorn && \
 rm -rf /root/.cache

RUN git clone https://github.com/OFA-Sys/OFA
RUN cd OFA && pip install -r requirements.txt

RUN cd OFA && wget https://ofa-silicon.oss-us-west-1.aliyuncs.com/checkpoints/ofa_large_clean.pt &&\
 mkdir -p checkpoints &&\
 mv ofa_large_clean.pt checkpoints/ofa_large_clean.pt

RUN pip install --upgrade torchvision boto3 gradio
RUN cd OFA/fairseq &&\
 pip install ./


#testing
RUN pip install pytest behave


COPY OFA /OFA
WORKDIR /OFA
ENV PATH="/OFA:${PATH}"
RUN chmod +x /OFA/serve

RUN ["chmod", "+x", "/OFA/test.sh"]

CMD ["./test.sh"]



Now create Python unit tests for the inference script

In [None]:
%%writefile container/OFA/test_predictor.py
import os
import uuid
import boto3


import predictor as myapp
import pytest
import json

bucket_name = f"pytest-ofa-{str(uuid.uuid4())}"
print(f"bucket name:{bucket_name}")
file_name="cat.png"

def setup_module(module):
    """..."""
    s3_client = boto3.client('s3')
    # Outside us-east-1, S3 requires an explicit LocationConstraint
    region = boto3.session.Session().region_name
    if region is None or region == 'us-east-1':
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3_client.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': region})
    s3_client.upload_file(file_name, bucket_name, file_name)
    print("copied files")

def teardown_module(module):
    """..."""
    # delete all objects
    s3_client = boto3.client('s3')
    response = s3_client.list_objects(Bucket=bucket_name)
    if 'Contents' in response:
        for content in response['Contents']:
            s3_client.delete_object(Bucket=bucket_name, Key=content['Key'])

    # delete bucket
    s3_client.delete_bucket(Bucket=bucket_name)
    print("Deleted bucket and files")

@pytest.fixture
def client():
    with myapp.app.test_client() as client:
        yield client

def test_ping(client):
    """Test ping operation"""
    print("test ping")
    rv = client.get('/ping')
    print(rv.data)
    print("done ping")

def test_image_caption(client):
    """Test image caption"""
    print("inside the test function")
    rv = client.post('/invocations', json={
        "ofa_task": "OFA_TASK_IMAGE_CAPTION",
        "bucket_name": bucket_name,
        "key_name": file_name
    })
    print(rv.data)
    data = json.loads(rv.data)
    assert "a cat wearing a face mask" in data['output']


def test_visual_qa(client):
    """Test visual qa"""
    print("inside the test function")
    rv = client.post('/invocations', json={
        "ofa_task": "OFA_TASK_VISUAL_QA",
        "bucket_name": bucket_name,
        "key_name": file_name,
        "instruction": "what is cat wearing?"
    })
    print(rv.data)
    data = json.loads(rv.data)
    assert "mask" in data['output']


Create a test shell script that will be invoked when the container is run locally

In [None]:
%%writefile container/OFA/test.sh
#!/bin/sh

pytest -rPv --capture=no test_predictor.py

Copy the sample image used by the unit tests

In [None]:
!cp cat.png container/OFA

# 4. Create a custom Docker container for this inference script

First, clean out any old Docker images to prevent your Jupyter instance from running out of space.

In [None]:
!docker system prune -af

Now let us run the docker image locally and test the model using unit tests

In [None]:
%%capture capt
%%time
algorithm_name="ofa"
!docker build -t $algorithm_name container/.
!docker run -it $algorithm_name

Print any errors captured while building and running the container

In [None]:
capt.stderr

Next, we do a bit of setup work to get the name variables into the shell. We create a new repository in ECR for the container if one does not already exist, run Docker build to actually build the container, and then push it to ECR.

In [None]:
%%capture capt
%%sh
set -x

# The name of our algorithm
algorithm_name=ofa

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
 aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

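# Log Docker in to ECR. `aws ecr get-login` was removed in AWS CLI v2;
# get-login-password works in v2 and in recent v1 releases.
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"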

# Get the login command from ECR in order to pull down the SageMaker PyTorch image
#$(aws ecr get-login --registry-ids 520713654638 --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} . --build-arg REGION=${region}
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}


Print the output from executing the shell script above

In [None]:
capt.stdout

# 5. Define and Deploy the model

### 5(a) Define the model object
Now that the container is built, we can start to set up the SageMaker endpoint. The first step is to create a [SageMaker model object](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-deployment.html#realtime-endpoints-deployment-create-model), which consists of a unique name for the model, the location of the image we just built, and the role the endpoint should use. Here we use the same role that is used by this notebook.

In [None]:
model='ofa'

import boto3
from sagemaker import get_execution_role

sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

account_id = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name


role = get_execution_role()


from time import gmtime, strftime

model_name = f"{model}-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"
container = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{model}:latest"

print('Model name: ' + model_name)
print('Container image: ' + container)

container = {
    'Image': container
}

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

### 5(b) Define the Endpoint Configuration
Now that we've defined what the model object is, we set up the [Endpoint Configuration](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-deployment.html#realtime-endpoints-deployment-create-endpoint-config). This is where we set the details on what kind of machines the model should be running on. Here, we set it to `ml.g4dn.2xlarge`, an instance type with one GPU. We also set the initial instance count to 1, so the endpoint starts with a single instance and no autoscaling policy is attached. This helps to keep cost down while testing.

In [None]:
endpoint_config_name = f"{model}-config{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"
print('Endpoint config name: ' + endpoint_config_name)

instance_type = 'ml.g4dn.2xlarge'

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': instance_type,
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

### 5(c) Deploy the endpoint
Now that we have set up our configuration for the model and for the endpoint, we can bring these two together and actually [deploy the model](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-deployment.html#w1131aac27c17b9b9b7c29). Note that this step may take up to 10 minutes as the endpoint is turned on.

In [None]:
%%time

import time

endpoint_name = f"{model}-endpoint{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"
print('Endpoint name: ' + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Endpoint Status: " + status)

print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)
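
The boto3 waiter hides the polling loop. When you want to log progress or surface the failure reason yourself, an explicit loop is one option. The sketch below assumes only that the describe call returns `EndpointStatus` (and `FailureReason` on failure); `wait_for_endpoint` and `fake_describe` are hypothetical names, and the fake stands in for `sm_client.describe_endpoint` so the example runs without AWS access.

```python
import time


def wait_for_endpoint(describe, endpoint_name, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll until the endpoint is InService; raise if it fails or times out.

    describe: a callable such as sm_client.describe_endpoint.
    """
    for _ in range(max_polls):
        resp = describe(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status == "InService":
            return resp
        if status == "Failed":
            raise RuntimeError(f"Endpoint failed: {resp.get('FailureReason', 'unknown')}")
        sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} not InService after {max_polls} polls")


# Stand-in for sm_client.describe_endpoint so the sketch runs without AWS.
_statuses = iter(["Creating", "Creating", "InService"])


def fake_describe(EndpointName):
    return {"EndpointName": EndpointName, "EndpointStatus": next(_statuses)}


result = wait_for_endpoint(fake_describe, "ofa-endpoint-demo", sleep=lambda s: None)
print(result["EndpointStatus"])
```

In the notebook, the call would be `wait_for_endpoint(sm_client.describe_endpoint, endpoint_name)`.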

# 6. Test the Endpoint
Now that the endpoint is online, we can test it using the [Invoke Endpoint](https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/sagemaker-runtime.html#SageMakerRuntime.Client.invoke_endpoint) function inside SageMaker. The endpoint is returning the image name, so we also set up a connection to S3 to download and display the image generated by the endpoint. Alternatively, we could set up a [REST API endpoint using API Gateway](https://aws.amazon.com/blogs/machine-learning/creating-a-machine-learning-powered-rest-api-with-amazon-api-gateway-mapping-templates-and-amazon-sagemaker/).

### 6.1. Create a temporary bucket for testing

In [None]:
s3_client = boto3.client('s3')
import uuid

#create a temporary bucket

bucket_name = f"{model}-test-{str(uuid.uuid4())}"
print(f"bucket name:{bucket_name}")
if region == 'us-east-1':
    s3_client.create_bucket(Bucket=bucket_name)
else:
    s3_client.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': region})

### 6.2. Test Image Captioning

In this test, we upload a sample image and ask the model to describe the objects in the image.

### 6.2.1 Test the model with an image of a cat

Upload a sample image of a cat with a mask to the test S3 bucket

In [None]:
file_name="cat.png"
s3_client.upload_file(file_name, bucket_name, file_name)

from PIL import Image
#display the image
Image.open(file_name)

Invoke the SageMaker endpoint with the image file name and retrieve the output of the model

In [None]:
%%time
import json
content_type = "application/json"
request_body = {
    "ofa_task": "OFA_TASK_IMAGE_CAPTION",
    "bucket_name": bucket_name,
    "key_name": file_name
}

payload = json.dumps(request_body)
print(payload)

# Endpoint invocation
response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=payload)

#Parse results
result = json.loads(response['Body'].read().decode())
print (result)
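
The same request shape covers the other tasks. For example, visual question answering adds an `instruction` field, as in the unit test earlier. A small helper can build and validate the payload before it is sent. This is a sketch: `build_payload` is a hypothetical name, and the bucket name in the example is a placeholder.

```python
import json

# Task names as used by the request bodies in this notebook.
OFA_TASKS = {
    "OFA_TASK_IMAGE_CAPTION",
    "OFA_TASK_VISUAL_QA",
    "OFA_TASK_VISUAL_GROUNDING",
    "OFA_TASK_TEXT2IMAGE",
}


def build_payload(ofa_task, bucket_name, key_name, instruction=None):
    """Return the JSON request body for the endpoint (hypothetical helper)."""
    if ofa_task not in OFA_TASKS:
        raise ValueError(f"Unknown task: {ofa_task}")
    body = {"ofa_task": ofa_task, "bucket_name": bucket_name, "key_name": key_name}
    if instruction is not None:
        body["instruction"] = instruction
    return json.dumps(body)


payload = build_payload(
    "OFA_TASK_VISUAL_QA",
    "my-test-bucket",  # placeholder bucket name
    "cat.png",
    instruction="what is cat wearing?",
)
print(payload)
```

The resulting string can be passed as `Body` to `runtime_sm_client.invoke_endpoint` exactly as in the captioning cell.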

# 7. Clean up resources
Use this section to delete any of the resources we have deployed using this notebook. Don't forget to shut off the instance running this notebook when you are done! You may also want to head over to S3 and clear out your image bucket.

In [None]:
sm_client = boto3.client(service_name='sagemaker')
sm_client.delete_model(ModelName=model_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_endpoint(EndpointName=endpoint_name)
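
The temporary test bucket created in section 6.1 is not removed by the calls above. An S3 bucket must be empty before it can be deleted, so a cleanup helper along these lines may be useful. This is a sketch for unversioned buckets only: `empty_and_delete_bucket` is a hypothetical name, and `FakeS3` is an in-memory stand-in so the example runs without AWS access.

```python
def empty_and_delete_bucket(s3_client, bucket_name):
    """Delete every object in the bucket, then the bucket itself.

    Returns the number of objects deleted. Does not handle versioned buckets.
    """
    deleted = 0
    kwargs = {"Bucket": bucket_name}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
    s3_client.delete_bucket(Bucket=bucket_name)
    return deleted


class FakeS3:
    """In-memory stand-in for boto3.client('s3'), just enough for this sketch."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.bucket_exists = True

    def list_objects_v2(self, Bucket, ContinuationToken=None):
        return {"Contents": [{"Key": k} for k in self.keys], "IsTruncated": False}

    def delete_objects(self, Bucket, Delete):
        gone = {o["Key"] for o in Delete["Objects"]}
        self.keys = [k for k in self.keys if k not in gone]

    def delete_bucket(self, Bucket):
        self.bucket_exists = False


fake = FakeS3(["cat.png", "out/cat.png"])
print(empty_and_delete_bucket(fake, "ofa-test-demo"))  # 2
```

In the notebook, the call would be `empty_and_delete_bucket(s3_client, bucket_name)`.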