# Amazon SageMaker Object Detection using the RecordIO format

## Introduction

Object detection is the process of identifying and localizing objects in an image. A typical object detection solution takes an image as input and returns bounding boxes around the objects of interest, together with the class of the object each box encloses. Before we have such a solution, we need to acquire and process a training dataset, create and set up a training job so that the algorithm can learn from the dataset, and then host the trained model as an endpoint to which we can send query images.

This notebook is an end-to-end example introducing the Amazon SageMaker Object Detection algorithm. In this demo, we will demonstrate how to train and host an object detection model on the [Pascal VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) using the Single Shot multibox Detector ([SSD](https://arxiv.org/abs/1512.02325)) algorithm. Along the way, we will show how to construct a training dataset in the RecordIO format, because this is the format that the training job consumes, and how to host and validate the trained model. Amazon SageMaker Object Detection also allows training with the image and JSON format, which is illustrated in the [image and JSON Notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_image_json_format.ipynb).

In [None]:
from IPython.display import clear_output
clear_output()

## Setup

To train the Object Detection algorithm on Amazon SageMaker, we need to set up and authenticate the use of AWS services. To begin with, we need an AWS account role with SageMaker access. This role, which gives SageMaker access to your data in S3, is obtained automatically from the role used to start the notebook.

In [None]:
%%time
import boto3
import time
import string
import sagemaker
import random
from sagemaker import get_execution_role

role = get_execution_role()
print(role)
sess = sagemaker.Session()

We also need the S3 bucket that you want to use for training and for storing the trained model artifacts. In this notebook, we use a custom bucket that already exists, to keep the naming clean. You can also use the default bucket that SageMaker creates for you (`sess.default_bucket()`).

In [None]:
bucket = ''  # Set to the name of an existing S3 bucket (or use sess.default_bucket())
prefix = 'object-detection-training'

Lastly, we need the Amazon SageMaker Object Detection Docker image, which is static and does not need to be changed.

In [None]:
from sagemaker.amazon.amazon_estimator import get_image_uri

training_image = get_image_uri(sess.boto_region_name, 'object-detection', repo_version="latest")
print (training_image)

## Data Upload

Let's print the first three lines of the .lst annotation file that was used to generate the training RecordIO file.

In [None]:
!head -n 3 RecordIO/train.lst > example.lst

# Read the sample back and print it (the with-block closes the file)
with open('example.lst', 'r') as f:
    lst_content = f.read()
print(lst_content)

As can be seen, each line in the .lst file holds the annotations for one image. A .lst file is a **tab**-delimited file with multiple columns, where each row contains the annotations of one image file. The first column specifies a unique image index. The second column specifies the header size of the current row. In the example .lst file above, the 2 in the second column means that the second and third columns are header information, which is not treated as label or bounding box information for the image in that row.

The third column specifies the label width of a single object. In the first row of the sample .lst file above, the 5 in the third column means that each object in the image is described by 5 numbers: the class index and the four bounding box coordinates. If there are multiple objects in one image, all of their label information is listed on the same line. The annotation for each object is represented as ``[class_index, xmin, ymin, xmax, ymax]``.

The classes should be labeled with consecutive integers starting at 0. The bounding box coordinates are the ratios of the box's top-left (xmin, ymin) and bottom-right (xmax, ymax) corner coordinates to the overall image width and height, so they lie between 0 and 1. Note that the top-left corner of the entire image is the origin (0, 0). The last column specifies the relative path of the image file.
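
The row layout described above can be sketched in Python as a pair of helpers that build and parse a single .lst row. This is a minimal sketch, assuming the two-column header shown in the example (header width 2, label width 5). The names `make_lst_line` and `parse_lst_line` are hypothetical (they are not part of SageMaker or MXNet), and the four-decimal number formatting is an arbitrary choice.

```python
def make_lst_line(index, objects, image_path, header_width=2, label_width=5):
    """Build one tab-delimited .lst row.

    objects: list of (class_index, xmin, ymin, xmax, ymax) tuples, with the
    coordinates given as ratios of the image width/height (0 to 1).
    """
    fields = [str(index), str(header_width), str(label_width)]
    for obj in objects:
        if len(obj) != label_width:
            raise ValueError("each object needs {} values".format(label_width))
        class_index, xmin, ymin, xmax, ymax = obj
        # Coordinates must be normalized and describe a non-empty box
        if not (0.0 <= xmin < xmax <= 1.0 and 0.0 <= ymin < ymax <= 1.0):
            raise ValueError("coordinates must be normalized with min < max")
        fields.extend("{:.4f}".format(v) for v in obj)
    # The last column is the relative image path
    fields.append(image_path)
    return "\t".join(fields)


def parse_lst_line(line):
    """Split a .lst row back into (index, objects, image_path)."""
    fields = line.rstrip("\n").split("\t")
    index = int(fields[0])
    header_width = int(fields[1])
    label_width = int(fields[2])
    # Labels start after the index column and the header columns
    labels = [float(v) for v in fields[1 + header_width:-1]]
    objects = [tuple(labels[i:i + label_width])
               for i in range(0, len(labels), label_width)]
    return index, objects, fields[-1]


line = make_lst_line(0, [(1, 0.1, 0.2, 0.5, 0.6)], "images/0001.jpg")
print(repr(line))
print(parse_lst_line(line))
```

Several objects in the same image simply extend the same row with further groups of five numbers, which is why the parser slices the label columns in steps of `label_width`.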

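After generating the .lst file, the RecordIO file can be created from it with MXNet's `im2rec.py` tool, using its `--pack-label` option so that the bounding box labels are packed into each record. The rest of this notebook assumes that `RecordIO/train.rec` and `RecordIO/val.rec` already exist.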

### Upload data to S3
Upload the data to the S3 bucket. We do this in multiple channels. Channels are simply directories in the bucket that differentiate between training and validation data. Let us simply call these directories `train` and `validation`.

In [None]:
%%time

# Upload the RecordIO files to train and validation channels
train_channel = prefix + '/train'
validation_channel = prefix + '/validation'

sess.upload_data(path='RecordIO/train.rec', bucket=bucket, key_prefix=train_channel)
sess.upload_data(path='RecordIO/val.rec', bucket=bucket, key_prefix=validation_channel)

s3_train_data = 's3://{}/{}'.format(bucket, train_channel)
s3_validation_data = 's3://{}/{}'.format(bucket, validation_channel)

Next, we need to set up an output location in S3 where the model artifacts will be written. These artifacts are the output of the algorithm's training job.

In [None]:
s3_input_train = 's3://{}/{}'.format(bucket, train_channel)
s3_input_validation = 's3://{}/{}'.format(bucket, validation_channel)
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)
print(s3_input_train)
print(s3_input_validation)
print(s3_output_location)

## Hyperparameter Tuning

The object detection algorithm at its core is the [Single-Shot Multi-Box detection algorithm (SSD)](https://arxiv.org/abs/1512.02325). This algorithm uses a `base_network`, which is typically a [VGG](https://arxiv.org/abs/1409.1556) or a [ResNet](https://arxiv.org/abs/1512.03385). The Amazon SageMaker object detection algorithm currently supports VGG-16 and ResNet-50. It also has many hyperparameters that help configure the training job. The next step is to set up these hyperparameters and the data channels for training the model. Consider the following example definition of hyperparameters. See the SageMaker Object Detection [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html) for more details on the hyperparameters.

In [None]:
# Parameters

## Number of Classes
num_classes = 2

## Number of training samples (80% total)
num_training_samples = 1600

## Instance type
instance_type = 'ml.p3.8xlarge'

## Instance count per job
instance_count_per_job = 1

## Learning rate scheduler steps (epochs at which the learning rate is reduced)
lr_scheduler_step = '20,40'

## Max HyperParameter Training Epochs
ht_epochs = 60

## Max Final Training Epochs
final_epochs = 240

## Max HyperParameter Training jobs
max_ht_training_jobs = 10

## Max Parallel jobs
max_parallel_jobs = 1
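
The `lr_scheduler_step` value above works together with the `lr_scheduler_factor = 0.1` set on the estimator below. According to the SageMaker Object Detection hyperparameter documentation, the learning rate is multiplied by `lr_scheduler_factor` at each epoch listed in `lr_scheduler_step`. The following minimal sketch assumes that epochs are counted from 0 and that each reduction applies from the listed epoch onward; the exact boundary inside the algorithm may differ. The function `learning_rate_at` is a hypothetical helper, and the base rate of 0.01 is only an example taken from inside the tuning range.

```python
def learning_rate_at(epoch, base_lr, lr_scheduler_step, lr_scheduler_factor):
    """Learning rate in effect during a 0-based `epoch`, assuming the rate is
    multiplied by `lr_scheduler_factor` once each listed step epoch is reached."""
    steps = [int(s) for s in lr_scheduler_step.split(",") if s.strip()]
    # Count how many scheduled reductions have already happened
    num_reductions = sum(1 for step in steps if epoch >= step)
    return base_lr * lr_scheduler_factor ** num_reductions


# Example base learning rate; the tuning job searches 0.001 to 0.1
for epoch in [0, 19, 20, 39, 40, 239]:
    print(epoch, learning_rate_at(epoch, 0.01, '20,40', 0.1))
```

Under this assumption, with `'20,40'` and a factor of 0.1, everything after epoch 40 runs at 1/100 of the base rate, which covers most of the 240 epochs of the final training job.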


Now that all the setup is done and the files are uploaded to S3, we are ready to start training. Because the goal is to get the best model we can from this dataset, we first run a hyperparameter tuning job.

The following cells set up and launch that job.

In [None]:
# Data channels
train_data = sagemaker.session.s3_input(s3_input_train, distribution='FullyReplicated', 
 content_type='application/x-recordio', s3_data_type='S3Prefix')
validation_data = sagemaker.session.s3_input(s3_input_validation, distribution='FullyReplicated', 
 content_type='application/x-recordio', s3_data_type='S3Prefix')

In [None]:
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner

# Base training estimator
od = sagemaker.estimator.Estimator(training_image,
 role, 
 train_instance_count = instance_count_per_job, 
 train_instance_type = instance_type,
 train_volume_size = 50,
 train_max_run = 360000,
 input_mode= 'File',
 output_path = s3_output_location,
 sagemaker_session = sess,
 base_job_name = 'smart-cooler')
# Fixed hyperparameters
od.set_hyperparameters(base_network = 'resnet-50',
 use_pretrained_model = 1,
 early_stopping = True,
 num_classes = num_classes,
 num_training_samples = num_training_samples,
 epochs = ht_epochs,
 lr_scheduler_step = lr_scheduler_step,
 lr_scheduler_factor = 0.1,
 overlap_threshold = 0.5,
 nms_threshold = 0.45,
 image_shape = 512,
 label_width = 350)

# Run tuning job
tuning_job_name = "smart-cooler-tuning-{}".format(''.join(random.choices(string.ascii_letters + string.digits, k=8)))

hyperparameter_ranges = {'learning_rate': ContinuousParameter(0.001, 0.100),
 'mini_batch_size': IntegerParameter(16, 64),
 'momentum': ContinuousParameter(0.80, 0.99),
 'weight_decay': ContinuousParameter(0.001, 0.100),
 'optimizer': CategoricalParameter(['sgd', 'adam', 'rmsprop', 'adadelta'])}

tuner = HyperparameterTuner(od, 
 'validation:mAP', 
 hyperparameter_ranges,
 objective_type = 'Maximize', 
 max_jobs = max_ht_training_jobs, 
 max_parallel_jobs = max_parallel_jobs, 
 early_stopping_type = 'Auto')

tuner.fit({'train': train_data, 'validation': validation_data}, 
 job_name = tuning_job_name, include_cls_metadata = False)
tuner.wait()

After the tuning job has finished, we can list the five best-performing training jobs and their hyperparameters.

In [None]:
# Get best jobs
tuner_metrics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)
best_jobs = tuner_metrics.dataframe().sort_values(['FinalObjectiveValue'], ascending=False).head(5)

best_jobs

## Training
Now that the hyperparameter tuning job is done, we take the hyperparameters of the best training job and train the final model for more epochs.

In [None]:
# Train with best job hyperparameters
best_job = best_jobs.head(1)
i = best_job.index[0]

hyperparams = od.hyperparameters()
hyperparams['learning_rate'] = best_job.at[i, 'learning_rate']
hyperparams['mini_batch_size'] = int(best_job.at[i, 'mini_batch_size'])
hyperparams['momentum'] = best_job.at[i, 'momentum']
hyperparams['weight_decay'] = best_job.at[i, 'weight_decay']
hyperparams['optimizer'] = best_job.at[i, 'optimizer']

hyperparams['epochs'] = final_epochs

print(hyperparams)

od.set_hyperparameters(**hyperparams)

x = ''.join(random.choices(string.ascii_letters + string.digits, k=8))
od.fit({'train': train_data, 'validation': validation_data}, 
 job_name='smart-cooler-training-' + x)

## Hosting
Once training is done, we can deploy the trained model as an Amazon SageMaker real-time hosted endpoint. This allows us to make predictions (inference) with the model. Note that we don't have to host on the same instance type that we used to train. Training is a prolonged, compute-intensive job with different compute and memory requirements than hosting, so we can choose any instance type we want to host the model. In our case, we used the `ml.p3.8xlarge` instance type to train, but we will host the model on a less expensive instance type, `ml.m5.2xlarge`.
The endpoint deployment can be accomplished as follows:

In [None]:
# Deploy model to endpoint
od_endpoint = od.deploy(initial_instance_count = 1,
 instance_type = 'ml.m5.2xlarge',
 endpoint_name='smart-cooler-endpoint-' + x)

## Inference
Now that the trained model is deployed at an endpoint that is up-and-running, we can use this endpoint for inference.
The following code blocks define some functions that will be used for processing and visualizing inference results.

In [None]:
!pip install wget pillow

In [None]:
import json
import glob
import wget
import random
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from math import floor, ceil
import itertools
import boto3

In [None]:
def get_object_boundary_box(img, obj_boundaries):
    """Convert normalized [xmin, ymin, xmax, ymax] model boundaries into
    pixel coordinates for the given PIL image."""
    if len(obj_boundaries) != 4:
        raise ValueError("SageMaker boundaries are not of size 4")
    # Use the full image size (img.getbbox() would return only the
    # bounding box of the non-zero region of the image)
    x_size, y_size = img.size

    # Generate a tuple of pixel boundaries from the normalized model boundaries
    x_min = floor(x_size * obj_boundaries[0])
    y_min = floor(y_size * obj_boundaries[1])
    x_max = ceil(x_size * obj_boundaries[2])
    y_max = ceil(y_size * obj_boundaries[3])
    return tuple(map(int, [x_min, y_min, x_max, y_max]))

In [None]:
def visualize_detection(img_file, dets, classes=None, colors=None, thresh=0.6):
    """Draw detections [class_index, score, xmin, ymin, xmax, ymax] with
    normalized coordinates and a score of at least `thresh` on the image."""
    import random
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    # Avoid mutable default arguments
    classes = classes or []
    colors = colors or []

    plt.rcParams["figure.figsize"] = [9, 6]

    img = mpimg.imread(img_file)
    plt.imshow(img)
    height = img.shape[0]
    width = img.shape[1]
    for det in dets:
        (klass, score, x0, y0, x1, y1) = det
        if score < thresh:
            continue
        cls_id = int(klass)
        # Use the given class color, or a random one if none was provided
        if len(colors) > cls_id:
            color = colors[cls_id]
        else:
            color = (random.random(), random.random(), random.random())
        # Scale normalized coordinates to pixel coordinates
        xmin = int(x0 * width)
        ymin = int(y0 * height)
        xmax = int(x1 * width)
        ymax = int(y1 * height)
        rect = plt.Rectangle((xmin, ymin), xmax - xmin,
                             ymax - ymin, fill=False,
                             edgecolor=color,
                             linewidth=3.5)
        plt.gca().add_patch(rect)
        class_name = str(cls_id)
        if classes and len(classes) > cls_id:
            class_name = classes[cls_id]
        plt.gca().text(xmin, ymin - 2,
                       '{:s} {:.2f}'.format(class_name, score),
                       bbox=dict(facecolor=color, alpha=0.5),
                       fontsize=12, color='white')

    plt.show()

In [None]:
def get_iou(bb1, bb2):
    """
    Calculate an Intersection over Union (IoU)-style overlap score of two bounding boxes.

    Parameters
    ----------
    bb1 : sequence of 4 numbers
        [x1, y1, x2, y2], where (x1, y1) is the top-left corner
        and (x2, y2) is the bottom-right corner
    bb2 : sequence of 4 numbers
        Same layout as bb1

    Returns
    -------
    float
        in [0, 1]. Unlike plain IoU, this returns 1.0 when one box lies
        entirely inside the other and 0.5 when the intersection covers
        more than half of either box.
    """
    assert bb1[0] < bb1[2]
    assert bb1[1] < bb1[3]
    assert bb2[0] < bb2[2]
    assert bb2[1] < bb2[3]

    # Determine the coordinates of the intersection rectangle
    x_left = max(bb1[0], bb2[0])
    y_top = max(bb1[1], bb2[1])
    x_right = min(bb1[2], bb2[2])
    y_bottom = min(bb1[3], bb2[3])

    if x_right < x_left or y_bottom < y_top:
        return 0.0

    # The intersection of two axis-aligned bounding boxes is always an
    # axis-aligned bounding box
    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # Compute the area of both AABBs
    bb1_area = (bb1[2] - bb1[0]) * (bb1[3] - bb1[1])
    bb2_area = (bb2[2] - bb2[0]) * (bb2[3] - bb2[1])

    # One box is fully contained in the other
    if intersection_area == bb1_area or intersection_area == bb2_area:
        return 1.0

    # The intersection covers more than half of either box
    if intersection_area / bb1_area > 0.5 or intersection_area / bb2_area > 0.5:
        return 0.5

    # Compute the intersection over union: the intersection area divided by
    # the union area (sum of both areas minus the intersection area)
    iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
    assert 0.0 <= iou <= 1.0
    return iou
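
Note that `get_iou` differs from textbook IoU: it returns 1.0 when one box contains the other and 0.5 when the overlap covers more than half of either box, so the NMS step below suppresses nested or heavily overlapping detections more aggressively than plain IoU would. For comparison, here is a minimal sketch of plain IoU (intersection area divided by union area). `standard_iou` is a hypothetical helper and is not used elsewhere in this notebook.

```python
def standard_iou(bb1, bb2):
    """Plain IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    # Corners of the intersection rectangle
    x_left = max(bb1[0], bb2[0])
    y_top = max(bb1[1], bb2[1])
    x_right = min(bb1[2], bb2[2])
    y_bottom = min(bb1[3], bb2[3])
    # Overlap width/height, clamped to 0 when the boxes are disjoint
    inter = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)
    area1 = (bb1[2] - bb1[0]) * (bb1[3] - bb1[1])
    area2 = (bb2[2] - bb2[0]) * (bb2[3] - bb2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0


print(standard_iou([0, 0, 2, 2], [1, 1, 3, 3]))
print(standard_iou([0, 0, 1, 1], [0, 0, 2, 2]))
```

For a small box nested in a larger one, such as `[0, 0, 1, 1]` inside `[0, 0, 2, 2]`, `standard_iou` gives 0.25 whereas `get_iou` gives 1.0.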

In [None]:
def get_bounding_boxes(result, threshold_ssd, threshold_iou, min_size=0.05, max_size=0.20):
    """Filter endpoint predictions by confidence and box size, then apply
    class-agnostic non-maximum suppression (NMS) using get_iou."""
    predictions = result["prediction"]
    print("Prediction quantity: {}".format(len(predictions)))

    # Keep only boxes whose confidence score reaches the SSD threshold
    candidates = [p for p in predictions if p[1] >= threshold_ssd]
    print("Bounding box quantity before NMS: {}".format(len(candidates)))

    # Drop boxes whose normalized width or height is outside [min_size, max_size]
    def has_valid_size(box):
        width = box[4] - box[2]
        height = box[5] - box[3]
        return min_size <= width <= max_size and min_size <= height <= max_size

    candidates = [p for p in candidates if has_valid_size(p)]

    # Greedy NMS: visit boxes from highest to lowest score and keep a box only
    # if its overlap with every already-kept box is below threshold_iou
    bounding_boxes = []
    for box in sorted(candidates, key=lambda p: p[1], reverse=True):
        if all(get_iou(box[2:], kept[2:]) < threshold_iou for kept in bounding_boxes):
            bounding_boxes.append(box)

    print("Bounding box quantity after NMS: {}".format(len(bounding_boxes)))
    return bounding_boxes

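Next, we need an image to test. The cells below read a local file named `test.jpg`; replace it with the image you want to test (for example, one downloaded with `wget`). Ideally, use a photo of the fridge containing the products the model was trained on.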

In [None]:
file_name = 'test.jpg'

# Read the raw bytes of the test image and write a copy of them to n.txt
with open(file_name, 'rb') as image:
    b = bytearray(image.read())
with open('n.txt', 'wb') as ne:
    ne.write(b)

Let us use our endpoint to detect objects within this image. Since the image is a JPEG, we set the corresponding content type before running the prediction. The endpoint returns a JSON response that we can load and inspect.

In [None]:
import json
file_name = 'test.jpg'

# Read the test image as raw bytes
with open(file_name, 'rb') as f:
    payload = bytearray(f.read())

# Send the image to the endpoint as JPEG and parse the JSON response
od_endpoint.content_type = 'image/jpeg'
result = json.loads(od_endpoint.predict(payload))

print(result)

The results are in a format similar to the input .lst file (see the [RecordIO Notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb) for more details on the .lst file definition), with an additional confidence score for each detected object. The output format can be represented as `[class_index, confidence_score, xmin, ymin, xmax, ymax]`. Typically, low-confidence predictions are discarded.
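
The output format above can also be consumed without the plotting helpers. The sketch below keeps predictions at or above a confidence cut-off and scales the normalized coordinates to pixels. It assumes the response has been parsed into a dict with a `prediction` list, as printed above; the helper `detections_to_pixels`, its `min_score` default, and the example response are hypothetical and for illustration only (not real model output).

```python
def detections_to_pixels(result, width, height, classes, min_score=0.5):
    """Turn {"prediction": [[class_index, score, xmin, ymin, xmax, ymax], ...]}
    into a list of dicts with class names and pixel coordinates."""
    detections = []
    for class_index, score, xmin, ymin, xmax, ymax in result["prediction"]:
        if score < min_score:
            continue  # skip low-confidence predictions
        cls_id = int(class_index)
        name = classes[cls_id] if cls_id < len(classes) else str(cls_id)
        detections.append({
            "class": name,
            "score": score,
            # Normalized ratios times the image size give pixel coordinates
            "box": (round(xmin * width), round(ymin * height),
                    round(xmax * width), round(ymax * height)),
        })
    return detections


# Hypothetical response, for illustration only
example_result = {"prediction": [[0.0, 0.91, 0.10, 0.20, 0.30, 0.40],
                                 [1.0, 0.12, 0.50, 0.50, 0.60, 0.70]]}
print(detections_to_pixels(example_result, 640, 480, ['red', 'blue']))
```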

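The helper functions defined above make it easy to visualize the detection outputs. You can visualize the high-confidence predictions with bounding boxes by filtering out low-confidence detections.

The cell below sets the confidence threshold (`threshold_ssd`) and the overlap threshold used for non-maximum suppression (`threshold_iou`), defines the object categories and their colors, and draws the remaining detections.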

In [None]:
# Threshold definitions
threshold_ssd = 0.25
threshold_iou = 0.10

# Define Object categories
object_categories = ['red',
 'blue']

object_colors = [(0.9,0.4,0.4),
 (0.1,0.1,0.9)]

# Visualize the detections.
visualize_detection(file_name, 
 get_bounding_boxes(result, threshold_ssd, threshold_iou),
 object_categories, 
 object_colors, 
 threshold_ssd)