# Amazon SageMaker Object Detection using the augmented manifest file format

1. [Introduction](#Introduction)
2. [Setup](#Setup)
3. [Specifying input Dataset](#Specifying-input-Dataset)
4. [Training](#Training)

## Introduction

Object detection is the process of identifying and localizing objects in an image. A typical object detection solution takes an image as input and returns a bounding box around each object of interest, along with the class of the object inside the box. Before we have such a solution, we need to process a training dataset, create and set up a training job so that the algorithm can learn from the dataset, and then host the trained model as an endpoint to which we can send query images.

This notebook focuses on using the built-in SageMaker Single Shot MultiBox Detector ([SSD](https://arxiv.org/abs/1512.02325)) object detection algorithm to train a model on your custom dataset. For dataset preparation or for using the model for inference, please see the other scripts in [this folder](./).

## Setup

To train the Object Detection algorithm on Amazon SageMaker, we need to set up and authenticate the use of AWS services. To begin with, we need an AWS account role with SageMaker access. This role is used to give SageMaker access to your data in S3. In this example, we will use the same role that was used to start this SageMaker notebook.

In [1]:
%%time
import sagemaker
import boto3
from sagemaker import get_execution_role

role = get_execution_role()
print(role)

arn:aws:iam::947146424102:role/service-role/AmazonSageMaker-ExecutionRole-20190731T152369
CPU times: user 971 ms, sys: 96.3 ms, total: 1.07 s
Wall time: 1.06 s


We also need the S3 bucket that holds the training manifests and that will be used to store the trained model artifacts. Set `bucket` to the name of your bucket before running the cells that follow.

In [None]:
bucket = ''
prefix = 'demo'
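
Because `bucket` is left empty above, every S3 URI built from it later would be malformed until it is filled in. The following is a minimal sketch of a guard for that case; the helper name `s3_uri` and the bucket name `example-bucket` are hypothetical and not part of the original notebook.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an s3:// URI, rejecting an empty bucket."""
    if not bucket:
        raise ValueError("Set `bucket` to the name of an existing S3 bucket first.")
    # Strip stray slashes so the joined key has no empty segments.
    key_parts = [part.strip("/") for part in parts]
    return "s3://" + "/".join([bucket] + key_parts)


# 'example-bucket' is a placeholder, not a real bucket.
print(s3_uri("example-bucket", "demo", "output"))
```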

## Specifying input Dataset

This notebook assumes you have already prepared two [Augmented Manifest Files](https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html), one as training input and one as validation input for the object detection model.
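
To make the format concrete, here is a rough sketch of what a single manifest entry might look like. The attribute names `source-ref` and `bb` match the `AttributeNames` used later in this notebook; the image path, box coordinates, and image size are invented for illustration, and the nested layout follows the Ground Truth bounding box output format as I understand it, so check it against your own manifest.

```python
import json

# Hypothetical example values; only the key names 'source-ref' and 'bb'
# are taken from this notebook.
record = {
    "source-ref": "s3://example-bucket/images/img_0001.jpg",
    "bb": {
        "annotations": [
            {"class_id": 0, "left": 120, "top": 45, "width": 200, "height": 150}
        ],
        "image_size": [{"width": 1024, "height": 768, "depth": 3}],
    },
}

# Each record is serialized onto exactly one line of the manifest (JSON Lines).
manifest_line = json.dumps(record)
parsed = json.loads(manifest_line)
print(sorted(parsed.keys()))
```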

There are many advantages to using **augmented manifest files** for your training input:

* No format conversion is required if you are using SageMaker Ground Truth to generate the data labels
* Unlike the traditional approach of providing paths to the input images separately from their labels, an augmented manifest file combines both into one entry for each input image, which reduces the complexity of the algorithm code that matches each image with its labels. (Read this [blog post](https://aws.amazon.com/blogs/machine-learning/easily-train-models-using-datasets-labeled-by-amazon-sagemaker-ground-truth/) for more explanation.)
* When splitting your dataset into train/validation/test, you don't need to rearrange and re-upload image files to different S3 prefixes for train vs. validation. Once you upload your image files to S3, you never need to move them again. You just place pointers to these images in your augmented manifest files for training and validation. More on the train/validation data split later in this post.
* When using an augmented manifest file, the training input images are loaded onto the training instance in *Pipe mode*, which means the input data is streamed directly to the training algorithm while it is running (vs. File mode, where all input files need to be downloaded to disk before training starts). This results in faster training and lower disk usage. Read more in this [blog post](https://aws.amazon.com/blogs/machine-learning/accelerate-model-training-using-faster-pipe-mode-on-amazon-sagemaker/) on the benefits of Pipe mode.
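
The point about splitting without moving images can be sketched as follows: only the manifest lines are shuffled and divided, while the images they point to stay where they are in S3. This is an illustrative sketch, not the method used to build the manifests for this notebook; the function name `split_manifest`, the 80/20 ratio, and the example paths are assumptions.

```python
import json
import random


def split_manifest(lines, validation_fraction=0.2, seed=0):
    """Shuffle manifest lines and split them into (train, validation) lists."""
    shuffled = list(lines)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten hypothetical manifest lines pointing at images that never move.
lines = [
    json.dumps({"source-ref": "s3://example-bucket/images/{}.jpg".format(i)})
    for i in range(10)
]
train_lines, validation_lines = split_manifest(lines)
print(len(train_lines), len(validation_lines))
```

Each resulting list can then be written out, one line per record, as its own manifest file and uploaded to S3.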


In [3]:
train_data_prefix = "demo"
# below uses the training data after augmentation
s3_train_data= "s3://{}/{}/all_augmented.json".format(bucket, train_data_prefix)
# uncomment below to use the non-augmented input
# s3_train_data= "s3://{}/training-manifest/{}/train.manifest".format(bucket, train_data_prefix)
s3_validation_data = "s3://{}/training-manifest/{}/validation.manifest".format(bucket, train_data_prefix)
print("Train data: {}".format(s3_train_data) )
print("Validation data: {}".format(s3_validation_data) )

Train data: s3://angelaw-test-sagemaker-blog/demo/all_augmented.json
Validation data: s3://angelaw-test-sagemaker-blog/training-manifest/demo/validation.manifest


In [4]:
train_input = {
 "ChannelName": "train",
 "DataSource": {
 "S3DataSource": {
 "S3DataType": "AugmentedManifestFile", 
 "S3Uri": s3_train_data,
 "S3DataDistributionType": "FullyReplicated",
 # This must correspond to the JSON field names in your augmented manifest.
 "AttributeNames": ['source-ref', 'bb']
 }
 },
 "ContentType": "application/x-recordio",
 "RecordWrapperType": "RecordIO",
 "CompressionType": "None"
}


In [5]:
validation_input = {
 "ChannelName": "validation",
 "DataSource": {
 "S3DataSource": {
 "S3DataType": "AugmentedManifestFile", 
 "S3Uri": s3_validation_data,
 "S3DataDistributionType": "FullyReplicated",
 # This must correspond to the JSON field names in your augmented manifest.
 "AttributeNames": ['source-ref', 'bb']
 }
 },
 "ContentType": "application/x-recordio",
 "RecordWrapperType": "RecordIO",
 "CompressionType": "None"
}
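
The two channel definitions above differ only in their name and S3 URI. As a sketch of one way to avoid the duplication, a small helper can build both; the name `make_channel` and the example URI are hypothetical, while the field values are copied from the cells above.

```python
def make_channel(name, s3_uri, attribute_names):
    """Build one InputDataConfig entry for an augmented manifest channel."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "AugmentedManifestFile",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
                # Must match the JSON field names in the augmented manifest.
                "AttributeNames": list(attribute_names),
            }
        },
        "ContentType": "application/x-recordio",
        "RecordWrapperType": "RecordIO",
        "CompressionType": "None",
    }


# Placeholder URI for illustration only.
example_channel = make_channel(
    "train", "s3://example-bucket/demo/train.manifest", ["source-ref", "bb"]
)
print(example_channel["ChannelName"])
```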


The code below downloads the training manifest and counts its entries to get the number of training samples, which is required in the training job request.

In [6]:
import json
import os 

def read_manifest_file(file_path):
    """Parse a JSON Lines manifest into a list of dicts, skipping blank lines."""
    with open(file_path, 'r') as f:
        return [json.loads(line) for line in f if line.strip()]
 
!aws s3 cp $s3_train_data . 
train_data = read_manifest_file(os.path.split(s3_train_data)[1])
num_training_samples = len(train_data)
num_training_samples

download: s3://angelaw-test-sagemaker-blog/demo/all_augmented.json to ./all_augmented.json


5870
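
Since the channel definitions name the attributes `source-ref` and `bb`, it can be worth confirming that every manifest record actually contains them before starting a job. The following is a sketch under that assumption; the function name `find_incomplete_records` and the sample lines are hypothetical.

```python
import json


def find_incomplete_records(manifest_lines, attribute_names):
    """Return 1-based line numbers of records missing any required attribute."""
    incomplete = []
    for line_number, line in enumerate(manifest_lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        if any(name not in record for name in attribute_names):
            incomplete.append(line_number)
    return incomplete


# Two made-up records: the second one has no 'bb' label attribute.
sample_lines = [
    json.dumps({"source-ref": "s3://example-bucket/a.jpg", "bb": {"annotations": []}}),
    json.dumps({"source-ref": "s3://example-bucket/b.jpg"}),
]
print(find_incomplete_records(sample_lines, ["source-ref", "bb"]))
```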

In [7]:
s3_output_path = 's3://{}/{}/output'.format(bucket, prefix)
s3_output_path

's3://angelaw-test-sagemaker-blog/demo/output'
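
For orientation, SageMaker normally writes the model artifact under the output path in a subfolder named after the training job. The sketch below builds that expected location; the helper name and the example values are hypothetical, and the authoritative location is the `ModelArtifacts.S3ModelArtifacts` field returned by `describe_training_job` once the job has finished.

```python
def expected_model_artifact(s3_output_path, job_name):
    """Guess where the model archive lands; verify against describe_training_job."""
    return "{}/{}/output/model.tar.gz".format(s3_output_path.rstrip("/"), job_name)


# Placeholder values for illustration only.
print(expected_model_artifact("s3://example-bucket/demo/output", "example-job"))
```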

## Training
Now that we are done with all the setup that is needed, we are ready to train our object detector. 

In [9]:
from sagemaker.amazon.amazon_estimator import get_image_uri

# Retrieve the Docker image URI for the built-in object detection (SSD) algorithm.
# Note: get_image_uri belongs to version 1 of the SageMaker Python SDK; newer versions
# provide sagemaker.image_uris.retrieve for this lookup.
training_image = get_image_uri(boto3.Session().region_name, 'object-detection', repo_version='latest')
print(training_image)

811284229777.dkr.ecr.us-east-1.amazonaws.com/object-detection:latest


Create a unique job name by appending a UTC timestamp to a fixed prefix.

In [10]:
import time 

job_name_prefix = 'od-demo'
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
model_job_name = job_name_prefix + timestamp
model_job_name

'od-demo-2019-08-01-04-57-12'
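
Training job names have to satisfy a few constraints: to my understanding, at most 63 characters, alphanumeric characters and hyphens only, and no leading or trailing hyphen. A sketch of a helper that builds the name and checks it locally before the API call is shown below; the function name `make_job_name` is hypothetical and the pattern should be confirmed against the `CreateTrainingJob` API reference.

```python
import re
import time

# Assumed constraint on TrainingJobName; confirm against the API reference.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_job_name(prefix, now=None):
    """Append a UTC timestamp to the prefix and validate the resulting name."""
    if now is None:
        now = time.gmtime()
    name = prefix + time.strftime("-%Y-%m-%d-%H-%M-%S", now)
    if not JOB_NAME_PATTERN.match(name):
        raise ValueError("Invalid training job name: {!r}".format(name))
    return name


print(make_job_name("od-demo"))
```

An underscore in the prefix, for example `od_demo`, would be rejected by this check.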

The object detection algorithm at its core is the [Single-Shot Multi-Box detection algorithm (SSD)](https://arxiv.org/abs/1512.02325). This algorithm uses a `base_network`, which is typically a [VGG](https://arxiv.org/abs/1409.1556) or a [ResNet](https://arxiv.org/abs/1512.03385). ResNet is typically faster, so I'd recommend it as the base network for inference at the edge. The Amazon SageMaker object detection algorithm currently supports VGG-16 and ResNet-50. It also exposes many hyperparameters that help configure the training job. The next step is to set up these hyperparameters and the data channels for training the model. See the SageMaker Object Detection [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html) for more details on the hyperparameters.

To figure out which hyperparameter values work best for your data, run a hyperparameter tuning job. There are some example notebooks at [https://github.com/awslabs/amazon-sagemaker-examples](https://github.com/awslabs/amazon-sagemaker-examples) that you can use for reference.

In [11]:
# This is where transfer learning happens: we start from the pre-trained model and replace the output layer
# by specifying the num_classes value. You can also run a hyperparameter tuning job to figure out which values work best.
hyperparams = { 
 "base_network": 'resnet-50',
 "use_pretrained_model": "1",
 "num_classes": "2", 
 "mini_batch_size": "30",
 "epochs": "30",
 "learning_rate": "0.001",
 "lr_scheduler_step": "10,20",
 "lr_scheduler_factor": "0.25",
 "optimizer": "sgd",
 "momentum": "0.9",
 "weight_decay": "0.0005",
 "overlap_threshold": "0.5",
 "nms_threshold": "0.45",
 "image_shape": "512",
 "label_width": "150",
 "num_training_samples": str(num_training_samples)
 }
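
To see what `learning_rate`, `lr_scheduler_step`, and `lr_scheduler_factor` imply together, the sketch below computes the learning rate in effect at a given epoch, assuming the rate is multiplied by the factor at each listed epoch. The function name `learning_rate_at` is hypothetical, and whether the algorithm counts epochs from 0 or 1 at the boundary is an assumption here.

```python
def learning_rate_at(epoch, base_lr, steps, factor):
    """Learning rate at a 0-based epoch under a step schedule (boundary assumed)."""
    # Count how many scheduled reductions have already happened.
    drops = sum(1 for step in steps if epoch >= step)
    return base_lr * factor ** drops


# Values taken from the hyperparameters above.
steps = [int(s) for s in "10,20".split(",")]
for epoch in (0, 10, 20):
    print(epoch, learning_rate_at(epoch, 0.001, steps, 0.25))
```

Under these assumptions the rate starts at 0.001, drops to a quarter of that at epoch 10, and to a sixteenth at epoch 20.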

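Now that the hyperparameters are set up, we configure the rest of the training job parameters. Note that `TrainingInputMode` is set to `Pipe`, which is the mode used with augmented manifest files.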

In [12]:
training_params = \
 {
 "AlgorithmSpecification": {
 "TrainingImage": training_image,
 "TrainingInputMode": "Pipe"
 },
 "RoleArn": role,
 "OutputDataConfig": {
 "S3OutputPath": s3_output_path
 },
 "ResourceConfig": {
 "InstanceCount": 1,
 "InstanceType": "ml.p3.8xlarge",
 "VolumeSizeInGB": 200
 },
 "TrainingJobName": model_job_name,
 "HyperParameters": hyperparams,
 "StoppingCondition": {
 "MaxRuntimeInSeconds": 86400
 },
 "InputDataConfig": [
 train_input,
 validation_input
 ]
 }
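
A few mistakes in this request are easy to catch locally before calling the API. The sketch below checks two of them: augmented manifest channels are only supported with Pipe input mode, and hyperparameter values must be passed as strings. The function name `check_training_params` and the trimmed-down example request are hypothetical, and the check ignores any per-channel input mode override.

```python
def check_training_params(params):
    """Return a list of human-readable problems found in a training request."""
    problems = []
    input_mode = params["AlgorithmSpecification"]["TrainingInputMode"]
    for channel in params["InputDataConfig"]:
        source = channel["DataSource"]["S3DataSource"]
        if source["S3DataType"] == "AugmentedManifestFile":
            if input_mode != "Pipe":
                problems.append(
                    "{}: augmented manifests need Pipe mode".format(channel["ChannelName"])
                )
            if not source.get("AttributeNames"):
                problems.append(
                    "{}: AttributeNames is empty".format(channel["ChannelName"])
                )
    for key, value in params["HyperParameters"].items():
        if not isinstance(value, str):
            problems.append("hyperparameter {} is not a string".format(key))
    return problems


# Trimmed-down request with two deliberate mistakes, for illustration only.
example_params = {
    "AlgorithmSpecification": {"TrainingInputMode": "File"},
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "AugmentedManifestFile",
                    "AttributeNames": ["source-ref", "bb"],
                }
            },
        }
    ],
    "HyperParameters": {"epochs": 30},
}
for problem in check_training_params(example_params):
    print(problem)
```

In the notebook itself this could be called as `check_training_params(training_params)`, which should return an empty list for the request defined above.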


Now we create the SageMaker training job.

In [13]:
client = boto3.client(service_name='sagemaker')
client.create_training_job(**training_params)

# Confirm that the training job has started
status = client.describe_training_job(TrainingJobName=model_job_name)['TrainingJobStatus']
print('Training job current status: {}'.format(status))

Training job current status: InProgress


To check the progress of the training job, re-run the following cell as often as needed. When the training job status reads 'Completed', move on to the next part of the tutorial.


In [15]:
client = boto3.client(service_name='sagemaker')
# Fetch the job description once and read both status fields from it.
job_description = client.describe_training_job(TrainingJobName=model_job_name)
print("Training job status: ", job_description['TrainingJobStatus'])
print("Secondary status: ", job_description['SecondaryStatus'])

Training job status: InProgress
Secondary status: Starting
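
Instead of re-running the cell by hand, the status check can be wrapped in a polling loop. The sketch below takes the describe call as a parameter so it can be tried without AWS access; the function name `wait_for_training_job`, the 60-second interval, and the fake responses are assumptions for illustration.

```python
import time

# Job statuses after which the state will not change any more.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(describe, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll `describe` until the job reaches a terminal status; return the last response."""
    while True:
        description = describe(TrainingJobName=job_name)
        status = description["TrainingJobStatus"]
        print("Status: {} / {}".format(status, description.get("SecondaryStatus")))
        if status in TERMINAL_STATUSES:
            return description
        sleep(poll_seconds)


# Stand-in for client.describe_training_job, returning canned responses.
fake_responses = iter([
    {"TrainingJobStatus": "InProgress", "SecondaryStatus": "Training"},
    {"TrainingJobStatus": "Completed", "SecondaryStatus": "Completed"},
])
final = wait_for_training_job(
    lambda TrainingJobName: next(fake_responses),
    "example-job",
    poll_seconds=0,
    sleep=lambda seconds: None,
)
```

In the notebook this would be called as `wait_for_training_job(client.describe_training_job, model_job_name)`. boto3 also offers a built-in waiter, `client.get_waiter('training_job_completed_or_stopped')`, which blocks in a similar way.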


## Next step

Once the training job completes, move on to the [next notebook](./03_local_inference_post_training.ipynb) to convert the trained model to a deployable format and run local inference.