{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Amazon SageMaker Object Detection using the augmented manifest file format\n", "\n", "1. [Introduction](#Introduction)\n", "2. [Setup](#Setup)\n", "3. [Specifying input Dataset](#Specifying-input-Dataset)\n", "4. [Training](#Training)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "Object detection is the process of identifying and localizing objects in an image. A typical object detection solution takes in an image as input and provides a bounding box on the image where an object of interest is, along with identifying what object the box encapsulates. But before we have this solution, we need to process a training dataset, create and setup a training job for the algorithm so that the aglorithm can learn about the dataset and then host the algorithm as an endpoint, to which we can supply the query image.\n", "\n", "This notebook focuses on using the built-in SageMaker Single Shot multibox Detector ([SSD](https://arxiv.org/abs/1512.02325)) object detection algorithm to train model on your custom dataset. For dataset prepration or using the model for inference, please see other scripts in [this folder](./)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "To train the Object Detection algorithm on Amazon SageMaker, we need to setup and authenticate the use of AWS services. To begin with we need an AWS account role with SageMaker access. This role is used to give SageMaker access to your data in S3. In this example, we will use the same role that was used to start this SageMaker notebook." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "arn:aws:iam::947146424102:role/service-role/AmazonSageMaker-ExecutionRole-20190731T152369\n", "CPU times: user 971 ms, sys: 96.3 ms, total: 1.07 s\n", "Wall time: 1.06 s\n" ] } ], "source": [ "%%time\n", "import sagemaker\n", "import boto3\n", "from sagemaker import get_execution_role\n", "\n", "role = get_execution_role()\n", "print(role)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need the S3 bucket that has the training manifests and will be used to store the tranied model artifacts. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bucket = ''\n", "prefix = 'demo'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Specifying input Dataset\n", "\n", "This notebook assumes you already have prepared two [Augmented Manifest Files](https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html) as training and validation input data for the object detection model. \n", "\n", "There are many advantages to using **augmented manifest files** for your training input\n", "\n", "* No format conversion is required if you are using SageMaker Ground Truth to generate the data labels\n", "* Unlike the traditional approach of providing paths to the input images separately from its labels, augmented manifest file already combines both into one entry for each input image, reducing complexity in algorithm code for matching each image with labels. (Read this [blog post](https://aws.amazon.com/blogs/machine-learning/easily-train-models-using-datasets-labeled-by-amazon-sagemaker-ground-truth/) for more explanation.) 
\n", "* When splitting your dataset for train/validation/test, you don't need to rearrange and re-upload image files to different s3 prefixes for train vs validation. Once you upload your image files to S3, you never need to move it again. You can just place pointers to these images in your augmented manifest file for training and validation. More on the train/validation data split in this post later. \n", "* When using augmented manifest file, the training input images is loaded on to the training instance in *Pipe mode,* which means the input data is streamed directly to the training algorithm while it is running (vs. File mode, where all input files need to be downloaded to disk before the training starts). This results in faster training performance and less disk resource utilization. Read more in this [blog post](https://aws.amazon.com/blogs/machine-learning/accelerate-model-training-using-faster-pipe-mode-on-amazon-sagemaker/) on the benefits of pipe mode.\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train data: s3://angelaw-test-sagemaker-blog/demo/all_augmented.json\n", "Validation data: s3://angelaw-test-sagemaker-blog/training-manifest/demo/validation.manifest\n" ] } ], "source": [ "train_data_prefix = \"demo\"\n", "# below uses the training data after augmentation\n", "s3_train_data= \"s3://{}/{}/all_augmented.json\".format(bucket, train_data_prefix)\n", "# uncomment below to use the non-augmented input\n", "# s3_train_data= \"s3://{}/training-manifest/{}/train.manifest\".format(bucket, train_data_prefix)\n", "s3_validation_data = \"s3://{}/training-manifest/{}/validation.manifest\".format(bucket, train_data_prefix)\n", "print(\"Train data: {}\".format(s3_train_data) )\n", "print(\"Validation data: {}\".format(s3_validation_data) )" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "train_input = {\n", " \"ChannelName\": \"train\",\n", " \"DataSource\": {\n", " \"S3DataSource\": {\n", " \"S3DataType\": \"AugmentedManifestFile\", \n", " \"S3Uri\": s3_train_data,\n", " \"S3DataDistributionType\": \"FullyReplicated\",\n", " # This must correspond to the JSON field names in your augmented manifest.\n", " \"AttributeNames\": ['source-ref', 'bb']\n", " }\n", " },\n", " \"ContentType\": \"application/x-recordio\",\n", " \"RecordWrapperType\": \"RecordIO\",\n", " \"CompressionType\": \"None\"\n", "}\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "validation_input = {\n", " \"ChannelName\": \"validation\",\n", " \"DataSource\": {\n", " \"S3DataSource\": {\n", " \"S3DataType\": \"AugmentedManifestFile\", \n", " \"S3Uri\": s3_validation_data,\n", " \"S3DataDistributionType\": \"FullyReplicated\",\n", " # This must correspond to the JSON field names in your augmented manifest.\n", " \"AttributeNames\": ['source-ref', 'bb']\n", " }\n", " },\n", " \"ContentType\": \"application/x-recordio\",\n", " \"RecordWrapperType\": \"RecordIO\",\n", " \"CompressionType\": \"None\"\n", "}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below code computes the number of training samples, required in the training job request." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "download: s3://angelaw-test-sagemaker-blog/demo/all_augmented.json to ./all_augmented.json\n" ] }, { "data": { "text/plain": [ "5870" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "import os \n", "\n", "def read_manifest_file(file_path):\n", " with open(file_path, 'r') as f:\n", " output = [json.loads(line.strip()) for line in f.readlines()]\n", " return output\n", " \n", "!aws s3 cp $s3_train_data . \n", "train_data = read_manifest_file(os.path.split(s3_train_data)[1])\n", "num_training_samples = len(train_data)\n", "num_training_samples" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'s3://angelaw-test-sagemaker-blog/demo/output'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3_output_path = 's3://{}/{}/output'.format(bucket, prefix)\n", "s3_output_path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training\n", "Now that we are done with all the setup that is needed, we are ready to train our object detector. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "811284229777.dkr.ecr.us-east-1.amazonaws.com/object-detection:latest\n" ] } ], "source": [ "from sagemaker.amazon.amazon_estimator import get_image_uri\n", "\n", "# This retrieves a docker container with the built in object detection SSD model. \n", "training_image = sagemaker.amazon.amazon_estimator.get_image_uri(boto3.Session().region_name, 'object-detection', repo_version='latest')\n", "print (training_image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a unique job name" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'od-demo-2019-08-01-04-57-12'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import time \n", "\n", "job_name_prefix = 'od-demo'\n", "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", "model_job_name = job_name_prefix + timestamp\n", "model_job_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The object detection algorithm at its core is the [Single-Shot Multi-Box detection algorithm (SSD)](https://arxiv.org/abs/1512.02325). This algorithm uses a `base_network`, which is typically a [VGG](https://arxiv.org/abs/1409.1556) or a [ResNet](https://arxiv.org/abs/1512.03385). (resnet is typically faster so for edge inferences, I'd recommend using this base network). The Amazon SageMaker object detection algorithm supports VGG-16 and ResNet-50 now. It also has a lot of options for hyperparameters that help configure the training job. The next step in our training, is to setup these hyperparameters and data channels for training the model. See the SageMaker Object Detection [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html) for more details on the hyperparameters.\n", "\n", "To figure out which works best for your data, run a hyperparameter tuning job. There's some example notebooks at [https://github.com/awslabs/amazon-sagemaker-examples](https://github.com/awslabs/amazon-sagemaker-examples) that you can use for reference. 
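{ "cell_type": "markdown", "metadata": {}, "source": [ "Before fixing `num_classes` in the next cell, you can optionally derive it from the manifest itself. The sketch below assumes the `bb` attribute follows the Ground Truth bounding box output format, where each record carries an `annotations` list with a `class_id` per box; verify this against your own manifest before relying on it." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sanity check: derive the number of object classes from the manifest,\n", "# assuming record['bb']['annotations'] is a list of boxes, each with a 'class_id'\n", "# (Ground Truth bounding box output format). Adjust for your own label schema.\n", "class_ids = sorted({\n", "    annotation['class_id']\n", "    for record in train_data\n", "    for annotation in record.get('bb', {}).get('annotations', [])\n", "})\n", "print('Distinct class ids in the training manifest:', class_ids)\n", "print('Suggested value for the num_classes hyperparameter:', len(class_ids))" ] },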
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# This is where transfer learning happens. We use the pre-trained model and nuke the output layer by specifying\n", "# the num_classes value. You can also run a hyperparameter tuning job to figure out which values work the best. \n", "hyperparams = { \n", " \"base_network\": 'resnet-50',\n", " \"use_pretrained_model\": \"1\",\n", " \"num_classes\": \"2\", \n", " \"mini_batch_size\": \"30\",\n", " \"epochs\": \"30\",\n", " \"learning_rate\": \"0.001\",\n", " \"lr_scheduler_step\": \"10,20\",\n", " \"lr_scheduler_factor\": \"0.25\",\n", " \"optimizer\": \"sgd\",\n", " \"momentum\": \"0.9\",\n", " \"weight_decay\": \"0.0005\",\n", " \"overlap_threshold\": \"0.5\",\n", " \"nms_threshold\": \"0.45\",\n", " \"image_shape\": \"512\",\n", " \"label_width\": \"150\",\n", " \"num_training_samples\": str(num_training_samples)\n", " }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the hyperparameters are set up, we configure the rest of the training job parameters" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "training_params = \\\n", " {\n", " \"AlgorithmSpecification\": {\n", " \"TrainingImage\": training_image,\n", " \"TrainingInputMode\": \"Pipe\"\n", " },\n", " \"RoleArn\": role,\n", " \"OutputDataConfig\": {\n", " \"S3OutputPath\": s3_output_path\n", " },\n", " \"ResourceConfig\": {\n", " \"InstanceCount\": 1,\n", " \"InstanceType\": \"ml.p3.8xlarge\",\n", " \"VolumeSizeInGB\": 200\n", " },\n", " \"TrainingJobName\": model_job_name,\n", " \"HyperParameters\": hyperparams,\n", " \"StoppingCondition\": {\n", " \"MaxRuntimeInSeconds\": 86400\n", " },\n", " \"InputDataConfig\": [\n", " train_input,\n", " validation_input\n", " ]\n", " }\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we create the SageMaker training job." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training job current status: InProgress\n" ] } ], "source": [ "client = boto3.client(service_name='sagemaker')\n", "client.create_training_job(**training_params)\n", "\n", "# Confirm that the training job has started\n", "status = client.describe_training_job(TrainingJobName=model_job_name)['TrainingJobStatus']\n", "print('Training job current status: {}'.format(status))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To check the progess of the training job, you can repeatedly evaluate the following cell. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "To check the progress of the training job manually, you can repeatedly evaluate the following cell. When the training job status reads 'Completed', move on to the next part of the tutorial.\n" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training job status: InProgress\n", "Secondary status: Starting\n" ] } ], "source": [ "client = boto3.client(service_name='sagemaker')\n", "print(\"Training job status: \", client.describe_training_job(TrainingJobName=model_job_name)['TrainingJobStatus'])\n", "print(\"Secondary status: \", client.describe_training_job(TrainingJobName=model_job_name)['SecondaryStatus'])" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Next step" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Once the training job completes, move on to the [next notebook](./03_local_inference_post_training.ipynb) to convert the trained model to a deployable format and run local inference." ] } ],
"metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." }, "nbformat": 4, "nbformat_minor": 1 }