--- title: "Object detection using Pascal VOC dataset with SageMaker" date: 2020-04-02T11:17:32-04:00 draft: False algo: sageobj --- The Amazon SageMaker Object Detection algorithm detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene. The object is categorized into one of the classes in a specified collection with a confidence score that it belongs to the class. Its location and scale in the image are indicated by a rectangular bounding box. It uses the Single Shot multibox Detector (SSD) framework and supports two base networks: VGG and ResNet. The network can be trained from scratch, or trained with models that have been pre-trained on the ImageNet dataset. The recommended input format for the Amazon SageMaker object detection algorithms is Apache MXNet RecordIO. However, you can also use raw images in .jpg or .png format. We think that training with the RecordIO format is the easiest way to get started with your image classification PoC on SageMaker. For full details on the input-output interface, see [this](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html#object-detection-inputoutput) #### Prepare your dataset in ImageRecord format [Reference](https://gluon-cv.mxnet.io/build/examples_datasets/recordio.html) Raw images are natural data format for computer vision tasks. However, when loading data from image files for training, disk IO might be a bottleneck. For instance, when training a ResNet50 model with ImageNet on an AWS p3.16xlarge instance, The parallel training on 8 GPUs makes it so fast, with which even reading images from ramdisk can’t catch up. To boost the performance on top-configured platform, we suggest users to train with MXNet’s ImageRecord format. We will use Pascal VOC, a popular computer vision challenge, [dataset](http://host.robots.ox.ac.uk/pascal/VOC/). We will use the data sets from 2007 and 2012, named as VOC07 and VOC12 respectively the latest one comprises of more than 20,000 images containing about 50,000 annotated objects. These annotated objects are grouped into 20 categories. #### Download prerequisite packages ``` wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/im2rec.py ./ wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/prepare_dataset.py ./ wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/concat_db.py ./ wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/imdb.py ./ wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/pascal_voc.names ./ wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/pascal_voc.py ./ pip install mxnet pip install opencv-python ``` Download Pascal VOC data sets ``` wget -P /tmp http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar wget -P /tmp http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar wget -P /tmp http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar ``` Extract the data ``` tar -xf /tmp/VOCtrainval_11-May-2012.tar && rm /tmp/VOCtrainval_11-May-2012.tar tar -xf /tmp/VOCtrainval_06-Nov-2007.tar && rm /tmp/VOCtrainval_06-Nov-2007.tar tar -xf /tmp/VOCtest_06-Nov-2007.tar && rm /tmp/VOCtest_06-Nov-2007.tar ``` Now, we will combine the training and validation sets from both 2007 and 2012 as the training data set, and use the test set from 2007 as our validation set. ``` python prepare_dataset.py --dataset pascal --year 2007,2012 --root VOCdevkit --set trainval --target VOCdevkit/train.lst rm -rf VOCdevkit/VOC2012 python prepare_dataset.py --dataset pascal --year 2007 --root VOCdevkit --set test --target VOCdevkit/val.lst --no-shuffle rm -rf VOCdevkit/VOC2007 ``` It gives you two sets of files, one with "train" and other with "val": Such as train.idx, train.lst and train.rec. Now, you can use them to train! Then use [this link](../uploadtos3) to upload your .rec files to s3! For example, do: ```python import sagemaker sess = sagemaker.Session() trainpath = sess.upload_data( path='train.rec', bucket='mybucketname', key_prefix='sagemaker/input') testpath = sess.upload_data( path='val.rec', bucket='mybucketname', key_prefix='sagemaker/input') ``` #### Training the model Once we have a usable dataset, we are ready to train the model. ```python import sagemaker from sagemaker import get_execution_role from sagemaker.amazon.amazon_estimator import get_image_uri role = get_execution_role() sess = sagemaker.Session() training_image = get_image_uri(sess.boto_region_name, 'object-detection', repo_version="latest") #the estimator will launch the training job od_model = sagemaker.estimator.Estimator(training_image, role, train_instance_count=1, train_instance_type='ml.p3.2xlarge', train_volume_size = 50, train_max_run = 360000, input_mode= 'File', output_path=s3_output_location, sagemaker_session=sess) #setup the hyperparameters od_model.set_hyperparameters(base_network='resnet-50', use_pretrained_model=1, num_classes=20, mini_batch_size=16, epochs=240, learning_rate=0.005, lr_scheduler_step='3,6', lr_scheduler_factor=0.1, optimizer='sgd', momentum=0.9, weight_decay=0.0005, overlap_threshold=0.5, nms_threshold=0.45, image_shape=512, label_width=350, num_training_samples=16551) #setup data channels train_data = sagemaker.session.s3_input(trainpath, distribution='FullyReplicated', content_type='application/x-recordio', s3_data_type='S3Prefix') validation_data = sagemaker.session.s3_input(testpath, distribution='FullyReplicated', content_type='application/x-recordio', s3_data_type='S3Prefix') data_channels = {'train': train_data, 'validation': validation_data} #train the model od_model.fit(inputs=data_channels, logs=True) ``` #### Create Endpoint Once the training is done, you can deploy the trained model as an endpoint. ```python object_detector = od_model.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge') ``` #### Perform inference Now, as the model is deployed, we can use it to derive inference. Let's download a sample image. ```html wget -O test.jpg https://images.pexels.com/photos/980382/pexels-photo-980382.jpeg ``` ```python import json file_name = 'test.jpg' with open(file_name, 'rb') as image: f = image.read() b = bytearray(f) #deriving inference #inference will be a JSON object object_detector.content_type = 'image/jpeg' results = object_detector.predict(b) detections = json.loads(results) print (detections) ```