{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Augment the training dataset and train a new model with a larger dataset\n", "\n", "<div class=\"alert alert-block alert-warning\">\n", "<b>Prerequisite:</b> This notebook is part of the <a href=\"https://catalog.workshops.aws/cv-retail\">Computer vision for retail inventory workshop</a>. Please follow the workshop instructions before running this notebook.\n", "</div>\n", "\n", "<div class=\"alert alert-block alert-warning\">\n", "<b>Notebook Kernel:</b> Please use the Python 3 (Data Science) kernel to run this notebook.\n", "</div>\n", "\n", "<div class=\"alert alert-block alert-warning\">\n", "<b>Execution sequence:</b> Before running this notebook, make sure you have already started <b>1.Train-a-custom-object-detection-model.ipynb</b>. \n", "</div>\n", "\n", "In the previous notebook we trained a custom object detection model using a small dataset of 95 images, of which only 66 were used for training. In this notebook, we will generate **synthetic variations** of these 66 images (called **augmentations**) in order to create a larger training set. This has the potential to produce an object detection model that generalizes better than the previous one, because of the larger variability in the training dataset. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 0: Dependencies and configuration\n", "\n", "As before, we start by loading useful Python libraries, defining our configuration, and connecting to the AWS SDKs, which allow us to interface with various AWS services, such as Amazon S3 and Amazon SageMaker. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# generic packages\n", "import os\n", "import json\n", "import shutil\n", "import imageio\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline \n", "plt.style.use('seaborn')\n", "\n", "# AWS-related packages\n", "import boto3 # AWS Python SDK that allows us to interface with all AWS services\n", "import sagemaker # SageMaker Python SDK that allows us to easily build, train and deploy models" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_role = sagemaker.get_execution_role()\n", "sagemaker_session = sagemaker.Session()\n", "boto_session = boto3.session.Session()\n", "region = boto_session.region_name\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we will be using the **default bucket** of Amazon SageMaker to store our data, model artifacts and model outputs. " ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# variable setup\n", "BUCKET_NAME = sagemaker_session.default_bucket() # here we will store our data\n", "PREFIX_PROJECT = 'computer-vision-for-retail-workshop' # main project folder in S3\n", "PREFIX_DATASET = 'dataset-full' # where our dataset will be located\n", "PREFIX_MODELS = 'models' # where our trained model weights will be saved\n", "CLASS_NAMES = [ # names of the 10 products that we will be trying to detect\n", " 'flakes', \n", " 'mm', \n", " 'coke', \n", " 'spam', \n", " 'nutella', \n", " 'doritos', \n", " 'ritz', \n", " 'skittles', \n", " 'mountaindew', \n", " 'evian'\n", "]\n", "# print a list of the class names along with their indices\n", "print('Index and class name')\n", "class_indx_names = pd.Series(CLASS_NAMES)\n", "print(class_indx_names,'\\n')\n", "\n", "MANIFEST_ATTRIBUTE_NAMES = ['source-ref', 'retail-object-labeling'] # attributes to be considered in the manifest files\n", "LOCAL_DATASET_FOLDER = 'dataset'\n", "\n", "print('Region:', region)\n", "print('Bucket:', BUCKET_NAME)\n", "\n", "# Initialize some empty variables we need to exist:\n", "predictor_std = None\n", "predictor_hpo = None\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Explore the image augmentation function\n", "\n", "We will be using the image augmentation function `augment_affine()` from `augmentations.py`, located in the `util` folder. The function takes an image as input and generates a set of new images using random transformations, like zooming/unzooming, cropping, shearing, rotation, brightness adjustment, noise, flipping, etc. \n", "\n", "Take some time to **explore the impact of each augmentation parameter** in the example below. Here is a quick description of each one. \n", "\n", "| Parameter | Description | Type | Range | Example values and behavior | How to deactivate | \n", "| --- | --- | --- | --- | --- | --- |\n", "| `how_many` | How many image variations to generate from a single source image | Number | int [0,inf) | e.g. 10 | 0 |\n", "| `random_seed` | A number that controls randomness (for reproducibility) | None or Number | int (-inf, inf) | e.g. 0 | None |\n", "| `range_scale` | Minimum and maximum range for zooming/unzooming on the image | None or Tuple (min,max) | float (0,inf) | <1=zoom in, >1=zoom out, e.g. (0.5,1.5) | None |\n", "| `range_translation` | Minimum and maximum range for offsetting the (x,y) position of the image (in pixels) | None or Tuple (min,max) | int (-inf,inf) | e.g. (-100, 100) | None |\n", "| `range_rotation` | Minimum and maximum range for rotating the image left/right (in degrees) | None or Tuple (min,max) | float [-360,360] | e.g. (-45, 45) | None |\n", "| `range_sheer` | Minimum and maximum range for skewing the image left/right (in degrees) | None or Tuple (min,max) | float [-360,360] | e.g. (-45, 45) | None |\n", "| `range_noise` | Minimum and maximum range of noise variance | None or Tuple (min,max) | float [0, inf) | e.g. (0, 0.001) | None |\n", "| `range_brightness` | Minimum and maximum range for brightness gain | None or Tuple (min,max) | float (0, inf) | 1=no change, <1=darken, >1=brighten, e.g. (0.5, 1.5) | None |\n", "| `flip_lr` | Flipping the image left-right | None or String | None / 'random' / 'all' | If 'all', all images are doubled (flipped + original). If 'random', images are flipped randomly. | None |\n", "| `flip_ud` | Flipping the image up-down | None or String | None / 'random' / 'all' | If 'all', all images are doubled (flipped + original). 
If 'random', images are flipped randomly. | None |\n", "| `bbox_truncate` | Truncate bounding boxes that may end up outside the augmented image | Boolean | False/True | e.g. True | False |\n", "| `bbox_discard_thr` | Fraction of a bounding box's surface that must remain inside the image for the box not to be discarded | Number | float [0,1] | e.g. 0.85 | N/A |\n", "| `display` | Whether to display the augmentations in the notebook | Boolean | False/True | Use True only for testing! | False |\n", "\n", "\n", "You need to come up with a *plausible set of variations* and ranges that make sense for the current use case. For example, flipping the image upside down is not useful in our case, because it is highly unlikely that we will encounter these retail products upside down (particularly for bottles). Additionally, rotating the image too much will also result in unrealistic images that cannot be encountered in real life." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "from util.augmentations import augment_affine\n", "\n", "filename = 'dataset/dataset-test/IMG_1599.jpg'\n", "ls_bboxes = [{\"class_id\": 4, \"top\": 169, \"left\": 357, \"height\": 70, \"width\": 59}, {\"class_id\": 4, \"top\": 662, \"left\": 171, \"height\": 82, \"width\": 55}, {\"class_id\": 8, \"top\": 85, \"left\": 256, \"height\": 153, \"width\": 60}, {\"class_id\": 9, \"top\": 71, \"left\": 183, \"height\": 167, \"width\": 49}, {\"class_id\": 7, \"top\": 343, \"left\": 351, \"height\": 89, \"width\": 77}, {\"class_id\": 5, \"top\": 372, \"left\": 244, \"height\": 65, \"width\": 113}, {\"class_id\": 5, \"top\": 363, \"left\": 143, \"height\": 74, \"width\": 112}, {\"class_id\": 1, \"top\": 500, \"left\": 270, \"height\": 91, \"width\": 96}, {\"class_id\": 1, \"top\": 497, \"left\": 173, \"height\": 96, \"width\": 99}, {\"class_id\": 0, \"top\": 619, \"left\": 241, \"height\": 128, \"width\": 99}, {\"class_id\": 6, \"top\": 640, \"left\": 353, \"height\": 106, \"width\": 82}]\n", "\n", "\n", "# TODO: experiment with different augmentation parameters and test their impact. \n", "\n", "image_augm = augment_affine(\n", " image_filename=filename, # the image that will be used as source to generate augmentations\n", " bboxes=ls_bboxes, # a list of bounding boxes in the source image\n", " how_many=10, # how many image variations to generate from the source image\n", " random_seed=0, # controls randomness for reproducibility\n", " range_scale=(0.75, 1.5), # (multiplier) minimum and maximum range for zooming/unzooming on the image\n", " range_translation=(-50, 50), # (in pixels) minimum and maximum range for offsetting the position of the image \n", " range_rotation=(-5, 5), # (in degrees) minimum and maximum range for rotating the image left/right.\n", " range_sheer=(-5, 5), # (in degrees) minimum and maximum range for skewing the image left/right.\n", " range_noise=(0, 0.001), # (variance) minimum and maximum range for noise variance\n", " range_brightness=(0.8, 1.5), # (multiplier) minimum and maximum range for brightness gain\n", " flip_lr='random', # If None, no left-right flipping is applied. If 'all', all images are flipped. 
If 'random', images are flipped randomly\n", " flip_ud=None, # same as flip_lr, but for up-down.\n", " bbox_truncate = True, # truncate bboxes that may end up outside the augmented image.\n", " bbox_discard_thr = 0.85, # fraction of a bounding box's surface that must remain inside the image for the box not to be discarded. \n", " display=True # display augmentations (set as False if you are generating lots of images!!!)\n", " )\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Augment the training dataset\n", "\n", "Based on your experimentation, select the final range of values that you would like your augmentations to have. These parameters will be applied when generating the augmented images from the original ones. Also, be mindful of the `AUGM_PER_IMAGE` parameter: it acts as a multiplier on the total number of images and can therefore increase the dataset a lot, resulting in longer training times. \n", "\n", "Here are some suggestions, but feel free to use your own! \n", "\n", "- `AUGM_PER_IMAGE = 5`\n", "- `RANDOM_SEED = 0`\n", "- `RANGE_SCALE = (0.75, 1.5)`\n", "- `RANGE_TRANSLATION = (-50, 50)`\n", "- `RANGE_ROTATION = (-5, 5)`\n", "- `RANGE_SHEER = (-5, 5)`\n", "- `RANGE_NOISE = (0, 0.001)`\n", "- `RANGE_BRIGHTNESS = (0.8, 1.5)`\n", "- `FLIP_LR = 'random'` \n", "- `FLIP_UD = None`\n", "\n", "Replace the `?` placeholders with your own values. After adding your chosen values, continue to run the rest of the notebook. Optionally, you can also choose to use \"Run Selected Cell and All Below\". " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: select the range of augmentations that you want to apply\n", "\n", "AUGM_PER_IMAGE = ? # how many augmentations to generate from each image (increases the whole dataset size by this factor)\n", "RANDOM_SEED = ? # control randomness for reproducibility\n", "RANGE_SCALE = (?, ?) # (multiplier) minimum and maximum range for zooming/unzooming on the image\n", "RANGE_TRANSLATION = (?, ?) # (in pixels) minimum and maximum range for offsetting the position of the image \n", "RANGE_ROTATION = (?, ?) # (in degrees) minimum and maximum range for rotating the image left/right.\n", "RANGE_SHEER = (?, ?) # (in degrees) minimum and maximum range for skewing the image left/right.\n", "RANGE_NOISE = (?, ?) # (variance) minimum and maximum range for noise variance\n", "RANGE_BRIGHTNESS = (?, ?) # (multiplier) minimum and maximum range for brightness gain\n", "FLIP_LR = ? # mirror the image left-right (None / 'random' / 'all')\n", "FLIP_UD = ? # mirror the image up-down (None / 'random' / 'all')\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# initialize a local folder to store the augmented images\n", "\n", "LOCAL_AUGMENTED_TRAINSET_FOLDER = f'{LOCAL_DATASET_FOLDER}/dataset-augmented' \n", "if os.path.exists(LOCAL_AUGMENTED_TRAINSET_FOLDER) is False:\n", " os.makedirs(LOCAL_AUGMENTED_TRAINSET_FOLDER)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now generate the augmented images. First we will load the updated training manifest file (created in the 1st notebook) and use it as a guide. From each JSON line (each of which corresponds to one training image), we will generate multiple augmented images, along with their updated bounding boxes. At the same time, we will populate a new training manifest file, which will include *both* the original training images and the augmented ones. We will also count the number of training examples generated per class. 
This can take 2-3 minutes to complete." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Augment the whole training dataset\n", "\n", "from pathlib import Path\n", "\n", "new_manifest = []\n", "n_samples_training_augmented = 0\n", "n_samples_training_original = 0\n", "class_histogram_train_original = np.zeros(len(CLASS_NAMES), dtype=int)\n", "class_histogram_train_augmented = np.zeros(len(CLASS_NAMES), dtype=int)\n", "\n", "with open(f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/train-updated.manifest') as f: # open the manifest file\n", " lines = f.readlines()\n", " \n", "for line in lines:\n", " line_dict = json.loads(line) # load one json line (corresponding to one image)\n", " filename_object = Path(line_dict['source-ref'])\n", " filename = str(filename_object.name) # filename without the path\n", " \n", " # add json line of the original image and count the examples inside it\n", " new_manifest.append(json.dumps(line_dict))\n", " n_samples_training_original += 1\n", " for j,annotation in enumerate(line_dict['retail-object-labeling']['annotations']):\n", " class_histogram_train_original[int(line_dict['retail-object-labeling']['annotations'][j]['class_id'])] += 1 # counting annotations\n", " \n", " # generate augmented images\n", " print('Augmenting image:',filename)\n", " image_augm = augment_affine(\n", " image_filename=f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/{filename}',\n", " bboxes =line_dict['retail-object-labeling']['annotations'],\n", " how_many=AUGM_PER_IMAGE,\n", " random_seed=RANDOM_SEED,\n", " range_scale=RANGE_SCALE,\n", " range_translation=RANGE_TRANSLATION,\n", " range_rotation=RANGE_ROTATION,\n", " range_sheer=RANGE_SHEER,\n", " range_noise=RANGE_NOISE, \n", " range_brightness=RANGE_BRIGHTNESS, \n", " flip_lr=FLIP_LR,\n", " flip_ud=FLIP_UD,\n", " bbox_truncate = True,\n", " bbox_discard_thr = 0.85,\n", " display=False # otherwise the notebook will be flooded with images!\n", " )\n", " \n", " # save augmented images locally\n", " for i,image in enumerate(image_augm['Images']):\n", " \n", " # new image size of augmented image\n", " image_height = image.shape[0]\n", " image_width = image.shape[1]\n", " if len(image.shape) == 3:\n", " image_depth = image.shape[2]\n", " else:\n", " image_depth = 1\n", " line_dict['retail-object-labeling']['image_size'] = [{\"width\": image_width, \"height\": image_height, \"depth\": image_depth}]\n", " \n", " # augmented image filename\n", " filename_no_extension = str(filename_object.stem) # filename without extension \n", " filename_augmented = f'{filename_no_extension}_augm_{str(i+1)}.jpg'\n", " image_augm_filename = f'{LOCAL_AUGMENTED_TRAINSET_FOLDER}/{filename_augmented}'\n", " imageio.imsave(image_augm_filename, image, quality=95) # save locally\n", " new_filename_s3 = f's3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_DATASET}/{filename_augmented}'\n", " line_dict['source-ref'] = new_filename_s3 # add new filename to the manifest file\n", " \n", " # new image bounding boxes\n", " line_dict['retail-object-labeling']['annotations'] = image_augm['bboxes'][i]\n", " \n", " # add a new json line for this augmentation image\n", " new_manifest.append(json.dumps(line_dict))\n", " \n", " n_samples_training_augmented += 1 # count training images\n", " for j,annotation in enumerate(line_dict['retail-object-labeling']['annotations']):\n", " class_histogram_train_augmented[int(line_dict['retail-object-labeling']['annotations'][j]['class_id'])] += 1 # count annotations\n", " \n", " \n", "# save the updated 
training manifest file locally\n", "with open(f\"{LOCAL_AUGMENTED_TRAINSET_FOLDER}/train-augmented.manifest\", \"w\") as f:\n", " for line in new_manifest:\n", " f.write(f\"{line}\\n\") " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now compare the size of the augmented dataset to the original one. Remember that the augmented training dataset includes both the original training images and the generated ones. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# display statistics about the existing dataset splits\n", "\n", "print('Original training images:', n_samples_training_original)\n", "print('Augmented training images:', n_samples_training_original + n_samples_training_augmented)\n", "\n", "df_dataset_stats = pd.DataFrame(\n", " {\n", " 'train_augmented': class_histogram_train_augmented + class_histogram_train_original, \n", " 'train_original': class_histogram_train_original, \n", " 'class': CLASS_NAMES\n", " }\n", ") \n", "\n", "df_dataset_stats.plot.bar(x='class')\n", "plt.title('Number of examples per dataset')\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that we now have many more training examples per class, as well as more training images overall. We now need to copy these images and the new manifest file to Amazon S3, in order to use them for training. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# copy augmented manifest file to S3\n", "sagemaker_session.upload_data(\n", " path=f'{LOCAL_AUGMENTED_TRAINSET_FOLDER}/train-augmented.manifest',\n", " bucket=BUCKET_NAME,\n", " key_prefix=f'{PREFIX_PROJECT}/{PREFIX_DATASET}'\n", ")\n", "\n", "# copy all augmented images to the rest of the dataset in S3\n", "import glob\n", "ls_augmented_images = glob.glob(f'{LOCAL_AUGMENTED_TRAINSET_FOLDER}/*.jpg')\n", "\n", "for i,filename in enumerate(ls_augmented_images):\n", " print('Copying augmented image', i+1, 'out of', len(ls_augmented_images), '...\\r', end='')\n", " sagemaker_session.upload_data(\n", " path=filename,\n", " bucket=BUCKET_NAME,\n", " key_prefix=f'{PREFIX_PROJECT}/{PREFIX_DATASET}'\n", " )\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Set up input data channels\n", "\n", "In order for SageMaker to train a machine learning model, it needs to know where the training and validation datasets are located. In SageMaker terminology, these are called **input channels**. Input channels are **objects** containing the location of the datasets in S3, along with the type of data and annotations. \n", "\n", "In our case, for both the training and validation datasets, we have the following in S3:\n", "\n", "* A **JSON Lines manifest file** listing what images are in the dataset (by their S3 URI) and what annotations have been collected for those images (bounding boxes + class names)\n", "* The image files themselves\n", "\n", "We want SageMaker to provide the algorithm with a **stream of image records** comprising both the image data and their annotations. This will be faster compared to downloading the full dataset to the training container. The [algorithm docs](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html#object-detection-inputoutput) give guidance on how to set this up: SageMaker already provides functionality to create RecordIO files from manifest files. 
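\n",
"\n",
"To make this concrete, each line of the augmented manifest is a standalone JSON object carrying the two attributes listed in `MANIFEST_ATTRIBUTE_NAMES`. Below is a minimal sketch of what one such line looks like and how it can be parsed (the file name and box values are made up for illustration):\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# illustrative example of a single manifest line (values are not from the real dataset)\n",
"example_line = json.dumps({\n",
"    'source-ref': 's3://<bucket>/computer-vision-for-retail-workshop/dataset-full/IMG_0001_augm_1.jpg',\n",
"    'retail-object-labeling': {\n",
"        'image_size': [{'width': 512, 'height': 683, 'depth': 3}],\n",
"        'annotations': [{'class_id': 2, 'top': 100, 'left': 50, 'height': 120, 'width': 60}]\n",
"    }\n",
"})\n",
"print(json.loads(example_line)['retail-object-labeling']['annotations'])\n",
"```\n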
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_channel = sagemaker.inputs.TrainingInput(\n", " f's3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_DATASET}/train-augmented.manifest', # we are using the new AUGMENTED training manifest file\n", " distribution=\"FullyReplicated\",\n", " content_type=\"application/x-recordio\",\n", " s3_data_type=\"AugmentedManifestFile\",\n", " record_wrapping=\"RecordIO\",\n", " attribute_names=MANIFEST_ATTRIBUTE_NAMES,\n", " shuffle_config=sagemaker.inputs.ShuffleConfig(seed=1)\n", ")\n", " \n", "validation_channel = sagemaker.inputs.TrainingInput(\n", " f's3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_DATASET}/validation-updated.manifest', # we are using the same validation manifest file\n", " distribution=\"FullyReplicated\",\n", " content_type=\"application/x-recordio\",\n", " record_wrapping=\"RecordIO\",\n", " s3_data_type=\"AugmentedManifestFile\",\n", " attribute_names=MANIFEST_ATTRIBUTE_NAMES,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Configure the algorithm\n", "\n", "The first step in deciding to use a SageMaker built-in algorithm is to review its [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html) and [hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-api-config.html). In particular we'll need the **URL for the Docker image** in order to use a built-in algorithm. While this is listed [in the docs](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html), it's also nice and easy to fetch programmatically.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "training_image = sagemaker.image_uris.retrieve(\n", " region=region, \n", " framework=\"object-detection\", \n", " version=\"1\" # or use \"latest\"\n", ")\n", "print('Container image:', training_image)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The remainder of configuration includes:\n", "\n", "* Setting where to store final model artifacts and intermediate checkpoints\n", "* Specifying which compute resource to use\n", "* Selecting the algorithm's hyperparameters\n", "\n", "We do this through the [SageMaker SDK's Estimator API](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html), similarly to estimators in other common frameworks. Some things to keep in mind:\n", "\n", "* [Pipe Mode](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html#cdf-pipe-mode) streams input data to the algorithm rather than (the default) downloading the whole dataset up-front. This can accelerate training start-up for algorithms that support it.\n", "* The Object Detection built-in algorithm supports GPU-accelerated and distributed training. Here we use a GPU-accelerated `ml.p3.2xlarge` instance. There is no need for distributed training (more than one instance), due to the small dataset size." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "estimator = sagemaker.estimator.Estimator(\n", " training_image, # URL to container image implementing the algorithm \n", " sagemaker_role, # IAM access to perform the API actions\n", " input_mode=\"Pipe\", # or \"File\" mode\n", " instance_count=1, # if more than 1, then we have distributed training (not needed here!)\n", " instance_type=\"ml.p3.2xlarge\", # type of instance to be used for training\n", " volume_size=50, # (in GB) storage volume to use for storing input and output data during training \n", " max_run=10*60*60, # (in sec) maximum time of the training job\n", " output_path=f\"s3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_MODELS}\" # where to store the model weights\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we will select the [algorithm's hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-api-config.html). Here we have selected a promising set of hyperparameters after performing some initial testing. \n", "\n", "Since the training set is now larger (depending on the number of augmentations you introduced), you may be able to train the model from scratch (using random initial weights), without starting from pretrained ones. This can be done by setting `use_pretrained_model=0` in the hyperparameters. In this case, however, training will take much longer; the optimizer will need more epochs (~50-100) to reach a plateau, and therefore more waiting time. \n", "\n", "For this reason, and in the interest of time, we suggest still using pretrained weights (by setting `use_pretrained_model=1`). This will allow the optimizer to converge faster. In fact, you will later observe that the optimizer reaches a plateau in fewer than 10 epochs, whereas before, without augmentations, it needed at least 20 epochs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "estimator.set_hyperparameters(\n", " base_network=\"resnet-50\", # or 'vgg-16' architecture to be used\n", " use_pretrained_model=1, # 0/1 whether to use pretrained or random weights (0-> train from scratch) \n", " early_stopping = True,\n", " early_stopping_min_epochs = 5,\n", " early_stopping_patience = 3,\n", " early_stopping_tolerance = 0.00,\n", " num_classes=len(CLASS_NAMES),\n", " mini_batch_size=8, # depends on the GPU memory and the image sizes\n", " epochs=20,\n", " learning_rate=0.00005, # very small learning rate to avoid catastrophic forgetting\n", " lr_scheduler_step=\"20,40,60,80\",\n", " lr_scheduler_factor=0.5,\n", " optimizer=\"adam\",\n", " momentum=0.2834,\n", " weight_decay=0.94,\n", " overlap_threshold=0.5,\n", " nms_threshold=0.45,\n", " image_shape=832,\n", " label_width=350,\n", " num_training_samples=n_samples_training_original + n_samples_training_augmented,\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: Train the model\n", "\n", "The hyperparameters above represent our best up-front guess, and it's easy enough to call `estimator.fit()` to train a model as shown below.\n", "\n", "One way to improve model performance and reduce some of the guesswork is to let the SageMaker `HyperParameterTuner` optimize them. SageMaker Hyper-Parameter Optimization (HPO) supports [Random, Grid, Bayesian and Hyperband strategies](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html). 
In this example we prefer the [Hyperband approach, which usually exhibits more competitive performance than the rest](https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-automatic-model-tuning-now-provides-up-to-three-times-faster-hyperparameter-tuning-with-hyperband/) and faster convergence. Because HPO typically takes much longer than standard model fitting, `tuner.fit()` is an **asynchronous** method by default, whereas `estimator.fit()` is **synchronous** (blocking).\n", "\n", "<div class=\"alert alert-block alert-warning\">\n", "<b>Notice:</b> The following training code will take <b>approximately 12-15 minutes</b> to execute (without HPO). If you opt to activate HPO, it will take 2-4 hours (depending on how much data you added through augmentation)! Therefore, we do not recommend running HPO during the workshop, but you are welcome to try it in your own time. \n", "</div>\n", "\n", "While you wait for the training to finish, why not read up on SageMaker using any of the links in this notebook? Or, if you don't like reading, watch this cool under-the-hood video about [Amazon Go](https://www.youtube.com/watch?v=Lu4szyPjIGY)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "WITH_HPO = False # change to True if you want to find better parameters (attention! it takes a long time!)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "%%time\n", "\n", "if WITH_HPO is False:\n", " estimator.fit(\n", " { \"train\": train_channel, \"validation\": validation_channel }, \n", " logs=True\n", " )\n", " \n", "else:\n", " hyperparameter_ranges = {\n", " \"learning_rate\": sagemaker.tuner.ContinuousParameter(0.00001, 0.01),\n", " \"momentum\": sagemaker.tuner.ContinuousParameter(0.0, 0.99),\n", " \"weight_decay\": sagemaker.tuner.ContinuousParameter(0.0, 0.99),\n", " \"optimizer\": sagemaker.tuner.CategoricalParameter([\"sgd\", \"adam\", \"rmsprop\", \"adadelta\"])\n", " }\n", "\n", " tuner = sagemaker.tuner.HyperparameterTuner(\n", " estimator,\n", " \"validation:mAP\", # Name of the objective metric to optimize. \"Mean Average Precision\" high = good\n", " objective_type=\"Maximize\", # or Minimize\n", " strategy=\"Hyperband\", # or Bayesian or Random\n", " hyperparameter_ranges=hyperparameter_ranges,\n", " base_tuning_job_name=\"object-detection-SSD-HPO\",\n", " max_jobs=50, # how many searches (training jobs) we will have in the parameter space\n", " max_parallel_jobs=1 # how many searches (training jobs) will happen in parallel\n", " )\n", " \n", " tuner.fit(\n", " { \"train\": train_channel, \"validation\": validation_channel },\n", " include_cls_metadata=False\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once training finishes, we can explore the training logs and see how performance on the validation set changed over the training epochs.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import matplotlib.ticker as ticker\n", "\n", "%matplotlib inline\n", "plt.style.use('seaborn')\n", "\n", "client = boto3.client(\"logs\")\n", "BASE_LOG_NAME = \"/aws/sagemaker/TrainingJobs\"\n", "\n", "def plot_object_detection_log(model, title):\n", " # retrieve the validation mAP across epochs from the training job logs and plot it\n", " logs = client.describe_log_streams(\n", " logGroupName=BASE_LOG_NAME, logStreamNamePrefix=model._current_job_name\n", " )\n", " cw_log = client.get_log_events(\n", " logGroupName=BASE_LOG_NAME, logStreamName=logs[\"logStreams\"][0][\"logStreamName\"]\n", " )\n", "\n", " mAP_accs = []\n", " for e in cw_log[\"events\"]:\n", " msg = e[\"message\"]\n", " if \"validation mAP <score>=\" in msg:\n", " num_start = msg.find(\"(\")\n", " num_end = msg.find(\")\")\n", " mAP = msg[num_start + 1 : num_end]\n", " mAP_accs.append(float(mAP))\n", "\n", " print(title)\n", " print(\"Maximum mAP: %f \" % max(mAP_accs))\n", "\n", " fig, ax = plt.subplots()\n", " plt.xlabel(\"Epochs\")\n", " plt.ylabel(\"Mean Avg Precision (mAP)\")\n", " plt.title(\"Validation performance per training epoch\")\n", " (val_plot,) = ax.plot(range(len(mAP_accs)), mAP_accs, label=\"mAP\")\n", " plt.legend(handles=[val_plot])\n", " ax.yaxis.set_ticks(np.arange(0.0, 1.05, 0.1))\n", " ax.yaxis.set_major_formatter(ticker.FormatStrFormatter(\"%0.2f\"))\n", " plt.show()\n", " \n", " \n", "if WITH_HPO is True: \n", " estimator = tuner.best_estimator()\n", " \n", "plot_object_detection_log(estimator, \"mAP tracking for job: \" + estimator._current_job_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6: Deploy the model\n", "\n", "Once the model is trained, SageMaker supports many different [deployment options](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html). In this example we'll deploy our trained model to a **real-time endpoint**, in order to have the lowest possible latency in our predictions. You can think of an endpoint as a dedicated web server that makes our trained model's predictions accessible through a REST API. Since our endpoints won't be handling any significant traffic volumes, we provision a single non-accelerated instance.\n", "\n", "<div class=\"alert alert-block alert-warning\">\n", "<b>Notice:</b> The following deployment of the model to a real-time endpoint will take approximately <b>5-7 minutes</b>.\n", "</div>\n", "\n", "<div class=\"alert alert-block alert-warning\">\n", "<b>Attention:</b> Please copy the <b>name of the deployed endpoint</b> (in the output of the following cell). You will need to paste it in the application. 
This will allow the application to route camera images to your endpoint for real-time inference.\n", "</div>" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%time\n", "if WITH_HPO is True:\n", " if (predictor_hpo): # in case HPO was selected\n", " try:\n", " predictor_hpo.delete_endpoint()\n", " print(\"Deleted previous HPO endpoint\")\n", " except:\n", " print(\"Couldn't delete previous HPO endpoint\")\n", " print(\"Deploying HPO model...\")\n", " predictor_hpo = tuner.deploy(\n", " initial_instance_count=1,\n", " instance_type=\"ml.c5.large\",\n", " # wait=False,\n", " )\n", "else:\n", " if (predictor_std): # in case simple training was selected\n", " try:\n", " predictor_std.delete_endpoint()\n", " print(\"Deleted previous non-HPO endpoint\")\n", " except:\n", " print(\"Couldn't delete previous non-HPO endpoint\")\n", " print(\"Deploying standard (non-HPO) model...\")\n", " predictor_std = estimator.deploy(\n", " initial_instance_count=1, # number of instances for the endpoint\n", " instance_type=\"ml.c5.large\", # type of instance to be used for the endpoint\n", " # wait=False,\n", " )\n", "\n", "predictor = predictor_hpo if WITH_HPO else predictor_std\n", "print('Name of the deployed endpoint:',predictor.endpoint_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<div class=\"alert alert-block alert-warning\">\n", "<b>Attention:</b> Please copy the <b>name of the deployed endpoint</b> (in the output of the previous cell). You will need to paste it in the application. This will allow the application to route camera images to your endpoint for real-time inference.\n", "</div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 7: Run inference on test images\n", "\n", "Now that we have an object detection model deployed, we can send it some test images and see how it performs!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test on individual images\n", "\n", "The `predict()` function used here, provided in the `util` folder, sends image files to our deployed endpoint and receives back the detection results. \n", "\n", "The built-in Object Detection algorithm doesn't estimate an optimal confidence threshold for us. Instead, it returns **all the detections**, irrespective of their confidence score. The `predict()` function includes the confidence threshold parameter `thresh`. Any detection with a confidence score higher than `thresh` will be visualized. The function also returns a Pandas DataFrame with all the detected bounding boxes for the given detection threshold. \n", "\n", "For visualization, the `predict()` function calls the `visualize_detection()` function, which uses Matplotlib to plot the provided detection boxes over the image. Test different filenames from the test folder, as well as different `thresh` values." 
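, "\n",
"\n",
"To make the thresholding step concrete, here is a minimal sketch of the filtering logic (not the actual `util` implementation). It assumes the response format of the built-in Object Detection algorithm, where each detection is `[class_id, score, xmin, ymin, xmax, ymax]` with coordinates normalized to [0, 1]:\n",
"\n",
"```python\n",
"# illustrative raw endpoint response with made-up values\n",
"raw_response = {'prediction': [[4, 0.91, 0.10, 0.20, 0.18, 0.35],\n",
"                               [4, 0.12, 0.50, 0.55, 0.60, 0.70]]}\n",
"\n",
"thresh = 0.2\n",
"# keep only detections whose confidence score is above the threshold\n",
"kept = [det for det in raw_response['prediction'] if det[1] > thresh]\n",
"print(kept)  # only the first (high-confidence) detection survives\n",
"```"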
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "from util.util import predict\n", "\n", "sagemaker_runtime = boto3.client(service_name=\"runtime.sagemaker\")\n", "endpoint_name = predictor.endpoint_name\n", "print('Using endpoint:',endpoint_name)\n", "\n", "df_results = predict(\n", " # change this to other test images to see more results\n", " filename=f'{LOCAL_DATASET_FOLDER}/dataset-test/IMG_1599.jpg', \n", " runtime=sagemaker_runtime,\n", " class_names = CLASS_NAMES,\n", " endpoint_name=endpoint_name, \n", " thresh=0.2, \n", " visualize=True\n", ")\n", "\n", "df_results\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Compare multiple detection thresholds\n", "\n", "You can even generate a side-by-side comparison of the results for multiple detection thresholds by providing a **list of thresholds** instead of a single number. In the following example, we depict the results for 3 different detection threshold values, `[0.3, 0.6, 0.9]`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "from util.util import predict\n", "\n", "sagemaker_runtime = boto3.client(service_name=\"runtime.sagemaker\")\n", "endpoint_name = predictor.endpoint_name\n", "print('Using endpoint:',endpoint_name)\n", "\n", "df_results = predict(\n", " # change this to other test images to see more results\n", " filename=f'{LOCAL_DATASET_FOLDER}/dataset-test/IMG_1599.jpg', \n", " runtime=sagemaker_runtime,\n", " class_names = CLASS_NAMES,\n", " endpoint_name=endpoint_name, \n", " thresh=[0.3,0.6,0.9], # a list of thresholds to be visualized\n", " visualize=True\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One thing that is evident is that, with the augmented training dataset, the model makes more confident predictions than when it was trained on the original, smaller dataset. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test on the whole test set\n", "\n", "We can now call the `evaluate_testset()` helper function, which will test the deployed model against the **whole test set** and generate a performance report for each class. The helper function relies on two important parameters: **Intersection over Union (IOU)** and the model's confidence threshold. IOU is a metric that describes the degree of overlap between two bounding boxes, and it is needed in order to know when the output of the model coincides with the ground truth bounding box. Check this page for a more [detailed description of IOU](https://en.wikipedia.org/wiki/Jaccard_index)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "from util.util import evaluate_testset\n", "\n", "df_class_performance = evaluate_testset(\n", " runtime=sagemaker_runtime, \n", " endpoint_name=endpoint_name,\n", " class_names = CLASS_NAMES,\n", " testset_folder = f'{LOCAL_DATASET_FOLDER}/dataset-test', \n", " test_manifest_file = f'{LOCAL_DATASET_FOLDER}/dataset-test/test-updated.manifest', \n", " thr_iou = 0.5, # Intersection Over Union threshold\n", " thr_conf = 0.2 # Confidence Threshold\n", ")\n", "\n", "df_class_performance\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a quick explanation of the performance statistics. 
Check this page for a more [detailed description](https://en.wikipedia.org/wiki/Precision_and_recall).\n", "\n", "| Abbreviation | Full name | Explanation | Range | Intuition |\n", "| --- | --- | --- | --- | --- | \n", "| `TP` | True Positives | How many times the model has correctly detected an actual object | Depends on test set | Higher is better |\n", "| `FP` | False Positives | How many times the model has mistakenly detected something that was not an actual object | Depends on test set | Lower is better |\n", "| `FN` | False Negatives | How many times the model did not detect (missed) an actual object | Depends on test set | Lower is better |\n", "| `TN` | True Negatives | (**Does not apply in object detection!**) How many times the model correctly did not detect something that was not an actual object | N/A | N/A |\n", "| `PR` | Precision | What percentage of all the model's detections were indeed actual objects: TP / (TP + FP) | [0,1] | Higher is better |\n", "| `RE` | Recall | What percentage of all the actual available objects were detected by the model: TP / (TP + FN) | [0,1] | Higher is better |\n", "| `F1` | F1 score | A balanced combination (harmonic mean) of Precision and Recall: 2 * PR * RE / (PR + RE) | [0,1] | Higher is better |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plot a graph of the results \n", "\n", "plt.figure()\n", "df_class_performance.plot.bar(x='CLASS')\n", "plt.title('Model performance on the test set')\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that performance is better than when the model was trained with the original, smaller training dataset (Notebook 1). " ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Step 8: Clean up\n", "\n", "Although training instances are ephemeral, the resources we allocated for real-time endpoints need to be cleaned up to avoid ongoing charges. The code below will delete the *most recently deployed* endpoint for the HPO and non-HPO configurations, but note that if you deployed either more than once, you might end up with extra endpoints. To be safe, it's best to still check through the SageMaker console for any left-over resources when cleaning up." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# uncomment the following code in order to delete the created resources\n", "\n", "# if (predictor_hpo):\n", "# print(\"Deleting HPO-optimized predictor endpoint\")\n", "# predictor_hpo.delete_endpoint()\n", "# if (predictor_std):\n", "# print(\"Deleting standard (non-HPO) predictor endpoint\")\n", "# predictor_std.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "## Review\n", "\n", "In this notebook we augmented the initial small training dataset by generating synthetic transformations (augmentations) from each training image. This resulted in a larger training dataset that covered more variation in the position, size and orientation of the objects. We saw that the resulting model makes more confident predictions and achieves better performance on our held-out test dataset. 
\n" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3.10.6 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 4 }