{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Build an object detection model with Amazon SageMaker's built-in Object Detection algorithm\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Prerequisite:</b> This notebook is part of the <a href=\"https://catalog.workshops.aws/cv-retail\">Computer vision for retail inventory workshop</a>. Please follow the workshop instructions before running this notebook.\n",
    "</div>\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Notebook Kernel:</b> Please use the Python 3 (Data Science) kernel to run this notebook.\n",
    "</div>\n",
    "\n",
    "In this notebook we will build a custom object detector model, using [Amazon SageMaker's built-in Object Detection algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html), which is based on the Single Shot multibox Detector (SSD). The object detection model will be the main component for our retail inventory monitoring system. \n",
    "\n",
    "Like most of the built-in algorithms, the Object Detection documentation includes a [How It Works](https://docs.aws.amazon.com/sagemaker/latest/dg/algo-object-detection-tech-notes.html) section, with an overview and links to relevant resources. The SSD algorithm is described in [Liu et al, 2016](https://arxiv.org/pdf/1512.02325.pdf).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 0: Dependencies and configuration\n",
    "\n",
    "Start by loading useful Python libraries, defining our configuration, and connecting to the AWS SDKs, which will allow us to interface with various AWS services, like Amazon S3 and Amazon SageMaker. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# generic packages\n",
    "import os\n",
    "import json\n",
    "import shutil\n",
    "import imageio\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline  \n",
    "plt.style.use('seaborn')\n",
    "\n",
    "# AWS-related packages\n",
    "import boto3  # AWS Python SDK that allows us to interface with all AWS services\n",
    "import sagemaker  # SageMaker Python SDK that allows us to easily build, train and deploy models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we will connect to the AWS SDKs and set up some common variables we will be using, in order to define the folder structure locally and in S3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sagemaker_role = sagemaker.get_execution_role()\n",
    "sagemaker_session = sagemaker.Session()\n",
    "boto_session = boto3.session.Session()\n",
    "region = boto_session.region_name\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Amazon SageMaker provides a **default bucket** in each AWS region, which is automatically created for us. We will be using this default bucket to store our data, model artifacts and model outputs. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# variable setup\n",
    "BUCKET_NAME = sagemaker_session.default_bucket()  # here we will store our data\n",
    "PREFIX_PROJECT = 'computer-vision-for-retail-workshop'  # main project folder in S3\n",
    "PREFIX_DATASET = 'dataset-full'  # where our dataset will be located\n",
    "PREFIX_MODELS = 'models'  # where our trained model weights will be saved\n",
    "CLASS_NAMES = [  # names of the 10 products that we will be trying to detect\n",
    "    'flakes',  \n",
    "    'mm', \n",
    "    'coke', \n",
    "    'spam', \n",
    "    'nutella', \n",
    "    'doritos', \n",
    "    'ritz', \n",
    "    'skittles', \n",
    "    'mountaindew', \n",
    "    'evian'\n",
    "]\n",
    "# print a list of the class names along with their indices\n",
    "print('Index and class name')\n",
    "class_indx_names = pd.Series(CLASS_NAMES)\n",
    "print(class_indx_names,'\\n')\n",
    "\n",
    "MANIFEST_ATTRIBUTE_NAMES = ['source-ref', 'retail-object-labeling']  # attributes to be considered in the manifest files\n",
    "LOCAL_DATASET_FOLDER = 'dataset'\n",
    "\n",
    "print('Region:', region)\n",
    "print('Bucket:', BUCKET_NAME)\n",
    "\n",
    "# initialize some empty variables we need to exist:\n",
    "predictor_std = None\n",
    "predictor_hpo = None\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Prepare the dataset\n",
    "\n",
    "A small training dataset of 95 images is already included, in the file `dataset.zip`. The dataset includes images of the same shelf, with different combinations of the 10 products placed in various random positions on it. \n",
    "\n",
    "The dataset has already been split into 3 parts: \n",
    "- Train (66 images)\n",
    "- Validation (19 images)\n",
    "- Test (10 images)\n",
    "\n",
    "In each split, the distribution of training examples is roughly similar across the 10 item classes. 3 [manifest files](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-input-data-input-manifest.html) are also included: \n",
    "- `train.manifest`\n",
    "- `validation.manifest`\n",
    "- `test.manifest`\n",
    "\n",
    "Each manifest file describes which of the images belong to each split, along with details about the bounding boxes of the items in each image. \n",
    "\n",
    "First, we will need to unzip the dataset, explore it, update the manifest files, and upload everything to the S3 bucket. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "! unzip dataset/dataset-full.zip -d dataset/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we will show one image and its corresponding annotations from the `train.manifest` file. In a manifest file, each line is a stand-alone JSON expression, containing metadata. In our case, it contains an image filename and its annotations (i.e. bounding box coordinates and class name)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# depict one image from the dataset, along with its corresponding annotations\n",
    "\n",
    "from pathlib import Path\n",
    "\n",
    "with open(f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/train.manifest') as f:\n",
    "    lines = f.readlines()\n",
    "\n",
    "line_dict = json.loads(lines[0])  # load the 1st line of the manifest file\n",
    "filename = str(Path(line_dict['source-ref']).name)\n",
    "\n",
    "image = imageio.imread(f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/{filename}')\n",
    "plt.imshow(image)\n",
    "plt.grid(False)\n",
    "plt.axis(True)\n",
    "plt.title(f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/{filename}')\n",
    "plt.show()\n",
    "\n",
    "print(json.dumps(line_dict, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we will copy all the image files (training + validation + testing) to the S3 bucket. We will need them to be in the S3 bucket, in order to initiate training with Amazon SageMaker."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import glob\n",
    "\n",
    "ls_dataset_files = glob.glob(f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/*.jpg')  # get all image files\n",
    "print('Copying', len(ls_dataset_files), 'images to', f's3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_DATASET}')\n",
    "\n",
    "for file in ls_dataset_files:\n",
    "    filename = str(Path(file).name)\n",
    "    print('Copying', filename)\n",
    "    sagemaker_session.upload_data(\n",
    "        path=f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/{filename}',\n",
    "        bucket=BUCKET_NAME,\n",
    "        key_prefix=f'{PREFIX_PROJECT}/{PREFIX_DATASET}',\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now need to update the manifest files with the correct location of the images in our S3 bucket, and copy them to the same S3 location. At the same time, we will count the number of training examples per class, per split, in order to understand more about our dataset. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#---------- analyze and update the train.manifest\n",
    "new_manifest = []\n",
    "n_samples_training = 0\n",
    "class_histogram_train = np.zeros(len(CLASS_NAMES), dtype=int)\n",
    "\n",
    "with open(f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/train.manifest') as f:  # open the manifest file\n",
    "    lines = f.readlines()\n",
    "for line in lines:\n",
    "    line_dict = json.loads(line)  # load one json line (corresponding to one image)\n",
    "    \n",
    "    filename = str(Path(line_dict['source-ref']).name)\n",
    "    new_filename_s3 = f's3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_DATASET}/{filename}'\n",
    "    line_dict['source-ref'] = new_filename_s3\n",
    "    new_manifest.append(json.dumps(line_dict))  # add updated json line\n",
    "    \n",
    "    n_samples_training += 1  # counting training images\n",
    "    for i,annotation in enumerate(line_dict['retail-object-labeling']['annotations']):\n",
    "        class_histogram_train[int(line_dict['retail-object-labeling']['annotations'][i]['class_id'])] += 1  # counting annotations\n",
    "\n",
    "# save the updated training manifest file locally\n",
    "with open(f\"{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/train-updated.manifest\", \"w\") as f:\n",
    "    for line in new_manifest:\n",
    "        f.write(f\"{line}\\n\")        \n",
    "        \n",
    "        \n",
    "#---------- analyze and update the validation.manifest       \n",
    "new_manifest = []\n",
    "n_samples_validation = 0\n",
    "class_histogram_val = np.zeros(len(CLASS_NAMES), dtype=int)\n",
    "\n",
    "with open(f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/validation.manifest') as f:\n",
    "    lines = f.readlines()\n",
    "for line in lines:\n",
    "    line_dict = json.loads(line)  # load one json line (corresponding to one image)\n",
    "    \n",
    "    filename = str(Path(line_dict['source-ref']).name)\n",
    "    new_filename_s3 = f's3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_DATASET}/{filename}'\n",
    "    line_dict['source-ref'] = new_filename_s3\n",
    "    new_manifest.append(json.dumps(line_dict))  # add updated json line\n",
    "    \n",
    "    n_samples_validation += 1  # counting validation images\n",
    "    for i,annotation in enumerate(line_dict['retail-object-labeling']['annotations']):\n",
    "        class_histogram_val[int(line_dict['retail-object-labeling']['annotations'][i]['class_id'])] += 1  # counting validation samples\n",
    "\n",
    "# save the updated validation manifest file locally\n",
    "with open(f\"{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/validation-updated.manifest\", \"w\") as f:\n",
    "    for line in new_manifest:\n",
    "        f.write(f\"{line}\\n\")           \n",
    "\n",
    "        \n",
    "#---------- analyze and update the test.manifest           \n",
    "new_manifest = []\n",
    "n_samples_testing = 0\n",
    "class_histogram_test = np.zeros(len(CLASS_NAMES), dtype=int)\n",
    "\n",
    "with open(f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/test.manifest') as f:\n",
    "    lines = f.readlines()\n",
    "for line in lines:\n",
    "    line_dict = json.loads(line)  # load one json line (corresponding to one image)\n",
    "    \n",
    "    filename = str(Path(line_dict['source-ref']).name)\n",
    "    new_filename_s3 = f's3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_DATASET}/{filename}'\n",
    "    line_dict['source-ref'] = new_filename_s3\n",
    "    new_manifest.append(json.dumps(line_dict))  # add updated json line\n",
    "    \n",
    "    n_samples_testing += 1  # counting testing images\n",
    "    for i,annotation in enumerate(line_dict['retail-object-labeling']['annotations']):\n",
    "        class_histogram_test[int(line_dict['retail-object-labeling']['annotations'][i]['class_id'])] += 1  # counting testing samples\n",
    "\n",
    "# save the updated test manifest file locally\n",
    "with open(f\"{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/test-updated.manifest\", \"w\") as f:\n",
    "    for line in new_manifest:\n",
    "        f.write(f\"{line}\\n\")         \n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# copy the new updated manifest files to S3\n",
    "\n",
    "sagemaker_session.upload_data(\n",
    "    path=f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/train-updated.manifest',\n",
    "    bucket=BUCKET_NAME,\n",
    "    key_prefix=f'{PREFIX_PROJECT}/{PREFIX_DATASET}'\n",
    ")\n",
    "\n",
    "sagemaker_session.upload_data(\n",
    "    path=f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/validation-updated.manifest',\n",
    "    bucket=BUCKET_NAME,\n",
    "    key_prefix=f'{PREFIX_PROJECT}/{PREFIX_DATASET}'\n",
    ")\n",
    "\n",
    "sagemaker_session.upload_data(\n",
    "    path=f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/test-updated.manifest',\n",
    "    bucket=BUCKET_NAME,\n",
    "    key_prefix=f'{PREFIX_PROJECT}/{PREFIX_DATASET}'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# depict statistics about the existing dataset splits\n",
    "\n",
    "df_dataset_stats = pd.DataFrame(\n",
    "    {\n",
    "        'train': class_histogram_train, \n",
    "        'validation': class_histogram_val, \n",
    "        'test': class_histogram_test, \n",
    "        'class': CLASS_NAMES\n",
    "    }\n",
    ") \n",
    "\n",
    "df_dataset_stats.plot.bar(x='class')\n",
    "plt.title('Number of examples per split')\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we will copy the training images into a separate local folder. This will make our life easier later when we test the performance of our model. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# initialize local test set folder\n",
    "local_testset_folder = f'{LOCAL_DATASET_FOLDER}/dataset-test'  \n",
    "if os.path.exists(local_testset_folder) is False:\n",
    "    os.makedirs(local_testset_folder)\n",
    "    \n",
    "with open(f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/test-updated.manifest') as f:\n",
    "    lines = f.readlines()\n",
    "for i,line in enumerate(lines):\n",
    "    line_dict = json.loads(line)\n",
    "    filename = str(Path(line_dict['source-ref']).name)\n",
    "    shutil.copy(  \n",
    "        src=f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/{filename}',  # copy test images to a separate folder for later testing\n",
    "        dst=f'{local_testset_folder}/{filename}'\n",
    "    )\n",
    "\n",
    "shutil.copy(  \n",
    "        src=f'{LOCAL_DATASET_FOLDER}/{PREFIX_DATASET}/test-updated.manifest',  # also copy the test manifest file\n",
    "        dst=f'{local_testset_folder}/test-updated.manifest'\n",
    "    )\n",
    "\n",
    "print('Test images located in:', local_testset_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Set up input data channels\n",
    "\n",
    "In order for SageMaker to train a machine learning model, it needs to know where the training and validation datasets are located. In SageMaker language, this is called **input channels**. Input channels are **objects** containing the location of the datasets in S3, along with the type of data and annotations. \n",
    "\n",
    "In our case, we have in S3, both for the training and validation datasets:\n",
    "\n",
    "* A **JSONLines manifest file** listing what images are in the data-set (by their S3 URI) and what annotations have been collected for those images (bounding boxes + class names)\n",
    "* The image files themselves\n",
    "\n",
    "We want SageMaker to provide the algorithm with a **stream of image records** comprising both the image data and their annotations. This will be faster compared to downloading the full dataset to the training container. The [algorithm docs](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html#object-detection-inputoutput) give guidance on how to set this up: SageMaker already provides functionality to create RecordIO files from manifest files. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_channel = sagemaker.inputs.TrainingInput(\n",
    "    f's3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_DATASET}/train-updated.manifest',\n",
    "    distribution=\"FullyReplicated\",  # In case we want to try distributed training\n",
    "    content_type=\"application/x-recordio\",\n",
    "    s3_data_type=\"AugmentedManifestFile\",\n",
    "    record_wrapping=\"RecordIO\",\n",
    "    attribute_names=MANIFEST_ATTRIBUTE_NAMES,  # focus only on specific attributes inside the manifest file\n",
    "    shuffle_config=sagemaker.inputs.ShuffleConfig(seed=1)\n",
    ")\n",
    "                                        \n",
    "validation_channel = sagemaker.inputs.TrainingInput(\n",
    "    f's3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_DATASET}/validation-updated.manifest',\n",
    "    distribution=\"FullyReplicated\",  # In case we want to try distributed training\n",
    "    content_type=\"application/x-recordio\",\n",
    "    record_wrapping=\"RecordIO\",\n",
    "    s3_data_type=\"AugmentedManifestFile\",\n",
    "    attribute_names=MANIFEST_ATTRIBUTE_NAMES,  # focus only on specific attributes inside the manifest file\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Configure the algorithm\n",
    "\n",
    "The first step in deciding to use a SageMaker built-in algorithm is to review its [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html) and [hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-api-config.html). In particular we'll need the **URL for the Docker image** in order to use a built-in algorithm. While this is listed [in the docs](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html), it's also nice and easy to fetch programmatically.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "training_image = sagemaker.image_uris.retrieve(\n",
    "    region=region, \n",
    "    framework=\"object-detection\", \n",
    "    version=\"1\"  # or you can use \"latest\"\n",
    ")\n",
    "print('Container image:', training_image)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The remainder of configuration includes:\n",
    "\n",
    "* Setting where to store final model artifacts and intermediate checkpoints\n",
    "* Specifying which compute resource to use\n",
    "* Selecting the algorithm's hyperparameters\n",
    "\n",
    "We do this through the [SageMaker SDK's Estimator API](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html), similarly to estimators in other common frameworks. Some things to keep in mind:\n",
    "\n",
    "* [Pipe Mode](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html#cdf-pipe-mode) streams input data to the algorithm rather than (the default) downloading the whole dataset up-front. This can accelerate training start-up for algorithms that support it.\n",
    "* The Object Detection built-in algorithm supports GPU-accelerated and distributed training. Here we use a GPU-accelerated `ml.p3.2xlarge` instance. There is no need for distributed training (more than one instance),  due to the small dataset size."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "estimator = sagemaker.estimator.Estimator(\n",
    "    training_image,  # URL to container image implementing the algorithm \n",
    "    sagemaker_role,  # IAM access to perform the API actions\n",
    "    input_mode=\"Pipe\",  # or \"File\" mode\n",
    "    instance_count=1,  # if more than 1, then we have distributed training (not needed here!)\n",
    "    instance_type=\"ml.p3.2xlarge\",  # type of instance to be used for training\n",
    "    volume_size=50,  # (in GB) storage volume to use for storing input and output data during training \n",
    "    max_run=10*60*60,  # (in sec) maximum time of the training job\n",
    "    output_path=f\"s3://{BUCKET_NAME}/{PREFIX_PROJECT}/{PREFIX_MODELS}\"  # where to store the model weights\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we will select the [algorithm's hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-api-config.html). Here we selected a promising set of hyperparameters, after performing some initial testing. \n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Important:</b> Our training dataset is quite small (only 66 images). If we train a large neural network from scratch, the results will not be good, because the model parameters will be more than the available training examples. As such, we are using <b>Transfer Learning</b>, by initializing our network with weights trained from a very large dataset (ImageNet) which contains millions of images. By doing so, we hope that our network will automatically learn some basic image features, like corners, edges and colors. Then, we use a <b>very low learning rate</b> (to avoid catastrophic forgetting), and we adjust the network weights only for a <b>few epochs</b>. This way, our network will adjust to our small dataset without forgetting the basic image features learned from millions of images. This technique is also known as <b>Supervised Finetuning</b>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "estimator.set_hyperparameters(\n",
    "    # Pre-training is particularly important for tiny data-sets like this!\n",
    "    base_network=\"resnet-50\",  # or 'vgg-16' architecture to be used\n",
    "    use_pretrained_model=1,  # 0/1 whether to use pretrained or random weights (0-> train from scratch) \n",
    "    early_stopping = True,\n",
    "    early_stopping_min_epochs = 10,\n",
    "    early_stopping_patience = 5,\n",
    "    early_stopping_tolerance = 0.00,\n",
    "    num_classes=len(CLASS_NAMES),\n",
    "    mini_batch_size=8,  # depends on the GPU memory and the image sizes\n",
    "    epochs=35,  # only for a few epochs. \n",
    "    learning_rate=0.00009,  # very small learning rate to avoid catastrophic forgetting\n",
    "    lr_scheduler_step=\"20,40,60,80\",\n",
    "    lr_scheduler_factor=0.5,\n",
    "    optimizer=\"adam\",\n",
    "    momentum=0.99,\n",
    "    weight_decay=0.98,\n",
    "    overlap_threshold=0.5,\n",
    "    nms_threshold=0.45,\n",
    "    image_shape=832,\n",
    "    label_width=350,\n",
    "    num_training_samples=n_samples_training,\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Train the model\n",
    "\n",
    "The hyperparameters above represent our best up-front guess; and it's easy enough to call `estimator.fit()` to train a model as shown below.\n",
    "\n",
    "One way to improve model performance and reduce some of the guesswork, is to let SageMaker `HyperParameterTuner` optimize them. SageMaker Hyper-Parameter Optiomization (HPO) supports [Random, Grid, Bayesian and Hyperband strategies](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html)). In this example we prefer the [Hyperband approach, which usually exhibits a more competitive performance compared to ther rest](https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-automatic-model-tuning-now-provides-up-to-three-times-faster-hyperparameter-tuning-with-hyperband/). Because HPO typically takes much longer than standard model fitting, `tuner.fit()` is an **asynchronous** method by default whereas `estimator.fit()` is **synchronous** (blocking).\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Notice:</b> The following training code will take <b>approximately 10 minutes</b> to execute (without HPO). If you opt to activate HPO, it will take 2-3 hours! Therefore, we do not recommend running HPO during the workshop, but you are welcome to try it in your own time. \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "WITH_HPO = False  # change to True if you want to find better parameters (attention! it takes long time!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "if WITH_HPO is False:\n",
    "    estimator.fit(\n",
    "        { \"train\": train_channel, \"validation\": validation_channel }, \n",
    "        logs=True\n",
    "    )\n",
    "    \n",
    "else:\n",
    "    hyperparameter_ranges = {\n",
    "        \"learning_rate\": sagemaker.tuner.ContinuousParameter(0.00001, 0.01),\n",
    "        \"momentum\": sagemaker.tuner.ContinuousParameter(0.0, 0.99),\n",
    "        \"weight_decay\": sagemaker.tuner.ContinuousParameter(0.0, 0.99),\n",
    "        \"optimizer\": sagemaker.tuner.CategoricalParameter([\"sgd\", \"adam\", \"rmsprop\", \"adadelta\"])\n",
    "    }\n",
    "\n",
    "    tuner = sagemaker.tuner.HyperparameterTuner(\n",
    "        estimator,\n",
    "        \"validation:mAP\",  # Name of the objective metric to optimize. \"Mean Average Precision\" high = good\n",
    "        objective_type=\"Maximize\",  # or Minimize\n",
    "        strategy=\"Hyperband\", # or Baysian or Random. \n",
    "        hyperparameter_ranges=hyperparameter_ranges,\n",
    "        base_tuning_job_name=\"object-detection-SSD-HPO\",\n",
    "        max_jobs=20,  # how many searches (training jobs) we will have in the parameter space\n",
    "        max_parallel_jobs=1  # how many searches (training jobs) will happen in parallel\n",
    "    )\n",
    "    \n",
    "    tuner.fit(\n",
    "        { \"train\": train_channel, \"validation\": validation_channel },\n",
    "        include_cls_metadata=False\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Note:</b> While you are waiting for the training to finish, you may start exploring the 2nd notebook: <b>2.Train-augmented-object-detection-model.ipynb</b>, which performs augmentations on the initial small dataset, in order to increase its size.</div>\n",
    "\n",
    "If you ever lose the notebook state e.g. due to a kernel restart or crash, you can **attach()** you estimator/tuner to a previous training/tuning job as follows (uncomment and run). There is no need to retrain because the results are all stored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Examples to attach to a previous training run:\n",
    "\n",
    "#estimator.attach(\"SSD-HPO-220924-1158-003-ba8e84f9\")  # change to the name of the training job\n",
    "\n",
    "#tuner.attach(\"SSD-HPO-220924-1158\")  # change to the name of the HPO job\n",
    "\n",
    "#WITH_HPO=?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once training finishes, we can explore the training logs and see how performance on the validation set changed over time through the training epochs.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.ticker as ticker\n",
    "\n",
    "%matplotlib inline\n",
    "plt.style.use('seaborn')\n",
    "\n",
    "client = boto3.client(\"logs\")\n",
    "BASE_LOG_NAME = \"/aws/sagemaker/TrainingJobs\"\n",
    "\n",
    "def plot_object_detection_log(model, title):\n",
    "    # retrieve from the training job logs the mAP across epochs and plot it\n",
    "    logs = client.describe_log_streams(\n",
    "        logGroupName=BASE_LOG_NAME, logStreamNamePrefix=model._current_job_name\n",
    "    )\n",
    "    cw_log = client.get_log_events(\n",
    "        logGroupName=BASE_LOG_NAME, logStreamName=logs[\"logStreams\"][0][\"logStreamName\"]\n",
    "    )\n",
    "\n",
    "    mAP_accs = []\n",
    "    for e in cw_log[\"events\"]:\n",
    "        msg = e[\"message\"]\n",
    "        if \"validation mAP <score>=\" in msg:\n",
    "            num_start = msg.find(\"(\")\n",
    "            num_end = msg.find(\")\")\n",
    "            mAP = msg[num_start + 1 : num_end]\n",
    "            mAP_accs.append(float(mAP))\n",
    "\n",
    "    print(title)\n",
    "    print(\"Maximum mAP: %f \" % max(mAP_accs))\n",
    "\n",
    "    fig, ax = plt.subplots()\n",
    "    plt.xlabel(\"Epochs\")\n",
    "    plt.ylabel(\"Mean Avg Precision (mAP)\")\n",
    "    plt.title(\"Validation performance per training epoch\")\n",
    "    (val_plot,) = ax.plot(range(len(mAP_accs)), mAP_accs, label=\"mAP\")\n",
    "    plt.legend(handles=[val_plot])\n",
    "    ax.yaxis.set_ticks(np.arange(0.0, 1.05, 0.1))\n",
    "    ax.yaxis.set_major_formatter(ticker.FormatStrFormatter(\"%0.2f\"))\n",
    "    plt.show()\n",
    " \n",
    "    \n",
    "if WITH_HPO is True: \n",
    "    estimator = tuner.best_estimator()\n",
    "    \n",
    "plot_object_detection_log(estimator, \"mAP tracking for job: \" + estimator._current_job_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5: Deploy the model\n",
    "\n",
    "Once the model is trained, SageMaker supports many different [deployment options](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html). In this example we'll deploy our trained model to a **real-time endpoint**, in order to have the lowest possible latency in our predictions. \n",
    "\n",
    "You can think of an endpoint as a dedicated web-server, making accessible our trained model's predictions through a REST API. Since our endpoints won't be handling any significant traffic volumes, we provision a single non-accelerated instance.\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Notice:</b> The following deployment of the model to a realtime endpoint will take approximately <b>5-7 minutes</b>.\n",
    "</div>\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Attention:</b> Please copy the <b>name of the deployed endpoint</b> (in the output of the following cell). You will need to paste it in the application. This will allow the application to route camera images to your endpoint for real-time inference.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "if WITH_HPO is True:\n",
    "    if (predictor_hpo):  # in case HPO was selected\n",
    "        try:\n",
    "            predictor_hpo.delete_endpoint()\n",
    "            print(\"Deleted previous HPO endpoint\")\n",
    "        except:\n",
    "            print(\"Couldn't delete previous HPO endpoint\")\n",
    "    print(\"Deploying HPO model...\")\n",
    "    predictor_hpo = tuner.deploy(\n",
    "        initial_instance_count=1,\n",
    "        instance_type=\"ml.c5.large\",\n",
    "        # wait=False,\n",
    "    )\n",
    "else:\n",
    "    if (predictor_std):  # in case simple training was selected\n",
    "        try:\n",
    "            predictor_std.delete_endpoint()\n",
    "            print(\"Deleted previous non-HPO endpoint\")\n",
    "        except:\n",
    "            print(\"Couldn't delete previous non-HPO endpoint\")\n",
    "    print(\"Deploying standard (non-HPO) model...\")\n",
    "    predictor_std = estimator.deploy(\n",
    "        initial_instance_count=1,  # number of instances for the endpoint\n",
    "        instance_type=\"ml.c5.large\",  # type of instance to be used for the endpoint\n",
    "        # wait=False,\n",
    "    )\n",
    "\n",
    "predictor = predictor_hpo if WITH_HPO else predictor_std\n",
    "print('Name of the deployed endpoint:',predictor.endpoint_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Attention:</b> Please copy the <b>name of the deployed endpoint</b> (in the output of the previous cell). You will need to paste it in the application. This will allow the application to route camera images to your endpoint for real-time inference.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 6: Run inference on test images\n",
    "\n",
    "Now that we have one object detection model deployed, we can send some test images and see how it performs!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Test on individual images\n",
    "\n",
    "The `predict()` function used here, provided in the `util` folder, includes code that allows us to send image files to our deployed endpoint and receive back the detection results. \n",
    "\n",
    "The built-in Object Detection algorithm doesn't estimate an optimal confidence threshold for us. Instead, it returns **all the detections**, irrespective of their confidence score. The `predict()` function includes the confidence threshold parameter `thresh`. Any detection with confidence score higher than the `thresh` parameter, will be visualized. The function also returns a Pandas DataFrame with all the detected bounding boxes for the given detection threshold. \n",
    "\n",
    "For visualization, the `predict()` function calls the `visualize_detection()` function, which is uses Matplotlib to plot the provided detection boxes over the image. You can test different filenames from the test folder, as well as different `thresh` values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "from util.util import predict\n",
    "\n",
    "sagemaker_runtime = boto3.client(service_name=\"runtime.sagemaker\")\n",
    "endpoint_name = predictor.endpoint_name\n",
    "print('Using endpoint:',endpoint_name)\n",
    "\n",
    "df_results = predict(\n",
    "    # change this to other test images to see more results\n",
    "    filename=f'{LOCAL_DATASET_FOLDER}/dataset-test/IMG_1599.jpg',  \n",
    "    runtime=sagemaker_runtime,\n",
    "    class_names = CLASS_NAMES,\n",
    "    endpoint_name=endpoint_name, \n",
    "    thresh=0.2, \n",
    "    visualize=True\n",
    ")\n",
    "\n",
    "df_results\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Compare multiple detection thresholds\n",
    "\n",
    "You can even generate side by side comparison of the results of multiple detection thresholds, by providing a **list of thresholds**, instead of a single number. In the following example, we depict the results for 3 different detection threshold values `[0.2, 0.4, 0.6]`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "from util.util import predict\n",
    "\n",
    "sagemaker_runtime = boto3.client(service_name=\"runtime.sagemaker\")\n",
    "endpoint_name = predictor.endpoint_name\n",
    "print('Using endpoint:',endpoint_name)\n",
    "\n",
    "df_results = predict(\n",
    "    # change this to other test images to see more results\n",
    "    filename=f'{LOCAL_DATASET_FOLDER}/dataset-test/IMG_1599.jpg',  \n",
    "    runtime=sagemaker_runtime,\n",
    "    class_names = CLASS_NAMES,\n",
    "    endpoint_name=endpoint_name, \n",
    "    thresh=[0.2,0.4,0.6],  # a list of thresholds to be visualized\n",
    "    visualize=True\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Test on the whole test set\n",
    "\n",
    "We can now call the `evaluate_testset()` helper function, which will test the deployed model against the **whole test set** and generate a performance report, for each class. The helper function relies on two important parameters. **Intersection Over Union (IOU)** and the model's confidence threshold. IOU is a metric that describes the degree of overlap between two bounding boxes, and it is needed in order to know when the output of the model coincides with the ground truth bounding box. Check this page for a more [detailed description on IOU](https://en.wikipedia.org/wiki/Jaccard_index)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "from util.util import evaluate_testset\n",
    "\n",
    "df_class_performance = evaluate_testset(\n",
    "    runtime=sagemaker_runtime, \n",
    "    endpoint_name=endpoint_name,\n",
    "    class_names = CLASS_NAMES,\n",
    "    testset_folder = f'{LOCAL_DATASET_FOLDER}/dataset-test', \n",
    "    test_manifest_file = f'{LOCAL_DATASET_FOLDER}/dataset-test/test-updated.manifest', \n",
    "    thr_iou = 0.5,  # Intersection Over Union threshold\n",
    "    thr_conf = 0.2  # Confidence Threshold\n",
    ")\n",
    "\n",
    "df_class_performance\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a quick explanation for the performance statistics. Check this page for a more [detailed description](https://en.wikipedia.org/wiki/Precision_and_recall).\n",
    "\n",
    "| Abbreviation | Full name | Explanation | Range | Intuition |\n",
    "| --- | --- | --- | --- | --- | \n",
    "| `TP` | True Positives | How many times the model has correctly detected an actual object | Depends on test set | Higher is better |\n",
    "| `FP` | False Positives | How many times the model has mistakenly detected something that was not an actual object | Depends on test set  | Lower is better |\n",
    "| `FN` | False Negatives | How many times the model did not detect (missed) an actual object | Depends on test set | Lower is better |\n",
    "| `TN` | True Negatives | (**Does not apply in object detection!**) How many times the model correctly did not detect something that was not an actual object | N/A | N/A |\n",
    "| `PR` | Precision | What percentage of all the model's detections were indeed actual objects. | [0,1] | Higher is better |\n",
    "| `RE` | Recall | What percentage of all the actual available objects were detected by the model | [0,1] | Higher is better |\n",
    "| `F1` | F1 score | A balanced combination (harmonic mean) of Precision and Recall.  | [0,1] | Higher is better |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot a graph of the results \n",
    "\n",
    "plt.figure()\n",
    "df_class_performance.plot.bar(x='CLASS')\n",
    "plt.title('Model performance on the Testset')\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Step 7: Clean up\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Note:</b> Don't run this clean up code until you have finished the whole workshop! In fact, in an AWS-hosted event using AWS-owned accounts, you don't have to clean up at all. :)\n",
    "</div>\n",
    "\n",
    "Although training instances are ephemeral, the resources we allocated for real-time endpoints need to be cleaned up to avoid ongoing charges. The code below will delete the *most recently deployed* endpoint for the HPO and non-HPO configurations, but note that if you deployed either more than once, you might end up with extra endpoints. To be safe, it's best to still check through the SageMaker console for any left-over resources when cleaning up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# uncomment the following code in order to delete the created resources\n",
    "\n",
    "# if (predictor_hpo):\n",
    "#     print(\"Deleting HPO-optimized predictor endpoint\")\n",
    "#     predictor_hpo.delete_endpoint()\n",
    "# if (predictor_std):\n",
    "#     print(\"Deleting standard (non-HPO) predictor endpoint\")\n",
    "#     predictor_std.delete_endpoint()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "## Review\n",
    "\n",
    "In this notebook we used Amazon SageMaker to train a custom computer vision object detection model using the built-in Object Detection algorithm. We particularly used a **Tranfer Learning** technique, which allowed us to leverage pre-trained weights from a larger generic dataset and use a smaller specialized training dataset, relevant to our use case. Once you have a model, you can return to the [next page](https://catalog.workshops.aws/cv-retail/en-US/test-model) of the workshop instructions to test your model on a live camera feed.  \n"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3.6.9 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}