{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2bc39d14-7603-4588-8f35-e8c29320b90c",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Train and Test custom YOLOv5 on Amazon SageMaker Studio"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "369a9b33-2052-4c57-aa4f-b4b26e1c1d62",
   "metadata": {},
   "source": [
    "In this notebook we will train and test a custom YOLOv5 object detection CV model within Amazon SageMaker Studio. \n",
    "\n",
    "**Steps:**\n",
    "\n",
    "0. Initial configuration.\n",
    "1. Download a labeled dataset.\n",
    "3. Train the custom YOLOv5 model.\n",
    "4. Make predictions against the created model. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e59f9ec-43f8-4040-bebc-4423afdc4b04",
   "metadata": {},
   "source": [
    "| ⚠️ WARNING: For this notebook to work, make sure to select the following settings in your jupyter environment: |\n",
    "| -- |\n",
    "Image: \"PyTorch 1.10 Python 3.8 GPU Optimized\"\n",
    "Instance_type: \"ml.g4dn.xlarge\" (fast launch)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "356f3d22-a857-40b5-8cb5-3185dd353233",
   "metadata": {},
   "source": [
    "## 0. Initial Configuration"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cea75758-365c-41a4-b229-01bb7689848b",
   "metadata": {},
   "source": [
    "#### Download the YOLOv5 repository"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b786198-5500-4fbe-bc2b-b36bbf109125",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!git clone --quiet https://github.com/ultralytics/yolov5\n",
    "!pip install -qr yolov5/requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a22293fe-b5c2-42c0-906a-add1fbf406e0",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import boto3\n",
    "import glob\n",
    "s3_resource = boto3.resource('s3')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cddd1da6-e457-4e2b-acf3-d085643aa8c3",
   "metadata": {},
   "source": [
    "## 1. Download a labeled dataset with YOLOv5 expected format."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfefb767-e740-42e7-8b5b-5c30abeff17b",
   "metadata": {},
   "source": [
    "Before we train a custom YOLOv5 model, we need to have a labeled dataset. \n",
    "In the previous notebook \"0 - Label your dataset with Amazon SageMaker GroundTruth\" you will be able to label your own dataset and transform it into YOLOv5 expected format or use an example custom dataset. Once you have run through one of the two options you will have available the S3 dataset location and labels used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3856bee1-e3ed-4304-b7de-3ee4417430a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset_s3_uri = \"\"\n",
    "labels = []"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee81a053-6502-453a-805a-5e640df0733f",
   "metadata": {},
   "source": [
    "#### Download the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f2122a6-1c42-4ce7-b8d4-6e72a81475f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "def split_s3_path(s3_path):\n",
    "    path_parts=s3_path.replace(\"s3://\",\"\").split(\"/\")\n",
    "    bucket=path_parts.pop(0)\n",
    "    key=\"/\".join(path_parts)\n",
    "    return bucket, key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6794fb3-de08-4527-8a73-683175121493",
   "metadata": {},
   "outputs": [],
   "source": [
    "bucket,dataset_name = split_s3_path(dataset_s3_uri)\n",
    "bucket,dataset_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e10c3229-ac6d-4cfc-9588-38b4431b92c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def download_dataset(bucket_name, folder):\n",
    "    bucket = s3_resource.Bucket(bucket_name) \n",
    "    for obj in bucket.objects.filter(Prefix = folder):\n",
    "        if not os.path.exists(os.path.dirname(obj.key)):\n",
    "            os.makedirs(os.path.dirname(obj.key))\n",
    "        bucket.download_file(obj.key, obj.key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32a60468-0295-42ed-93d0-2efef95a016c",
   "metadata": {},
   "outputs": [],
   "source": [
    "download_dataset(bucket, dataset_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6edd94f5-f78a-44a7-8c7d-a23cd3cfe90e",
   "metadata": {},
   "source": [
    "#### Lets explore our dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44534876-38eb-4c01-a0ed-1447911ae23e",
   "metadata": {},
   "outputs": [],
   "source": [
    "for filename in glob.iglob(dataset_name + '**/**', recursive=True):\n",
    "     print(filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d388275f-d930-44ac-be59-cd9e1312c7a4",
   "metadata": {},
   "source": [
    "#### Now let's add these data sources to the data library in the yolov5 folder for our model to train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03755c69-4523-4d50-9cb2-e6a0b4ea2820",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"yolov5/data/custom-model.yaml\", 'w') as target:\n",
    "    target.write(\"path: ../{}\\n\".format(dataset_name))\n",
    "    target.write(\"train: images/train\\n\")\n",
    "    target.write(\"val: images/validation\\n\")\n",
    "    target.write(\"names:\\n\")\n",
    "    for i, label in enumerate(labels):\n",
    "        target.write(\"  {}: {}\\n\".format(i, label))\n",
    "        \n",
    "with open('yolov5/data/custom-model.yaml') as file:\n",
    "    lines = file.readlines()\n",
    "    for line in lines:\n",
    "        print(line)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be7b8fbd-ccdb-4440-af14-ea428bcc1cce",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 3. Train the custom YOLOv5 model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee2f8907-0640-4aa0-a4b0-80b061b43659",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python yolov5/train.py --workers 4 --device 0 --img 640 --batch 8 --epochs 10 --data yolov5/data/custom-model.yaml --weights yolov5s.pt --cache"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27d5835f-e975-408f-97f1-3c47f44ba56e",
   "metadata": {},
   "source": [
    "## 4. Make inferences with the created model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a3dfee9-e2fc-4058-a5aa-2ffe186d4a1f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python yolov5/detect.py --weights yolov5/runs/train/exp/weights/best.pt --img 640 --conf 0.5 --source \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "660fdab9-e4d8-4a76-8e0e-4ca98fbecd4c",
   "metadata": {},
   "source": [
    "| ⚠️ WARNING: Remember to shutdown the instance once finalized with this notebook to prevent unnecesary charges. Head to running Terminals and Kernels tab and shutdown the running instance. |\n",
    "| -- |"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.g4dn.xlarge",
  "kernelspec": {
   "display_name": "Python 3 (PyTorch 1.10 Python 3.8 GPU Optimized)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/pytorch-1.10-gpu-py38"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}