{ "cells": [ { "cell_type": "markdown", "id": "08010a53-1695-4da2-aa66-5e9e91c710d0", "metadata": {}, "source": [ "# Video Object Detection -- Rekognition Custom Labels\n", "This sample notebook shows demonstrates how to detect custom labels in a video with Amazon Rekognition Custom Labels and draw their corresponding bounding boxes." ] }, { "cell_type": "markdown", "id": "a1ad8d05-8a62-4cfe-b737-e4b53bceafdb", "metadata": {}, "source": [ "### Import and install necessary packages\n", "Let's begin by importing and installing all the necessary packages we need to make the notebook run." ] }, { "cell_type": "code", "execution_count": null, "id": "ffcf8054-d084-4283-bd00-7325a40b1fd3", "metadata": {}, "outputs": [], "source": [ "# Install the first time you execute the notebook\n", "!apt-get -qq update\n", "!apt-get -qq install ffmpeg -y \n", "!pip install --quiet opencv-python-headless" ] }, { "cell_type": "code", "execution_count": null, "id": "9203c3ac-6dcd-4554-ad8b-50ec0e5df743", "metadata": {}, "outputs": [], "source": [ "import cv2\n", "import boto3\n", "import os\n", "import glob\n", "import time\n", "import queue\n", "import shutil\n", "from IPython.display import Video\n", "from multiprocessing import Lock, Process, Queue, current_process\n", "rekognition = boto3.client('rekognition')" ] }, { "cell_type": "markdown", "id": "d4862501-429e-440f-873b-d4e352960909", "metadata": {}, "source": [ "To be able to use your Amazon Rekognition Custom Labels running model, insert the arn of the project below, which you will locate in the Custom Labels console." ] }, { "cell_type": "code", "execution_count": null, "id": "e3af6fc2-ad74-469d-b050-497b84173b12", "metadata": {}, "outputs": [], "source": [ "projectVersionArn = \"***\" ## INSERT THE ARN OF YOUR CUSTOM LABELS PROJECT" ] }, { "cell_type": "markdown", "id": "26d6a21a-c157-4f5e-8315-10fe60c12a04", "metadata": {}, "source": [ "### Helper Functions\n", "Here are a couple of helper functions we are going to use to process our video frames and detect and draw the bounding boxes." ] }, { "cell_type": "code", "execution_count": null, "id": "53c96d21-b311-4caa-8bde-ad9e35484336", "metadata": {}, "outputs": [], "source": [ "os.mkdir(\"input_video\")\n", "os.mkdir(\"output_video\")\n", "\n", "def chunks(lst, n):\n", " for i in range(0, len(lst), n):\n", " yield lst[i:i + n]\n", " \n", "def transform_bounding(frame,box):\n", " imgWidth, imgHeight = frame\n", " left = int(imgWidth * box['Left'])\n", " top = int(imgHeight * box['Top'])\n", " right = left + int(imgWidth * box['Width'])\n", " bottom = top + int(imgHeight * box['Height'])\n", " return left,top,right,bottom\n", "\n", "def process_frames(frames_list):\n", " for f in frames_list:\n", " frame = cv2.imread(f)\n", " image_bytes = cv2.imencode('.png', frame)[1].tobytes()\n", " response = rekognition.detect_custom_labels(\n", " Image={'Bytes': image_bytes},\n", " ProjectVersionArn = projectVersionArn\n", " )\n", " if (len(response[\"CustomLabels\"]) > 0):\n", " for elabel in response[\"CustomLabels\"]:\n", " if int(elabel[\"Confidence\"]) > 50:\n", " left,top,right,bottom = transform_bounding(size ,elabel[\"Geometry\"][\"BoundingBox\"])\n", " label = elabel[\"Name\"]\n", " conf = elabel[\"Confidence\"]\n", "\n", " imgWidth, imgHeight= size\n", " thick = int((imgHeight + imgWidth) // 500)\n", "\n", " color = (0,255,0)\n", " cv2.rectangle(frame,(left, top), (right, bottom), color, thick)\n", " cv2.putText(frame, label+\":\"+str(conf)[0:4], (left, top - 12), 0, 1e-3 * imgHeight, color, thick//1) \n", " cv2.imwrite(f,frame) \n", " else:\n", " cv2.imwrite(f,frame)\n", " else:\n", " cv2.imwrite(f,frame)\n", " \n", "def detect_labels(frames_queue):\n", " while True:\n", " try:\n", " task = frames_queue.get_nowait()\n", " except queue.Empty:\n", " print(\"Queue Empty\")\n", " break\n", " else:\n", " process_frames(task)\n", " return True\n", "\n", "def get_video_info(video):\n", " cap = cv2.VideoCapture(original_video)\n", " fps = cap.get(cv2.CAP_PROP_FPS)\n", " size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),\n", " int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))\n", " cap.release()\n", " return fps, size" ] }, { "cell_type": "markdown", "id": "a729cbe6-87fd-43dd-8392-394d23755b4c", "metadata": {}, "source": [ "### Input/Output Configuration\n", "Upload a video into the \"input_video\" folder to be processed. Next, specify the name of the file below." ] }, { "cell_type": "code", "execution_count": null, "id": "e9691f93-0e6d-40ff-bfa6-e8d78d04e7f2", "metadata": {}, "outputs": [], "source": [ "#---------------------------------------------------------------------------\n", "# INSERT THE NAME OF THE INPUT VIDEO LOCATED IN THE INPUT_VIDEO FOLDER\n", "original_video_name = \"***.mp4\" # eg. \"original.mp4\"\n", "#---------------------------------------------------------------------------" ] }, { "cell_type": "code", "execution_count": null, "id": "2cf98985-8830-4bda-b4e5-c229113bcf3b", "metadata": {}, "outputs": [], "source": [ "original_video = \"input_video/{}\".format(original_video_name)\n", "frames_folder = \"frames-{}\".format(original_video_name.split('.')[0])\n", "output_video = \"output_video/{}-labeled.mp4\".format(original_video_name.split('.')[0])\n", "fps, size = get_video_info(original_video)" ] }, { "cell_type": "markdown", "id": "68878324-d3bb-40e9-b070-e2322bcd57e7", "metadata": {}, "source": [ "Review the video you have chosen to process." ] }, { "cell_type": "code", "execution_count": null, "id": "b4551872-699c-41cc-b253-ff4bd7d8aa59", "metadata": {}, "outputs": [], "source": [ "Video(original_video)" ] }, { "cell_type": "markdown", "id": "f2bea1c1-fc43-4464-888f-ef46a6dff747", "metadata": {}, "source": [ "### Split the video into frames " ] }, { "cell_type": "markdown", "id": "e4b219ed-6b6b-480f-8a13-df9c0ed1046c", "metadata": {}, "source": [ "Now we are going to split our input video into frames, we'll use FFMPEG for this task and save the frames into a frames folder." ] }, { "cell_type": "code", "execution_count": null, "id": "74b6ea60-e284-4c43-885b-7e7ccd41961a", "metadata": {}, "outputs": [], "source": [ "print(\"Splitting frames...\")\n", "os.mkdir(frames_folder)\n", "!ffmpeg -hide_banner -loglevel error -i {original_video} {frames_folder}/frame-%03d.png\n", "print(\"Splitting frames... -- Complete!\")" ] }, { "cell_type": "markdown", "id": "f28ba6e8-c365-4d51-9a68-1ec31d464288", "metadata": {}, "source": [ "### Multiprocessing Configuration" ] }, { "cell_type": "markdown", "id": "aa4a6a3e-f9ea-4b99-837a-40d0dbf6ea0e", "metadata": {}, "source": [ "Now we have our video split into frames let's move them into a queue divided in X chunks." ] }, { "cell_type": "code", "execution_count": null, "id": "ca1d9707-1070-43da-a8ae-bf8669b4bdf6", "metadata": {}, "outputs": [], "source": [ "files_list = glob.glob(\"{}/*\".format(frames_folder))\n", "n = 20 #Frames divided into chunks of n\n", "number_of_chunks = int(len(files_list)/n)+1\n", "split_list = list(chunks(files_list, number_of_chunks))\n", "\n", "frames_queue = Queue()\n", "for chunk in split_list:\n", " frames_queue.put(chunk)\n", "\n", "number_of_processors = 5 #Number of subprocesses\n", "processes = []\n", "\n", "print(\"Size of queue:\",frames_queue.qsize())" ] }, { "cell_type": "markdown", "id": "4ecb125c-f177-438b-be1e-dab30d666457", "metadata": {}, "source": [ "### Get bounding boxes for frames and overwrite the image file" ] }, { "cell_type": "markdown", "id": "18f148ba-b937-4804-a70a-a47258fcf4ac", "metadata": {}, "source": [ "Let's iterate over the queue of chunks (using multiprocessing) to call Amazon Rekognition Custom Labels to detect objects in our frames." ] }, { "cell_type": "code", "execution_count": null, "id": "cf266622-aa64-49e9-a676-01946da87e9e", "metadata": {}, "outputs": [], "source": [ "print(\"Detecting Labels...\")\n", "for w in range(number_of_processors):\n", " p = Process(target=detect_labels, args=(frames_queue,))\n", " processes.append(p)\n", " p.start()\n", "for p in processes:\n", " p.join()\n", " p.kill()\n", "print(\"Detecting Labels... -- Complete!\")" ] }, { "cell_type": "markdown", "id": "b831b58d-90e8-42c5-8779-c7fdd9e2538c", "metadata": {}, "source": [ "### Create the labeled video from frames" ] }, { "cell_type": "markdown", "id": "e411b6e3-11bb-4ae1-97d4-063995823d52", "metadata": {}, "source": [ "Once we have finished detecting objects and drawing the bounding boxes over the frames, we can stich the video back together." ] }, { "cell_type": "code", "execution_count": null, "id": "ce562cc2-6a26-4ce3-8421-61b678c5a98a", "metadata": {}, "outputs": [], "source": [ "print(\"Creating output video...\")\n", "!ffmpeg -hide_banner -loglevel error -f image2 -r {fps} -i {frames_folder}/frame-%03d.png -vcodec libx264 -crf 18 -pix_fmt yuv420p {output_video} -y\n", "print(\"Creating output video... -- Complete!\")\n", "shutil.rmtree(frames_folder)" ] }, { "cell_type": "markdown", "id": "0650c183-3648-40f3-b755-b8b3bcdf2ca0", "metadata": {}, "source": [ "Review your labeled video" ] }, { "cell_type": "code", "execution_count": null, "id": "5f4f2712-c409-4462-9ced-9b9460508d0b", "metadata": {}, "outputs": [], "source": [ "Video(output_video)" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }