{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Video Segment Detection using Amazon Rekognition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "This notebook provides a walkthrough of [Video Segment Detection APIs](https://docs.aws.amazon.com/rekognition/latest/dg/segments.html) in Amazon Rekognition.\n", "\n", "Today, companies use large teams of trained human workforces to perform tasks such as the following.\n", "\n", "* Finding where the end credits begin in a piece of content.\n", "* Choosing the right spots to insert advertisments.\n", "* Breaking up videos into smaller clips for better indexing.\n", "\n", "Amazon Rekognition Video makes it easy to automate these operational media analysis tasks by providing fully managed, purpose-built APIs powered by Machine Learning (ML). By using the Amazon Rekognition Video segment APIs, you can easily analyze large volumes of videos and detect markers such as black frames or shot changes.\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting Started" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialise Notebook\n", "import boto3\n", "from IPython.display import Image as IImage, display\n", "from IPython.display import HTML, display\n", "from PIL import Image, ImageDraw, ImageFont\n", "import time\n", "import os" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Curent AWS Region. Use this to choose corresponding S3 bucket with sample content\n", "\n", "mySession = boto3.session.Session()\n", "awsRegion = mySession.region_name" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Init clients\n", "rekognition = boto3.client('rekognition')\n", "s3 = boto3.client('s3')\n", "\n", "# Set the name of our bucket\n", "bucketName = \"aws-rek-immersionday-\" + awsRegion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Shot Detection\n", "\n", "***\n", "A shot is a series of interrelated consecutive pictures taken contiguously by a single camera and representing a continuous action in time and space. With Amazon Rekognition Video, you can detect the start, end, and duration of each shot, as well as a count for all the shots in a piece of content.\n", "\n", "Our video contains two different shots, and Amazon Rekognition detects the change in shot, and provides specific information about when the shots start and finish.\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's take a look at new raw video" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Define the video that we want to process\n", "videoName = \"media/video-segment-detection/shots_video.mp4\"\n", "s3VideoUrl = s3.generate_presigned_url('get_object', Params={'Bucket': bucketName, 'Key': videoName})\n", "\n", "#Create a video HTML 5 tag which can be rendered in our Jupyter notebook and display it.\n", "videoTag = \"\".format(s3VideoUrl)\n", "videoui = \"
{}
\".format(videoTag)\n", "display(HTML(videoui))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now we start the asynchronous job to detect technical cues" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Make the API Call to start shot detection\n", "startSegmentDetection = rekognition.start_segment_detection(\n", " Video={\n", " 'S3Object': {\n", " 'Bucket': bucketName,\n", " 'Name': videoName,\n", " },\n", " },\n", " SegmentTypes=['SHOT']\n", ")\n", "\n", "#Grab and print the ID of our job\n", "segmentationJobId = startSegmentDetection['JobId']\n", "display(\"Job Id: {0}\".format(startSegmentDetection))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### And wait for the job to complete" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Grab the segment detection response\n", "getSegmentDetection = rekognition.get_segment_detection(\n", " JobId=segmentationJobId\n", ")\n", "\n", "#Determine the state. If the job is still processing we will wait a bit and check again\n", "while(getSegmentDetection['JobStatus'] == 'IN_PROGRESS'):\n", " time.sleep(5)\n", " print('.', end='')\n", " \n", " getSegmentDetection = rekognition.get_segment_detection(\n", " JobId=segmentationJobId)\n", " \n", "#Once the job is no longer in progress we will proceed\n", "display(getSegmentDetection['JobStatus'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now we will view and process the response from Amazon Rekognition" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Print the raw response\n", "print(getSegmentDetection)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for technicalCue in getSegmentDetection['Segments']:\n", " print(technicalCue)\n", " \n", " #Find the start point of the scene\n", " frameStartValue = technicalCue['StartTimestampMillis']\n", " #Divide by 1000 to convert from milliseconds to seconds\n", " frameStartValue = frameStartValue/1000.0\n", " \n", " #Find the start point of the scene\n", " frameEndValue = technicalCue['EndTimestampMillis']\n", " #Divide by 1000 to convert from milliseconds to seconds\n", " frameEndValue = frameEndValue/1000.0\n", " \n", " #Create a video HTML 5 tag which can be rendered in our Jupyter notebook and display it.\n", " #This video tag will start on the first frame identified by the shot, and end on the last frame.\n", " videoTag = \"\".format(s3VideoUrl,'#t=',frameStartValue,',',frameEndValue)\n", " videoui = \"
{}
\".format(videoTag)\n", " display(HTML(videoui))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Technical Cue Identification\n", "\n", "***\n", "We've gone ahead and added some technical cues to our previous video. These include a SMPTE color bar image which is used for device callibration. It also includes a group of black frames which are commonly included in content to symbol where a break may be placed for something like commercial insertion. Finally, we've included some sample credits at the end.\n", "\n", "These cues are all identified using the \"Technical Cue\" functionality of the detect segment APIs\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's take a look at the new raw video" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Define the video that we want to process\n", "videoName = \"media/video-segment-detection/technical_cues.mp4\"\n", "s3VideoUrl = s3.generate_presigned_url('get_object', Params={'Bucket': bucketName, 'Key': videoName})\n", "\n", "#Create a video HTML 5 tag which can be rendered in our Jupyter notebook and display it.\n", "videoTag = \"\".format(s3VideoUrl)\n", "videoui = \"
{}
\".format(videoTag)\n", "display(HTML(videoui))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now we start the asynchronous job to detect technical cues" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Make the API Call to start segment detection for Technical Cues\n", "startSegmentDetection = rekognition.start_segment_detection(\n", " Video={\n", " 'S3Object': {\n", " 'Bucket': bucketName,\n", " 'Name': videoName,\n", " },\n", " },\n", " SegmentTypes=['TECHNICAL_CUE'] #This indicates we only want the technical cues right now\n", ")\n", "\n", "#Grab and print the ID of our job\n", "segmentationJobId = startSegmentDetection['JobId']\n", "display(\"Job Id: {0}\".format(startSegmentDetection))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### And wait for the job to complete" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Grab the segment detection response\n", "getSegmentDetection = rekognition.get_segment_detection(\n", " JobId=segmentationJobId\n", ")\n", "\n", "#Determine the state. If the job is still processing we will wait a bit and check again\n", "while(getSegmentDetection['JobStatus'] == 'IN_PROGRESS'):\n", " time.sleep(5)\n", " print('.', end='')\n", " \n", " getSegmentDetection = rekognition.get_segment_detection(\n", " JobId=segmentationJobId)\n", "\n", "#Once the job is no longer in progress we will proceed\n", "display(getSegmentDetection['JobStatus'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now we will view and process the response from Amazon Rekognition" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Print the raw response\n", "print(getSegmentDetection)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Now we're going to iterate through the Technical Cues one by one, and display a sample frame\n", "\n", "for technicalCue in getSegmentDetection['Segments']:\n", " print(technicalCue)\n", " #Find the middle point of the technical cue\n", " frameExampleValue = (technicalCue['StartTimestampMillis'] + technicalCue['EndTimestampMillis'])/2\n", " #Divide by 1000 to convert from milliseconds to seconds\n", " frameExampleValue = frameExampleValue/1000.0\n", " print(frameExampleValue)\n", " #Create a video HTML 5 tag which can be rendered in our Jupyter notebook and display it.\n", " #This video tag will display the first frame, and does not contain the ability to progress through the video (effectively just displaying a single key frame)\n", " videoTag = \"\".format(s3VideoUrl,'#t=',frameExampleValue)\n", " videoui = \"
{}
\".format(videoTag)\n", " display(HTML(videoui))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }