# Video Segment Detection using Amazon Rekognition

***
This notebook provides a walkthrough of [Video Segment Detection APIs](https://docs.aws.amazon.com/rekognition/latest/dg/segments.html) in Amazon Rekognition.

Today, companies rely on large teams of trained staff to perform tasks such as the following.

* Finding where the end credits begin in a piece of content.
* Choosing the right spots to insert advertisements.
* Breaking up videos into smaller clips for better indexing.

Amazon Rekognition Video makes it easy to automate these operational media analysis tasks by providing fully managed, purpose-built APIs powered by Machine Learning (ML). By using the Amazon Rekognition Video segment APIs, you can easily analyze large volumes of videos and detect markers such as black frames or shot changes.
***

### Getting Started

In [None]:
# Initialise Notebook
import boto3
from IPython.display import HTML, Image as IImage, display
from PIL import Image, ImageDraw, ImageFont
import time
import os

In [None]:
# Current AWS Region. Use this to choose the corresponding S3 bucket with sample content

mySession = boto3.session.Session()
awsRegion = mySession.region_name

In [None]:
# Init clients
rekognition = boto3.client('rekognition')
s3 = boto3.client('s3')

# Set the name of our bucket
bucketName = "aws-rek-immersionday-" + awsRegion
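
`Session.region_name` is `None` when no default region is configured, and in that case the string concatenation above raises a `TypeError`. The following is a minimal sketch of a guard, assuming a hypothetical helper named `sample_bucket_name` and the same bucket prefix used in this notebook.

```python
def sample_bucket_name(region, prefix="aws-rek-immersionday-"):
    """Return the sample-content bucket name for a region.

    Hypothetical helper, not part of boto3. `region` is the value of
    boto3.session.Session().region_name, which may be None.
    """
    if not region:
        raise ValueError(
            "No AWS region configured; set AWS_DEFAULT_REGION or pass "
            "region_name when creating the session."
        )
    return prefix + region
```

It could be used as `bucketName = sample_bucket_name(awsRegion)`.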

## Shot Detection

***
A shot is a series of interrelated consecutive pictures taken contiguously by a single camera and representing a continuous action in time and space. With Amazon Rekognition Video, you can detect the start, end, and duration of each shot, as well as a count for all the shots in a piece of content.

Our video contains two different shots. Amazon Rekognition detects the change between them and reports when each shot starts and finishes.
***

#### Let's take a look at the raw video

In [None]:
#Define the video that we want to process
videoName = "media/video-segment-detection/shots_video.mp4"
s3VideoUrl = s3.generate_presigned_url('get_object', Params={'Bucket': bucketName, 'Key': videoName})

#Create a video HTML 5 tag which can be rendered in our Jupyter notebook and display it.
videoTag = "<video controls='controls' width='640' height='360' name='Video' src='{0}'></video>".format(s3VideoUrl)
videoui = "<table><tr><td style='vertical-align: top'>{}</td></tr></table>".format(videoTag)
display(HTML(videoui))

#### Now we start the asynchronous job to detect shots

In [None]:
#Make the API Call to start shot detection
startSegmentDetection = rekognition.start_segment_detection(
    Video={
        'S3Object': {
            'Bucket': bucketName,
            'Name': videoName,
        },
    },
    SegmentTypes=['SHOT']
)

#Grab and print the ID of our job
segmentationJobId = startSegmentDetection['JobId']
display("Job Id: {0}".format(segmentationJobId))

#### And wait for the job to complete

In [None]:
#Grab the segment detection response
getSegmentDetection = rekognition.get_segment_detection(
    JobId=segmentationJobId
)

#Determine the state. If the job is still processing we will wait a bit and check again
while getSegmentDetection['JobStatus'] == 'IN_PROGRESS':
    time.sleep(5)
    print('.', end='')

    getSegmentDetection = rekognition.get_segment_detection(
        JobId=segmentationJobId
    )
    
#Once the job is no longer in progress we will proceed
display(getSegmentDetection['JobStatus'])
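
The loop above keeps polling for as long as the job is `IN_PROGRESS`, but it does not distinguish a `FAILED` job from a `SUCCEEDED` one and it has no upper time limit. A hedged sketch of a reusable wait helper follows. The function name `wait_for_segment_job` and the timeout default are assumptions made for illustration, and `client` is expected to behave like `boto3.client('rekognition')`.

```python
import time


def wait_for_segment_job(client, job_id, poll_seconds=5, timeout_seconds=900):
    """Poll get_segment_detection until the job leaves IN_PROGRESS.

    Returns the final response on SUCCEEDED, raises RuntimeError on FAILED,
    and raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_segment_detection(JobId=job_id)
        status = response['JobStatus']
        if status == 'SUCCEEDED':
            return response
        if status == 'FAILED':
            message = response.get('StatusMessage', 'no status message')
            raise RuntimeError("Segment detection failed: {0}".format(message))
        if time.monotonic() >= deadline:
            raise TimeoutError("Job {0} still running after {1}s".format(job_id, timeout_seconds))
        time.sleep(poll_seconds)
```

With the names used in this notebook, the call would look like `getSegmentDetection = wait_for_segment_job(rekognition, segmentationJobId)`.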

#### Now we will view and process the response from Amazon Rekognition

In [None]:
#Print the raw response
print(getSegmentDetection)
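
A single `get_segment_detection` response may contain only part of the results. When more are available, the response includes a `NextToken` that can be passed to the next call. The sketch below gathers all pages into one list. The helper name `collect_segments` and the `page_size` default are assumptions, and `client` is expected to behave like `boto3.client('rekognition')`.

```python
def collect_segments(client, job_id, page_size=100):
    """Return every segment for a finished job, following NextToken pages."""
    segments = []
    next_token = None
    while True:
        kwargs = {'JobId': job_id, 'MaxResults': page_size}
        if next_token:
            kwargs['NextToken'] = next_token
        page = client.get_segment_detection(**kwargs)
        segments.extend(page.get('Segments', []))
        # No NextToken in the response means this was the last page
        next_token = page.get('NextToken')
        if not next_token:
            return segments
```

For the short sample videos in this notebook a single page is likely to be enough, so this matters mainly for longer content.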

In [None]:
for shot in getSegmentDetection['Segments']:
    print(shot)
    
    #Find the start point of the shot
    frameStartValue = shot['StartTimestampMillis']
    #Divide by 1000 to convert from milliseconds to seconds
    frameStartValue = frameStartValue/1000.0
    
    #Find the end point of the shot
    frameEndValue = shot['EndTimestampMillis']
    #Divide by 1000 to convert from milliseconds to seconds
    frameEndValue = frameEndValue/1000.0
    
    #Create a video HTML 5 tag which can be rendered in our Jupyter notebook and display it.
    #This video tag will start on the first frame identified by the shot, and end on the last frame.
    videoTag = "<video width='640' controls loop height='360' name='Video' src='{0}{1}{2}{3}{4}'></video>".format(s3VideoUrl,'#t=',frameStartValue,',',frameEndValue)
    videoui = "<table><tr><td style='vertical-align: top'>{}</td></tr></table>".format(videoTag)
    display(HTML(videoui))
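
As noted earlier, each shot comes with a start, an end, and a duration, and the shots can be counted. A minimal sketch of a summary over the `Segments` list follows. The helper name `summarize_shots` and the output layout are assumptions made for illustration; the field names are those of the segment detection response.

```python
def summarize_shots(segments):
    """Return the shot count and per-shot timings in seconds."""
    rows = []
    for segment in segments:
        # Skip anything that is not a shot (for example technical cues)
        if segment.get('Type') != 'SHOT':
            continue
        rows.append({
            'index': segment.get('ShotSegment', {}).get('Index'),
            'start_s': segment['StartTimestampMillis'] / 1000.0,
            'end_s': segment['EndTimestampMillis'] / 1000.0,
            'duration_s': segment['DurationMillis'] / 1000.0,
        })
    return {'count': len(rows), 'shots': rows}
```

In this notebook it could be called as `summarize_shots(getSegmentDetection['Segments'])`.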

## Technical Cue Identification

***
We've gone ahead and added some technical cues to our previous video. These include a SMPTE color bar image, which is used for device calibration. It also includes a group of black frames, which are commonly included in content to mark where a break may be placed for something like commercial insertion. Finally, we've included some sample credits at the end.

These cues are all identified using the "Technical Cue" functionality of the segment detection APIs.
***

#### Let's take a look at the new raw video

In [None]:
#Define the video that we want to process
videoName = "media/video-segment-detection/technical_cues.mp4"
s3VideoUrl = s3.generate_presigned_url('get_object', Params={'Bucket': bucketName, 'Key': videoName})

#Create a video HTML 5 tag which can be rendered in our Jupyter notebook and display it.
videoTag = "<video controls='controls' width='640' height='360' name='Video' src='{0}'></video>".format(s3VideoUrl)
videoui = "<table><tr><td style='vertical-align: top'>{}</td></tr></table>".format(videoTag)
display(HTML(videoui))

#### Now we start the asynchronous job to detect technical cues

In [None]:
#Make the API Call to start segment detection for Technical Cues
startSegmentDetection = rekognition.start_segment_detection(
    Video={
        'S3Object': {
            'Bucket': bucketName,
            'Name': videoName,
        },
    },
    SegmentTypes=['TECHNICAL_CUE'] #This indicates we only want the technical cues right now
)

#Grab and print the ID of our job
segmentationJobId = startSegmentDetection['JobId']
display("Job Id: {0}".format(segmentationJobId))
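
`start_segment_detection` also accepts several segment types in one job, along with optional `Filters` that set a minimum confidence per type. The sketch below only builds the request arguments, so it can be inspected before any call is made. The helper name `build_segment_request` and the threshold of 80 are assumptions chosen for illustration.

```python
def build_segment_request(bucket, key,
                          segment_types=('TECHNICAL_CUE', 'SHOT'),
                          min_confidence=80.0):
    """Build keyword arguments for rekognition.start_segment_detection."""
    filters = {}
    if 'TECHNICAL_CUE' in segment_types:
        filters['TechnicalCueFilter'] = {'MinSegmentConfidence': min_confidence}
    if 'SHOT' in segment_types:
        filters['ShotFilter'] = {'MinSegmentConfidence': min_confidence}
    return {
        'Video': {'S3Object': {'Bucket': bucket, 'Name': key}},
        'SegmentTypes': list(segment_types),
        'Filters': filters,
    }
```

A call with the notebook's variables would then be `rekognition.start_segment_detection(**build_segment_request(bucketName, videoName))`.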

#### And wait for the job to complete

In [None]:
#Grab the segment detection response
getSegmentDetection = rekognition.get_segment_detection(
    JobId=segmentationJobId
)

#Determine the state. If the job is still processing we will wait a bit and check again
while getSegmentDetection['JobStatus'] == 'IN_PROGRESS':
    time.sleep(5)
    print('.', end='')

    getSegmentDetection = rekognition.get_segment_detection(
        JobId=segmentationJobId
    )

#Once the job is no longer in progress we will proceed
display(getSegmentDetection['JobStatus'])

#### Now we will view and process the response from Amazon Rekognition

In [None]:
#Print the raw response
print(getSegmentDetection)

In [None]:
#Now we're going to iterate through the Technical Cues one by one, and display a sample frame

for technicalCue in getSegmentDetection['Segments']:
    print(technicalCue)
    #Find the middle point of the technical cue
    frameExampleValue = (technicalCue['StartTimestampMillis'] + technicalCue['EndTimestampMillis'])/2
    #Divide by 1000 to convert from milliseconds to seconds
    frameExampleValue = frameExampleValue/1000.0
    print(frameExampleValue)
    #Create a video HTML 5 tag which can be rendered in our Jupyter notebook and display it.
    #This video tag seeks to the midpoint of the technical cue and has no playback controls (effectively just displaying a single key frame)
    videoTag = "<video width='640' height='360' name='Video' src='{0}{1}{2}'></video>".format(s3VideoUrl,'#t=',frameExampleValue)
    videoui = "<table><tr><td style='vertical-align: top'>{}</td></tr></table>".format(videoTag)
    display(HTML(videoui))
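
Each technical cue carries its kind in `TechnicalCueSegment['Type']`, for example `ColorBars`, `BlackFrames`, or `EndCredits` for the cues added to this video. A small sketch that groups cue time ranges by kind follows. The helper name `group_cues_by_type` and the output shape are assumptions made for illustration.

```python
from collections import defaultdict


def group_cues_by_type(segments):
    """Map each technical cue type to a list of (start_s, end_s) tuples."""
    grouped = defaultdict(list)
    for segment in segments:
        # Ignore shots and any other non-cue segments
        if segment.get('Type') != 'TECHNICAL_CUE':
            continue
        cue_type = segment.get('TechnicalCueSegment', {}).get('Type', 'Unknown')
        start_s = segment['StartTimestampMillis'] / 1000.0
        end_s = segment['EndTimestampMillis'] / 1000.0
        grouped[cue_type].append((start_s, end_s))
    return dict(grouped)
```

Calling `group_cues_by_type(getSegmentDetection['Segments'])` would make it straightforward to look up, say, where the end credits begin.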