# Transfer learning and action inference on input video segments
In this notebook, we demonstrate activity detection on a video segment with machine learning. We use the MXNet framework in script mode with the GluonCV toolkit, starting from a pre-trained I3D model (https://arxiv.org/abs/1705.07750) with a ResNet50 backbone (https://arxiv.org/abs/1512.03385) trained on the Kinetics400 dataset (https://arxiv.org/abs/1705.06950).
We then apply transfer learning with our own custom action dataset.

(In this case, we select 101 classes from the UCF101 dataset: https://www.crcv.ucf.edu/research/data-sets/ucf101/)

1) We will fine-tune the pre-trained model with this custom dataset to learn the typical video patterns belonging to these 101 action classes.

2) We will then deploy this model and host it on a SageMaker endpoint.

3) Finally, we will make an inference request for a test video.

Install and import the required gluoncv library 

In [None]:
!pip install gluoncv

In [None]:
import boto3, re, os
import numpy as np
import uuid

import mxnet as mx
from mxnet import gluon, nd, image
from mxnet.gluon.data.vision import transforms

from gluoncv.data.transforms import video
from gluoncv import utils
from gluoncv.model_zoo import get_model
from gluoncv.utils import export_block

import sagemaker
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet



In [None]:
sagemaker_session = sagemaker.Session()
role = get_execution_role()

Check that the installed MXNet framework version is 1.6.0.

In [None]:
mx.__version__

## Data preparation

Load the UCF101 dataset as described in the gluoncv guide here https://gluon-cv.mxnet.io/build/examples_datasets/ucf101.html#sphx-glr-build-examples-datasets-ucf101-py

Note: We download only a tiny fraction of the UCF101 dataset here. To download the entire dataset, modify the script flag below. The full dataset is 6.5 GB, so you will also need to increase the default volume size attached to the notebook instance.

In [None]:
%%capture
!pip install rarfile --user
!pip install Cython --user
!pip install mmcv --user
!pip install torch --user
!python data-prep-code/ucf101.py --tiny_dataset

1) Raw frames have been extracted from the videos in a folder for each video. 

2) A settings file has been generated. There are three items in each line, separated by spaces. The first item is the path to your training videos, e.g., video_001. It should be a folder containing the frames of video_001.mp4. The second item is the number of frames in each video, e.g., 200. The third item is the label of the videos, e.g., 0.
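
The three-item line format described above can be read back with a few lines of plain Python. This is a minimal sketch for illustration only: the `VideoRecord` type and `parse_settings` function are hypothetical names, not part of GluonCV, and the sample lines are made up.

```python
from typing import List, NamedTuple


class VideoRecord(NamedTuple):
    frame_dir: str   # folder holding the extracted frames of one video
    num_frames: int  # number of frames in that folder
    label: int       # integer class index


def parse_settings(text: str) -> List[VideoRecord]:
    """Parse 'path num_frames label' lines into records (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so that a path containing spaces still parses.
        frame_dir, num_frames, label = line.rsplit(maxsplit=2)
        records.append(VideoRecord(frame_dir, int(num_frames), int(label)))
    return records


sample = "video_001 200 0\nvideo_002 150 3\n"
records = parse_settings(sample)
print(records[0])
```

A quick check like this on the generated list can catch malformed lines before the upload to S3.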

Upload the raw frames and the settings list to S3 (this can take up to 15 minutes).

In [None]:
import time
print(time.time())
sagemaker_session.upload_data(path='datasets/ucf101/rawframes/', key_prefix='data/ucf101-tiny/rawframes')
print(time.time())
sagemaker_session.upload_data(path='datasets/ucf101/ucfTrainTestlist/', key_prefix='data/ucf101-tiny/ucfTrainTestlist')
print(time.time())

In [None]:
bucket_name=sagemaker_session.default_bucket()
inputs = 's3://' + bucket_name + '/data/ucf101-tiny'

output_path = 'i3d_transfer_learning/output/'
code_location = 'i3d_transfer_learning/code/'

## Transfer Learning 
Transfer learning focuses on storing knowledge gained while solving one task and applying it to a different but related task. 

I3D (Inflated 3D Networks) is a widely adopted 3D video classification network. It uses 3D convolution to learn spatiotemporal information directly from videos. I3D was proposed to improve on C3D (Convolutional 3D Networks) by inflating 2D models into 3D. This lets us reuse the architecture of 2D models (e.g., ResNet, Inception) and also bootstrap the 3D model weights from 2D pre-trained models. As a result, training 3D networks for video classification becomes feasible and gives much better results.

In this example, we use an Inflated 3D model (I3D) with a ResNet50 backbone trained on the Kinetics400 dataset.

Dataset size is a big factor in the performance of deep learning models. Kinetics400 has 306,245 short trimmed videos from 400 action categories. However, in another domain we usually do not have that much labeled data, and training a deep learning model on a small dataset may lead to severe overfitting.

Transfer learning is a technique that addresses this problem. The idea is simple: start training from a pre-trained model instead of starting from scratch. For simple fine-tuning, just replace the last classification (dense) layer with one whose output size equals the number of classes in the new dataset. We can then obtain good models on our own data without large annotated datasets and with fewer compute resources for training.
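
To make the "replace the last layer" idea concrete, here is a framework-free sketch. It is not the GluonCV implementation: the parameter dictionary, the `replace_head` function and the layer names are hypothetical, and the tiny weight lists stand in for real tensors. It only shows that the backbone weights are reused while the classification head is re-initialised for the new number of classes.

```python
import random


def replace_head(params, num_classes, head_name="output", seed=0):
    """Return a copy of params whose classification head is re-initialised.

    Hypothetical helper: 'params' maps names to nested lists of weights.
    """
    rng = random.Random(seed)
    feature_dim = len(params[head_name + "_weight"][0])
    new_params = dict(params)  # shallow copy: backbone weights are reused as-is
    new_params[head_name + "_weight"] = [
        [rng.gauss(0.0, 0.01) for _ in range(feature_dim)]
        for _ in range(num_classes)
    ]
    new_params[head_name + "_bias"] = [0.0] * num_classes
    return new_params


# Toy "pre-trained" model: a 2-feature backbone and a 400-class head (Kinetics400).
pretrained = {
    "backbone_weight": [[0.5, -0.2], [0.1, 0.3]],
    "output_weight": [[0.0, 0.0] for _ in range(400)],
    "output_bias": [0.0] * 400,
}
finetune = replace_head(pretrained, num_classes=101)  # 101 UCF101 classes
print(len(finetune["output_weight"]), len(finetune["output_weight"][0]))
```

Only the new head starts from random values, which is why fine-tuning typically needs far fewer labeled examples than training the whole network from scratch.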

Review the following training script, which is the entry point script for the MXNet estimator. The script executes training with the following steps:

1) Data transformation 

 The transformation function does three things: center crop the image to 224x224 in size, transpose it to num_channels*num_frames*height*width, and normalize with mean and standard deviation calculated across all ImageNet images.

2) Data loader

Use the general gluoncv dataloader VideoClsCustom to load the data with num_frames = 32 as the length. For another dataset, you can just replace the value of root and setting to your data directory and your prepared text file.

3) Model training 

a) Load the pre-trained model.

b) Load input hyperparameters and number of action classes.

c) Re-define the output layer for the new task. In GluonCV, you can get your customized model with one line of code.

d) Define optimizer, loss and metric. Train the network for the new dataset.
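
The data transformation in step 1 can be sketched without any framework. The code below is an illustration under assumptions, not the GluonCV transform itself: the helper names are hypothetical, the mean and standard deviation are the commonly used ImageNet channel statistics, and it is assumed that pixel values are scaled to [0, 1] before normalisation.

```python
# Commonly used ImageNet channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def center_crop_box(height, width, size=224):
    """Return (top, left, bottom, right) of a centred size x size crop."""
    if height < size or width < size:
        raise ValueError("frame is smaller than the crop size")
    top = (height - size) // 2
    left = (width - size) // 2
    return top, left, top + size, left + size


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardise each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def clip_shape(num_frames, size=224, channels=3):
    """Shape of one clip after transposing to channels x frames x height x width."""
    return (channels, num_frames, size, size)


print(center_crop_box(240, 320))
print(normalize_pixel((128, 128, 128)))
print(clip_shape(32))
```

With num_frames = 32, each training sample therefore has shape 3 x 32 x 224 x 224 before batching.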

In [1]:
!cat transfer-learning-code/transfer_learning.py

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from __future__ import print_function

import argparse
import logging
import os
import numpy as np
import json
import time

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet import autograd as ag
from mxnet.gluon.data.vision import transforms

import gluoncv as gcv
from gluoncv.data.transforms import video
from gluoncv.data import VideoClsCustom
from gluoncv.model_zoo import get_model
from gluoncv.utils import makedirs, LRSequential, LRScheduler, split_and_load, TrainingHistory

logging.basicConfig(level=logging.DEBUG)

# ------------------------------------------------------------ #
# Training methods #
# ------------------------------------------------------------ #


def train(args):
 # SageMaker passes num_cpus, num_gpus and other args we can use to tailor training to
 # the current container environment
 num_gpus = 

Define the MXNet estimator to prepare for training. We use a P3 instance ('ml.p3.2xlarge') here to demonstrate GPU-based training. You can change the instance type based on your dataset size and expected training time.

Training time recorded for the current dataset with 'ml.p3.2xlarge' is approximately 5 minutes.

Instance types for SageMaker are available here https://aws.amazon.com/sagemaker/pricing/instance-types/

In [None]:
m = MXNet("transfer_learning.py",
 source_dir="transfer-learning-code/",
 debugger_hook_config=False,
 role=role,
 output_path='s3://' + bucket_name + '/' + output_path,
 code_location='s3://' + bucket_name + '/' + code_location,
 train_instance_count=1,
 train_instance_type="ml.p3.2xlarge",
 framework_version="1.6.0",
 py_version="py3",
 hyperparameters={'batch-size': 8,
 'epochs': 20,
 'learning-rate': 0.001,
 'wd': 0.0001,
 'momentum': 0.9, 
 'log-interval': 100})
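
In script mode, SageMaker passes each entry of `hyperparameters` to the entry point script as a command-line argument (for example `--batch-size 8`). The sketch below shows one way a script could read them with `argparse`. It is an assumption about how such a script might look, not a copy of `transfer_learning.py`, and `parse_hyperparameters` is a hypothetical name.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse the hyperparameters used in this notebook (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch-size", type=int, default=8)
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--wd", type=float, default=0.0001)
    parser.add_argument("--momentum", type=float, default=0.9)
    parser.add_argument("--log-interval", type=int, default=100)
    # parse_known_args tolerates extra arguments this sketch does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_hyperparameters(
    ["--batch-size", "8", "--epochs", "20", "--learning-rate", "0.001"]
)
print(args.batch_size, args.epochs, args.learning_rate)
```

Note that `argparse` converts the dashes in option names to underscores, so `'learning-rate'` becomes `args.learning_rate`.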

Launch a training job 

In [None]:
JOB_NAME=str(uuid.uuid4())
print(JOB_NAME)
m.fit(inputs,job_name=JOB_NAME)

### Model Inference

First, create an MXNet SageMaker Model that can be deployed to a SageMaker endpoint. By default, this uses the SageMaker MXNet Inference toolkit for serving MXNet models on Amazon SageMaker.

1) This uses the default framework image for the specified MXNet version.

2) Provide the S3 location of the SageMaker model data .tar.gz file.

3) Provide the path to the Python inference file that should be executed as the entry point for model hosting.

4) The number of model server workers is set to 10 to process invocation requests in parallel.

In [None]:
from sagemaker.mxnet.model import MXNetModel
sagemaker_model = MXNetModel(
    model_data='s3://' + bucket_name + '/' + output_path + JOB_NAME + '/output/model.tar.gz',
    source_dir='inference-code/',
    role=role,
    framework_version='1.6.0',
    py_version='py3',
    entry_point='inference.py',
    model_server_workers=10,
    name='sagemaker-activity-detection-model-{0}'.format(int(time.time())),
)
print(sagemaker_model.name)
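
The `model_data` location used above follows the pattern output path, then job name, then `output/model.tar.gz`. The small helper below spells that out. It is only a sketch: `model_artifact_uri` is a hypothetical function, and the bucket and job name in the example are placeholders.

```python
def model_artifact_uri(bucket, output_prefix, job_name):
    """Build the S3 URI of a training job's model archive (hypothetical helper)."""
    prefix = output_prefix.strip("/")  # tolerate leading or trailing slashes
    return "s3://{0}/{1}/{2}/output/model.tar.gz".format(bucket, prefix, job_name)


uri = model_artifact_uri("example-bucket", "i3d_transfer_learning/output/", "job-123")
print(uri)
```

If the estimator object from the training step is still in memory, its `model_data` attribute should report the same location, which avoids rebuilding the string by hand.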

Entrypoint script for inference. 

Input : Preprocessed 5D video clip (batch,channels,frames,height,width)

Output : UCF101 class prediction and probability
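
Turning raw network outputs into a class prediction and probability is usually a softmax followed by an argmax. The following is a minimal pure-Python sketch of that step. It assumes the model returns one score per class, and the function names and example scores are illustrative rather than taken from `inference.py`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1(logits, classes):
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]


classes = ["ApplyEyeMakeup", "ApplyLipstick", "Archery"]
label, prob = top1([1.0, 3.0, 0.5], classes)
print(label, round(prob, 3))
```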

The script applies the same preprocessing as training and then runs inference, with the following steps:

1) Data transformation

The transformation function does three things: center crop the image to 224x224 in size, transpose it to num_channels*num_frames*height*width, and normalize with mean and standard deviation calculated across all ImageNet images.

2) Data loader

Use the general gluoncv dataloader VideoClsCustom to load the data with num_frames = 32 as the length.

3) Load the gluon model in GPU memory on initialization.

4) Trigger model invocation with the input payload
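
Loading a clip of num_frames = 32 means picking 32 frame indices out of a video of arbitrary length. The sketch below shows one simple strategy, evenly spaced frames with wrap-around for short videos. This is an assumption for illustration: the sampling that VideoClsCustom actually performs may differ, and `sample_frame_indices` is a hypothetical name.

```python
def sample_frame_indices(total_frames, num_frames=32):
    """Pick num_frames indices spread evenly over a video (hypothetical helper)."""
    if total_frames <= 0:
        raise ValueError("video has no frames")
    if total_frames >= num_frames:
        # Take the centre of each of num_frames equal-length segments.
        step = total_frames / num_frames
        return [int(step * i + step / 2) for i in range(num_frames)]
    # Short video: wrap around to the start to fill the clip.
    return [i % total_frames for i in range(num_frames)]


indices = sample_frame_indices(200)
print(len(indices), indices[:4], indices[-1])
```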

In [2]:
!cat inference-code/inference.py

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from __future__ import absolute_import

import subprocess
import sys
import io
import os
import boto3
import time
import json
import uuid

import mxnet as mx
import numpy as np
from mxnet import gluon,nd
from sagemaker_inference import content_types, default_inference_handler, errors
from io import BytesIO
from datetime import datetime


import gluoncv
from gluoncv.data.transforms import video
from gluoncv.data import VideoClsCustom
from gluoncv.utils.filesystem import try_import_decord

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
#UCF101 classes
classes = ['ApplyEyeMakeup'
,'ApplyLipstick'
, 'Archery'
, 'BabyCrawling'
, 'BalanceBeam'
, 'BandMarching'
, 'BaseballPitch'
, 'Basketball'
, 'BasketballDunk'
, 'BenchPress'
, 'Biking'
, 'Billiards'
, 'BlowDryHair'
, 'BlowingCandles'
, 'BodyWeightSquats'
, 'Bowling'
, 'BoxingPunching

### Model hosting 

Deploy the model on a single g4dn instance. 

G4 is a good platform for low-cost ML inference on images. G4 instances are based on the NVIDIA T4 GPU (Turing architecture), which is purpose-built with ray-tracing (RT) cores and Tensor Cores. Inference benchmarks from NVIDIA are available here:
https://developer.nvidia.com/deep-learning-performance-training-inference .
G4 instances offer throughput similar to P3 instances with higher energy efficiency, which makes them a good, low-cost choice for inference tasks.

In [None]:
import logging
logging.getLogger().setLevel(logging.WARNING)
#Instance type used for deployment
MODEL_INSTANCE_TYPE = 'ml.g4dn.2xlarge'
#Number of instances used for deployment (could be increased based on the prediction requests)
INSTANCE_COUNT = 1
#Model endpoint name
ENDPOINT_NAME = 'activity-detection-model-endpoint'
predictor = sagemaker_model.deploy(
    initial_instance_count=INSTANCE_COUNT,
    instance_type=MODEL_INSTANCE_TYPE,
    endpoint_name=ENDPOINT_NAME,
)

In [None]:
sm_client = boto3.client('sagemaker')
sm_client.describe_endpoint(EndpointName=ENDPOINT_NAME)['EndpointArn']

### Video inference test

Test ML inference on videos from another free video data source (Pexels)

In [None]:
#https://www.pexels.com/video/men-playing-tennis-at-daylight-992695/
payload1 = sagemaker_session.upload_data(path='../videos/PexelsVideos992695.mp4', key_prefix='data/ucf101')
S3_VIDEO_PATH = payload1

#Dict data to be passed to the endpoint
data1 = {
 'S3_VIDEO_PATH': S3_VIDEO_PATH,
}

In [None]:
#https://www.pexels.com/video/people-skiing-857074/
payload2 = sagemaker_session.upload_data(path='../videos/PeopleSkiing.mp4', key_prefix='data/ucf101')
S3_VIDEO_PATH = payload2

#Dict data to be passed to the endpoint
data2 = {
 'S3_VIDEO_PATH': S3_VIDEO_PATH,
}
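
The payload carries only an S3 URI, so the code behind the endpoint has to split that URI into a bucket and a key before it can download the video. The sketch below shows one way to do this with the standard library. It is an assumption about the handler, not an excerpt from `inference.py`, and `split_s3_uri` and the example bucket are hypothetical.

```python
import json
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key) (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {0}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


# Round-trip a request body the way a JSON endpoint would receive it.
body = json.dumps({"S3_VIDEO_PATH": "s3://example-bucket/data/ucf101/PeopleSkiing.mp4"})
request = json.loads(body)
bucket, key = split_s3_uri(request["S3_VIDEO_PATH"])
print(bucket, key)
```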

Invoke the endpoint and print the results returned by the API, with the following details:
1. S3 input path
2. Output class
3. Output probability
4. Time of detection

In [None]:
import time
import json

print(ENDPOINT_NAME)
print(time.time())
sm_runtime = boto3.Session().client('sagemaker-runtime')
response = sm_runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='application/json',
    Accept='application/json',
    Body=json.dumps(data1),
)
print(time.time())
# Get and print the response
response_body = json.loads(response['Body'].read().decode('utf-8'))
print(response_body)

In [None]:
import time
import json

print(ENDPOINT_NAME)
print(time.time())
sm_runtime = boto3.Session().client('sagemaker-runtime')
response = sm_runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='application/json',
    Accept='application/json',
    Body=json.dumps(data2),
)
print(time.time())
# Get and print the response
response_body = json.loads(response['Body'].read().decode('utf-8'))
print(response_body)
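
The cells above print a timestamp before and after each call, which leaves the subtraction to the reader. A small wrapper can report the elapsed time directly. This is a sketch only: `timed` is a hypothetical helper, and `fake_invoke` is a stand-in so the example runs without a live endpoint.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) (hypothetical helper)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_invoke(payload):
    """Stand-in for an endpoint call; sleeps briefly and echoes the payload."""
    time.sleep(0.01)
    return {"echo": payload}


result, seconds = timed(fake_invoke, {"S3_VIDEO_PATH": "s3://example-bucket/video.mp4"})
print("request took {0:.3f} s".format(seconds))
```

In the notebook, the same wrapper could be applied to the `invoke_endpoint` call to compare latency between the two test videos.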