{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Transfer learning and action inference on input video segments\n", "In this notebook, we will demonstrate activity detection on a video segment with machine learning. We will use the MXNet framework in script mode with the gluoncv toolkit. We will use a pre-trined i3D model (https://arxiv.org/abs/1705.07750) with a resnet50 backbone (https://arxiv.org/abs/1512.03385) trained on the Kinetics400 dataset (https://arxiv.org/abs/1705.06950) . \n", "We will then use transfer learning with our own custom action dataset \n", "\n", "(In this case, we select 101 classes from the UCF101 dataset -https://www.crcv.ucf.edu/research/data-sets/ucf101/\n", "\n", "1) We will fine-tune the pre-trained model with this custom dataset to learn the typical video patterns belonging to these 101 action classes.\n", "\n", "2) We will then deploy this model and host it on a sagemaker endpoint. \n", "\n", "3) Finally, we will make a inference request for a test video. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install and import the required gluoncv library " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install gluoncv" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3, re, os\n", "import numpy as np\n", "import uuid\n", "\n", "import mxnet as mx\n", "from mxnet import gluon, nd, image\n", "from mxnet.gluon.data.vision import transforms\n", "from mxnet import gluon\n", "\n", "from gluoncv.data.transforms import video\n", "from gluoncv import utils\n", "from gluoncv.model_zoo import get_model\n", "from gluoncv import utils\n", "from gluoncv.utils import export_block\n", "\n", "import sagemaker\n", "from sagemaker import get_execution_role\n", "from sagemaker.mxnet import MXNet\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_session = sagemaker.Session()\n", "role = get_execution_role()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the mxnet framework version = 1.6.0" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mx.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data preparation\n", "\n", "Load the UCF101 dataset as described in the gluoncv guide here https://gluon-cv.mxnet.io/build/examples_datasets/ucf101.html#sphx-glr-build-examples-datasets-ucf101-py\n", "\n", "Note : We are downloading only a tiny fraction of the entire UCF101 dataset here. You can modify the script flag below to download the entire dataset. The entire dataset size is 6.5 GB and will require update to the default volume size attached to the notebook instance. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "!pip install rarfile --user\n", "!pip install Cython --user\n", "!pip install mmcv --user\n", "!pip install torch --user\n", "!python data-prep-code/ucf101.py --tiny_dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1) Raw frames have been extracted from the videos in a folder for each video. \n", "\n", "2) A settings file has been generated. There are three items in each line, separated by spaces. The first item is the path to your training videos, e.g., video_001. It should be a folder containing the frames of video_001.mp4. The second item is the number of frames in each video, e.g., 200. The third item is the label of the videos, e.g., 0.\n", "\n", "Upload the raw frames and the settings list to S3 (can take upto 15 minutes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "print(time.time())\n", "sagemaker_session.upload_data(path='datasets/ucf101/rawframes/', key_prefix='data/ucf101-tiny/rawframes')\n", "print(time.time())\n", "sagemaker_session.upload_data(path='datasets/ucf101/ucfTrainTestlist/', key_prefix='data/ucf101-tiny/ucfTrainTestlist')\n", "print(time.time())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bucket_name=sagemaker_session.default_bucket()\n", "inputs = 's3://' + bucket_name + '/data/ucf101-tiny'\n", "\n", "output_path = 'i3d_transfer_learning/output/'\n", "code_location = 'i3d_transfer_learning/code/'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transfer Learning \n", "Transfer learning focuses on storing knowledge gained while solving one task and applying it to a different but related task. \n", "\n", "I3D (Inflated 3D Networks) is a widely adopted 3D video classification network. It uses 3D convolution to learn spatiotemporal information directly from videos. I3D is proposed to improve C3D (Convolutional 3D Networks) by inflating from 2D models. We can not only reuse the 2D models’ architecture (e.g., ResNet, Inception), but also bootstrap the model weights from 2D pretrained models. In this manner, training 3D networks for video classification is feasible and getting much better results.\n", "\n", "In this example, we use Inflated 3D model (I3D) with ResNet50 backbone trained on Kinetics400 dataset.\n", "\n", "Dataset size is a big factor in the performance of deep learning models. Kinetics400 has 306,245 short trimmed videos from 400 action categories. However, most often we dont have so much labeled data in another domain. Training a deep learning model on small datasets may lead to severe overfitting. \n", "\n", "Transfer learning is a technique that addresses this problem. The idea is simple: start training with a pre-trained model, instead of starting from scratch. For simple fine-tuning, just replace the last classification (dense) layer to the number of classes in the dataset. We can obtain good models on our own data without large annotated datasets and with less computing resource utilization for training.\n", "\n", "Review the following training script as the entrypoint script to the MXNet estimator framework. The script executes training with the following steps : \n", "\n", "1) Data transformation \n", "\n", " The transformation function does three things: center crop the image to 224x224 in size, transpose it to num_channels*num_frames*height*width, and normalize with mean and standard deviation calculated across all ImageNet images.\n", "\n", "2) Data loader\n", "\n", "Use the general gluoncv dataloader VideoClsCustom to load the data with num_frames = 32 as the length. For another dataset, you can just replace the value of root and setting to your data directory and your prepared text file.\n", "\n", "3) Model training \n", "\n", "a) Load the pre-trained model.\n", "\n", "b) Load input hyperparameters and number of action classes.\n", "\n", "c) Re-define the output layer for the new task. In GluonCV, you can get your customized model with one line of code.\n", "\n", "d) Define optimizer, loss and metric. Train the network for the new dataset." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\r\n", "# SPDX-License-Identifier: MIT-0\r\n", "\r\n", "from __future__ import print_function\r\n", "\r\n", "import argparse\r\n", "import logging\r\n", "import os\r\n", "import numpy as np\r\n", "import json\r\n", "import time\r\n", "\r\n", "import mxnet as mx\r\n", "from mxnet import gluon\r\n", "from mxnet.gluon import nn\r\n", "from mxnet import autograd as ag\r\n", "from mxnet.gluon.data.vision import transforms\r\n", "\r\n", "import gluoncv as gcv\r\n", "from gluoncv.data.transforms import video\r\n", "from gluoncv.data import VideoClsCustom\r\n", "from gluoncv.model_zoo import get_model\r\n", "from gluoncv.utils import makedirs, LRSequential, LRScheduler, split_and_load, TrainingHistory\r\n", "\r\n", "logging.basicConfig(level=logging.DEBUG)\r\n", "\r\n", "# ------------------------------------------------------------ #\r\n", "# Training methods #\r\n", "# ------------------------------------------------------------ #\r\n", "\r\n", "\r\n", "def train(args):\r\n", " # SageMaker passes num_cpus, num_gpus and other args we can use to tailor training to\r\n", " # the current container environment\r\n", " num_gpus = mx.context.num_gpus()\r\n", " ctx = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()]\r\n", " # retrieve the hyperparameters we set in notebook (with some defaults)\r\n", " \r\n", " #number of training examples utilized in one iteration.\r\n", " batch_size = args.batch_size\r\n", " #number of times an entire dataset is passed forward and backward through the neural network \r\n", " epochs = args.epochs\r\n", " #tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.\r\n", " learning_rate = args.learning_rate\r\n", " #Momentum remembers the update Δ w at each iteration, and determines the next update as a linear combination of the gradient and the previous update\r\n", " momentum = args.momentum\r\n", " #Optimizers are algorithms or methods used to change the attributes of your neural network such as weights and learning rate in order to reduce the losses. \r\n", " optimizer = args.optimizer\r\n", " #after each update, the weights are multiplied by a factor slightly less than 1.\r\n", " wd = args.wd\r\n", " optimizer_params = {'learning_rate': learning_rate, 'wd': wd, 'momentum': momentum}\r\n", " log_interval = args.log_interval\r\n", " \r\n", " #In this example, we use Inflated 3D model (I3D) with ResNet50 backbone trained on Kinetics400 dataset. We want to replace the last classification (dense) layer to the number of classes in the dataset. \r\n", " model_name = 'i3d_resnet50_v1_custom'\r\n", " #number of classes in the dataset\r\n", " nclass = 101\r\n", " #number of workers for the data loader\r\n", " num_workers = 8\r\n", " \r\n", " current_host = args.current_host\r\n", " hosts = args.hosts\r\n", " model_dir = args.model_dir\r\n", " CHECKPOINTS_DIR = '/opt/ml/checkpoints'\r\n", " checkpoints_enabled = os.path.exists(CHECKPOINTS_DIR)\r\n", "\r\n", " data_dir = args.train\r\n", " segments = 'rawframes'\r\n", " train ='ucfTrainTestlist/ucf101_train_split_2_rawframes.txt'\r\n", " \r\n", " #load the data with data loader\r\n", " train_data = load_data(data_dir,batch_size,num_workers,segments,train)\r\n", " # define the network\r\n", " net = define_network(ctx,model_name,nclass)\r\n", " #define the gluon trainer\r\n", " trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)\r\n", " #define loss function\r\n", " loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()\r\n", " #define training metric\r\n", " train_metric = mx.metric.Accuracy()\r\n", " train_history = TrainingHistory(['training-acc'])\r\n", " net.hybridize()\r\n", " #learning rate decay hyperparameters\r\n", " lr_decay_count = 0\r\n", " lr_decay = 0.1\r\n", " lr_decay_epoch = [40, 80, 100]\r\n", " for epoch in range(epochs):\r\n", " tic = time.time()\r\n", " train_metric.reset()\r\n", " train_loss = 0\r\n", "\r\n", " # Learning rate decay\r\n", " if epoch == lr_decay_epoch[lr_decay_count]:\r\n", " trainer.set_learning_rate(trainer.learning_rate*lr_decay)\r\n", " lr_decay_count += 1\r\n", "\r\n", " # Loop through each batch of training data\r\n", " for i, batch in enumerate(train_data):\r\n", " # Extract data and label\r\n", " data = split_and_load(batch[0], ctx_list=ctx, batch_axis=0,even_split=False)\r\n", " label = split_and_load(batch[1], ctx_list=ctx, batch_axis=0,even_split=False)\r\n", "\r\n", " # AutoGrad\r\n", " with ag.record():\r\n", " output = []\r\n", " for _, X in enumerate(data):\r\n", " X = X.reshape((-1,) + X.shape[2:])\r\n", " pred = net(X)\r\n", " output.append(pred)\r\n", " loss = [loss_fn(yhat, y) for yhat, y in zip(output, label)]\r\n", "\r\n", " # Backpropagation\r\n", " for l in loss:\r\n", " l.backward()\r\n", "\r\n", " # Optimize\r\n", " trainer.step(batch_size)\r\n", "\r\n", " # Update metrics\r\n", " train_loss += sum([l.mean().asscalar() for l in loss])\r\n", " train_metric.update(label, output)\r\n", "\r\n", " if i == 100:\r\n", " break\r\n", "\r\n", " name, acc = train_metric.get()\r\n", "\r\n", " # Update history and print metrics\r\n", " train_history.update([acc])\r\n", " print('[Epoch %d] train=%f loss=%f time: %f' %\r\n", " (epoch, acc, train_loss / (i+1), time.time()-tic))\r\n", "\r\n", " print('saving the model')\r\n", " save(net, model_dir)\r\n", " \r\n", "def save(net, model_dir):\r\n", " # save the model\r\n", " net.export('%s/model'% model_dir)\r\n", "\r\n", "\r\n", "def define_network(ctx,model_name,nclass):\r\n", " #In GluonCV, we can get a customized model with one line of code.\r\n", " net = get_model(name=model_name, nclass=nclass)\r\n", " net.collect_params().reset_ctx(ctx)\r\n", " print(net)\r\n", " return net\r\n", "\r\n", "\r\n", "def load_data(data_dir, batch_size,num_workers,segments,train):\r\n", "\r\n", " #The transformation function does three things: center crop the image to 224x224 in size, transpose it to num_channels,num_frames,height*width, and normalize with mean and standard deviation calculated across all ImageNet images.\r\n", "\r\n", " #Use the general gluoncv dataloader VideoClsCustom to load the data with num_frames = 32 as the length. For another dataset, you can just replace the value of root and setting to your data directory and your prepared text file.\r\n", " \r\n", " transform_train = video.VideoGroupTrainTransform(size=(224, 224), scale_ratios=[1.0, 0.8], mean=[0.485, 0.456, 0.406], \r\n", " std=[0.229, 0.224, 0.225])\r\n", " train_dataset = VideoClsCustom(root=data_dir + '/' + \r\n", " segments,setting=data_dir + '/' + train,train=True,new_length=32,transform=transform_train)\r\n", " print(os.listdir(data_dir+ '/' + segments))\r\n", " print('Load %d training samples.' % len(train_dataset))\r\n", " return gluon.data.DataLoader(train_dataset, batch_size=batch_size,\r\n", " shuffle=True, num_workers=num_workers)\r\n", "\r\n", "\r\n", "\r\n", "# ------------------------------------------------------------ #\r\n", "# Training execution #\r\n", "# ------------------------------------------------------------ #\r\n", "\r\n", "def parse_args():\r\n", " parser = argparse.ArgumentParser()\r\n", "\r\n", " parser.add_argument('--batch-size', type=int, default=8)\r\n", " parser.add_argument('--epochs', type=int, default=10)\r\n", " parser.add_argument('--learning-rate', type=float, default=0.001)\r\n", " parser.add_argument('--momentum', type=float, default=0.9)\r\n", " parser.add_argument('--wd', type=float, default=0.0001)\r\n", " parser.add_argument('--log-interval', type=float, default=100)\r\n", "\r\n", " parser.add_argument('--optimizer', type=str, default='sgd')\r\n", " parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])\r\n", " parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAINING'])\r\n", "\r\n", " parser.add_argument('--current-host', type=str, default=os.environ['SM_CURRENT_HOST'])\r\n", " parser.add_argument('--hosts', type=list, default=json.loads(os.environ['SM_HOSTS']))\r\n", "\r\n", " return parser.parse_args()\r\n", "\r\n", "\r\n", "if __name__ == '__main__':\r\n", " args = parse_args()\r\n", "\r\n", " train(args)" ] } ], "source": [ "!cat transfer-learning-code/transfer_learning.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the MXNet estimator to prepare for training. We use a p3 instance 'ml.p3.2xlarge' here to demonstrate gpu based training. You can update the instance type based on your dataset size and expected training times.\n", "\n", "Training time recorded for the current dataset with 'ml.p3.2xlarge' is approximately 5 minutes.\n", "\n", "Instance types for SageMaker are available here https://aws.amazon.com/sagemaker/pricing/instance-types/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m = MXNet(\"transfer_learning.py\",\n", " source_dir=\"transfer-learning-code/\",\n", " debugger_hook_config=False,\n", " role=role,\n", " output_path='s3://' + bucket_name + '/' + output_path,\n", " code_location='s3://' + bucket_name + '/' + code_location,\n", " train_instance_count=1,\n", " train_instance_type=\"ml.p3.2xlarge\",\n", " framework_version=\"1.6.0\",\n", " py_version=\"py3\",\n", " hyperparameters={'batch-size': 8,\n", " 'epochs': 20,\n", " 'learning-rate': 0.001,\n", " 'wd': 0.0001,\n", " 'momentum': 0.9, \n", " 'log-interval': 100})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Launch a training job " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "JOB_NAME=str(uuid.uuid4())\n", "print(JOB_NAME)\n", "m.fit(inputs,job_name=JOB_NAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Inference\n", "\n", "First, create a MXNet SageMaker Model that can be deployed to a SageMaker Endpoint. By default, this will use the SageMaker MXNet Inference toolkit for serving MXNet models on Amazon SageMaker. \n", "\n", "1) This will use a default framework image for MXNet version specified.\n", "\n", "2) Provide the S3 location of the SageMaker model data .tar.gz file.\n", "\n", "3) Provide the path to the Python inference file which should be executed as the entry point to model hosting.\n", "\n", "4) Number of model server workers set to 10 to process parallel invocation requests" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.mxnet.model import MXNetModel\n", "sagemaker_model = MXNetModel(model_data = 's3://' + bucket_name + '/' + output_path + JOB_NAME + '/output/model.tar.gz', source_dir='inference-code/',\n", " role = role,framework_version='1.6.0',py_version='py3',entry_point='inference.py',model_server_workers=10,name='sagemaker-activity-detection-model-{0}'.format(str(int(time.time()))))\n", "print(sagemaker_model.name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Entrypoint script for inference. \n", "\n", "Input : Preprocessed 5D video clip (batch,channels,frames,height,width)\n", "\n", "Output : UCF101 class prediction and probability\n", "\n", "The script executes the same preprocessing and inference with the following steps :\n", "\n", "1) Data transformation\n", "\n", "The transformation function does three things: center crop the image to 224x224 in size, transpose it to num_channelsnum_framesheight*width, and normalize with mean and standard deviation calculated across all ImageNet images.\n", "\n", "2) Data loader\n", "\n", "Use the general gluoncv dataloader VideoClsCustom to load the data with num_frames = 32 as the length.\n", "\n", "3) Load the gluon model in GPU memory on initialization.\n", "\n", "4) Trigger model invocation with the input payload" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\r\n", "# SPDX-License-Identifier: MIT-0\r\n", "\r\n", "from __future__ import absolute_import\r\n", "\r\n", "import subprocess\r\n", "import sys\r\n", "import io\r\n", "import os\r\n", "import boto3\r\n", "import time\r\n", "import json\r\n", "import uuid\r\n", "\r\n", "import mxnet as mx\r\n", "import numpy as np\r\n", "from mxnet import gluon,nd\r\n", "from sagemaker_inference import content_types, default_inference_handler, errors\r\n", "from io import BytesIO\r\n", "from datetime import datetime\r\n", "\r\n", "\r\n", "import gluoncv\r\n", "from gluoncv.data.transforms import video\r\n", "from gluoncv.data import VideoClsCustom\r\n", "from gluoncv.utils.filesystem import try_import_decord\r\n", "\r\n", "ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()\r\n", "#UCF101 classes\r\n", "classes = ['ApplyEyeMakeup'\r\n", ",'ApplyLipstick'\r\n", ", 'Archery'\r\n", ", 'BabyCrawling'\r\n", ", 'BalanceBeam'\r\n", ", 'BandMarching'\r\n", ", 'BaseballPitch'\r\n", ", 'Basketball'\r\n", ", 'BasketballDunk'\r\n", ", 'BenchPress'\r\n", ", 'Biking'\r\n", ", 'Billiards'\r\n", ", 'BlowDryHair'\r\n", ", 'BlowingCandles'\r\n", ", 'BodyWeightSquats'\r\n", ", 'Bowling'\r\n", ", 'BoxingPunchingBag'\r\n", ", 'BoxingSpeedBag'\r\n", ", 'BreastStroke'\r\n", ", 'BrushingTeeth'\r\n", ", 'CleanAndJerk'\r\n", ", 'CliffDiving'\r\n", ", 'CricketBowling'\r\n", ", 'CricketShot'\r\n", ", 'CuttingInKitchen'\r\n", ", 'Diving'\r\n", ", 'Drumming'\r\n", ", 'Fencing'\r\n", ", 'FieldHockeyPenalty'\r\n", ", 'FloorGymnastics'\r\n", ", 'FrisbeeCatch'\r\n", ", 'FrontCrawl'\r\n", ", 'GolfSwing'\r\n", ", 'Haircut'\r\n", ", 'Hammering'\r\n", ", 'HammerThrow'\r\n", ", 'HandstandPushups'\r\n", ", 'HandstandWalking'\r\n", ", 'HeadMassage'\r\n", ", 'HighJump'\r\n", ", 'HorseRace'\r\n", ", 'HorseRiding'\r\n", ", 'HulaHoop'\r\n", ", 'IceDancing'\r\n", ", 'JavelinThrow'\r\n", ", 'JugglingBalls'\r\n", ", 'JumpingJack'\r\n", ", 'JumpRope'\r\n", ", 'Kayaking'\r\n", ", 'Knitting'\r\n", ", 'LongJump'\r\n", ", 'Lunges'\r\n", ", 'MilitaryParade'\r\n", ", 'Mixing'\r\n", ", 'MoppingFloor'\r\n", ", 'Nunchucks'\r\n", ", 'ParallelBars'\r\n", ", 'PizzaTossing'\r\n", ", 'PlayingCello'\r\n", ", 'PlayingDaf'\r\n", ", 'PlayingDhol'\r\n", ", 'PlayingFlute'\r\n", ", 'PlayingGuitar'\r\n", ", 'PlayingPiano'\r\n", ", 'PlayingSitar'\r\n", ", 'PlayingTabla'\r\n", ", 'PlayingViolin'\r\n", ", 'PoleVault'\r\n", ", 'PommelHorse'\r\n", ", 'PullUps'\r\n", ", 'Punch'\r\n", ", 'PushUps'\r\n", ", 'Rafting'\r\n", ", 'RockClimbingIndoor'\r\n", ", 'RopeClimbing'\r\n", ", 'Rowing'\r\n", ", 'SalsaSpin'\r\n", ", 'ShavingBeard'\r\n", ", 'Shotput'\r\n", ", 'SkateBoarding'\r\n", ", 'Skiing'\r\n", ", 'Skijet'\r\n", ", 'SkyDiving'\r\n", ", 'SoccerJuggling'\r\n", ", 'SoccerPenalty'\r\n", ", 'StillRings'\r\n", ", 'SumoWrestling'\r\n", ", 'Surfing'\r\n", ", 'Swing'\r\n", ", 'TableTennisShot'\r\n", ", 'TaiChi'\r\n", ", 'TennisSwing'\r\n", ", 'ThrowDiscus'\r\n", ", 'TrampolineJumping'\r\n", ", 'Typing'\r\n", ", 'UnevenBars'\r\n", ", 'VolleyballSpiking'\r\n", ", 'WalkingWithDog'\r\n", ", 'WallPushups'\r\n", ", 'WritingOnBoard'\r\n", ", 'YoYo']\r\n", "dict_classes = dict(zip(range(len(classes)), classes))\r\n", "# ------------------------------------------------------------ #\r\n", "# Hosting methods #\r\n", "# ------------------------------------------------------------ #\r\n", "\r\n", "def model_fn(model_dir):\r\n", " print('here')\r\n", " print(ctx)\r\n", " symbol = mx.sym.load('%s/model-symbol.json' % model_dir)\r\n", " outputs = mx.symbol.softmax(data=symbol, name='softmax_label')\r\n", " inputs = mx.sym.var('data')\r\n", " net = gluon.SymbolBlock(outputs, inputs)\r\n", " net.load_parameters('%s/model-0000.params' % model_dir, ctx=ctx)\r\n", " return net\r\n", "\r\n", "#transform function that uses json (s3 path) as input and output\r\n", "def transform_fn(net, data, input_content_type, output_content_type):\r\n", " print('transform_fn here')\r\n", " start = time.time()\r\n", " data = json.loads(data)\r\n", " video_data = read_video_data(data['S3_VIDEO_PATH'])\r\n", " print(time.time())\r\n", " video_input = video_data.as_in_context(ctx)\r\n", " probs = net(video_input.astype('float32', copy=False))\r\n", " print(time.time())\r\n", " predicted = mx.nd.argmax(probs, axis=1).asnumpy().tolist()[0]\r\n", " probability = mx.nd.max(probs, axis=1).asnumpy().tolist()[0]\r\n", " \r\n", " probability = '{:.4f}'.format(probability)\r\n", " predicted_name = dict_classes[int(predicted)]\r\n", " total_prediction = time.time()-start\r\n", " total_prediction = '{:.4f}'.format(total_prediction)\r\n", " print(probability)\r\n", " print(predicted_name)\r\n", " print('Model prediction time: ', total_prediction)\r\n", " \r\n", " now = datetime.utcnow()\r\n", " time_format = '%Y-%m-%d %H:%M:%S %Z%z'\r\n", " now = now.strftime(time_format)\r\n", "\r\n", " response = {\r\n", " 'S3Path': {'S': data['S3_VIDEO_PATH']},\r\n", " 'Predicted': {'S': predicted_name},\r\n", " 'Probability': {'S': probability},\r\n", " 'DateCreatedUTC': {'S': now},\r\n", " }\r\n", "\r\n", " return json.dumps(response), output_content_type\r\n", "\r\n", "def get_bucket_and_key(s3_path):\r\n", " \"\"\"Get the bucket name and key from the given path.\r\n", " Args:\r\n", " s3_path(str): Input S3 path\r\n", " \"\"\"\r\n", " s3_path = s3_path.replace('s3://', '')\r\n", " s3_path = s3_path.replace('S3://', '') #Both cases\r\n", " bucket, key = s3_path.split('/', 1)\r\n", " return bucket, key\r\n", "\r\n", "\r\n", "\r\n", "def read_video_data(s3_video_path, num_frames=32):\r\n", " \"\"\"Read and preprocess video data from the S3 bucket.\"\"\"\r\n", " print('read and preprocess video data here ')\r\n", " s3_client = boto3.client('s3')\r\n", " #print(uuid.uuid4())\r\n", " fname = s3_video_path.replace('s3://', '')\r\n", " fname = fname.replace('S3://', '')\r\n", " fname = fname.replace('/', '')\r\n", " #download_path = '/tmp/{}-{}'.format(uuid.uuid4(), fname)\r\n", " #video_list_path = '/tmp/{}-{}'.format(uuid.uuid4(), 'video_list.txt')\r\n", " download_path = '/tmp/' + fname\r\n", " video_list_path = '/tmp/video_list' + str(uuid.uuid4()) + '.txt' \r\n", " bucket, key = get_bucket_and_key(s3_video_path)\r\n", " s3_client.download_file(bucket, key, download_path)\r\n", " \r\n", " #update download_path filename to be unique\r\n", " filename,ext = os.path.splitext(download_path) # save the file extension\r\n", " filename = filename + str(uuid.uuid4())\r\n", " os.rename(download_path, filename+ext)\r\n", " download_path = filename+ext\r\n", " \r\n", " #Dummy duration and label with each video path\r\n", " video_list = '{} {} {}'.format(download_path, 10, 1)\r\n", " with open(video_list_path, 'w') as fopen:\r\n", " fopen.write(video_list)\r\n", "\r\n", " #Constants\r\n", " data_dir = '/tmp/'\r\n", " num_segments = 1\r\n", " new_length = num_frames\r\n", " new_step =1\r\n", " use_decord = True\r\n", " video_loader = True\r\n", " slowfast = False\r\n", " #Preprocessing params \r\n", " \r\n", " #The transformation function does three things: center crop the image to 224x224 in size, transpose it to num_channels,num_frames,height*width, and normalize with mean and standard deviation calculated across all ImageNet images.\r\n", "\r\n", " #Use the general gluoncv dataloader VideoClsCustom to load the data with num_frames = 32 as the length.\r\n", " input_size = 224\r\n", " mean = [0.485, 0.456, 0.406]\r\n", " std=[0.229, 0.224, 0.225]\r\n", "\r\n", " transform = video.VideoGroupValTransform(size=input_size, mean=mean, std=std)\r\n", " video_utils = VideoClsCustom(root=data_dir,\r\n", " setting=video_list_path,\r\n", " num_segments=num_segments,\r\n", " new_length=new_length,\r\n", " new_step=new_step,\r\n", " video_loader=video_loader,\r\n", " use_decord=use_decord,\r\n", " slowfast=slowfast)\r\n", " \r\n", " #Read for the video list\r\n", " video_name = video_list.split()[0]\r\n", "\r\n", " decord = try_import_decord()\r\n", " decord_vr = decord.VideoReader(video_name)\r\n", " duration = len(decord_vr)\r\n", "\r\n", " skip_length = new_length * new_step\r\n", " segment_indices, skip_offsets = video_utils._sample_test_indices(duration)\r\n", "\r\n", " if video_loader:\r\n", " if slowfast:\r\n", " clip_input = video_utils._video_TSN_decord_slowfast_loader(video_name, decord_vr, \r\n", " duration, segment_indices, skip_offsets)\r\n", " else:\r\n", " clip_input = video_utils._video_TSN_decord_batch_loader(video_name, decord_vr, \r\n", " duration, segment_indices, skip_offsets)\r\n", " else:\r\n", " raise RuntimeError('We only support video-based inference.')\r\n", "\r\n", " clip_input = transform(clip_input)\r\n", "\r\n", " if slowfast:\r\n", " sparse_sampels = len(clip_input) // (num_segments * num_crop)\r\n", " clip_input = np.stack(clip_input, axis=0)\r\n", " clip_input = clip_input.reshape((-1,) + (sparse_sampels, 3, input_size, input_size))\r\n", " clip_input = np.transpose(clip_input, (0, 2, 1, 3, 4))\r\n", " else:\r\n", " clip_input = np.stack(clip_input, axis=0)\r\n", " clip_input = clip_input.reshape((-1,) + (new_length, 3, input_size, input_size))\r\n", " clip_input = np.transpose(clip_input, (0, 2, 1, 3, 4))\r\n", "\r\n", " if new_length == 1:\r\n", " clip_input = np.squeeze(clip_input, axis=2) # this is for 2D input case\r\n", "\r\n", " clip_input = nd.array(clip_input)\r\n", " \r\n", " #Cleanup temp files\r\n", " os.remove(download_path)\r\n", " os.remove(video_list_path)\r\n", " #os.system('rm {}'.format(download_path))\r\n", " #os.system('rm {}'.format(video_list_path))\r\n", "\r\n", " return clip_input\r\n", "\r\n", "\r\n" ] } ], "source": [ "!cat inference-code/inference.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model hosting \n", "\n", "Deploy the model on a single g4dn instance. \n", "\n", "G4 is a good platform for ML inference on images at low cost. G4 is based on the Turing T4 GPU which is purposed built with RTX tracing cores, tensor cores. Here is a link to inference benchmarks from Nvidia\n", "https://developer.nvidia.com/deep-learning-performance-training-inference .\n", "G4 prove to have similar throughput with higher energy efficiency wrt P3 instances, which means they are a good choice for inference tasks at a low cost." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import logging\n", "logging.getLogger().setLevel(logging.WARNING)\n", "#Instance type used for deployment\n", "MODEL_INSTANCE_TYPE = 'ml.g4dn.2xlarge'\n", "#Number of instances used for deployment (could be increased based on the prediction requests)\n", "INSTANCE_COUNT = 1\n", "#Model endpoint name\n", "ENDPOINT_NAME = 'activity-detection-model-endpoint'\n", "predictor = sagemaker_model.deploy(initial_instance_count=INSTANCE_COUNT,instance_type=MODEL_INSTANCE_TYPE,endpoint_name=ENDPOINT_NAME)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sm_client = boto3.client('sagemaker')\n", "sm_client.describe_endpoint(EndpointName=ENDPOINT_NAME)['EndpointArn']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video inference test\n", "\n", "Test ML inference on videos from another free video data source (Pexels)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#https://www.pexels.com/video/men-playing-tennis-at-daylight-992695/\n", "payload1 = sagemaker_session.upload_data(path='../videos/PexelsVideos992695.mp4', key_prefix='data/ucf101')\n", "S3_VIDEO_PATH = payload1\n", "\n", "#Dict data to be passed to the endpoint\n", "data1 = {\n", " 'S3_VIDEO_PATH': S3_VIDEO_PATH,\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#https://www.pexels.com/video/people-skiing-857074/\n", "payload2 = sagemaker_session.upload_data(path='../videos/PeopleSkiing.mp4', key_prefix='data/ucf101')\n", "S3_VIDEO_PATH = payload2\n", "\n", "#Dict data to be passed to the endpoint\n", "data2 = {\n", " 'S3_VIDEO_PATH': S3_VIDEO_PATH,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Invoke endpoint and print results from the API with details \n", "1. S3 input path\n", "2. Output class\n", "3. Output probability\n", "4. Time of detection" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(ENDPOINT_NAME)\n", "import time\n", "import json\n", "print(time.time())\n", "sm_runtime = boto3.Session().client('sagemaker-runtime')\n", "response = sm_runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME, ContentType='application/json',Accept='application/json',Body=json.dumps(data1))\n", "print(time.time())\n", "#Get and print the response\n", "response_body = json.loads(response['Body'].read().decode('utf-8'))\n", "print(response_body)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(ENDPOINT_NAME)\n", "import time\n", "import json\n", "print(time.time())\n", "sm_runtime = boto3.Session().client('sagemaker-runtime')\n", "response = sm_runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME, ContentType='application/json',Accept='application/json',Body=json.dumps(data2))\n", "print(time.time())\n", "#Get and print the response\n", "response_body = json.loads(response['Body'].read().decode('utf-8'))\n", "print(response_body)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_mxnet_p36", "language": "python", "name": "conda_mxnet_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }