# AWS re:Invent 2019
## Train and deploy custom deep learning models with AWS DeepLens and Amazon SageMaker
## Lab 2. Train a Classification Model using Amazon SageMaker

When the notebook launches for the first time, select the `conda_mxnet_p36` kernel.

In [None]:
# Run only once to install the gluoncv package with the following code:
!pip install gluoncv==0.5.0 
!pip install tqdm

## Download Data

This step downloads the bear/non-bear image dataset into the `./data` folder.

In [None]:
import requests
import os
import csv
import zipfile
from tqdm import tqdm
import time

ZIP_FILE = './open_images_bears.zip'
ERRORS_FILE = 'download-errors.txt'
CSV_DIR = './image_csv/'
DATA_DIR = './data/'
os.makedirs(DATA_DIR, exist_ok=True)

# Extract the CSV files that list the image IDs and URLs
with zipfile.ZipFile(ZIP_FILE, 'r') as zf:
    zf.extractall(os.path.expanduser(CSV_DIR))

files = list(filter(lambda x: x.endswith('csv'), os.listdir(CSV_DIR)))

def download(url, path):
    r = requests.get(url, allow_redirects=True)
    # Treat responses smaller than 1 KB as failed downloads
    if len(r.content) < 1024:
        raise Exception('response too small: {}'.format(url))
    with open(path, 'wb') as out:
        out.write(r.content)

# Start with an empty error log
with open(ERRORS_FILE, 'w') as f:
    f.write('')

for idx, fn in enumerate(files):
    print('{}/{} {} is being processed.'.format(idx + 1, len(files), fn))
    time.sleep(1)
    with open(CSV_DIR + fn, 'r') as f:
        reader = csv.reader(f)
        records = list(reader)[1:]  # skip the header row

    # CSV file names start with '<stage>-<label>-' (stage: train, val or test)
    stage = fn.split('-')[0]
    lbl = fn.split('-')[1]
    dir_path = DATA_DIR + '{}/{}'.format(stage, lbl)
    os.makedirs(dir_path, exist_ok=True)

    for row in tqdm(records):
        path = dir_path + '/{}.jpg'.format(row[0])
        try:
            # If the thumbnail URL (column 13) is empty, download the original URL (column 5)
            if not row[13]:
                download(row[5], path)
            else:
                download(row[13], path)
        except Exception:
            # Log the ID of every image that could not be downloaded
            with open(ERRORS_FILE, 'a') as f:
                f.write(row[0] + '\n')
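The download loop relies on a few conventions in the CSV files: the file name starts with `<stage>-<label>-`, column 0 holds the image ID, column 5 the original URL, and column 13 the thumbnail URL. The following sketch isolates that per-row logic in a pure function so it can be checked without network access. `plan_download` and the sample file name used with it are hypothetical, and the column indices are taken from the code above rather than from any CSV specification.

```python
import os

def plan_download(csv_name, row, data_dir='./data/'):
    """Return (url, local_path) for one CSV row (hypothetical helper).

    Assumes, like the notebook code, that csv_name starts with
    '<stage>-<label>-', row[0] is the image ID, row[5] the original URL
    and row[13] the thumbnail URL.
    """
    stage, label = csv_name.split('-')[:2]
    # Prefer the thumbnail; fall back to the original image when it is empty
    url = row[13] if row[13] else row[5]
    local_path = os.path.join(data_dir, stage, label, '{}.jpg'.format(row[0]))
    return url, local_path
```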


## Upload data to S3

In [None]:
import os
import glob
import boto3
from tqdm import tqdm

s3_bucket = '__PUT-S3-BUCKET-NAME-WHICH-WAS-CREATED-IN-LAB1__'
prefix = 'bear'

SEARCH_CRITERION = '**/*.jpg'
train_images = glob.glob(os.path.join(DATA_DIR + 'train', SEARCH_CRITERION), recursive=True)
val_images = glob.glob(os.path.join(DATA_DIR + 'val', SEARCH_CRITERION), recursive=True)
test_images = glob.glob(os.path.join(DATA_DIR + 'test', SEARCH_CRITERION), recursive=True)

# Reuse one Bucket resource instead of creating a new session for every file
bucket = boto3.Session().resource('s3').Bucket(s3_bucket)

# Mirror the local layout under the prefix: data/train/<label>/<id>.jpg -> bear/train/<label>/<id>.jpg
for im_name in tqdm(train_images + val_images + test_images):
    key = os.path.join(prefix, os.path.relpath(im_name, DATA_DIR))
    bucket.Object(key).upload_file(im_name)


All the data is now stored in the S3 bucket.
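The upload mirrors the local folder layout under the `bear` prefix, so the `train`/`val`/`test` split and the class-label folders are preserved in S3. A minimal sketch of that path-to-key mapping, assuming the `./data/` layout created above (`s3_key_for` is a hypothetical helper, not part of boto3):

```python
import os
import posixpath

def s3_key_for(local_path, data_dir='./data/', prefix='bear'):
    """Map a local image path to its S3 object key (hypothetical helper).

    ./data/train/<label>/<id>.jpg -> bear/train/<label>/<id>.jpg
    """
    rel = os.path.relpath(local_path, data_dir)
    # S3 keys always use '/', whatever the local OS path separator is
    return posixpath.join(prefix, *rel.split(os.sep))
```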

## Fine-tuning the Image Classification Model
Now that the setup is complete, we are ready to train our image classifier. To begin, let us create a `sagemaker.mxnet.MXNet` estimator object. This estimator launches the training job.

### Bring your own script (BYOS)
Amazon SageMaker provides pre-built containers that support deep learning frameworks such as Apache MXNet, TensorFlow, PyTorch, and Chainer. We are going to bring in **bear-classification.py**, an image classification script built with the GluonCV toolkit (Apache MXNet). The SageMaker MXNet estimator allows us to run single-machine or distributed training in SageMaker, using CPU- or GPU-based instances.


### Training parameters
The training job takes several parameters, including:

* **Training instance count**: The number of instances on which to run the training. When it is greater than one, the training script runs in a distributed setting (the script itself must support distributed training).
* **Training instance type**: The type of machine on which to run the training. GPU instances are typically used for this kind of training.
* **Output path**: The S3 folder in which the training output is stored.

In [None]:
import sagemaker
from sagemaker.mxnet import MXNet

data_channels = 's3://' + s3_bucket + '/' + prefix
model_artifacts_location = 's3://' + s3_bucket + '/model_dir'

# Parameter names follow SageMaker Python SDK v1; SDK v2 renamed
# train_instance_count/train_instance_type to instance_count/instance_type
gluon_bear_classification = MXNet("bear-classification.py",
                                  role=sagemaker.get_execution_role(),
                                  train_instance_count=1,
                                  train_instance_type="ml.p2.xlarge",
                                  output_path=model_artifacts_location,
                                  framework_version="1.4.1",
                                  py_version="py3",
                                  hyperparameters={'batch-size': 128,
                                                   'epochs': 10})

## Running the Training Job
Once the MXNet estimator is constructed, we can fit it using the data stored in S3.

During training, SageMaker copies this data from S3 to the local filesystem of the training instance, where the bear classification script runs. The **bear-classification.py** script then simply loads the training and test data from disk.

In [None]:
gluon_bear_classification.fit(data_channels)
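**bear-classification.py** itself is not shown in this lab. As an illustration only, a SageMaker script-mode entry point typically receives each entry of the `hyperparameters` dictionary as a `--name value` command-line argument and finds its input data through environment variables such as `SM_CHANNEL_TRAINING` (a single S3 URI passed to `fit()` becomes the `training` channel). The sketch below shows that pattern. It is an assumption about how such a script could be written, not the lab's actual code; the `--train` and `--model-dir` arguments and the local defaults are hypothetical.

```python
import argparse
import os

def parse_args(argv=None):
    """Hypothetical argument parsing for a SageMaker script-mode entry point.

    Hyperparameters arrive as '--name value' arguments; data and model
    locations come from SM_* environment variables. The fallback defaults
    are only used when the script runs outside SageMaker.
    """
    parser = argparse.ArgumentParser()
    # Names mirror the estimator's hyperparameters dictionary
    parser.add_argument('--batch-size', type=int, default=128)
    parser.add_argument('--epochs', type=int, default=10)
    # Local copy of the 'training' channel (s3://<bucket>/bear)
    parser.add_argument('--train',
                        default=os.environ.get('SM_CHANNEL_TRAINING', './data'))
    # Directory whose contents SageMaker packages into model.tar.gz
    parser.add_argument('--model-dir',
                        default=os.environ.get('SM_MODEL_DIR', './model'))
    return parser.parse_args(argv)
```

Inside a training container, `args.train` would then be expected to contain the `train`, `val` and `test` folders uploaded earlier.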

### Once the training is done, you can proceed to Lab 3

## (Optional) Optimize your edge deployment model using Amazon SageMaker Neo

You can also optimize the trained model for DeepLens using Amazon SageMaker Neo. The following is a sample script.

```python
neo_model = gluon_bear_classification.compile_model(target_instance_family='deeplens',
                                                    input_shape={'data': [1, 3, 224, 224]},  # batch size 1, 3 channels, 224x224 images
                                                    output_path=model_artifacts_location + '/neo',
                                                    framework='mxnet', framework_version='1.4.1')
```

## (Optional) Creating an inference Endpoint

After training, we build an `MXNetModel` from the trained model artifacts and deploy it, which returns an `MXNetPredictor`. Deploying creates a SageMaker endpoint, a hosted prediction service that we can use to perform inference.

The arguments to the deploy function allow us to set the number and type of instances that will be used for the Endpoint. These do not need to be the same as the values we used for the training job. For example, you can train a model on a set of GPU-based instances, and then deploy the Endpoint to a fleet of CPU-based instances. Here we will deploy the model to a single ml.c4.xlarge instance.

In [None]:
from sagemaker.mxnet.model import MXNetModel

training_job_name = gluon_bear_classification.latest_training_job.name
sagemaker_model = MXNetModel(model_data=model_artifacts_location + '/{}/output/model.tar.gz'.format(training_job_name),
                             role=sagemaker.get_execution_role(),
                             framework_version="1.4.1",
                             py_version='py3',
                             entry_point="bear-classification.py")

In [None]:
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.c4.xlarge')

## Predict with the finetuned model

We can now test the fine-tuned model by sending a few sample images to the endpoint.

In [None]:
import json
import numpy as np
from PIL import Image as PILImage

def test_image(filename):
    # Convert to RGB so grayscale or RGBA images also yield 3 channels
    data = PILImage.open(filename).convert('RGB')

    predictor.content_type = 'application/json'
    predictor.accept = 'application/json'

    # Resize to 224x224, scale to [0, 1], reorder HWC -> CHW and add a batch axis
    payload = np.expand_dims((np.asarray(data.resize((224, 224))).astype('float16') / 255.0).transpose((2, 0, 1)), 0)
    result = predictor.predict(payload)[0]
    # The result contains the probabilities for all classes;
    # find the class with the maximum probability and print it
    index = np.argmax(result)
    object_categories = ['brown', 'no', 'polar']
    print("Result: label - " + object_categories[index] + ", probability - " + str(result[index]))

In [None]:
!wget -O /tmp/test.jpg https://19mvmv3yn2qc2bdb912o1t2n-wpengine.netdna-ssl.com/science/files/2013/12/tnc_17745326_preview-1260x708.jpg
file_name = '/tmp/test.jpg'
test_image(file_name)
from IPython.display import Image
Image(file_name)

In [None]:
!wget -O /tmp/test_2.jpg https://www.nps.gov/lacl/learn/nature/images/Image-w-cred-cap_-1200w-_-Brown-Bear-page_-brown-bear-in-fog_2.jpg

file_name = '/tmp/test_2.jpg'
test_image(file_name)
from IPython.display import Image
Image(file_name)

In [None]:
!wget -O /tmp/test_3.jpg https://www.dollargeneral.com/media/catalog/product/cache/image/beff4985b56e3afdbeabfc89641a4582/p/l/plush_teddy-bear_giant_092018.jpg
file_name = '/tmp/test_3.jpg'
test_image(file_name)
from IPython.display import Image
Image(file_name)

## Cleanup
After you have finished with this example, remember to delete the prediction endpoint to release the instance(s) associated with it.

In [None]:
predictor.delete_endpoint()

## (Optional) Lab 2. Train a Classification Model Deep-Dive

## Install GluonCV and the required Python packages
See the links below for GluonCV's `model_zoo` and `utils` packages.
- `model_zoo`: [https://gluon-cv.mxnet.io/model_zoo/index.html](https://gluon-cv.mxnet.io/model_zoo/index.html)
- `utils`: [https://gluon-cv.mxnet.io/api/utils.html](https://gluon-cv.mxnet.io/api/utils.html)

In [None]:
# Run only once to install the gluoncv package with the following code:
!pip install gluoncv==0.5.0 
!pip install tqdm

Hyperparameters
----------

First, let's import all other necessary libraries.



In [None]:
import mxnet as mx
import numpy as np
import os, time, shutil

from mxnet import gluon, image, init, nd
from mxnet import autograd as ag
from mxnet.gluon import nn
from mxnet.gluon.data.vision import transforms
from gluoncv.utils import makedirs
from gluoncv.model_zoo import get_model

We set the hyperparameters as follows:



In [None]:
# classes: brown bear, no bear, polar bear
classes = 3
epochs = 10
lr = 0.001
per_device_batch_size = 128
num_gpus = len(list(mx.test_utils.list_gpus()))
num_workers = 8
ctx = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()]
batch_size = per_device_batch_size * max(num_gpus, 1)

Things to keep in mind:

1. You can change the `epochs` value to a larger number in your experiments.

2. `per_device_batch_size` can be a larger number. If you get a `cudaMalloc failed: out of memory` error in the training loop, try decreasing this value (e.g. to 64 or 32).

3. `num_gpus` is detected automatically; remember to tune `num_workers` (the number of data-loading worker processes) to your machine.

4. A pre-trained model already starts from good weights, so we can use a small `lr`.
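`batch_size` is the global batch: with `per_device_batch_size = 128` and, say, 2 GPUs it is 256, and `gluon.utils.split_and_load` hands each GPU a 128-sample slice. A pure-Python sketch of that split, intended to mirror MXNet's behaviour with `even_split=False` (equal slices, with the last slice absorbing any remainder); `split_batch` is a hypothetical helper, not an MXNet API:

```python
def split_batch(batch, num_devices):
    """Split a batch along its first axis into one slice per device.

    Hypothetical sketch of split_and_load(..., even_split=False):
    every slice gets len(batch) // num_devices samples and the last
    slice also takes the remainder. Assumes a non-empty batch.
    """
    size = len(batch)
    if size < num_devices:
        # Fewer samples than devices: one sample per device, extra devices idle
        num_devices = size
    step = size // num_devices
    return [batch[i * step:(i + 1) * step] if i < num_devices - 1
            else batch[i * step:size]
            for i in range(num_devices)]
```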

Data Augmentation
-----------------

In transfer learning, data augmentation can also help.
We use the following augmentation in training:

1. Randomly crop the image and resize it to 224x224
2. Randomly flip the image horizontally
3. Randomly jitter color and add noise
4. Transpose the data from height \* width \* num_channels to num_channels \* height \* width, and map values from [0, 255] to [0, 1]
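Step 4 is what `transforms.ToTensor()` does at the end of both pipelines. A NumPy sketch of the same conversion (the `to_tensor` name is hypothetical; the real transform operates on MXNet `NDArray`s):

```python
import numpy as np

def to_tensor(img_hwc):
    """Sketch of ToTensor: (H, W, C) uint8 in [0, 255] -> (C, H, W) float32 in [0, 1]."""
    return img_hwc.astype(np.float32).transpose((2, 0, 1)) / 255.0
```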



In [None]:
jitter_param = 0.4
lighting_param = 0.1

transform_train = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomFlipLeftRight(),
    transforms.RandomColorJitter(brightness=jitter_param,
                                 contrast=jitter_param,
                                 saturation=jitter_param),
    transforms.RandomLighting(lighting_param),
    transforms.ToTensor(),
])

transform_test = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
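For evaluation, `Resize(256)` (which, with its default `keep_ratio=False`, scales both edges to 256) followed by `CenterCrop(224)` keeps the central 224x224 window and discards a 16-pixel border on each side. A NumPy sketch of the crop step; `center_crop` is a hypothetical helper:

```python
import numpy as np

def center_crop(img_hwc, size=224):
    """Return the central size x size window of an (H, W, C) image."""
    h, w = img_hwc.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img_hwc[top:top + size, left:left + size]
```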

With the data augmentation functions, we can define our data loaders:



In [None]:
path = 'data'

train_path = os.path.join(path, 'train')
val_path = os.path.join(path, 'val')
test_path = os.path.join(path, 'test')

train_data = gluon.data.DataLoader(
    gluon.data.vision.ImageFolderDataset(train_path).transform_first(transform_train),
    batch_size=batch_size, shuffle=True, num_workers=num_workers)

val_data = gluon.data.DataLoader(
    gluon.data.vision.ImageFolderDataset(val_path).transform_first(transform_test),
    batch_size=batch_size, shuffle=False, num_workers=num_workers)

test_data = gluon.data.DataLoader(
    gluon.data.vision.ImageFolderDataset(test_path).transform_first(transform_test),
    batch_size=batch_size, shuffle=False, num_workers=num_workers)

Note that only ``train_data`` uses ``transform_train``, while
``val_data`` and ``test_data`` use ``transform_test`` to produce deterministic
results for evaluation.
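`ImageFolderDataset` derives each image's label from the name of its sub-folder, enumerating the sub-folders in sorted order. That is why the class list used for predictions in this notebook is `['brown', 'no', 'polar']`: it is simply alphabetical. A standard-library sketch of that mapping (`folder_labels` is a hypothetical helper):

```python
import os

def folder_labels(root):
    """Map class sub-folders of `root` to integer labels in sorted order.

    Sketch of the ImageFolderDataset convention: plain files are skipped,
    and with brown/, no/ and polar/ the labels are 0, 1 and 2.
    """
    synsets = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    return {name: idx for idx, name in enumerate(synsets)}
```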

Model and Trainer
-----------------

We use a pre-trained [``MobileNet1.0``](https://arxiv.org/pdf/1704.04861.pdf) model, which is useful for mobile and embedded vision applications due to its smaller model size and complexity.

![alt text](https://3.bp.blogspot.com/-ujGePiv1gZ8/WUBjrgwrPmI/AAAAAAAAB14/zOw9URnrMnIbe7Vv8ftYT4PsnH7S-gJIQCLcBGAs/s1600/image1.png "MobileNet1.0")

Figure 1. MobileNet use cases. Reprinted from “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam, 2017, Retrieved from https://arxiv.org/abs/1704.04861. Copyright 2017 by Google.

In [None]:
model_name = 'MobileNet1.0'

Here we introduce a common technique in transfer learning: fine-tuning. As shown in the figure below, **fine-tuning** consists of the following steps:

1. Load the pre-trained model (e.g. `MobileNet1.0`).
2. Replace the output layer with a new one whose output size equals the number of categories in the target dataset, and randomly initialize the parameters of this layer.
3. Train the target model on the target dataset.
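In step 2, `nn.Dense(classes)` replaces the original ImageNet output layer; for `MobileNet1.0` it maps the 1024-dimensional pooled features to 3 class scores, and `init.Xavier()` then samples its weights. A NumPy sketch of that initialization using the default settings of `mx.init.Xavier` (`xavier_uniform` is a hypothetical helper; in MXNet the bias of the new layer starts at zero):

```python
import numpy as np

def xavier_uniform(out_units, in_units, magnitude=3.0, rng=None):
    """Sample an (out_units, in_units) Dense weight matrix with Xavier/Glorot
    uniform initialization, using mx.init.Xavier's defaults
    (rnd_type='uniform', factor_type='avg', magnitude=3):
        W ~ U(-s, s),  s = sqrt(magnitude / ((fan_in + fan_out) / 2))
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = np.sqrt(magnitude / ((in_units + out_units) / 2.0))
    return rng.uniform(-scale, scale, size=(out_units, in_units))

# New 3-way head on 1024-dimensional features; the bias starts at zero
weight = xavier_uniform(3, 1024)
bias = np.zeros(3)
```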

![alt text](https://www.d2l.ai/_images/finetune.svg "Fine tuning")

In [None]:
finetune_net = get_model(model_name, pretrained=True)

with finetune_net.name_scope():
    finetune_net.output = nn.Dense(classes)
finetune_net.output.initialize(init.Xavier(), ctx=ctx)
finetune_net.collect_params().reset_ctx(ctx)
finetune_net.hybridize()

trainer = gluon.Trainer(finetune_net.collect_params(), 'adam',
                        {'learning_rate': lr})

metric = mx.metric.Accuracy()
L = gluon.loss.SoftmaxCrossEntropyLoss()
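`SoftmaxCrossEntropyLoss` applies a softmax to the raw `Dense` outputs and takes the negative log-probability of the true class. A NumPy sketch of that per-sample loss for integer labels (the default `sparse_label=True` case); the function name is hypothetical:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-sample loss -log(softmax(logits)[label]) for integer labels."""
    # Subtract the row maximum for numerical stability (does not change the softmax)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]
```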

We define an evaluation function for validation and testing.

In [None]:
def test(net, val_data, ctx):
    metric = mx.metric.Accuracy()
    for i, batch in enumerate(val_data):
        data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0, even_split=False)
        label = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0, even_split=False)
        outputs = [net(X) for X in data]
        metric.update(label, outputs)

    return metric.get()
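`mx.metric.Accuracy` counts how often the arg-max of the outputs matches the label, accumulating across batches until `reset()` is called. A simplified NumPy sketch that takes single arrays rather than the per-device lists produced by `split_and_load` (`RunningAccuracy` is a hypothetical name):

```python
import numpy as np

class RunningAccuracy:
    """Minimal sketch of mx.metric.Accuracy: accumulate correct/total over batches."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, labels, outputs):
        # outputs: (batch, classes) scores; the predicted class is the arg-max
        preds = np.argmax(outputs, axis=1)
        self.correct += int((preds == np.asarray(labels)).sum())
        self.total += len(labels)

    def get(self):
        # Same (name, value) pair shape that mx.metric.Accuracy.get() returns
        value = self.correct / self.total if self.total else float('nan')
        return 'accuracy', value
```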

Training Loop
-------------

Following is the main training loop.

In [None]:
num_batch = len(train_data)

for epoch in range(epochs):
    tic = time.time()
    train_loss = 0
    metric.reset()

    for i, batch in enumerate(train_data):
        data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0, even_split=False)
        label = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0, even_split=False)
        with ag.record():
            outputs = [finetune_net(X) for X in data]
            loss = [L(yhat, y) for yhat, y in zip(outputs, label)]
        for l in loss:
            l.backward()

        trainer.step(batch_size)
        train_loss += sum([l.mean().asscalar() for l in loss]) / len(loss)

        metric.update(label, outputs)

    _, train_acc = metric.get()
    train_loss /= num_batch

    _, val_acc = test(finetune_net, val_data, ctx)

    print('[Epoch %d] Train-acc: %.3f, loss: %.3f | Val-acc: %.3f | time: %.1f' %
          (epoch, train_acc, train_loss, val_acc, time.time() - tic))

_, test_acc = test(finetune_net, test_data, ctx)
print('[Finished] Test-acc: %.3f' % (test_acc))
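Each `trainer.step(batch_size)` call rescales the accumulated gradient by `1 / batch_size` and then applies the Adam update chosen above. The sketch below is the textbook Adam step from Kingma and Ba; MXNet's implementation folds the bias correction into the learning rate, which changes only how `eps` enters, so treat this as an illustration rather than MXNet's exact arithmetic. `adam_step` is a hypothetical helper.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for parameters w at step t >= 1.

    `grad` is assumed to be already divided by the batch size, as
    trainer.step(batch_size) does. Returns the updated (w, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

On the first step the bias-corrected update is close to `lr * sign(grad)`, so early Adam steps have roughly the same size regardless of the gradient's scale.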

**NOTE**: If you get a `cudaMalloc failed: out of memory` error in the training loop, you can:
1. Shut down the previous Lab session. The Notebook Dashboard has a tab named `Running` that shows all the running notebooks and lets you shut them down (by clicking the `Shutdown` button).

   ![](./images/jupyter_running.png)
2. Try decreasing the `per_device_batch_size` value (e.g. to 64 or 32).

Predict with finetuned model
-------------

Let's test the fine-tuned network on a few images downloaded from the web.

In [None]:
%matplotlib inline
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

from gluoncv.utils import viz, download

Let's test with the first picture.

In [None]:
plt.rcParams['figure.figsize'] = (15, 9)

url = 'https://cdn.pixabay.com/photo/2019/07/14/12/55/brown-bear-swimming-4337049_960_720.jpg'

file = download(url, path='.')
img = image.imread(file)

viz.plot_image(img)
plt.show()

In [None]:
transform_fn = transforms.Compose([
    transforms.Resize(size=(224, 224)),
    transforms.ToTensor(),
])

img = transform_fn(img)

In [None]:
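# Run inference on the first GPU if one is available, otherwise on the CPU
ctx = mx.gpu(0) if num_gpus > 0 else mx.cpu()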
pred = finetune_net(img.expand_dims(0).as_in_context(ctx))

In [None]:
class_names = ['brown', 'no', 'polar']

topK = 3
ind = nd.topk(pred, k=topK).astype('int')[0]
for i in range(topK):
    print('[%s], with probability %.1f%%' %
          (class_names[ind[i].asscalar()], nd.softmax(pred)[0][ind[i]].asscalar() * 100))
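`finetune_net` ends in a plain `Dense` layer, so its outputs are unnormalized scores; the cell above applies `nd.softmax` to turn them into probabilities before printing the top classes. The same post-processing in NumPy, as a hypothetical helper `top_k`:

```python
import numpy as np

def top_k(logits, class_names, k=3):
    """Return (class_name, probability) pairs for the k most likely classes."""
    scores = np.asarray(logits, dtype=np.float64)
    # Softmax over the raw scores (shifted by the max for numerical stability)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Indices sorted by descending probability
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in order]
```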

Let's test with another picture.

In [None]:
url = 'https://cdn.pixabay.com/photo/2016/09/12/17/51/polar-bears-1665367_960_720.jpg'

file = download(url, path='.')
img = image.imread(file)

viz.plot_image(img)
plt.show()

In [None]:
img = transform_fn(img)
pred = finetune_net(img.expand_dims(0).as_in_context(ctx))

ind = nd.topk(pred, k=topK).astype('int')[0]
for i in range(topK):
    print('[%s], with probability %.1f%%' %
          (class_names[ind[i].asscalar()], nd.softmax(pred)[0][ind[i]].asscalar() * 100))

This time let's try a more difficult example.

In [None]:
url = 'https://cdn.pixabay.com/photo/2015/12/12/14/57/giant-rubber-bear-1089612_960_720.jpg'

file = download(url, path='.')
img = image.imread(file)

viz.plot_image(img)
plt.show()

In [None]:
img = transform_fn(img)
pred = finetune_net(img.expand_dims(0).as_in_context(ctx))

ind = nd.topk(pred, k=topK).astype('int')[0]
for i in range(topK):
    print('[%s], with probability %.1f%%' %
          (class_names[ind[i].asscalar()], nd.softmax(pred)[0][ind[i]].asscalar() * 100))

Congratulations! You have built your own image classification model based on a custom dataset.

## Converting the trained model for DeepLens deployment

Next, we save the trained model in a format that can be deployed to your DeepLens device. Specifically, the model symbol and parameter files need to be packaged together.

In [None]:
s3_bucket = 'deeplens-'  # complete with the name of your bucket (it begins with 'deeplens-')

We name the model `mobilenet1.0-bear`; this name will be used within the Lambda function of the DeepLens project package. We also add a softmax layer on top of the last layer, which is required by Intel OpenVINO.

In [None]:
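model_name = 'mobilenet1.0-bear'

# Writes mobilenet1.0-bear-symbol.json and mobilenet1.0-bear-0000.params
finetune_net.export(model_name)
# Rebuild the symbol with a softmax output layer and overwrite the exported symbol file
net_with_softmax = finetune_net(mx.sym.var('data'))
net_with_softmax = mx.sym.SoftmaxOutput(data=net_with_softmax, name='softmax')
net_with_softmax.save('./{}-symbol.json'.format(model_name))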

Let's tar the two files and upload the archive to the Amazon S3 bucket so that DeepLens can reference it in the next lab.

In [None]:
!tar cvfz ./{model_name}.tar.gz ./{model_name}-*

In [None]:
!aws s3 cp {model_name}.tar.gz s3://{s3_bucket}/models/
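If you prefer to build the archive from Python instead of the `tar` shell command, the standard-library `tarfile` module can produce a `.tar.gz` containing the same two files. A minimal sketch; `package_model` is a hypothetical helper, and it stores the files at the top level of the archive:

```python
import glob
import os
import tarfile

def package_model(model_name, out_dir='.'):
    """Bundle <model_name>-symbol.json and <model_name>-*.params into <model_name>.tar.gz."""
    # '<model_name>-*' matches the exported files but not '<model_name>.tar.gz'
    members = sorted(glob.glob(os.path.join(out_dir, model_name + '-*')))
    archive = os.path.join(out_dir, model_name + '.tar.gz')
    with tarfile.open(archive, 'w:gz') as tar:
        for path in members:
            # Store each file under its bare name, without directory components
            tar.add(path, arcname=os.path.basename(path))
    return archive
```

Calling `package_model('mobilenet1.0-bear')` in the notebook directory would write `./mobilenet1.0-bear.tar.gz`, ready for the same `aws s3 cp` upload.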