## Training Notebook

This notebook trains a simple model to classify handwritten digits from the MNIST dataset. It is the code used to train the model included with the templates. The result is meant as a starter model that shows how to set up Serverless applications for inference. For a deeper understanding of how to train a good MNIST model, we recommend the literature listed on the [MNIST website](http://yann.lecun.com/exdb/mnist/). The dataset is made available under a [Creative Commons Attribution-Share Alike 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license.

In [1]:
# Install required dependencies

! pip install -q torch==1.8.0 torchvision==0.9.0

In [2]:
# Torchvision provides an easy way to load the MNIST dataset into DataLoaders

import torch
import torchvision
from torchvision.transforms import ToTensor

# mini-batch size when training and testing
mini_batch_size = 64

train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST('./mnist_data/', train=True, download=True, transform=ToTensor()),
    batch_size=mini_batch_size)

test_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST('./mnist_data/', train=False, download=True, transform=ToTensor()),
    batch_size=mini_batch_size)
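
As a rough check on what these loaders produce, the sketch below works out the number of mini-batches per pass over each split. It assumes the standard MNIST split of 60,000 training and 10,000 test images and the `DataLoader` default of keeping the final, smaller batch (`drop_last=False`). The helper name `batch_layout` is illustrative and not part of torchvision.

```python
import math

def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the final batch) when no batch is dropped."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

mini_batch_size = 64  # same value as in the cell above

# Assumed split sizes: 60,000 training images and 10,000 test images
train_batches, train_last = batch_layout(60_000, mini_batch_size)
test_batches, test_last = batch_layout(10_000, mini_batch_size)

print(train_batches, train_last)  # 938 32
print(test_batches, test_last)    # 157 16
```

The final batch of each split is smaller than 64, so code that depends on the batch size should read it from the tensor rather than assume `mini_batch_size`.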


## PyTorch Model Training

For this example, we will train a simple CNN classifier using PyTorch to classify the MNIST digits. We will then save the trained model in the TorchScript format. This is the same as the starter model file included with the SAM templates.

In [6]:
import torch.nn as nn
import torch.nn.functional as F

# Use a GPU if set up on this machine
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

# We'll start with building a model
class Model(nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        # Two convolutional blocks, each followed by batch normalization
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.convbn1 = nn.BatchNorm2d(32)

        self.conv2 = nn.Conv2d(32, 32, kernel_size=3)
        self.convbn2 = nn.BatchNorm2d(32)

        layer_size = 100

        # Five fully connected layers, each followed by batch normalization
        self.layer1 = nn.Linear(800, layer_size)
        self.bn1 = nn.BatchNorm1d(layer_size)

        self.layer2 = nn.Linear(layer_size, layer_size)
        self.bn2 = nn.BatchNorm1d(layer_size)

        self.layer3 = nn.Linear(layer_size, layer_size)
        self.bn3 = nn.BatchNorm1d(layer_size)

        self.layer4 = nn.Linear(layer_size, layer_size)
        self.bn4 = nn.BatchNorm1d(layer_size)

        self.layer5 = nn.Linear(layer_size, layer_size)
        self.bn5 = nn.BatchNorm1d(layer_size)

        # Output layer: one raw score (logit) per digit class
        self.smax = nn.Linear(layer_size, 10)

    def forward(self, x):
        x = self.convbn1(F.relu(F.max_pool2d(self.conv1(x), 2)))
        x = F.dropout2d(x, training=self.training)

        x = self.convbn2(F.relu(F.max_pool2d(self.conv2(x), 2)))
        x = F.dropout2d(x, training=self.training)

        # Flatten the 32 x 5 x 5 feature maps into 800 features per image
        x = x.view(-1, 800)
        x = F.dropout(self.bn1(F.relu(self.layer1(x))), training=self.training)
        x = F.dropout(self.bn2(F.relu(self.layer2(x))), training=self.training)
        x = F.dropout(self.bn3(F.relu(self.layer3(x))), training=self.training)
        x = F.dropout(self.bn4(F.relu(self.layer4(x))), training=self.training)
        x = F.dropout(self.bn5(F.relu(self.layer5(x))), training=self.training)

        return self.smax(x)

model = Model().to(device)
print(model)

Using cuda device
Model(
 (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
 (convbn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
 (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
 (convbn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
 (layer1): Linear(in_features=800, out_features=100, bias=True)
 (bn1): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
 (layer2): Linear(in_features=100, out_features=100, bias=True)
 (bn2): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
 (layer3): Linear(in_features=100, out_features=100, bias=True)
 (bn3): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
 (layer4): Linear(in_features=100, out_features=100, bias=True)
 (bn4): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
 (layer5): Linear(in_features=100, out_features=100, bias=Tr
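
The `800` in `nn.Linear(800, layer_size)` and `x.view(-1, 800)` follows from the layer shapes. Here is a minimal sketch of that arithmetic, assuming 28x28 MNIST inputs and the PyTorch defaults used in the model (convolution stride 1 with no padding, pooling stride equal to the pooling kernel size). The helper names `conv_out` and `pool_out` are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Output width/height of max pooling with stride == kernel and no padding."""
    return (size - kernel) // kernel + 1

side = 28                               # MNIST images are 28x28
side = pool_out(conv_out(side, 3), 2)   # conv1: 28 -> 26, pool: 26 -> 13
side = pool_out(conv_out(side, 3), 2)   # conv2: 13 -> 11, pool: 11 -> 5
flat_features = 32 * side * side        # 32 output channels from conv2

print(side, flat_features)  # 5 800
```

If the kernel size, channel count, or input resolution changes, the first `nn.Linear` layer and the `view` call would need the recomputed value.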

In [7]:
# Define some hand-tuned hyperparameters
# (the batch size was already defined above)

epochs = 10
learning_rate = 10**-4
log_step = 200

# Define our loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Single training epoch loop
def train(train_loader, model, loss_fn, optimizer):
    size = len(train_loader.dataset)

    for batch, (X, y) in enumerate(train_loader):
        X, y = X.to(device), y.to(device)

        # Forward pass and compute loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagate loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Log the loss every log_step mini-batches
        if batch % log_step == 0:
            loss, current = loss.item(), batch * len(X)
            print(f'loss: {loss} [{current}/{size}]')


# Evaluate accuracy and loss on the test set
def test(test_loader, model):
    size = len(test_loader.dataset)
    # Switch to evaluation mode (dropout off, BatchNorm uses running statistics).
    # train() never calls model.train(), so the model stays in this mode for
    # every epoch after the first.
    model.eval()

    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in test_loader:
            X, y = X.to(device), y.to(device)
            pred = model(X)

            # loss_fn returns the mean loss over the mini-batch
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    # Sum of per-batch mean losses divided by the number of samples
    test_loss /= size
    correct /= size

    print(f'Test accuracy: {100*correct}%, avg loss: {test_loss}')

# Driver loop to start training
for epoch_no in range(epochs):
    print(f'\nEpoch {epoch_no}\n---------------------------------------------')

    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model)

print('Done!')


Epoch 0
---------------------------------------------
loss: 2.399322748184204 [0/60000]
loss: 2.5006144046783447 [12800/60000]
loss: 2.528806447982788 [25600/60000]
loss: 2.287709951400757 [38400/60000]
loss: 2.41180419921875 [51200/60000]
Test accuracy: 14.360000000000001%, avg loss: 0.0338871225476265

Epoch 1
---------------------------------------------
loss: 2.191370964050293 [0/60000]
loss: 0.7450801730155945 [12800/60000]
loss: 0.2356947809457779 [25600/60000]
loss: 0.13707228004932404 [38400/60000]
loss: 0.2939474582672119 [51200/60000]
Test accuracy: 95.89%, avg loss: 0.0021439245976740493

Epoch 2
---------------------------------------------
loss: 0.14360113441944122 [0/60000]
loss: 0.08505825698375702 [12800/60000]
loss: 0.06298833340406418 [25600/60000]
loss: 0.07103477418422699 [38400/60000]
loss: 0.18312042951583862 [51200/60000]
Test accuracy: 97.7%, avg loss: 0.0011607079559122213

Epoch 3
---------------------------------------------
loss: 0.08531039953231812 [0/6000
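
The `avg loss` values in this log are scaled down by roughly the batch size compared with the per-batch training losses, because `test()` sums per-batch mean losses and then divides by the number of samples rather than the number of batches. The pure-Python sketch below illustrates the difference on made-up per-sample losses (a constant 2.0, chosen only for illustration). The function names are hypothetical.

```python
def batch_means(losses, batch_size):
    """Mean loss of each consecutive mini-batch (the last one may be smaller)."""
    means = []
    for start in range(0, len(losses), batch_size):
        chunk = losses[start:start + batch_size]
        means.append(sum(chunk) / len(chunk))
    return means

def loss_as_logged(losses, batch_size):
    """Summed batch means divided by the number of samples, as test() computes it."""
    return sum(batch_means(losses, batch_size)) / len(losses)

def loss_per_batch(losses, batch_size):
    """Summed batch means divided by the number of batches."""
    means = batch_means(losses, batch_size)
    return sum(means) / len(means)

# Illustrative data: 10,000 samples that each have a loss of exactly 2.0
losses = [2.0] * 10_000

logged = loss_as_logged(losses, 64)
per_batch = loss_per_batch(losses, 64)

print(logged, per_batch)  # 0.0314 2.0
```

Read this way, the first logged value of about 0.034 corresponds to a per-batch mean loss slightly above 2, in line with the training losses printed during the same epoch.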

We will save the model as a [TorchScript](https://pytorch.org/docs/stable/jit.html) file to export it for inference. Note that PyTorch offers [more ways](https://pytorch.org/tutorials/beginner/saving_loading_models.html?highlight=load#saving-loading-model-for-inference) of saving models, depending on your use case and execution environment.

In [31]:
# Move the model to the CPU and convert it to a TorchScript model
scripted_model = torch.jit.script(model.cpu())

# Sanity check that both models give the same results on random input
model.eval()
scripted_model.eval()

for i in range(1000):
    X = torch.randn(1, 1, 28, 28)

    pt_ans = torch.argmax(model(X)).item()
    ts_ans = torch.argmax(scripted_model(X)).item()
    assert pt_ans == ts_ans

# Save the scripted model to include with the template
scripted_model.save('digit_classifier.pt')
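
The saved model returns raw scores (logits) for the ten digit classes: the final `smax` layer is a plain `nn.Linear`, and `nn.CrossEntropyLoss` applies the softmax internally during training. Code that serves the model may therefore want to turn those scores into probabilities and a predicted digit. The sketch below shows that step in pure Python on made-up logits. The values and the `softmax` helper are assumptions for illustration, not output from the trained model.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for the ten digit classes (0-9), for illustration only
logits = [-1.2, 0.3, 4.1, 0.0, -0.5, 1.7, -2.0, 0.9, 0.2, -0.8]

probs = softmax(logits)

# The predicted digit is the index of the largest probability
digit = max(range(len(probs)), key=probs.__getitem__)

print(digit)  # 2
```

Softmax preserves the ordering of the scores, so taking the argmax of the logits directly, as the sanity check above does, gives the same digit.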