{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning Accelerator - Computer Vision - Lecture 1\n",
"\n",
"## Neural Networks with Gluon\n",
"\n",
"In this notebook, we build, train and validate a Neural Network in [Gluon](https://mxnet.apache.org/api/python/docs/tutorials/packages/gluon/index.html), a library that provides a clear, concise, and simple API for deep learning. \n",
"\n",
"1. Implementing a neural network with Gluon \n",
"2. Loss Functions\n",
"3. Training\n",
"4. Example - Binary Classification\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"! pip install -q -r ../requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Implementing a neural network with Gluon\n",
"\n",
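"Let's implement a simple neural network with two hidden layers of 64 units each, using `nn.Sequential`, which chains layers in the order they are added. The network is meant for 3 input features and has 2 hidden layers and 1 output layer, with dropout applied after each hidden layer. Since `in_units=3` is commented out below, Gluon infers the input size from the first batch it receives, which is why the printout shows `None -> 64`."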
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sequential(\n",
" (0): Dense(None -> 64, Activation(tanh))\n",
" (1): Dropout(p = 0.4, axes=())\n",
" (2): Dense(None -> 64, Activation(tanh))\n",
" (3): Dropout(p = 0.3, axes=())\n",
" (4): Dense(None -> 1, linear)\n",
")\n"
]
}
],
"source": [
"from mxnet.gluon import nn\n",
"\n",
"net = nn.Sequential()\n",
"\n",
"net.add(nn.Dense(64,                  # Dense layer-1 with 64 units\n",
"                 # in_units=3,        # Input size of 3; omitted here, so it is inferred\n",
"                 activation='tanh'),  # Tanh activation is applied\n",
"        nn.Dropout(.4),               # Randomly drop 40% of layer-1 outputs during training\n",
"\n",
"        nn.Dense(64,                  # Dense layer-2 with 64 units\n",
"                 activation='tanh'),  # Tanh activation is applied\n",
"        nn.Dropout(.3),               # Randomly drop 30% of layer-2 outputs during training\n",
"\n",
"        nn.Dense(1))                  # Output layer with a single unit\n",
"\n",
"print(net)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We first initialize the network's weight parameters randomly with the `initialize()` function. These values are the starting point from which an optimizer such as SGD trains the weights, so the initialization strategy matters: poorly scaled initial weights can make gradients vanish or explode as they pass through the layers.\n",
"\n",
"Here is a full list of [Initializers](https://mxnet.apache.org/api/python/docs/api/initializer/index.html). A commonly used one is *Xavier initialization*, which aims to keep the scale of activations and gradients roughly the same across all layers. (More technical details on [Xavier initialization](https://d2l.ai/chapter_multilayer-perceptrons/numerical-stability-and-init.html#xavier-initialization).) Let's use it in our implementation."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from mxnet import init\n",
"\n",
"net.initialize(init=init.Xavier())"
]
},
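{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what `init.Xavier()` does, the sketch below initializes a single `Dense` layer and checks its weights. It assumes the default `init.Xavier()` settings (uniform distribution, `factor_type='avg'`, `magnitude=3`), under which weights are drawn uniformly from `[-c, c]` with `c = sqrt(6 / (n_in + n_out))`. Gluon stores a `Dense` weight with shape `(units, in_units)`.\n",
"\n",
"```python\n",
"import math\n",
"from mxnet import init, nd\n",
"from mxnet.gluon import nn\n",
"\n",
"# A single layer with a known input size, so its weights are created immediately\n",
"layer = nn.Dense(64, in_units=3)\n",
"layer.initialize(init=init.Xavier())\n",
"\n",
"w = layer.weight.data()\n",
"# Assumed Xavier bound for the default settings: sqrt(6 / (fan_in + fan_out))\n",
"bound = math.sqrt(6.0 / (3 + 64))\n",
"max_abs = nd.abs(w).max().asscalar()\n",
"\n",
"print(w.shape)  # (64, 3)\n",
"print(max_abs <= bound)\n",
"```"
]
},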
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can access individual layers by their index with `net[layer_index]`:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dense(None -> 64, Activation(tanh))\n",
"Dropout(p = 0.4, axes=())\n",
"Dense(None -> 64, Activation(tanh))\n",
"Dropout(p = 0.3, axes=())\n",
"Dense(None -> 1, linear)\n"
]
}
],
"source": [
"print(net[0])\n",
"print(net[1])\n",
"print(net[2])\n",
"print(net[3])\n",
"print(net[4])"
]
},
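{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the first `Dense` layer was created without `in_units`, its weights cannot be allocated until Gluon sees an input; this is called deferred initialization. The sketch below rebuilds the same architecture, passes a dummy batch of 2 samples with 3 features (an arbitrary choice for illustration), and then inspects the weight of the first layer.\n",
"\n",
"```python\n",
"from mxnet import init, nd\n",
"from mxnet.gluon import nn\n",
"\n",
"# Same architecture as above; the input size is not specified\n",
"net = nn.Sequential()\n",
"net.add(nn.Dense(64, activation='tanh'), nn.Dropout(.4),\n",
"        nn.Dense(64, activation='tanh'), nn.Dropout(.3),\n",
"        nn.Dense(1))\n",
"net.initialize(init=init.Xavier())\n",
"\n",
"# Dummy batch: 2 samples with 3 features each\n",
"x = nd.random.uniform(shape=(2, 3))\n",
"out = net(x)  # the first forward pass completes the deferred initialization\n",
"\n",
"print(out.shape)                   # (2, 1)\n",
"print(net[0].weight.data().shape)  # (64, 3)\n",
"```\n",
"\n",
"After this forward pass, printing `net` again should show `Dense(3 -> 64, Activation(tanh))` for the first layer."
]
},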
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Loss Functions\n",
"\n",
"We choose a [loss function](https://d2l.ai/chapter_linear-networks/linear-regression.html#loss-function) that matches our problem. A full list of the `Loss` functions supported in Gluon is available [here](https://mxnet.incubator.apache.org/api/python/docs/api/gluon/loss/index.html).\n",
"\n",
"Let's go over some popular loss functions and see how to call a built-in loss function:\n",
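"\n",
"__Binary Cross-entropy Loss:__ A commonly used loss function for binary classification. By default it expects raw scores (logits); pass `from_sigmoid=True` if the network already applies a sigmoid.\n",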
"\n",
"```python\n",
"from mxnet.gluon.loss import SigmoidBinaryCrossEntropyLoss\n",
"loss = SigmoidBinaryCrossEntropyLoss()\n",
"```\n",
"\n",
"__Categorical Cross-entropy Loss:__ A commonly used loss function for multi-class classification. By default it applies the softmax to the network's raw scores internally.\n",
"\n",
"```python\n",
"from mxnet.gluon.loss import SoftmaxCrossEntropyLoss\n",
"loss = SoftmaxCrossEntropyLoss()\n",
"```\n",
"\n",
"__L2 Loss:__ The most commonly used loss function for regression problems.\n",
"\n",
"```python\n",
"from mxnet.gluon.loss import L2Loss\n",
"loss = L2Loss()\n",
"```"
]
},
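{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of the three losses above, the sketch below evaluates each of them on small hand-made arrays (the numbers are arbitrary). Each loss returns one value per sample. Note that `L2Loss` in Gluon includes a factor of 1/2, i.e. it computes `0.5 * (prediction - label)**2`.\n",
"\n",
"```python\n",
"from mxnet import nd\n",
"from mxnet.gluon.loss import SigmoidBinaryCrossEntropyLoss, SoftmaxCrossEntropyLoss, L2Loss\n",
"\n",
"# Binary: raw scores (logits) and 0/1 labels; from_sigmoid=False by default\n",
"bce = SigmoidBinaryCrossEntropyLoss()\n",
"bce_vals = bce(nd.array([[0.0], [2.0]]), nd.array([[1.0], [0.0]]))\n",
"\n",
"# Multi-class: raw scores for 3 classes and integer class indices\n",
"sce = SoftmaxCrossEntropyLoss()\n",
"sce_vals = sce(nd.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]), nd.array([0, 2]))\n",
"\n",
"# Regression: 0.5 * (prediction - label)**2 per sample\n",
"l2 = L2Loss()\n",
"l2_vals = l2(nd.array([[1.0], [2.0]]), nd.array([[0.0], [2.0]]))\n",
"\n",
"print(bce_vals)  # first entry is log(2), about 0.693\n",
"print(sce_vals)  # second entry is log(3), about 1.099 (all scores equal)\n",
"print(l2_vals)   # per-sample losses 0.5 and 0.0\n",
"```"
]
},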
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Training\n",
"\n",
"The `mxnet.gluon.Trainer` class applies an optimization algorithm to a network's parameters. The following `Trainer` trains a network with [Stochastic Gradient Descent (SGD)](https://d2l.ai/chapter_optimization/sgd.html) and a learning rate of 0.001.\n",
"\n",
"```python\n",
"from mxnet import gluon\n",
"\n",
"trainer = gluon.Trainer(net.collect_params(),\n",
" 'sgd', \n",
" {'learning_rate': 0.001}\n",
" )\n",
"```"
]
},
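{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the role of the `Trainer` concrete, here is a minimal sketch of a single SGD update on a toy regression problem. The one-layer network, the random data, and the use of `L2Loss` are illustrative assumptions, not part of the example that follows. The loss is recorded with `autograd.record()`, gradients are computed with `backward()`, and `trainer.step(batch_size)` updates the parameters, normalizing the gradients by `1/batch_size`.\n",
"\n",
"```python\n",
"from mxnet import autograd, gluon, init, nd\n",
"from mxnet.gluon import nn\n",
"from mxnet.gluon.loss import L2Loss\n",
"\n",
"net = nn.Sequential()\n",
"net.add(nn.Dense(1))\n",
"net.initialize(init=init.Xavier())\n",
"\n",
"trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001})\n",
"loss_fn = L2Loss()\n",
"\n",
"x = nd.random.uniform(shape=(8, 3))  # toy batch: 8 samples, 3 features\n",
"y = nd.random.uniform(shape=(8, 1))  # toy regression targets\n",
"\n",
"with autograd.record():              # record operations for automatic differentiation\n",
"    loss = loss_fn(net(x), y)\n",
"loss.backward()                      # gradients of the summed per-sample losses\n",
"\n",
"w_before = net[0].weight.data().copy()\n",
"trainer.step(batch_size=x.shape[0])  # one SGD update with gradients scaled by 1/8\n",
"w_after = net[0].weight.data()\n",
"\n",
"print(nd.abs(w_after - w_before).sum())  # total change in the weights\n",
"```"
]
},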
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Example - Binary Classification\n",
"\n",
"In this example, we train a neural network on a synthetic dataset generated with scikit-learn's `make_circles`: the points of the two classes lie on two concentric circles, and the network learns to tell them apart."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_circles\n",
"\n",
"X, y = make_circles(n_samples=750, shuffle=True, random_state=42, noise=0.05, factor=0.3)"
]
},
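{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before plotting, a quick sanity check of the generated data can help. In `make_circles`, label 0 marks the outer circle (radius 1) and label 1 the inner circle, whose radius is set by `factor=0.3`; `noise` is the standard deviation of the Gaussian noise added to each point. The sketch below regenerates the same data and checks the shape, the class balance, and the average distance from the origin per class.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.datasets import make_circles\n",
"\n",
"X, y = make_circles(n_samples=750, shuffle=True, random_state=42, noise=0.05, factor=0.3)\n",
"\n",
"print(X.shape)                     # (750, 2)\n",
"print(np.bincount(y.astype(int)))  # [375 375]: the two classes are balanced\n",
"\n",
"# Mean distance from the origin per class: close to 1.0 (label 0) and 0.3 (label 1)\n",
"radius = np.linalg.norm(X, axis=1)\n",
"outer_r = radius[y == 0].mean()\n",
"inner_r = radius[y == 1].mean()\n",
"print(round(outer_r, 2), round(inner_r, 2))\n",
"```"
]
},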
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First let's plot the simulated dataset."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"def plot_dataset(X, y, title):\n",
" \n",
" # Activate Seaborn visualization\n",
" sns.set()\n",
" \n",
" # Plot both classes: Class1->Blue, Class2->Red\n",
" plt.scatter(X[y==1, 0], X[y==1, 1], c='blue', label=\"class 1\")\n",
" plt.scatter(X[y==0, 0], X[y==0, 1], c='red', label=\"class 2\")\n",
" plt.legend(loc='upper right')\n",
" plt.xlabel('x1')\n",
" plt.ylabel('x2')\n",
" plt.xlim(-2, 2)\n",
" plt.ylim(-2, 2)\n",
" plt.title(title)\n",
" plt.show()\n",
" \n",
"plot_dataset(X, y, title=\"Dataset\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we import the necessary libraries and classes."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"import mxnet as mx\n",
"from mxnet import gluon, autograd\n",
"import mxnet.ndarray as nd\n",
"from mxnet.gluon.loss import SigmoidBinaryCrossEntropyLoss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
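"Then, we create the network. The two classes look easy to separate, although not with a straight line, so a small network with 2 hidden layers of 10 ReLU units each should be enough. The output layer has a single unit with a sigmoid activation, so it outputs the probability that the label is 1."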
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"from mxnet import init\n",
"from mxnet.gluon import nn\n",
"\n",
"context = mx.cpu() # Using CPU resource; mx.gpu() will use GPU resources if available\n",
"net = nn.Sequential()\n",
"net.add(nn.Dense(10, in_units=2, activation='relu'),\n",
" nn.Dense(10, activation='relu'),\n",
" nn.Dense(1, activation='sigmoid'))\n",
"net.initialize(init=init.Xavier(), ctx=context)"
]
},
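{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to confirm how small this network is: count its parameters. Each `Dense` layer has `in_units * units` weights plus `units` biases, so we expect 2*10+10 = 30, 10*10+10 = 110 and 10*1+1 = 11, i.e. 151 parameters in total. The second layer's input size is inferred, so the sketch below runs one dummy forward pass before counting.\n",
"\n",
"```python\n",
"import mxnet as mx\n",
"from mxnet import init, nd\n",
"from mxnet.gluon import nn\n",
"\n",
"net = nn.Sequential()\n",
"net.add(nn.Dense(10, in_units=2, activation='relu'),\n",
"        nn.Dense(10, activation='relu'),\n",
"        nn.Dense(1, activation='sigmoid'))\n",
"net.initialize(init=init.Xavier(), ctx=mx.cpu())\n",
"\n",
"net(nd.zeros((1, 2)))  # dummy forward pass to finish deferred initialization\n",
"\n",
"# Sum the number of elements in every weight and bias array\n",
"total = sum(p.data().size for p in net.collect_params().values())\n",
"print(total)  # 151\n",
"```"
]
},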
{
"cell_type": "markdown",
"metadata": {},
"source": [
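"Now let's split the data into a training set (the first 80%) and a validation set (the remaining 20%). The training set is wrapped in a `DataLoader` that serves mini-batches, while the small validation set is converted to `NDArray`s on the chosen context and evaluated in a single pass."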
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Split the dataset into two parts: 80%-20% split\n",
"split = int(len(X) * 0.8)  # Index of the first validation sample\n",
"X_train, X_val = X[:split], X[split:]\n",
"y_train, y_val = y[:split], y[split:]\n",
"\n",
"# Use Gluon DataLoaders to load the data in batches\n",
"batch_size = 4 # How many samples to use for each weight update \n",
"train_dataset = gluon.data.ArrayDataset(nd.array(X_train), nd.array(y_train))\n",
"train_loader = gluon.data.DataLoader(train_dataset, batch_size=batch_size)\n",
"\n",
"# Move validation dataset in CPU/GPU context\n",
"X_val = nd.array(X_val).as_in_context(context)\n",
"y_val = nd.array(y_val).as_in_context(context)"
]
},
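{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the `DataLoader` yields, the sketch below rebuilds the same training split and pulls out the first mini-batch. With 600 training samples and `batch_size = 4`, one epoch consists of 150 mini-batches, which is also the number of `trainer.step` calls per epoch in the training loop.\n",
"\n",
"```python\n",
"import mxnet.ndarray as nd\n",
"from mxnet import gluon\n",
"from sklearn.datasets import make_circles\n",
"\n",
"X, y = make_circles(n_samples=750, shuffle=True, random_state=42, noise=0.05, factor=0.3)\n",
"split = int(len(X) * 0.8)  # 600 training samples, 150 validation samples\n",
"train_dataset = gluon.data.ArrayDataset(nd.array(X[:split]), nd.array(y[:split]))\n",
"train_loader = gluon.data.DataLoader(train_dataset, batch_size=4)\n",
"\n",
"print(len(train_loader))          # 150 mini-batches per epoch (600 / 4)\n",
"data, target = next(iter(train_loader))\n",
"print(data.shape, target.shape)  # (4, 2) (4,)\n",
"```"
]
},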
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before training, the last step is to define the hyperparameters, the loss function, and the trainer."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"epochs = 50           # Number of passes over the full training set\n",
"learning_rate = 0.01 # Learning rate\n",
"\n",
"# Define the loss. As we used sigmoid in the last layer, use from_sigmoid=True\n",
"binary_cross_loss = SigmoidBinaryCrossEntropyLoss(from_sigmoid=True)\n",
"\n",
"# Define the trainer, SGD with learning rate\n",
"trainer = gluon.Trainer(net.collect_params(),\n",
" 'sgd',\n",
" {'learning_rate': learning_rate}\n",
" )"
]
},
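{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `from_sigmoid` flag has to match the network's last layer. The sketch below uses made-up scores to show that applying the sigmoid in the network and passing `from_sigmoid=True` gives the same loss as returning raw scores (logits) and keeping the default `from_sigmoid=False`. The logit-based form is generally preferred for numerical stability, since the loss can then combine the sigmoid and the logarithm in one stable expression.\n",
"\n",
"```python\n",
"from mxnet import nd\n",
"from mxnet.gluon.loss import SigmoidBinaryCrossEntropyLoss\n",
"\n",
"logits = nd.array([[-1.5], [0.2], [3.0]])  # made-up raw network outputs\n",
"labels = nd.array([[0.0], [1.0], [1.0]])\n",
"\n",
"# Option A (this notebook): the last layer applies a sigmoid, so outputs are probabilities\n",
"loss_a = SigmoidBinaryCrossEntropyLoss(from_sigmoid=True)(nd.sigmoid(logits), labels)\n",
"\n",
"# Option B: the network outputs raw scores and the loss applies the sigmoid internally\n",
"loss_b = SigmoidBinaryCrossEntropyLoss()(logits, labels)\n",
"\n",
"print(loss_a)\n",
"print(loss_b)  # same values up to floating-point error\n",
"```"
]
},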
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, it is time to train! We run through the training set 50 times (i.e., 50 epochs), recording the training and validation losses at every epoch and printing them for the first epoch and every 10th epoch after that."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 0. Train_loss 0.699338 Validation_loss 0.689151 Seconds 0.483013\n",
"Epoch 9. Train_loss 0.460032 Validation_loss 0.436385 Seconds 0.376627\n",
"Epoch 19. Train_loss 0.084463 Validation_loss 0.082216 Seconds 0.312815\n",
"Epoch 29. Train_loss 0.026741 Validation_loss 0.028140 Seconds 0.368647\n",
"Epoch 39. Train_loss 0.013978 Validation_loss 0.015424 Seconds 0.335571\n",
"Epoch 49. Train_loss 0.009026 Validation_loss 0.010315 Seconds 0.411211\n"
]
}
],
"source": [
"import time\n",
"\n",
"train_losses = []\n",
"val_losses = []\n",
"for epoch in range(epochs):\n",
" start = time.time()\n",
" training_loss = 0\n",
" # Build a training loop, to train the network\n",
" for idx, (data, target) in enumerate(train_loader):\n",
"\n",
" data = data.as_in_context(context)\n",
" target = target.as_in_context(context)\n",
" \n",
" with autograd.record():\n",
" output = net(data)\n",
" L = binary_cross_loss(output, target)\n",
" training_loss += nd.sum(L).asscalar()\n",
" L.backward()\n",
" trainer.step(data.shape[0])\n",
" \n",
" # Get validation predictions\n",
" val_predictions = net(X_val)\n",
" # Calculate the validation loss\n",
" val_loss = nd.sum(binary_cross_loss(val_predictions, y_val)).asscalar()\n",
" \n",
" # Take the average losses\n",
" training_loss = training_loss / len(y_train)\n",
" val_loss = val_loss / len(y_val)\n",
" \n",
" train_losses.append(training_loss)\n",
" val_losses.append(val_loss)\n",
" \n",
" end = time.time()\n",
" # Print the losses every 10 epochs\n",
" if (epoch == 0) or ((epoch+1)%10 == 0):\n",
" print(\"Epoch %s. Train_loss %f Validation_loss %f Seconds %f\" % \\\n",
" (epoch, training_loss, val_loss, end-start))"
]
},
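{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loss alone does not tell us how many points are classified correctly. Below is a minimal sketch of a hypothetical helper, `binary_accuracy`, that thresholds the sigmoid outputs at 0.5 and compares them with the labels; the helper name and the threshold are our own choices, not part of Gluon. It is shown on a toy example; after the training loop above, the same call could be applied to `net(X_val)` and `y_val`.\n",
"\n",
"```python\n",
"from mxnet import nd\n",
"\n",
"def binary_accuracy(probs, labels, threshold=0.5):\n",
"    # Hypothetical helper: probs has shape (N, 1) with sigmoid outputs, labels has shape (N,)\n",
"    preds = probs.reshape((-1,)) > threshold  # 1.0 where the predicted label is 1, else 0.0\n",
"    return (preds == labels).mean().asscalar()\n",
"\n",
"probs = nd.array([[0.9], [0.2], [0.6], [0.4]])\n",
"labels = nd.array([1, 0, 0, 0])\n",
"print(binary_accuracy(probs, labels))  # 0.75 (3 of 4 correct)\n",
"\n",
"# In this notebook, after training: binary_accuracy(net(X_val), y_val)\n",
"```"
]
},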
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot the training and validation losses. As expected, both go down as training continues, and the validation loss stays close to the training loss, which suggests the network is not overfitting."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"plt.plot(train_losses, label=\"Training Loss\")\n",
"plt.plot(val_losses, label=\"Validation Loss\")\n",
"plt.title(\"Loss values\")\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"Loss\")\n",
"plt.legend()\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p39",
"language": "python",
"name": "conda_pytorch_p39"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}