{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning Accelerator - Natural Language Processing - Lecture 3\n",
"\n",
"## Neural Networks with Gluon\n",
"\n",
"In this notebook, we will build, train and validate a Neural Network using Gluon/MXNet.\n",
"1. Implementing a neural network with Gluon\n",
"2. Loss Functions\n",
"3. Training\n",
"4. Example - Binary Classification\n",
"5. Natural Language Processing Context"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -q -r ../requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Implementing a neural network with Gluon\n",
"(Go to top)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's implement a simple neural network with two hidden layers of size 64 and 128 using the sequential mode (Adding things in sequence). We will have 3 inputs, 2 hidden layers and 1 output layer. Some drop-outs attached to the hidden layers."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sequential(\n",
" (0): Dense(3 -> 64, Activation(tanh))\n",
" (1): Dropout(p = 0.4, axes=())\n",
" (2): Dense(None -> 64, Activation(tanh))\n",
" (3): Dropout(p = 0.3, axes=())\n",
" (4): Dense(None -> 1, linear)\n",
")\n"
]
}
],
"source": [
"from mxnet.gluon import nn\n",
"\n",
"net = nn.Sequential()\n",
"\n",
"net.add(nn.Dense(64, # Dense layer-1 with 64 units\n",
" in_units=3, # Input size of 3 is expected\n",
" activation='tanh'), # Tanh activation is applied\n",
" nn.Dropout(.4), # Apply random 40% drop-out to layer_1\n",
" \n",
" nn.Dense(64, # Dense layer-2 with 64 units \n",
" activation='tanh' # Tanh activation is applied\n",
" ),\n",
" nn.Dropout(.3), # Apply random 30% drop-out to layer_2\n",
" \n",
" nn.Dense(1)) # Output layer with single unit\n",
"\n",
"print(net)"
]
},
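{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sanity check on the printed architecture, we can count the trainable parameters by hand. This is a worked sketch that assumes each `Dense` layer holds one weight per input-output pair plus one bias per unit, and that the `Dropout` layers add no parameters:\n",
"$$\n",
"(3 \\times 64 + 64) + (64 \\times 64 + 64) + (64 \\times 1 + 1) = 256 + 4160 + 65 = 4481\n",
"$$"
]
},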
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can initialize the weights of the network with 'initialize()' function. We prefer to use the following:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from mxnet import init\n",
"\n",
"net.initialize(init=init.Xavier())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at our layers and dropouts on them. We can easily access them wth net[layer_index]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dense(3 -> 64, Activation(tanh))\n",
"Dropout(p = 0.4, axes=())\n",
"Dense(None -> 64, Activation(tanh))\n",
"Dropout(p = 0.3, axes=())\n",
"Dense(None -> 1, linear)\n"
]
}
],
"source": [
"print(net[0])\n",
"print(net[1])\n",
"print(net[2])\n",
"print(net[3])\n",
"print(net[4])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Loss Functions\n",
"(Go to top)\n",
"\n",
"We will go over some popular loss functions here. We can select loss functions according to our problem. Full list of supported loss functions are available [here](https://mxnet.incubator.apache.org/api/python/docs/api/gluon/loss/index.html)\n",
"\n",
"\n",
"__Binary Cross-entropy Loss:__ A common loss function for binary classification. It is given by: \n",
"$$\n",
"\\mathrm{BinaryCrossEntropyLoss} = -\\sum_{examples}{(y\\log(p) + (1 - y)\\log(1 - p))}\n",
"$$\n",
"where p is the prediction (between 0 and 1, ie. 0.831) and y is the true class (either 1 or 0). \n",
"\n",
"In gluon, we can use binary cross entropy with `SigmoidBinaryCrossEntropyLoss`. It also applies sigmoid function on the predictions. Therefore, p is always between 0 and 1.\n",
"\n",
"\n",
"```python\n",
"from mxnet.gluon.loss import SigmoidBinaryCrossEntropyLoss\n",
"loss = SigmoidBinaryCrossEntropyLoss()\n",
"```\n",
"__Categorical Cross-entropy Loss:__ It is used for multi-class classification. We apply the softmax function on prediction probabilities and then extend the equation of binary cross-entropy. After the softmax function, summation of the predictions are equal to 1. Equation is below. y becomes 1 for true class and 0 for other classes.\n",
"$$\n",
"\\mathrm{CategoricalCrossEntropyLoss} = -\\sum_{examples}\\sum_{classes}{y_j\\log(p_j)}\n",
"$$\n",
"In gluon, `SoftmaxCrossEntropyLoss` implements the categorical cross-entropy loss with softmax function\n",
"\n",
"\n",
"```python\n",
"from mxnet.gluon.loss import SoftmaxCrossEntropyLoss\n",
"loss = SoftmaxCrossEntropyLoss()\n",
"```\n",
"__L2 Loss:__ This is a loss function for regression problems. It measures the squared difference between target values (y) and predictions (p). Here, square makes sure the offsets with different signs don't cancel out each other.\n",
"$$\n",
"\\mathrm{L2 loss} = \\frac{1}{2} \\sum_{examples}(y - p)^2\n",
"$$\n",
"In gluon, we can use it with `L2Loss`:\n",
"```python\n",
"from mxnet.gluon.loss import L2Loss\n",
"loss = L2Loss()\n",
"```\n",
"__L1 Loss:__ This is similar to L2 loss. It measures the abolsute difference between target values (y) and predictions (p).\n",
"$$\n",
"\\mathrm{L1 loss} = \\frac{1}{2} \\sum_{examples}|y - p|\n",
"$$\n",
"In gluon, we can use it with `L1Loss`:\n",
"```python\n",
"from mxnet.gluon.loss import L1Loss\n",
"loss = L1Loss()\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Training\n",
"(Go to top)\n",
"\n",
"`mxnet.gluon.Trainer()` module provides necessary training algorithms for neural networks. We can use the following for training a network using Stochastic Gradient Descent method and learning rate of 0.001.\n",
"\n",
"```python\n",
"from mxnet import gluon\n",
"\n",
"trainer = gluon.Trainer(net.collect_params(),\n",
" 'sgd', \n",
" {'learning_rate': 0.001}\n",
" )\n",
"```"
]
},
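{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, the update that plain SGD performs can be sketched as follows. This assumes no momentum or weight decay; the symbols $w$ (a parameter), $\\eta$ (the learning rate), $B$ (the batch size passed to `trainer.step()`) and $L_i$ (the loss of example $i$) are notation introduced here for illustration:\n",
"$$\n",
"w \\leftarrow w - \\frac{\\eta}{B} \\sum_{i=1}^{B} \\nabla_w L_i\n",
"$$\n",
"With the settings above, $\\eta = 0.001$."
]
},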
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Example - Binary Classification\n",
"(Go to top)\n",
"\n",
"Let's train a neural network on a random dataset. We have two classes and will learn to classify them."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_circles\n",
"\n",
"X, y = make_circles(n_samples=750, shuffle=True, random_state=42, noise=0.05, factor=0.3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot the dataset"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"def plot_dataset(X, y, title):\n",
" \n",
" # Activate Seaborn visualization\n",
" sns.set()\n",
" \n",
" # Plot both classes: Class1->Blue, Class2->Red\n",
" plt.scatter(X[y==1, 0], X[y==1, 1], c='blue', label=\"class 1\")\n",
" plt.scatter(X[y==0, 0], X[y==0, 1], c='red', label=\"class 2\")\n",
" plt.legend(loc='upper right')\n",
" plt.xlabel('x1')\n",
" plt.ylabel('x2')\n",
" plt.xlim(-2, 2)\n",
" plt.ylim(-2, 2)\n",
" plt.title(title)\n",
" plt.show()\n",
" \n",
"plot_dataset(X, y, title=\"Dataset\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Importing the necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"import mxnet as mx\n",
"from mxnet import gluon, autograd\n",
"import mxnet.ndarray as nd\n",
"from mxnet.gluon.loss import SigmoidBinaryCrossEntropyLoss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are creating the network below. We will have two hidden layers. Since the data seems easily seperable, we can have a small network (2 hidden layers) with 10 units at each layer."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"from mxnet import init\n",
"from mxnet.gluon import nn\n",
"\n",
"net = nn.Sequential()\n",
"net.add(nn.Dense(10, in_units=2, activation='relu'),\n",
" nn.Dense(10, activation='relu'),\n",
" nn.Dense(1, activation='sigmoid'))\n",
"net.initialize(init=init.Xavier())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's define the training parameters"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"batch_size = 4 # How many samples to use for each weight update \n",
"epochs = 50 # Total number of iterations\n",
"learning_rate = 0.01 # Learning rate\n",
"context = mx.cpu() # Using CPU resource\n",
"\n",
"# Define the loss. As we used sigmoid in the last layer, use from_sigmoid=True\n",
"binary_cross_loss = SigmoidBinaryCrossEntropyLoss(from_sigmoid=True)\n",
"\n",
"# Define the trainer, SGD with learning rate\n",
"trainer = gluon.Trainer(net.collect_params(),\n",
" 'sgd',\n",
" {'learning_rate': learning_rate}\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Splitting the dataset into two parts: 80%-20% split\n",
"X_train, X_val = X[0:int(len(X)*0.8), :], X[int(len(X)*0.8):, :]\n",
"y_train, y_val = y[:int(len(X)*0.8)], y[int(len(X)*0.8):]\n",
"\n",
"# Convert to ND arrays for gluon\n",
"X_train = nd.array(X_train)\n",
"X_val = nd.array(X_val)\n",
"y_train = nd.array(y_train)\n",
"y_val = nd.array(y_val)\n",
"\n",
"# Using Gluon Data loaders to load the data in batches\n",
"train_dataset = gluon.data.ArrayDataset(X_train, y_train)\n",
"train_loader = gluon.data.DataLoader(train_dataset, batch_size=batch_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start the training process. We will have training and validation sets and print our losses at each step."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 0. Train_loss 0.717287 Validation_loss 0.707328 Seconds 0.466028\n",
"Epoch 1. Train_loss 0.698457 Validation_loss 0.694622 Seconds 0.378597\n",
"Epoch 2. Train_loss 0.683981 Validation_loss 0.681235 Seconds 0.330993\n",
"Epoch 3. Train_loss 0.670197 Validation_loss 0.667785 Seconds 0.402593\n",
"Epoch 4. Train_loss 0.653411 Validation_loss 0.648352 Seconds 0.349503\n",
"Epoch 5. Train_loss 0.632768 Validation_loss 0.627517 Seconds 0.315740\n",
"Epoch 6. Train_loss 0.611411 Validation_loss 0.606206 Seconds 0.315187\n",
"Epoch 7. Train_loss 0.588618 Validation_loss 0.582911 Seconds 0.362654\n",
"Epoch 8. Train_loss 0.564000 Validation_loss 0.557464 Seconds 0.359108\n",
"Epoch 9. Train_loss 0.536602 Validation_loss 0.528374 Seconds 0.315541\n",
"Epoch 10. Train_loss 0.505457 Validation_loss 0.494894 Seconds 0.396428\n",
"Epoch 11. Train_loss 0.470475 Validation_loss 0.457515 Seconds 0.416723\n",
"Epoch 12. Train_loss 0.432076 Validation_loss 0.416653 Seconds 0.498043\n",
"Epoch 13. Train_loss 0.391388 Validation_loss 0.374049 Seconds 0.500873\n",
"Epoch 14. Train_loss 0.349958 Validation_loss 0.330396 Seconds 0.371929\n",
"Epoch 15. Train_loss 0.309373 Validation_loss 0.289564 Seconds 0.340768\n",
"Epoch 16. Train_loss 0.272011 Validation_loss 0.252969 Seconds 0.341112\n",
"Epoch 17. Train_loss 0.238545 Validation_loss 0.221302 Seconds 0.410799\n",
"Epoch 18. Train_loss 0.209346 Validation_loss 0.194521 Seconds 0.406427\n",
"Epoch 19. Train_loss 0.183823 Validation_loss 0.171154 Seconds 0.409105\n",
"Epoch 20. Train_loss 0.157312 Validation_loss 0.139015 Seconds 0.406448\n",
"Epoch 21. Train_loss 0.122792 Validation_loss 0.108570 Seconds 0.418938\n",
"Epoch 22. Train_loss 0.097598 Validation_loss 0.087676 Seconds 0.412663\n",
"Epoch 23. Train_loss 0.079057 Validation_loss 0.071903 Seconds 0.353745\n",
"Epoch 24. Train_loss 0.065036 Validation_loss 0.060049 Seconds 0.414701\n",
"Epoch 25. Train_loss 0.054316 Validation_loss 0.050988 Seconds 0.365269\n",
"Epoch 26. Train_loss 0.046060 Validation_loss 0.043861 Seconds 0.404951\n",
"Epoch 27. Train_loss 0.039583 Validation_loss 0.038183 Seconds 0.375408\n",
"Epoch 28. Train_loss 0.034440 Validation_loss 0.033622 Seconds 0.315408\n",
"Epoch 29. Train_loss 0.030300 Validation_loss 0.029884 Seconds 0.385614\n",
"Epoch 30. Train_loss 0.026918 Validation_loss 0.026790 Seconds 0.386093\n",
"Epoch 31. Train_loss 0.024119 Validation_loss 0.024206 Seconds 0.316170\n",
"Epoch 32. Train_loss 0.021771 Validation_loss 0.022016 Seconds 0.313672\n",
"Epoch 33. Train_loss 0.019779 Validation_loss 0.020145 Seconds 0.328363\n",
"Epoch 34. Train_loss 0.018077 Validation_loss 0.018535 Seconds 0.339953\n",
"Epoch 35. Train_loss 0.016609 Validation_loss 0.017139 Seconds 0.372699\n",
"Epoch 36. Train_loss 0.015334 Validation_loss 0.015918 Seconds 0.377074\n",
"Epoch 37. Train_loss 0.014218 Validation_loss 0.014844 Seconds 0.392896\n",
"Epoch 38. Train_loss 0.013235 Validation_loss 0.013891 Seconds 0.408250\n",
"Epoch 39. Train_loss 0.012365 Validation_loss 0.013041 Seconds 0.394251\n",
"Epoch 40. Train_loss 0.011589 Validation_loss 0.012281 Seconds 0.409285\n",
"Epoch 41. Train_loss 0.010894 Validation_loss 0.011598 Seconds 0.368799\n",
"Epoch 42. Train_loss 0.010268 Validation_loss 0.010978 Seconds 0.313839\n",
"Epoch 43. Train_loss 0.009702 Validation_loss 0.010416 Seconds 0.314766\n",
"Epoch 44. Train_loss 0.009189 Validation_loss 0.009902 Seconds 0.323163\n",
"Epoch 45. Train_loss 0.008720 Validation_loss 0.009432 Seconds 0.314554\n",
"Epoch 46. Train_loss 0.008292 Validation_loss 0.009001 Seconds 0.402744\n",
"Epoch 47. Train_loss 0.007899 Validation_loss 0.008603 Seconds 0.411461\n",
"Epoch 48. Train_loss 0.007537 Validation_loss 0.008237 Seconds 0.362417\n",
"Epoch 49. Train_loss 0.007204 Validation_loss 0.007897 Seconds 0.315366\n"
]
}
],
"source": [
"import time\n",
"\n",
"train_losses = []\n",
"val_losses = []\n",
"for epoch in range(epochs):\n",
" start = time.time()\n",
" training_loss = 0\n",
" # Training loop, train the network\n",
" for idx, (data, target) in enumerate(train_loader):\n",
"\n",
" data = data.as_in_context(context)\n",
" target = target.as_in_context(context)\n",
" \n",
" with autograd.record():\n",
" output = net(data)\n",
" L = binary_cross_loss(output, target)\n",
" training_loss += nd.sum(L).asscalar()\n",
" L.backward()\n",
" trainer.step(data.shape[0])\n",
" \n",
" # Get validation predictions\n",
" val_predictions = net(X_val.as_in_context(context))\n",
" # Calculate validation loss\n",
" val_loss = nd.sum(binary_cross_loss(val_predictions, y_val)).asscalar()\n",
" \n",
" # Let's take the average losses\n",
" training_loss = training_loss / len(y_train)\n",
" val_loss = val_loss / len(y_val)\n",
" \n",
" train_losses.append(training_loss)\n",
" val_losses.append(val_loss)\n",
" \n",
" end = time.time()\n",
" print(\"Epoch %s. Train_loss %f Validation_loss %f Seconds %f\" % \\\n",
" (epoch, training_loss, val_loss, end-start))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see the training and validation loss plots below. Losses go down as the training process continues as expected."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"plt.plot(train_losses, label=\"Training Loss\")\n",
"plt.plot(val_losses, label=\"Validation Loss\")\n",
"plt.title(\"Loss values\")\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"Loss\")\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Natural Language Processing Context\n",
"(Go to top)\n",
"\n",
"If we want to use the same type of architecture for text classification, we need to apply some feature extraction methods first. For example: We can get TF-IDF vectors of text fields. After that, we can use neural networks on those features. \n",
"\n",
"We will also look at __more advanced neural network architrectures__ such as __Recurrent Neural Networks (RNNs)__, __Long Short-Term Memory networks (LSTMs)__ and __Transformers__. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p39",
"language": "python",
"name": "conda_pytorch_p39"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}