{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning Accelerator - Natural Language Processing - Lecture 3\n",
"\n",
"## Neural Networks with PyTorch\n",
"\n",
"In this notebook, we will build, train and validate a Neural Network using PyTorch.\n",
"1. Implementing a neural network with PyTorch\n",
"2. Loss Functions\n",
"3. Training\n",
"4. Example - Binary Classification\n",
"5. Natural Language Processing Context"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%pip install -q -r ../../requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Implementing a neural network with PyTorch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's implement a simple neural network with two hidden layers of 64 units each, using PyTorch's `nn.Sequential` container, which applies its modules in the order they are listed. The network takes 3 input features, has 2 hidden layers with dropout applied after each one, and ends in an output layer with a single unit."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:26.315306Z",
"start_time": "2021-01-09T14:42:25.876374Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sequential(\n",
" (0): Linear(in_features=3, out_features=64, bias=True)\n",
" (1): Tanh()\n",
" (2): Dropout(p=0.4, inplace=False)\n",
" (3): Linear(in_features=64, out_features=64, bias=True)\n",
" (4): Tanh()\n",
" (5): Dropout(p=0.3, inplace=False)\n",
" (6): Linear(in_features=64, out_features=1, bias=True)\n",
")\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"net = nn.Sequential(\n",
"    nn.Linear(in_features=3,     # Input size of 3 is expected\n",
"              out_features=64),  # Linear layer-1 with 64 units\n",
"    nn.Tanh(),                   # Tanh activation is applied\n",
"    nn.Dropout(p=.4),            # Apply random 40% drop-out to layer_1\n",
"    nn.Linear(64, 64),           # Linear layer-2 with 64 units\n",
"    nn.Tanh(),                   # Tanh activation is applied\n",
"    nn.Dropout(p=.3),            # Apply random 30% drop-out to layer_2\n",
"    nn.Linear(64, 1))            # Output layer with single unit\n",
"\n",
"print(net)"
]
},
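{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch initializes the weights of each layer with default values when the layer is created. We can override them with the functions in `torch.nn.init`; here we apply Xavier uniform initialization to every linear layer through `net.apply()`:"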
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:26.323790Z",
"start_time": "2021-01-09T14:42:26.316902Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Sequential(\n",
" (0): Linear(in_features=3, out_features=64, bias=True)\n",
" (1): Tanh()\n",
" (2): Dropout(p=0.4, inplace=False)\n",
" (3): Linear(in_features=64, out_features=64, bias=True)\n",
" (4): Tanh()\n",
" (5): Dropout(p=0.3, inplace=False)\n",
" (6): Linear(in_features=64, out_features=1, bias=True)\n",
")"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def xavier_init_weights(m):\n",
"    # Apply Xavier (Glorot) uniform initialization to the weights of linear layers\n",
"    if isinstance(m, nn.Linear):\n",
"        torch.nn.init.xavier_uniform_(m.weight)\n",
"\n",
"net.apply(xavier_init_weights)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the layers and the dropout applied to them. We can easily access each one with `net[layer_index]`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:26.329940Z",
"start_time": "2021-01-09T14:42:26.326430Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear(in_features=3, out_features=64, bias=True)\n",
"Tanh()\n",
"Dropout(p=0.4, inplace=False)\n",
"Linear(in_features=64, out_features=64, bias=True)\n",
"Tanh()\n"
]
}
],
"source": [
"print(net[0])\n",
"print(net[1])\n",
"print(net[2])\n",
"print(net[3])\n",
"print(net[4])"
]
},
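{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small sketch of how such a network can be used, the snippet below rebuilds the same architecture and runs a forward pass on a random batch. The batch size of 5 and the names `demo_net`, `x`, `out_a` and `out_b` are illustrative assumptions. Dropout is only active in training mode, so calling `eval()` makes the output deterministic.\n",
"\n",
"```python\n",
"import torch\n",
"from torch import nn\n",
"\n",
"torch.manual_seed(0)\n",
"\n",
"# Same architecture as above, rebuilt so the snippet is self-contained\n",
"demo_net = nn.Sequential(\n",
"    nn.Linear(3, 64), nn.Tanh(), nn.Dropout(p=0.4),\n",
"    nn.Linear(64, 64), nn.Tanh(), nn.Dropout(p=0.3),\n",
"    nn.Linear(64, 1))\n",
"\n",
"x = torch.randn(5, 3)  # hypothetical batch: 5 samples, 3 features each\n",
"\n",
"demo_net.eval()  # disable dropout for inference\n",
"with torch.no_grad():\n",
"    out_a = demo_net(x)\n",
"    out_b = demo_net(x)\n",
"\n",
"print(out_a.shape)                # torch.Size([5, 1])\n",
"print(torch.equal(out_a, out_b))  # True: no randomness in eval mode\n",
"print(demo_net[0].weight.shape)   # torch.Size([64, 3])\n",
"```\n",
"\n",
"Calling `demo_net.train()` switches dropout back on, which is the mode to use while fitting the model."
]
},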
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Loss Functions\n",
"\n",
"We can select a [loss function](https://d2l.ai/chapter_linear-networks/linear-regression.html#loss-function) according to our problem. A full list of the loss functions supported in PyTorch is available [here](https://pytorch.org/docs/stable/nn.html#loss-functions).\n",
"\n",
"Let's go over some popular loss functions and see how to call a built-in loss function:\n",
"\n",
"__Binary Cross-entropy Loss:__ A commonly used loss function for binary classification. It expects probabilities between 0 and 1, for example the output of a sigmoid layer.\n",
"\n",
"```python\n",
"loss = nn.BCELoss()\n",
"```\n",
"\n",
"__Categorical Cross-entropy Loss:__ A commonly used loss function for multi-class classification. It expects raw scores (logits) and applies log-softmax internally.\n",
"\n",
"```python\n",
"loss = nn.CrossEntropyLoss()\n",
"```\n",
"\n",
"__MSE Loss:__ One of the most common loss functions for regression problems.\n",
"\n",
"```python\n",
"loss = nn.MSELoss()\n",
"```\n",
"\n",
"__L1 Loss:__ This is similar to the L2 (MSE) loss, but it measures the absolute difference between target values (y) and predictions (p) instead of the squared difference. With the default `reduction='mean'`, it is averaged over the N examples:\n",
"$$\n",
"\\mathrm{L1 loss} = \\frac{1}{N} \\sum_{examples}|y - p|\n",
"$$\n",
"In PyTorch, we can use it with `L1Loss`:\n",
"```python\n",
"loss = nn.L1Loss()\n",
"```"
]
},
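{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make these definitions concrete, here is a minimal sketch that evaluates three of the losses on hand-picked tensors. The values of `y`, `p`, `logits` and `classes` are made up for illustration. With the default `reduction='mean'`, each loss is averaged over the elements.\n",
"\n",
"```python\n",
"import torch\n",
"from torch import nn\n",
"\n",
"y = torch.tensor([1.0, 0.0, 1.0, 0.0])  # made-up targets\n",
"p = torch.tensor([0.5, 0.5, 1.0, 0.0])  # made-up predictions\n",
"\n",
"l1 = nn.L1Loss()(p, y)    # mean(|y - p|)   = (0.5 + 0.5 + 0 + 0) / 4\n",
"mse = nn.MSELoss()(p, y)  # mean((y - p)^2) = (0.25 + 0.25 + 0 + 0) / 4\n",
"\n",
"print(l1.item())   # 0.25\n",
"print(mse.item())  # 0.125\n",
"\n",
"# CrossEntropyLoss expects raw scores (logits) of shape (N, C)\n",
"# and integer class indices of shape (N,)\n",
"logits = torch.zeros(2, 3)       # equal scores for 3 classes\n",
"classes = torch.tensor([0, 2])\n",
"ce = nn.CrossEntropyLoss()(logits, classes)\n",
"print(ce.item())   # ln(3), about 1.0986\n",
"```\n",
"\n",
"Equal scores over three classes give each class a probability of 1/3, which is why the cross-entropy comes out as ln(3)."
]
},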
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Training\n",
"\n",
"The `torch.optim` module provides the optimization algorithms used to train neural networks. For example, we can create an optimizer that trains the network with [Stochastic Gradient Descent (SGD)](https://d2l.ai/chapter_optimization/sgd.html) and a learning rate of 0.001:\n",
"\n",
"```python\n",
"from torch import optim\n",
"optimizer = optim.SGD(net.parameters(), lr=0.001)\n",
"```"
]
},
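{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below shows what a single optimization step could look like with such an optimizer. The tiny linear `model`, the random data and the choice of `nn.MSELoss` are placeholders picked only for illustration.\n",
"\n",
"```python\n",
"import torch\n",
"from torch import nn, optim\n",
"\n",
"torch.manual_seed(0)\n",
"\n",
"model = nn.Linear(3, 1)  # placeholder model\n",
"optimizer = optim.SGD(model.parameters(), lr=0.001)\n",
"loss_fn = nn.MSELoss()\n",
"\n",
"x = torch.randn(8, 3)  # random inputs\n",
"y = torch.randn(8, 1)  # random targets\n",
"\n",
"w_before = model.weight.detach().clone()\n",
"\n",
"optimizer.zero_grad()              # clear old gradients\n",
"loss_value = loss_fn(model(x), y)  # forward pass\n",
"loss_value.backward()              # compute gradients\n",
"grad = model.weight.grad.clone()\n",
"optimizer.step()                   # w <- w - lr * grad\n",
"\n",
"# Plain SGD moves each weight by -lr * gradient\n",
"w_after = model.weight.detach()\n",
"print(torch.allclose(w_after, w_before - 0.001 * grad))  # True\n",
"```\n",
"\n",
"The order matters: gradients accumulate across calls to `backward()`, so `zero_grad()` is called before each new backward pass."
]
},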
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Example - Binary Classification\n",
"\n",
"Let's train a neural network on a synthetic dataset generated with scikit-learn's `make_circles`. It contains two classes, arranged as two concentric circles, and the network will learn to classify them."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:26.750569Z",
"start_time": "2021-01-09T14:42:26.332404Z"
}
},
"outputs": [],
"source": [
"from sklearn.datasets import make_circles\n",
"\n",
"X, y = make_circles(n_samples=750, shuffle=True, random_state=42, noise=0.05, factor=0.3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot the dataset"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:27.482480Z",
"start_time": "2021-01-09T14:42:26.752160Z"
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"def plot_dataset(X, y, title):\n",
"    # Activate Seaborn visualization\n",
"    sns.set()\n",
"\n",
"    # Plot both classes: Class1->Blue, Class2->Red\n",
"    plt.scatter(X[y==1, 0], X[y==1, 1], c='blue', label=\"class 1\")\n",
"    plt.scatter(X[y==0, 0], X[y==0, 1], c='red', label=\"class 2\")\n",
"    plt.legend(loc='upper right')\n",
"    plt.xlabel('x1')\n",
"    plt.ylabel('x2')\n",
"    plt.xlim(-2, 2)\n",
"    plt.ylim(-2, 2)\n",
"    plt.title(title)\n",
"    plt.show()\n",
"\n",
"plot_dataset(X, y, title=\"Dataset\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Importing the necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:28.491779Z",
"start_time": "2021-01-09T14:42:28.489635Z"
}
},
"outputs": [],
"source": [
"import time\n",
"from torch.nn import BCELoss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We create the network below. Since the two classes look easy to separate, a small network is enough: two hidden layers with 10 units each, followed by a sigmoid output that produces a probability."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:28.498956Z",
"start_time": "2021-01-09T14:42:28.494975Z"
}
},
"outputs": [],
"source": [
"net = nn.Sequential(nn.Linear(in_features=2, out_features=10),\n",
"                    nn.ReLU(),\n",
"                    nn.Linear(10, 10),\n",
"                    nn.ReLU(),\n",
"                    nn.Linear(10, 1),\n",
"                    nn.Sigmoid())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's define the training parameters"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:28.504886Z",
"start_time": "2021-01-09T14:42:28.500985Z"
}
},
"outputs": [],
"source": [
"batch_size = 4  # How many samples to use for each weight update\n",
"epochs = 50     # Total number of passes over the training data\n",
"lr = 0.01       # Learning rate\n",
"device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
"\n",
"# Move the network to the same device as the data\n",
"net = net.to(device)\n",
"\n",
"# Define the loss. As we used sigmoid in the last layer, we use `nn.BCELoss`.\n",
"# Otherwise we could have made use of `nn.BCEWithLogitsLoss`.\n",
"# reduction='none' keeps one loss value per sample; we sum them during training.\n",
"loss = BCELoss(reduction='none')\n",
"\n",
"# Define the optimizer, SGD with learning rate\n",
"optimizer = torch.optim.SGD(net.parameters(), lr=lr)"
]
},
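{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comment in the cell above mentions `nn.BCEWithLogitsLoss` as an alternative. As a brief sketch with made-up `logits` and `targets`, it gives the same values as applying a sigmoid followed by `nn.BCELoss`, while working directly on the raw scores, which is numerically more stable.\n",
"\n",
"```python\n",
"import torch\n",
"from torch import nn\n",
"\n",
"logits = torch.tensor([[2.0], [-1.0], [0.5]])  # made-up raw scores\n",
"targets = torch.tensor([[1.0], [0.0], [1.0]])\n",
"\n",
"# Option 1: sigmoid first, then BCELoss on the probabilities\n",
"bce = nn.BCELoss(reduction='none')(torch.sigmoid(logits), targets)\n",
"\n",
"# Option 2: BCEWithLogitsLoss applied directly to the raw scores\n",
"bce_logits = nn.BCEWithLogitsLoss(reduction='none')(logits, targets)\n",
"\n",
"print(torch.allclose(bce, bce_logits, atol=1e-6))  # True\n",
"print(bce.shape)  # torch.Size([3, 1]): one loss value per sample\n",
"```\n",
"\n",
"With the second option, the final `nn.Sigmoid()` layer would be dropped from the network and applied only when probabilities are needed for prediction."
]
},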
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:28.512093Z",
"start_time": "2021-01-09T14:42:28.506620Z"
}
},
"outputs": [],
"source": [
"# Split the dataset into two parts: 80%-20% split\n",
"X_train, X_val = X[0:int(len(X)*0.8), :], X[int(len(X)*0.8):, :]\n",
"y_train, y_val = y[:int(len(X)*0.8)], y[int(len(X)*0.8):]\n",
"\n",
"# Use PyTorch DataLoaders to load the data in batches\n",
"train_dataset = torch.utils.data.TensorDataset(torch.tensor(X_train, dtype=torch.float32),\n",
" torch.tensor(y_train, dtype=torch.float32))\n",
"train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size)\n",
"\n",
"# Move validation dataset on CPU/GPU device\n",
"X_val = torch.tensor(X_val, dtype=torch.float32).to(device)\n",
"y_val = torch.tensor(y_val, dtype=torch.float32).to(device)"
]
},
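{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate how `TensorDataset` and `DataLoader` cut the data into batches, here is a small sketch on a toy array. The array contents and the names `features`, `labels`, `dataset` and `loader` are illustrative assumptions.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import torch\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
"\n",
"features = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 toy samples\n",
"labels = np.array([0, 1] * 5, dtype=np.float32)\n",
"\n",
"dataset = TensorDataset(torch.tensor(features), torch.tensor(labels))\n",
"loader = DataLoader(dataset, batch_size=4)  # no shuffling by default\n",
"\n",
"batch_sizes = [len(target) for data, target in loader]\n",
"print(batch_sizes)  # [4, 4, 2]: the last batch holds the remainder\n",
"print(len(loader))  # 3 batches per epoch\n",
"```\n",
"\n",
"Passing `shuffle=True` to `DataLoader` would reorder the samples at every epoch, which is a common choice for training data."
]
},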
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start the training process. We train on the training set, evaluate on the validation set, and print both losses after each epoch."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:32.187608Z",
"start_time": "2021-01-09T14:42:28.513937Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 0. Train_loss 0.691907 Validation_loss 0.676005 Seconds 0.082086\n",
"Epoch 1. Train_loss 0.670287 Validation_loss 0.655867 Seconds 0.077845\n",
"Epoch 2. Train_loss 0.643775 Validation_loss 0.618162 Seconds 0.076957\n",
"Epoch 3. Train_loss 0.595045 Validation_loss 0.548945 Seconds 0.077757\n",
"Epoch 4. Train_loss 0.498955 Validation_loss 0.408997 Seconds 0.077910\n",
"Epoch 5. Train_loss 0.303406 Validation_loss 0.193668 Seconds 0.076993\n",
"Epoch 6. Train_loss 0.125799 Validation_loss 0.080758 Seconds 0.077985\n",
"Epoch 7. Train_loss 0.057654 Validation_loss 0.042393 Seconds 0.077412\n",
"Epoch 8. Train_loss 0.032992 Validation_loss 0.026434 Seconds 0.077835\n",
"Epoch 9. Train_loss 0.021789 Validation_loss 0.018319 Seconds 0.077515\n",
"Epoch 10. Train_loss 0.015715 Validation_loss 0.013642 Seconds 0.077841\n",
"Epoch 11. Train_loss 0.012024 Validation_loss 0.010670 Seconds 0.076876\n",
"Epoch 12. Train_loss 0.009590 Validation_loss 0.008659 Seconds 0.077558\n",
"Epoch 13. Train_loss 0.007895 Validation_loss 0.007228 Seconds 0.077840\n",
"Epoch 14. Train_loss 0.006661 Validation_loss 0.006167 Seconds 0.076993\n",
"Epoch 15. Train_loss 0.005727 Validation_loss 0.005353 Seconds 0.078901\n",
"Epoch 16. Train_loss 0.005001 Validation_loss 0.004713 Seconds 0.077746\n",
"Epoch 17. Train_loss 0.004424 Validation_loss 0.004198 Seconds 0.077517\n",
"Epoch 18. Train_loss 0.003954 Validation_loss 0.003775 Seconds 0.077761\n",
"Epoch 19. Train_loss 0.003560 Validation_loss 0.003421 Seconds 0.077350\n",
"Epoch 20. Train_loss 0.003228 Validation_loss 0.003124 Seconds 0.077678\n",
"Epoch 21. Train_loss 0.002946 Validation_loss 0.002869 Seconds 0.077460\n",
"Epoch 22. Train_loss 0.002704 Validation_loss 0.002650 Seconds 0.077357\n",
"Epoch 23. Train_loss 0.002494 Validation_loss 0.002459 Seconds 0.077229\n",
"Epoch 24. Train_loss 0.002313 Validation_loss 0.002288 Seconds 0.077835\n",
"Epoch 25. Train_loss 0.002154 Validation_loss 0.002138 Seconds 0.077294\n",
"Epoch 26. Train_loss 0.002015 Validation_loss 0.002005 Seconds 0.077167\n",
"Epoch 27. Train_loss 0.001892 Validation_loss 0.001887 Seconds 0.078338\n",
"Epoch 28. Train_loss 0.001782 Validation_loss 0.001781 Seconds 0.076799\n",
"Epoch 29. Train_loss 0.001683 Validation_loss 0.001686 Seconds 0.077582\n",
"Epoch 30. Train_loss 0.001593 Validation_loss 0.001599 Seconds 0.077930\n",
"Epoch 31. Train_loss 0.001513 Validation_loss 0.001521 Seconds 0.076850\n",
"Epoch 32. Train_loss 0.001439 Validation_loss 0.001449 Seconds 0.076659\n",
"Epoch 33. Train_loss 0.001372 Validation_loss 0.001383 Seconds 0.077665\n",
"Epoch 34. Train_loss 0.001310 Validation_loss 0.001323 Seconds 0.076882\n",
"Epoch 35. Train_loss 0.001253 Validation_loss 0.001268 Seconds 0.076400\n",
"Epoch 36. Train_loss 0.001201 Validation_loss 0.001216 Seconds 0.077991\n",
"Epoch 37. Train_loss 0.001152 Validation_loss 0.001169 Seconds 0.077189\n",
"Epoch 38. Train_loss 0.001107 Validation_loss 0.001124 Seconds 0.076929\n",
"Epoch 39. Train_loss 0.001065 Validation_loss 0.001083 Seconds 0.077906\n",
"Epoch 40. Train_loss 0.001026 Validation_loss 0.001044 Seconds 0.076958\n",
"Epoch 41. Train_loss 0.000990 Validation_loss 0.001008 Seconds 0.077863\n",
"Epoch 42. Train_loss 0.000956 Validation_loss 0.000974 Seconds 0.078232\n",
"Epoch 43. Train_loss 0.000924 Validation_loss 0.000943 Seconds 0.077750\n",
"Epoch 44. Train_loss 0.000894 Validation_loss 0.000913 Seconds 0.077740\n",
"Epoch 45. Train_loss 0.000865 Validation_loss 0.000885 Seconds 0.078359\n",
"Epoch 46. Train_loss 0.000838 Validation_loss 0.000858 Seconds 0.077139\n",
"Epoch 47. Train_loss 0.000813 Validation_loss 0.000833 Seconds 0.077528\n",
"Epoch 48. Train_loss 0.000789 Validation_loss 0.000809 Seconds 0.077115\n",
"Epoch 49. Train_loss 0.000767 Validation_loss 0.000787 Seconds 0.077799\n"
]
}
],
"source": [
"train_losses = []\n",
"val_losses = []\n",
"for epoch in range(epochs):\n",
"    start = time.time()\n",
"    training_loss = 0\n",
"    # Build a training loop, to train the network\n",
"    for idx, (data, target) in enumerate(train_loader):\n",
"        # zero the parameter gradients\n",
"        optimizer.zero_grad()\n",
"\n",
"        data = data.to(device)\n",
"        target = target.to(device).view(-1, 1)\n",
"\n",
"        output = net(data)\n",
"        L = loss(output, target).sum()\n",
"        training_loss += L.item()\n",
"        L.backward()\n",
"        optimizer.step()\n",
"\n",
"    # Get validation predictions\n",
"    val_predictions = net(X_val)\n",
"    # Calculate the validation loss\n",
"    val_loss = torch.sum(loss(val_predictions, y_val.view(-1, 1))).item()\n",
"\n",
"    # Take the average losses\n",
"    training_loss = training_loss / len(y_train)\n",
"    val_loss = val_loss / len(y_val)\n",
"\n",
"    train_losses.append(training_loss)\n",
"    val_losses.append(val_loss)\n",
"\n",
"    end = time.time()\n",
"    print(\"Epoch %s. Train_loss %f Validation_loss %f Seconds %f\" % \\\n",
"          (epoch, training_loss, val_loss, end-start))"
]
},
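{
"cell_type": "markdown",
"metadata": {},
"source": [
"Loss values alone can be hard to interpret, so a natural follow-up is to compute accuracy by thresholding the sigmoid outputs at 0.5. The helper `binary_accuracy` and the sample tensors below are hypothetical and not part of the original notebook.\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"def binary_accuracy(probs, targets, threshold=0.5):\n",
"    # Turn probabilities into hard 0/1 predictions\n",
"    preds = (probs >= threshold).float()\n",
"    # Fraction of predictions that match the targets\n",
"    return (preds == targets).float().mean().item()\n",
"\n",
"probs = torch.tensor([[0.9], [0.2], [0.6], [0.4]])    # made-up outputs\n",
"targets = torch.tensor([[1.0], [0.0], [0.0], [1.0]])  # made-up labels\n",
"\n",
"acc = binary_accuracy(probs, targets)\n",
"print(acc)  # 0.5: two of the four predictions are correct\n",
"```\n",
"\n",
"With the trained model, one would pass `net(X_val)` and `y_val.view(-1, 1)` instead of the sample tensors."
]
},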
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot the training and validation losses. As expected, both decrease as training progresses."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2021-01-09T14:42:32.401843Z",
"start_time": "2021-01-09T14:42:32.189298Z"
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"plt.plot(train_losses, label=\"Training Loss\")\n",
"plt.plot(val_losses, label=\"Validation Loss\")\n",
"plt.title(\"Loss values\")\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"Loss\")\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Natural Language Processing Context\n",
"\n",
"If we want to use the same type of architecture for text classification, we first need to turn the text into numeric features. For example, we can compute TF-IDF vectors from the text fields and then train a neural network on those vectors.\n",
"\n",
"We will also look at __more advanced neural network architectures__ such as __Recurrent Neural Networks (RNNs)__, __Long Short-Term Memory networks (LSTMs)__ and __Transformers__."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p39",
"language": "python",
"name": "conda_pytorch_p39"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}