{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![MLU Logo](../../data/MLU_Logo.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Learning Accelerator - Computer Vision - Lecture 1\n", "\n", "## Neural Networks with PyTorch\n", "\n", "In this notebook, we build, train and validate a Neural Network in [PyTorch](https://pytorch.org/docs/stable/index.html), an open source machine learning framework that accelerates the path from research prototyping to production deployment with a clear, concise, and simple API. \n", "\n", "1. Implementing a neural network with PyTorch \n", "2. Loss Functions\n", "3. Training\n", "4. Example - Binary Classification" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! pip install -q -r ../../requirements.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Implementing a neural network with PyTorch\n", "(Go to top)\n", "\n", "Let's implement a simple neural network with two hidden layers of size 64 using the sequential container (Adding things in sequence). We will have 3 inputs, 2 hidden layers and 1 output layer. Some drop-outs attached to the hidden layers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "\n", "net = nn.Sequential(\n", " nn.Linear(3, 64), # Linear layer-1 with 64 out_features and input size 3\n", " nn.Tanh(), # Tanh activation is applied\n", " nn.Dropout(p=0.4), # Apply random 40% drop-out to layer_1\n", " nn.Linear(64, 64), # Linear layer-2 with 64 units and input size 64 \n", " nn.Tanh(), # Tanh activation is applied\n", " nn.Dropout(p=0.3), # Apply random 30% drop-out to layer_2\n", " nn.Linear(64, 1)) # Output layer with single unit\n", "\n", "print(net)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The weight parameters of the `Linear` layer in pytorch are initialized with a modified form of the Xavier Initialization. Using these weights as a start, we can later apply optimization such as SGD to train the weights. As a result, using a strategic technique to initialize the weights is crucial. \n", "\n", "Here is a full list of [Initializers](https://pytorch.org/docs/stable/nn.init.html). The commonly used one is called *Xavier initilaization*, which can keep the scale of gradients roughly the same in all the layers. (Here are more technical details of [Xavier initilaization](https://d2l.ai/chapter_multilayer-perceptrons/numerical-stability-and-init.html#xavier-initialization).)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def xavier_init_weights(m):\n", " if type(m) == nn.Linear:\n", " torch.nn.init.xavier_uniform_(m.weight)\n", "\n", "net.apply(xavier_init_weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can easily access them with `net[layer_index]`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(net[0])\n", "print(net[1])\n", "print(net[2])\n", "print(net[3])\n", "print(net[4])\n", "print(net[5])\n", "print(net[6])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Loss Functions\n", "(Go to top)\n", "\n", "We can select [loss functions](https://d2l.ai/chapter_linear-networks/linear-regression.html#loss-function) according to our problem. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Loss Functions\n", "(Go to top)\n", "\n", "We can select [loss functions](https://d2l.ai/chapter_linear-networks/linear-regression.html#loss-function) according to our problem. A full list of supported `Loss` functions in PyTorch is available [here](https://pytorch.org/docs/stable/nn.html#loss-functions). \n", "\n", "Let's go over some popular loss functions and see how to call a built-in loss function:\n", "\n", "\n", "__Binary Cross-entropy Loss:__ A commonly used loss function for binary classification. \n", "\n", "```python\n", "loss = nn.BCELoss()\n", "```\n", "\n", "__Categorical Cross-entropy Loss:__ A commonly used loss function for multi-class classification. \n", "\n", "```python\n", "loss = nn.CrossEntropyLoss()\n", "```\n", "\n", "__MSE Loss:__ One of the most common loss functions for regression problems. \n", "\n", "```python\n", "loss = nn.MSELoss()\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Training\n", "(Go to top)\n", "\n", "The `torch.optim` module provides the optimization algorithms needed to train neural networks. For example, the following `Optimizer` trains a network with the [Stochastic Gradient Descent (SGD)](https://d2l.ai/chapter_optimization/sgd.html) method and a learning rate of 0.001.\n", "\n", "```python\n", "from torch import optim\n", "optimizer = optim.SGD(net.parameters(), lr=0.001)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Example - Binary Classification\n", "(Go to top)\n", "\n", "In this example, we will train a neural network on a randomly generated dataset with two classes and teach the network to classify them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import make_circles\n", "X, y = make_circles(n_samples=750, shuffle=True, random_state=42, noise=0.05, factor=0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's plot the simulated dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "def plot_dataset(X, y, title):\n", "    \n", "    # Activate Seaborn visualization\n", "    sns.set()\n", "    \n", "    # Plot both classes: class 1 -> blue, class 2 -> red\n", "    plt.scatter(X[y==1, 0], X[y==1, 1], c='blue', label=\"class 1\")\n", "    plt.scatter(X[y==0, 0], X[y==0, 1], c='red', label=\"class 2\")\n", "    plt.legend(loc='upper right')\n", "    plt.xlabel('x1')\n", "    plt.ylabel('x2')\n", "    plt.xlim(-2, 2)\n", "    plt.ylim(-2, 2)\n", "    plt.title(title)\n", "    plt.show()\n", "    \n", "plot_dataset(X, y, title=\"Dataset\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we import the necessary libraries and classes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "from torch.nn import BCELoss" ] },
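{ "cell_type": "markdown", "metadata": {}, "source": [ "To build some intuition for the binary cross-entropy loss before we use it, here is a quick sketch with made-up values: confident correct predictions receive a small loss, while a confident wrong one is penalized heavily." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Toy example (hypothetical values): per-sample binary cross-entropy\n", "preds = torch.tensor([0.9, 0.1, 0.8])    # predicted probabilities\n", "targets = torch.tensor([1.0, 0.0, 0.0])  # true labels\n", "print(BCELoss(reduction='none')(preds, targets))  # approx. [0.105, 0.105, 1.609]" ] },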
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use GPU resource if available, otherwise wil use CPU\n", "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", "\n", "net = nn.Sequential(nn.Linear(in_features=2, out_features=10),\n", " nn.ReLU(),\n", " nn.Linear(10, 10),\n", " nn.ReLU(),\n", " nn.Linear(10, 1),\n", " nn.Sigmoid()).to(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's prepare the training set and validation set, and load each of them to a `DataLoader`, respectively." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Split the dataset into two parts: 80%-20% split\n", "X_train, X_val = X[0:int(len(X)*0.8), :], X[int(len(X)*0.8):, :]\n", "y_train, y_val = y[:int(len(X)*0.8)], y[int(len(X)*0.8):]\n", "\n", "# Use PyTorch DataLoaders to load the data in batches\n", "batch_size = 4 # How many samples to use for each weight update \n", "train_dataset = torch.utils.data.TensorDataset(torch.tensor(X_train, dtype=torch.float32),\n", " torch.tensor(y_train, dtype=torch.float32))\n", "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size)\n", "\n", "# Move validation dataset on CPU/GPU device\n", "X_val = torch.tensor(X_val, dtype=torch.float32).to(device)\n", "y_val = torch.tensor(y_val, dtype=torch.float32).to(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before the training, one last thing is to define the hyper-parameters for training." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "epochs = 50 # Total number of iterations\n", "lr = 0.01 # Learning rate\n", "\n", "# Define the loss. As we used sigmoid in the last layer, we use `nn.BCELoss`.\n", "# Otherwise we could have made use of `nn.BCEWithLogitsLoss`.\n", "loss = BCELoss(reduction='none')\n", "\n", "# Define the optimizer, SGD with learning rate\n", "optimizer = torch.optim.SGD(net.parameters(), lr=lr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, it is the time for training! We will run through the training set 50 times (i.e., epochs) and print training and validation losses at each epoch." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_losses = []\n", "val_losses = []\n", "for epoch in range(epochs):\n", " start = time.time()\n", " training_loss = 0\n", " # Build a training loop, to train the network\n", " for idx, (data, target) in enumerate(train_loader):\n", " # zero the parameter gradients\n", " optimizer.zero_grad()\n", " \n", " data = data.to(device)\n", " target = target.to(device).view(-1, 1)\n", " \n", " output = net(data)\n", " L = loss(output, target).sum()\n", " training_loss += L.item()\n", " L.backward()\n", " optimizer.step()\n", " \n", " # Get validation predictions\n", " val_predictions = net(X_val)\n", " # Calculate the validation loss\n", " val_loss = torch.sum(loss(val_predictions, y_val.view(-1, 1))).item()\n", " \n", " # Take the average losses\n", " training_loss = training_loss / len(y_train)\n", " val_loss = val_loss / len(y_val)\n", " \n", " train_losses.append(training_loss)\n", " val_losses.append(val_loss)\n", " \n", " end = time.time()\n", " # Print the losses every 10 epochs\n", " if (epoch == 0) or ((epoch+1)%10 == 0):\n", " print(\"Epoch %s. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the training and validation loss plots below. As expected, both losses go down as training progresses." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "plt.plot(train_losses, label=\"Training Loss\")\n", "plt.plot(val_losses, label=\"Validation Loss\")\n", "plt.title(\"Loss values\")\n", "plt.xlabel(\"Epoch\")\n", "plt.ylabel(\"Loss\")\n", "plt.legend()\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_p39", "language": "python", "name": "conda_pytorch_p39" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 2 }