{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning Accelerator - Computer Vision - Lecture 1\n",
"\n",
"## Neural Networks with PyTorch\n",
"\n",
"In this notebook, we build, train and validate a Neural Network in [PyTorch](https://pytorch.org/docs/stable/index.html), an open source machine learning framework that accelerates the path from research prototyping to production deployment with a clear, concise, and simple API. \n",
"\n",
"1. Implementing a neural network with PyTorch \n",
"2. Loss Functions\n",
"3. Training\n",
"4. Example - Binary Classification"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install -q -r ../../requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Implementing a neural network with PyTorch\n",
"(Go to top)\n",
"\n",
"Let's implement a simple neural network with two hidden layers of size 64 using the sequential container (Adding things in sequence). We will have 3 inputs, 2 hidden layers and 1 output layer. Some drop-outs attached to the hidden layers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"net = nn.Sequential(\n",
" nn.Linear(3, 64), # Linear layer-1 with 64 out_features and input size 3\n",
" nn.Tanh(), # Tanh activation is applied\n",
" nn.Dropout(p=0.4), # Apply random 40% drop-out to layer_1\n",
" nn.Linear(64, 64), # Linear layer-2 with 64 units and input size 64 \n",
" nn.Tanh(), # Tanh activation is applied\n",
" nn.Dropout(p=0.3), # Apply random 30% drop-out to layer_2\n",
" nn.Linear(64, 1)) # Output layer with single unit\n",
"\n",
"print(net)"
]
},
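{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dropout behaves differently during training and evaluation. As a quick sanity check (a minimal sketch using a randomly generated batch), we can pass the same input through the network twice: in training mode the dropout masks are random, so the two outputs differ; after calling `net.eval()`, dropout is disabled and the outputs are deterministic."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A randomly generated batch of 5 samples with 3 features each (illustration only)\n",
"x = torch.randn(5, 3)\n",
"\n",
"net.train()  # Training mode: dropout is active, so repeated forward passes differ\n",
"print(net(x)[:2])\n",
"print(net(x)[:2])\n",
"\n",
"net.eval()   # Evaluation mode: dropout is disabled, so outputs are deterministic\n",
"print(net(x)[:2])\n",
"print(net(x)[:2])\n",
"\n",
"net.train()  # Switch back to training mode for the rest of the notebook"
]
},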
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The weight parameters of the `Linear` layer in pytorch are initialized with a modified form of the Xavier Initialization. Using these weights as a start, we can later apply optimization such as SGD to train the weights. As a result, using a strategic technique to initialize the weights is crucial. \n",
"\n",
"Here is a full list of [Initializers](https://pytorch.org/docs/stable/nn.init.html). The commonly used one is called *Xavier initilaization*, which can keep the scale of gradients roughly the same in all the layers. (Here are more technical details of [Xavier initilaization](https://d2l.ai/chapter_multilayer-perceptrons/numerical-stability-and-init.html#xavier-initialization).)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def xavier_init_weights(m):\n",
" if type(m) == nn.Linear:\n",
" torch.nn.init.xavier_uniform_(m.weight)\n",
"\n",
"net.apply(xavier_init_weights)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can easily access them with `net[layer_index]`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(net[0])\n",
"print(net[1])\n",
"print(net[2])\n",
"print(net[3])\n",
"print(net[4])\n",
"print(net[5])\n",
"print(net[6])"
]
},
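{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each `Linear` layer exposes its parameters as `weight` and `bias` tensors, so we can also inspect the initialized values directly (here just the shapes and the first two rows of the weights):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Parameters of the first Linear layer\n",
"print(net[0].weight.shape)  # out_features x in_features: torch.Size([64, 3])\n",
"print(net[0].bias.shape)    # torch.Size([64])\n",
"print(net[0].weight[:2])    # First two rows of the weight matrix"
]
},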
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Loss Functions\n",
"(Go to top)\n",
"\n",
"We can select [loss functions](https://d2l.ai/chapter_linear-networks/linear-regression.html#loss-function) according to our problem. A full list of supported `Loss` functions in PyTorch are available [here](https://pytorch.org/docs/stable/nn.html#loss-functions). \n",
"\n",
"Let's go over some popular loss functions and see how to call a built-in loss function:\n",
"\n",
"\n",
"__Binary Cross-entropy Loss:__ A common used loss function for binary classification. \n",
"\n",
"```python\n",
"loss = nn.BCELoss()\n",
"```\n",
"\n",
"__Categorical Cross-entropy Loss:__ A common used loss function for multi-class classification. \n",
"\n",
"```python\n",
"loss = nn.CrossEntropyLoss()\n",
"```\n",
"\n",
"__MSE Loss:__ One of the most common loss functions for regression problems. \n",
"\n",
"```python\n",
"loss = nn.MSELoss()\n",
"```"
]
},
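{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration (with made-up predictions and targets), here is how `nn.BCELoss` compares predicted probabilities against binary labels:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Made-up predicted probabilities (e.g., sigmoid outputs) and binary targets\n",
"predictions = torch.tensor([0.9, 0.2, 0.7])\n",
"targets = torch.tensor([1.0, 0.0, 1.0])\n",
"\n",
"loss_fn = nn.BCELoss()  # Averages the per-sample losses by default\n",
"print(loss_fn(predictions, targets))"
]
},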
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Training\n",
"(Go to top)\n",
"\n",
"`torch.optim` module provides necessary optimization algorithms for neural networks. We can use the following `Optimizers` to train a network using [Stochastic Gradient Descent (SGD)](https://d2l.ai/chapter_optimization/sgd.html) method and learning rate of 0.001.\n",
"\n",
"```python\n",
"from torch import optim\n",
"optimizer = optim.SGD(net.parameters(), lr=0.001)\n",
"```"
]
},
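{
"cell_type": "markdown",
"metadata": {},
"source": [
"A single SGD update consists of the same three calls we will use in the training loop below: reset the accumulated gradients, backpropagate the loss, and step the optimizer. Here is a minimal sketch on a made-up batch (using the 3-input network from Section 1 and an MSE loss purely for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torch import optim\n",
"\n",
"optimizer = optim.SGD(net.parameters(), lr=0.001)\n",
"\n",
"# Made-up batch: 4 samples with 3 features, and 4 target values\n",
"data = torch.randn(4, 3)\n",
"target = torch.randn(4, 1)\n",
"\n",
"optimizer.zero_grad()             # Reset gradients from any previous step\n",
"output = net(data)                # Forward pass\n",
"L = nn.MSELoss()(output, target)  # Compute the loss\n",
"L.backward()                      # Backpropagate\n",
"optimizer.step()                  # Update the weights"
]
},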
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Example - Binary Classification\n",
"(Go to top)\n",
"\n",
"In this example, we will train a neural network on a dataset that we randomly generated. We will have two classes and train a neural network to classify them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_circles\n",
"X, y = make_circles(n_samples=750, shuffle=True, random_state=42, noise=0.05, factor=0.3)"
]
},
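{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick look at the shapes and the class balance of the generated data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"print(X.shape, y.shape)  # (750, 2) features and (750,) labels\n",
"print(np.bincount(y))    # Number of samples in each class"
]
},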
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First let's plot the simulated dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"def plot_dataset(X, y, title):\n",
" \n",
" # Activate Seaborn visualization\n",
" sns.set()\n",
" \n",
" # Plot both classes: Class1->Blue, Class2->Red\n",
" plt.scatter(X[y==1, 0], X[y==1, 1], c='blue', label=\"class 1\")\n",
" plt.scatter(X[y==0, 0], X[y==0, 1], c='red', label=\"class 2\")\n",
" plt.legend(loc='upper right')\n",
" plt.xlabel('x1')\n",
" plt.ylabel('x2')\n",
" plt.xlim(-2, 2)\n",
" plt.ylim(-2, 2)\n",
" plt.title(title)\n",
" plt.show()\n",
" \n",
"plot_dataset(X, y, title=\"Dataset\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we import the necessary libraries and classes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"from torch.nn import BCELoss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we create the network as below. It will have two hidden layers. Since the data seems easily seperable, we can have a small network (2 hidden layers) with 10 units at each layer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use GPU resource if available, otherwise wil use CPU\n",
"device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
"\n",
"net = nn.Sequential(nn.Linear(in_features=2, out_features=10),\n",
" nn.ReLU(),\n",
" nn.Linear(10, 10),\n",
" nn.ReLU(),\n",
" nn.Linear(10, 1),\n",
" nn.Sigmoid()).to(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's prepare the training set and validation set, and load each of them to a `DataLoader`, respectively."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Split the dataset into two parts: 80%-20% split\n",
"X_train, X_val = X[0:int(len(X)*0.8), :], X[int(len(X)*0.8):, :]\n",
"y_train, y_val = y[:int(len(X)*0.8)], y[int(len(X)*0.8):]\n",
"\n",
"# Use PyTorch DataLoaders to load the data in batches\n",
"batch_size = 4 # How many samples to use for each weight update \n",
"train_dataset = torch.utils.data.TensorDataset(torch.tensor(X_train, dtype=torch.float32),\n",
" torch.tensor(y_train, dtype=torch.float32))\n",
"train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size)\n",
"\n",
"# Move validation dataset on CPU/GPU device\n",
"X_val = torch.tensor(X_val, dtype=torch.float32).to(device)\n",
"y_val = torch.tensor(y_val, dtype=torch.float32).to(device)"
]
},
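{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can pull one batch from the loader to confirm the shapes: each batch holds `batch_size` samples with 2 features each, along with the matching labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Grab a single batch to check its shape\n",
"data, target = next(iter(train_loader))\n",
"print(data.shape)    # torch.Size([4, 2])\n",
"print(target.shape)  # torch.Size([4])"
]
},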
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before the training, one last thing is to define the hyper-parameters for training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"epochs = 50 # Total number of iterations\n",
"lr = 0.01 # Learning rate\n",
"\n",
"# Define the loss. As we used sigmoid in the last layer, we use `nn.BCELoss`.\n",
"# Otherwise we could have made use of `nn.BCEWithLogitsLoss`.\n",
"loss = BCELoss(reduction='none')\n",
"\n",
"# Define the optimizer, SGD with learning rate\n",
"optimizer = torch.optim.SGD(net.parameters(), lr=lr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, it is the time for training! We will run through the training set 50 times (i.e., epochs) and print training and validation losses at each epoch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_losses = []\n",
"val_losses = []\n",
"for epoch in range(epochs):\n",
" start = time.time()\n",
" training_loss = 0\n",
" # Build a training loop, to train the network\n",
" for idx, (data, target) in enumerate(train_loader):\n",
" # zero the parameter gradients\n",
" optimizer.zero_grad()\n",
" \n",
" data = data.to(device)\n",
" target = target.to(device).view(-1, 1)\n",
" \n",
" output = net(data)\n",
" L = loss(output, target).sum()\n",
" training_loss += L.item()\n",
" L.backward()\n",
" optimizer.step()\n",
" \n",
" # Get validation predictions\n",
" val_predictions = net(X_val)\n",
" # Calculate the validation loss\n",
" val_loss = torch.sum(loss(val_predictions, y_val.view(-1, 1))).item()\n",
" \n",
" # Take the average losses\n",
" training_loss = training_loss / len(y_train)\n",
" val_loss = val_loss / len(y_val)\n",
" \n",
" train_losses.append(training_loss)\n",
" val_losses.append(val_loss)\n",
" \n",
" end = time.time()\n",
" # Print the losses every 10 epochs\n",
" if (epoch == 0) or ((epoch+1)%10 == 0):\n",
" print(\"Epoch %s. Train_loss %f Validation_loss %f Seconds %f\" % \\\n",
" (epoch, training_loss, val_loss, end-start))"
]
},
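{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the final sigmoid layer outputs probabilities, we can also check the validation accuracy by thresholding the predictions at 0.5. A quick sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with torch.no_grad():\n",
"    val_predictions = net(X_val)\n",
"\n",
"# Threshold the sigmoid outputs at 0.5 to get hard class predictions\n",
"predicted_classes = (val_predictions.view(-1) > 0.5).float()\n",
"accuracy = (predicted_classes == y_val).float().mean().item()\n",
"print('Validation accuracy: %.3f' % accuracy)"
]
},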
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see the training and validation loss plots below. Losses go down as the training process continues as expected."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"plt.plot(train_losses, label=\"Training Loss\")\n",
"plt.plot(val_losses, label=\"Validation Loss\")\n",
"plt.title(\"Loss values\")\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"Loss\")\n",
"plt.legend()\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p39",
"language": "python",
"name": "conda_pytorch_p39"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}