{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![MLU Logo](../../data/MLU_Logo.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Learning Accelerator - Computer Vision - Lecture 1\n", "\n", "## Neural Networks with PyTorch\n", "\n", "In this notebook, we build, train and validate a Neural Network in [PyTorch](https://pytorch.org/docs/stable/index.html), an open source machine learning framework that accelerates the path from research prototyping to production deployment with a clear, concise, and simple API. \n", "\n", "1. Implementing a neural network with PyTorch \n", "2. Loss Functions\n", "3. Training\n", "4. Example - Binary Classification" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! pip install -q -r ../../requirements.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Implementing a neural network with PyTorch\n", "(Go to top)\n", "\n", "Let's implement a simple neural network with two hidden layers of size 64 using the sequential container (Adding things in sequence). We will have 3 inputs, 2 hidden layers and 1 output layer. Some drop-outs attached to the hidden layers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "\n", "net = nn.Sequential(\n", " nn.Linear(3, 64), # Linear layer-1 with 64 out_features and input size 3\n", " nn.Tanh(), # Tanh activation is applied\n", " nn.Dropout(p=0.4), # Apply random 40% drop-out to layer_1\n", " nn.Linear(64, 64), # Linear layer-2 with 64 units and input size 64 \n", " nn.Tanh(), # Tanh activation is applied\n", " nn.Dropout(p=0.3), # Apply random 30% drop-out to layer_2\n", " nn.Linear(64, 1)) # Output layer with single unit\n", "\n", "print(net)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The weight parameters of the `Linear` layer in pytorch are initialized with a modified form of the Xavier Initialization. Using these weights as a start, we can later apply optimization such as SGD to train the weights. As a result, using a strategic technique to initialize the weights is crucial. \n", "\n", "Here is a full list of [Initializers](https://pytorch.org/docs/stable/nn.init.html). The commonly used one is called *Xavier initilaization*, which can keep the scale of gradients roughly the same in all the layers. (Here are more technical details of [Xavier initilaization](https://d2l.ai/chapter_multilayer-perceptrons/numerical-stability-and-init.html#xavier-initialization).)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def xavier_init_weights(m):\n", " if type(m) == nn.Linear:\n", " torch.nn.init.xavier_uniform_(m.weight)\n", "\n", "net.apply(xavier_init_weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can easily access them with `net[layer_index]`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(net[0])\n", "print(net[1])\n", "print(net[2])\n", "print(net[3])\n", "print(net[4])\n", "print(net[5])\n", "print(net[6])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Loss Functions\n", "(Go to top)\n", "\n", "We can select [loss functions](https://d2l.ai/chapter_linear-networks/linear-regression.html#loss-function) according to our problem. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Loss Functions\n", "(Go to top)\n", "\n", "We can select [loss functions](https://d2l.ai/chapter_linear-networks/linear-regression.html#loss-function) according to our problem. A full list of supported `Loss` functions in PyTorch is available [here](https://pytorch.org/docs/stable/nn.html#loss-functions). \n", "\n", "Let's go over some popular loss functions and see how to call a built-in loss function:\n", "\n", "\n", "__Binary Cross-entropy Loss:__ A commonly used loss function for binary classification. \n", "\n", "```python\n", "loss = nn.BCELoss()\n", "```\n", "\n", "__Categorical Cross-entropy Loss:__ A commonly used loss function for multi-class classification. \n", "\n", "```python\n", "loss = nn.CrossEntropyLoss()\n", "```\n", "\n", "__MSE Loss:__ One of the most common loss functions for regression problems. \n", "\n", "```python\n", "loss = nn.MSELoss()\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Training\n", "(Go to top)\n", "\n", "The `torch.optim` module provides the optimization algorithms needed to train neural networks. For example, the following `Optimizer` trains a network with the [Stochastic Gradient Descent (SGD)](https://d2l.ai/chapter_optimization/sgd.html) method and a learning rate of 0.001.\n", "\n", "```python\n", "from torch import optim\n", "optimizer = optim.SGD(net.parameters(), lr=0.001)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Example - Binary Classification\n", "(Go to top)\n", "\n", "In this example, we will train a neural network on a randomly generated dataset with two classes and teach the network to classify them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import make_circles\n", "X, y = make_circles(n_samples=750, shuffle=True, random_state=42, noise=0.05, factor=0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's plot the simulated dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "def plot_dataset(X, y, title):\n", "    \n", "    # Activate Seaborn visualization\n", "    sns.set()\n", "    \n", "    # Plot both classes: class 1 -> blue, class 2 -> red\n", "    plt.scatter(X[y==1, 0], X[y==1, 1], c='blue', label=\"class 1\")\n", "    plt.scatter(X[y==0, 0], X[y==0, 1], c='red', label=\"class 2\")\n", "    plt.legend(loc='upper right')\n", "    plt.xlabel('x1')\n", "    plt.ylabel('x2')\n", "    plt.xlim(-2, 2)\n", "    plt.ylim(-2, 2)\n", "    plt.title(title)\n", "    plt.show()\n", "    \n", "plot_dataset(X, y, title=\"Dataset\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we import the necessary libraries and classes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "from torch.nn import BCELoss" ] },
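{ "cell_type": "markdown", "metadata": {}, "source": [ "To build some intuition for the binary cross-entropy loss before we use it, here is a quick sketch with made-up values: confident correct predictions receive a small loss, while a confident wrong one is penalized heavily." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Toy example (hypothetical values): per-sample binary cross-entropy\n", "preds = torch.tensor([0.9, 0.1, 0.8])    # predicted probabilities\n", "targets = torch.tensor([1.0, 0.0, 0.0])  # true labels\n", "print(BCELoss(reduction='none')(preds, targets))  # approx. [0.105, 0.105, 1.609]" ] },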
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use GPU resource if available, otherwise wil use CPU\n", "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", "\n", "net = nn.Sequential(nn.Linear(in_features=2, out_features=10),\n", " nn.ReLU(),\n", " nn.Linear(10, 10),\n", " nn.ReLU(),\n", " nn.Linear(10, 1),\n", " nn.Sigmoid()).to(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's prepare the training set and validation set, and load each of them to a `DataLoader`, respectively." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Split the dataset into two parts: 80%-20% split\n", "X_train, X_val = X[0:int(len(X)*0.8), :], X[int(len(X)*0.8):, :]\n", "y_train, y_val = y[:int(len(X)*0.8)], y[int(len(X)*0.8):]\n", "\n", "# Use PyTorch DataLoaders to load the data in batches\n", "batch_size = 4 # How many samples to use for each weight update \n", "train_dataset = torch.utils.data.TensorDataset(torch.tensor(X_train, dtype=torch.float32),\n", " torch.tensor(y_train, dtype=torch.float32))\n", "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size)\n", "\n", "# Move validation dataset on CPU/GPU device\n", "X_val = torch.tensor(X_val, dtype=torch.float32).to(device)\n", "y_val = torch.tensor(y_val, dtype=torch.float32).to(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before the training, one last thing is to define the hyper-parameters for training." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "epochs = 50 # Total number of iterations\n", "lr = 0.01 # Learning rate\n", "\n", "# Define the loss. As we used sigmoid in the last layer, we use `nn.BCELoss`.\n", "# Otherwise we could have made use of `nn.BCEWithLogitsLoss`.\n", "loss = BCELoss(reduction='none')\n", "\n", "# Define the optimizer, SGD with learning rate\n", "optimizer = torch.optim.SGD(net.parameters(), lr=lr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, it is the time for training! We will run through the training set 50 times (i.e., epochs) and print training and validation losses at each epoch." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_losses = []\n", "val_losses = []\n", "for epoch in range(epochs):\n", " start = time.time()\n", " training_loss = 0\n", " # Build a training loop, to train the network\n", " for idx, (data, target) in enumerate(train_loader):\n", " # zero the parameter gradients\n", " optimizer.zero_grad()\n", " \n", " data = data.to(device)\n", " target = target.to(device).view(-1, 1)\n", " \n", " output = net(data)\n", " L = loss(output, target).sum()\n", " training_loss += L.item()\n", " L.backward()\n", " optimizer.step()\n", " \n", " # Get validation predictions\n", " val_predictions = net(X_val)\n", " # Calculate the validation loss\n", " val_loss = torch.sum(loss(val_predictions, y_val.view(-1, 1))).item()\n", " \n", " # Take the average losses\n", " training_loss = training_loss / len(y_train)\n", " val_loss = val_loss / len(y_val)\n", " \n", " train_losses.append(training_loss)\n", " val_losses.append(val_loss)\n", " \n", " end = time.time()\n", " # Print the losses every 10 epochs\n", " if (epoch == 0) or ((epoch+1)%10 == 0):\n", " print(\"Epoch %s. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the training and validation loss plots below. As expected, both losses go down as training progresses." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "plt.plot(train_losses, label=\"Training Loss\")\n", "plt.plot(val_losses, label=\"Validation Loss\")\n", "plt.title(\"Loss values\")\n", "plt.xlabel(\"Epoch\")\n", "plt.ylabel(\"Loss\")\n", "plt.legend()\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_p39", "language": "python", "name": "conda_pytorch_p39" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 2 }