{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning Accelerator - Tabular Data - Lecture 3\n",
"\n",
"\n",
"## MXNet and Gluon\n",
"\n",
"1. MXNet: NDarrays and Autograd\n",
"2. Gluon: Building a Neural Network\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mWARNING: You are using pip version 21.3.1; however, version 22.3.1 is available.\n",
"You should consider upgrading via the '/home/ec2-user/anaconda3/envs/pytorch_p39/bin/python -m pip install --upgrade pip' command.\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -q -r ../requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. MXNet: NDarrays and Autograd\n",
"Go to top\n",
"\n",
"This tutorial is following the notebooks under the MXNet crash course [here](https://mxnet.apache.org/api/python/docs/tutorials/packages/ndarray/01-ndarray-intro.html).\n",
"\n",
"To get started, let’s import the ndarray package (nd is shortform) from MXNet.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from mxnet import nd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let’s see how to create a 2D array (also called a matrix) with values from two sets of numbers: 1, 2, 3 and 4, 5, 6. This might also be referred to as a tuple of a tuple of integers."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[1. 2. 3.]\n",
" [5. 6. 7.]]\n",
""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nd.array(((1,2,3),(5,6,7)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also create a very simple matrix with the same shape (2 rows by 3 columns), but fill it with 1s."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[1. 1. 1.]\n",
" [1. 1. 1.]]\n",
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = nd.ones((2,3))\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Often we’ll want to create arrays whose values are sampled randomly. For example, sampling values uniformly between -1 and 1. Here we create the same shape, but with random sampling."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[0.09762704 0.18568921 0.43037868]\n",
" [0.6885315 0.20552671 0.71589124]]\n",
""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = nd.random.uniform(-1,1,(2,3))\n",
"y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also fill an array of a given shape with a given value, such as 2.0."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[2. 2. 2.]\n",
" [2. 2. 2.]]\n",
""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = nd.full((2,3), 2.0)\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As with NumPy, the dimensions of each NDArray are accessible by accessing the .shape attribute. We can also query its size, which is equal to the product of the components of the shape. In addition, .dtype tells the data type of the stored values."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((2, 3), 6, numpy.float32)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(x.shape, x.size, x.dtype)"
]
},
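{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of this relationship (a minimal sketch), the size equals the product of the shape's components, and reshaping rearranges the same values without changing the size:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# size is the product of the shape components\n",
"print(x.size == x.shape[0] * x.shape[1])\n",
"\n",
"# reshaping keeps the same 6 values; only the shape changes\n",
"z = x.reshape((3, 2))\n",
"print(z.shape, z.size)"
]
},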
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Operations\n",
"\n",
"NDArray supports a large number of standard mathematical operations. Such as element-wise multiplication:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[0.19525409 0.37137842 0.86075735]\n",
" [1.377063 0.41105342 1.4317825 ]]\n",
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x * y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exponentiation:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[1.1025515 1.204048 1.5378398]\n",
" [1.9907899 1.2281718 2.0460093]]\n",
""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y.exp()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And grab a matrix’s transpose to compute a proper matrix-matrix product:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[1.4273899 3.219899 ]\n",
" [1.4273899 3.219899 ]]\n",
""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nd.dot(x, y.T)"
]
},
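{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check (a sketch): since every entry of x is 2, each entry of nd.dot(x, y.T) should equal 2 times the sum of the corresponding row of y, which is why the two rows of the product above are identical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 2 * (row sums of y) should reproduce the entries of the product above\n",
"2 * y.sum(axis=1)"
]
},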
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indexing\n",
"\n",
"MXNet NDArrays support slicing in all the ridiculous ways you might imagine accessing your data. Here’s an example of reading a particular element, which returns a 1D array with shape (1,)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[0.71589124]\n",
""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y[1,2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read the second and third columns from y."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[0.18568921 0.43037868]\n",
" [0.20552671 0.71589124]]\n",
""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y[:,1:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and writing to a specific element"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[0.09762704 2. 2. ]\n",
" [0.6885315 2. 2. ]]\n",
""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y[:,1:3] = 2\n",
"y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multi-dimensional slicing is also supported."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[0.09762704 2. 2. ]\n",
" [4. 4. 2. ]]\n",
""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y[1:2,0:2] = 4\n",
"y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Automatic differentiation with autograd\n",
"\n",
"We train models to get better and better as a function of experience.
\n",
"__Usually, getting better means minimizing a loss function__. To achieve this goal, we often iteratively compute the gradient of the loss with respect to weights and then update the weights accordingly. While the gradient calculations are straightforward through a chain rule, for complex models, working it out by hand can be a pain.
\n",
"__Before diving deep into the model training, let’s go through how MXNet’s autograd package expedites this work by automatically calculating derivatives.__\n",
"\n",
"__Basic usage__\n",
"\n",
"Let’s first import the autograd package."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"from mxnet import nd\n",
"from mxnet import autograd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a toy example, let’s say that we are interested in differentiating a function $f(x) = 0.6x^2$ with respect to parameter $x$. We can start by assigning an intial value of x."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[1. 2.]\n",
" [3. 4.]]\n",
""
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = nd.array([[1, 2], [3, 4]])\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once we compute the gradient of $f(x)$ with respect to $x$, __we’ll need a place to store it__.
\n",
"In MXNet, we can tell an NDArray that we plan to store a gradient by invoking its attach_grad method."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"x.attach_grad()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we’re going to define the function $y=f(x)$. To let MXNet store 𝑦, so that we can compute gradients later, we need to put the definition inside a autograd.record() scope."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"with autograd.record():\n",
" y = 0.6 * x * x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let’s invoke back propagation (backprop) by calling y.backward(). When 𝑦 has more than one entry, y.backward() is equivalent to y.sum().backward()."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"y.backward()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let’s see if this is the expected output. Note that $y=0.6x^2$ and $dx/dy = 1.2x$ which should be [[1.2, 2.4],[3.6, 4.8]]. Let’s check the automatically computed results:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[1.2 2.4 ]\n",
" [3.6000001 4.8 ]]\n",
""
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x.grad"
]
},
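{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check of the claim that y.backward() is equivalent to y.sum().backward() (a minimal sketch), recomputing the function as a scalar sum should yield the same gradient:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# recompute the function as a scalar sum; the resulting gradient\n",
"# should match the x.grad shown above\n",
"with autograd.record():\n",
"    z = (0.6 * x * x).sum()\n",
"z.backward()\n",
"x.grad"
]
},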
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Gluon: Building a Neural Network\n",
"Go to top"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Implement a network with sequential mode \n",
"\n",
"Let's implement a simple neural network with two hidden layers of size 64 and 128 using the sequential mode. We will have 5 inputs, 1 output and some dropouts between the layers."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Sequential(\n",
" (0): Dense(None -> 64, Activation(relu))\n",
" (1): Dropout(p = 0.4, axes=())\n",
" (2): Dense(None -> 128, Activation(relu))\n",
" (3): Dropout(p = 0.3, axes=())\n",
" (4): Dense(None -> 1, Activation(sigmoid))\n",
")"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from mxnet import nd\n",
"from mxnet.gluon import nn\n",
"\n",
"net = nn.Sequential()\n",
"\n",
"net.add(nn.Dense(64 ,activation='relu'), # Layer 1\n",
" nn.Dropout(.4), # Apply random 40% dropout to layer 1\n",
" nn.Dense(128, activation='relu'), # Layer 2\n",
" nn.Dropout(.3), # Apply random 30% dropout to layer 2\n",
" nn.Dense(1, activation='sigmoid')) # Output layer\n",
"net"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize weights"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"net.initialize()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at our layers and dropouts on them. We can easily access them with net[index]"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dense(None -> 64, Activation(relu))\n",
"Dropout(p = 0.4, axes=())\n",
"Dense(None -> 128, Activation(relu))\n",
"Dropout(p = 0.3, axes=())\n",
"Dense(None -> 1, Activation(sigmoid))\n"
]
}
],
"source": [
"print(net[0])\n",
"print(net[1])\n",
"print(net[2])\n",
"print(net[3])\n",
"print(net[4])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's send a batch of data to this network (batch size is 4 in this case)\n",
"\n",
"__Important note:__ Weights are initialized after you pass some data through network. This is because network input size is learned from your input data."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# Input shape is (batch_size, data lenght)\n",
"x = nd.random.uniform(shape=(4, 5))\n",
"y = net(x)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Random input data with shape (4, 5)\n",
"\n",
"[[0.5448832 0.8472517 0.4236548 0.6235637 0.6458941 ]\n",
" [0.3843817 0.4375872 0.2975346 0.891773 0.05671298]\n",
" [0.96366274 0.2726563 0.3834415 0.47766513 0.79172504]\n",
" [0.8121687 0.5288949 0.47997716 0.56804454 0.3927848 ]]\n",
"\n"
]
}
],
"source": [
"print(\"Random input data with shape\" , x.shape)\n",
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Output shape: (4, 1)\n",
"Network output: \n",
"[[0.500085 ]\n",
" [0.49984136]\n",
" [0.5009381 ]\n",
" [0.5004806 ]]\n",
"\n"
]
}
],
"source": [
"print(\"Output shape:\", y.shape)\n",
"print(\"Network output: \", y)"
]
},
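{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a minimal sketch of that deferred initialization, using a hypothetical fresh network that has not seen any data yet: accessing its weights before a forward pass raises a DeferredInitializationError."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mxnet.gluon.parameter import DeferredInitializationError\n",
"\n",
"# a fresh network that has not seen any data yet (illustration only)\n",
"net_fresh = nn.Sequential()\n",
"net_fresh.add(nn.Dense(64, activation='relu'))\n",
"net_fresh.initialize()\n",
"try:\n",
"    net_fresh[0].weight.data()  # shapes are unknown until data flows through\n",
"except DeferredInitializationError as err:\n",
"    print('Weights not yet initialized:', type(err).__name__)"
]
},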
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also see the initialized weights for each layer."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(64, 5) (64,)\n",
"\n",
"[[ 0.05958354 0.04705103 -0.06005495 -0.02276454 -0.0578019 ]\n",
" [ 0.02074406 -0.06716943 -0.01844618 0.04656678 0.06400172]\n",
" [ 0.03894195 -0.05035089 0.0518017 0.05181222 0.06700657]\n",
" [-0.00369488 0.0418822 0.0421275 -0.00539289 0.00286685]\n",
" [ 0.03927409 0.02504314 -0.05344158 0.03088857 0.01958894]\n",
" [ 0.01148278 -0.04993054 0.00523225 0.06225365 0.03620619]\n",
" [ 0.00305876 -0.05517294 -0.01194733 -0.00369594 -0.03296221]\n",
" [-0.04391347 0.03839272 0.03316854 -0.00613896 -0.03968295]\n",
" [ 0.00958075 -0.05106945 -0.06736943 -0.02462026 0.01646897]\n",
" [-0.04904552 0.0156934 -0.03887501 0.01637076 -0.01589154]\n",
" [ 0.06212472 0.05636378 0.02545484 -0.007007 -0.0196689 ]\n",
" [ 0.01582889 -0.00881553 0.0563288 0.02766836 -0.05610075]\n",
" [-0.06156844 0.06577327 0.02334734 0.0214396 -0.01161692]\n",
" [ 0.06960588 0.03084543 0.06055803 -0.06998399 -0.05206258]\n",
" [-0.02767344 0.06986568 -0.04945417 -0.03694754 -0.0570726 ]\n",
" [-0.0144787 -0.04392357 -0.01569249 -0.0216215 0.02376445]\n",
" [-0.01445255 0.06097547 0.00543434 0.04848353 -0.01131277]\n",
" [-0.02614171 0.02593073 0.00343674 -0.04137669 -0.0079166 ]\n",
" [ 0.05293644 -0.03785919 -0.06616574 0.00481795 0.02386545]\n",
" [ 0.05795468 -0.01157733 -0.00599132 0.00821657 -0.0097022 ]\n",
" [-0.05034583 0.06147789 -0.04226579 0.03897449 0.04210424]\n",
" [ 0.03023587 0.06555662 0.04238605 -0.02612062 -0.05700789]\n",
" [ 0.02692517 0.00254136 0.05269448 0.05110284 0.05524493]\n",
" [ 0.04608057 -0.05809381 0.04614447 -0.06453233 -0.031773 ]\n",
" [-0.04622374 -0.06170595 0.05293994 0.02387393 -0.05623144]\n",
" [ 0.01302917 -0.01104493 0.02403157 -0.00896072 -0.04408851]\n",
" [-0.06637033 0.06041572 0.00695275 0.06268228 -0.00905486]\n",
" [-0.00213513 -0.01114851 -0.0251249 -0.02375313 -0.04838026]\n",
" [-0.04134919 0.02784077 0.01669794 -0.05320692 -0.02804835]\n",
" [-0.00207537 -0.03264418 0.01858328 0.01695874 0.04455174]\n",
" [ 0.00407989 0.02562364 -0.05115881 -0.00020143 0.00190093]\n",
" [ 0.01215158 -0.04417842 0.03076559 0.03994692 -0.03381027]\n",
" [ 0.04955654 0.00646903 -0.00080685 -0.0129769 0.04851861]\n",
" [-0.04522216 -0.05884963 0.06574854 0.00073446 -0.02841743]\n",
" [-0.06085989 -0.02969836 -0.01006287 -0.05373294 -0.05648567]\n",
" [-0.04455822 -0.05219761 -0.00079943 0.01354434 0.00920712]\n",
" [-0.03835832 -0.03894307 -0.0550276 0.03744876 -0.03915713]\n",
" [ 0.01082313 -0.02102432 -0.04650474 -0.00450975 -0.01855401]\n",
" [-0.04175595 -0.00438569 0.00711171 -0.06009852 0.0291407 ]\n",
" [ 0.04759287 -0.02927334 -0.053014 0.00151587 0.00970358]\n",
" [ 0.05501258 -0.00881133 0.05548104 -0.06737528 -0.05241805]\n",
" [-0.0643117 -0.040986 -0.03529564 -0.06279459 -0.05690279]\n",
" [-0.00828662 0.02727532 -0.06581733 -0.04964818 -0.00604335]\n",
" [-0.00655588 0.02088017 -0.03981922 -0.03101178 -0.02045329]\n",
" [ 0.02467569 -0.00103097 0.01272079 0.0578622 -0.06664254]\n",
" [ 0.03721569 0.00823957 0.06631076 -0.03370466 -0.01366951]\n",
" [-0.01188583 0.00738807 -0.03030649 -0.02710951 0.02703931]\n",
" [ 0.0121268 -0.00833648 -0.03151118 -0.04803852 -0.00635491]\n",
" [ 0.00625086 0.03607313 0.03924406 0.0444599 -0.02710911]\n",
" [-0.03393806 -0.0389259 0.05502912 -0.01568402 0.03057908]\n",
" [ 0.06109371 -0.06711397 0.06663936 -0.05001957 0.02413371]\n",
" [-0.02445713 0.06538417 0.05608701 0.00661252 -0.04582266]\n",
" [ 0.06617581 0.04978693 0.03007424 0.01526499 0.02768204]\n",
" [ 0.01365788 -0.03974747 -0.05019502 0.06667843 -0.03856917]\n",
" [-0.06912776 0.02777883 -0.03458247 0.056445 -0.00912919]\n",
" [ 0.01861484 0.03911361 -0.06927772 -0.04232409 0.01012991]\n",
" [ 0.05081905 -0.01939051 0.0676761 0.01457847 -0.04706208]\n",
" [-0.01515273 0.01362675 0.04367269 -0.06874195 0.01575354]\n",
" [-0.01588002 -0.03269367 -0.06381759 0.020148 0.06393141]\n",
" [ 0.0435487 -0.00893947 -0.06733654 0.06285682 -0.06443075]\n",
" [ 0.04008283 0.06230054 0.05128051 -0.00751111 -0.04575684]\n",
" [-0.00821121 -0.0595072 -0.06080066 0.01410398 -0.04537943]\n",
" [-0.0464839 0.05122358 0.03267322 0.04809393 -0.01281786]\n",
" [ 0.05952144 0.00390723 0.05184291 0.06126002 0.04693594]]\n",
" \n",
"[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n",
"\n"
]
}
],
"source": [
"print(net[0].weight.data().shape, net[0].bias.data().shape)\n",
"print(net[0].weight.data(), net[0].bias.data())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Implement the network flexibly:\n",
"\n",
"In nn.Sequential, MXNet will automatically construct the forward function that sequentially executes added layers. Now let’s introduce another way to construct a network with a flexible forward function. This second approach gives you more control and you can do things like having parallel branches, skip connections and different types of connections between layers. These are all advanced concepts that we won't cover in this class.\n",
"\n",
"To do it, we create a subclass of nn.Block and implement two methods:\n",
"\n",
"* __init()__ create the layers\n",
"\n",
"* __forward()__ define the forward function."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"MixMLP(\n",
" (blk): Sequential(\n",
" (0): Dense(None -> 64, Activation(relu))\n",
" (1): Dropout(p = 0.4, axes=())\n",
" (2): Dense(None -> 128, Activation(relu))\n",
" (3): Dropout(p = 0.3, axes=())\n",
" (4): Dense(None -> 1, Activation(sigmoid))\n",
" )\n",
")"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class MixMLP(nn.Block):\n",
" def __init__(self, **kwargs):\n",
" # Run `nn.Block`'s init method\n",
" super(MixMLP, self).__init__(**kwargs)\n",
" self.blk = nn.Sequential()\n",
" self.blk.add(nn.Dense(64, activation='relu'), # Layer 1\n",
" nn.Dropout(.4), # Apply random 40% dropout to layer 1\n",
" nn.Dense(128, activation='relu'), # Layer 2\n",
" nn.Dropout(.3), # Apply random 30% dropout to layer 2\n",
" nn.Dense(1, activation='sigmoid') # Output layer\n",
" )\n",
" def forward(self, x):\n",
" y = self.blk(x)\n",
" return y\n",
"\n",
"net = MixMLP()\n",
"net"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the sequential chaining approach, we can only add instances with nn.Block as the base class and then run them in a forward pass. In this example, we used print to get the intermediate results and nd.relu to apply relu activation. So this approach provides a more flexible way to define the forward function.\n",
"\n",
"The usage of net is similar as before."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[0.49996397]\n",
" [0.5000398 ]\n",
" [0.49985966]\n",
" [0.49998817]]\n",
""
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"net.initialize()\n",
"# Input shape is (batch_size, data lenght)\n",
"x = nd.random.uniform(shape=(4, 5))\n",
"net(x)"
]
},
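{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, tying the two parts together, here is a minimal sketch of computing gradients through this Gluon network with autograd. The all-zero targets and squared-error loss below are made up purely for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mxnet import autograd\n",
"\n",
"# hypothetical all-zero targets, purely for illustration\n",
"t = nd.zeros((4, 1))\n",
"\n",
"with autograd.record():\n",
"    out = net(x)\n",
"    loss = ((out - t) ** 2).mean()\n",
"loss.backward()\n",
"\n",
"# gradient of the loss with respect to the first layer's weights\n",
"net.blk[0].weight.grad().shape"
]
}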
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p39",
"language": "python",
"name": "conda_pytorch_p39"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}