![MLU Logo](../data/MLU_Logo.png)

# <a name="0">Machine Learning Accelerator - Tabular Data - Lecture 3</a>


## MXNet and Gluon

1. <a href="#1">MXNet: NDarrays and Autograd</a>
2. <a href="#2">Gluon: Building a Neural Network</a>


In [1]:
%pip install -q -r ../requirements.txt



## 1. <a name="1">MXNet: NDarrays and Autograd</a>
<a href="#0">Go to top</a>

This tutorial follows the notebooks in the MXNet crash course [here](https://mxnet.apache.org/api/python/docs/tutorials/packages/ndarray/01-ndarray-intro.html).

To get started, let's import the ndarray package (nd for short) from MXNet.


In [2]:
from mxnet import nd

Next, let's see how to create a 2D array (also called a matrix) from two sets of numbers: 1, 2, 3 and 5, 6, 7. We pass the values as a tuple of tuples of integers.

In [3]:
nd.array(((1,2,3),(5,6,7)))


[[1. 2. 3.]
 [5. 6. 7.]]
<NDArray 2x3 @cpu(0)>

We can also create a very simple matrix with the same shape (2 rows by 3 columns), but fill it with 1s.

In [4]:
x = nd.ones((2,3))
x


[[1. 1. 1.]
 [1. 1. 1.]]
<NDArray 2x3 @cpu(0)>

Often we'll want to create arrays whose values are sampled randomly, for example uniformly between -1 and 1. Here we create an array of the same shape, filled with randomly sampled values.

In [5]:
y = nd.random.uniform(-1,1,(2,3))
y


[[0.09762704 0.18568921 0.43037868]
 [0.6885315  0.20552671 0.71589124]]
<NDArray 2x3 @cpu(0)>
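Sampling uniformly on an interval amounts to rescaling samples drawn from [0, 1). The sketch below uses NumPy as a stand-in for MXNet. It applies the affine map low + (high - low) * u, and its seeded values will not match the output above:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility; values differ from MXNet's
low, high = -1.0, 1.0

# Start from samples in [0, 1), then stretch and shift them onto [low, high)
u = rng.random((2, 3))
samples = low + (high - low) * u

print(samples.shape)  # (2, 3)
print(samples)
```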

You can also fill an array of a given shape with a given value, such as 2.0.

In [6]:
x = nd.full((2,3), 2.0)
x


[[2. 2. 2.]
 [2. 2. 2.]]
<NDArray 2x3 @cpu(0)>

As with NumPy, the dimensions of each NDArray are available through the .shape attribute. We can also query its size, which equals the product of the components of the shape. In addition, .dtype gives the data type of the stored values.

In [7]:
(x.shape, x.size, x.dtype)

((2, 3), 6, numpy.float32)
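NDArray mirrors NumPy here, so the same three attributes can be checked on a NumPy array. This short sketch confirms that size is the product of the shape's components:

```python
import math
import numpy as np

x = np.full((2, 3), 2.0, dtype=np.float32)

# size is the product of the shape's components: 2 * 3 = 6
assert x.size == math.prod(x.shape)
print(x.shape, x.size, x.dtype)  # (2, 3) 6 float32
```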

### Operations

NDArray supports a large number of standard mathematical operations, such as element-wise multiplication:

In [8]:
x * y


[[0.19525409 0.37137842 0.86075735]
 [1.377063   0.41105342 1.4317825 ]]
<NDArray 2x3 @cpu(0)>

Element-wise exponentiation:

In [9]:
y.exp()


[[1.1025515 1.204048  1.5378398]
 [1.9907899 1.2281718 2.0460093]]
<NDArray 2x3 @cpu(0)>
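Both operations above act entry by entry. As a sanity check, the NumPy sketch below copies the printed values of y and recomputes x * y and the exponential. The results should agree with the MXNet output up to float32 rounding:

```python
import numpy as np

# Values copied from the printed y above; x is the 2x3 array filled with 2.0
y = np.array([[0.09762704, 0.18568921, 0.43037868],
              [0.6885315,  0.20552671, 0.71589124]], dtype=np.float32)
x = np.full((2, 3), 2.0, dtype=np.float32)

prod = x * y      # element-wise: prod[i, j] = x[i, j] * y[i, j], here simply 2 * y
expy = np.exp(y)  # element-wise: expy[i, j] = e ** y[i, j]

print(prod)
print(expy)
```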

We can also take a matrix's transpose to compute a proper matrix-matrix product:

In [10]:
nd.dot(x, y.T)


[[1.4273899 3.219899 ]
 [1.4273899 3.219899 ]]
<NDArray 2x2 @cpu(0)>
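Transposing y changes its shape from (2, 3) to (3, 2), so the inner dimensions match and the product has shape (2, 2). The NumPy sketch below uses the printed values of y. It also shows that the untransposed product is rejected:

```python
import numpy as np

y = np.array([[0.09762704, 0.18568921, 0.43037868],
              [0.6885315,  0.20552671, 0.71589124]], dtype=np.float32)
x = np.full((2, 3), 2.0, dtype=np.float32)

# (2, 3) @ (3, 2) -> (2, 2); entry [i, j] is the dot product of row i of x and row j of y
z = np.dot(x, y.T)
print(z.shape)  # (2, 2)

# Every entry of x is 2.0, so entry [i, j] equals 2 * (sum of row j of y),
# which makes both rows of the result identical
print(z)

# Without the transpose the inner dimensions (3 and 2) do not match
try:
    np.dot(x, y)
except ValueError as err:
    print("shape mismatch:", err)
```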

### Indexing

MXNet NDArrays support slicing in much the same ways as NumPy arrays. Here's an example of reading a particular element, which returns a 1D array with shape (1,).

In [11]:
y[1,2]


[0.71589124]
<NDArray 1 @cpu(0)>

Read the second and third columns from y.

In [12]:
y[:,1:3]


[[0.18568921 0.43037868]
 [0.20552671 0.71589124]]
<NDArray 2x2 @cpu(0)>
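NumPy behaves slightly differently when reading a single element. It returns a 0-d scalar rather than the shape-(1,) array MXNet printed above. Slices are half-open in both libraries, so 1:3 selects indices 1 and 2. A NumPy sketch:

```python
import numpy as np

y = np.array([[0.09762704, 0.18568921, 0.43037868],
              [0.6885315,  0.20552671, 0.71589124]], dtype=np.float32)

# Integer index on every axis: NumPy returns a 0-d scalar,
# whereas the MXNet NDArray above returned a 1-element array of shape (1,)
elem = y[1, 2]
print(np.ndim(elem))  # 0

# Half-open slice: 1:3 selects indices 1 and 2, i.e. the second and third columns
cols = y[:, 1:3]
print(cols.shape)  # (2, 2)
```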

We can also write to a slice. Here we set the second and third columns of y to 2:

In [13]:
y[:,1:3] = 2
y


[[0.09762704 2.         2.        ]
 [0.6885315  2.         2.        ]]
<NDArray 2x3 @cpu(0)>

Multi-dimensional slicing also works for writing. Here we set the first two entries of the second row to 4:

In [14]:
y[1:2,0:2] = 4
y


[[0.09762704 2.         2.        ]
 [4.         4.         2.        ]]
<NDArray 2x3 @cpu(0)>
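Slice assignment broadcasts the scalar over every selected entry. Replaying both writes on a NumPy copy of the original y should reproduce the result above:

```python
import numpy as np

y = np.array([[0.09762704, 0.18568921, 0.43037868],
              [0.6885315,  0.20552671, 0.71589124]], dtype=np.float32)

y[:, 1:3] = 2     # scalar broadcast over the 2x2 slice (all rows, columns 1 and 2)
y[1:2, 0:2] = 4   # row 1 only, columns 0 and 1

print(y)
# Only the entry at [0, 0] keeps its original random value
```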

### Automatic differentiation with autograd

We train models to get better and better as a function of experience. <br/>
__Usually, getting better means minimizing a loss function__. To achieve this goal, we often iteratively compute the gradient of the loss with respect to the weights and then update the weights accordingly. The gradients follow mechanically from the chain rule, but working them out by hand for complex models can be a pain.<br/>
__Before diving into model training, let's go through how MXNet's autograd package expedites this work by automatically calculating derivatives.__
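Here is a plain-Python sketch of the compute-gradient-then-update loop. It uses a hypothetical one-parameter model (prediction w * x) with a squared-error loss, and the gradient is worked out by the chain rule. The model, learning rate, and step count are illustrative choices, not part of the lecture:

```python
# Hypothetical one-parameter model: prediction = w * x, loss = (w * x - t) ** 2
x, t = 2.0, 3.0
w = 0.0
lr = 0.05

for step in range(100):
    pred = w * x
    loss = (pred - t) ** 2
    # Chain rule: dloss/dw = dloss/dpred * dpred/dw = 2 * (pred - t) * x
    grad = 2 * (pred - t) * x
    w -= lr * grad  # step against the gradient to reduce the loss

print(w)  # approaches t / x = 1.5
```

autograd automates the "grad = ..." line. For a real network with thousands of weights, deriving that expression by hand is exactly the tedious part.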

__Basic usage__

Let's first import the autograd package.

In [15]:
from mxnet import nd
from mxnet import autograd

As a toy example, let's say that we are interested in differentiating the function $f(x) = 0.6x^2$ with respect to the parameter $x$. We start by assigning an initial value to $x$.

In [16]:
x = nd.array([[1, 2], [3, 4]])
x


[[1. 2.]
 [3. 4.]]
<NDArray 2x2 @cpu(0)>

Once we compute the gradient of $f(x)$ with respect to $x$, __we'll need a place to store it__.<br/>
In MXNet, we can tell an NDArray that we plan to store a gradient by invoking its attach_grad method.

In [17]:
x.attach_grad()

Now we're going to define the function $y=f(x)$. MXNet needs to record how $y$ is computed so that we can compute gradients later. To make that happen, we put the definition inside an autograd.record() scope.
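Why does MXNet need a recording scope? Reverse-mode differentiation has to remember which operations produced y so that it can walk them backward. The pure-Python sketch below illustrates the idea with a hypothetical Var class. It is a conceptual model, not how MXNet's autograd is implemented:

```python
class Var:
    """Hypothetical scalar node for a minimal reverse-mode autodiff sketch."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a*b)/da = b and d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def backward(self, upstream=1.0):
        # Accumulate the incoming gradient, then push it to the parents via the chain rule
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


x = Var(3.0)
y = 0.6 * x * x  # each multiplication records its inputs, like entries on a tape
y.backward()
print(x.grad)    # approximately 1.2 * 3.0 = 3.6 (floating-point rounding may show)
```

Each multiplication stores its inputs and local derivatives. backward then applies the chain rule along every recorded path. This naive recursion is fine for tiny expressions; practical systems traverse the graph in topological order instead.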

In [18]:
with autograd.record():
    y = 0.6 * x * x

Let's invoke backpropagation (backprop) by calling y.backward(). When $y$ has more than one entry, y.backward() is equivalent to y.sum().backward().
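Each entry of y depends only on the matching entry of x, so the gradient of y.sum() is simply the element-wise derivative. The NumPy sketch below estimates that gradient with central finite differences, independently of MXNet. The result should match the analytic 1.2x:

```python
import numpy as np

def f(x):
    return 0.6 * x * x

x = np.array([[1.0, 2.0], [3.0, 4.0]])
eps = 1e-6

# Numerical gradient of the scalar sum(f(x)), one entry at a time (central differences)
grad = np.zeros_like(x)
for idx in np.ndindex(x.shape):
    bump = np.zeros_like(x)
    bump[idx] = eps
    grad[idx] = (f(x + bump).sum() - f(x - bump).sum()) / (2 * eps)

print(np.round(grad, 4))  # matches the analytic 1.2 * x
```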

In [19]:
y.backward()

Now, let's check whether this is the expected output. Since $y=0.6x^2$, we have $dy/dx = 1.2x$, so the gradient should be [[1.2, 2.4], [3.6, 4.8]]. Let's check the automatically computed result:

In [20]:
x.grad


[[1.2       2.4      ]
 [3.6000001 4.8      ]]
<NDArray 2x2 @cpu(0)>

## 2. <a name="2">Gluon: Building a Neural Network</a>
<a href="#0">Go to top</a>

### Implement a network with sequential mode 

Let's implement a simple neural network with two hidden layers of sizes 64 and 128 using the sequential mode. The network takes 5 inputs, produces 1 output, and applies dropout after each hidden layer.

In [21]:
from mxnet import nd
from mxnet.gluon import nn

net = nn.Sequential()

net.add(nn.Dense(64, activation='relu'),    # Layer 1
        nn.Dropout(.4),                     # Apply random 40% dropout to layer 1
        nn.Dense(128, activation='relu'),   # Layer 2
        nn.Dropout(.3),                     # Apply random 30% dropout to layer 2
        nn.Dense(1, activation='sigmoid'))  # Output layer
net

Sequential(
  (0): Dense(None -> 64, Activation(relu))
  (1): Dropout(p = 0.4, axes=())
  (2): Dense(None -> 128, Activation(relu))
  (3): Dropout(p = 0.3, axes=())
  (4): Dense(None -> 1, Activation(sigmoid))
)

Initialize weights

In [22]:
net.initialize()

Let's look at the layers and their dropout settings. We can access each one with net[index]:

In [23]:
print(net[0])
print(net[1])
print(net[2])
print(net[3])
print(net[4])

Dense(None -> 64, Activation(relu))
Dropout(p = 0.4, axes=())
Dense(None -> 128, Activation(relu))
Dropout(p = 0.3, axes=())
Dense(None -> 1, Activation(sigmoid))
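The dropout layers listed above have no weights and only act during training. The NumPy sketch below assumes the inverted-dropout behavior described in the MXNet Dropout documentation. During training, each activation is zeroed with probability p and the survivors are scaled by 1 / (1 - p). At inference, the input passes through unchanged. The function name and signature are hypothetical:

```python
import numpy as np

def dropout(a, p, training, rng):
    """Hypothetical inverted-dropout sketch: zero each entry with probability p
    during training and scale survivors by 1 / (1 - p); identity at inference."""
    if not training or p == 0.0:
        return a
    keep = rng.random(a.shape) >= p  # keep each entry with probability 1 - p
    return a * keep / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones((1000, 64))

train_out = dropout(a, 0.4, training=True, rng=rng)
print(round((train_out == 0).mean(), 2))  # close to 0.4
print(round(train_out.mean(), 2))         # close to 1.0: scaling preserves the expected value

eval_out = dropout(a, 0.4, training=False, rng=rng)
print(np.array_equal(eval_out, a))        # True
```

Gluon enables training mode inside autograd.record() by default. A plain net(x) call outside that scope, as in the following cells, should therefore run with dropout switched off.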


Let's send a batch of data to this network (the batch size is 4 in this case).

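__Important note:__ net.initialize() does not create the weights right away. We never specified each layer's input size, so Gluon defers the actual initialization until the first batch passes through the network and infers the input size from that data.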
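Once the first batch fixes the input width at 5, each Dense layer's weight shape follows from the width of the layer before it. The plain-Python sketch below does that bookkeeping by hand and counts the parameters; it is illustrative, not Gluon's internal code. Dropout layers contribute no parameters. The first weight shape matches the (64, 5) printed for net[0] further down:

```python
# Layer widths from the network above; the input width (5) is only known once data arrives
in_features = 5
units = [64, 128, 1]

total = 0
prev = in_features
for n in units:
    weight_shape = (n, prev)  # Dense weight stored as (units, in_units), e.g. (64, 5)
    bias_shape = (n,)
    count = n * prev + n
    print(weight_shape, bias_shape, count)
    total += count
    prev = n

print("total parameters:", total)  # 384 + 8320 + 129 = 8833
```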

In [24]:
# Input shape is (batch_size, number of features)
x = nd.random.uniform(shape=(4, 5))
y = net(x)

In [25]:
print("Random input data with shape", x.shape)
print(x)

Random input data with shape (4, 5)

[[0.5448832  0.8472517  0.4236548  0.6235637  0.6458941 ]
 [0.3843817  0.4375872  0.2975346  0.891773   0.05671298]
 [0.96366274 0.2726563  0.3834415  0.47766513 0.79172504]
 [0.8121687  0.5288949  0.47997716 0.56804454 0.3927848 ]]
<NDArray 4x5 @cpu(0)>


In [26]:
print("Output shape:", y.shape)
print("Network output: ", y)

Output shape: (4, 1)
Network output:  
[[0.500085  ]
 [0.49984136]
 [0.5009381 ]
 [0.5004806 ]]
<NDArray 4x1 @cpu(0)>


We can also see the initialized weights for each layer.

In [27]:
print(net[0].weight.data().shape, net[0].bias.data().shape)
print(net[0].weight.data(), net[0].bias.data())

(64, 5) (64,)

[[ 0.05958354  0.04705103 -0.06005495 -0.02276454 -0.0578019 ]
 [ 0.02074406 -0.06716943 -0.01844618  0.04656678  0.06400172]
 [ 0.03894195 -0.05035089  0.0518017   0.05181222  0.06700657]
 [-0.00369488  0.0418822   0.0421275  -0.00539289  0.00286685]
 [ 0.03927409  0.02504314 -0.05344158  0.03088857  0.01958894]
 [ 0.01148278 -0.04993054  0.00523225  0.06225365  0.03620619]
 [ 0.00305876 -0.05517294 -0.01194733 -0.00369594 -0.03296221]
 [-0.04391347  0.03839272  0.03316854 -0.00613896 -0.03968295]
 [ 0.00958075 -0.05106945 -0.06736943 -0.02462026  0.01646897]
 [-0.04904552  0.0156934  -0.03887501  0.01637076 -0.01589154]
 [ 0.06212472  0.05636378  0.02545484 -0.007007   -0.0196689 ]
 [ 0.01582889 -0.00881553  0.0563288   0.02766836 -0.05610075]
 [-0.06156844  0.06577327  0.02334734  0.0214396  -0.01161692]
 [ 0.06960588  0.03084543  0.06055803 -0.06998399 -0.05206258]
 [-0.02767344  0.06986568 -0.04945417 -0.03694754 -0.0570726 ]
 [-0.0144787  -0.04392357 -0.01569249 -0
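Why are all four outputs so close to 0.5? With small initial weights, the value reaching the final sigmoid is close to 0, and sigmoid(0) = 0.5. The NumPy sketch below runs the same 5 -> 64 -> 128 -> 1 architecture in inference mode, with dropout omitted. It assumes uniform weights in [-0.07, 0.07] and zero biases, which is consistent with the weights printed above:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [5, 64, 128, 1]

# Assumption: uniform weights in [-0.07, 0.07] and zero biases (consistent with the printout)
params = [(rng.uniform(-0.07, 0.07, (n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # Inference mode: dropout is the identity, so it is left out
    h = x
    for i, (w, b) in enumerate(params):
        z = h @ w.T + b  # Dense layer: x W^T + b, with W of shape (units, in_units)
        h = sigmoid(z) if i == len(params) - 1 else relu(z)
    return h

x = rng.uniform(size=(4, 5))
out = forward(x)
print(out.shape)  # (4, 1)
print(out)        # all values close to 0.5
```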

### Implement the network flexibly

In nn.Sequential, MXNet automatically constructs the forward function that executes the added layers in order. Now let's introduce another way to construct a network, this time with a flexible forward function. This second approach gives you more control: you can build parallel branches, skip connections, and other kinds of connections between layers. These are advanced concepts that we won't cover in this class.

To do it, we create a subclass of nn.Block and implement two methods:

* `__init__()`: create the layers.

* `forward()`: define the forward function.
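Here is a plain NumPy sketch of the same two-method pattern. It includes a skip connection of the kind mentioned above. TinyBlock is hypothetical and not part of Gluon; in Gluon you would subclass nn.Block and build the layers from nn.Dense:

```python
import numpy as np

class TinyBlock:
    """Hypothetical stand-in for the nn.Block pattern: layers in __init__, math in forward."""

    def __init__(self, n_in, n_hidden, rng):
        # "Create the layers": here just two weight matrices
        self.w1 = rng.uniform(-0.07, 0.07, (n_hidden, n_in))
        self.w2 = rng.uniform(-0.07, 0.07, (n_in, n_hidden))

    def forward(self, x):
        h = np.maximum(x @ self.w1.T, 0.0)  # hidden layer with ReLU
        return x + h @ self.w2.T            # skip connection: add the input back

    def __call__(self, x):
        # Calling the object runs forward, much like net(x) does for a Gluon Block
        return self.forward(x)

rng = np.random.default_rng(0)
block = TinyBlock(n_in=5, n_hidden=16, rng=rng)
x = rng.uniform(size=(4, 5))
out = block(x)
print(out.shape)  # (4, 5)
```

The skip connection requires the block's output width to match its input width, which is why w2 maps back to n_in.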

In [28]:
class MixMLP(nn.Block):
    def __init__(self, **kwargs):
        # Run `nn.Block`'s init method
        super(MixMLP, self).__init__(**kwargs)
        self.blk = nn.Sequential()
        self.blk.add(nn.Dense(64, activation='relu'),   # Layer 1
                     nn.Dropout(.4),                    # Apply random 40% dropout to layer 1
                     nn.Dense(128, activation='relu'),  # Layer 2
                     nn.Dropout(.3),                    # Apply random 30% dropout to layer 2
                     nn.Dense(1, activation='sigmoid')  # Output layer
                    )
    def forward(self, x):
        y = self.blk(x)
        return y

net = MixMLP()
net

MixMLP(
  (blk): Sequential(
    (0): Dense(None -> 64, Activation(relu))
    (1): Dropout(p = 0.4, axes=())
    (2): Dense(None -> 128, Activation(relu))
    (3): Dropout(p = 0.3, axes=())
    (4): Dense(None -> 1, Activation(sigmoid))
  )
)

In the sequential chaining approach, we can only add instances with nn.Block as the base class and run them one after another in the forward pass. A custom forward method can contain arbitrary code, for example printing intermediate results or calling nd.relu directly, which makes this approach more flexible. Here forward simply passes x through the nn.Sequential block, so MixMLP computes the same kind of network as before.

Using net works the same way as before.

In [29]:
net.initialize()
# Input shape is (batch_size, number of features)
x = nd.random.uniform(shape=(4, 5))
net(x)


[[0.49996397]
 [0.5000398 ]
 [0.49985966]
 [0.49998817]]
<NDArray 4x1 @cpu(0)>