# Inference of MNIST using MXNet on Amazon EKS

This document explains how to perform inference on an MNIST model using [Apache MXNet Model Server](https://github.com/awslabs/mxnet-model-server) (MMS) on Amazon EKS. MMS is a flexible and easy-to-use tool for serving deep learning models trained with MXNet.

## Prerequisite

Create an [EKS cluster using GPU](../../eks-gpu.md).

## Run inference using EKS

In order to run MNIST inference on EKS, we need a Docker image and a Kubernetes manifest that creates an inference service backed by a deployment.

1. You can either build a Docker image from `samples/mnist/inference/mxnet/Dockerfile` or use the existing image `rgaut/deeplearning-mxnet:inference`. The MXNet model is bundled with the Docker image.

1. Create the deployment and service for inference:

   ```
   kubectl create -f samples/mnist/inference/mxnet/mxnet_eks.yaml
   ```

   Watch for the deployment pod to reach the `Running` state:

   ```
   kubectl get pods --selector=app=mnist-service -w
   NAME                             READY     STATUS              RESTARTS   AGE
   mnist-service-7df4759f74-xhj5x   0/1       ContainerCreating   0          29s
   mnist-service-7df4759f74-xhj5x   1/1       Running             0          46s
   ```

1. The service is exposed as `ClusterIP`. Use port forwarding so that the service can be accessed locally:

   ```
   kubectl port-forward \
     `kubectl get pods --selector=app=mnist-service -o jsonpath='{.items[0].metadata.name}'` \
     8080:8080 &
   ```

1. Run the inference:

   ```
   curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/9.png
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
   100  8042  100    56  100  7986   3105   432k --:--:-- --:--:-- --:--:--  458k
   Prediction is [9] with probability of 92.52161979675293%
   ```

   Run another inference:

   ```
   curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/7.jpg
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
   100   608  100    52  100   556    568   6081 --:--:-- --:--:-- --:--:--  6109
   Prediction is [7] with probability of 99.9999761581%
   ```
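If you prefer a programmatic client over `curl`, the minimal sketch below sends the same request to the port-forwarded endpoint. It only assumes that the `requests` package is available in your Python environment (`pip install requests`):

```
import requests

# POST the raw image bytes, mirroring `curl -T`
url = "http://localhost:8080/predictions/mnist"
with open("samples/mnist/inference/mxnet/utils/9.png", "rb") as f:
    response = requests.post(url, data=f)

print(response.status_code)  # 200 on success
print(response.text)         # e.g. "Prediction is [9] with probability of ..."
```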
## Run inference using MXNet Model Server locally

### Install MXNet Model Server

1. Install Java:

   ```
   brew tap caskroom/versions
   brew update
   brew cask install java8
   ```

1. Set up a virtual environment:

   ```
   pip install virtualenv --user
   export PATH=~/Library/Python/2.7/bin:$PATH
   # create a Python 2.7 virtual environment
   virtualenv -p /usr/bin/python /tmp/pyenv2
   # enter this virtual environment
   source /tmp/pyenv2/bin/activate
   ```

   The location of the `virtualenv` binary may differ; it can be found using the `pip show virtualenv` command.

1. Install MXNet (the MKL build) for CPU inference:

   ```
   pip install mxnet-mkl
   ```

1. Install MXNet Model Server:

   ```
   pip install mxnet-model-server
   ```

### Prepare model archive

A model archive is an artifact that MMS can consume natively, and it can easily be created from the trained artifacts. A copy of this archive is available at `samples/mnist/inference/archived_model/mnist_cnn.mar`. Skip the rest of this section if you are using the pre-generated archive. This section explains how to generate the MMS archive from the artifacts produced by model training.

1. Two artifacts were generated at the end of training: a symbols file (`mnist_cnn-symbol.json`) and a params file (`mnist_cnn-0000.params`). These artifacts are provided in the [saved_model](../../training/mxnet/saved_model) directory. Copy them to the `/tmp/models` directory:

   ```
   mkdir /tmp/models
   cp samples/mnist/training/mxnet/saved_model/mnist_cnn-* /tmp/models
   ```

1. The `model-archiver` tool is installed as part of the MMS installation. It can also be installed manually:

   ```
   pip install model-archiver
   ```

1. Create a `model-store` location under `/tmp`:

   ```
   mkdir /tmp/model-store
   ```

1. Copy [mnist_cnn_inference.py](../../../samples/mnist/inference/mxnet/mnist_cnn_inference.py) to the `/tmp/models` directory:

   ```
   cp samples/mnist/inference/mxnet/mnist_cnn_inference.py /tmp/models
   ```

1. Generate the model archive:

   ```
   model-archiver \
     --model-name mnist_cnn \
     --model-path /tmp/models \
     --export-path /tmp/model-store \
     --handler mnist_cnn_inference:handle -f
   ```

   This command creates a model archive called `mnist_cnn.mar` under `/tmp/model-store`. The `--handler mnist_cnn_inference:handle` flag points MMS at the entry point inside `mnist_cnn_inference.py` (see the handler sketch at the end of this document).

### Run inference

1. Update `~/.keras/keras.json` so that it looks like:

   ```
   {
       "epsilon": 1e-07,
       "floatx": "float32",
       "image_data_format": "channels_last",
       "backend": "mxnet"
   }
   ```

   This ensures that the `backend` is `mxnet` and `image_data_format` is `channels_last`.

1. Run MXNet Model Server:

   ```
   mxnet-model-server \
     --start \
     --model-store samples/mnist/inference/mxnet/archived_model \
     --models mnist=mnist_cnn.mar
   ```

   The above command creates an endpoint called `mnist`. If you generated your own archive at `/tmp/model-store`, make sure to pass that directory to `--model-store` instead.

1. In a new terminal, run the inference:

   ```
   curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/9.png
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
   100  8042  100    56  100  7986   3105   432k --:--:-- --:--:-- --:--:--  458k
   Prediction is [9] with probability of 92.52161979675293%
   ```

   Run another inference:

   ```
   curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/7.jpg
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
   100   608  100    52  100   556    568   6081 --:--:-- --:--:-- --:--:--  6109
   Prediction is [7] with probability of 99.9999761581%
   ```
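Once the server is up, you can also verify it programmatically. The sketch below assumes the default MMS 1.0 ports, with the inference API on 8080 and the management API on 8081:

```
import requests

# Liveness check against the inference API
ping = requests.get("http://localhost:8080/ping")
print(ping.status_code, ping.text)

# List the models registered with the management API
models = requests.get("http://localhost:8081/models")
print(models.text)
```

When you are done, stop the server with `mxnet-model-server --stop`.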
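For reference, the `--handler mnist_cnn_inference:handle` flag used while generating the archive points MMS at a module-level `handle(data, context)` function. The sketch below illustrates the general shape of such an MMS custom-service entry point; it is not the actual contents of `mnist_cnn_inference.py`, and `run_inference` is a hypothetical helper standing in for the real preprocessing and forward pass:

```
import json

def handle(data, context):
    # `data` is a list of requests in the current batch; the raw bytes
    # sent with `curl -T` typically arrive under the "body" or "data"
    # key (the exact layout can differ between MMS versions).
    if data is None:
        return None
    image_bytes = data[0].get("body") or data[0].get("data")

    # A real handler would load the model once (paths are available via
    # `context`), decode the image, and run the forward pass here.
    prediction, probability = run_inference(image_bytes)  # hypothetical helper

    # MMS expects one response per request in the batch.
    return [json.dumps({"prediction": prediction, "probability": probability})]
```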