{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Hotdog or Not HotDog\n", "\n", "Welcome to this Amazon SageMaker Notebook! This is an entirely managed notebook service that you can use to create and edit machine learning models with Python. We will be using it today to create a binary image classification model using the Apache MXNet deep learning framework. We will then learn how to delpoy this model onto our AWS DeepLens device.\n", "\n", "In this notebook we will be to using MXNet's Gluon interface, to download and edit a pre-trained [ImageNet](http://www.image-net.org/) model and transform it into binary classifier, which we can use to differentiate between hot dogs and other objects.\n", "\n", "### Setup\n", "\n", "Before we start, make sure the kernel in the the notebook is set to the correct one, `condamxnet3.6` which has most of the the Python library dependencies we will need for this tutorial already installed.\n", "\n", "First we'll start by importing a bunch of packages into the notebook that you'll need later and installing any required packages that are missing into our notebook kernel." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "bash: line 1: conda: command not found\n" ] } ], "source": [ "%%bash\n", "conda install scikit-image" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import print_function\n", "import logging\n", "logging.basicConfig(level=logging.INFO)\n", "import os\n", "import time\n", "from collections import OrderedDict\n", "import skimage.io as io\n", "import numpy as np\n", "\n", "import mxnet as mx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model\n", "\n", "The model we will be downloading and editing is [SqueezeNet](https://arxiv.org/abs/1602.07360), an extremely efficient image classification model that achived 2012 State of the Art accuracy on the popular [ImageNet](http://www.image-net.org/challenges/LSVRC/), image classification challenge. SqueezeNet is just a convolutional neural network (CNN), with an architecture chosen to have a small number of parameters and to require a minimal amount of computation. It's especially popular for folks that need to run CNNs on low-powered devices like cell phones and other internet-of-things devices. The MXNet Deep Learning framework offers SqueezeNet v1.0 and v1.1 that are pretrained on ImageNet through it's [model zoo](https://mxnet.incubator.apache.org/api/python/gluon/model_zoo.html).\n", "\n", "![image](https://community.arm.com/cfs-file/__key/communityserver-discussions-components-files/18/pastedimage1485588767177v1.png)\n", "Image 1. 
, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model\n", "\n", "The model we will be downloading and editing is [SqueezeNet](https://arxiv.org/abs/1602.07360), an extremely efficient image classification model that achieved AlexNet-level accuracy (the 2012 state of the art) on the popular [ImageNet](http://www.image-net.org/challenges/LSVRC/) image classification challenge, with roughly 50x fewer parameters. SqueezeNet is a convolutional neural network (CNN) with an architecture chosen to have a small number of parameters and to require a minimal amount of computation. It's especially popular with folks who need to run CNNs on low-powered devices like cell phones and other Internet-of-Things devices. The MXNet deep learning framework offers SqueezeNet v1.0 and v1.1, pretrained on ImageNet, through its [model zoo](https://mxnet.incubator.apache.org/api/python/gluon/model_zoo.html).\n", "\n", "![image](https://community.arm.com/cfs-file/__key/communityserver-discussions-components-files/18/pastedimage1485588767177v1.png)\n", "Image 1. Layer-wise visualization of the SqueezeNet architecture\n", "\n", "## Pulling the pre-trained model\n", "The MXNet model zoo gives us convenient access to a number of popular models,\n", "both their architectures and their pretrained parameters.\n", "Let's download a pretrained SqueezeNet right now with just a few lines of code.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from mxnet.gluon import nn\n", "from mxnet.gluon.model_zoo import vision as models\n", "\n", "# Get a pretrained SqueezeNet\n", "net = models.squeezenet1_1(pretrained=True, prefix='deep_dog_')\n", "\n", "# 'hotdog' happens to be a class in ImageNet, which this model was trained on,\n", "# so we can reuse the weights for that class for better performance.\n", "# Here's the index of that class:\n", "imagenet_hotdog_index = 713" ] }
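, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we edit anything, it can help to peek at the pretrained network's classifier head. This cell is just an inspection step: its final convolution maps the extracted features to 1000 output channels, one per ImageNet class, and our hot dog class is the channel at index 713." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Inspect the pretrained classifier head: the Conv2D layer maps the\n", "# extracted features to 1000 output channels, one per ImageNet class\n", "print(net.classifier)" ] }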
, { "cell_type": "markdown", "metadata": {}, "source": [ "### DeepDog Net\n", "\n", "In vision networks it's common for the first set of layers to learn the task of recognizing edges, curves and other important visual features of the input image. We call this feature extraction; once the abstract features are extracted, we can use a simpler model at the end of the network to classify images based on those features.\n", "\n", "We will use the feature extractor from the pretrained SqueezeNet (every layer except the last one) to build our own classifier for hot dogs. Conveniently, the MXNet model zoo handles the editing of the model for us. All we have to do is specify the number of output classes in our new task, which we do via the keyword argument `classes=2`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SqueezeNet(\n", "  (classifier): HybridSequential(\n", "    (0): Dropout(p = 0.5)\n", "    (1): Conv2D(2, kernel_size=(1, 1), stride=(1, 1))\n", "    (2): Activation(relu)\n", "    (3): AvgPool2D(size=(13, 13), stride=(13, 13), padding=(0, 0), ceil_mode=False)\n", "    (4): Flatten\n", "  )\n", "  (features): HybridSequential(\n", "    (0): Conv2D(64, kernel_size=(3, 3), stride=(2, 2))\n", "    (1): Activation(relu)\n", "    (2): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=True)\n", "    (3): HybridSequential(\n", "      (0): HybridSequential(\n", "        (0): Conv2D(16, kernel_size=(1, 1), stride=(1, 1))\n", "        (1): Activation(relu)\n", "      )\n", "      (1): HybridConcurrent(\n", "        (0): HybridSequential(\n", "          (0): Conv2D(64, kernel_size=(1, 1), stride=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "        (1): HybridSequential(\n", "          (0): Conv2D(64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "      )\n", "    )\n", "    (4): HybridSequential(\n", "      (0): HybridSequential(\n", "        (0): Conv2D(16, kernel_size=(1, 1), stride=(1, 1))\n", "        (1): Activation(relu)\n", "      )\n", "      (1): HybridConcurrent(\n", "        (0): HybridSequential(\n", "          (0): Conv2D(64, kernel_size=(1, 1), stride=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "        (1): HybridSequential(\n", "          (0): Conv2D(64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "      )\n", "    )\n", "    (5): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=True)\n", "    (6): HybridSequential(\n", "      (0): HybridSequential(\n", "        (0): Conv2D(32, kernel_size=(1, 1), stride=(1, 1))\n", "        (1): Activation(relu)\n", "      )\n", "      (1): HybridConcurrent(\n", "        (0): HybridSequential(\n", "          (0): Conv2D(128, kernel_size=(1, 1), stride=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "        (1): HybridSequential(\n", "          (0): Conv2D(128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "      )\n", "    )\n", "    (7): HybridSequential(\n", "      (0): HybridSequential(\n", "        (0): Conv2D(32, kernel_size=(1, 1), stride=(1, 1))\n", "        (1): Activation(relu)\n", "      )\n", "      (1): HybridConcurrent(\n", "        (0): HybridSequential(\n", "          (0): Conv2D(128, kernel_size=(1, 1), stride=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "        (1): HybridSequential(\n", "          (0): Conv2D(128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "      )\n", "    )\n", "    (8): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=True)\n", "    (9): HybridSequential(\n", "      (0): HybridSequential(\n", "        (0): Conv2D(48, kernel_size=(1, 1), stride=(1, 1))\n", "        (1): Activation(relu)\n", "      )\n", "      (1): HybridConcurrent(\n", "        (0): HybridSequential(\n", "          (0): Conv2D(192, kernel_size=(1, 1), stride=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "        (1): HybridSequential(\n", "          (0): Conv2D(192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "      )\n", "    )\n", "    (10): HybridSequential(\n", "      (0): HybridSequential(\n", "        (0): Conv2D(48, kernel_size=(1, 1), stride=(1, 1))\n", "        (1): Activation(relu)\n", "      )\n", "      (1): HybridConcurrent(\n", "        (0): HybridSequential(\n", "          (0): Conv2D(192, kernel_size=(1, 1), stride=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "        (1): HybridSequential(\n", "          (0): Conv2D(192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "      )\n", "    )\n", "    (11): HybridSequential(\n", "      (0): HybridSequential(\n", "        (0): Conv2D(64, kernel_size=(1, 1), stride=(1, 1))\n", "        (1): Activation(relu)\n", "      )\n", "      (1): HybridConcurrent(\n", "        (0): HybridSequential(\n", "          (0): Conv2D(256, kernel_size=(1, 1), stride=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "        (1): HybridSequential(\n", "          (0): Conv2D(256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "      )\n", "    )\n", "    (12): HybridSequential(\n", "      (0): HybridSequential(\n", "        (0): Conv2D(64, kernel_size=(1, 1), stride=(1, 1))\n", "        (1): Activation(relu)\n", "      )\n", "      (1): HybridConcurrent(\n", "        (0): HybridSequential(\n", "          (0): Conv2D(256, kernel_size=(1, 1), stride=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "        (1): HybridSequential(\n", "          (0): Conv2D(256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", "          (1): Activation(relu)\n", "        )\n", "      )\n", "    )\n", "  )\n", ")\n" ] } ], "source": [ "# Create the model with a two-class output classifier and apply the pretrained weights\n", "deep_dog_net = models.squeezenet1_1(prefix='deep_dog_', classes=2)\n", "deep_dog_net.collect_params().initialize()\n", "deep_dog_net.features = net.features\n", "\n", "# Let's take a look at what this network looks like\n", "print(deep_dog_net)" ] }
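, { "cell_type": "markdown", "metadata": {}, "source": [ "As noted above, hot dog is already one of the 1000 ImageNet classes, so the pretrained classifier contains weights we could reuse. The cell below is an optional, hypothetical sketch of one way to seed the new two-class head from those weights: the pretrained hot dog row initializes the `Hotdog!` output, and the mean of all rows (a simple heuristic) initializes the `Not hotdog!` output. It assumes the classifier's `Conv2D` sits at index 1, as the printouts above show; the fine-tuned parameters we download later do not depend on this step." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Optional sketch: seed the new 2-class head from the pretrained 1000-class head.\n", "# Run one dummy forward pass so the deferred parameter shapes get initialized\n", "_ = deep_dog_net(mx.nd.zeros((1, 3, 224, 224)))\n", "\n", "# The classifier's 1x1 Conv2D sits at index 1 (see the printouts above)\n", "pretrained_conv = net.classifier[1]    # 1000 output channels\n", "new_conv = deep_dog_net.classifier[1]  # 2 output channels\n", "\n", "pretrained_w = pretrained_conv.weight.data()  # shape (1000, C, 1, 1)\n", "pretrained_b = pretrained_conv.bias.data()    # shape (1000,)\n", "\n", "# Row for the ImageNet 'hotdog' class, plus a heuristic 'everything else' row\n", "hotdog_w = pretrained_w[imagenet_hotdog_index:imagenet_hotdog_index + 1]\n", "hotdog_b = pretrained_b[imagenet_hotdog_index:imagenet_hotdog_index + 1]\n", "other_w = mx.nd.mean(pretrained_w, axis=0, keepdims=True)\n", "other_b = mx.nd.mean(pretrained_b, axis=0, keepdims=True)\n", "\n", "# Class 0 = 'Not hotdog!', class 1 = 'Hotdog!' (matching classify_hotdog below)\n", "new_conv.weight.set_data(mx.nd.concat(other_w, hotdog_w, dim=0))\n", "new_conv.bias.set_data(mx.nd.concat(other_b, hotdog_b, dim=0))" ] }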
, { "cell_type": "markdown", "metadata": {}, "source": [ "The network can already be used for prediction. However, since it hasn't been fine-tuned yet for the hot dog classification task, its performance is not optimal.\n", "\n", "Let's test it out by defining a prediction function that preprocesses an image into the shape and color scheme expected by the network, feeds it in, and prints the predicted output." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from skimage.color import rgba2rgb, gray2rgb\n", "\n", "def classify_hotdog(net, url):\n", "\n", "    # Pull in the image and ensure there are exactly 3 color channels (RGB)\n", "    I = io.imread(url)\n", "    if I.ndim == 2:\n", "        I = gray2rgb(I)   # grayscale -> RGB\n", "    elif I.shape[2] == 4:\n", "        I = rgba2rgb(I)   # RGBA -> RGB\n", "\n", "    # Resize, crop to the expected input size (224, 224), and normalize the color channels\n", "    image = mx.nd.array(I).astype(np.uint8)\n", "    image = mx.image.resize_short(image, 256)\n", "    image, _ = mx.image.center_crop(image, (224, 224))\n", "    image = mx.image.color_normalize(image.astype(np.float32)/255,\n", "                                     mean=mx.nd.array([0.485, 0.456, 0.406]),\n", "                                     std=mx.nd.array([0.229, 0.224, 0.225]))\n", "\n", "    # Reorder from height-width-channel (HWC) to the channel-first (CHW) layout\n", "    # the network expects, then add a batch dimension\n", "    image = mx.nd.transpose(image.astype('float32'), (2, 0, 1))\n", "    image = mx.nd.expand_dims(image, axis=0)\n", "\n", "    # Feed the pre-processed image into the net to get the raw class scores, then\n", "    # pass them through a softmax to turn them into probabilities\n", "    inference_result = net(image)\n", "    out = mx.nd.SoftmaxActivation(inference_result)\n", "    print('Probabilities are: ' + str(out[0].asnumpy()))\n", "\n", "    # Take the most probable class to predict whether the image has a hot dog or not\n", "    result = np.argmax(out.asnumpy())\n", "    outstring = ['Not hotdog!', 'Hotdog!']\n", "    print(outstring[result])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's download a hot dog image (hotdog_mustard-main.jpg) and an image of a dog (scroll001.jpg) to our local directory, so we can test the model on them." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "--2017-11-28 18:59:14-- http://www.wienerschnitzel.com/wp-content/uploads/2014/10/hotdog_mustard-main.jpg\n", "Resolving www.wienerschnitzel.com (www.wienerschnitzel.com)... 104.198.109.247\n", "Connecting to www.wienerschnitzel.com (www.wienerschnitzel.com)|104.198.109.247|:80... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 22917 (22K) [image/jpeg]\n", "Saving to: ‘hotdog_mustard-main.jpg.1’\n", "\n", " 0K .......... .......... .. 100% 358K=0.06s\n", "\n", "2017-11-28 18:59:15 (358 KB/s) - ‘hotdog_mustard-main.jpg.1’ saved [22917/22917]\n", "\n", "--2017-11-28 18:59:15-- https://www.what-dog.net/Images/faces2/scroll001.jpg\n", "Resolving www.what-dog.net (www.what-dog.net)... 191.237.47.20\n", "Connecting to www.what-dog.net (www.what-dog.net)|191.237.47.20|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 48316 (47K) [image/jpeg]\n", "Saving to: ‘scroll001.jpg.1’\n", "\n", " 0K .......... .......... .......... .......... ....... 100% 8.52M=0.005s\n", "\n", "2017-11-28 18:59:15 (8.52 MB/s) - ‘scroll001.jpg.1’ saved [48316/48316]\n", "\n" ] } ], "source": [ "%%bash\n", "wget http://www.wienerschnitzel.com/wp-content/uploads/2014/10/hotdog_mustard-main.jpg\n", "wget https://www.what-dog.net/Images/faces2/scroll001.jpg" ] }
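, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a quick look at the two images we just downloaded. This cell is a small sketch that assumes matplotlib is available in the kernel, which it typically is on SageMaker." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "# Show the two downloaded test images side by side\n", "fig, axes = plt.subplots(1, 2, figsize=(8, 4))\n", "for ax, fname in zip(axes, ['hotdog_mustard-main.jpg', 'scroll001.jpg']):\n", "    ax.imshow(io.imread(fname))\n", "    ax.set_title(fname)\n", "    ax.axis('off')\n", "plt.show()" ] }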
, { "cell_type": "markdown", "metadata": {}, "source": [ "Before deploying our network we usually want to run the `hybridize` function on it, which essentially \"compiles\" the computation graph, allowing it to run much faster for both inference and training. Hybridizing also lets us serialize the network, together with its parameters, and export it to a file." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "deep_dog_net.hybridize()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probabilities are: [ 0.66653055 0.33346951]\n", "Not hotdog!\n", "Probabilities are: [ 0.48589769 0.51410228]\n", "Hotdog!\n" ] } ], "source": [ "# Let's run the classification on our two downloaded images to see what the model comes up with\n", "classify_hotdog(deep_dog_net, './hotdog_mustard-main.jpg') # check for hotdog\n", "classify_hotdog(deep_dog_net, './scroll001.jpg') # check for not hotdog" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the predictions are not very accurate. The hot dog is classified as not a hot dog, partly because the new two-class head has not yet been trained end-to-end on actual hot dog images, and partly because the original model was trained on far more images of other objects than of hot dogs, a situation typically referred to as a class imbalance problem. To improve the model we can download a set of new parameters that we have pre-optimized through a \"fine-tuning\" process, in which we retrained the model on a more balanced set of images of hot dogs and other objects. We can then apply these new parameters to our model to make it much more accurate." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:downloaded https://apache-mxnet.s3-accelerate.amazonaws.com/gluon/models/deep-dog-5a342a6f.params into deep-dog-5a342a6f.params successfully\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Probabilities are: [ 0.37115085 0.62884909]\n", "Hotdog!\n", "Probabilities are: [ 0.9988153 0.00118477]\n", "Not hotdog!\n" ] } ], "source": [ "from mxnet.test_utils import download\n", "\n", "# Pull the new parameters using the download utility provided by MXNet\n", "download('https://apache-mxnet.s3-accelerate.amazonaws.com/gluon/models/deep-dog-5a342a6f.params',\n", "         overwrite=True)\n", "\n", "# This simply applies the new parameters onto the model we already have\n", "deep_dog_net.load_params('deep-dog-5a342a6f.params', mx.cpu())\n", "\n", "deep_dog_net.hybridize()\n", "classify_hotdog(deep_dog_net, './hotdog_mustard-main.jpg')\n", "classify_hotdog(deep_dog_net, './scroll001.jpg')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The predictions now look reasonable, so we can export the model in serialized form to our local directory. This is a simple one-line command, which produces a set of two files: a JSON file (hotdog_or_not_model-symbol.json) holding the network architecture, and a params file (hotdog_or_not_model-0000.params) holding the parameters the network learned." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "deep_dog_net.export('hotdog_or_not_model')" ] }
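, { "cell_type": "markdown", "metadata": {}, "source": [ "As an optional sanity check, the sketch below loads the exported symbol and parameters back with MXNet's symbolic Module API, which is essentially what a downstream device does with these two files. The `0` is the epoch number encoded in the params file name (hotdog_or_not_model-0000.params)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sanity check: reload the exported architecture and parameters\n", "sym, arg_params, aux_params = mx.model.load_checkpoint('hotdog_or_not_model', 0)\n", "mod = mx.mod.Module(symbol=sym, label_names=None)\n", "mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])\n", "mod.set_params(arg_params, aux_params)\n", "print('Successfully reloaded the exported model')" ] }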
, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's push the serialized model to S3, where we can optimize it for our AWS DeepLens and then push it down onto the device for inference." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import boto3\n", "import re\n", "\n", "# Derive this notebook's IAM role from the assumed-role ARN (printed for reference)\n", "assumed_role = boto3.client('sts').get_caller_identity()['Arn']\n", "s3_access_role = re.sub(r'^(.+)sts::(\\d+):assumed-role/(.+?)/.*$', r'\\1iam::\\2:role/\\3', assumed_role)\n", "print(s3_access_role)\n", "\n", "s3 = boto3.resource('s3')\n", "\n", "# Upload the two exported files; replace 'test-bucket' with an S3 bucket your role can write to\n", "model_symbol = open('hotdog_or_not_model-symbol.json', 'rb')\n", "model_params = open('hotdog_or_not_model-0000.params', 'rb')\n", "s3.Bucket('test-bucket').put_object(Key='hotdog_or_not_model-symbol.json', Body=model_symbol)\n", "s3.Bucket('test-bucket').put_object(Key='hotdog_or_not_model-0000.params', Body=model_params)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }