{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MNIST Tensorflow training options\n", "\n", "The **SageMaker Python SDK** helps you deploy your models for training and hosting in optimized, productions ready containers in SageMaker. The SageMaker Python SDK is easy to use, modular, extensible and compatible with TensorFlow and MXNet. This tutorial focuses on how to create a convolutional neural network model to train the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) using **TensorFlow training**.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up the environment" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import sagemaker\n", "from sagemaker import get_execution_role\n", "import project_path # path to helper methods\n", "from lib import tf_scripts\n", "from lib import workshop\n", "import boto3\n", "\n", "sagemaker_session = sagemaker.Session()\n", "cfn = boto3.client('cloudformation')\n", "\n", "role = get_execution_role()\n", "\n", "session = boto3.session.Session()\n", "region = session.region_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Create S3 Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)\n", "\n", "We will create an S3 bucket that will be used throughout the workshop for storing our data.\n", "\n", "[s3.create_bucket](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.create_bucket) boto3 documentation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bucket = workshop.create_bucket(region, session, 'sage-')\n", "print(bucket)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database#Dataset)\n", "\n", "In this step, we are going to convert the MNIST dataset into [tfrecord](https://www.tensorflow.org/tutorials/load_data/tf-records) binary files for training, testing, and validation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "from tensorflow.contrib.learn.python.learn.datasets import mnist\n", "import tensorflow as tf\n", "\n", "data_sets = mnist.read_data_sets('data', dtype=tf.uint8, reshape=False, validation_size=5000)\n", "\n", "tf_scripts.convert_to(data_sets.train, 'train', 'data')\n", "tf_scripts.convert_to(data_sets.validation, 'validation', 'data')\n", "tf_scripts.convert_to(data_sets.test, 'test', 'data')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Upload the data\n", "We use the ```sagemaker.Session.upload_data``` function to upload our datasets to an S3 location. The return value inputs identifies the location -- we will use this later when we start the training job." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inputs = sagemaker_session.upload_data(bucket=bucket, path='data', key_prefix='data/DEMO-mnist')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Construct a script for distributed training \n", "Here is the full code for the network model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "!pygmentize 'mnist.py'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a training job using the sagemaker.TensorFlow estimator \n", "\n", "The script here is and adaptation of the [TensorFlow MNIST example](https://github.com/tensorflow/models/tree/master/official/mnist). It provides a ```model_fn(features, labels, mode)```, which is used for training, evaluation and inference. \n", "\n", "## A regular ```model_fn```\n", "\n", "A regular **```model_fn```** follows the pattern:\n", "1. [defines a neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L96)\n", "- [applies the ```features``` in the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L178)\n", "- [if the ```mode``` is ```PREDICT```, returns the output from the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L186)\n", "- [calculates the loss function comparing the output with the ```labels```](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L188)\n", "- [creates an optimizer and minimizes the loss function to improve the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L193)\n", "- [returns the output, optimizer and loss function](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L205)\n", "\n", "## Writing a ```model_fn``` for distributed training\n", "When distributed training happens, the same neural network will be sent to the multiple training instances. Each instance will predict a batch of the dataset, calculate loss and minimize the optimizer. One entire loop of this process is called **training step**.\n", "\n", "### Syncronizing training steps\n", "A [global step](https://www.tensorflow.org/api_docs/python/tf/train/global_step) is a global variable shared between the instances. It's necessary for distributed training, so the optimizer will keep track of the number of **training steps** between runs: \n", "\n", "```python\n", "train_op = optimizer.minimize(loss, tf.train.get_or_create_global_step())\n", "```\n", "\n", "That is the only required change for distributed training!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.session import Session\n", "\n", "# Location to save your custom code in tar.gz format.\n", "custom_code_upload_location = 's3://{}/customcode/tensorflow_pipemode'.format(bucket)\n", "\n", "# Location where results of model training are saved.\n", "model_artifacts_location = 's3://{}/sagemaker/artifacts'.format(bucket)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "%%time\n", "from sagemaker.tensorflow import TensorFlow\n", "\n", "mnist_estimator = TensorFlow(entry_point='mnist.py',\n", " role=role,\n", " framework_version='1.11.0',\n", " training_steps=1000, \n", " evaluation_steps=100,\n", " output_path=model_artifacts_location,\n", " train_instance_count=2,\n", " train_instance_type='ml.c4.xlarge',\n", " tags=[{\"Key\":\"Project\", \"Value\":\"Tensorflow_Demo\"}])\n", "\n", "mnist_estimator.fit(inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **```fit```** method will create a training job in two **ml.c4.xlarge** instances. The logs above will show the instances doing training, evaluation, and incrementing the number of **training steps**. \n", "\n", "In the end of the training, the training job will generate a saved model for TF serving." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Deploy the trained model to prepare for predictions\n", "\n", "The deploy() method creates an endpoint which serves prediction requests in real-time." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_predictor = mnist_estimator.deploy(initial_instance_count=1,\n", " instance_type='ml.m4.xlarge')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Perform inference\n", "\n", "Now that we've trained a model, we're going to use it to perform inference with a SageMaker endpoint as well as a batch transform job. The request handling behavior of the Endpoint deployed during the transform job is determined by the `mnist.py` script we looked at earlier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Invoking the endpoint\n", "\n", "We will be using matplotlib to show and sample image from the MNIST dataset. From there we will make a prediction against the inference endpiint we just created above running 5 samples through the endpoint." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.rcParams[\"figure.figsize\"] = (2,10)\n", "\n", "def show_digit(arr):\n", " two_d = (np.reshape(arr, (28, 28)) * 255).astype(np.uint8)\n", " plt.imshow(two_d, interpolation='nearest')\n", " return plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from tensorflow.examples.tutorials.mnist import input_data\n", "\n", "mnist = input_data.read_data_sets(\"/tmp/data/\", one_hot=True)\n", "\n", "for i in range(5):\n", " data = mnist.test.images[i].tolist()\n", " tensor_proto = tf.make_tensor_proto(values=np.asarray(data), shape=[1, len(data)], dtype=tf.float32)\n", " predict_response = mnist_predictor.predict(tensor_proto)\n", " \n", " print(\"========================================\")\n", " label = np.argmax(mnist.test.labels[i])\n", " print(\"label is {}\".format(label))\n", " show_digit(mnist.test.images[i]).show()\n", " prediction = predict_response['outputs']['classes']['int64_val'][0]\n", " print(\"prediction is {}\".format(prediction))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Deploy the trained model using Neo\n", "\n", "Now the model is ready to be compiled by Neo to be optimized for our hardware of choice. We are using the ``TensorFlowEstimator.compile_model`` method to do this. For this example, our target hardware is ``'ml_c5'``. You can changed these to other supported target hardware if you prefer. [Amazon SageMaker Neo Blog](https://aws.amazon.com/blogs/aws/amazon-sagemaker-neo-train-your-machine-learning-models-once-run-them-anywhere/) for more information.\n", "\n", "## Compiling the model\n", "The ``input_shape`` is the definition for the model's input tensor and ``output_path`` is where the compiled model will be stored in S3. **Important. If the following command result in a permission error, scroll up and locate the value of execution role returned by `get_execution_role()`. The role must have access to the S3 bucket specified in ``output_path``.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize mnist.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The script here is and adaptation of the [TensorFlow MNIST example](https://github.com/tensorflow/models/tree/master/official/mnist). It provides a ```model_fn(features, labels, mode)```, which is used for training, evaluation and inference. 
See [TensorFlow MNIST distributed training notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_distributed_mnist.ipynb) for more details about the training script.\n",
"\n",
"At the end of the training script, there are two additional functions, to be used with the Neo Deep Learning Runtime:\n",
"* `neo_preprocess(payload, content_type)`: Function that takes in the payload and Content-Type of each incoming request and returns a NumPy array\n",
"* `neo_postprocess(result)`: Function that takes the prediction results produced by the Deep Learning Runtime and returns the response body" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"output_path = '/'.join(mnist_estimator.output_path.split('/')[:-1])\n",
"optimized_estimator = mnist_estimator.compile_model(\n",
"    target_instance_family='ml_c5',\n",
"    input_shape={'data': [1, 784]},  # Batch size 1, flattened 28x28 grayscale image (784 values).\n",
"    output_path=output_path,\n",
"    framework='tensorflow',\n",
"    framework_version='1.11.0')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Deploying the compiled model\n",
"\n",
"With the optimized model now created, we will deploy a new inference endpoint in SageMaker. We will also set the `content_type` and `serializer` to use when making a request to the endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"optimized_predictor = optimized_estimator.deploy(\n",
"    initial_instance_count=1,\n",
"    instance_type='ml.c5.xlarge')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import io\n",
"import PIL.Image\n",
"\n",
"# The neo_preprocess() function expects an image in the request body,\n",
"# but the MNIST example data is saved as a NumPy array,\n",
"# so we convert it to PNG before invoking the endpoint.\n",
"def png_serializer(data):\n",
"    im = PIL.Image.fromarray(data.reshape((28,28))*255).convert('L')\n",
"    f = io.BytesIO()\n",
"    im.save(f, format='png')\n",
"    f.seek(0)\n",
"    return f.read()\n",
"\n",
"optimized_predictor.content_type = 'application/x-image'\n",
"optimized_predictor.serializer = png_serializer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Invoking the endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"from tensorflow.examples.tutorials.mnist import input_data\n",
"from IPython import display\n",
"import PIL.Image\n",
"import io\n",
"\n",
"mnist = input_data.read_data_sets(\"/tmp/data/\", one_hot=True)\n",
"\n",
"for i in range(5):\n",
"    data = mnist.test.images[i]\n",
"    print(data.shape)\n",
"    # Display image\n",
"    im = PIL.Image.fromarray(data.reshape((28,28))*255).convert('L')\n",
"    display.display(im)\n",
"    # Invoke endpoint with image\n",
"    predict_response = optimized_predictor.predict(data)\n",
"\n",
"    print(\"========================================\")\n",
"    label = np.argmax(mnist.test.labels[i])\n",
"    print(\"label is {}\".format(label))\n",
"    prediction = predict_response\n",
"    print(\"prediction is {}\".format(prediction))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## AWS Serverless Application Model (SAM)\n",
"\n",
"The AWS Serverless Application Model is an extension of AWS CloudFormation that provides a simplified way of defining Amazon API Gateway APIs, AWS Lambda functions, and Amazon DynamoDB tables needed by your serverless application. 
In the next few steps we will create a serverless application to invoke the SageMaker Neo endpoint through an API Gateway.\n",
"\n",
"### Deploy using SAM \n",
"The `service.py` file shown below contains the code required to invoke the SageMaker endpoint; we will write it to the `lambda_func` directory in a later step.\n",
"\n",
"```python\n",
"from __future__ import print_function\n",
"\n",
"import boto3\n",
"from PIL import Image\n",
"from io import BytesIO\n",
"import base64\n",
"import os\n",
"import json\n",
"import os.path\n",
"import sys\n",
"import urllib\n",
"import ast\n",
"import numpy as np\n",
"\n",
"def handler(event, context):\n",
"    print(event)\n",
"    print(context)\n",
"    body = json.loads(event['body'])\n",
"    endpoint = os.environ['ENDPOINT_NAME']\n",
"\n",
"    print(\"%s\" % (endpoint))\n",
"    image_str = base64.b64decode(body['data'])\n",
"    image = Image.open(BytesIO(image_str))\n",
"\n",
"    with BytesIO() as output:\n",
"        with image as img:\n",
"            img.save(output, 'PNG')\n",
"        data = output.getvalue()\n",
"\n",
"    runtime = boto3.Session().client(service_name='sagemaker-runtime')\n",
"    response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='application/x-image', Body=data)\n",
"\n",
"    result = response['Body'].read()\n",
"    predictions = ast.literal_eval(result)\n",
"    # Cast to a plain int so json.dumps can serialize the prediction.\n",
"    prediction = int(np.argmax(predictions))\n",
"    response = {\n",
"        \"statusCode\": \"200\",\n",
"        \"headers\": {\n",
"            'Content-Type': 'application/json'\n",
"        },\n",
"        \"body\": json.dumps({\n",
"            'prediction': prediction\n",
"        })\n",
"    }\n",
"\n",
"    return response\n",
"\n",
"```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"### Preparing the [AWS Lambda Layer](https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html)\n",
"\n",
"We will build a custom [AWS Lambda Layer](https://medium.com/@adhorn/getting-started-with-aws-lambda-layers-for-python-6e10b1f9a5d) to provide a common library for the dependencies we require: [NumPy](http://www.numpy.org/) and [Pillow](https://python-pillow.org/). To do so we must first create the directory `layers/pil_layer/python`, into which we will `pip` install the necessary libraries from a `requirements.txt` file locally." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!mkdir -p layers/pil_layer/python" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"%%writefile layers/pil_layer/requirements.txt\n",
"\n",
"numpy\n",
"pillow" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install -r layers/pil_layer/requirements.txt -t layers/pil_layer/python " ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"### Zip the Lambda Layer\n",
"\n",
"**NOTE: One trick to packaging a Lambda Layer for Python is that the contents of the layer must sit within a folder with one of the following names. 
You should not place the contents of the layer into the root of the .zip file.**\n",
"\n",
"Acceptable folder names for Python are `python` and `python/lib/python3.7/site-packages`.\n",
"\n",
"More info on including library dependencies in a layer is available [here](https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html#configuration-layers-path)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"%%bash\n",
"cd layers/pil_layer/\n",
"\n",
"zip -r ../pil-python27-layer.zip python/*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"### [Publish the Lambda Layer](https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html#configuration-layers-manage)\n",
"\n",
"To create a layer you can use the `publish-layer-version` command with a name, description, archive file, and a list of runtimes compatible with the layer. The list of runtimes is optional, but it makes the layer easier to discover." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws lambda publish-layer-version --layer-name pil_numpy_layer_2_7 --zip-file fileb://layers/pil-python27-layer.zip --compatible-runtimes python2.7 --description 'Numpy, Pillow dependencies layer' --license-info \"MIT\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"### [Create AWS Lambda Function](https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model.html)\n",
"\n",
"We now want to create the AWS Lambda function that will be used to invoke the SageMaker endpoint. All files required for the application will be placed in the `lambda_func` directory in the notebook. We will pass the endpoint name in an environment variable `ENDPOINT_NAME` that will be used with boto3 to invoke the SageMaker endpoint."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!mkdir lambda_func" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%writefile lambda_func/service.py\n", "\n", "from __future__ import print_function\n", "\n", "import boto3\n", "from PIL import Image\n", "from io import BytesIO\n", "import base64\n", "import os \n", "import json\n", "import os.path\n", "import sys\n", "import urllib\n", "import ast\n", "import numpy as np \n", "\n", "def handler(event, context):\n", " print(event)\n", " print(context)\n", " body = json.loads(event['body'])\n", " endpoint = os.environ['ENDPOINT_NAME'] \n", "\n", " print(\"%s\" % (endpoint))\n", " image_str = base64.b64decode(body['data'])\n", " image = Image.open(BytesIO(image_str))\n", "\n", " with BytesIO() as output:\n", " with image as img:\n", " img.save(output, 'PNG')\n", " data = output.getvalue()\n", "\n", " runtime = boto3.Session().client(service_name='sagemaker-runtime')\n", " response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='application/x-image', Body=data)\n", " \n", " result = response['Body'].read()\n", " predictions = ast.literal_eval(result)\n", " prediction = np.argmax(predictions)\n", " response = {\n", " \"statusCode\": \"200\",\n", " \"headers\": {\n", " 'Content-Type': 'application/json'\n", " },\n", " \"body\": json.dumps({\n", " 'prediction' : prediction\n", " })\n", " }\n", "\n", " return response\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get the Neo optimized endpoint from the estimator" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(optimized_predictor.endpoint)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the SAM yaml file to deploy our service\n", "\n", "The yaml file below is the CloudFormation template needed to deploy our service. We will point to the CodeUri we uploaded above and created an environment variable `ENDPOINT_NAME` to pass into the Lambda function. Find the `{{optimized_predictor.endpoint}}` below and replace with the cell result above." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%writefile tensorflow-lambda.yaml\n", "\n", "AWSTemplateFormatVersion: '2010-09-09'\n", "Transform: 'AWS::Serverless-2016-10-31'\n", "Description: Tensorflow MNIST Inference REST API\n", "Resources:\n", " InferenceApi:\n", " Type: AWS::Serverless::Function\n", " Properties:\n", " CodeUri: lambda_func/\n", " Handler: service.handler\n", " Runtime: python2.7\n", " MemorySize: 1024\n", " Timeout: 30\n", " Layers:\n", " - Ref: PILLayer\n", " Events:\n", " Endpoint:\n", " Type: Api\n", " Properties:\n", " Path: /predict\n", " Method: post\n", " Policies:\n", " # Give SageMaker Full Access to your Lambda Function\n", " - AmazonSageMakerFullAccess\n", " Environment:\n", " Variables:\n", " ENDPOINT_NAME: {{optimized_predictor.endpoint}} #Replace with result from cell above\n", " PILLayer:\n", " Type: 'AWS::Serverless::LayerVersion'\n", " Properties:\n", " LayerName: pil-python27\n", " Description: Pillow and Tensorflow for Python 2.7\n", " ContentUri: ./layers/pil-python27-layer.zip\n", " CompatibleRuntimes:\n", " - python2.7\n", " RetentionPolicy: Retain\n", "\n", "Outputs:\n", " InferenceApi:\n", " Description: \"API Gateway endpoint URL for Prod stage for Hello World function\"\n", " Value: !Sub \"https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/predict/\"\n", "\n", " InferenceFunction:\n", " Description: \"Hello World Lambda Function ARN\"\n", " Value: !GetAtt InferenceApi.Arn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Validate template](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-validate-template.html)\n", "\n", "The `aws cloudformation validate-template` command is designed to check only the syntax of your template. It does not ensure that the property values that you have specified for a resource are valid for that resource. Nor does it determine the number of resources that will exist when the stack is created." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws cloudformation validate-template --template-body file://tensorflow-lambda.yaml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Package deployment](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-cli-package.html)\n", "\n", "For some resource properties that require an Amazon S3 location (a bucket name and filename), you can specify local references instead. For example, you might specify the S3 location of your AWS Lambda function's source code or an Amazon API Gateway REST API's OpenAPI (formerly Swagger) file. Instead of manually uploading the files to an S3 bucket and then adding the location to your template, you can specify local references, called local artifacts, in your template and then use the package command to quickly upload them. A local artifact is a path to a file or folder that the package command uploads to Amazon S3. For example, an artifact can be a local path to your AWS Lambda function's source code or an Amazon API Gateway REST API's OpenAPI file.\n", "\n", "With the yaml file created we can now `package` out CloudFormation template to prepare it for deployment and finally call the `deploy` function on the CloudFormation service to build our API service for out Inference endpoint." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "prefix = os.path.join('sagemaker', 'lambda')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws cloudformation package \\\n", " --template-file tensorflow-lambda.yaml \\\n", " --output-template-file tensorflow-lambda-out.yaml \\\n", " --s3-bucket $bucket \\\n", " --s3-prefix $prefix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Deploy Application](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-cli-deploy.html)\n", "\n", "AWS CloudFormation requires you to use a change set to create a template that includes transforms. Instead of independently creating and then executing a change set, use the `aws cloudformation deploy` command. When you run this command, it creates a change set, executes the change set, and then terminates. This command reduces the numbers of required steps when you create or update a stack that includes transforms." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stack_name = \"Tensorflow-Lambda-MNIST\"\n", "\n", "!aws cloudformation deploy \\\n", "--template-file tensorflow-lambda-out.yaml \\\n", "--stack-name $stack_name \\\n", "--capabilities CAPABILITY_IAM \\\n", "--region $region" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get API Gateway Endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stacks = cfn.describe_stacks(StackName=stack_name)\n", "stack = stacks[\"Stacks\"][0]\n", "\n", "for output in stack[\"Outputs\"]:\n", " if output[\"OutputKey\"] == 'InferenceApi':\n", " api_endpoint = output[\"OutputValue\"]\n", " print('%s=%s (%s)' % (output[\"OutputKey\"], output[\"OutputValue\"], output[\"Description\"]))\n", " \n", "print(api_endpoint)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate\n", "We can now use this predictor to classify hand-written digits. Drawing into the image box loads the pixel data into a `data` variable in this notebook, which we can then pass to the `predictor`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import HTML\n", "HTML(open(\"input.html\").read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save the output of the html reader to PNG" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import imageio\n", "from PIL import Image\n", "\n", "image = np.array([data], dtype=np.float32)\n", "two_d = (np.reshape(image, (28, 28)) * 255).astype(np.uint8)\n", "imageio.imwrite(\"output.png\", two_d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using the requests library to make a request to the new API Gateway endpoint\n", "\n", "In this section we will take the PNG we saved locally and base64 encode it and send it to the API gateway endpoint. The gateway endpoint is deployed to the Prod stage and has a POST medthod at the /predict path. The Lambda function calls the Neo optimized hosted model and will use the `neo_preprocess` method of the model to convert the image into the appropriate input format." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import json\n", "import base64\n", "import numpy as np\n", "from io import BytesIO\n", "import ast\n", "\n", "print(api_endpoint)\n", "\n", "# prepare headers for http request\n", "content_type = 'application/json'\n", "headers = {'content-type': content_type}\n", "\n", "file_name = \"output.png\"\n", "encoded_string = \"\"\n", "with open(file_name, \"rb\") as image_file:\n", " encoded_string = base64.b64encode(image_file.read())\n", "\n", "body = {\n", " \"data\": encoded_string\n", "}\n", "print(body)\n", "response = requests.post(api_endpoint, data=json.dumps(body), headers=headers)\n", "print(response)\n", "prediction = response.json()\n", "print(\"prediction is {}\".format(prediction['prediction']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set up hyperparameter tuning job\n", "*Note, with the default setting below, the hyperparameter tuning job can take about 30 minutes to complete.*\n", "\n", "Now we will set up the hyperparameter tuning job using SageMaker Python SDK, following below steps:\n", "* Create an estimator to set up the TensorFlow training job\n", "* Define the ranges of hyperparameters we plan to tune, in this example, we are tuning \"learning_rate\"\n", "* Define the objective metric for the tuning job to optimize\n", "* Create a hyperparameter tuner with above setting, as well as tuning resource configurations " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similar to training a single TensorFlow job in SageMaker, we define our TensorFlow estimator passing in the TensorFlow script, IAM role, and (per job) hardware configuration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "from time import gmtime, strftime\n", "from sagemaker.tensorflow import TensorFlow\n", "from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner\n", "\n", "estimator = TensorFlow(entry_point='mnist.py',\n", " role=role,\n", " framework_version='1.11.0',\n", " training_steps=1000, \n", " evaluation_steps=100,\n", " train_instance_count=1,\n", " train_instance_type='ml.m4.xlarge',\n", " base_job_name='DEMO-hpo-tensorflow')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we've defined our estimator we can specify the hyperparameters we'd like to tune and their possible values. We have three different types of hyperparameters.\n", "- Categorical parameters need to take one value from a discrete set. We define this by passing the list of possible values to `CategoricalParameter(list)`\n", "- Continuous parameters can take any real number value between the minimum and maximum value, defined by `ContinuousParameter(min, max)`\n", "- Integer parameters can take any integer value between the minimum and maximum value, defined by `IntegerParameter(min, max)`\n", "\n", "*Note, if possible, it's almost always best to specify a value as the least restrictive type. 
For example, tuning learning rate as a continuous value between 0.01 and 0.2 is likely to yield a better result than tuning it as a categorical parameter with values 0.01, 0.1, 0.15, or 0.2.*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hyperparameter_ranges = {'learning_rate': ContinuousParameter(0.01, 0.2)}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we'll specify the objective metric that we'd like to tune and its definition, which includes the regular expression (regex) needed to extract that metric from the CloudWatch logs of the training job. In this particular case, our script emits a loss value and we will use it as the objective metric. We also set the objective_type to 'Minimize', so that hyperparameter tuning seeks to minimize the objective metric when searching for the best hyperparameter setting. By default, objective_type is set to 'Maximize'." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"objective_metric_name = 'loss'\n",
"objective_type = 'Minimize'\n",
"metric_definitions = [{'Name': 'loss', 'Regex': 'loss = ([0-9\\\\.]+)'}]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"Now, we'll create a `HyperparameterTuner` object, to which we pass:\n",
"- The TensorFlow estimator we created above\n",
"- Our hyperparameter ranges\n",
"- The objective metric name and definition\n",
"- Tuning resource configurations, such as the total number of training jobs to run and how many training jobs can be run in parallel." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"tuner = HyperparameterTuner(\n",
"    estimator,\n",
"    objective_metric_name,\n",
"    hyperparameter_ranges,\n",
"    metric_definitions,\n",
"    max_jobs=9,\n",
"    max_parallel_jobs=3,\n",
"    objective_type=objective_type)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Launch hyperparameter tuning job\n",
"And finally, we can start our hyperparameter tuning job by calling `.fit()` and passing in the S3 path to our training and test datasets.\n",
"\n",
"After the hyperparameter tuning job is created, you should be able to describe the tuning job to see its progress in the next step, and you can go to the SageMaker console -> Jobs to check the progress of the hyperparameter tuning job." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tuner.fit(inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's run a quick check of the hyperparameter tuning job's status to make sure it started successfully." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"boto3.client('sagemaker').describe_hyper_parameter_tuning_job(\n",
"    HyperParameterTuningJobName=tuner.latest_tuning_job.job_name)['HyperParameterTuningJobStatus']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Analyze tuning job results - after tuning job is completed\n",
"Please refer to \"HPO_Analyze_TuningJob_Results.ipynb\" to see example code to analyze the tuning job results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Deploy the best model\n",
"Now that we have the best model, we can deploy it to an endpoint. Please refer to other SageMaker sample notebooks or SageMaker documentation to see how to deploy a model."
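, "\n",
"\n",
"As a quick sketch (worth verifying against the version of the SageMaker Python SDK you are running), the tuner object can deploy the model from its best training job directly once the tuning job has completed:\n",
"\n",
"```python\n",
"# Deploy the model from the best training job found by the tuning job.\n",
"# The instance type below is an illustrative choice.\n",
"best_predictor = tuner.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')\n",
"\n",
"# Remember to delete this endpoint when you are done with it:\n",
"# sagemaker.Session().delete_endpoint(best_predictor.endpoint)\n",
"```"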
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extra Credit: Training with SageMaker Pipe Mode and TensorFlow using the SageMaker Python SDK\n", "\n", "SageMaker Pipe Mode is an input mechanism for SageMaker training containers based on Linux named pipes. SageMaker makes the data available to the training container using named pipes, which allows data to be downloaded from S3 to the container while training is running. For larger datasets, this dramatically improves the time to start training, as the data does not need to be first downloaded to the container. To learn more about pipe mode, please consult the AWS documentation at: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-trainingdata.\n", "\n", "In this tutorial, we show you how to train a tf.estimator using data read with SageMaker Pipe Mode. We'll use the SageMaker `PipeModeDataset` class - a special TensorFlow `Dataset` built specifically to read from SageMaker Pipe Mode data. This `Dataset` is available in our TensorFlow containers for TensorFlow versions 1.7.0 and up. It's also open-sourced at https://github.com/aws/sagemaker-tensorflow-extensions and can be built into custom TensorFlow images for use in SageMaker. \n", "\n", "Although you can also build the PipeModeDataset into your own containers, in this tutorial we'll show how you can use the PipeModeDataset by launching training from the SageMaker Python SDK. The SageMaker Python SDK helps you deploy your models for training and hosting in optimized, production ready containers in SageMaker. The SageMaker Python SDK is easy to use, modular, extensible and compatible with TensorFlow and MXNet. \n", "\n", "Different collections of S3 files can be made available to the training container while it's running. These are referred to as \"channels\" in SageMaker. In this example, we use two channels - one for training data and one for evaluation data. Each channel is mapped to S3 files from different directories. The SageMaker PipeModeDataset knows how to read from the named pipes for each channel given just the channel name. When we launch SageMaker training we tell SageMaker what channels we have and where in S3 to read the data for each channel.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Complete training source code \n", "\n", "In this section of the tutorial we train a TensorFlow LinearClassifier using pipe mode data. The TensorFlow training script is contained in following file:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize 'pipemode.py'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above script implements all the functions required for a sagemaker tensorflow training script (See: [Preparing TensorFlow Training Script](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/README.rst#preparing-the-tensorflow-training-script)). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using a PipeModeDataset in an input_fn\n", "To train an estimator using a Pipe Mode channel, we must construct an input_fn that reads from the channel. To do this, we use the SageMaker PipeModeDataset. This is a TensorFlow Dataset specifically created to read from a SageMaker Pipe Mode channel. 
A PipeModeDataset is a fully-featured TensorFlow Dataset and can be used in exactly the same ways as a regular TensorFlow Dataset.\n",
"\n",
"The training and evaluation data used in this tutorial is synthetic. It contains a series of records, each stored as a TensorFlow Example protobuf object. Each record contains a numeric class label and an array of 1024 floating point numbers. Each array is sampled from a multi-dimensional Gaussian distribution with a class-specific mean. This means it is possible to learn a model using a TensorFlow LinearClassifier that can classify the examples well. Records are separated using RecordIO encoding (though the PipeModeDataset class also supports the TFRecord format). \n",
"\n",
"The training and evaluation data were produced using the benchmarking source code in the sagemaker-tensorflow-extensions benchmarking sub-package. If you want to investigate this further, please visit the GitHub repository for sagemaker-tensorflow-extensions at https://github.com/aws/sagemaker-tensorflow-extensions. \n",
"\n",
"The following example code shows how to use a PipeModeDataset in an input_fn.\n",
"\n",
"```python\n",
"from sagemaker_tensorflow import PipeModeDataset\n",
"\n",
"def input_fn():\n",
"    # Simple example data - a labeled vector.\n",
"    features = {\n",
"        'data': tf.FixedLenFeature([], tf.string),\n",
"        'labels': tf.FixedLenFeature([], tf.int64),\n",
"    }\n",
"\n",
"    # A function to parse record bytes to a labeled vector record\n",
"    def parse(record):\n",
"        parsed = tf.parse_single_example(record, features)\n",
"        return ({\n",
"            'data': tf.decode_raw(parsed['data'], tf.float64)\n",
"        }, parsed['labels'])\n",
"\n",
"    # Construct a PipeModeDataset reading from a 'training' channel, using\n",
"    # the TF Record encoding.\n",
"    ds = PipeModeDataset(channel='training', record_format='TFRecord')\n",
"\n",
"    # The PipeModeDataset is a TensorFlow Dataset and provides standard Dataset methods\n",
"    ds = ds.repeat(20)\n",
"    ds = ds.prefetch(10)\n",
"    ds = ds.map(parse, num_parallel_calls=10)\n",
"    ds = ds.batch(64)\n",
"\n",
"    return ds\n",
"```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Running training using the Python SDK\n",
"\n",
"We can use the SDK to run our local training script on SageMaker infrastructure.\n",
"\n",
"1. Pass the path to the pipemode.py file, which contains the functions for defining your estimator, to the sagemaker.TensorFlow init method.\n",
"2. Pass the S3 location where we previously uploaded our data to the fit() method."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.tensorflow import TensorFlow\n", "\n", "tensorflow = TensorFlow(entry_point='pipemode.py',\n", " role=role,\n", " framework_version='1.11.0',\n", " input_mode='Pipe',\n", " output_path=model_artifacts_location,\n", " code_location=custom_code_upload_location,\n", " train_instance_count=1,\n", " training_steps=1000,\n", " evaluation_steps=100,\n", " train_instance_type='ml.c4.xlarge')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After we've created the SageMaker Python SDK TensorFlow object, we can call fit to launch TensorFlow training:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "import boto3\n", "\n", "# use the region-specific sample data bucket\n", "region = boto3.Session().region_name\n", "\n", "# training data that is already partitioned\n", "train_data = 's3://sagemaker-sample-data-{}/tensorflow/pipe-mode/train'.format(region)\n", "eval_data = 's3://sagemaker-sample-data-{}/tensorflow/pipe-mode/eval'.format(region)\n", "\n", "tensorflow.fit({'train':train_data, 'eval':eval_data})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After ``fit`` returns, you've successfully trained a TensorFlow LinearClassifier using SageMaker pipe mode! The TensorFlow model data will be stored in '``s3:///artifacts``' - where '````' is the name of the bucket you supplied earlier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final cleanup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Delete the CloudFormation template" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws cloudformation delete-stack --stack-name $stack_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deleting SageMaker Neo endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker.Session().delete_endpoint(optimized_predictor.endpoint)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deleting SageMaker the endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker.Session().delete_endpoint(mnist_predictor.endpoint)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Delete S3 Bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "workshop.delete_bucket_completely(bucket)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_tensorflow_p27", "language": "python", "name": "conda_tensorflow_p27" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" }, "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and limitations under the License." }, "nbformat": 4, "nbformat_minor": 2 }