{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 2: Deploy a model trained using SageMaker distributed data parallel\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Use this notebook after you have completed **Part 1: Distributed data parallel MNIST training with PyTorch and SageMaker's distributed data parallel library** in the notebook `pytorch_smdataparallel_mnist_demo.ipynb`. To deploy the model you previously trained, you need to create a SageMaker endpoint. An endpoint is a hosted prediction service that you can use to perform inference.\n", "\n", "## Finding the model\n", "\n", "This notebook uses a stored model if one exists. If you recently ran a training example that used the `%store` magic, the stored model location is restored in the next cell.\n", "\n", "Otherwise, you can pass the URI of the model file (a .tar.gz file) in the `model_data` variable.\n", "\n", "To find the location of model files in the [SageMaker console](https://console.aws.amazon.com/sagemaker/home), do the following: \n", "\n", "1. Go to the SageMaker console: https://console.aws.amazon.com/sagemaker/home.\n", "1. Select **Training** in the left navigation pane and then select **Training jobs**. \n", "1. Find your recent training job and choose it.\n", "1. In the **Output** section, you should see an S3 URI under **S3 model artifact**. Copy this S3 URI.\n", "1. 
Uncomment the `model_data` line in the next cell that manually sets the model's URI and replace the placeholder value with that S3 URI." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Retrieve a saved model from a previous notebook run's stored variable\n", "%store -r model_data\n", "\n", "# If no model was found, set it manually here.\n", "# model_data = 's3://sagemaker-us-west-2-XXX/pytorch-smdataparallel-mnist-2020-10-16-17-15-16-419/output/model.tar.gz'\n", "\n", "print(\"Using this model: {}\".format(model_data))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a model object\n", "\n", "You define the model object by using the SageMaker Python SDK's `PyTorchModel` class and passing in the model artifact location (`model_data`) and the `entry_point` script. The inference script, `inference.py`, is printed by the following code block. Its `model_fn` function loads the model and moves it to a GPU, if one is available." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize code/inference.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "\n", "role = sagemaker.get_execution_role()\n", "\n", "from sagemaker.pytorch import PyTorchModel\n", "\n", "model = PyTorchModel(\n", "    model_data=model_data,\n", "    source_dir=\"code\",\n", "    entry_point=\"inference.py\",\n", "    role=role,\n", "    framework_version=\"1.6.0\",\n", "    py_version=\"py3\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deploy the model on an endpoint\n", "\n", "You create a `predictor` by calling the `model.deploy` method. You can optionally change both the instance count and instance type." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor = model.deploy(initial_instance_count=1, instance_type=\"ml.m4.xlarge\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test the model\n", "You can test the deployed model using samples from the test set.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download the test set\n", "import torchvision\n", "from torchvision import datasets, transforms\n", "from torch.utils.data import DataLoader\n", "from packaging.version import Version\n", "\n", "# Set the source to download MNIST data from\n", "TORCHVISION_VERSION = \"0.9.1\"\n", "if Version(torchvision.__version__) < Version(TORCHVISION_VERSION):\n", "    # Set path to data source and include checksum key to make sure data isn't corrupted\n", "    datasets.MNIST.resources = [\n", "        (\n", "            \"https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/train-images-idx3-ubyte.gz\",\n", "            \"f68b3c2dcbeaaa9fbdd348bbdeb94873\",\n", "        ),\n", "        (\n", "            \"https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/train-labels-idx1-ubyte.gz\",\n", "            \"d53e105ee54ea40749a09fcbcd1e9432\",\n", "        ),\n", "        (\n", "            \"https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz\",\n", "            \"9fb629c4189551a2d022fa330f9573f3\",\n", "        ),\n", "        (\n", "            \"https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz\",\n", "            \"ec29112dd5afa0611ce80d1b7f02629c\",\n", "        ),\n", "    ]\n", "else:\n", "    # Set path to data source\n", "    datasets.MNIST.mirrors = [\n", "        \"https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/\"\n", "    ]\n", "\n", "\n", "test_set = datasets.MNIST(\n", "    \"data\",\n", "    download=True,\n", "    train=False,\n", "    transform=transforms.Compose(\n", "        [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]\n", "    ),\n", ")\n", "\n", "\n", "# 
Randomly sample 16 images from the test set\n", "test_loader = DataLoader(test_set, shuffle=True, batch_size=16)\n", "# Use the built-in next(); DataLoader iterators do not provide a .next() method in current PyTorch\n", "test_images, _ = next(iter(test_loader))\n", "\n", "# Inspect the images\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline\n", "\n", "\n", "def imshow(img):\n", "    # Convert the CHW tensor to an HWC array, as matplotlib expects\n", "    img = img.numpy()\n", "    img = np.transpose(img, (1, 2, 0))\n", "    plt.imshow(img)\n", "    return\n", "\n", "\n", "# Unnormalize the test images for displaying\n", "unnorm_images = (test_images * 0.3081) + 0.1307\n", "\n", "print(\"Sampled test images: \")\n", "imshow(torchvision.utils.make_grid(unnorm_images))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Send the sampled images to endpoint for inference\n", "outputs = predictor.predict(test_images.numpy())\n", "predicted = np.argmax(outputs, axis=1)\n", "\n", "print(\"Predictions: \")\n", "print(predicted.tolist())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleanup\n", "\n", "If you don't plan to send more inference requests or use the endpoint for anything else, delete it so that it stops incurring charges." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2, which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This us-east-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/training|distributed_training|pytorch|data_parallel|mnist|infer_pytorch.ipynb)\n" ] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_p36", "language": "python", "name": "conda_pytorch_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }