{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Hosting a Pretrained Model on SageMaker\n", " \n", "Amazon SageMaker is a service to accelerate the entire machine learning lifecycle. It includes components for building, training and deploying machine learning models. Each SageMaker component is modular, so you're welcome to only use the features needed for your use case. One of the most popular features of SageMaker is [model hosting](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html). Using SageMaker Hosting you can deploy your model as a scalable, highly available, multi-process API endpoint with a few lines of code. In this notebook, we will demonstrate how to host a pretrained model (GPT-2) in Amazon SageMaker.\n", "\n", "SageMaker provides prebuilt containers that can be used for training, hosting, or data processing. The inference containers include a web serving stack, so you don't need to install and configure one. We will be using the SageMaker [PyTorch container](https://github.com/aws/deep-learning-containers), but you may use the [TensorFlow container](https://github.com/aws/deep-learning-containers/blob/master/available_images.md), or bring your own container if needed. \n", "\n", "This notebook will walk you through how to deploy a pretrained Hugging Face model as a scalable, highly available, production ready API in under 15 minutes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieve Model Artifacts\n", "\n", "First we will download the model artifacts for the pretrained [GPT-2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) model. GPT-2 is a popular text generation model that was developed by OpenAI. Given a text prompt it can generate synthetic text that may follow." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!pip install transformers==3.3.1 sagemaker==2.15.0 --quiet" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from transformers import GPT2Tokenizer, GPT2LMHeadModel\n", "\n", "tokenizer = GPT2Tokenizer.from_pretrained('gpt2')\n", "model = GPT2LMHeadModel.from_pretrained('gpt2')\n", "\n", "model_path = 'model/'\n", "code_path = 'code/'\n", "\n", "if not os.path.exists(model_path):\n", " os.mkdir(model_path)\n", " \n", "model.save_pretrained(save_directory=model_path)\n", "tokenizer.save_vocabulary(save_directory=model_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write the Inference Script\n", "\n", "Since we are bringing a model to SageMaker, we must create an inference script. The script will run inside our PyTorch container. Our script should include a function for model loading, and optionally functions generating predicitions, and input/output processing. The PyTorch container provides default implementations for generating a prediction and input/output processing. By including these functions in your script you are overriding the default functions. You can find additional [details here](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#serve-a-pytorch-model).\n", "\n", "In the next cell we'll see our inference script. You will notice that it uses the [transformers library from Hugging Face](https://huggingface.co/transformers/). This Python library is not installed in the container by default, so we will have to add that in the next section." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize code/inference_code.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Package Model\n", "\n", "For hosting, SageMaker requires that the deployment package be structed in a compatible format. It expects all files to be packaged in a tar archive named \"model.tar.gz\" with gzip compression. To install additional libraries at container startup, we can add a [requirements.txt](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#using-third-party-libraries) text file that specifies the libraries to be installed using [pip](https://pypi.org/project/pip/). Within the archive, the PyTorch container expects all inference code and requirements.txt file to be inside the code/ directory. See the [guide here](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#for-versions-1-2-and-higher) for a thorough explanation of the required directory structure. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tarfile\n", "\n", "zipped_model_path = os.path.join(model_path, \"model.tar.gz\")\n", "\n", "with tarfile.open(zipped_model_path, \"w:gz\") as tar:\n", " tar.add(model_path)\n", " tar.add(code_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy Model\n", "\n", "Now that we have our deployment package, we can use the [SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/index.html) to deploy our API endpoint with two lines of code. We need to specify an IAM role for the SageMaker endpoint to use. Minimally, it will need read access to the default SageMaker bucket (usually named sagemaker-{region}-{your account number}) so it can read the deployment package. When we call deploy(), the SDK will save our deployment archive to S3 for the SageMaker endpoint to use. We will use the helper function [get_execution_role](https://sagemaker.readthedocs.io/en/stable/api/utility/session.html?highlight=get_execution_role#sagemaker.session.get_execution_role) to retrieve our current IAM role so we can pass it to the SageMaker endpoint. You may specify another IAM role here. Minimally it will require read access to the model artifacts in S3 and the [ECR repository](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) where the container image is stored by AWS.\n", "\n", "You may notice that we specify our PyTorch version and Python version when creating the PyTorchModel object. The SageMaker SDK uses these parameters to determine which PyTorch container to use. \n", "\n", "The full size [GPT-2 model has 1.2 billion parameters](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf). Even though we are using the small version of the model, our endpoint will need to fit millions of parameters in to memory. We'll choose an m5 instance for our endpoint to ensure we have sufficient memory to serve our model. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch import PyTorchModel\n", "from sagemaker import get_execution_role\n", "\n", "endpoint_name = 'GPT2'\n", "\n", "model = PyTorchModel(entry_point='inference_code.py', \n", " model_data=zipped_model_path, \n", " role=get_execution_role(),\n", " framework_version='1.5', \n", " py_version='py3')\n", "\n", "predictor = model.deploy(initial_instance_count=1, \n", " instance_type='ml.m5.xlarge', \n", " endpoint_name=endpoint_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Predictions\n", "\n", "Now that our RESTful API endpoint is deployed, we can send it text to get predictions from our GPT-2 model. You can use the SageMaker Python SDK or the [SageMaker Runtime API](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html) to invoke the endpoint. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import json\n", "\n", "sm = boto3.client('sagemaker-runtime')\n", "\n", "prompt = \"Working with SageMaker makes machine learning \"\n", "\n", "response = sm.invoke_endpoint(EndpointName=endpoint_name, \n", " Body=json.dumps(prompt),\n", " ContentType='text/csv')\n", "\n", "response['Body'].read().decode('utf-8')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "You have successfully created a scalable, high available, RESTful API that is backed by a GPT-2 model! If you are still interested in learning more, check out some of the more advanced features of SageMaker Hosting, like [model monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html) to detect concept drift, [autoscaling](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html) to dynamically adjust the number of instances, or [VPC config](https://docs.aws.amazon.com/sagemaker/latest/dg/host-vpc.html) to control network access to/from your endpoint.\n", "\n", "You can also look in to the [ezsmdeploy SDK](https://aws.amazon.com/blogs/opensource/deploy-machine-learning-models-to-amazon-sagemaker-using-the-ezsmdeploy-python-package-and-a-few-lines-of-code/) that automates most of this process." ] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_latest_p36", "language": "python", "name": "conda_pytorch_latest_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }