{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Hosting a Pretrained Model on SageMaker\n", " \n", "Amazon SageMaker is a service to accelerate the entire machine learning lifecycle. It includes components for building, training and deploying machine learning models. Each SageMaker component is modular, so you're welcome to only use the features needed for your use case. One of the most popular features of SageMaker is [model hosting](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html). Using SageMaker Hosting you can deploy your model as a scalable, highly available, multi-process API endpoint with a few lines of code. In this notebook, we will demonstrate how to host a pretrained model (BERT) in Amazon SageMaker to extract embeddings from text.\n", "\n", "SageMaker provides prebuilt containers that can be used for training, hosting, or data processing. The inference containers include a web serving stack, so you don't need to install and configure one. We will be using the SageMaker [PyTorch container](https://github.com/aws/deep-learning-containers), but you may use the [TensorFlow container](https://github.com/aws/deep-learning-containers/blob/master/available_images.md), or bring your own container if needed. \n", "\n", "This notebook will walk you through how to deploy a pretrained Hugging Face model as a scalable, highly available, production ready API in under 15 minutes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieve Model Artifacts\n", "\n", "First we will download the model artifacts for the pretrained [BERT](https://arxiv.org/abs/1810.04805) model. BERT is a popular natural language processing (NLP) model that extracts meaning and context from text." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!pip install transformers==3.3.1 sagemaker==2.15.0 --quiet" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from transformers import BertTokenizer, BertModel\n", "\n", "tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\n", "model = BertModel.from_pretrained(\"bert-base-uncased\")\n", "\n", "model_path = 'model/'\n", "code_path = 'code/'\n", "\n", "if not os.path.exists(model_path):\n", " os.mkdir(model_path)\n", " \n", "model.save_pretrained(save_directory=model_path)\n", "tokenizer.save_pretrained(save_directory=model_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write the Inference Script\n", "\n", "Since we are bringing a model to SageMaker, we must create an inference script. The script will run inside our PyTorch container. Our script should include a function for model loading, and optionally functions generating predicitions, and input/output processing. The PyTorch container provides default implementations for generating a prediction and input/output processing. By including these functions in your script you are overriding the default functions. You can find additional [details here](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#serve-a-pytorch-model).\n", "\n", "In the next cell we'll see our inference script. You will notice that it uses the [transformers library from HuggingFace](https://huggingface.co/transformers/). This Python library is not installed in the container by default, so we will have to add that in the next section." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize code/inference_code.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Package Model\n", "\n", "For hosting, SageMaker requires that the deployment package be structed in a compatible format. It expects all files to be packaged in a tar archive named \"model.tar.gz\" with gzip compression. To install additional libraries at container startup, we can add a [requirements.txt](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#using-third-party-libraries) text file that specifies the libraries to be installed using [pip](https://pypi.org/project/pip/). Within the archive, the PyTorch container expects all inference code and requirements.txt file to be inside the code/ directory. See the [guide here](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#for-versions-1-2-and-higher) for a thorough explanation of the required directory structure. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tarfile\n", "\n", "zipped_model_path = os.path.join(model_path, \"model.tar.gz\")\n", "\n", "with tarfile.open(zipped_model_path, \"w:gz\") as tar:\n", " tar.add(model_path)\n", " tar.add(code_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy Model\n", "\n", "Now that we have our deployment package, we can use the [SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/index.html) to deploy our API endpoint with two lines of code. We need to specify an IAM role for the SageMaker endpoint to use. Minimally, it will need read access to the default SageMaker bucket (usually named sagemaker-{region}-{your account number}) so it can read the deployment package. When we call deploy(), the SDK will save our deployment archive to S3 for the SageMaker endpoint to use. We will use the helper function [get_execution_role](https://sagemaker.readthedocs.io/en/stable/api/utility/session.html?highlight=get_execution_role#sagemaker.session.get_execution_role) to retrieve our current IAM role so we can pass it to the SageMaker endpoint. Minimally it will require read access to the model artifacts in S3 and the [ECR repository](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) where the container image is stored by AWS.\n", "\n", "\n", "You may notice that we specify our PyTorch version and Python version when creating the PyTorchModel object. The SageMaker SDK uses these parameters to determine which PyTorch container to use. \n", "\n", "We'll choose an m5 instance for our endpoint to ensure we have sufficient memory to serve our model. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch import PyTorchModel\n", "from sagemaker import get_execution_role\n", "\n", "endpoint_name = 'bert-base'\n", "\n", "model = PyTorchModel(entry_point='inference_code.py', \n", " model_data=zipped_model_path, \n", " role=get_execution_role(), \n", " framework_version='1.5', \n", " py_version='py3')\n", "\n", "predictor = model.deploy(initial_instance_count=1, \n", " instance_type='ml.m5.xlarge', \n", " endpoint_name=endpoint_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Predictions\n", "\n", "Now that our API endpoint is deployed, we can send it text to get predictions from our BERT model. You can use the SageMaker SDK or the [SageMaker Runtime API](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html) to invoke the endpoint. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "\n", "sm = boto3.client('sagemaker-runtime')\n", "\n", "prompt = \"The best part of Amazon SageMaker is that it makes machine learning easy.\"\n", "\n", "response = sm.invoke_endpoint(EndpointName=endpoint_name, \n", " Body=prompt.encode(encoding='UTF-8'),\n", " ContentType='text/csv')\n", "\n", "response['Body'].read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "You have successfully created a scalable, high available, RESTful API that is backed by a BERT model! It can be used for downstreaming NLP tasks like text classification. If you are still interested in learning more, check out some of the more advanced features of SageMaker Hosting, like [model monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html) to detect concept drift, [autoscaling](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html) to dynamically adjust the number of instances, or [VPC config](https://docs.aws.amazon.com/sagemaker/latest/dg/host-vpc.html) to control network access to/from your endpoint.\n", "\n", "You can also look in to the [ezsmdeploy SDK](https://aws.amazon.com/blogs/opensource/deploy-machine-learning-models-to-amazon-sagemaker-using-the-ezsmdeploy-python-package-and-a-few-lines-of-code/) that automates most of this process." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_latest_p36", "language": "python", "name": "conda_pytorch_latest_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }