{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Deploy Hugging Face BART transformer models with multi-model endpoints \n", "\n", "This notebook is a step-by-step tutorial on deploying multiple pre-trained PyTorch Hugging Face [BART](https://huggingface.co/transformers/model_doc/bart.html) models with a multi-model endpoint on Amazon SageMaker. BART uses a standard seq2seq/machine-translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT). Specifically, we use the BART model with a language modeling head, [BartForConditionalGeneration](https://huggingface.co/transformers/model_doc/bart.html#transformers.BartForConditionalGeneration), for the summarization task. \n", "\n", "We describe the steps for deploying a multi-model endpoint on Amazon SageMaker with the TorchServe serving stack. An additional step compared to single-model deployment is the requirement to create a manifest file for each model prior to deployment. For training Hugging Face models on SageMaker, refer to the examples [here](https://github.com/huggingface/notebooks/tree/master/sagemaker).\n", "\n", "The outline of steps is as follows:\n", "\n", "1. Download two pre-trained Hugging Face models\n", "2. Use torch-model-archiver to create a manifest file for each model\n", "3. Save and upload the model artifacts to S3\n", "4. Create an inference entrypoint script\n", "5. Deploy the multi-model endpoint\n", "6. 
Trigger endpoint invocation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker import get_execution_role\n", "from sagemaker.utils import name_from_base\n", "from sagemaker.pytorch import PyTorchModel\n", "import boto3\n", "import torch\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "us-east-1\n", "arn:aws:iam::208480242416:role/service-role/AmazonSageMaker-ExecutionRole-endtoendml\n", "sagemaker-us-east-1-208480242416\n" ] } ], "source": [ "from sagemaker import get_execution_role\n", "\n", "role = get_execution_role()\n", "region = boto3.Session().region_name\n", "sagemaker_session = sagemaker.session.Session()\n", "bucket = sagemaker_session.default_bucket()\n", "prefix = 'hf-multimodel-deploy-pytorch'\n", "hf_cache_dir = 'hf_cache_dir/'\n", "\n", "print(region)\n", "print(role)\n", "print(bucket)\n", "\n", "model_data_path = 's3://{0}/{1}/models'.format(bucket,prefix)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download the Hugging Face pretrained model" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "!pip install transformers==4.5.1 --quiet" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "transformers==4.5.1\r\n" ] } ], "source": [ "!pip freeze | grep transformers" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "pip install -U ipywidgets --quiet" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from transformers import BartForConditionalGeneration, BartTokenizer, BartConfig" ] }, { "cell_type": "code", "execution_count": 
7, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e4ca39bb5ffa4712bd8ff861e1424498", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading: 0%| | 0.00/1.40k [00:00