{ "cells": [ { "cell_type": "markdown", "id": "2c00f4b5-2e2a-4001-a75b-db900a180f3b", "metadata": {}, "source": [ "# Deploy Hugging Face transformer models with multi-model endpoints \n", "\n", "***\n", "This notebooks is designed to run on `Python 3 Data Science 2.0` kernel in Amazon SageMaker Studio\n", "***\n", "\n", "We will describe the steps for deploying a multi-model endpoint on Amazon SageMaker with TorchServe serving stack. An additional step compared to single model deployment is the requirement to create a manifest file for each model prior to deployment. For training Hugging Face models on SageMaker, refer the examples [here](https://github.com/huggingface/notebooks/tree/master/sagemaker)\n", "\n", "We will perform following steps:\n", "1. [Introduction](#Introduction) \n", "2. [Setup](#Setup)\n", "3. [Register a new HuggingFace Transformer model version](#Register-a-new-HuggingFace-Transformer-model-version)\n", "4. [Create the model metadata for multi-model endpoint](#Create-the-model-metadata-for-multi\\-model-endpoint)\n", "5. [Create the multi-model endpoint](#Create-the-multi\\-model-endpoint)" ] }, { "cell_type": "markdown", "id": "6dcbaeba-3898-4a79-bb37-46f8da19a43c", "metadata": {}, "source": [ "## Introduction\n", "\n", "In lab 1, we have demonstrated how to deploy models to Amazon SageMaker single model endpoints. SageMaker also supports deploying multiple models to one endpoint. 
There are three multi-model hosting options:\n", "- [Host multiple models in one container behind one endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html)\n", "- [Host multiple models which use different containers behind one endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-container-endpoints.html)\n", "- [Host models along with pre-processing logic as serial inference pipeline behind one endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html)\n", "\n", "This notebook provides step-by-step instructions for deploying multiple pre-trained PyTorch Hugging Face models in one container with a multi-model endpoint on Amazon SageMaker. \n" ] }, { "cell_type": "markdown", "id": "510be590-9bfa-4270-9104-7e6610bb0dec", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": null, "id": "a1092eb5-7116-4619-8e78-7e9833f285b3", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker import get_execution_role\n", "from sagemaker.utils import name_from_base\n", "from sagemaker.pytorch import PyTorchModel\n", "from sagemaker.s3 import S3Uploader, s3_path_join\n", "import boto3\n", "import time\n", "import pandas as pd\n", "from pathlib import Path\n", "import tarfile\n", "import shutil \n", "import datetime\n", "import json\n", "import os,sys\n", "p = os.path.abspath('..')\n", "if p not in sys.path:\n", " sys.path.append(p)\n", "import utils\n", "\n", "sm_session = sagemaker.Session()\n", "role = get_execution_role()\n", "region = sm_session.boto_region_name\n", "bucket = sm_session.default_bucket()\n", "sm_client = sm_session.sagemaker_client\n", "sm_runtime = sm_session.sagemaker_runtime_client\n", "prefix = \"sagemaker/huggingface-pytorch-sentiment-analysis\"" ] }, { "cell_type": "code", "execution_count": null, "id": "8f8b211b-2aee-45f7-ae96-12c2ff2fbc5e", "metadata": {}, "outputs": [], "source": [ "%store\n", "%store -r" ] }, { "cell_type": "code", 
"execution_count": null, "id": "9bd6cc57-49aa-40c7-bdd2-9def40e7bd30", "metadata": {}, "outputs": [], "source": [ "try:\n", " describe_model_package_group_response = sm_client.describe_model_package_group(\n", " ModelPackageGroupName=model_package_group_name\n", " )\n", " print(describe_model_package_group_response)\n", "except:\n", " print(f\"model package group {model_package_group_name} does not exist, please go through lab 1 first\")" ] }, { "cell_type": "markdown", "id": "5c3b29af-7acd-4e6d-8fdc-0964bdc2b253", "metadata": {}, "source": [ "## Register a new HuggingFace Transformer model version" ] }, { "cell_type": "markdown", "id": "4b5743d3-1f1a-457b-80fb-f758607d7b9c", "metadata": {}, "source": [ "### Register a new model version for Hugging Face roberta model with entry point script helper function\n", "To deploy the models in one container, we will use the Hugging Face prebuilt container which has the required packages for transformer models. However, we will use a custom entry point script for each of the model and define our own data preprocessing function. We will firstly download the roberta model file and prepare the model with inference script to be used in the endpoint. This updated model tar file will be registered to model registry as a new model version." 
] }, { "cell_type": "code", "execution_count": null, "id": "3788dfd4-e349-4a21-a521-0d225db9bd09", "metadata": {}, "outputs": [], "source": [ "local_artifact_path = Path(\"model_artifacts\")\n", "local_artifact_path.mkdir(exist_ok=True, parents=True)\n", "model_tar_name = 'model_roberta_MME.tar.gz'\n", "org_model_tar_name = Path(model_roberta_uri).parts[-1]" ] }, { "cell_type": "code", "execution_count": null, "id": "9899d5fe-424d-4371-812e-5fd7d32c3364", "metadata": {}, "outputs": [], "source": [ "sm_session.download_data('./', bucket, str(Path(prefix, \"models\", org_model_tar_name)))" ] }, { "cell_type": "code", "execution_count": null, "id": "83aeb060-bcff-4b9c-aaa1-c3371c162598", "metadata": {}, "outputs": [], "source": [ "with tarfile.open(org_model_tar_name) as tar:\n", " tar.extractall(path=local_artifact_path.stem)" ] }, { "cell_type": "code", "execution_count": null, "id": "20228089-bbb5-49a6-b11e-8e70598f4e6b", "metadata": {}, "outputs": [], "source": [ "shutil.copytree('../code', local_artifact_path / 'code', dirs_exist_ok=True) " ] }, { "cell_type": "code", "execution_count": null, "id": "a068f177-e38f-4a33-9bf2-5620d51908a1", "metadata": {}, "outputs": [], "source": [ "tar_size = utils.create_tar(model_tar_name, local_artifact_path)\n", "print(f\"Created {model_tar_name}, size {tar_size:.2f} MB\")" ] }, { "cell_type": "code", "execution_count": null, "id": "f5bb6ab6-fd6f-4f00-be22-6a1dfa1b7464", "metadata": {}, "outputs": [], "source": [ "\n", "model_data_path = s3_path_join(\"s3://\",bucket,prefix+\"/models\")\n", "model_roberta_mme_uri =S3Uploader.upload(model_tar_name, model_data_path)\n", "print(f\"Uploaded roberta MME model to {model_roberta_mme_uri}\")\n", "%store model_roberta_mme_uri" ] }, { "cell_type": "markdown", "id": "801eb127-e8b6-40bc-9cdf-77c23f9b41c0", "metadata": {}, "source": [ "Prepare model package parameters based on the existing roberta model package" ] }, { "cell_type": "code", "execution_count": null, "id": 
"44495d63-7bc3-4576-87d7-ff285c2b417d", "metadata": {}, "outputs": [], "source": [ "describe_model_package_response = sm_client.describe_model_package(\n", " ModelPackageName=roberta_model_package_arn\n", ")\n", "describe_model_package_response" ] }, { "cell_type": "code", "execution_count": null, "id": "3ac63c4d-d62b-46bf-b251-5bc383ce6690", "metadata": {}, "outputs": [], "source": [ "model_package_keys = [\"ModelPackageGroupName\", \"Domain\", \"Task\", \"InferenceSpecification\"]\n", "roberta_model_package_mme = dict()\n", "for key in model_package_keys:\n", " roberta_model_package_mme[key] = describe_model_package_response[key]\n", "roberta_model_package_mme[\"ModelPackageDescription\"] = \"Hugging Face Roberta Model MME - sentiment analysis\"\n", "roberta_model_package_mme[\"InferenceSpecification\"][\"Containers\"][0][\"ContainerHostname\"] = \"huggingface-pytorch-roberta-mme\"\n", "roberta_model_package_mme[\"InferenceSpecification\"][\"Containers\"][0][\"ModelDataUrl\"] = model_roberta_mme_uri\n", "roberta_model_package_mme[\"InferenceSpecification\"][\"Containers\"][0][\"Environment\"][\"SAGEMAKER_SUBMIT_DIRECTORY\"] = model_roberta_mme_uri \n", "roberta_model_package_mme" ] }, { "cell_type": "code", "execution_count": null, "id": "39c0063f-791c-40d2-8580-53b0eaa10464", "metadata": {}, "outputs": [], "source": [ "model_package_response = sm_client.create_model_package(**roberta_model_package_mme)" ] }, { "cell_type": "code", "execution_count": null, "id": "3931c5d2-4cfc-46af-a10a-fa129f6d280c", "metadata": {}, "outputs": [], "source": [ "list_model_packages_response = sm_client.list_model_packages(\n", " ModelPackageGroupName=model_package_group_name\n", ")\n", "list_model_packages_response" ] }, { "cell_type": "code", "execution_count": null, "id": "3aa6b155-6e41-42c2-af56-9e583190e7f3", "metadata": {}, "outputs": [], "source": [ "# we will use the roberta mme model and the distilbert model to create the multi-model endpoint\n", 
"roberta_mme_model_version_arn = list_model_packages_response[\"ModelPackageSummaryList\"][0][\"ModelPackageArn\"]\n", "print(f\"roberta MME model: {roberta_mme_model_version_arn}\")\n", "distilbert_model_version_arn = list_model_packages_response[\"ModelPackageSummaryList\"][1][\"ModelPackageArn\"]\n", "print(f\"distilbert model: {distilbert_model_version_arn}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "543e98aa-5d84-4664-97f3-9b5fc22ae49e", "metadata": {}, "outputs": [], "source": [ "# before deploying the model from model registry, we need to approve the model package version\n", "model_package_update_input_dict = {\n", " \"ModelPackageArn\": roberta_mme_model_version_arn,\n", " \"ModelApprovalStatus\": \"Approved\",\n", "}\n", "model_package_update_response = sm_client.update_model_package(**model_package_update_input_dict)\n", "model_package_update_response" ] }, { "cell_type": "markdown", "id": "03eadf0f-d2a2-43e5-89eb-a0f42398802a", "metadata": {}, "source": [ "#### Create the Roberta MME model object" ] }, { "cell_type": "code", "execution_count": null, "id": "4953ccc3-cbcd-4dd4-93dd-6bb15561b02e", "metadata": {}, "outputs": [], "source": [ "now_roberta_mme = f'{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}'\n", "roberta_mme_model_name = f\"hf-pytorch-model-roberta-mme-{now_roberta_mme}\"\n", "print(f\"Model name : {roberta_mme_model_name}\")\n", "%store roberta_mme_model_name" ] }, { "cell_type": "code", "execution_count": null, "id": "0cf93201-3f41-49bb-b0e2-3b5d3b9f51a4", "metadata": {}, "outputs": [], "source": [ "primary_container_roberta = {\n", " \"ModelPackageName\": roberta_mme_model_version_arn,\n", "}\n", "\n", "create_model_roberta_respose = sm_client.create_model(\n", " ModelName=roberta_mme_model_name, \n", " ExecutionRoleArn=role, \n", " PrimaryContainer=primary_container_roberta\n", ")\n", "\n", "print(f\"Model arn : {create_model_roberta_respose['ModelArn']}\")" ] }, { "cell_type": "code", "execution_count": null, "id": 
"fefca41c-27cf-41e2-9cfd-90fd411cff2e", "metadata": {}, "outputs": [], "source": [ "inference_image_hf_mme" ] }, { "cell_type": "markdown", "id": "b4aa9505-c6cb-44a9-bead-aeeaf79b6c2f", "metadata": {}, "source": [ "## Create the model metadata for multi-model endpoint\n", "Here we use `boto3` to establish the model metadata. Instead of describing a single model, this metadata will indicate the use of multi-model semantics and will identify the source location of all specific model artifacts. You also need to pass the ModelDataUrl field that specifies the prefix in Amazon S3 where the model artifacts are located, instead of the path to a single model artifact, as you would when deploying a single model." ] }, { "cell_type": "code", "execution_count": null, "id": "1e4bfb5d-4054-4fe0-b00b-dc8032e36599", "metadata": {}, "outputs": [], "source": [ "# establish the place in S3 from which the endpoint will pull individual models\n", "multi_model_now = f'{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}'\n", "multi_model_name = f'pytorch-multi-model-senti-{multi_model_now}'\n", "_container = {\n", " 'Image': inference_image_hf_mme,\n", " 'ModelDataUrl': model_data_path,\n", " 'Mode': 'MultiModel'\n", "}\n", "create_model_response = sm_client.create_model(\n", " ModelName = multi_model_name,\n", " ExecutionRoleArn = role,\n", " Containers = [_container])\n", "%store multi_model_name\n", "print(f'Multi Model name {multi_model_name}')" ] }, { "cell_type": "markdown", "id": "07bdb34a-8bd1-4c3b-b0be-3e29038ad971", "metadata": {}, "source": [ "## Create the multi-model endpoint\n", "There is nothing special about the SageMaker endpoint config metadata for a multi-model endpoint. You need to consider the appropriate instance type and number of instances for the projected prediction workload. The number and size of the individual models will drive memory requirements.\n", "\n", "Once the endpoint config is in place, the endpoint creation is straightforward." 
] }, { "cell_type": "code", "execution_count": null, "id": "6e59e878-247d-4943-97fb-356025d78828", "metadata": {}, "outputs": [], "source": [ "endpoint_config_name = f'pytorch-multi-model-config-{multi_model_now}'\n", "print('Endpoint config name: ' + endpoint_config_name)\n", "\n", "create_endpoint_config_response = sm_client.create_endpoint_config(\n", " EndpointConfigName = endpoint_config_name,\n", " ProductionVariants=[{\n", " 'InstanceType': deploy_instance_type,\n", " 'InitialInstanceCount': 1,\n", " 'InitialVariantWeight': 1,\n", " 'ModelName': multi_model_name,\n", " 'VariantName': 'AllTraffic'}])" ] }, { "cell_type": "code", "execution_count": null, "id": "4b377a55-b8fd-4236-bc55-7cb9aea36295", "metadata": {}, "outputs": [], "source": [ "endpoint_name = f'pytorch-multi-model-endpoint-{multi_model_now}'\n", "print('Endpoint name: ' + endpoint_name)\n", "\n", "create_endpoint_response = sm_client.create_endpoint(\n", " EndpointName=endpoint_name,\n", " EndpointConfigName=endpoint_config_name)\n", "print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])" ] }, { "cell_type": "code", "execution_count": null, "id": "0f0985ac-9fb8-4e0a-864a-efa59242ccba", "metadata": {}, "outputs": [], "source": [ "%%time\n", "utils.endpoint_creation_wait(endpoint_name)" ] }, { "cell_type": "markdown", "id": "7bc3f98d-d804-4c95-afb9-711ebb665bbe", "metadata": {}, "source": [ "### Invoke multi-model endpoint" ] }, { "cell_type": "code", "execution_count": null, "id": "0b8584aa-1078-453f-864c-6b0de1677fb2", "metadata": {}, "outputs": [], "source": [ "test_data = pd.read_csv(\"../sample_payload/test_data.csv\", header=None)\n", "json_data = dict({'inputs':test_data.iloc[:,0].to_list()})\n", "test_data" ] }, { "cell_type": "code", "execution_count": null, "id": "2c901520-a047-4a83-8835-b5b32aadfbe6", "metadata": {}, "outputs": [], "source": [ "def invoke_multi_model_endpoint(model_archive=None, content_type=\"JSON\", test_data=None):\n", "\n", " if content_type == 
\"JSON\":\n", "\n", " response = sm_runtime.invoke_endpoint(\n", " EndpointName=endpoint_name,\n", " Body=json.dumps(test_data),\n", " ContentType=\"application/json\",\n", " TargetModel=model_archive,\n", " )\n", " elif content_type == \"CSV\":\n", " response = sm_runtime.invoke_endpoint(\n", " EndpointName=endpoint_name,\n", " Body=test_data.to_csv(header=False, index=False),\n", " ContentType=\"text/csv\",\n", " TargetModel=model_archive,\n", " )\n", " else:\n", " print(f\"input content type {content_type} is not supported, please selece CSV or JSON.\")\n", " return response[\"Body\"].read()" ] }, { "cell_type": "code", "execution_count": null, "id": "926b7145-39da-4f6c-b191-79754e2e3d96", "metadata": {}, "outputs": [], "source": [ "%%time\n", "model_archive = '/model_roberta_MME.tar.gz'\n", "content_type = \"JSON\" #\"CSV\"\n", "payload = json_data #test_data\n", "results = invoke_multi_model_endpoint(model_archive, content_type, payload)\n", "print(results)" ] }, { "cell_type": "code", "execution_count": null, "id": "a0564b95-d5cc-4e49-a594-5788fc012c0a", "metadata": {}, "outputs": [], "source": [ "%%time\n", "model_archive = '/model_roberta_MME.tar.gz'\n", "content_type = \"CSV\"\n", "payload = test_data\n", "results = invoke_multi_model_endpoint(model_archive, content_type, payload)\n", "print(results)" ] }, { "cell_type": "code", "execution_count": null, "id": "87c7dd77-d2ef-4fbf-8887-ba47a82bf338", "metadata": {}, "outputs": [], "source": [ "%%time\n", "model_archive = '/model_distilbert.tar.gz'\n", "content_type = \"JSON\" #\"CSV\"\n", "payload = json_data #test_data\n", "results = invoke_multi_model_endpoint(model_archive, content_type, payload)\n", "print(results)" ] }, { "cell_type": "code", "execution_count": null, "id": "ee5363f6-34e3-49fc-84f1-d3ad0a0826e8", "metadata": {}, "outputs": [], "source": [ "%%time\n", "model_archive = '/model_distilbert.tar.gz'\n", "content_type = \"CSV\"\n", "payload = test_data\n", "results = 
invoke_multi_model_endpoint(model_archive, content_type, payload)\n", "print(results)" ] }, { "cell_type": "markdown", "id": "3ffcb213-6ae4-42ac-8b3f-a7ca41f6a5f4", "metadata": {}, "source": [ "## Delete the endpoint\n", "\n", "If you do not plan to use this endpoint further, you should delete the endpoint to avoid incurring additional charges." ] }, { "cell_type": "code", "execution_count": null, "id": "20256894-a8c7-4a0c-aa9c-34d2a9338eb3", "metadata": {}, "outputs": [], "source": [ "sm_session.delete_endpoint(endpoint_name)" ] }, { "cell_type": "code", "execution_count": null, "id": "77f7c6eb-0cac-459e-833a-b6c0807947b1", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science 2.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-38" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 5 }