{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SageMaker Serverless Inference\n", "## XGBoost Regression Example\n", "\n", "Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy to deploy and scale ML models. Serverless Inference is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts. Serverless endpoints automatically launch compute resources and scale them in and out with traffic, eliminating the need to choose instance types or manage scaling policies.\n", "\n", "In this notebook we take a model trained with the SageMaker XGBoost algorithm on the public Abalone regression dataset and deploy it to a serverless endpoint." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents\n", "- Setup\n", "- Deployment\n", " - Model Creation\n", " - Endpoint Configuration (Adjust for Serverless)\n", " - Serverless Endpoint Creation\n", " - Endpoint Invocation\n", "- Clean Up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "To run this notebook, your notebook execution role needs SageMaker full access (for example, the `AmazonSageMakerFullAccess` managed policy)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install the latest versions of the SageMaker Python SDK, botocore, boto3, and the AWS CLI\n", "! pip install sagemaker botocore boto3 awscli --upgrade" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Setup clients\n", "import boto3\n", "\n", "client = boto3.client(service_name=\"sagemaker\")\n", "runtime = boto3.client(service_name=\"sagemaker-runtime\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SageMaker Setup\n", "To begin, we import the AWS SDK for Python (Boto3) and set up our environment, including an IAM role and an S3 bucket to store our data."
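] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optionally, you can first confirm which AWS account and identity the notebook is running as. The next cell is a minimal sketch (the `describe_caller` helper is illustrative, not part of the SageMaker SDK); the live call is commented out because it requires valid AWS credentials." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sanity check: summarize the active AWS identity\n", "def describe_caller(sts_client):\n", "    \"\"\"Summarize the AWS identity behind an STS client.\"\"\"\n", "    identity = sts_client.get_caller_identity()\n", "    return f\"Account {identity['Account']}, ARN {identity['Arn']}\"\n", "\n", "\n", "# With live credentials:\n", "# import boto3\n", "# print(describe_caller(boto3.client(\"sts\")))"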
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "from sagemaker.estimator import Estimator\n", "\n", "boto_session = boto3.session.Session()\n", "region = boto_session.region_name\n", "print(region)\n", "\n", "sagemaker_session = sagemaker.Session()\n", "base_job_prefix = \"xgboost-serverless-example\"\n", "role = sagemaker.get_execution_role()\n", "print(role)\n", "\n", "default_bucket = sagemaker_session.default_bucket()\n", "s3_prefix = base_job_prefix\n", "\n", "instance_type = \"ml.m5.xlarge\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deployment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Creation\n", "Create a model by providing your model artifacts, the container image URI, environment variables for the container (if applicable), a model name, and the SageMaker IAM role." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_s3_key = f\"{s3_prefix}/model.tar.gz\"\n", "model_url = f\"s3://{default_bucket}/{model_s3_key}\"\n", "print(f\"Uploading Model to {model_url}\")\n", "\n", "with open(\"model/model.tar.gz\", \"rb\") as model_file:\n", " boto_session.resource(\"s3\").Bucket(default_bucket).Object(model_s3_key).upload_fileobj(model_file)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from time import gmtime, strftime\n", "\n", "model_name = \"xgboost-serverless\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "print(\"Model name: \" + model_name)\n", "\n", "# environment variables\n", "byo_container_env_vars = {\"SAGEMAKER_CONTAINER_LOG_LEVEL\": \"20\"}\n", "\n", "# retrieve xgboost image\n", "image_uri = sagemaker.image_uris.retrieve(\n", " framework=\"xgboost\",\n", " region=region,\n", " version=\"1.0-1\",\n", " py_version=\"py3\",\n", " instance_type=instance_type,\n", ")\n", "\n", "create_model_response = 
client.create_model(\n", " ModelName=model_name,\n", " Containers=[\n", " {\n", " \"Image\": image_uri,\n", " \"Mode\": \"SingleModel\",\n", " \"ModelDataUrl\": model_url,\n", " \"Environment\": byo_container_env_vars,\n", " }\n", " ],\n", " ExecutionRoleArn=role,\n", ")\n", "\n", "print(\"Model Arn: \" + create_model_response[\"ModelArn\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Endpoint Configuration Creation\n", "\n", "This is where you adjust the serverless configuration for your endpoint. The maximum number of concurrent invocations for a single endpoint, `MaxConcurrency`, can be any value from 1 to 200, and `MemorySizeInMB` can be any of the following: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xgboost_epc_name = \"xgboost-serverless-epc\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "\n", "endpoint_config_response = client.create_endpoint_config(\n", " EndpointConfigName=xgboost_epc_name,\n", " ProductionVariants=[\n", " {\n", " \"VariantName\": \"byoVariant\",\n", " \"ModelName\": model_name,\n", " \"ServerlessConfig\": {\n", " # The memory size of your serverless endpoint. 
Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.\n", " \"MemorySizeInMB\": 4096,\n", " # The maximum number of concurrent invocations your serverless endpoint can process (1-200)\n", " \"MaxConcurrency\": 1,\n", " },\n", " },\n", " ],\n", ")\n", "\n", "# The commented version below shows how you would create an endpoint config for a real-time endpoint\n", "# You would provide `InitialInstanceCount` and `InstanceType` instead of the ServerlessConfig parameters\n", "\n", "'''\n", "\n", "initial_instance_count = 1\n", "instance_type = \"ml.t2.medium\"\n", "\n", "endpoint_config_response = client.create_endpoint_config(\n", " EndpointConfigName=xgboost_epc_name,\n", " ProductionVariants=[\n", " {\n", " \"VariantName\": \"byoVariant\",\n", " \"ModelName\": model_name,\n", " \"InitialInstanceCount\": initial_instance_count,\n", " \"InstanceType\": instance_type,\n", " }\n", " ],\n", ")\n", "\n", "'''\n", "\n", "print(\"Endpoint Configuration Arn: \" + endpoint_config_response[\"EndpointConfigArn\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Serverless Endpoint Creation\n", "Now that we have an endpoint configuration, we can create a serverless endpoint and deploy our model to it. When creating the endpoint, provide the name of your endpoint configuration and a name for the new endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_name = \"xgboost-serverless-ep\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "\n", "create_endpoint_response = client.create_endpoint(\n", " EndpointName=endpoint_name,\n", " EndpointConfigName=xgboost_epc_name,\n", ")\n", "\n", "print(\"Endpoint Arn: \" + create_endpoint_response[\"EndpointArn\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait until the endpoint status is InService before invoking the endpoint." 
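] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an alternative to manually polling `DescribeEndpoint`, boto3 ships a built-in `endpoint_in_service` waiter that encapsulates the same polling. A minimal sketch (the `wait_for_endpoint` helper and its `WaiterConfig` values are illustrative, not part of the SageMaker SDK); the live call is commented out because it requires a real endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Wait for the endpoint via boto3's built-in waiter instead of a manual loop\n", "def wait_for_endpoint(sm_client, name, delay=15, max_attempts=40):\n", "    \"\"\"Block until the endpoint is InService; raises WaiterError on failure or timeout.\"\"\"\n", "    waiter = sm_client.get_waiter(\"endpoint_in_service\")\n", "    waiter.wait(EndpointName=name, WaiterConfig={\"Delay\": delay, \"MaxAttempts\": max_attempts})\n", "\n", "\n", "# With live credentials:\n", "# wait_for_endpoint(client, endpoint_name)"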
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# poll DescribeEndpoint until the endpoint leaves the Creating state (InService on success)\n", "import time\n", "\n", "describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n", "\n", "while describe_endpoint_response[\"EndpointStatus\"] == \"Creating\":\n", " describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n", " print(describe_endpoint_response[\"EndpointStatus\"])\n", " time.sleep(15)\n", "\n", "describe_endpoint_response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Endpoint Invocation\n", "Invoke the endpoint by sending it a request. The following is a sample data point taken from the CSV file of the public Abalone dataset, sent with the `text/csv` content type the model expects." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = runtime.invoke_endpoint(\n", " EndpointName=endpoint_name,\n", " Body=b\".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0\",\n", " ContentType=\"text/csv\",\n", ")\n", "\n", "print(response[\"Body\"].read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up\n", "Delete any resources you created in this notebook that you no longer wish to use." 
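] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Deleting in dependency order (endpoint, then endpoint config, then model) avoids leaving a live endpoint that references a deleted config. The next cell is a sketch of an ordered cleanup helper (`delete_serverless_resources` is illustrative, not part of the SageMaker SDK); the live call is commented out:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete the endpoint, endpoint config, and model in order, skipping any\n", "# resource that is already gone instead of aborting the whole cleanup.\n", "def delete_serverless_resources(sm_client, endpoint, endpoint_config, model):\n", "    steps = [\n", "        (\"endpoint\", lambda: sm_client.delete_endpoint(EndpointName=endpoint)),\n", "        (\"endpoint config\", lambda: sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config)),\n", "        (\"model\", lambda: sm_client.delete_model(ModelName=model)),\n", "    ]\n", "    for label, call in steps:\n", "        try:\n", "            call()\n", "            print(f\"Deleted {label}\")\n", "        except Exception as error:  # e.g. a ClientError if already deleted\n", "            print(f\"Skipping {label}: {error}\")\n", "\n", "\n", "# With live credentials:\n", "# delete_serverless_resources(client, endpoint_name, xgboost_epc_name, model_name)"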
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete resources in dependency order: endpoint, then endpoint config, then model\n", "# client.delete_endpoint(EndpointName=endpoint_name)\n", "# client.delete_endpoint_config(EndpointConfigName=xgboost_epc_name)\n", "# client.delete_model(ModelName=model_name)" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }