{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SageMaker Real-Time Inference\n", "## XGBoost Regression Example\n", "\n", "Amazon SageMaker Real-Time Inference is instance-based model hosting; use it for latency-sensitive, high-throughput workloads.\n", "\n", "In this notebook we deploy a pre-trained SageMaker XGBoost model to a real-time endpoint, using the public Abalone regression dataset for the example request." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents\n", "- Setup\n", "- Deployment\n", "    - Model Creation\n", "    - Endpoint Configuration (Prod Variants + Instance Setup)\n", "    - Real-Time Endpoint Creation\n", "    - Endpoint Invocation\n", "- Cleanup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "To run this notebook, your notebook execution role must have the SageMaker Full Access policy attached." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Start by installing/upgrading the SageMaker Python SDK, botocore, boto3, and the AWS CLI\n", "! pip install sagemaker botocore boto3 awscli --upgrade" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set up clients\n", "import boto3\n", "\n", "client = boto3.client(service_name=\"sagemaker\")\n", "runtime = boto3.client(service_name=\"sagemaker-runtime\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SageMaker Setup\n", "To begin, we import the AWS SDK for Python (Boto3) and set up our environment, including an IAM role and an S3 bucket to store our data."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "\n", "boto_session = boto3.session.Session()\n", "region = boto_session.region_name\n", "print(region)\n", "\n", "sagemaker_session = sagemaker.Session()\n", "base_job_prefix = \"xgboost-example\"\n", "role = sagemaker.get_execution_role()\n", "print(role)\n", "\n", "default_bucket = sagemaker_session.default_bucket()\n", "s3_prefix = base_job_prefix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deployment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Creation\n", "Create a model by providing your model artifacts, the container image URI, environment variables for the container (if applicable), a model name, and the SageMaker IAM role." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_s3_key = f\"{s3_prefix}/model.tar.gz\"\n", "model_url = f\"s3://{default_bucket}/{model_s3_key}\"\n", "print(f\"Uploading Model to {model_url}\")\n", "\n", "with open(\"model/model.tar.gz\", \"rb\") as model_file:\n", "    boto_session.resource(\"s3\").Bucket(default_bucket).Object(model_s3_key).upload_fileobj(model_file)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from time import gmtime, strftime\n", "\n", "model_name = \"xgboost-realtime\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "print(\"Model name: \" + model_name)\n", "\n", "# Environment variables for the container\n", "byo_container_env_vars = {\"SAGEMAKER_CONTAINER_LOG_LEVEL\": \"20\"}\n", "\n", "inference_instance_type = \"ml.m5.xlarge\"\n", "\n", "# Retrieve the XGBoost container image URI\n", "image_uri = sagemaker.image_uris.retrieve(\n", "    framework=\"xgboost\",\n", "    region=region,\n", "    version=\"1.0-1\",\n", "    py_version=\"py3\",\n", "    instance_type=inference_instance_type,\n", ")\n", "\n", "create_model_response = 
client.create_model(\n", "    ModelName=model_name,\n", "    Containers=[\n", "        {\n", "            \"Image\": image_uri,\n", "            \"Mode\": \"SingleModel\",\n", "            \"ModelDataUrl\": model_url,\n", "            \"Environment\": byo_container_env_vars,\n", "        }\n", "    ],\n", "    ExecutionRoleArn=role,\n", ")\n", "\n", "print(\"Model Arn: \" + create_model_response[\"ModelArn\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Endpoint Configuration Creation\n", "\n", "This is where you can adjust the instance count and instance type for your endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xgboost_epc_name = \"xgboost-real-time-epc\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "\n", "endpoint_config_response = client.create_endpoint_config(\n", "    EndpointConfigName=xgboost_epc_name,\n", "    ProductionVariants=[\n", "        {\n", "            \"VariantName\": \"byoVariant\",\n", "            \"ModelName\": model_name,\n", "            \"InitialInstanceCount\": 1,\n", "            \"InstanceType\": \"ml.m5.xlarge\",\n", "        },\n", "    ],\n", ")\n", "print(\"Endpoint Configuration Arn: \" + endpoint_config_response[\"EndpointConfigArn\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Real-Time Endpoint Creation\n", "Now that we have an endpoint configuration, we can create a real-time endpoint and deploy our model to it. When creating the endpoint, provide the name of your endpoint configuration and a name for the new endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_name = \"xgboost-realtime-ep\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "\n", "create_endpoint_response = client.create_endpoint(\n", "    EndpointName=endpoint_name,\n", "    EndpointConfigName=xgboost_epc_name,\n", ")\n", "\n", "print(\"Endpoint Arn: \" + create_endpoint_response[\"EndpointArn\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait until the endpoint status is InService before invoking the endpoint."
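] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a convenience, boto3 also ships a built-in `endpoint_in_service` waiter that polls `DescribeEndpoint` for you; the next cell shows the equivalent manual polling loop. This is a minimal sketch that assumes the `client` and `endpoint_name` variables defined above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: block until the endpoint reaches InService using the built-in waiter.\n", "# Raises a WaiterError if endpoint creation fails or the wait times out.\n", "waiter = client.get_waiter(\"endpoint_in_service\")\n", "waiter.wait(EndpointName=endpoint_name)"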
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Wait for the endpoint to reach a terminal state (InService) using describe_endpoint\n", "import time\n", "\n", "describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n", "\n", "while describe_endpoint_response[\"EndpointStatus\"] == \"Creating\":\n", "    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n", "    print(describe_endpoint_response[\"EndpointStatus\"])\n", "    time.sleep(15)\n", "\n", "describe_endpoint_response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Endpoint Invocation\n", "Invoke the endpoint by sending it a request. The following is a sample data point taken from the CSV file of the public Abalone dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = runtime.invoke_endpoint(\n", "    EndpointName=endpoint_name,\n", "    Body=b\".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0\",\n", "    ContentType=\"text/csv\",\n", ")\n", "\n", "# Decode the streamed response body to print the prediction as text\n", "print(response[\"Body\"].read().decode(\"utf-8\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up\n", "Delete any resources you created in this notebook that you no longer wish to use."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# client.delete_model(ModelName=model_name)\n", "# client.delete_endpoint_config(EndpointConfigName=xgboost_epc_name)\n", "# client.delete_endpoint(EndpointName=endpoint_name)" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }