{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploying Serverless Endpoints From SageMaker Model Registry"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
"\n",
"\n",
"\n",
"---"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## SageMaker XGBoost Algorithm Regression Example"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for customers to deploy and scale ML models. Serverless Inference is ideal for workloads which have idle periods between traffic spurts and can tolerate cold starts. Serverless endpoints also automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies.\n",
"\n",
"[SageMaker Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html) can be used to catalog and manage different model versions. Model Registry now supports deploying registered models to serverless endpoints. For this notebook we will take the existing [XGBoost Serverless example](https://github.com/aws/amazon-sagemaker-examples/blob/main/serverless-inference/Serverless-Inference-Walkthrough.ipynb) and integrate with the Model Registry. From there we will take our trained model and deploy it to a serverless endpoint using the Boto3 Python SDK. Note that there is no support for Model Registry in the SageMaker SDK with serverless endpoints at the moment.\n",
"\n",
"Notebook Setting\n",
"- SageMaker Studio: Python 3 (Data Science)\n",
"- Regions Available: SageMaker Serverless Inference is currently available in the following regions in preview: US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland), Asia Pacific (Tokyo) and Asia Pacific (Sydney). After general availability it should be available in all commercial regions. To verify availability stay up to date with the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html) which will reflect all supported regions."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents\n",
"- Setup\n",
"- Model Training\n",
"- Model Registry\n",
"- Deployment\n",
" - Model Creation\n",
" - Endpoint Configuration Creation\n",
" - Serverless Endpoint Creation\n",
" - Endpoint Invocation\n",
"- Cleanup"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"For testing you need to properly configure your Notebook Role to have SageMaker Full Access."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install sagemaker botocore boto3 awscli --upgrade"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Setup clients\n",
"import boto3\n",
"\n",
"client = boto3.client(service_name=\"sagemaker\")\n",
"runtime = boto3.client(service_name=\"sagemaker-runtime\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sagemaker\n",
"from sagemaker.estimator import Estimator\n",
"\n",
"boto_session = boto3.session.Session()\n",
"region = boto_session.region_name\n",
"print(region)\n",
"\n",
"sagemaker_session = sagemaker.Session()\n",
"base_job_prefix = \"xgboost-example\"\n",
"role = sagemaker.get_execution_role()\n",
"print(role)\n",
"\n",
"default_bucket = sagemaker_session.default_bucket()\n",
"s3_prefix = base_job_prefix\n",
"\n",
"training_instance_type = \"ml.m5.xlarge\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# retrieve data\n",
"s3 = boto3.client(\"s3\")\n",
"s3.download_file(\n",
" f\"sagemaker-example-files-prod-{region}\",\n",
" \"datasets/tabular/uci_abalone/train_csv/abalone_dataset1_train.csv\",\n",
" \"abalone_dataset1_train.csv\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# upload data to S3\n",
"!aws s3 cp abalone_dataset1_train.csv s3://{default_bucket}/xgboost-regression/train.csv"
]
},
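{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer not to shell out to the AWS CLI, the same upload can be done with the boto3 S3 client; the cell below is a minimal equivalent sketch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Alternative to the CLI command above: upload the training data with boto3\n",
"s3 = boto3.client(\"s3\")\n",
"s3.upload_file(\n",
"    \"abalone_dataset1_train.csv\", default_bucket, \"xgboost-regression/train.csv\"\n",
")"
]
},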
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Training\n",
"\n",
"Now, we train an ML model using the [SageMaker XGBoost Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html). In this example, we use a SageMaker-provided XGBoost container image and configure an estimator to train our model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sagemaker.inputs import TrainingInput\n",
"\n",
"training_path = f\"s3://{default_bucket}/xgboost-regression/train.csv\"\n",
"train_input = TrainingInput(training_path, content_type=\"text/csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_path = f\"s3://{default_bucket}/{s3_prefix}/xgb_model\"\n",
"\n",
"# retrieve xgboost image\n",
"image_uri = sagemaker.image_uris.retrieve(\n",
" framework=\"xgboost\",\n",
" region=region,\n",
" version=\"1.0-1\",\n",
" py_version=\"py3\",\n",
" instance_type=training_instance_type,\n",
")\n",
"\n",
"# Configure Training Estimator\n",
"xgb_train = Estimator(\n",
" image_uri=image_uri,\n",
" instance_type=training_instance_type,\n",
" instance_count=1,\n",
" output_path=model_path,\n",
" sagemaker_session=sagemaker_session,\n",
" role=role,\n",
")\n",
"\n",
"# Set Hyperparameters\n",
"xgb_train.set_hyperparameters(\n",
" objective=\"reg:linear\",\n",
" num_round=50,\n",
" max_depth=5,\n",
" eta=0.2,\n",
" gamma=4,\n",
" min_child_weight=6,\n",
" subsample=0.7,\n",
" silent=0,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fit model\n",
"xgb_train.fit({\"train\": train_input})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve model data from training job\n",
"model_artifacts = xgb_train.model_data\n",
"model_artifacts"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Registry"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a Model Package Group: https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-model-group.html\n",
"import time\n",
"from time import gmtime, strftime\n",
"\n",
"model_package_group_name = \"xgboost-abalone\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
"model_package_group_input_dict = {\n",
" \"ModelPackageGroupName\": model_package_group_name,\n",
" \"ModelPackageGroupDescription\": \"Model package group for xgboost regression model with Abalone dataset\",\n",
"}\n",
"\n",
"create_model_pacakge_group_response = client.create_model_package_group(\n",
" **model_package_group_input_dict\n",
")\n",
"print(\n",
" \"ModelPackageGroup Arn : {}\".format(create_model_pacakge_group_response[\"ModelPackageGroupArn\"])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_package_group_arn = create_model_pacakge_group_response[\"ModelPackageGroupArn\"]\n",
"modelpackage_inference_specification = {\n",
" \"InferenceSpecification\": {\n",
" \"Containers\": [\n",
" {\n",
" \"Image\": image_uri,\n",
" }\n",
" ],\n",
" \"SupportedContentTypes\": [\"text/csv\"],\n",
" \"SupportedResponseMIMETypes\": [\"text/csv\"],\n",
" }\n",
"}\n",
"\n",
"# Specify the model source\n",
"model_url = model_artifacts\n",
"\n",
"# Specify the model data\n",
"modelpackage_inference_specification[\"InferenceSpecification\"][\"Containers\"][0][\n",
" \"ModelDataUrl\"\n",
"] = model_url\n",
"\n",
"create_model_package_input_dict = {\n",
" \"ModelPackageGroupName\": model_package_group_arn,\n",
" \"ModelPackageDescription\": \"Model for regression with the Abalone dataset\",\n",
" \"ModelApprovalStatus\": \"PendingManualApproval\",\n",
"}\n",
"create_model_package_input_dict.update(modelpackage_inference_specification)\n",
"\n",
"# Create cross-account model package\n",
"create_mode_package_response = client.create_model_package(**create_model_package_input_dict)\n",
"model_package_arn = create_mode_package_response[\"ModelPackageArn\"]\n",
"print(\"ModelPackage Version ARN : {}\".format(model_package_arn))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client.list_model_packages(ModelPackageGroupName=model_package_group_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_package_arn = client.list_model_packages(ModelPackageGroupName=model_package_group_name)[\n",
" \"ModelPackageSummaryList\"\n",
"][0][\"ModelPackageArn\"]\n",
"model_package_arn"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client.describe_model_package(ModelPackageName=model_package_arn)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Approve the model package\n",
"model_package_update_input_dict = {\n",
" \"ModelPackageArn\": model_package_arn,\n",
" \"ModelApprovalStatus\": \"Approved\",\n",
"}\n",
"model_package_update_response = client.update_model_package(**model_package_update_input_dict)\n",
"print(model_package_update_response)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deployment"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model Creation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_name = \"xgboost-serverless-model\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
"print(\"Model name : {}\".format(model_name))\n",
"container_list = [{\"ModelPackageName\": model_package_arn}]\n",
"\n",
"create_model_response = client.create_model(\n",
" ModelName=model_name, ExecutionRoleArn=role, Containers=container_list\n",
")\n",
"print(\"Model arn : {}\".format(create_model_response[\"ModelArn\"]))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Endpoint Configuration Creation\n",
"This is where you can adjust the [Serverless Configuration](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints-create.html) for your endpoint. The current max concurrent invocations for a single endpoint, known as `MaxConcurrency`, can be any value from 1 to 200, and `MemorySize` can be any of the following: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"endpoint_config_name = \"xgboost-serverless-epc\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
"print(endpoint_config_name)\n",
"create_endpoint_config_response = client.create_endpoint_config(\n",
" EndpointConfigName=endpoint_config_name,\n",
" ProductionVariants=[\n",
" {\n",
" \"ServerlessConfig\": {\"MemorySizeInMB\": 1024, \"MaxConcurrency\": 10},\n",
" \"ModelName\": model_name,\n",
" \"VariantName\": \"AllTraffic\",\n",
" }\n",
" ],\n",
")\n",
"print(\"Endpoint Configuration Arn: \" + create_endpoint_config_response[\"EndpointConfigArn\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Endpoint Creation\n",
"Now that we have an endpoint configuration, we can create a serverless endpoint and deploy our model to it. When creating the endpoint, provide the name of your endpoint configuration and a name for the new endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"endpoint_name = \"xgboost-serverless-ep\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
"print(\"EndpointName={}\".format(endpoint_name))\n",
"\n",
"create_endpoint_response = client.create_endpoint(\n",
" EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n",
")\n",
"print(create_endpoint_response[\"EndpointArn\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait until the endpoint status is `InService` before invoking the endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n",
"\n",
"while describe_endpoint_response[\"EndpointStatus\"] == \"Creating\":\n",
" describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n",
" print(describe_endpoint_response[\"EndpointStatus\"])\n",
" time.sleep(15)\n",
"\n",
"describe_endpoint_response"
]
},
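{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, the boto3 SageMaker client provides a built-in `endpoint_in_service` waiter that handles the polling for you. The cell below is an optional sketch that accomplishes the same thing as the loop above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: use the built-in boto3 waiter instead of the manual polling loop above\n",
"waiter = client.get_waiter(\"endpoint_in_service\")\n",
"waiter.wait(EndpointName=endpoint_name)\n",
"print(client.describe_endpoint(EndpointName=endpoint_name)[\"EndpointStatus\"])"
]
},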
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Invocation\n",
"Invoke the endpoint by sending a request to it. The following is a sample data point grabbed from the CSV file downloaded from the public Abalone dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = runtime.invoke_endpoint(\n",
" EndpointName=endpoint_name,\n",
" Body=b\".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0\",\n",
" ContentType=\"text/csv\",\n",
")\n",
"\n",
"print(response[\"Body\"].read())"
]
},
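{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The response body is returned as CSV-formatted bytes. The cell below re-invokes the endpoint (the streaming body above has already been consumed) and parses the result into a float; it is a minimal sketch that assumes the model returns a single numeric prediction."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Re-invoke the endpoint and parse the CSV response into a numeric prediction\n",
"response = runtime.invoke_endpoint(\n",
"    EndpointName=endpoint_name,\n",
"    Body=b\".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0\",\n",
"    ContentType=\"text/csv\",\n",
")\n",
"prediction = float(response[\"Body\"].read().decode(\"utf-8\").strip())\n",
"print(\"Predicted value: {}\".format(prediction))"
]
},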
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleanup\n",
"Delete any resources you created in this notebook that you no longer wish to use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client.delete_model(ModelName=model_name)\n",
"client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n",
"client.delete_endpoint(EndpointName=endpoint_name)"
]
},
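{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can also remove the model package version and model package group that this notebook registered. The cell below is a minimal sketch; skip it if you want to keep the model in the registry."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: delete the registered model package and its model package group\n",
"client.delete_model_package(ModelPackageName=model_package_arn)\n",
"client.delete_model_package_group(ModelPackageGroupName=model_package_group_name)"
]
},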
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notebook CI Test Results\n",
"\n",
"This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (Data Science 3.0)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}