{ "cells": [ { "cell_type": "markdown", "id": "2f919041", "metadata": {}, "source": [ "# Bring your own container\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "2f919041", "metadata": {}, "source": [ "\n", "This notebook shows an example of bring your own container. This example leverages the MultiModelServer to host MME and this example \n", "can be further modified and adapted to fit your needs\n" ] }, { "cell_type": "markdown", "id": "06b411bb", "metadata": {}, "source": [ "# Multi-Model Endpoint - CatBoost\n", "\n", "This example notebook also showcases how to use a custom container to host multiple CatBoost models on a SageMaker Multi Model Endpoint. The model this notebook deploys is taken from this [CatBoost tutorial](https://github.com/catboost/tutorials/blob/master/python_tutorial_with_tasks.ipynb). \n", "\n", "We are using this framework as an example to demonstrate deployment and serving using MultiModel Endpoint and showcase the capability. This notebook can be extended to any framework.\n", "\n", "Catboost is gaining in popularity and is not yet supported as a framework for SageMaker MultiModelEndpoint. Further this example serves to demostrate how to bring your own container to a MultiModelEndpoint\n", "\n", "In this Notebook we will use identical model to simulate multiple models for loading and inference" ] }, { "cell_type": "markdown", "id": "2569b2c3", "metadata": {}, "source": [ "## Prerequisites\n", "### Packages and Permissions\n", "The SageMaker SDK uses the SageMaker default S3 bucket when needed. If the get_execution_role does not return a role with the appropriate permissions, you'll need to specify an IAM role ARN that does. Please make sure the `SageMakerFullAccess` policy is attached to the execution role you are using." ] }, { "cell_type": "markdown", "id": "3fb6b304", "metadata": {}, "source": [ "## Load model and test local inference\n", "Here, install `catboost` to test we can load up the model locally and make inference. \n", "\n", "We load up the model locally using `CatBoostClassifier()`. `test_data.csv` contains a single row of test inference data." ] }, { "cell_type": "code", "execution_count": null, "id": "124fee8b", "metadata": {}, "outputs": [], "source": [ "# Cell 01\n", "\n", "!pip install catboost" ] }, { "cell_type": "code", "execution_count": null, "id": "022aa0b5", "metadata": {}, "outputs": [], "source": [ "# Cell 02\n", "from catboost import CatBoostClassifier, Pool as CatboostPool, cv\n", "import os\n", "import pandas\n", "\n", "model_file = CatBoostClassifier()\n", "model_file = model_file.load_model(\"./models/mme_catboost/catboost_model.bin\")\n", "df = pandas.read_csv(\"./data/mme_catboost/test_data.csv\")\n", "df.head(2)" ] }, { "cell_type": "code", "execution_count": null, "id": "b4a864d5", "metadata": {}, "outputs": [], "source": [ "# Cell 03\n", "import pandas as pd\n", "import io\n", "import json\n", "\n", "out = model_file.predict_proba(df)\n", "print(out)" ] }, { "cell_type": "markdown", "id": "b28981f2", "metadata": {}, "source": [ "## Upload tar ball to s3\n" ] }, { "cell_type": "markdown", "id": "5b96bafb", "metadata": {}, "source": [ "### Create a model tar ball\n", "\n", "SageMaker requires our model to be packaged in a tar.gz file." ] }, { "cell_type": "code", "execution_count": null, "id": "8a4b71aa", "metadata": {}, "outputs": [], "source": [ "# Cell 04\n", "!cd models/mme_catboost && tar -czvf catboost-model.tar.gz catboost_model.bin\n", "!ls models/mme_catboost" ] }, { "cell_type": "markdown", "id": "9d52509a", "metadata": {}, "source": [ "### Upload 5 copies of the model to S3\n", "\n", "Multi-Model Endpoints require all our models to be in a specific S3 prefix. Here we upload 100 of them to our default bucket. \n", "\n", "This is a simulation of having different models which we need to use to predict. In reality you would probably have each of these models trained separately" ] }, { "cell_type": "code", "execution_count": null, "id": "3314369e", "metadata": {}, "outputs": [], "source": [ "# Cell 05\n", "import sagemaker\n", "\n", "sess = sagemaker.Session()\n", "s3_bucket = sess.default_bucket() # Replace with your own bucket name if needed\n", "print(s3_bucket)" ] }, { "cell_type": "markdown", "id": "5d52e111", "metadata": {}, "source": [ "### Upload the model tar balls using boto3 with a unique name" ] }, { "cell_type": "code", "execution_count": null, "id": "ff6b4674", "metadata": {}, "outputs": [], "source": [ "# Cell 06\n", "import boto3\n", "\n", "s3 = boto3.client(\"s3\")\n", "for i in range(0, 5):\n", " with open(\"models/mme_catboost/catboost-model.tar.gz\", \"rb\") as f:\n", " s3.upload_fileobj(f, s3_bucket, \"catboost/catboost-model-{}.tar.gz\".format(i))\n", "\n", "print(\"Models:uploaded and ready for use\")" ] }, { "cell_type": "markdown", "id": "c6d8dd04", "metadata": {}, "source": [ "### List all models in s3 prefix we will use for our Multi-Model Endpoint" ] }, { "cell_type": "code", "execution_count": null, "id": "2fd62eab", "metadata": {}, "outputs": [], "source": [ "# Cell 07\n", "!aws s3 ls s3://$s3_bucket/catboost/" ] }, { "cell_type": "markdown", "id": "766a5fc8", "metadata": {}, "source": [ "## Building the custom container" ] }, { "cell_type": "markdown", "id": "d69948f3", "metadata": {}, "source": [ "The container folder in this example contains 3 files:\n", "```\n", "\u251c\u2500\u2500 container\n", "\u2502 \u251c\u2500\u2500 dockerd-entrypoint.py\n", "\u2502 \u251c\u2500\u2500 Dockerfile\n", "\u2502 \u2514\u2500\u2500 model_handler.py\n", "```\n", "\n", "- `dockerd-entrypoint.py` is the entry point script that will start the multi model server.\n", "- `Dockerfile` contains the container definition that will be used to assemble the image. This includes the packages that need to be installed.\n", "- `model_handler.py` is the script that will contain the logic to load up the model and make inference.\n", "\n", "Take a look through the files to see if there is any customization that you would like to do.\n", "Below cells highlight the main part of the files. \n" ] }, { "cell_type": "markdown", "id": "2b5d9799", "metadata": {}, "source": [ "### Install catboost in the `Dockerfile`" ] }, { "cell_type": "code", "execution_count": null, "id": "83c3cdd6", "metadata": {}, "outputs": [], "source": [ "# Cell 08\n", "! sed -n '26,30p' container/mme_catboost/Dockerfile" ] }, { "cell_type": "markdown", "id": "329c64f8", "metadata": {}, "source": [ "### Update `initialize` function in `model_handler.py` with logic to load up the model\n", "In this case we are using `CatBoostClassifier()`. Feel free to update the loading logic in this function to your needs." ] }, { "cell_type": "code", "execution_count": null, "id": "7ea3a1b7", "metadata": {}, "outputs": [], "source": [ "# Cell 09\n", "! sed -n '22,40p' ./container/mme_catboost/model_handler.py" ] }, { "cell_type": "markdown", "id": "17f20a73", "metadata": {}, "source": [ "### Update `handle` function in `model_handler.py` with logic to load up the model" ] }, { "cell_type": "code", "execution_count": null, "id": "0fe695ee", "metadata": {}, "outputs": [], "source": [ "# Cell 10\n", "! sed -n '70,85p' ./container/mme_catboost/model_handler.py" ] }, { "cell_type": "markdown", "id": "dff64a5b", "metadata": {}, "source": [ "### Build and Push the custom image to ECR\n", "\n", "**This steps takes atleast 5-6 minutes so please be patient and ignore any \"warnings\" **" ] }, { "cell_type": "code", "execution_count": null, "id": "6c77e8bd", "metadata": {}, "outputs": [], "source": [ "%%sh\n", "# Cell 11\n", "\n", "echo \"Starting Docker Build\"\n", "\n", "# The name of our algorithm\n", "algorithm_name=catboost-sagemaker-multimodel\n", "\n", "cd container/mme_catboost\n", "\n", "account=$(aws sts get-caller-identity --query Account --output text)\n", "\n", "# Get the region defined in the current configuration (default to us-east-1 if none defined)\n", "region=$(aws configure get region)\n", "region=${region:-us-east-1}\n", "\n", "fullname=\"${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest\"\n", "\n", "echo \"fullname:image=${fullname}\"\n", "\n", "# If the repository doesn't exist in ECR, create it.\n", "aws ecr describe-repositories --repository-names \"${algorithm_name}\" > /dev/null 2>&1\n", "\n", "if [ $? -ne 0 ]\n", "then\n", " aws ecr create-repository --repository-name \"${algorithm_name}\" > /dev/null\n", "fi\n", "\n", "# Get the login command from ECR and execute it directly\n", "$(aws ecr get-login --region ${region} --no-include-email)\n", "\n", "# Build the docker image locally with the image name and then push it to ECR\n", "# with the full name.\n", "\n", "echo \"Starting the Docker Build with ${algorithm_name}\"\n", "docker build -q -t ${algorithm_name} .\n", "docker tag ${algorithm_name} ${fullname}\n", "\n", "echo \"Pushing Docker image ${fullname} to ECR \"\n", "docker push ${fullname}" ] }, { "cell_type": "markdown", "id": "ab808776", "metadata": {}, "source": [ "### Deploy Multi Model Endpoint" ] }, { "cell_type": "code", "execution_count": null, "id": "6f5b0bf1", "metadata": {}, "outputs": [], "source": [ "# Cell 12\n", "from sagemaker import get_execution_role\n", "\n", "sm_client = boto3.client(service_name=\"sagemaker\")\n", "runtime_sm_client = boto3.client(service_name=\"sagemaker-runtime\")\n", "\n", "account_id = boto3.client(\"sts\").get_caller_identity()[\"Account\"]\n", "region = boto3.Session().region_name\n", "\n", "role = get_execution_role()\n", "print(role)" ] }, { "cell_type": "markdown", "id": "b4d11d0f", "metadata": {}, "source": [ "### Create the SageMaker Multi-Model" ] }, { "cell_type": "code", "execution_count": null, "id": "0edada9e", "metadata": {}, "outputs": [], "source": [ "# Cell 13\n", "from time import gmtime, strftime\n", "\n", "model_name = \"catboost-multimodel-\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "model_url = \"s3://{}/catboost/\".format(s3_bucket) ## MODEL S3 URL\n", "container = \"{}.dkr.ecr.{}.amazonaws.com/catboost-sagemaker-multimodel:latest\".format(\n", " account_id, region\n", ")\n", "instance_type = \"ml.m5.xlarge\"\n", "\n", "print(\"Model name: \" + model_name)\n", "print(\"Model data Url: \" + model_url)\n", "print(\"Container image: \" + container)\n", "\n", "container = {\"Image\": container, \"ModelDataUrl\": model_url, \"Mode\": \"MultiModel\"}\n", "\n", "create_model_response = sm_client.create_model(\n", " ModelName=model_name, ExecutionRoleArn=role, Containers=[container]\n", ")\n", "\n", "print(\"Model ARN: \" + create_model_response[\"ModelArn\"])" ] }, { "cell_type": "markdown", "id": "0cb23d9d", "metadata": {}, "source": [ "### Create the SageMaker Endpoint Configuration\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9dc553db", "metadata": {}, "outputs": [], "source": [ "# Cell 14\n", "endpoint_config_name = \"catboost-multimodel-config\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "print(\"Endpoint config name: \" + endpoint_config_name)\n", "\n", "create_endpoint_config_response = sm_client.create_endpoint_config(\n", " EndpointConfigName=endpoint_config_name,\n", " ProductionVariants=[\n", " {\n", " \"InstanceType\": instance_type,\n", " \"InitialInstanceCount\": 1,\n", " \"InitialVariantWeight\": 1,\n", " \"ModelName\": model_name,\n", " \"VariantName\": \"AllTraffic\",\n", " }\n", " ],\n", ")\n", "\n", "print(\"Endpoint config ARN: \" + create_endpoint_config_response[\"EndpointConfigArn\"])" ] }, { "cell_type": "markdown", "id": "3f4d1699", "metadata": {}, "source": [ "### Create the SageMaker Multi-Model Endpoint\n", "\n", "**This step will take a couple of minutes**" ] }, { "cell_type": "code", "execution_count": null, "id": "68d9248c", "metadata": {}, "outputs": [], "source": [ "%%time\n", "# Cell 15\n", "\n", "import time\n", "\n", "endpoint_name = \"catboost-multimodel-endpoint-\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "print(\"Endpoint name: \" + endpoint_name)\n", "\n", "create_endpoint_response = sm_client.create_endpoint(\n", " EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n", ")\n", "print(\"Endpoint Arn: \" + create_endpoint_response[\"EndpointArn\"])\n", "\n", "resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n", "status = resp[\"EndpointStatus\"]\n", "print(\"Endpoint Status: \" + status)\n", "\n", "print(\"Waiting for {} endpoint to be in service...\".format(endpoint_name))\n", "waiter = sm_client.get_waiter(\"endpoint_in_service\")\n", "waiter.wait(EndpointName=endpoint_name)\n", "\n", "print(\"Created {} endpoint is in Service and read to invoke ...\".format(endpoint_name))" ] }, { "cell_type": "markdown", "id": "4d8a5eec", "metadata": {}, "source": [ "### Invoke each of the 5 models\n", "We have identical models here to simulate multiple models belonging to the same framework" ] }, { "cell_type": "code", "execution_count": null, "id": "50844ad9", "metadata": {}, "outputs": [], "source": [ "# Cell 16\n", "from datetime import datetime\n", "import time\n", "\n", "for i in range(0, 4):\n", " start_time = datetime.now()\n", " response = runtime_sm_client.invoke_endpoint(\n", " EndpointName=endpoint_name,\n", " TargetModel=\"catboost-model-{}.tar.gz\".format(i),\n", " Body=df.to_csv(index=False),\n", " )\n", " time_delta = (datetime.now() - start_time).total_seconds() * 1000\n", " time_delta = \"{:.2f}\".format(time_delta)\n", "\n", " print(f'Time={time_delta} --- > ::{json.loads(response[\"Body\"].read().decode(\"utf-8\"))}')" ] }, { "cell_type": "markdown", "id": "d54ab45c", "metadata": {}, "source": [ "### Invoke just one of models 1000 times \n", "Since the models are in memory and loaded, these invocations should not have any latency \n" ] }, { "cell_type": "code", "execution_count": null, "id": "dbdcab68", "metadata": {}, "outputs": [], "source": [ "# Cell 17\n", "import numpy as np\n", "\n", "print(\"Starting invocation for model::catboost-model-1.tar.gz, please wait ...\")\n", "results = []\n", "for i in range(0, 1000):\n", " start = time.time()\n", " response = runtime_sm_client.invoke_endpoint(\n", " EndpointName=endpoint_name,\n", " TargetModel=\"catboost-model-1.tar.gz\",\n", " Body=df.to_csv(index=False),\n", " )\n", " results.append((time.time() - start) * 1000)\n", "print(\"\\nPredictions for model latency: \\n\")\n", "print(\"\\nP95: \" + str(np.percentile(results, 95)) + \" ms\\n\")\n", "print(\"P90: \" + str(np.percentile(results, 90)) + \" ms\\n\")\n", "print(\"Average: \" + str(np.average(results)) + \" ms\\n\")" ] }, { "cell_type": "markdown", "id": "97bf8de1", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Optional Clean up\n", "Clean up and delete the end point" ] }, { "cell_type": "code", "execution_count": null, "id": "3405794a", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# delete the end point\n", "# Cell 18\n", "\n", "sm_client.delete_endpoint(EndpointName=endpoint_name)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/inference|nlp|realtime|byoc|multi_model_catboost.ipynb)\n" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 5 }