{ "cells": [ { "cell_type": "markdown", "id": "fa00e4c6", "metadata": {}, "source": [ "# MLOps: Initial - Manual deployment & validation \n", "\n", "
\n", "\t⚠️ PRE-REQUISITE: Before proceeding with this notebook, please ensure that you have executed the 1-data-prep-feature-store.ipynb and 2-training-registry.ipynb Notebooks\n", "
" ] }, { "cell_type": "markdown", "id": "14a19cd8", "metadata": {}, "source": [ "## Contents\n", "\n", "- [Introduction](#Introduction)\n", "- [SageMaker Endpoint](#SageMaker-Endpoint)" ] }, { "cell_type": "markdown", "id": "b9bfc677", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "id": "f0e1b87b", "metadata": {}, "source": [ "This is our third notebook which will explore the model deployment of ML workflow.\n", "\n", "Here, we will put on the hat of a `Data Scientist` or `ML Engineer` and will perform the task of model deployment which includes fetching the right model and deploying it for inference. This is often done during model development to validate consuming applications or inference code as well as for ad-hoc inference. In the early stages of MLOps adoptions, this is often a manual task. However, for models deployed that are serving live business critical production traffic - typically the deployment is handled through Infrastructure-as-Code(IaC)/Configuration-as-Code(CaC) outside the experimentation environment. This allows for robust deployment patterns that support advanced deployment strategies (ex. A/B, Shadow, Blue/Green) as well as automatic rollback capabilities.\n", "\n", "For this task we will be using Amazon SageMaker Model Hosting capabilities to manually deploy and perform inference. In this case, we will utilize a real-time endpoint; however, SageMaker has other deployment options to meet the needs of your use cases. Please refer to the provided [detailed guidance](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html) when choosing the right option for your use case. We'll also pull features for inference from the online SageMaker Feature Store. Keep in mind, this is typically done through the consuming application and inference pipelines but we are showing it here to illustrate the manual steps often performed during validation stages of development. \n", "\n", "![Notebook3](./images/Notebook-3.png)\n", "\n", "Let's get started!" ] }, { "cell_type": "markdown", "id": "6980e54b-e544-4b4f-a230-d1d7c4f2fa52", "metadata": {}, "source": [ "**Imports**" ] }, { "cell_type": "code", "execution_count": null, "id": "504c6d5a", "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install -U sagemaker" ] }, { "cell_type": "code", "execution_count": null, "id": "da514842", "metadata": { "tags": [] }, "outputs": [], "source": [ "%store -r" ] }, { "cell_type": "code", "execution_count": null, "id": "1ebce8cc", "metadata": { "tags": [] }, "outputs": [], "source": [ "from urllib.parse import urlparse\n", "import io\n", "import time\n", "from sagemaker.model import ModelPackage\n", "import boto3\n", "import sagemaker\n", "from sagemaker.predictor import Predictor\n", "from sagemaker.serializers import CSVSerializer\n", "from sagemaker.deserializers import JSONDeserializer, CSVDeserializer\n", "import numpy as np\n", "import pathlib\n", "from sagemaker.feature_store.feature_group import FeatureGroup\n", "from helper_library import *" ] }, { "cell_type": "markdown", "id": "8759a53f", "metadata": {}, "source": [ "**Session variables**" ] }, { "cell_type": "code", "execution_count": null, "id": "d951f780", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Useful SageMaker variables\n", "sagemaker_session = sagemaker.Session()\n", "bucket = sagemaker_session.default_bucket()\n", "role_arn= sagemaker.get_execution_role()\n", "region = sagemaker_session.boto_region_name\n", "s3_client = boto3.client('s3', region_name=region)\n", "sagemaker_client = boto3.client('sagemaker')" ] }, { "cell_type": "markdown", "id": "5774fc61", "metadata": {}, "source": [ "## SageMaker Endpoint" ] }, { "cell_type": "markdown", "id": "e7786ab5", "metadata": {}, "source": [ "You can deploy your trained model as [SageMaker hosted endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-deployment.html) which serves real-time predictions from a trained model. The endpoint will retrieve the model created during training and deploy it within a SageMaker scikit-learn container. This all can be accomplished with one line of code. Note that it will take several minutes to deploy the model to a hosted endpoint." ] }, { "cell_type": "markdown", "id": "cfdb209e", "metadata": {}, "source": [ "Let's get the model we registered in the Model Registry." ] }, { "cell_type": "code", "execution_count": null, "id": "79882a19", "metadata": { "tags": [] }, "outputs": [], "source": [ "xgb_regressor_model = ModelPackage(\n", " role_arn,\n", " model_package_arn=model_package_arn,\n", " name=model_name\n", ")" ] }, { "cell_type": "markdown", "id": "81b0e02f", "metadata": {}, "source": [ "It's current status is `PendingApproval`. In order to use this model for offline predictions or as a real-time endpoint, we should adopt the practice of setting the status to `Approved` prior to deployment. You can optionally register your model in `Approved` status at the time of model registration; however, this option allows for as secondary peer review approval workflow. " ] }, { "cell_type": "code", "execution_count": null, "id": "0378ac9e", "metadata": { "tags": [] }, "outputs": [], "source": [ "sagemaker_client.update_model_package(\n", " ModelPackageArn=xgb_regressor_model.model_package_arn,\n", " ModelApprovalStatus='Approved'\n", ")" ] }, { "cell_type": "markdown", "id": "018ecb31", "metadata": {}, "source": [ "Now we can deploy it!" ] }, { "cell_type": "code", "execution_count": null, "id": "7267a878", "metadata": { "tags": [] }, "outputs": [], "source": [ "from IPython.core.display import display, HTML\n", "endpoint_name = f'{model_name}-endpoint-' + time.strftime('%Y-%m-%d-%H-%M-%S')\n", "display(\n", " HTML(\n", " 'Review The Endpoint After About 5 Minutes'.format(\n", " region, endpoint_name\n", " )\n", " )\n", ")\n", "xgb_regressor_model.deploy(\n", " initial_instance_count=1,\n", " instance_type='ml.m5.xlarge', # 'ml.t2.medium',\n", " endpoint_name=endpoint_name\n", ")" ] }, { "cell_type": "markdown", "id": "171d75d9", "metadata": {}, "source": [ "Let's test this real-time endpoint by passing it some data and getting a real-time prediction back." ] }, { "cell_type": "markdown", "id": "b666e361", "metadata": {}, "source": [ "## Use the test set to test the preditions" ] }, { "cell_type": "code", "execution_count": null, "id": "5bab5a4c-2d6a-4609-bc10-c6cca13d3270", "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd \n", "import os\n", "import time\n", "from glob import glob\n", "# Copy the link of one of the csv files under ./data/processed/test/\n", "path = './data/processed/test/'\n", "csv_files = glob(os.path.join(path, \"*.csv\"))\n", "df = pd.read_csv(csv_files[0])\n", "df.head(5)" ] }, { "cell_type": "code", "execution_count": null, "id": "7436ebc7", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Attach to the SageMaker endpoint\n", "predictor = Predictor(\n", " endpoint_name=endpoint_name,\n", " sagemaker_session=sagemaker_session,\n", " serializer=CSVSerializer(),\n", " deserializer=CSVDeserializer()\n", ")\n", "\n", "dropped_df = df.drop(columns=[\"PRICE\"])\n", "\n", "# Get a real-time prediction (only predicting the 1st 5 row to reduce output size)\n", "predictor.predict(dropped_df[:5].to_csv(index=False, header=False))" ] }, { "cell_type": "markdown", "id": "2352f567-aac9-4804-8b24-68777aa0a92d", "metadata": {}, "source": [ "Predict using the raw number. Remember to apply the scaling to the input variable before prediction" ] }, { "cell_type": "code", "execution_count": null, "id": "5a4eb155", "metadata": { "tags": [] }, "outputs": [], "source": [ "predictor.predict('-1.2786684709120923,-0.2238534355557023,-1.4253170501458523,-0.0020177318669674,-0.6703058406786521,1.330874688860325,-0.9766068758234135,1.0010005005003753')" ] }, { "cell_type": "code", "execution_count": null, "id": "2d67e425-3833-4046-b4c6-8593d9d8d841", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 21, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 28, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 29, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }