{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MLOps Demo\n", "This notebook walks you through few of the features of this MLOps SageMaker template.\n", "\n", "For details of the use case, the high level architecture check the [README](README.md)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare the environment" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import logging\n", "\n", "import requests\n", "import sagemaker\n", "from utils.get_datasets import get_and_upload_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "logger = logging.getLogger(name='project')\n", "sagemaker_session = sagemaker.Session()\n", "boto_session = sagemaker_session.boto_session\n", "sagemaker_client = boto_session.client('sagemaker')\n", "region = sagemaker_session.boto_region_name\n", "ssm = boto_session.client('ssm')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "project_name = \"\" # <--- fill here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Upload Demo Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define where the example data file will be stored in S3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "claims_raw_uri = ssm.get_parameter(Name=f\"/sagemaker-{project_name}/{project_name}-claims\")['Parameter']['Value']\n", "customers_raw_uri = ssm.get_parameter(Name=f\"/sagemaker-{project_name}/{project_name}-customers\")['Parameter']['Value']\n", "\n", "logger.info(f\"Claims dataset URI: {claims_raw_uri}\")\n", "logger.info(f\"Customers dataset URI: {claims_raw_uri}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the data from the Amazon SageMaker Example GitHub repository" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "base_url = \"https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/end_to_end/fraud_detection/data/\"\n", "file_list = [\"claims.csv\", \"customers.csv\"]\n", "uri_list = [claims_raw_uri, customers_raw_uri]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the datasets and upload them to the designated URI" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for k,j in zip(file_list, uri_list):\n", " get_and_upload_data(base_url + k, j)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The project creates an EventBrige rule for each Feature Ingestion pipeline. During next scheduled pipelines execution, the data will be transformed and uploaded to the Feature Store.\n", "\n", "They are scheduled to run every 12 hours... if you don't wait to wait that long, we can trigger the pipeline run manually with the code below. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_pipeline_name = f'{project_name}-customers-preprocessing'\n", "claims_pipeline_name = f'{project_name}-claims-preprocessing'\n", "\n", "customers_pipeline_execution = sagemaker_client.start_pipeline_execution(\n", " PipelineName=customers_pipeline_name,\n", " PipelineExecutionDisplayName=\"ManualExecution\",\n", " PipelineParameters=[\n", " {\"Name\": \"InputDataUrl\", \"Value\": customers_raw_uri},\n", " ],\n", ")\n", "\n", "claims_pipeline_execution = sagemaker_client.start_pipeline_execution(\n", " PipelineName=claims_pipeline_name,\n", " PipelineExecutionDisplayName=\"ManualExecution\",\n", " PipelineParameters=[\n", " {\"Name\": \"InputDataUrl\", \"Value\": claims_raw_uri},\n", " ],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "⬅️ You can observe the progress of the pipeline by double-cliking on the pipeline name in the `Pipelines` panel on the left hand side.\n", "\n", "Once the pipelines executions are completed, We can now confirm data is in the Feature Store" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "featurestore_runtime = boto_session.client(\n", " service_name=\"sagemaker-featurestore-runtime\", region_name=region\n", ")\n", "claims_fg_name=f\"{project_name}-claims\"\n", "customers_fg_name=f\"{project_name}-customers\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "featurestore_runtime.get_record(\n", " FeatureGroupName=claims_fg_name,\n", " RecordIdentifierValueAsString=f\"{9}\",\n", " )['Record']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "featurestore_runtime.get_record(\n", " FeatureGroupName=customers_fg_name,\n", " RecordIdentifierValueAsString=f\"{9}\",\n", " )['Record']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model Building\n", "With data in the feature store, you can now start the model building pipeline. You can leave the default parameter values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xgboost_pipeline_name = f\"{project_name}-build-xgboost\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_client.start_pipeline_execution(\n", " PipelineName=xgboost_pipeline_name,\n", " PipelineExecutionDisplayName=\"ManualExecution\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "⬅️ You can observe the progress of the pipeline by double-cliking on the pipeline name in the `Pipelines` panel on the left hand side.\n", "\n", "Once the model building pipeline execution is completed, you can check the model training metrics from the model registry.\n", "\n", "⬅️ You can access the `Model registry` from the panel on the left." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model deployment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The template deploys an EventBridge rule that triggers the execution of a CodePipeline when a new version of the model is approved.\n", "\n", "⬅️ You can approve the model from the `Model registry` panel...\n", "\n", " ⬇️ or run the cell below to approve the latest unapproved model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_package_group_name=f\"{project_name}-fraud-classification-xgboost\"\n", "\n", "model_package_arn = sagemaker_client.list_model_packages(\n", " ModelPackageGroupName=model_package_group_name,\n", " ModelApprovalStatus='PendingManualApproval',\n", " SortBy='CreationTime',\n", " SortOrder='Descending'\n", ")['ModelPackageSummaryList'][0]['ModelPackageArn']\n", "sagemaker_client.update_model_package(\n", " ModelPackageArn=model_package_arn,\n", " ModelApprovalStatus='Approved')\n", "\n", "logger.info(f\"{model_package_arn} Approved\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing the Real Time Endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", " live_endpoint = ssm.get_parameter(Name=f\"/sagemaker-{project_name}/{project_name}-xgboost\")['Parameter']['Value']\n", "except ssm.exceptions.ParameterNotFound:\n", " logger.exception(\"Possibly the Real Time endpoint has not been deployed yet\", exc_info=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%timeit requests.get(live_endpoint, params=dict(policy_id=1)).json()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preds = [\n", " requests.get(live_endpoint, params=dict(policy_id=k)).json()\n", " for k\n", " in range(1, 6)\n", "]\n", "\n", "preds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing Batch Inference\n", "\n", "To load the predictions into DynamoDB, you can trigger the batch inference pipeline, either via the Studio UI or with the code below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_pipeline_name = f'{project_name}-batch-transform'\n", "try:\n", " batch_pipeline_execution = sagemaker_client.start_pipeline_execution(\n", " PipelineName=batch_pipeline_name,\n", " PipelineExecutionDisplayName=\"ManualExecution\",\n", " )\n", "except sagemaker_client.exceptions.ResourceNotFound:\n", " logger.exception(\"Possibly the Batch transform stack has not been deployed yet\", exc_info=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the pipeline execution is completed, we can access the cached predictions via the REST endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", " ddb_serving = ssm.get_parameter(Name=f\"/sagemaker-{project_name}/{project_name}-batch-transform\")['Parameter']['Value']\n", "except ssm.exceptions.ParameterNotFound:\n", " logger.exception(\"The serving stack might have not been deployed yet\", exc_info=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%timeit requests.get(ddb_serving, params=dict(policy_id=3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preds_cached = [\n", " requests.get(ddb_serving, params=dict(policy_id=k)).json()\n", " for k\n", " in range(1, 6)\n", "]\n", "preds_cached" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleanup\n", "\n", "Uncomment the code in the following cells and run it to clear the resources created by the SageMaker Project." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Removing all model version and model\n", "packages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# [\n", "# sagemaker_client.delete_model_package(ModelPackageName=k['ModelPackageArn'])\n", "# for k\n", "# in sagemaker_client.list_model_packages(\n", "# ModelPackageGroupName=model_package_group_name,\n", "# )['ModelPackageSummaryList']\n", "# ]\n", "# sagemaker_client.delete_model_package_group(ModelPackageGroupName=model_package_group_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might want to remove the three CloudFormation Stacks created by the project\n", "\n", "[CloudFormation Console](https://console.aws.amazon.com/cloudformation/home)\n", "\n", "Once the stacks have finished deleting, it is possible to delete the SageMaker Project. This will also trigger the deletion of the CloudFormation Stack." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sagemaker_client.delete_project(ProjectName=project_name)" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:ap-southeast-1:492261229750:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }