{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# Customizing the Build/Train/Deploy MLOps Project Template\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
"\n",
"\n",
"\n",
"---"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"\n",
"We recently announced [Amazon SageMaker Pipelines](https://aws.amazon.com/sagemaker/pipelines/), the first \n",
"purpose-built, easy-to-use Continuous Integration and Continuous Delivery (CI/CD) service for machine learning. \n",
"SageMaker Pipelines has three main components that improve the operational resilience and reproducibility of your \n",
"workflows: Pipelines, Model Registry, and Projects. \n",
"\n",
"SageMaker Projects introduce MLOps templates that automatically provision the underlying resources needed to enable \n",
"CI/CD capabilities for your Machine Learning Development Lifecycle (MLDC). Customers can use a number of built-in \n",
"templates or create their own custom templates.\n",
"\n",
"This example will focus on using one of the MLOps templates to bootstrap your ML project and establish a CI/CD \n",
"pattern from seed code. We’ll show how to use the built-in Build/Train/Deploy Project template as a base for a \n",
"customer churn classification example. This base template will enable CI/CD for training machine learning models, \n",
"registering model artifacts to the Model Registry, and automating model deployment with manual approval and automated \n",
"testing."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## MLOps Template for Build, Train, and Deploy\n",
"\n",
"We’ll start by taking a detailed look at what AWS services are provisioned when this build, train, and deploy MLOps \n",
"template is launched. Later, we’ll discuss how the skeleton can be modified for a custom use case. \n",
"\n",
"To get started with SageMaker Projects, [they must first be enabled in the SageMaker Studio console](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-studio-updates.html). \n",
"This can be done for existing users or while creating new ones.\n",
"\n",
"Within Amazon SageMaker Studio, you can now select “Projects” from a drop-down menu on the “Components and registries” \n",
"tab.\n",
"\n",
"From the projects page you’ll have the option to launch a pre-configured SageMaker MLOps template. We’ll select the build, train, and deploy template.\n",
"\n",
"NOTE: Launching this template will kick off a model building pipeline by default and will train a regression model. This will incur a small cost.\n",
"\n",
"Once the project is created from the MLOps template, the supporting MLOps architecture will be deployed.\n"
]
},
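{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The template can also be launched programmatically through the `create_project` API. Below is a minimal sketch, assuming you have already looked up the Service Catalog `ProductId` and `ProvisioningArtifactId` for the built-in template; the helper function and the IDs you pass to it are illustrative, not part of the template:\n",
"\n",
"```\n",
"def launch_project(sm, name, product_id, artifact_id):\n",
"    # sm is a SageMaker client, e.g. boto3.client('sagemaker')\n",
"    return sm.create_project(\n",
"        ProjectName=name,\n",
"        ServiceCatalogProvisioningDetails={\n",
"            'ProductId': product_id,\n",
"            'ProvisioningArtifactId': artifact_id,\n",
"        },\n",
"    )['ProjectArn']\n",
"```\n",
"\n",
"Launching through the console is equivalent; the API route is mainly useful when scripting project creation across accounts or regions.\n"
]
},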
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Modifying the Seed Code for Custom Use Case\n",
"\n",
"After your project has been created, the architecture described above will be deployed and a visualization of the \n",
"pipeline will be available in the “Pipelines” drop-down menu within SageMaker Studio.\n",
"\n",
"In order to modify the seed code from this launched template, we’ll first need to clone the AWS CodeCommit \n",
"repositories to our local SageMaker Studio instance. From the list of projects, select the one that was just \n",
"created. Under the “Repositories” tab you can select the hyperlinks to locally clone the AWS CodeCommit repos.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### ModelBuild Repo\n",
"\n",
"The “ModelBuild” repository contains the code for preprocessing, training, and evaluating the model. \n",
"The seed code trains and evaluates a model on the [UCI Abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone). We can modify these files in order to \n",
"solve our own customer churn use case.\n",
"\n",
"We’ll need a dataset accessible to the project. The easiest way to do this is to open a new SageMaker notebook \n",
"inside Studio and run the following cells:"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"```\n",
"import boto3\n",
"\n",
"region = boto3.session.Session().region_name\n",
"\n",
"s3 = boto3.client(\"s3\")\n",
"s3.download_file(f\"sagemaker-example-files-prod-{region}\", \"datasets/tabular/synthetic/churn.txt\", \"churn.txt\")\n",
"```\n",
"\n",
"```\n",
"import os\n",
"\n",
"import boto3\n",
"import sagemaker\n",
"\n",
"prefix = \"sagemaker/DEMO-xgboost-churn\"\n",
"region = boto3.Session().region_name\n",
"default_bucket = sagemaker.session.Session().default_bucket()\n",
"role = sagemaker.get_execution_role()\n",
"\n",
"# Upload the raw data; upload_file returns None, so there is nothing useful to assign.\n",
"boto3.Session().resource(\"s3\").Bucket(default_bucket).Object(\n",
"    os.path.join(prefix, \"data/RawData.csv\")\n",
").upload_file(\"./churn.txt\")\n",
"\n",
"print(os.path.join(\"s3://\", default_bucket, prefix, \"data/RawData.csv\"))\n",
"```"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"We’ll need to rename the `abalone` directory to `customer_churn`. That will require us to replace the `codebuild-buildspec.yml`\n",
"in our Studio project with the one found in [this directory](modelbuild/codebuild-buildspec.yml).\n",
"\n",
"Next, replace `preprocess.py`, `evaluate.py`, and `pipeline.py` with the ones found in [this example directory](modelbuild/pipelines/customer_churn).\n",
"\n",
"**Note: You'll need to replace the `default_value` of \"InputDataURL\" with the URL you obtained when uploading the data above.**\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Trigger a new Pipeline Execution through git commit\n",
"\n",
"By committing these changes to the AWS CodeCommit repository (easily done from the SageMaker Studio source control tab), a \n",
"new pipeline execution will be triggered, since an Amazon EventBridge rule monitors the repository for commits. After a few moments, \n",
"we can monitor the execution by selecting our Pipeline inside of the SageMaker Project.\n",
"\n",
"Once completed, we can go to our “Model groups” tab inside of the SageMaker Project and inspect the metadata attached \n",
"to the model artifacts. If everything looks good, we can manually approve the model.\n",
"\n",
"This approval will trigger the ModelDeploy pipeline and expose an endpoint for real-time inference.\n"
]
},
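{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The manual approval step can also be performed programmatically with the `list_model_packages` and `update_model_package` APIs. Here is a sketch of a helper (ours, not part of the generated project) that approves the newest package in a Model Package Group; it takes a boto3 SageMaker client as an argument:\n",
"\n",
"```\n",
"def approve_latest_model_package(sm, model_package_group_name):\n",
"    # Find the most recently registered package in the group.\n",
"    packages = sm.list_model_packages(\n",
"        ModelPackageGroupName=model_package_group_name,\n",
"        SortBy='CreationTime',\n",
"        SortOrder='Descending',\n",
"        MaxResults=1,\n",
"    )['ModelPackageSummaryList']\n",
"    if not packages:\n",
"        return None\n",
"    arn = packages[0]['ModelPackageArn']\n",
"    # Flipping the status to Approved is what triggers the ModelDeploy pipeline.\n",
"    sm.update_model_package(ModelPackageArn=arn, ModelApprovalStatus='Approved')\n",
"    return arn\n",
"\n",
"# Usage (requires AWS credentials):\n",
"# import boto3\n",
"# approve_latest_model_package(boto3.client('sagemaker'), 'CustomerChurnPackageGroup')\n",
"```\n"
]
},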
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Trigger a new Pipeline Execution through SDK\n",
"\n",
"Alternatively, you can retrieve and execute the existing pipeline through the SageMaker Python SDK. The template created a \n",
"`pipeline.py` file whose `get_pipeline` function you can use to rebuild the pipeline definition and trigger executions from your own notebook.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"```\n",
"# This is the module name or the path to your pipeline.py file.\n",
"from pipelines.customer_churn.pipeline import get_pipeline\n",
"\n",
"model_package_group_name = \"CustomerChurnPackageGroup\"\n",
"# The suffix on the pipeline name is your project ID, so substitute your own project's value.\n",
"pipeline_name = \"CustomerChurnDemo-p-ewf8t7lvhivm\"\n",
"\n",
"# region, role, and default_bucket were defined in the data upload cells above.\n",
"pipeline = get_pipeline(\n",
"    region=region,\n",
"    role=role,\n",
"    default_bucket=default_bucket,\n",
"    model_package_group_name=model_package_group_name,\n",
"    pipeline_name=pipeline_name,\n",
")\n",
"```"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### Submit the pipeline to SageMaker and start execution\n",
"\n",
"Let's submit our pipeline definition to the workflow service. The role passed in will be used by the workflow service to create all the jobs defined in the steps.\n",
"\n",
"```\n",
"pipeline.upsert(role_arn=role)\n",
"execution = pipeline.start()\n",
"\n",
"execution.describe()\n",
"execution.wait()\n",
"```"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### Parametrized Executions\n",
"\n",
"We can run additional executions of the pipeline, specifying different pipeline parameters. The parameters argument is a \n",
"dictionary whose keys are the parameter names and whose values are the primitive values to use as overrides of the defaults.\n",
"\n",
"Of particular note, based on the performance of the model, we may want to kick off another pipeline execution, but this \n",
"time with the model approval status automatically set to \"Approved\". This means\n",
"that the model package version generated by the `RegisterModel` step will automatically be ready for deployment through \n",
"CI/CD pipelines, such as with SageMaker Projects.\n",
"\n",
"```\n",
"# Note: the default ModelApprovalStatus set in pipeline.py is \"PendingManualApproval\"; the execution below overrides it.\n",
"\n",
"execution = pipeline.start(\n",
" parameters=dict(\n",
" ModelApprovalStatus=\"Approved\",\n",
" )\n",
")\n",
"\n",
"\n",
"execution.wait()\n",
"execution.list_steps()\n",
"```"
]
},
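{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"`execution.list_steps()` returns the steps as a list of dictionaries, most recent first. A small helper (ours, not part of the SDK) can condense that output into (name, status) pairs in execution order:\n",
"\n",
"```\n",
"def summarize_steps(steps):\n",
"    # list_steps() reports newest first; reverse to get execution order.\n",
"    return [(s['StepName'], s['StepStatus']) for s in reversed(steps)]\n",
"\n",
"# Example with the shape list_steps() returns:\n",
"sample = [\n",
"    {'StepName': 'RegisterModel', 'StepStatus': 'Succeeded'},\n",
"    {'StepName': 'TrainModel', 'StepStatus': 'Succeeded'},\n",
"]\n",
"print(summarize_steps(sample))  # -> [('TrainModel', 'Succeeded'), ('RegisterModel', 'Succeeded')]\n",
"```\n"
]
},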
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notebook CI Test Results\n",
"\n",
"This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n"
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (Data Science 3.0)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}