{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Orchestrating Jobs, Model Registration, and Continuous Deployment with Amazon SageMaker in a secure environment\n",
"\n",
"Amazon SageMaker offers Machine Learning application developers and Machine Learning operations engineers the ability to orchestrate SageMaker jobs and author reproducible Machine Learning pipelines, deploy custom-built models for real-time, low-latency inference or for offline inference with Batch Transform, and track the lineage of artifacts. Through a simple interface, you can institute sound operational practices for deploying and monitoring production workflows and model artifacts, and track artifact lineage, adhering to safety and best-practice paradigms for Machine Learning application development.\n",
"\n",
"The SageMaker Pipelines service supports a SageMaker Machine Learning Pipeline Domain Specific Language (DSL), which is a declarative JSON specification. This DSL defines a Directed Acyclic Graph (DAG) of pipeline parameters and SageMaker job steps. The SageMaker Python Software Development Kit (SDK) streamlines the generation of the pipeline DSL using constructs that are already familiar to engineers and scientists alike.\n",
"\n",
"The SageMaker Model Registry is where trained models are stored, versioned, and managed. Data Scientists and Machine Learning Engineers can compare model versions, approve models for deployment, and deploy models from different AWS accounts, all from a single Model Registry. SageMaker enables customers to follow MLOps best practices and get started right: customers can stand up a full end-to-end MLOps system with a single API call."
]
},
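{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of the pipeline DSL, a minimal pipeline definition looks roughly like the sketch below. This is a hand-written, abbreviated example (the step names, the parameter, and the empty `Arguments` objects are ours for illustration), not a definition generated by the SDK:\n",
"\n",
"```json\n",
"{\n",
"  \"Version\": \"2020-12-01\",\n",
"  \"Parameters\": [\n",
"    {\"Name\": \"TrainingInstanceType\", \"Type\": \"String\", \"DefaultValue\": \"ml.m5.xlarge\"}\n",
"  ],\n",
"  \"Steps\": [\n",
"    {\"Name\": \"PreprocessData\", \"Type\": \"Processing\", \"Arguments\": {}},\n",
"    {\"Name\": \"TrainModel\", \"Type\": \"Training\", \"Arguments\": {}}\n",
"  ]\n",
"}\n",
"```"
]
},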
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SageMaker Pipelines\n",
"\n",
"Amazon SageMaker Pipelines support the following activities:\n",
"\n",
"* Pipelines - A Directed Acyclic Graph of steps and conditions to orchestrate SageMaker jobs and resource creation.\n",
"* Processing Job steps - A simplified, managed experience on SageMaker to run data processing workloads, such as feature engineering, data validation, model evaluation, and model interpretation.\n",
"* Training Job steps - An iterative process that teaches a model to make predictions by presenting examples from a training dataset.\n",
"* Conditional step execution - Provides conditional execution of branches in a pipeline.\n",
"* Registering Models - Creates a model package resource in the Model Registry that can be used to create deployable models in Amazon SageMaker.\n",
"* Creating Model steps - Create a model for use in transform steps or later publication as an endpoint.\n",
"* Parametrized Pipeline executions - Allows pipeline executions to vary by supplied parameters.\n",
"* Transform Job steps - A batch transform to preprocess datasets to remove noise or bias that interferes with training or inference from your dataset, get inferences from large datasets, and run inference when you don't need a persistent endpoint."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running SageMaker workloads in a secure network configuration\n",
"\n",
"This data science environment is deployed in a secure network configuration with a private VPC, private subnets, and security groups.\n",
"\n",
"To ensure and enforce the use of this secure environment, we implement preventive security controls using an IAM policy attached to the SageMaker execution role.\n",
"\n",
"The following IAM policy enforces the use of VPC isolation (with private subnets) for SageMaker notebook instances, processing, training, and tuning jobs, as well as for models:\n",
"\n",
"```json\n",
"{\n",
" \"Action\": [\n",
" \"sagemaker:CreateNotebookInstance\",\n",
" \"sagemaker:CreateHyperParameterTuningJob\",\n",
" \"sagemaker:CreateProcessingJob\",\n",
" \"sagemaker:CreateTrainingJob\",\n",
" \"sagemaker:CreateModel\"\n",
" ],\n",
" \"Resource\": \"*\",\n",
" \"Effect\": \"Deny\",\n",
" \"Condition\": {\n",
" \"Null\": {\n",
" \"sagemaker:VpcSubnets\": \"true\"\n",
" }\n",
" }\n",
"}\n",
"```\n",
"\n",
"Any SageMaker workload running under the SageMaker execution role is therefore forced to use a VPC configuration for these tasks. If you try to create any of these jobs without specifying a network configuration, an `AccessDeniedException` is raised.\n",
"The following sections show step-by-step how to configure the secure execution of SageMaker workloads.\n",
"\n"
]
},
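{
"cell_type": "markdown",
"metadata": {},
"source": [
"With this Deny policy in place, every create request must carry a VPC configuration. The following is a minimal sketch of supplying a `VpcConfig` in a `create_training_job` request; the subnet and security group IDs are placeholders, not values from this environment:\n",
"\n",
"```python\n",
"# Placeholder IDs; use the private subnets and security group of your VPC.\n",
"vpc_config = {\n",
"    \"Subnets\": [\"subnet-0123456789abcdef0\"],\n",
"    \"SecurityGroupIds\": [\"sg-0123456789abcdef0\"],\n",
"}\n",
"\n",
"def add_vpc_config(request, vpc_config):\n",
"    \"\"\"Return a copy of a create_training_job request with VpcConfig set,\n",
"    so the Deny condition on sagemaker:VpcSubnets no longer matches.\"\"\"\n",
"    return {**request, \"VpcConfig\": vpc_config}\n",
"\n",
"request = add_vpc_config({\"TrainingJobName\": \"example-job\"}, vpc_config)\n",
"```"
]
},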
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Implementing preventive security controls using IAM condition keys\n",
"\n",
"[SageMaker-specific IAM condition keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/list_amazonsagemaker.html) can be used to improve security by preventing resources from being created without security controls. For example:\n",
"\n",
"```json\n",
"\"Condition\": {\n",
" \"StringEquals\": {\n",
" \"sagemaker:RootAccess\": \"Disabled\"\n",
" }\n",
"}\n",
"```\n",
"\n",
"Security-specific examples of the condition keys:\n",
"- `sagemaker:DirectInternetAccess`\n",
"- `sagemaker:InterContainerTrafficEncryption`\n",
"- `sagemaker:NetworkIsolation`\n",
"- `sagemaker:RootAccess`\n",
"- `sagemaker:VpcSecurityGroupIds`: should be set to a pre-created security group configured with the necessary controls\n",
"- `sagemaker:VpcSubnets`\n",
"- `sagemaker:VolumeKmsKey`\n",
"- `sagemaker:OutputKmsKey`\n",
"\n",
"### Example: Enforce usage of network isolation mode\n",
"To enforce a secure resource configuration, you can add the following policy to the SageMaker execution role:\n",
"```json\n",
"{\n",
" \"Version\": \"2012-10-17\",\n",
" \"Statement\": [\n",
" {\n",
" \"Effect\": \"Deny\",\n",
" \"Action\": [\n",
" \"sagemaker:Create*\"\n",
" ],\n",
" \"Resource\": \"*\",\n",
" \"Condition\": {\n",
" \"StringNotEqualsIfExists\": {\n",
" \"sagemaker:NetworkIsolation\": \"true\"\n",
" }\n",
" }\n",
" }\n",
" ]\n",
"}\n",
"```\n",
"\n",
"The policy denies creation of any component (processing or training job, endpoint, transform job) if the `sagemaker:NetworkIsolation` parameter is not set to `true`. It applies only to the components that have this parameter.\n",
"Similarly, you can add validation of any other SageMaker service-specific condition keys."
]
},
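{
"cell_type": "markdown",
"metadata": {},
"source": [
"Such enforcement policies can also be generated programmatically. Below is a minimal sketch (the helper name is ours, not part of the template) that builds a Deny policy for an arbitrary SageMaker condition key; the resulting document could then be attached to the execution role, for example with `iam.put_role_policy`:\n",
"\n",
"```python\n",
"import json\n",
"\n",
"def make_enforcement_policy(condition_key, expected_value):\n",
"    \"\"\"Build a policy denying sagemaker:Create* calls unless\n",
"    condition_key equals expected_value.\"\"\"\n",
"    return json.dumps({\n",
"        \"Version\": \"2012-10-17\",\n",
"        \"Statement\": [{\n",
"            \"Effect\": \"Deny\",\n",
"            \"Action\": [\"sagemaker:Create*\"],\n",
"            \"Resource\": \"*\",\n",
"            \"Condition\": {\n",
"                \"StringNotEqualsIfExists\": {condition_key: expected_value}\n",
"            },\n",
"        }],\n",
"    }, indent=2)\n",
"\n",
"print(make_enforcement_policy(\"sagemaker:NetworkIsolation\", \"true\"))\n",
"```"
]
},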
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Layout of the SageMaker ModelBuild Project Template\n",
"\n",
"The template provides a starting point for bringing your SageMaker Pipeline development to production.\n",
"\n",
"```\n",
"|-- buildspec.yml\n",
"|-- CONTRIBUTING.md\n",
"|-- dataset\n",
"| `-- abalone-dataset.csv\n",
"|-- pipelines\n",
"| |-- abalone\n",
"| | |-- evaluate.py\n",
"| | |-- __init__.py\n",
"| | |-- pipeline.py\n",
"| | `-- preprocess.py\n",
"| |-- get_pipeline_definition.py\n",
"| |-- __init__.py\n",
"| |-- run_pipeline.py\n",
"| |-- _utils.py\n",
"| `-- __version__.py\n",
"|-- README.md\n",
"|-- sagemaker-pipeline.ipynb\n",
"|-- sagemaker-pipelines-project.ipynb\n",
"|-- setup.cfg\n",
"|-- setup.py\n",
"|-- tests\n",
"| `-- test_pipelines.py\n",
"`-- tox.ini\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A description of some of the artifacts is provided below:\n",
"\n",
"Your CodeBuild execution instructions:\n",
"```\n",
"|-- buildspec.yml\n",
"```\n",
"\n",
"Your pipeline artifacts, which includes a pipeline module defining the required `get_pipeline` method that returns an instance of a SageMaker pipeline, a preprocessing script that is used in feature engineering, and a model evaluation script to measure the Mean Squared Error of the model that's trained by the pipeline:\n",
"\n",
"```\n",
"|-- pipelines\n",
"| |-- abalone\n",
"| | |-- evaluate.py\n",
"| | |-- __init__.py\n",
"| | |-- pipeline.py\n",
"| | `-- preprocess.py\n",
"\n",
"```\n",
"\n",
"Utility modules for getting pipeline definition JSON and running pipelines:\n",
"\n",
"```\n",
"|-- pipelines\n",
"| |-- get_pipeline_definition.py\n",
"| |-- __init__.py\n",
"| |-- run_pipeline.py\n",
"| |-- _utils.py\n",
"| `-- __version__.py\n",
"```\n",
"\n",
"Python package artifacts:\n",
"```\n",
"|-- setup.cfg\n",
"|-- setup.py\n",
"```\n",
"\n",
"A stubbed testing module for testing your pipeline as you develop:\n",
"```\n",
"|-- tests\n",
"| `-- test_pipelines.py\n",
"```\n",
"\n",
"The `tox` testing framework configuration:\n",
"```\n",
"`-- tox.ini\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A SageMaker Pipeline\n",
"\n",
"The pipeline that we create follows a typical Machine Learning application pattern of preprocessing, training, evaluation, and conditional model registration and publication when the quality of the model is sufficient.\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get sagemaker package"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Disabled by default; change to True to pin the SageMaker SDK version:\n",
"if False:\n",
"    !pip install --disable-pip-version-check -q sagemaker==2.47.1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Disabled by default; change to True to upgrade to the latest SageMaker SDK:\n",
"if False:\n",
"    !pip install -U sagemaker"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python --version"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import sagemaker\n",
"import sagemaker.session\n",
"import json\n",
"\n",
"print(f\"SageMaker version: {sagemaker.__version__}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting some constants\n",
"\n",
"We get some constants from the local execution environment.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sm = boto3.client(\"sagemaker\")\n",
"ssm = boto3.client(\"ssm\")\n",
"\n",
"def get_environment(project_name, ssm_params):\n",
"    \"\"\"Return the environment settings for a SageMaker project.\n",
"\n",
"    Combines the Studio domain settings, the EnvironmentName and\n",
"    EnvironmentType domain tags, and the requested SSM parameters\n",
"    into a single dict.\n",
"    \"\"\"\n",
"    # Describe the Studio domain in which the project was created\n",
"    r = sm.describe_domain(\n",
"        DomainId=sm.describe_project(\n",
"            ProjectName=project_name\n",
"        )[\"CreatedBy\"][\"DomainId\"]\n",
"    )\n",
"    del r[\"ResponseMetadata\"]\n",
"    del r[\"CreationTime\"]\n",
"    del r[\"LastModifiedTime\"]\n",
"    r = {**r, **r[\"DefaultUserSettings\"]}\n",
"    del r[\"DefaultUserSettings\"]\n",
"\n",
"    # Merge in the EnvironmentName and EnvironmentType tags of the domain\n",
"    i = {\n",
"        **r,\n",
"        **{t[\"Key\"]: t[\"Value\"]\n",
"           for t in sm.list_tags(ResourceArn=r[\"DomainArn\"])[\"Tags\"]\n",
"           if t[\"Key\"] in [\"EnvironmentName\", \"EnvironmentType\"]}\n",
"    }\n",
"\n",
"    # Resolve the requested SSM parameters; fall back to \"\" if a parameter is missing\n",
"    for p in ssm_params:\n",
"        try:\n",
"            i[p[\"VariableName\"]] = ssm.get_parameter(\n",
"                Name=f\"{i['EnvironmentName']}-{i['EnvironmentType']}-{p['ParameterName']}\"\n",
"            )[\"Parameter\"][\"Value\"]\n",
"        except ssm.exceptions.ParameterNotFound:\n",
"            i[p[\"VariableName\"]] = \"\"\n",
"\n",
"    return i"
]
},
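{
"cell_type": "markdown",
"metadata": {},
"source": [
"`get_environment` resolves each SSM parameter under the name `<EnvironmentName>-<EnvironmentType>-<ParameterName>`. A small sketch of that naming convention follows; the example values are hypothetical:\n",
"\n",
"```python\n",
"def ssm_parameter_name(env_name, env_type, param_name):\n",
"    \"\"\"Compose an SSM parameter name the way get_environment does.\"\"\"\n",
"    return f\"{env_name}-{env_type}-{param_name}\"\n",
"\n",
"# Hypothetical environment values, for illustration only:\n",
"print(ssm_parameter_name(\"sm-mlops\", \"dev\", \"kms-s3-key-arn\"))\n",
"# → sm-mlops-dev-kms-s3-key-arn\n",
"```"
]
},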
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Shutting down your kernel for this notebook to release resources."
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3.9.1 64-bit ('python@3.9')",
"name": "python391jvsc74a57bd0ac2eaa0ea0ebeafcc7822e65e46aa9d4f966f30b695406963e145ea4a91cd4fc"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"metadata": {
"interpreter": {
"hash": "ac2eaa0ea0ebeafcc7822e65e46aa9d4f966f30b695406963e145ea4a91cd4fc"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}