{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SageMaker Pipelines Customer Churn Prediction\n", "\n", "Amazon SageMaker Model Building Pipelines offers machine learning (ML) application developers and operations engineers the ability to orchestrate SageMaker jobs and author reproducible ML pipelines. It also enables them to deploy custom-build models for inference in real-time with low latency, run offline inferences with Batch Transform, and track lineage of artifacts. They can institute sound operational practices in deploying and monitoring production workflows, deploying model artifacts, and tracking artifact lineage through a simple interface, adhering to safety and best practice paradigms for ML application development.\n", "\n", "The SageMaker Pipelines service supports a SageMaker Pipeline domain specific language (DSL), which is a declarative JSON specification. This DSL defines a directed acyclic graph (DAG) of pipeline parameters and SageMaker job steps. The SageMaker Python Software Developer Kit (SDK) streamlines the generation of the pipeline DSL using constructs that engineers and scientists are already familiar with." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SageMaker Pipelines\n", "\n", "SageMaker Pipelines supports the following activities, which are demonstrated in this notebook:\n", "\n", "* Pipelines - A DAG of steps and conditions to orchestrate SageMaker jobs and resource creation.\n", "* Processing job steps - A simplified, managed experience on SageMaker to run data processing workloads, such as feature engineering, data validation, model evaluation, and model interpretation.\n", "* Training job steps - An iterative process that teaches a model to make predictions by presenting examples from a training dataset.\n", "* Conditional execution steps - A step that provides conditional execution of branches in a pipeline.\n", "* Register model steps - A step that creates a model package resource in the Model Registry that can be used to create deployable models in Amazon SageMaker.\n", "* Create model steps - A step that creates a model for use in transform steps or later publication as an endpoint.\n", "* Parametrized Pipeline executions - Enables variation in pipeline executions according to specified parameters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook Overview\n", "\n", "This notebook shows how to:\n", "\n", "* Define a set of Pipeline parameters that can be used to parametrize a SageMaker Pipeline.\n", "* Define a Processing step that extracts data from feature store to create the train, validation and test data sets.\n", "* Define a Training step that trains a model on the preprocessed train data set.\n", "* Define a Processing step that evaluates the trained model's performance on the test dataset.\n", "* Define a Create Model step that creates a model from the model artifacts used in training.\n", "* Define a Conditional step that measures a condition based on output from prior steps and conditionally executes other steps.\n", "* Define a Register Model step that creates a model package from the estimator and model artifacts used to train the model.\n", "* Define a Processing step that delopy a real-time endpoint from the trained model.\n", "* Define and create a Pipeline definition in a DAG, with the defined parameters and steps.\n", "* Start a Pipeline execution and wait for execution to complete.\n", "* Download the model evaluation report from the S3 bucket for examination.\n", "* Clean up all artifacts created in this sample code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A SageMaker Pipeline\n", "\n", "The pipeline that you create follows a typical machine learning (ML) application pattern of preprocessing, training, evaluation, model creation, model registration, batch transform and endpoint deployment.\n", "\n", "