{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Use Sagemaker Pipelines To Orchestrate End To End Cross Validation Model Training Workflow\n", "\n", "Amazon SageMaker Pipelines simplifies ML workflows orchestration across each step of the ML process, from exploration data analysis, preprocessing to model training and model deployment. \n", "With Sagemaker Pipelines, you can develop a consistent, reusable workflow that integrates with CI/CD pipeline for improved quality and reduced errors throughout development lifecycle.\n", "\n", "## SageMaker Pipelines\n", "An ML workflow built using Sagemaker Pipeline is made up of a series of Steps defined as a directed acryclic graph (DAG). The pipeline is expressed in JSON definition that captures relationships between the steps of your pipeline. Here's a terminology used in Sagemaker Pipeline for defining an ML workflow.\n", "\n", "* Pipelines - Top level definition of a pipeline. It encapsulates name, parameters, and steps. A pipeline is scoped within an account and region. \n", "* Parameters - Parameters are defined in the pipeline definition. It introduces variables that can be provided to the pipeline at execution time. Parameters support string, float and integer types. \n", "* Pipeline Steps - Defines the actions that the pipeline takes and the relationships between steps using properties. Sagemaker Pipelines support the following step types: Processing, Training, Transform, CreateModel, RegisterModel, Condition, Callback." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook Overview\n", "This notebook implements a complete Cross Validation ML model workflow using a custom built docker image, HyperparameterTuner for automatic hyperparameter optimization, \n", "SKLearn framework for K fold split and model training. The workflow is defined orchestrated using Sagemaker Pipelines. \n", "Here are the main steps involved the end to end workflow:\n", " \n", "