{ "cells": [ { "cell_type": "markdown", "id": "471ccdb1", "metadata": {}, "source": [ "# Manage AutoML Workflows with AWS Step Functions and AutoGluon on Amazon SageMaker\n", "\n", "This notebook is a tutorial on running ML experiments with AWS Step Functions.\n", "The state machine executes different workloads depending on its runtime input parameters.\n", "\n", "This notebook covers a subset of the most common use cases:\n", "\n", "1) [Train and evaluate an ML model](#train-evaluate)\n", "\n", "2) [Run batch predictions with a pre-trained AutoGluon model](#pretrained-batch)\n", "\n", "3) [Train and deploy a model to a SageMaker Endpoint](#train-endpoint)\n", "\n", "NB:\n", "- Please select `conda_python3` as the notebook kernel.\n", "- Please consider using `Jupyter` over `Jupyter Lab` to avoid potential visualization issues with the `stepfunctions` library." ] }, { "cell_type": "markdown", "id": "83b839de", "metadata": {}, "source": [ "## Configure Environment\n", "\n", "Let's start by installing the AWS Step Functions Python SDK." ] }, { "cell_type": "code", "execution_count": 40, "id": "d41b42d3", "metadata": {}, "outputs": [], "source": [ "!pip install -q stepfunctions==2.2.0" ] }, { "cell_type": "markdown", "id": "15b1d814", "metadata": {}, "source": [ "Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "6a61c75e", "metadata": {}, "outputs": [], "source": [ "from stepfunctions.workflow import Workflow\n", "from stepfunctions.inputs import ExecutionInput\n", "import json\n", "from time import gmtime, strftime, sleep\n", "from IPython.display import display, clear_output\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "import sagemaker\n", "import boto3\n", "import os\n", "\n", "INPUT_PARAMS_DIR = \"./input/\"\n", "DATA_DIR = \"./data/\"\n", "PREFIX = \"automl-data\"\n", "\n", "session = sagemaker.Session()\n", "sagemaker_bucket = session.default_bucket()" ] }, { "cell_type": "markdown", "id": "46c1230f", 
"metadata": {}, "source": [ "### Download sample data\n", "\n", "__NB: Replace data s3 paths and jump to next section if you would like to use a custom dataset__\n", "\n", "Let's download data" ] }, { "cell_type": "code", "execution_count": 158, "id": "0ac07246", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "download: s3://sagemaker-sample-files/datasets/tabular/synthetic/churn.txt to data/churn.csv\n" ] } ], "source": [ "!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/synthetic/churn.txt {DATA_DIR}/churn.csv" ] }, { "cell_type": "code", "execution_count": 55, "id": "db5d48cf", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(f\"{DATA_DIR}/churn.csv\")" ] }, { "cell_type": "code", "execution_count": 36, "id": "d690df1b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Account Length', 'Area Code']" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[feature for feature in df.columns if 'A' in feature]" ] }, { "cell_type": "markdown", "id": "ef4bfb69", "metadata": {}, "source": [ "Holdout split in train/test" ] }, { "cell_type": "code", "execution_count": 56, "id": "65b4c30c", "metadata": {}, "outputs": [], "source": [ "train, test = train_test_split(df, test_size=.2)" ] }, { "cell_type": "markdown", "id": "0c3e30fa", "metadata": {}, "source": [ "Save file locally before upload" ] }, { "cell_type": "code", "execution_count": 57, "id": "039f47b5", "metadata": {}, "outputs": [], "source": [ "train.to_csv(f\"{DATA_DIR}/train.csv\", index=False)\n", "test.to_csv(f\"{DATA_DIR}/test.csv\", index=False)\n", "test.drop('Churn?', axis=1).to_csv(f\"{DATA_DIR}/test_batch.csv\", index=False, header=False)" ] }, { "cell_type": "markdown", "id": "c0fc2051", "metadata": {}, "source": [ "Upload files to S3 for training" ] }, { "cell_type": "code", "execution_count": 58, "id": "4458ce16", "metadata": {}, "outputs": [], "source": [ 
"boto3.Session().resource('s3').Bucket(sagemaker_bucket).Object(os.path.join(PREFIX, 'train.csv')).upload_file(f\"{DATA_DIR}/train.csv\")\n", "boto3.Session().resource('s3').Bucket(sagemaker_bucket).Object(os.path.join(PREFIX, 'test.csv')).upload_file(f\"{DATA_DIR}/test.csv\")\n", "boto3.Session().resource('s3').Bucket(sagemaker_bucket).Object(os.path.join(PREFIX, 'test_batch.csv')).upload_file(f\"{DATA_DIR}/test_batch.csv\")\n", "\n", "train_uri = f\"s3://{sagemaker_bucket}/{PREFIX}/train.csv\"\n", "test_uri = f\"s3://{sagemaker_bucket}/{PREFIX}/test.csv\"\n", "test_batch_uri = f\"s3://{sagemaker_bucket}/{PREFIX}/test_batch.csv\"\n", "model_output_prefix = f\"s3://{sagemaker_bucket}/{PREFIX}/output/\"" ] }, { "cell_type": "markdown", "id": "cf55c719", "metadata": {}, "source": [ "Define resource ARNs\n", "\n", "__TODO: find a way to retrieve ARNs automatically (maybe with Parameter Store)__" ] }, { "cell_type": "code", "execution_count": 179, "id": "00e2bc4a", "metadata": {}, "outputs": [], "source": [ "main_machine_arn = \"arn:aws:states:eu-west-1:039573824519:stateMachine:MainStateMachineD8FB90C3-GcOHBmyXA0SP\"\n", "train_machine_arn = \"arn:aws:states:eu-west-1:039573824519:stateMachine:TrainStateMachineAA65CDDB-ovlcReYQjVFQ\"\n", "deploy_machine_arn = \"arn:aws:states:eu-west-1:039573824519:stateMachine:DeployStateMachine357A3963-KbeWPmnhskxz\"" ] }, { "cell_type": "markdown", "id": "62210ae2", "metadata": {}, "source": [ "Attach the SDK to the existing state machines" ] }, { "cell_type": "code", "execution_count": 180, "id": "2f9f9d12", "metadata": {}, "outputs": [], "source": [ "main_workflow = Workflow.attach(main_machine_arn)\n", "train_workflow = Workflow.attach(train_machine_arn)\n", "deploy_workflow = Workflow.attach(deploy_machine_arn)" ] }, { "cell_type": "markdown", "id": "032de21d", "metadata": {}, "source": [ "### Main State Machine\n", "\n", "This state machine is in charge of orchestrating the execution and kick-starting the `Train` and `Deploy` state machines if 
required.\n", "\n", "It includes:\n", "- Training an AutoGluon model (through an external state machine)\n", "- Evaluating the trained model\n", "- Deploying the model to a SageMaker Endpoint or running a SageMaker Batch Transform job (through an external state machine)" ] }, { "cell_type": "code", "execution_count": 4, "id": "f211c6ac", "metadata": {}, "outputs": [ { "data": { "text/html": [ "Workflow: arn:aws:states:eu-west-1:039573824519:stateMachine:MainStateMachineD8FB90C3-GcOHBmyXA0SP" ], "text/plain": [ "Workflow(name='MainStateMachineD8FB90C3-GcOHBmyXA0SP', role='arn:aws:iam::039573824519:role/CdkAgSfStack-RoleMainStateMachine7BC2742D-WIE2TQE2CHAT', state_machine_arn='arn:aws:states:eu-west-1:039573824519:stateMachine:MainStateMachineD8FB90C3-GcOHBmyXA0SP')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "main_workflow" ] }, { "cell_type": "code", "execution_count": 5, "id": "d140816a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "