{ "cells": [ { "cell_type": "markdown", "id": "471ccdb1", "metadata": {}, "source": [ "# Manage AutoML Workflows with AWS Step Functions and AutoGluon on Amazon SageMaker\n", "\n", "This notebook provides a tutorial on how to run ML experiments using AWS Step Functions.\n", "The state machine is able to execute different workloads based on its runtime input parameters.\n", "\n", "Here we cover a subset of the most common use cases:\n", "\n", "1) [Train and evaluate an ML model](#train-evaluate)\n", "\n", "2) [Run batch predictions with a pre-trained AutoGluon model](#pretrained-batch)\n", "\n", "3) [Train and deploy a model to a SageMaker Endpoint](#train-endpoint)\n", "\n", "NB:\n", "- Please select `conda_python3` as Notebook Kernel.\n", "- Please consider using `Jupyter` over `Jupyter Lab` to avoid potential visualization issues with the `stepfunctions` library." ] }, { "cell_type": "markdown", "id": "83b839de", "metadata": {}, "source": [ "## Configure Environment\n", "\n", "Let's start by installing the AWS Step Functions Python SDK." ] }, { "cell_type": "code", "execution_count": 40, "id": "d41b42d3", "metadata": {}, "outputs": [], "source": [ "!pip install -q stepfunctions==2.2.0" ] }, { "cell_type": "markdown", "id": "15b1d814", "metadata": {}, "source": [ "Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "6a61c75e", "metadata": {}, "outputs": [], "source": [ "from stepfunctions.workflow import Workflow\n", "from stepfunctions.inputs import ExecutionInput\n", "import json\n", "from time import gmtime, strftime, sleep\n", "from IPython.display import display, clear_output\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "import sagemaker\n", "import boto3\n", "import os\n", "\n", "INPUT_PARAMS_DIR = \"./input/\"\n", "DATA_DIR = \"./data/\"\n", "PREFIX = \"automl-data\"\n", "\n", "session = sagemaker.Session()\n", "sagemaker_bucket = session.default_bucket()" ] }, { "cell_type": "markdown", "id": "46c1230f", 
"metadata": {}, "source": [ "### Download sample data\n", "\n", "__NB: Replace data s3 paths and jump to next section if you would like to use a custom dataset__\n", "\n", "Let's download data" ] }, { "cell_type": "code", "execution_count": 158, "id": "0ac07246", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "download: s3://sagemaker-sample-files/datasets/tabular/synthetic/churn.txt to data/churn.csv\n" ] } ], "source": [ "!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/synthetic/churn.txt {DATA_DIR}/churn.csv" ] }, { "cell_type": "code", "execution_count": 55, "id": "db5d48cf", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(f\"{DATA_DIR}/churn.csv\")" ] }, { "cell_type": "code", "execution_count": 36, "id": "d690df1b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Account Length', 'Area Code']" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[feature for feature in df.columns if 'A' in feature]" ] }, { "cell_type": "markdown", "id": "ef4bfb69", "metadata": {}, "source": [ "Holdout split in train/test" ] }, { "cell_type": "code", "execution_count": 56, "id": "65b4c30c", "metadata": {}, "outputs": [], "source": [ "train, test = train_test_split(df, test_size=.2)" ] }, { "cell_type": "markdown", "id": "0c3e30fa", "metadata": {}, "source": [ "Save file locally before upload" ] }, { "cell_type": "code", "execution_count": 57, "id": "039f47b5", "metadata": {}, "outputs": [], "source": [ "train.to_csv(f\"{DATA_DIR}/train.csv\", index=False)\n", "test.to_csv(f\"{DATA_DIR}/test.csv\", index=False)\n", "test.drop('Churn?', axis=1).to_csv(f\"{DATA_DIR}/test_batch.csv\", index=False, header=False)" ] }, { "cell_type": "markdown", "id": "c0fc2051", "metadata": {}, "source": [ "Upload files to S3 for training" ] }, { "cell_type": "code", "execution_count": 58, "id": "4458ce16", "metadata": {}, "outputs": [], "source": [ 
"boto3.Session().resource('s3').Bucket(sagemaker_bucket).Object(os.path.join(PREFIX, 'train.csv')).upload_file(f\"{DATA_DIR}/train.csv\")\n", "boto3.Session().resource('s3').Bucket(sagemaker_bucket).Object(os.path.join(PREFIX, 'test.csv')).upload_file(f\"{DATA_DIR}/test.csv\")\n", "boto3.Session().resource('s3').Bucket(sagemaker_bucket).Object(os.path.join(PREFIX, 'test_batch.csv')).upload_file(f\"{DATA_DIR}/test_batch.csv\")\n", "\n", "train_uri = f\"s3://{sagemaker_bucket}/{PREFIX}/train.csv\"\n", "test_uri = f\"s3://{sagemaker_bucket}/{PREFIX}/test.csv\"\n", "test_batch_uri = f\"s3://{sagemaker_bucket}/{PREFIX}/test_batch.csv\"\n", "model_output_prefix = f\"s3://{sagemaker_bucket}/{PREFIX}/output/\"" ] }, { "cell_type": "markdown", "id": "cf55c719", "metadata": {}, "source": [ "Define resource ARNs \n", "\n", "__TODO find a way to retrieve ARNs automatically (maybe with Parameter Store)__" ] }, { "cell_type": "code", "execution_count": 179, "id": "00e2bc4a", "metadata": {}, "outputs": [], "source": [ "main_machine_arn = \"arn:aws:states:eu-west-1:039573824519:stateMachine:MainStateMachineD8FB90C3-GcOHBmyXA0SP\"\n", "train_machine_arn = \"arn:aws:states:eu-west-1:039573824519:stateMachine:TrainStateMachineAA65CDDB-ovlcReYQjVFQ\"\n", "deploy_machine_arn = \"arn:aws:states:eu-west-1:039573824519:stateMachine:DeployStateMachine357A3963-KbeWPmnhskxz\"" ] }, { "cell_type": "markdown", "id": "62210ae2", "metadata": {}, "source": [ "Attach SDK to state machines" ] }, { "cell_type": "code", "execution_count": 180, "id": "2f9f9d12", "metadata": {}, "outputs": [], "source": [ "main_workflow = Workflow.attach(main_machine_arn)\n", "train_workflow = Workflow.attach(train_machine_arn)\n", "deploy_workflow = Workflow.attach(deploy_machine_arn)" ] }, { "cell_type": "markdown", "id": "032de21d", "metadata": {}, "source": [ "### Main State Machine\n", "\n", "This state machine is in charge of orchestrating the execution and kickstaring both `Train` and `Deploy` state machines if 
required.\n", "\n", "It includes:\n", "- Training an AutoGluon model (through an external state machine)\n", "- Evaluating the trained model\n", "- Deploying the model to a SageMaker Endpoint or running a SageMaker Batch Transform Job (through an external state machine)" ] }, { "cell_type": "code", "execution_count": 4, "id": "f211c6ac", "metadata": {}, "outputs": [ { "data": { "text/html": [ "Workflow: arn:aws:states:eu-west-1:039573824519:stateMachine:MainStateMachineD8FB90C3-GcOHBmyXA0SP" ], "text/plain": [ "Workflow(name='MainStateMachineD8FB90C3-GcOHBmyXA0SP', role='arn:aws:iam::039573824519:role/CdkAgSfStack-RoleMainStateMachine7BC2742D-WIE2TQE2CHAT', state_machine_arn='arn:aws:states:eu-west-1:039573824519:stateMachine:MainStateMachineD8FB90C3-GcOHBmyXA0SP')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "main_workflow" ] }, { "cell_type": "code", "execution_count": 5, "id": "d140816a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "main_workflow.render_graph()" ] }, { "cell_type": "markdown", "id": "9e124848", "metadata": {}, "source": [ "### Train State Machine\n", "\n", "This state machine performs two main tasks:\n", "\n", "1) Starts a new SageMaker Training Job using AutoGluon tabular\n", "\n", "2) Saves the trained model to SageMaker Model Registry" ] }, { "cell_type": "code", "execution_count": 6, "id": "865ef992", "metadata": {}, "outputs": [ { "data": { "text/html": [ "Workflow: arn:aws:states:eu-west-1:039573824519:stateMachine:TrainStateMachineAA65CDDB-ovlcReYQjVFQ" ], "text/plain": [ "Workflow(name='TrainStateMachineAA65CDDB-ovlcReYQjVFQ', role='arn:aws:iam::039573824519:role/CdkAgSfStack-RoleTrainStateMachine78727C93-18ATNL4AB5UC4', state_machine_arn='arn:aws:states:eu-west-1:039573824519:stateMachine:TrainStateMachineAA65CDDB-ovlcReYQjVFQ')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_workflow" ] }, { "cell_type": "code", "execution_count": 7, "id": "135a022b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_workflow.render_graph()" ] }, { "cell_type": "markdown", "id": "d4cccc47", "metadata": {}, "source": [ "### Deploy State Machine\n", "\n", "This state machine performs:\n", "\n", "- Online deployment, by spinning up a new SageMaker Endpoint and waiting for it to become available\n", "\n", "- Offline deployment, by running a SageMaker Batch Transform Job and waiting for its completion" ] }, { "cell_type": "code", "execution_count": 8, "id": "1e761230", "metadata": {}, "outputs": [ { "data": { "text/html": [ "Workflow: arn:aws:states:eu-west-1:039573824519:stateMachine:DeployStateMachine357A3963-KbeWPmnhskxz" ], "text/plain": [ "Workflow(name='DeployStateMachine357A3963-KbeWPmnhskxz', role='arn:aws:iam::039573824519:role/CdkAgSfStack-RoleDeployStateMachine1E6B6BE7-1NZ12H0ZDTH1F', state_machine_arn='arn:aws:states:eu-west-1:039573824519:stateMachine:DeployStateMachine357A3963-KbeWPmnhskxz')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deploy_workflow" ] }, { "cell_type": "code", "execution_count": 9, "id": "56482832", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deploy_workflow.render_graph()" ] }, { "cell_type": "markdown", "id": "abcffe85", "metadata": {}, "source": [ "\n", "## 1) Train and Evaluate ML Model\n", "\n", "Sample execution that consists of __training__ a new AutoGluon model and __evaluating__ its performance on a fresh test set.\n", "\n", "Let's start by loading the input parameters:" ] }, { "cell_type": "code", "execution_count": 37, "id": "e443c827", "metadata": {}, "outputs": [], "source": [ "input_params_path = f\"{INPUT_PARAMS_DIR}train_evaluate_input.json\"\n", "with open(input_params_path, \"r\") as input_file:\n", "    input_params = json.load(input_file)" ] }, { "cell_type": "markdown", "id": "f37d9d8e", "metadata": {}, "source": [ "Replace the data placeholders with the train and test S3 URIs" ] }, { "cell_type": "code", "execution_count": 38, "id": "48f242e5", "metadata": {}, "outputs": [], "source": [ "input_params['Parameters']['Train']['TrainDataPath'] = train_uri\n", "input_params['Parameters']['Train']['TestDataPath'] = test_uri\n", "input_params['Parameters']['Train'][\"TrainingOutput\"] = model_output_prefix" ] }, { "cell_type": "markdown", "id": "2ad1e4c9", "metadata": {}, "source": [ "Set target label and [problem type (options: `binary`, `multiclass`, `regression`, `quantile`)](https://auto.gluon.ai/stable/api/autogluon.task.html#autogluon.tabular.TabularPredictor)" ] }, { "cell_type": "code", "execution_count": 39, "id": "ced0042a", "metadata": {}, "outputs": [], "source": [ "# Double JSON encoding is needed to successfully forward parameters from State Machine to SageMaker Training Job\n", "\n", "ag_init_params = json.loads(json.loads(input_params['Parameters']['Train'][\"InitArgs\"]))\n", "ag_init_params['problem_type'] = 'binary'\n", "ag_init_params['label'] = 'Churn?'\n", "\n", "ag_fit_params = 
json.loads(json.loads(input_params['Parameters']['Train']['FitArgs']))\n", "ag_fit_params['presets'] = 'medium_quality_faster_train'\n", "\n", "input_params['Parameters']['Train'][\"InitArgs\"] = json.dumps(json.dumps(ag_init_params))\n", "# Write the updated fit arguments back as well, otherwise the preset change above is lost\n", "input_params['Parameters']['Train'][\"FitArgs\"] = json.dumps(json.dumps(ag_fit_params))" ] }, { "cell_type": "markdown", "id": "31264d87", "metadata": {}, "source": [ "Let's explore the parameters structure" ] }, { "cell_type": "code", "execution_count": 40, "id": "3e7ca569", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parameters sections: ['Flow', 'Train', 'Evaluation']\n" ] } ], "source": [ "print(f\"Parameters sections: {list(input_params['Parameters'].keys())}\")" ] }, { "cell_type": "markdown", "id": "d6f862f4", "metadata": {}, "source": [ "If we inspect the `Flow` section, we can see which stages of the pipeline have been selected." ] }, { "cell_type": "code", "execution_count": 41, "id": "f9e60aed", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Selected steps: ['Train', 'Evaluate']\n", "Unselected steps: ['Deploy']\n" ] } ], "source": [ "print('Selected steps:', [k for k, v in input_params['Parameters']['Flow'].items() if v])\n", "print('Unselected steps:', [k for k, v in input_params['Parameters']['Flow'].items() if not v])" ] }, { "cell_type": "code", "execution_count": 42, "id": "5a16ff36", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Parameters': {'Flow': {'Train': True, 'Evaluate': True, 'Deploy': False},\n", " 'Train': {'TrainDataPath': 's3://sagemaker-eu-west-1-039573824519/automl-data/train.csv',\n", " 'TestDataPath': 's3://sagemaker-eu-west-1-039573824519/automl-data/test.csv',\n", " 'TrainingOutput': 's3://stepfunctions-mlops/output/training/',\n", " 'InstanceCount': 1,\n", " 'InstanceType': 'ml.m5.2xlarge',\n", " 'FitArgs': '\"{\\\\\"presets\\\\\": \\\\\"medium_quality_faster_train\\\\\"}\"',\n", " 'InitArgs': '\"{\\\\\"label\\\\\": \\\\\"Churn?\\\\\", \\\\\"problem_type\\\\\": \\\\\"binary\\\\\"}\"',\n", " 
'FeatureImportance': 'true',\n", " 'Leaderboard': 'true'},\n", " 'Evaluation': {'Threshold': '0.6', 'Metric': 'accuracy'}}}" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "input_params" ] }, { "cell_type": "markdown", "id": "d6983ab4", "metadata": {}, "source": [ "Let's start the execution and wait for completion" ] }, { "cell_type": "code", "execution_count": 124, "id": "ba00e745", "metadata": {}, "outputs": [], "source": [ "train_execution_name = f\"train-evaluate-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}\"\n", "train_execution = main_workflow.execute(name=train_execution_name, inputs=input_params)" ] }, { "cell_type": "code", "execution_count": 44, "id": "30b7d1b0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", " \n", " \n", "
\n", "\n", " \n", " Inspect in AWS Step Functions \n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Execution terminated, status: SUCCEEDED\n" ] } ], "source": [ "status = train_execution.describe()['status']\n", "while status == 'RUNNING':\n", "    clear_output(wait=True)\n", "    display(train_execution.render_progress())\n", "    sleep(60)\n", "    status = train_execution.describe()['status']\n", "\n", "clear_output(wait=True)\n", "display(train_execution.render_progress())\n", "print(f\"Execution terminated, status: {status}\")" ] }, { "cell_type": "markdown", "id": "b4d605b3", "metadata": {}, "source": [ "Let's explore the execution output.\n", "\n", "Here is a snippet of the Train state machine output:" ] }, { "cell_type": "code", "execution_count": 45, "id": "0983484e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training Job Name: ef208ec4-04fd-45af-bb57-885d618daee1\n", "Model artifacts: s3://stepfunctions-mlops/output/training/ef208ec4-04fd-45af-bb57-885d618daee1/output/model.tar.gz\n" ] } ], "source": [ "training_output = train_execution.get_output()['TrainStepsOutput']['Output']['TrainingOutput']\n", "\n", "print(f\"Training Job Name: {training_output['TrainingJobName']}\")\n", "print(f\"Model artifacts: {training_output['ModelArtifacts']['S3ModelArtifacts']}\")" ] }, { "cell_type": "markdown", "id": "3ba51632", "metadata": {}, "source": [ "And here is the evaluation step output:" ] }, { "cell_type": "code", "execution_count": 46, "id": "a4521d71", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Has model outperformed the threshold? 
True\n", "\n", "Scores\n", "\taccuracy: 0.963\n", "\tbalanced_accuracy: 0.9624517803708555\n", "\tmcc: 0.9261342082184997\n", "\troc_auc: 0.9893606367594807\n", "\tf1: 0.9647954329210275\n", "\tprecision: 0.9530075187969925\n", "\trecall: 0.976878612716763\n" ] } ], "source": [ "evaluation_output = train_execution.get_output()['ModelValidationLambdaOutput']['Payload']\n", "print(f\"Has model outperformed the threshold? {evaluation_output['IsValid']}\")\n", "print(\"\\nScores\")\n", "for metric, value in evaluation_output['Scores'].items():\n", "    print(f\"\\t{metric}: {value}\")" ] }, { "cell_type": "markdown", "id": "ff390f37", "metadata": {}, "source": [ "\n", "## 2) Run batch predictions with pre-trained AutoGluon Model\n", "\n", "Load an already trained model and run batch inference on a new dataset.\n", "\n", "Let's load a new set of input parameters:" ] }, { "cell_type": "code", "execution_count": 47, "id": "dc1eec80", "metadata": {}, "outputs": [], "source": [ "input_params_path = f\"{INPUT_PARAMS_DIR}pretrained_batch_input.json\"\n", "with open(input_params_path, \"r\") as input_file:\n", "    input_params = json.load(input_file)" ] }, { "cell_type": "markdown", "id": "86096ef1", "metadata": {}, "source": [ "Let's explore the parameters section" ] }, { "cell_type": "code", "execution_count": 48, "id": "9ba215b9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parameters sections: ['Flow', 'PretrainedModel', 'Deploy']\n" ] } ], "source": [ "print(f\"Parameters sections: {list(input_params['Parameters'].keys())}\")" ] }, { "cell_type": "markdown", "id": "c6e3fb2a", "metadata": {}, "source": [ "Set `PretrainedModel` to the model trained in the previous step" ] }, { "cell_type": "code", "execution_count": 49, "id": "0f17c7c4", "metadata": {}, "outputs": [], "source": [ "input_params['Parameters']['PretrainedModel']['Name'] = train_execution.get_output()['TrainStepsOutput']['Output']['TrainingOutput']['TrainingJobName']\n", 
"input_params['Parameters']['Deploy']['BatchOutputDataPath'] = f\"s3://{sagemaker_bucket}/batch-out/\"\n", "input_params['Parameters']['Deploy']['BatchInputDataPath'] = test_batch_uri" ] }, { "cell_type": "code", "execution_count": 51, "id": "781308f6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Selected steps: ['Deploy']\n", "Unselected steps: ['Train', 'Evaluate']\n" ] } ], "source": [ "print('Selected steps:', [k for k, v in input_params['Parameters']['Flow'].items() if v])\n", "print('Unselected steps:', [k for k, v in input_params['Parameters']['Flow'].items() if not v])" ] }, { "cell_type": "code", "execution_count": 59, "id": "6fb59418", "metadata": {}, "outputs": [], "source": [ "batch_execution_name = f\"pretrained-batch-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}\"\n", "batch_execution = main_workflow.execute(name=batch_execution_name, inputs=input_params)" ] }, { "cell_type": "code", "execution_count": null, "id": "b262a977", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", " \n", " \n", "
\n", "\n", " \n", " Inspect in AWS Step Functions \n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "status = batch_execution.describe()['status']\n", "while status == 'RUNNING':\n", "    clear_output(wait=True)\n", "    display(batch_execution.render_progress())\n", "    sleep(60)\n", "    status = batch_execution.describe()['status']\n", "\n", "clear_output(wait=True)\n", "display(batch_execution.render_progress())\n", "print(f\"Execution terminated, status: {status}\")" ] }, { "cell_type": "markdown", "id": "ba645120", "metadata": {}, "source": [ "Let's briefly explore the predictions saved on S3" ] }, { "cell_type": "code", "execution_count": 61, "id": "6d5c8c17", "metadata": {}, "outputs": [], "source": [ "batch_output_uri = batch_execution.get_output()['Parameters']['Deploy']['BatchOutputDataPath']\n", "batch_filename = 'batch_out.json'" ] }, { "cell_type": "code", "execution_count": 62, "id": "370fd9d8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "download: s3://sagemaker-eu-west-1-039573824519/batch-out/test_batch.csv.out to ./batch_out.json\r\n" ] } ], "source": [ "!aws s3 cp {batch_output_uri}test_batch.csv.out {batch_filename}" ] }, { "cell_type": "code", "execution_count": 63, "id": "fe3aaf6b", "metadata": {}, "outputs": [], "source": [ "with open(batch_filename, \"r\") as input_file:\n", "    batch_predictions = json.load(input_file)" ] }, { "cell_type": "code", "execution_count": 64, "id": "1556344c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['True.', 0.04348301887512207, 0.9565169811248779],\n", " ['True.', 0.009195685386657715, 0.9908043146133423],\n", " ['True.', 0.0015034675598144531, 0.9984965324401855],\n", " ['False.', 0.9803850054740906, 0.019614998251199722],\n", " ['True.', 0.018985986709594727, 0.9810140132904053],\n", " ['False.', 0.9465275406837463, 0.053472474217414856],\n", " ['True.', 0.06416893005371094, 0.9358310699462891],\n", " ['False.', 0.9860401749610901, 
0.013959839008748531],\n", " ['False.', 0.9652785062789917, 0.03472146764397621],\n", " ['True.', 0.012667417526245117, 0.9873325824737549]]" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "batch_predictions[:10]" ] }, { "cell_type": "markdown", "id": "2d00618a", "metadata": {}, "source": [ "Remove the local copy of the prediction file" ] }, { "cell_type": "code", "execution_count": 65, "id": "d13846e2", "metadata": {}, "outputs": [], "source": [ "!rm {batch_filename}" ] }, { "cell_type": "markdown", "id": "0d417f93", "metadata": {}, "source": [ "\n", "## 3) Train, evaluate and deploy a new AutoGluon model for online prediction\n", "\n", "Let's bring everything together and train, evaluate and deploy the model to a SageMaker Endpoint.\n", "\n", "Let's load a new set of input parameters:" ] }, { "cell_type": "code", "execution_count": 181, "id": "2adacfa6", "metadata": {}, "outputs": [], "source": [ "input_params_path = f\"{INPUT_PARAMS_DIR}train_endpoint_input.json\"\n", "with open(input_params_path, \"r\") as input_file:\n", "    input_params = json.load(input_file)" ] }, { "cell_type": "markdown", "id": "75c4e7e8", "metadata": {}, "source": [ "Let's explore the parameters structure" ] }, { "cell_type": "code", "execution_count": 182, "id": "a7045cb9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parameters sections: ['Flow', 'Train', 'Evaluation', 'Deploy']\n" ] } ], "source": [ "print(f\"Parameters sections: {list(input_params['Parameters'].keys())}\")" ] }, { "cell_type": "markdown", "id": "4788e718", "metadata": {}, "source": [ "Let's now replace placeholders in input parameters" ] }, { "cell_type": "code", "execution_count": 183, "id": "c78c03ee", "metadata": {}, "outputs": [], "source": [ "input_params['Parameters']['Train']['TrainDataPath'] = train_uri\n", "input_params['Parameters']['Train']['TestDataPath'] = test_uri\n", "input_params['Parameters']['Train'][\"TrainingOutput\"] = 
model_output_prefix\n", "\n", "# Double JSON encoding is needed to successfully forward parameters from State Machine to SageMaker Training Job\n", "ag_init_params = json.loads(json.loads(input_params['Parameters']['Train'][\"InitArgs\"]))\n", "ag_init_params['problem_type'] = 'binary'\n", "ag_init_params['label'] = 'Churn?'\n", "\n", "ag_fit_params = json.loads(json.loads(input_params['Parameters']['Train']['FitArgs']))\n", "ag_fit_params['presets'] = 'optimize_for_deployment'\n", "\n", "input_params['Parameters']['Train'][\"InitArgs\"] = json.dumps(json.dumps(ag_init_params))\n", "# Write the updated fit arguments back as well, otherwise the preset change above is lost\n", "input_params['Parameters']['Train'][\"FitArgs\"] = json.dumps(json.dumps(ag_fit_params))" ] }, { "cell_type": "code", "execution_count": 184, "id": "1b3c57a5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Selected steps: ['Train', 'Evaluate', 'Deploy']\n", "Unselected steps: []\n" ] } ], "source": [ "print('Selected steps:', [k for k, v in input_params['Parameters']['Flow'].items() if v])\n", "print('Unselected steps:', [k for k, v in input_params['Parameters']['Flow'].items() if not v])" ] }, { "cell_type": "code", "execution_count": 185, "id": "87730589", "metadata": {}, "outputs": [], "source": [ "endpoint_execution_name = f\"train-endpoint-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}\"\n", "endpoint_execution = main_workflow.execute(name=endpoint_execution_name, inputs=input_params)" ] }, { "cell_type": "code", "execution_count": 186, "id": "695fbf78", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", " \n", " \n", "
\n", "\n", " \n", " Inspect in AWS Step Functions \n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Execution terminated, status: SUCCEEDED\n" ] } ], "source": [ "status = endpoint_execution.describe()['status']\n", "while status == 'RUNNING':\n", "    clear_output(wait=True)\n", "    display(endpoint_execution.render_progress())\n", "    sleep(60)\n", "    status = endpoint_execution.describe()['status']\n", "\n", "clear_output(wait=True)\n", "display(endpoint_execution.render_progress())\n", "print(f\"Execution terminated, status: {status}\")" ] }, { "cell_type": "markdown", "id": "055ee24e", "metadata": {}, "source": [ "Let's test the endpoint" ] }, { "cell_type": "code", "execution_count": 187, "id": "cfd784c4", "metadata": {}, "outputs": [], "source": [ "endpoint_arn = endpoint_execution.get_output()['DeployStepsOutput']['Output']['CreateEndpointOutput']['EndpointArn']\n", "endpoint_name = endpoint_arn.split('/')[-1]" ] }, { "cell_type": "code", "execution_count": 196, "id": "989ea6a8", "metadata": {}, "outputs": [], "source": [ "sm_runtime = boto3.client('runtime.sagemaker')\n", "\n", "# Load the first 10 rows of the test file (it was saved without a header row)\n", "with open(f\"{DATA_DIR}test_batch.csv\", \"r\") as batch_file:\n", "    samples = batch_file.readlines()[:10]\n", "\n", "response = sm_runtime.invoke_endpoint(\n", "    EndpointName=endpoint_name,\n", "    ContentType='text/csv',\n", "    Body=''.join(samples)\n", ")" ] }, { "cell_type": "code", "execution_count": 197, "id": "6398bb39", "metadata": {}, "outputs": [], "source": [ "results = json.loads(response['Body'].read().decode())" ] }, { "cell_type": "markdown", "id": "ec84feeb", "metadata": {}, "source": [ "Here are your live predictions" ] }, { "cell_type": "code", "execution_count": 200, "id": "4491a0ff", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['True.', 0.04431271553039551, 0.9556872844696045],\n", " 
['True.', 0.02011626958847046, 0.9798837304115295],\n", " ['True.', 0.008425593376159668, 0.9915744066238403],\n", " ['False.', 0.9018359184265137, 0.09816406667232513],\n", " ['True.', 0.2541276216506958, 0.7458723783493042],\n", " ['True.', 0.42594635486602783, 0.5740536451339722],\n", " ['True.', 0.4489116668701172, 0.5510883331298828],\n", " ['False.', 0.9520736932754517, 0.047926317900419235],\n", " ['False.', 0.9109795093536377, 0.0890204906463623],\n", " ['True.', 0.05050283670425415, 0.9494971632957458]]" ] }, "execution_count": 200, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results" ] }, { "cell_type": "markdown", "id": "33828a20", "metadata": {}, "source": [ "__Optional__: delete endpoint to avoid additional charges" ] }, { "cell_type": "code", "execution_count": 202, "id": "3919bf2a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'ResponseMetadata': {'RequestId': '60f01fdd-6656-4a2b-a79c-e19c5c109f81',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'x-amzn-requestid': '60f01fdd-6656-4a2b-a79c-e19c5c109f81',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '0',\n", " 'date': 'Wed, 26 Jan 2022 15:37:39 GMT'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 202, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sm_client = boto3.client('sagemaker')\n", "sm_client.delete_endpoint(EndpointName=endpoint_name)" ] }, { "cell_type": "code", "execution_count": null, "id": "4338732e", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 }