{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "3905c65f", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:43.979421Z", "iopub.status.busy": "2022-07-13T16:07:43.978612Z", "iopub.status.idle": "2022-07-13T16:07:43.980773Z", "shell.execute_reply": "2022-07-13T16:07:43.980271Z" }, "papermill": { "duration": 0.036422, "end_time": "2022-07-13T16:07:43.980902", "exception": false, "start_time": "2022-07-13T16:07:43.944480", "status": "completed" }, "tags": [ "injected-parameters" ] }, "outputs": [], "source": [ "# Parameters\n", "kms_key = \"arn:aws:kms:us-west-2:000000000000:1234abcd-12ab-34cd-56ef-1234567890ab\"" ] }, { "cell_type": "markdown", "id": "f207c2b8", "metadata": { "papermill": { "duration": 0.029403, "end_time": "2022-07-13T16:07:44.040408", "exception": false, "start_time": "2022-07-13T16:07:44.011005", "status": "completed" }, "tags": [] }, "source": [ "# Orchestrate Jobs to Train and Evaluate Models with Amazon SageMaker Pipelines\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "f207c2b8", "metadata": { "papermill": { "duration": 0.029403, "end_time": "2022-07-13T16:07:44.040408", "exception": false, "start_time": "2022-07-13T16:07:44.011005", "status": "completed" }, "tags": [] }, "source": [ "\n", "Amazon SageMaker Pipelines offers machine learning (ML) application developers and operations engineers the ability to orchestrate SageMaker jobs and author reproducible ML pipelines. It also enables them to deploy custom-built models for low-latency, real-time inference, run offline inferences with Batch Transform, and track lineage of artifacts. They can institute sound operational practices in deploying and monitoring production workflows, deploying model artifacts, and tracking artifact lineage through a simple interface, adhering to safety and best practice paradigms for ML application development.\n", "\n", "The SageMaker Pipelines service supports a SageMaker Pipeline domain-specific language (DSL), which is a declarative JSON specification. This DSL defines a directed acyclic graph (DAG) of pipeline parameters and SageMaker job steps. The SageMaker Python Software Development Kit (SDK) streamlines the generation of the pipeline DSL using constructs that engineers and scientists are already familiar with.\n", "\n", "## Runtime\n", "\n", "This notebook takes approximately an hour to run.\n", "\n", "## Contents\n", "\n", "1. [SageMaker Pipelines](#SageMaker-Pipelines)\n", "1. [Notebook Overview](#Notebook-Overview)\n", "1. [A SageMaker Pipeline](#A-SageMaker-Pipeline)\n", "1. [Dataset](#Dataset)\n", "1. 
[Define Parameters to Parametrize Pipeline Execution](#Define-Parameters-to-Parametrize-Pipeline-Execution)\n", "1. [Define a Processing Step for Feature Engineering](#Define-a-Processing-Step-for-Feature-Engineering)\n", "1. [Define a Training Step to Train a Model](#Define-a-Training-Step-to-Train-a-Model)\n", "1. [Define a Model Evaluation Step to Evaluate the Trained Model](#Define-a-Model-Evaluation-Step-to-Evaluate-the-Trained-Model)\n", "1. [Define a Create Model Step to Create a Model](#Define-a-Create-Model-Step-to-Create-a-Model)\n", "1. [Define a Transform Step to Perform Batch Transformation](#Define-a-Transform-Step-to-Perform-Batch-Transformation)\n", "1. [Define a Register Model Step to Create a Model Package](#Define-a-Register-Model-Step-to-Create-a-Model-Package)\n", "1. [Define a Fail Step to Terminate the Pipeline Execution and Mark it as Failed](#Define-a-Fail-Step-to-Terminate-the-Pipeline-Execution-and-Mark-it-as-Failed)\n", "1. [Define a Condition Step to Check Accuracy and Conditionally Create a Model and Run a Batch Transformation and Register a Model in the Model Registry, Or Terminate the Execution in Failed State](#Define-a-Condition-Step-to-Check-Accuracy-and-Conditionally-Create-a-Model-and-Run-a-Batch-Transformation-and-Register-a-Model-in-the-Model-Registry,-Or-Terminate-the-Execution-in-Failed-State)\n", "1. [Define a Pipeline of Parameters, Steps, and Conditions](#Define-a-Pipeline-of-Parameters,-Steps,-and-Conditions)\n", "1. [Submit the pipeline to SageMaker and start execution](#Submit-the-pipeline-to-SageMaker-and-start-execution)\n", "1. [Pipeline Operations: Examining and Waiting for Pipeline Execution](#Pipeline-Operations:-Examining-and-Waiting-for-Pipeline-Execution)\n", " 1. [Examining the Evaluation](#Examining-the-Evaluation)\n", " 1. [Lineage](#Lineage)\n", " 1. 
[Parametrized Executions](#Parametrized-Executions)" ] }, { "cell_type": "markdown", "id": "57abefb0", "metadata": { "papermill": { "duration": 0.029368, "end_time": "2022-07-13T16:07:44.099261", "exception": false, "start_time": "2022-07-13T16:07:44.069893", "status": "completed" }, "tags": [] }, "source": [ "## SageMaker Pipelines\n", "\n", "SageMaker Pipelines supports the following activities, which are demonstrated in this notebook:\n", "\n", "* Pipelines - A DAG of steps and conditions to orchestrate SageMaker jobs and resource creation.\n", "* Processing job steps - A simplified, managed experience on SageMaker to run data processing workloads, such as feature engineering, data validation, model evaluation, and model interpretation.\n", "* Training job steps - An iterative process that teaches a model to make predictions by presenting examples from a training dataset.\n", "* Conditional execution steps - A step that provides conditional execution of branches in a pipeline.\n", "* Register model steps - A step that creates a model package resource in the Model Registry that can be used to create deployable models in Amazon SageMaker.\n", "* Create model steps - A step that creates a model for use in transform steps or later publication as an endpoint.\n", "* Transform job steps - A batch transform to preprocess datasets to remove noise or bias that interferes with training or inference from a dataset, get inferences from large datasets, and run inference when a persistent endpoint is not needed.\n", "* Fail steps - A step that stops a pipeline execution and marks the pipeline execution as failed.\n", "* Parametrized Pipeline executions - Enables variation in pipeline executions according to specified parameters." 
] }, { "cell_type": "markdown", "id": "f8fd6f33", "metadata": { "papermill": { "duration": 0.029618, "end_time": "2022-07-13T16:07:44.158330", "exception": false, "start_time": "2022-07-13T16:07:44.128712", "status": "completed" }, "tags": [] }, "source": [ "## Notebook Overview\n", "\n", "This notebook shows how to:\n", "\n", "* Define a set of Pipeline parameters that can be used to parametrize a SageMaker Pipeline.\n", "* Define a Processing step that performs cleaning, feature engineering, and splitting the input data into train and test data sets.\n", "* Define a Training step that trains a model on the preprocessed train data set.\n", "* Define a Processing step that evaluates the trained model's performance on the test dataset.\n", "* Define a Create Model step that creates a model from the model artifacts used in training.\n", "* Define a Transform step that performs batch transformation based on the model that was created.\n", "* Define a Register Model step that creates a model package from the estimator and model artifacts used to train the model.\n", "* Define a Conditional step that measures a condition based on output from prior steps and conditionally executes other steps.\n", "* Define a Fail step with a customized error message indicating the cause of the execution failure.\n", "* Define and create a Pipeline definition in a DAG, with the defined parameters and steps.\n", "* Start a Pipeline execution and wait for execution to complete.\n", "* Download the model evaluation report from the S3 bucket for examination.\n", "* Start a second Pipeline execution." 
] }, { "cell_type": "markdown", "id": "38772d5a", "metadata": { "papermill": { "duration": 0.029536, "end_time": "2022-07-13T16:07:44.217483", "exception": false, "start_time": "2022-07-13T16:07:44.187947", "status": "completed" }, "tags": [] }, "source": [ "## A SageMaker Pipeline\n", "\n", "The pipeline that you create follows a typical machine learning (ML) application pattern of preprocessing, training, evaluation, model creation, batch transformation, and model registration:\n", "\n", "![A typical ML Application pipeline](https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/img/pipeline-full.png)" ] }, { "cell_type": "markdown", "id": "9607841c", "metadata": { "papermill": { "duration": 0.029684, "end_time": "2022-07-13T16:07:44.276589", "exception": false, "start_time": "2022-07-13T16:07:44.246905", "status": "completed" }, "tags": [] }, "source": [ "## Dataset\n", "\n", "The dataset you use is the [UCI Machine Learning Abalone Dataset](https://archive.ics.uci.edu/ml/datasets/abalone) [1]. The aim of this task is to determine the age of an abalone snail from its physical measurements. At its core, this is a regression problem.\n", "\n", "The dataset contains several features: length (the longest shell measurement), diameter (the diameter perpendicular to length), height (the height with meat in the shell), whole_weight (the weight of whole abalone), shucked_weight (the weight of meat), viscera_weight (the gut weight after bleeding), shell_weight (the weight after being dried), sex ('M', 'F', 'I' where 'I' is Infant), and rings (integer).\n", "\n", "The number of rings turns out to be a good approximation for age (age is rings + 1.5). However, obtaining this number requires cutting the shell through the cone, staining the section, and counting the number of rings through a microscope, which is a time-consuming task. The other physical measurements are easier to determine. 
You use the dataset to build a predictive model of the variable rings through these other physical measurements.\n", "\n", "Before you upload the data to an S3 bucket, install the SageMaker Python SDK and gather some constants you can use later in this notebook.\n", "\n", "> [1] Dua, D. and Graff, C. (2019). [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml). Irvine, CA: University of California, School of Information and Computer Science." ] }, { "cell_type": "code", "execution_count": 2, "id": "ef441354", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:44.382810Z", "iopub.status.busy": "2022-07-13T16:07:44.382069Z", "iopub.status.idle": "2022-07-13T16:07:48.212159Z", "shell.execute_reply": "2022-07-13T16:07:48.211713Z" }, "papermill": { "duration": 3.905415, "end_time": "2022-07-13T16:07:48.212278", "exception": false, "start_time": "2022-07-13T16:07:44.306863", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/opt/conda/lib/python3.7/site-packages/secretstorage/dhcrypto.py:16: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\r\n", " from cryptography.utils import int_from_bytes\r\n", "/opt/conda/lib/python3.7/site-packages/secretstorage/util.py:25: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\r\n", " from cryptography.utils import int_from_bytes\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: sagemaker>=2.99.0 in /opt/conda/lib/python3.7/site-packages (2.99.0)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: protobuf3-to-dict<1.0,>=0.1.5 in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (0.1.5)\r\n", "Requirement already satisfied: numpy<2.0,>=1.9.0 in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (1.21.1)\r\n", "Requirement 
already satisfied: attrs<22,>=20.3.0 in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (21.4.0)\r\n", "Requirement already satisfied: google-pasta in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (0.2.0)\r\n", "Requirement already satisfied: boto3<2.0,>=1.20.21 in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (1.20.47)\r\n", "Requirement already satisfied: pathos in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (0.2.8)\r\n", "Requirement already satisfied: protobuf<4.0,>=3.1 in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (3.17.3)\r\n", "Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (20.1)\r\n", "Requirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (1.0.1)\r\n", "Requirement already satisfied: importlib-metadata<5.0,>=1.4.0 in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (1.5.0)\r\n", "Requirement already satisfied: smdebug-rulesconfig==1.0.1 in /opt/conda/lib/python3.7/site-packages (from sagemaker>=2.99.0) (1.0.1)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /opt/conda/lib/python3.7/site-packages (from boto3<2.0,>=1.20.21->sagemaker>=2.99.0) (0.10.0)\r\n", "Requirement already satisfied: botocore<1.24.0,>=1.23.47 in /opt/conda/lib/python3.7/site-packages (from boto3<2.0,>=1.20.21->sagemaker>=2.99.0) (1.23.47)\r\n", "Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /opt/conda/lib/python3.7/site-packages (from boto3<2.0,>=1.20.21->sagemaker>=2.99.0) (0.5.0)\r\n", "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /opt/conda/lib/python3.7/site-packages (from botocore<1.24.0,>=1.23.47->boto3<2.0,>=1.20.21->sagemaker>=2.99.0) (2.8.1)\r\n", "Requirement already satisfied: urllib3<1.27,>=1.25.4 in /opt/conda/lib/python3.7/site-packages (from 
botocore<1.24.0,>=1.23.47->boto3<2.0,>=1.20.21->sagemaker>=2.99.0) (1.26.6)\r\n", "Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata<5.0,>=1.4.0->sagemaker>=2.99.0) (2.2.0)\r\n", "Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=20.0->sagemaker>=2.99.0) (2.4.6)\r\n", "Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from packaging>=20.0->sagemaker>=2.99.0) (1.14.0)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.7/site-packages (from pandas->sagemaker>=2.99.0) (2019.3)\r\n", "Requirement already satisfied: ppft>=1.6.6.4 in /opt/conda/lib/python3.7/site-packages (from pathos->sagemaker>=2.99.0) (1.6.6.4)\r\n", "Requirement already satisfied: pox>=0.3.0 in /opt/conda/lib/python3.7/site-packages (from pathos->sagemaker>=2.99.0) (0.3.0)\r\n", "Requirement already satisfied: dill>=0.3.4 in /opt/conda/lib/python3.7/site-packages (from pathos->sagemaker>=2.99.0) (0.3.4)\r\n", "Requirement already satisfied: multiprocess>=0.70.12 in /opt/conda/lib/python3.7/site-packages (from pathos->sagemaker>=2.99.0) (0.70.12.2)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\r\n", "\u001b[33mWARNING: You are using pip version 21.1.3; however, version 22.1.2 is available.\r\n", "You should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.\u001b[0m\r\n" ] } ], "source": [ "import sys\n", "\n", "!{sys.executable} -m pip install \"sagemaker>=2.99.0\"\n", "\n", "import boto3\n", "import sagemaker\n", "from sagemaker.workflow.pipeline_context import PipelineSession\n", "\n", "sagemaker_session = sagemaker.session.Session()\n", "region = sagemaker_session.boto_region_name\n", "role = sagemaker.get_execution_role()\n", "pipeline_session = PipelineSession()\n", "default_bucket = sagemaker_session.default_bucket()\n", "model_package_group_name = f\"AbaloneModelPackageGroupName\"" ] }, { "cell_type": "markdown", "id": "c03f70f4", "metadata": { "papermill": { "duration": 0.031185, "end_time": "2022-07-13T16:07:48.274459", "exception": false, "start_time": "2022-07-13T16:07:48.243274", "status": "completed" }, "tags": [] }, "source": [ "Now, upload the data into the default bucket. You can select your own dataset for the `input_data_uri` as is appropriate." 
] }, { "cell_type": "code", "execution_count": 3, "id": "f15c8059", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:48.346589Z", "iopub.status.busy": "2022-07-13T16:07:48.341539Z", "iopub.status.idle": "2022-07-13T16:07:48.497993Z", "shell.execute_reply": "2022-07-13T16:07:48.497482Z" }, "papermill": { "duration": 0.19199, "end_time": "2022-07-13T16:07:48.498111", "exception": false, "start_time": "2022-07-13T16:07:48.306121", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "!mkdir -p data" ] }, { "cell_type": "code", "execution_count": 4, "id": "8ff00b12", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:48.575181Z", "iopub.status.busy": "2022-07-13T16:07:48.574347Z", "iopub.status.idle": "2022-07-13T16:07:49.570833Z", "shell.execute_reply": "2022-07-13T16:07:49.571230Z" }, "papermill": { "duration": 1.041888, "end_time": "2022-07-13T16:07:49.571370", "exception": false, "start_time": "2022-07-13T16:07:48.529482", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s3://sagemaker-us-west-2-000000000000/abalone/abalone-dataset.csv\n" ] } ], "source": [ "local_path = \"data/abalone-dataset.csv\"\n", "\n", "s3 = boto3.resource(\"s3\")\n", "s3.Bucket(f\"sagemaker-sample-files\").download_file(\n", " \"datasets/tabular/uci_abalone/abalone.csv\", local_path\n", ")\n", "\n", "base_uri = f\"s3://{default_bucket}/abalone\"\n", "input_data_uri = sagemaker.s3.S3Uploader.upload(\n", " local_path=local_path,\n", " desired_s3_uri=base_uri,\n", ")\n", "print(input_data_uri)" ] }, { "cell_type": "markdown", "id": "68c6ec51", "metadata": { "papermill": { "duration": 0.033529, "end_time": "2022-07-13T16:07:49.637761", "exception": false, "start_time": "2022-07-13T16:07:49.604232", "status": "completed" }, "tags": [] }, "source": [ "Download a second dataset for batch transformation after model creation. 
You can select your own dataset for the `batch_data_uri` as is appropriate." ] }, { "cell_type": "code", "execution_count": 5, "id": "f671e801", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:49.722364Z", "iopub.status.busy": "2022-07-13T16:07:49.719824Z", "iopub.status.idle": "2022-07-13T16:07:50.203802Z", "shell.execute_reply": "2022-07-13T16:07:50.208095Z" }, "papermill": { "duration": 0.530874, "end_time": "2022-07-13T16:07:50.208271", "exception": false, "start_time": "2022-07-13T16:07:49.677397", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s3://sagemaker-us-west-2-000000000000/abalone/abalone-dataset-batch\n" ] } ], "source": [ "local_path = \"data/abalone-dataset-batch\"\n", "\n", "s3 = boto3.resource(\"s3\")\n", "s3.Bucket(f\"sagemaker-servicecatalog-seedcode-{region}\").download_file(\n", " \"dataset/abalone-dataset-batch\", local_path\n", ")\n", "\n", "base_uri = f\"s3://{default_bucket}/abalone\"\n", "batch_data_uri = sagemaker.s3.S3Uploader.upload(\n", " local_path=local_path,\n", " desired_s3_uri=base_uri,\n", ")\n", "print(batch_data_uri)" ] }, { "cell_type": "markdown", "id": "345b427e", "metadata": { "papermill": { "duration": 0.054012, "end_time": "2022-07-13T16:07:50.311578", "exception": false, "start_time": "2022-07-13T16:07:50.257566", "status": "completed" }, "tags": [] }, "source": [ "## Define Parameters to Parametrize Pipeline Execution\n", "\n", "Define Pipeline parameters that you can use to parametrize the pipeline. 
Parameters enable custom pipeline executions and schedules without having to modify the Pipeline definition.\n", "\n", "The supported parameter types include:\n", "\n", "* `ParameterString` - represents a `str` Python type\n", "* `ParameterInteger` - represents an `int` Python type\n", "* `ParameterFloat` - represents a `float` Python type\n", "\n", "These parameters support providing a default value, which can be overridden on pipeline execution. The default value specified should be an instance of the type of the parameter.\n", "\n", "The parameters defined in this workflow include:\n", "\n", "* `processing_instance_count` - The instance count of the processing job.\n", "* `instance_type` - The `ml.*` instance type of the training job.\n", "* `model_approval_status` - The approval status to register with the trained model for CI/CD purposes (\"PendingManualApproval\" is the default).\n", "* `input_data` - The S3 bucket URI location of the input data.\n", "* `batch_data` - The S3 bucket URI location of the batch data.\n", "* `mse_threshold` - The Mean Squared Error (MSE) threshold used to verify the accuracy of a model." 
] }, { "cell_type": "code", "execution_count": 6, "id": "b730dc01", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:50.422392Z", "iopub.status.busy": "2022-07-13T16:07:50.421597Z", "iopub.status.idle": "2022-07-13T16:07:50.423907Z", "shell.execute_reply": "2022-07-13T16:07:50.423467Z" }, "papermill": { "duration": 0.058807, "end_time": "2022-07-13T16:07:50.424032", "exception": false, "start_time": "2022-07-13T16:07:50.365225", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.workflow.parameters import (\n", " ParameterInteger,\n", " ParameterString,\n", " ParameterFloat,\n", ")\n", "\n", "processing_instance_count = ParameterInteger(name=\"ProcessingInstanceCount\", default_value=1)\n", "instance_type = ParameterString(name=\"TrainingInstanceType\", default_value=\"ml.m5.xlarge\")\n", "model_approval_status = ParameterString(\n", " name=\"ModelApprovalStatus\", default_value=\"PendingManualApproval\"\n", ")\n", "input_data = ParameterString(\n", " name=\"InputData\",\n", " default_value=input_data_uri,\n", ")\n", "batch_data = ParameterString(\n", " name=\"BatchData\",\n", " default_value=batch_data_uri,\n", ")\n", "mse_threshold = ParameterFloat(name=\"MseThreshold\", default_value=6.0)" ] }, { "cell_type": "markdown", "id": "2113cab9", "metadata": { "papermill": { "duration": 0.061565, "end_time": "2022-07-13T16:07:50.532413", "exception": false, "start_time": "2022-07-13T16:07:50.470848", "status": "completed" }, "tags": [] }, "source": [ "![Define Parameters](https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/img/pipeline-1.png)" ] }, { "cell_type": "markdown", "id": "0279c19d", "metadata": { "papermill": { "duration": 0.034918, "end_time": "2022-07-13T16:07:50.601674", "exception": false, "start_time": "2022-07-13T16:07:50.566756", "status": "completed" }, "tags": [] }, "source": [ "## Define a 
Processing Step for Feature Engineering\n", "\n", "First, develop a preprocessing script that is specified in the Processing step.\n", "\n", "The `%%writefile` cell below writes the file `code/preprocessing.py`, which contains the preprocessing script. You can update the script and rerun the cell to overwrite the file. The preprocessing script uses `scikit-learn` to do the following:\n", "\n", "* Fill in missing sex category data and encode it so that it is suitable for training.\n", "* Scale and normalize all numerical fields, aside from sex and rings numerical data.\n", "* Split the data into training, validation, and test datasets.\n", "\n", "The Processing step executes the script on the input data. The Training step uses the preprocessed training features and labels to train a model. The Evaluation step uses the trained model and preprocessed test features and labels to evaluate the model." ] }, { "cell_type": "code", "execution_count": 7, "id": "8b07a322", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:50.679648Z", "iopub.status.busy": "2022-07-13T16:07:50.678977Z", "iopub.status.idle": "2022-07-13T16:07:50.827765Z", "shell.execute_reply": "2022-07-13T16:07:50.827361Z" }, "papermill": { "duration": 0.19357, "end_time": "2022-07-13T16:07:50.827884", "exception": false, "start_time": "2022-07-13T16:07:50.634314", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "!mkdir -p code" ] }, { "cell_type": "code", "execution_count": 8, "id": "3901891e", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:50.900839Z", "iopub.status.busy": "2022-07-13T16:07:50.899665Z", "iopub.status.idle": "2022-07-13T16:07:50.902508Z", "shell.execute_reply": "2022-07-13T16:07:50.902925Z" }, "papermill": { "duration": 0.043198, "end_time": "2022-07-13T16:07:50.903065", "exception": false, "start_time": "2022-07-13T16:07:50.859867", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ 
{ "name": "stdout", "output_type": "stream", "text": [ "Writing code/preprocessing.py\n" ] } ], "source": [ "%%writefile code/preprocessing.py\n", "import argparse\n", "import os\n", "import requests\n", "import tempfile\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from sklearn.compose import ColumnTransformer\n", "from sklearn.impute import SimpleImputer\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", "\n", "\n", "# Since we get a headerless CSV file, we specify the column names here.\n", "feature_columns_names = [\n", " \"sex\",\n", " \"length\",\n", " \"diameter\",\n", " \"height\",\n", " \"whole_weight\",\n", " \"shucked_weight\",\n", " \"viscera_weight\",\n", " \"shell_weight\",\n", "]\n", "label_column = \"rings\"\n", "\n", "feature_columns_dtype = {\n", " \"sex\": str,\n", " \"length\": np.float64,\n", " \"diameter\": np.float64,\n", " \"height\": np.float64,\n", " \"whole_weight\": np.float64,\n", " \"shucked_weight\": np.float64,\n", " \"viscera_weight\": np.float64,\n", " \"shell_weight\": np.float64,\n", "}\n", "label_column_dtype = {\"rings\": np.float64}\n", "\n", "\n", "def merge_two_dicts(x, y):\n", " z = x.copy()\n", " z.update(y)\n", " return z\n", "\n", "\n", "if __name__ == \"__main__\":\n", " base_dir = \"/opt/ml/processing\"\n", "\n", " df = pd.read_csv(\n", " f\"{base_dir}/input/abalone-dataset.csv\",\n", " header=None,\n", " names=feature_columns_names + [label_column],\n", " dtype=merge_two_dicts(feature_columns_dtype, label_column_dtype),\n", " )\n", " numeric_features = list(feature_columns_names)\n", " numeric_features.remove(\"sex\")\n", " numeric_transformer = Pipeline(\n", " steps=[(\"imputer\", SimpleImputer(strategy=\"median\")), (\"scaler\", StandardScaler())]\n", " )\n", "\n", " categorical_features = [\"sex\"]\n", " categorical_transformer = Pipeline(\n", " steps=[\n", " (\"imputer\", SimpleImputer(strategy=\"constant\", 
fill_value=\"missing\")),\n", " (\"onehot\", OneHotEncoder(handle_unknown=\"ignore\")),\n", " ]\n", " )\n", "\n", " preprocess = ColumnTransformer(\n", " transformers=[\n", " (\"num\", numeric_transformer, numeric_features),\n", " (\"cat\", categorical_transformer, categorical_features),\n", " ]\n", " )\n", "\n", " y = df.pop(\"rings\")\n", " X_pre = preprocess.fit_transform(df)\n", " y_pre = y.to_numpy().reshape(len(y), 1)\n", "\n", " X = np.concatenate((y_pre, X_pre), axis=1)\n", "\n", " np.random.shuffle(X)\n", " train, validation, test = np.split(X, [int(0.7 * len(X)), int(0.85 * len(X))])\n", "\n", " pd.DataFrame(train).to_csv(f\"{base_dir}/train/train.csv\", header=False, index=False)\n", " pd.DataFrame(validation).to_csv(\n", " f\"{base_dir}/validation/validation.csv\", header=False, index=False\n", " )\n", " pd.DataFrame(test).to_csv(f\"{base_dir}/test/test.csv\", header=False, index=False)" ] }, { "cell_type": "markdown", "id": "753400dc", "metadata": { "papermill": { "duration": 0.03212, "end_time": "2022-07-13T16:07:50.967490", "exception": false, "start_time": "2022-07-13T16:07:50.935370", "status": "completed" }, "tags": [] }, "source": [ "Next, create an instance of a `SKLearnProcessor` processor and use that in our `ProcessingStep`.\n", "\n", "You also specify the `framework_version` to use throughout this notebook.\n", "\n", "Note the `processing_instance_count` parameter used by the processor instance." 
] }, { "cell_type": "code", "execution_count": 9, "id": "c3563172", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:51.037823Z", "iopub.status.busy": "2022-07-13T16:07:51.037000Z", "iopub.status.idle": "2022-07-13T16:07:51.055494Z", "shell.execute_reply": "2022-07-13T16:07:51.055848Z" }, "papermill": { "duration": 0.056125, "end_time": "2022-07-13T16:07:51.055993", "exception": false, "start_time": "2022-07-13T16:07:50.999868", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.sklearn.processing import SKLearnProcessor\n", "\n", "\n", "framework_version = \"0.23-1\"\n", "\n", "sklearn_processor = SKLearnProcessor(\n", " framework_version=framework_version,\n", " instance_type=\"ml.m5.xlarge\",\n", " instance_count=processing_instance_count,\n", " base_job_name=\"sklearn-abalone-process\",\n", " role=role,\n", " sagemaker_session=pipeline_session,\n", ")" ] }, { "cell_type": "markdown", "id": "93a1d463", "metadata": { "papermill": { "duration": 0.032682, "end_time": "2022-07-13T16:07:51.121085", "exception": false, "start_time": "2022-07-13T16:07:51.088403", "status": "completed" }, "tags": [] }, "source": [ "Finally, take the output of the processor's `run` method and pass it as the `step_args` argument of the `ProcessingStep`. Because the processor was created with `sagemaker_session=pipeline_session`, calling `.run()` does not launch the processing job; instead, it returns the arguments needed to run the job as a step in the pipeline.\n", "\n", "Note the `\"train\"`, `\"validation\"`, and `\"test\"` named outputs specified in the output configuration for the processing job. Step `Properties` can be used in subsequent steps and resolve to their runtime values at execution. Specifically, this usage is called out when you define the training step." 
] }, { "cell_type": "code", "execution_count": 10, "id": "240281be", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:51.192383Z", "iopub.status.busy": "2022-07-13T16:07:51.191558Z", "iopub.status.idle": "2022-07-13T16:07:51.548107Z", "shell.execute_reply": "2022-07-13T16:07:51.548453Z" }, "papermill": { "duration": 0.39512, "end_time": "2022-07-13T16:07:51.548594", "exception": false, "start_time": "2022-07-13T16:07:51.153474", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline_context.py:197: UserWarning: Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", " UserWarning,\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Job Name: sklearn-abalone-process-2022-07-13-16-07-51-228\n", "Inputs: [{'InputName': 'input-1', 'AppManaged': False, 'S3Input': {'S3Uri': ParameterString(name='InputData', parameter_type=, default_value='s3://sagemaker-us-west-2-000000000000/abalone/abalone-dataset.csv'), 'LocalPath': '/opt/ml/processing/input', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'code', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sklearn-abalone-process-2022-07-13-16-07-51-228/input/code/preprocessing.py', 'LocalPath': '/opt/ml/processing/input/code', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]\n", "Outputs: [{'OutputName': 'train', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sklearn-abalone-process-2022-07-13-16-07-51-228/output/train', 'LocalPath': '/opt/ml/processing/train', 'S3UploadMode': 'EndOfJob'}}, {'OutputName': 'validation', 'AppManaged': False, 'S3Output': {'S3Uri': 
's3://sagemaker-us-west-2-000000000000/sklearn-abalone-process-2022-07-13-16-07-51-228/output/validation', 'LocalPath': '/opt/ml/processing/validation', 'S3UploadMode': 'EndOfJob'}}, {'OutputName': 'test', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sklearn-abalone-process-2022-07-13-16-07-51-228/output/test', 'LocalPath': '/opt/ml/processing/test', 'S3UploadMode': 'EndOfJob'}}]\n" ] } ], "source": [ "from sagemaker.processing import ProcessingInput, ProcessingOutput\n", "from sagemaker.workflow.steps import ProcessingStep\n", "\n", "processor_args = sklearn_processor.run(\n", " inputs=[\n", " ProcessingInput(source=input_data, destination=\"/opt/ml/processing/input\"),\n", " ],\n", " outputs=[\n", " ProcessingOutput(output_name=\"train\", source=\"/opt/ml/processing/train\"),\n", " ProcessingOutput(output_name=\"validation\", source=\"/opt/ml/processing/validation\"),\n", " ProcessingOutput(output_name=\"test\", source=\"/opt/ml/processing/test\"),\n", " ],\n", " code=\"code/preprocessing.py\",\n", ")\n", "\n", "step_process = ProcessingStep(name=\"AbaloneProcess\", step_args=processor_args)" ] }, { "cell_type": "markdown", "id": "b42f9bb1", "metadata": { "papermill": { "duration": 0.033806, "end_time": "2022-07-13T16:07:51.617961", "exception": false, "start_time": "2022-07-13T16:07:51.584155", "status": "completed" }, "tags": [] }, "source": [ "![Define a Processing Step for Feature Engineering](https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/img/pipeline-2.png)" ] }, { "cell_type": "markdown", "id": "dc187225", "metadata": { "papermill": { "duration": 0.040739, "end_time": "2022-07-13T16:07:51.691562", "exception": false, "start_time": "2022-07-13T16:07:51.650823", "status": "completed" }, "tags": [] }, "source": [ "## Define a Training Step to Train a Model\n", "\n", "In this section, use Amazon SageMaker's [XGBoost 
Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) to train on this dataset. Configure an Estimator for the XGBoost algorithm and the input dataset. A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to `model_dir` so that it can be hosted later.\n", "\n", "The model path where the models from training are saved is also specified.\n", "\n", "Note the `instance_type` parameter may be used in multiple places in the pipeline. In this case, the `instance_type` is passed into the estimator." ] }, { "cell_type": "code", "execution_count": 11, "id": "7407bef6", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:51.766085Z", "iopub.status.busy": "2022-07-13T16:07:51.765580Z", "iopub.status.idle": "2022-07-13T16:07:51.897005Z", "shell.execute_reply": "2022-07-13T16:07:51.896426Z" }, "papermill": { "duration": 0.171801, "end_time": "2022-07-13T16:07:51.897137", "exception": false, "start_time": "2022-07-13T16:07:51.725336", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.estimator import Estimator\n", "from sagemaker.inputs import TrainingInput\n", "\n", "model_path = f\"s3://{default_bucket}/AbaloneTrain\"\n", "image_uri = sagemaker.image_uris.retrieve(\n", " framework=\"xgboost\",\n", " region=region,\n", " version=\"1.0-1\",\n", " py_version=\"py3\",\n", " instance_type=\"ml.m5.xlarge\",\n", ")\n", "xgb_train = Estimator(\n", " image_uri=image_uri,\n", " instance_type=instance_type,\n", " instance_count=1,\n", " output_path=model_path,\n", " role=role,\n", " sagemaker_session=pipeline_session,\n", ")\n", "xgb_train.set_hyperparameters(\n", " objective=\"reg:linear\",\n", " num_round=50,\n", " max_depth=5,\n", " eta=0.2,\n", " gamma=4,\n", " min_child_weight=6,\n", " subsample=0.7,\n", ")\n", "\n", "train_args = xgb_train.fit(\n", " inputs={\n", " \"train\": TrainingInput(\n", " 
s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\"train\"].S3Output.S3Uri,\n", " content_type=\"text/csv\",\n", " ),\n", " \"validation\": TrainingInput(\n", " s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\n", " \"validation\"\n", " ].S3Output.S3Uri,\n", " content_type=\"text/csv\",\n", " ),\n", " }\n", ")" ] }, { "cell_type": "markdown", "id": "f86374ad", "metadata": { "papermill": { "duration": 0.064705, "end_time": "2022-07-13T16:07:51.995767", "exception": false, "start_time": "2022-07-13T16:07:51.931062", "status": "completed" }, "tags": [] }, "source": [ "Finally, use the output of the estimator's `.fit()` method as the step arguments to the `TrainingStep`. Because `pipeline_session` is passed as the `sagemaker_session`, calling `.fit()` does not launch the training job; instead, it returns the arguments needed to run the job as a step in the pipeline.\n", "\n", "The `S3Uri` values of the `\"train\"` and `\"validation\"` processing outputs are passed to the `.fit()` method. The remaining `\"test\"` output is used for model evaluation later in the pipeline. The `properties` attribute of a Pipeline step matches the object model of the corresponding response of a describe call. These properties can be referenced as placeholder values and are resolved at runtime. For example, the `ProcessingStep` `properties` attribute matches the object model of the [DescribeProcessingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html) response object."
] }, { "cell_type": "code", "execution_count": 12, "id": "724d8eef", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:52.195314Z", "iopub.status.busy": "2022-07-13T16:07:52.193872Z", "iopub.status.idle": "2022-07-13T16:07:52.195919Z", "shell.execute_reply": "2022-07-13T16:07:52.196336Z" }, "papermill": { "duration": 0.103611, "end_time": "2022-07-13T16:07:52.196480", "exception": false, "start_time": "2022-07-13T16:07:52.092869", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.inputs import TrainingInput\n", "from sagemaker.workflow.steps import TrainingStep\n", "\n", "\n", "step_train = TrainingStep(\n", " name=\"AbaloneTrain\",\n", " step_args=train_args,\n", ")" ] }, { "cell_type": "markdown", "id": "61d7285f", "metadata": { "papermill": { "duration": 0.097979, "end_time": "2022-07-13T16:07:52.392020", "exception": false, "start_time": "2022-07-13T16:07:52.294041", "status": "completed" }, "tags": [] }, "source": [ "![Define a Training Step to Train a Model](https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/img/pipeline-3.png)" ] }, { "cell_type": "markdown", "id": "fd58bfa7", "metadata": { "papermill": { "duration": 0.097153, "end_time": "2022-07-13T16:07:52.586494", "exception": false, "start_time": "2022-07-13T16:07:52.489341", "status": "completed" }, "tags": [] }, "source": [ "## Define a Model Evaluation Step to Evaluate the Trained Model\n", "\n", "First, develop the evaluation script that a Processing step runs to evaluate the model.\n", "\n", "After pipeline execution, you can examine the resulting `evaluation.json` for analysis.\n", "\n", "The evaluation script uses `xgboost` to do the following:\n", "\n", "* Load the model.\n", "* Read the test data.\n", "* Issue predictions against the test data.\n", "* Compute regression metrics: the mean squared error (MSE) and the standard deviation of the prediction errors.\n", "* Save the evaluation report to the evaluation directory."
] }, { "cell_type": "code", "execution_count": 13, "id": "a03a42f2", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:52.867379Z", "iopub.status.busy": "2022-07-13T16:07:52.866257Z", "iopub.status.idle": "2022-07-13T16:07:52.869025Z", "shell.execute_reply": "2022-07-13T16:07:52.869384Z" }, "papermill": { "duration": 0.184948, "end_time": "2022-07-13T16:07:52.869516", "exception": false, "start_time": "2022-07-13T16:07:52.684568", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing code/evaluation.py\n" ] } ], "source": [ "%%writefile code/evaluation.py\n", "import json\n", "import pathlib\n", "import pickle\n", "import tarfile\n", "\n", "import joblib\n", "import numpy as np\n", "import pandas as pd\n", "import xgboost\n", "\n", "from sklearn.metrics import mean_squared_error\n", "\n", "\n", "if __name__ == \"__main__\":\n", "    # Unpack the model artifact produced by the training step.\n", "    model_path = \"/opt/ml/processing/model/model.tar.gz\"\n", "    with tarfile.open(model_path) as tar:\n", "        tar.extractall(path=\".\")\n", "\n", "    with open(\"xgboost-model\", \"rb\") as model_file:\n", "        model = pickle.load(model_file)\n", "\n", "    # The first column of the test set holds the label; the rest are features.\n", "    test_path = \"/opt/ml/processing/test/test.csv\"\n", "    df = pd.read_csv(test_path, header=None)\n", "\n", "    y_test = df.iloc[:, 0].to_numpy()\n", "    df.drop(df.columns[0], axis=1, inplace=True)\n", "\n", "    X_test = xgboost.DMatrix(df.values)\n", "\n", "    predictions = model.predict(X_test)\n", "\n", "    # Regression metrics: MSE and the standard deviation of the errors.\n", "    mse = mean_squared_error(y_test, predictions)\n", "    std = np.std(y_test - predictions)\n", "    report_dict = {\n", "        \"regression_metrics\": {\n", "            \"mse\": {\"value\": mse, \"standard_deviation\": std},\n", "        },\n", "    }\n", "\n", "    output_dir = \"/opt/ml/processing/evaluation\"\n", "    pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)\n", "\n", "    evaluation_path = f\"{output_dir}/evaluation.json\"\n", "    with open(evaluation_path, \"w\") as f:\n", "        f.write(json.dumps(report_dict))"
] }, { "cell_type": "markdown", "id": "3db80eee", "metadata": { "papermill": { "duration": 0.033789, "end_time": "2022-07-13T16:07:52.999885", "exception": false, "start_time": "2022-07-13T16:07:52.966096", "status": "completed" }, "tags": [] }, "source": [ "Next, create an instance of a `ScriptProcessor` processor and use it in the `ProcessingStep`." ] }, { "cell_type": "code", "execution_count": 14, "id": "53230930", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:53.265975Z", "iopub.status.busy": "2022-07-13T16:07:53.265226Z", "iopub.status.idle": "2022-07-13T16:07:53.339364Z", "shell.execute_reply": "2022-07-13T16:07:53.338974Z" }, "papermill": { "duration": 0.242851, "end_time": "2022-07-13T16:07:53.339481", "exception": false, "start_time": "2022-07-13T16:07:53.096630", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Job Name: script-abalone-eval-2022-07-13-16-07-53-265\n", "Inputs: [{'InputName': 'input-1', 'AppManaged': False, 'S3Input': {'S3Uri': , 'LocalPath': '/opt/ml/processing/model', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'input-2', 'AppManaged': False, 'S3Input': {'S3Uri': , 'LocalPath': '/opt/ml/processing/test', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'code', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/script-abalone-eval-2022-07-13-16-07-53-265/input/code/evaluation.py', 'LocalPath': '/opt/ml/processing/input/code', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]\n", "Outputs: [{'OutputName': 'evaluation', 'AppManaged': False, 'S3Output': {'S3Uri': 
's3://sagemaker-us-west-2-000000000000/script-abalone-eval-2022-07-13-16-07-53-265/output/evaluation', 'LocalPath': '/opt/ml/processing/evaluation', 'S3UploadMode': 'EndOfJob'}}]\n" ] } ], "source": [ "from sagemaker.processing import ScriptProcessor\n", "\n", "\n", "script_eval = ScriptProcessor(\n", " image_uri=image_uri,\n", " command=[\"python3\"],\n", " instance_type=\"ml.m5.xlarge\",\n", " instance_count=1,\n", " base_job_name=\"script-abalone-eval\",\n", " role=role,\n", " sagemaker_session=pipeline_session,\n", ")\n", "\n", "eval_args = script_eval.run(\n", " inputs=[\n", " ProcessingInput(\n", " source=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n", " destination=\"/opt/ml/processing/model\",\n", " ),\n", " ProcessingInput(\n", " source=step_process.properties.ProcessingOutputConfig.Outputs[\"test\"].S3Output.S3Uri,\n", " destination=\"/opt/ml/processing/test\",\n", " ),\n", " ],\n", " outputs=[\n", " ProcessingOutput(output_name=\"evaluation\", source=\"/opt/ml/processing/evaluation\"),\n", " ],\n", " code=\"code/evaluation.py\",\n", ")" ] }, { "cell_type": "markdown", "id": "858206cd", "metadata": { "papermill": { "duration": 0.097754, "end_time": "2022-07-13T16:07:53.494882", "exception": false, "start_time": "2022-07-13T16:07:53.397128", "status": "completed" }, "tags": [] }, "source": [ "Use the arguments returned by the processor's `.run()` call to construct a `ProcessingStep`. These arguments carry the input and output channels and the code that runs when the pipeline executes the step.\n", "\n", "Specifically, the `S3ModelArtifacts` from the `step_train` `properties` and the `S3Uri` of the `\"test\"` output of the `step_process` `properties` are passed as inputs. 
The `TrainingStep` and `ProcessingStep` `properties` attribute matches the object model of the [DescribeTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html) and [DescribeProcessingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html) response objects, respectively." ] }, { "cell_type": "code", "execution_count": 15, "id": "ae643d67", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:53.695523Z", "iopub.status.busy": "2022-07-13T16:07:53.694111Z", "iopub.status.idle": "2022-07-13T16:07:53.696160Z", "shell.execute_reply": "2022-07-13T16:07:53.696534Z" }, "papermill": { "duration": 0.104102, "end_time": "2022-07-13T16:07:53.696669", "exception": false, "start_time": "2022-07-13T16:07:53.592567", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.workflow.properties import PropertyFile\n", "\n", "\n", "evaluation_report = PropertyFile(\n", " name=\"EvaluationReport\", output_name=\"evaluation\", path=\"evaluation.json\"\n", ")\n", "step_eval = ProcessingStep(\n", " name=\"AbaloneEval\",\n", " step_args=eval_args,\n", " property_files=[evaluation_report],\n", ")" ] }, { "cell_type": "markdown", "id": "99f1a3c2", "metadata": { "papermill": { "duration": 0.096611, "end_time": "2022-07-13T16:07:53.966367", "exception": false, "start_time": "2022-07-13T16:07:53.869756", "status": "completed" }, "tags": [] }, "source": [ "![Define a Model Evaluation Step to Evaluate the Trained Model](https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/img/pipeline-4.png)" ] }, { "cell_type": "markdown", "id": "1ff0a560", "metadata": { "papermill": { "duration": 0.097005, "end_time": "2022-07-13T16:07:54.096931", "exception": false, "start_time": "2022-07-13T16:07:53.999926", "status": "completed" }, "tags": [] }, "source": [ "## Define a Create Model 
Step to Create a Model\n", "\n", "In order to perform batch transformation using the example model, create a SageMaker model.\n", "\n", "Specifically, pass in the `S3ModelArtifacts` from the `TrainingStep`, `step_train` properties. The `TrainingStep` `properties` attribute matches the object model of the [DescribeTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html) response object." ] }, { "cell_type": "code", "execution_count": 16, "id": "6aab382f", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:54.292477Z", "iopub.status.busy": "2022-07-13T16:07:54.291118Z", "iopub.status.idle": "2022-07-13T16:07:54.293086Z", "shell.execute_reply": "2022-07-13T16:07:54.293444Z" }, "papermill": { "duration": 0.103791, "end_time": "2022-07-13T16:07:54.293572", "exception": false, "start_time": "2022-07-13T16:07:54.189781", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.model import Model\n", "\n", "model = Model(\n", " image_uri=image_uri,\n", " model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n", " sagemaker_session=pipeline_session,\n", " role=role,\n", ")" ] }, { "cell_type": "markdown", "id": "d9be254b", "metadata": { "papermill": { "duration": 0.099816, "end_time": "2022-07-13T16:07:54.490471", "exception": false, "start_time": "2022-07-13T16:07:54.390655", "status": "completed" }, "tags": [] }, "source": [ "Define the `ModelStep` by providing the return values from `model.create()` as the step arguments." 
] }, { "cell_type": "code", "execution_count": 17, "id": "7a8dc222", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:54.690071Z", "iopub.status.busy": "2022-07-13T16:07:54.689317Z", "iopub.status.idle": "2022-07-13T16:07:54.694056Z", "shell.execute_reply": "2022-07-13T16:07:54.694467Z" }, "papermill": { "duration": 0.106688, "end_time": "2022-07-13T16:07:54.694631", "exception": false, "start_time": "2022-07-13T16:07:54.587943", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.inputs import CreateModelInput\n", "from sagemaker.workflow.model_step import ModelStep\n", "\n", "step_create_model = ModelStep(\n", " name=\"AbaloneCreateModel\",\n", " step_args=model.create(instance_type=\"ml.m5.large\", accelerator_type=\"ml.eia1.medium\"),\n", ")" ] }, { "cell_type": "markdown", "id": "c26c7052", "metadata": { "papermill": { "duration": 0.092854, "end_time": "2022-07-13T16:07:54.964979", "exception": false, "start_time": "2022-07-13T16:07:54.872125", "status": "completed" }, "tags": [] }, "source": [ "## Define a Transform Step to Perform Batch Transformation\n", "\n", "Now that a model instance is defined, create a `Transformer` instance with the appropriate model type, compute instance type, and desired output S3 URI.\n", "\n", "Specifically, pass in the `ModelName` from the `properties` of the `ModelStep` named `step_create_model`. The `properties` attribute of a `ModelStep` that creates a model matches the object model of the [DescribeModel](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeModel.html) response object."
] }, { "cell_type": "code", "execution_count": 18, "id": "f55318d8", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:55.168074Z", "iopub.status.busy": "2022-07-13T16:07:55.167333Z", "iopub.status.idle": "2022-07-13T16:07:55.175645Z", "shell.execute_reply": "2022-07-13T16:07:55.176011Z" }, "papermill": { "duration": 0.177212, "end_time": "2022-07-13T16:07:55.176173", "exception": false, "start_time": "2022-07-13T16:07:54.998961", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.transformer import Transformer\n", "\n", "\n", "transformer = Transformer(\n", " model_name=step_create_model.properties.ModelName,\n", " instance_type=\"ml.m5.xlarge\",\n", " instance_count=1,\n", " output_path=f\"s3://{default_bucket}/AbaloneTransform\",\n", ")" ] }, { "cell_type": "markdown", "id": "6227a29a", "metadata": { "papermill": { "duration": 0.102855, "end_time": "2022-07-13T16:07:55.376256", "exception": false, "start_time": "2022-07-13T16:07:55.273401", "status": "completed" }, "tags": [] }, "source": [ "Pass in the transformer instance and the `TransformInput` with the `batch_data` pipeline parameter defined earlier." 
] }, { "cell_type": "code", "execution_count": 19, "id": "6e1aa0ad", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:55.578902Z", "iopub.status.busy": "2022-07-13T16:07:55.577537Z", "iopub.status.idle": "2022-07-13T16:07:55.579522Z", "shell.execute_reply": "2022-07-13T16:07:55.579879Z" }, "papermill": { "duration": 0.110305, "end_time": "2022-07-13T16:07:55.580011", "exception": false, "start_time": "2022-07-13T16:07:55.469706", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.inputs import TransformInput\n", "from sagemaker.workflow.steps import TransformStep\n", "\n", "\n", "step_transform = TransformStep(\n", " name=\"AbaloneTransform\", transformer=transformer, inputs=TransformInput(data=batch_data)\n", ")" ] }, { "cell_type": "markdown", "id": "e400b0f0", "metadata": { "papermill": { "duration": 0.097293, "end_time": "2022-07-13T16:07:55.779132", "exception": false, "start_time": "2022-07-13T16:07:55.681839", "status": "completed" }, "tags": [] }, "source": [ "## Define a Register Model Step to Create a Model Package\n", "\n", "A model package is an abstraction of reusable model artifacts that packages all ingredients required for inference. Primarily, it consists of an inference specification that defines the inference image to use along with an optional model weights location.\n", "\n", "A model package group is a collection of model packages. A model package group can be created for a specific ML business problem, and new versions of the model packages can be added to it. 
Typically, customers are expected to create a ModelPackageGroup for a SageMaker pipeline so that model package versions can be added to the group for every SageMaker Pipeline run.\n", "\n", "To register a model in the Model Registry, we take the model created in the previous steps\n", "```\n", "model = Model(\n", " image_uri=image_uri,\n", " model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n", " sagemaker_session=pipeline_session,\n", " role=role,\n", ")\n", "```\n", "and call the `.register()` function on it while passing all the parameters needed for registering the model.\n", "\n", "We take the outputs of the `.register()` call and pass that to the `ModelStep` as step arguments." ] }, { "cell_type": "code", "execution_count": 20, "id": "49268979", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:55.990666Z", "iopub.status.busy": "2022-07-13T16:07:55.989233Z", "iopub.status.idle": "2022-07-13T16:07:55.991258Z", "shell.execute_reply": "2022-07-13T16:07:55.991663Z" }, "papermill": { "duration": 0.106521, "end_time": "2022-07-13T16:07:55.991800", "exception": false, "start_time": "2022-07-13T16:07:55.885279", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.model_metrics import MetricsSource, ModelMetrics\n", "\n", "model_metrics = ModelMetrics(\n", " model_statistics=MetricsSource(\n", " s3_uri=\"{}/evaluation.json\".format(\n", " step_eval.arguments[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\"S3Uri\"]\n", " ),\n", " content_type=\"application/json\",\n", " )\n", ")\n", "\n", "register_args = model.register(\n", " content_types=[\"text/csv\"],\n", " response_types=[\"text/csv\"],\n", " inference_instances=[\"ml.t2.medium\", \"ml.m5.xlarge\"],\n", " transform_instances=[\"ml.m5.xlarge\"],\n", " model_package_group_name=model_package_group_name,\n", " approval_status=model_approval_status,\n", " model_metrics=model_metrics,\n", ")\n", "step_register 
= ModelStep(name=\"AbaloneRegisterModel\", step_args=register_args)" ] }, { "cell_type": "markdown", "id": "d1622708", "metadata": { "papermill": { "duration": 0.093149, "end_time": "2022-07-13T16:07:56.186199", "exception": false, "start_time": "2022-07-13T16:07:56.093050", "status": "completed" }, "tags": [] }, "source": [ "![Define a Create Model Step and Batch Transform to Process Data in Batch at Scale](https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/img/pipeline-5.png)" ] }, { "cell_type": "markdown", "id": "756157c0", "metadata": { "papermill": { "duration": 0.097569, "end_time": "2022-07-13T16:07:56.381372", "exception": false, "start_time": "2022-07-13T16:07:56.283803", "status": "completed" }, "tags": [] }, "source": [ "## Define a Fail Step to Terminate the Pipeline Execution and Mark it as Failed\n", "\n", "This section walks you through the following steps:\n", "\n", "* Define a `FailStep` with a customized error message that indicates the cause of the execution failure.\n", "* Build the `FailStep` error message with a `Join` function, which concatenates a static text string with the dynamic `mse_threshold` parameter to produce a more informative error message."
] }, { "cell_type": "code", "execution_count": 21, "id": "f041c8a8", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:56.670215Z", "iopub.status.busy": "2022-07-13T16:07:56.669270Z", "iopub.status.idle": "2022-07-13T16:07:56.675249Z", "shell.execute_reply": "2022-07-13T16:07:56.674652Z" }, "papermill": { "duration": 0.193196, "end_time": "2022-07-13T16:07:56.675378", "exception": false, "start_time": "2022-07-13T16:07:56.482182", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.workflow.fail_step import FailStep\n", "from sagemaker.workflow.functions import Join\n", "\n", "step_fail = FailStep(\n", " name=\"AbaloneMSEFail\",\n", " error_message=Join(on=\" \", values=[\"Execution failed due to MSE >\", mse_threshold]),\n", ")" ] }, { "cell_type": "markdown", "id": "8fcf7e9c", "metadata": { "papermill": { "duration": 0.182821, "end_time": "2022-07-13T16:07:56.967379", "exception": false, "start_time": "2022-07-13T16:07:56.784558", "status": "completed" }, "tags": [] }, "source": [ "![Define a Fail Step to Terminate the Execution in Failed State](https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/img/pipeline-8.png)" ] }, { "cell_type": "markdown", "id": "fd777542", "metadata": { "papermill": { "duration": 0.034317, "end_time": "2022-07-13T16:07:57.099302", "exception": false, "start_time": "2022-07-13T16:07:57.064985", "status": "completed" }, "tags": [] }, "source": [ "## Define a Condition Step to Check Accuracy and Conditionally Create a Model and Run a Batch Transformation and Register a Model in the Model Registry, Or Terminate the Execution in Failed State\n", "\n", "In this step, the model is created, used for batch transformation, and registered only if its mean squared error (MSE), as determined by the evaluation step `step_eval`, is less than or equal to the `mse_threshold` parameter. Otherwise, the pipeline execution fails and terminates. 
A `ConditionStep` enables pipelines to support conditional execution in the pipeline DAG based on the conditions of the step properties.\n", "\n", "In the following section, you:\n", "\n", "* Define a `ConditionLessThanOrEqualTo` on the MSE value found in the output of the evaluation step, `step_eval`.\n", "* Use the condition in the list of conditions in a `ConditionStep`.\n", "* Pass the two `ModelStep` steps (`step_create_model` and `step_register`) and the `TransformStep` into the `if_steps` of the `ConditionStep`; these run only if the condition evaluates to `True`.\n", "* Pass the `FailStep` into the `else_steps` of the `ConditionStep`; it runs only if the condition evaluates to `False`." ] }, { "cell_type": "code", "execution_count": 22, "id": "8969e1fc", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:57.300757Z", "iopub.status.busy": "2022-07-13T16:07:57.299973Z", "iopub.status.idle": "2022-07-13T16:07:57.367501Z", "shell.execute_reply": "2022-07-13T16:07:57.367888Z" }, "papermill": { "duration": 0.171087, "end_time": "2022-07-13T16:07:57.368032", "exception": false, "start_time": "2022-07-13T16:07:57.196945", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo\n", "from sagemaker.workflow.condition_step import ConditionStep\n", "from sagemaker.workflow.functions import JsonGet\n", "\n", "\n", "cond_lte = ConditionLessThanOrEqualTo(\n", " left=JsonGet(\n", " step_name=step_eval.name,\n", " property_file=evaluation_report,\n", " json_path=\"regression_metrics.mse.value\",\n", " ),\n", " right=mse_threshold,\n", ")\n", "\n", "step_cond = ConditionStep(\n", " name=\"AbaloneMSECond\",\n", " conditions=[cond_lte],\n", " if_steps=[step_register, step_create_model, step_transform],\n", " else_steps=[step_fail],\n", ")" ] }, { "cell_type": "markdown", "id": "772154ec", "metadata": { "papermill": { 
"duration": 0.098709, "end_time": "2022-07-13T16:07:57.564590", "exception": false, "start_time": "2022-07-13T16:07:57.465881", "status": "completed" }, "tags": [] }, "source": [ "![Define a Condition Step to Check Accuracy and Conditionally Execute Steps](https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/img/pipeline-6.png)" ] }, { "cell_type": "markdown", "id": "5cd8a7d1", "metadata": { "papermill": { "duration": 0.097983, "end_time": "2022-07-13T16:07:57.697262", "exception": false, "start_time": "2022-07-13T16:07:57.599279", "status": "completed" }, "tags": [] }, "source": [ "## Define a Pipeline of Parameters, Steps, and Conditions\n", "\n", "In this section, combine the steps into a Pipeline so it can be executed.\n", "\n", "A pipeline requires a `name`, `parameters`, and `steps`. Names must be unique within an `(account, region)` pair.\n", "\n", "Note:\n", "\n", "* All the parameters used in the definitions must be present.\n", "* Steps passed into the pipeline do not have to be listed in the order of execution. The SageMaker Pipelines service determines the execution order from the data dependency DAG.\n", "* Steps must be unique across the pipeline step list and all condition step if/else lists."
] }, { "cell_type": "code", "execution_count": 23, "id": "9ed7ea0f", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:57.977833Z", "iopub.status.busy": "2022-07-13T16:07:57.977020Z", "iopub.status.idle": "2022-07-13T16:07:57.996102Z", "shell.execute_reply": "2022-07-13T16:07:57.995674Z" }, "papermill": { "duration": 0.201248, "end_time": "2022-07-13T16:07:57.996214", "exception": false, "start_time": "2022-07-13T16:07:57.794966", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from sagemaker.workflow.pipeline import Pipeline\n", "\n", "\n", "pipeline_name = f\"AbalonePipeline\"\n", "pipeline = Pipeline(\n", " name=pipeline_name,\n", " parameters=[\n", " processing_instance_count,\n", " instance_type,\n", " model_approval_status,\n", " input_data,\n", " batch_data,\n", " mse_threshold,\n", " ],\n", " steps=[step_process, step_train, step_eval, step_cond],\n", ")" ] }, { "cell_type": "markdown", "id": "8bd7778c", "metadata": { "papermill": { "duration": 0.098305, "end_time": "2022-07-13T16:07:58.192290", "exception": false, "start_time": "2022-07-13T16:07:58.093985", "status": "completed" }, "tags": [] }, "source": [ "![Define a Pipeline of Parameters, Steps, and Conditions](https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/img/pipeline-7.png)" ] }, { "cell_type": "markdown", "id": "c1ae8658", "metadata": { "papermill": { "duration": 0.097728, "end_time": "2022-07-13T16:07:58.394347", "exception": false, "start_time": "2022-07-13T16:07:58.296619", "status": "completed" }, "tags": [] }, "source": [ "### (Optional) Examining the pipeline definition\n", "\n", "The JSON of the pipeline definition can be examined to confirm the pipeline is well-defined and the parameters and step properties resolve correctly." 
] }, { "cell_type": "code", "execution_count": 24, "id": "39a836b8", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:58.681323Z", "iopub.status.busy": "2022-07-13T16:07:58.680648Z", "iopub.status.idle": "2022-07-13T16:07:58.684073Z", "shell.execute_reply": "2022-07-13T16:07:58.684421Z" }, "papermill": { "duration": 0.18974, "end_time": "2022-07-13T16:07:58.684552", "exception": false, "start_time": "2022-07-13T16:07:58.494812", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'Version': '2020-12-01',\n", " 'Metadata': {},\n", " 'Parameters': [{'Name': 'ProcessingInstanceCount',\n", " 'Type': 'Integer',\n", " 'DefaultValue': 1},\n", " {'Name': 'TrainingInstanceType',\n", " 'Type': 'String',\n", " 'DefaultValue': 'ml.m5.xlarge'},\n", " {'Name': 'ModelApprovalStatus',\n", " 'Type': 'String',\n", " 'DefaultValue': 'PendingManualApproval'},\n", " {'Name': 'InputData',\n", " 'Type': 'String',\n", " 'DefaultValue': 's3://sagemaker-us-west-2-000000000000/abalone/abalone-dataset.csv'},\n", " {'Name': 'BatchData',\n", " 'Type': 'String',\n", " 'DefaultValue': 's3://sagemaker-us-west-2-000000000000/abalone/abalone-dataset-batch'},\n", " {'Name': 'MseThreshold', 'Type': 'Float', 'DefaultValue': 6.0}],\n", " 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},\n", " 'TrialName': {'Get': 'Execution.PipelineExecutionId'}},\n", " 'Steps': [{'Name': 'AbaloneProcess',\n", " 'Type': 'Processing',\n", " 'Arguments': {'ProcessingResources': {'ClusterConfig': {'InstanceType': 'ml.m5.xlarge',\n", " 'InstanceCount': {'Get': 'Parameters.ProcessingInstanceCount'},\n", " 'VolumeSizeInGB': 30}},\n", " 'AppSpecification': {'ImageUri': '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3',\n", " 'ContainerEntrypoint': ['python3',\n", " '/opt/ml/processing/input/code/preprocessing.py']},\n", " 'RoleArn': 
'arn:aws:iam::000000000000:role/SageMakerRole',\n", " 'ProcessingInputs': [{'InputName': 'input-1',\n", " 'AppManaged': False,\n", " 'S3Input': {'S3Uri': {'Get': 'Parameters.InputData'},\n", " 'LocalPath': '/opt/ml/processing/input',\n", " 'S3DataType': 'S3Prefix',\n", " 'S3InputMode': 'File',\n", " 'S3DataDistributionType': 'FullyReplicated',\n", " 'S3CompressionType': 'None'}},\n", " {'InputName': 'code',\n", " 'AppManaged': False,\n", " 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sklearn-abalone-process-2022-07-13-16-07-51-228/input/code/preprocessing.py',\n", " 'LocalPath': '/opt/ml/processing/input/code',\n", " 'S3DataType': 'S3Prefix',\n", " 'S3InputMode': 'File',\n", " 'S3DataDistributionType': 'FullyReplicated',\n", " 'S3CompressionType': 'None'}}],\n", " 'ProcessingOutputConfig': {'Outputs': [{'OutputName': 'train',\n", " 'AppManaged': False,\n", " 'S3Output': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sklearn-abalone-process-2022-07-13-16-07-51-228/output/train',\n", " 'LocalPath': '/opt/ml/processing/train',\n", " 'S3UploadMode': 'EndOfJob'}},\n", " {'OutputName': 'validation',\n", " 'AppManaged': False,\n", " 'S3Output': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sklearn-abalone-process-2022-07-13-16-07-51-228/output/validation',\n", " 'LocalPath': '/opt/ml/processing/validation',\n", " 'S3UploadMode': 'EndOfJob'}},\n", " {'OutputName': 'test',\n", " 'AppManaged': False,\n", " 'S3Output': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sklearn-abalone-process-2022-07-13-16-07-51-228/output/test',\n", " 'LocalPath': '/opt/ml/processing/test',\n", " 'S3UploadMode': 'EndOfJob'}}]}}},\n", " {'Name': 'AbaloneTrain',\n", " 'Type': 'Training',\n", " 'Arguments': {'AlgorithmSpecification': {'TrainingInputMode': 'File',\n", " 'TrainingImage': '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3'},\n", " 'OutputDataConfig': {'S3OutputPath': 's3://sagemaker-us-west-2-000000000000/AbaloneTrain'},\n", " 
'StoppingCondition': {'MaxRuntimeInSeconds': 86400},\n", " 'ResourceConfig': {'VolumeSizeInGB': 30,\n", " 'InstanceCount': 1,\n", " 'InstanceType': {'Get': 'Parameters.TrainingInstanceType'}},\n", " 'RoleArn': 'arn:aws:iam::000000000000:role/SageMakerRole',\n", " 'InputDataConfig': [{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',\n", " 'S3Uri': {'Get': \"Steps.AbaloneProcess.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri\"},\n", " 'S3DataDistributionType': 'FullyReplicated'}},\n", " 'ContentType': 'text/csv',\n", " 'ChannelName': 'train'},\n", " {'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',\n", " 'S3Uri': {'Get': \"Steps.AbaloneProcess.ProcessingOutputConfig.Outputs['validation'].S3Output.S3Uri\"},\n", " 'S3DataDistributionType': 'FullyReplicated'}},\n", " 'ContentType': 'text/csv',\n", " 'ChannelName': 'validation'}],\n", " 'HyperParameters': {'objective': 'reg:linear',\n", " 'num_round': '50',\n", " 'max_depth': '5',\n", " 'eta': '0.2',\n", " 'gamma': '4',\n", " 'min_child_weight': '6',\n", " 'subsample': '0.7'},\n", " 'ProfilerRuleConfigurations': [{'RuleConfigurationName': 'ProfilerReport-1657728471',\n", " 'RuleEvaluatorImage': '895741380848.dkr.ecr.us-west-2.amazonaws.com/sagemaker-debugger-rules:latest',\n", " 'RuleParameters': {'rule_to_invoke': 'ProfilerReport'}}],\n", " 'ProfilerConfig': {'S3OutputPath': 's3://sagemaker-us-west-2-000000000000/AbaloneTrain'}}},\n", " {'Name': 'AbaloneEval',\n", " 'Type': 'Processing',\n", " 'Arguments': {'ProcessingResources': {'ClusterConfig': {'InstanceType': 'ml.m5.xlarge',\n", " 'InstanceCount': 1,\n", " 'VolumeSizeInGB': 30}},\n", " 'AppSpecification': {'ImageUri': '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3',\n", " 'ContainerEntrypoint': ['python3',\n", " '/opt/ml/processing/input/code/evaluation.py']},\n", " 'RoleArn': 'arn:aws:iam::000000000000:role/SageMakerRole',\n", " 'ProcessingInputs': [{'InputName': 'input-1',\n", " 'AppManaged': False,\n", " 
'S3Input': {'S3Uri': {'Get': 'Steps.AbaloneTrain.ModelArtifacts.S3ModelArtifacts'},\n", " 'LocalPath': '/opt/ml/processing/model',\n", " 'S3DataType': 'S3Prefix',\n", " 'S3InputMode': 'File',\n", " 'S3DataDistributionType': 'FullyReplicated',\n", " 'S3CompressionType': 'None'}},\n", " {'InputName': 'input-2',\n", " 'AppManaged': False,\n", " 'S3Input': {'S3Uri': {'Get': \"Steps.AbaloneProcess.ProcessingOutputConfig.Outputs['test'].S3Output.S3Uri\"},\n", " 'LocalPath': '/opt/ml/processing/test',\n", " 'S3DataType': 'S3Prefix',\n", " 'S3InputMode': 'File',\n", " 'S3DataDistributionType': 'FullyReplicated',\n", " 'S3CompressionType': 'None'}},\n", " {'InputName': 'code',\n", " 'AppManaged': False,\n", " 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/script-abalone-eval-2022-07-13-16-07-53-265/input/code/evaluation.py',\n", " 'LocalPath': '/opt/ml/processing/input/code',\n", " 'S3DataType': 'S3Prefix',\n", " 'S3InputMode': 'File',\n", " 'S3DataDistributionType': 'FullyReplicated',\n", " 'S3CompressionType': 'None'}}],\n", " 'ProcessingOutputConfig': {'Outputs': [{'OutputName': 'evaluation',\n", " 'AppManaged': False,\n", " 'S3Output': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/script-abalone-eval-2022-07-13-16-07-53-265/output/evaluation',\n", " 'LocalPath': '/opt/ml/processing/evaluation',\n", " 'S3UploadMode': 'EndOfJob'}}]}},\n", " 'PropertyFiles': [{'PropertyFileName': 'EvaluationReport',\n", " 'OutputName': 'evaluation',\n", " 'FilePath': 'evaluation.json'}]},\n", " {'Name': 'AbaloneMSECond',\n", " 'Type': 'Condition',\n", " 'Arguments': {'Conditions': [{'Type': 'LessThanOrEqualTo',\n", " 'LeftValue': {'Std:JsonGet': {'PropertyFile': {'Get': 'Steps.AbaloneEval.PropertyFiles.EvaluationReport'},\n", " 'Path': 'regression_metrics.mse.value'}},\n", " 'RightValue': {'Get': 'Parameters.MseThreshold'}}],\n", " 'IfSteps': [{'Name': 'AbaloneRegisterModel-RegisterModel',\n", " 'Type': 'RegisterModel',\n", " 'Arguments': {'ModelPackageGroupName': 
'AbaloneModelPackageGroupName',\n", " 'ModelMetrics': {'ModelQuality': {'Statistics': {'ContentType': 'application/json',\n", " 'S3Uri': 's3://sagemaker-us-west-2-000000000000/script-abalone-eval-2022-07-13-16-07-53-265/output/evaluation/evaluation.json'}},\n", " 'Bias': {},\n", " 'Explainability': {}},\n", " 'InferenceSpecification': {'Containers': [{'Image': '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3',\n", " 'Environment': {},\n", " 'ModelDataUrl': {'Get': 'Steps.AbaloneTrain.ModelArtifacts.S3ModelArtifacts'}}],\n", " 'SupportedContentTypes': ['text/csv'],\n", " 'SupportedResponseMIMETypes': ['text/csv'],\n", " 'SupportedRealtimeInferenceInstanceTypes': ['ml.t2.medium',\n", " 'ml.m5.xlarge'],\n", " 'SupportedTransformInstanceTypes': ['ml.m5.xlarge']},\n", " 'ModelApprovalStatus': {'Get': 'Parameters.ModelApprovalStatus'}}},\n", " {'Name': 'AbaloneCreateModel-CreateModel',\n", " 'Type': 'Model',\n", " 'Arguments': {'ExecutionRoleArn': 'arn:aws:iam::000000000000:role/SageMakerRole',\n", " 'PrimaryContainer': {'Image': '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3',\n", " 'Environment': {},\n", " 'ModelDataUrl': {'Get': 'Steps.AbaloneTrain.ModelArtifacts.S3ModelArtifacts'}}}},\n", " {'Name': 'AbaloneTransform',\n", " 'Type': 'Transform',\n", " 'Arguments': {'ModelName': {'Get': 'Steps.AbaloneCreateModel-CreateModel.ModelName'},\n", " 'TransformInput': {'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',\n", " 'S3Uri': {'Get': 'Parameters.BatchData'}}}},\n", " 'TransformOutput': {'S3OutputPath': 's3://sagemaker-us-west-2-000000000000/AbaloneTransform'},\n", " 'TransformResources': {'InstanceCount': 1,\n", " 'InstanceType': 'ml.m5.xlarge'}}}],\n", " 'ElseSteps': [{'Name': 'AbaloneMSEFail',\n", " 'Type': 'Fail',\n", " 'Arguments': {'ErrorMessage': {'Std:Join': {'On': ' ',\n", " 'Values': ['Execution failed due to MSE >',\n", " {'Get': 'Parameters.MseThreshold'}]}}}}]}}]}" ] }, "execution_count": 
24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "\n", "\n", "definition = json.loads(pipeline.definition())\n", "definition" ] }, { "cell_type": "markdown", "id": "0b396c63", "metadata": { "papermill": { "duration": 0.093902, "end_time": "2022-07-13T16:07:58.880484", "exception": false, "start_time": "2022-07-13T16:07:58.786582", "status": "completed" }, "tags": [] }, "source": [ "## Submit the pipeline to SageMaker and start execution\n", "\n", "Submit the pipeline definition to the Pipeline service. The Pipeline service uses the role that is passed in to create all the jobs defined in the steps." ] }, { "cell_type": "code", "execution_count": 25, "id": "944bcafb", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:07:59.093264Z", "iopub.status.busy": "2022-07-13T16:07:59.092600Z", "iopub.status.idle": "2022-07-13T16:08:00.011312Z", "shell.execute_reply": "2022-07-13T16:08:00.011954Z" }, "papermill": { "duration": 1.022416, "end_time": "2022-07-13T16:08:00.012097", "exception": false, "start_time": "2022-07-13T16:07:58.989681", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'PipelineArn': 'arn:aws:sagemaker:us-west-2:000000000000:pipeline/abalonepipeline',\n", " 'ResponseMetadata': {'RequestId': 'dcee2b6f-839c-4493-bc40-26beda53949e',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'x-amzn-requestid': 'dcee2b6f-839c-4493-bc40-26beda53949e',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '83',\n", " 'date': 'Wed, 13 Jul 2022 16:07:59 GMT'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipeline.upsert(role_arn=role)" ] }, { "cell_type": "markdown", "id": "7c7133c2", "metadata": { "papermill": { "duration": 0.03559, "end_time": "2022-07-13T16:08:00.083359", "exception": false, "start_time": "2022-07-13T16:08:00.047769", "status": "completed" 
}, "tags": [] }, "source": [ "Start the pipeline and accept all the default parameters." ] }, { "cell_type": "code", "execution_count": 26, "id": "cde6f323", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:08:00.168396Z", "iopub.status.busy": "2022-07-13T16:08:00.167738Z", "iopub.status.idle": "2022-07-13T16:08:00.468960Z", "shell.execute_reply": "2022-07-13T16:08:00.469840Z" }, "papermill": { "duration": 0.346706, "end_time": "2022-07-13T16:08:00.470006", "exception": false, "start_time": "2022-07-13T16:08:00.123300", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "execution = pipeline.start()" ] }, { "cell_type": "markdown", "id": "2a608977", "metadata": { "papermill": { "duration": 0.040044, "end_time": "2022-07-13T16:08:00.552015", "exception": false, "start_time": "2022-07-13T16:08:00.511971", "status": "completed" }, "tags": [] }, "source": [ "## Pipeline Operations: Examining and Waiting for Pipeline Execution\n", "\n", "Describe the pipeline execution." 
] }, { "cell_type": "code", "execution_count": 27, "id": "7140e583", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:08:00.638590Z", "iopub.status.busy": "2022-07-13T16:08:00.638043Z", "iopub.status.idle": "2022-07-13T16:08:00.979910Z", "shell.execute_reply": "2022-07-13T16:08:00.980329Z" }, "papermill": { "duration": 0.388899, "end_time": "2022-07-13T16:08:00.980580", "exception": false, "start_time": "2022-07-13T16:08:00.591681", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'PipelineArn': 'arn:aws:sagemaker:us-west-2:000000000000:pipeline/abalonepipeline',\n", " 'PipelineExecutionArn': 'arn:aws:sagemaker:us-west-2:000000000000:pipeline/abalonepipeline/execution/d84ewltjmxhe',\n", " 'PipelineExecutionDisplayName': 'execution-1657728480421',\n", " 'PipelineExecutionStatus': 'Executing',\n", " 'CreationTime': datetime.datetime(2022, 7, 13, 16, 8, 0, 354000, tzinfo=tzlocal()),\n", " 'LastModifiedTime': datetime.datetime(2022, 7, 13, 16, 8, 0, 354000, tzinfo=tzlocal()),\n", " 'CreatedBy': {},\n", " 'LastModifiedBy': {},\n", " 'ResponseMetadata': {'RequestId': '04d9849a-0f5d-40f7-88d9-30e87f51908a',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'x-amzn-requestid': '04d9849a-0f5d-40f7-88d9-30e87f51908a',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '395',\n", " 'date': 'Wed, 13 Jul 2022 16:08:00 GMT'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "execution.describe()" ] }, { "cell_type": "markdown", "id": "d548cb82", "metadata": { "papermill": { "duration": 0.058959, "end_time": "2022-07-13T16:08:01.088218", "exception": false, "start_time": "2022-07-13T16:08:01.029259", "status": "completed" }, "tags": [] }, "source": [ "Wait for the execution to complete." 
] }, { "cell_type": "code", "execution_count": 28, "id": "97251833", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:08:01.164376Z", "iopub.status.busy": "2022-07-13T16:08:01.163859Z", "iopub.status.idle": "2022-07-13T16:27:11.118707Z", "shell.execute_reply": "2022-07-13T16:27:11.117917Z" }, "papermill": { "duration": 1149.995026, "end_time": "2022-07-13T16:27:11.118837", "exception": false, "start_time": "2022-07-13T16:08:01.123811", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "execution.wait()" ] }, { "cell_type": "markdown", "id": "a4dc0b89", "metadata": { "papermill": { "duration": 0.038587, "end_time": "2022-07-13T16:27:11.202554", "exception": false, "start_time": "2022-07-13T16:27:11.163967", "status": "completed" }, "tags": [] }, "source": [ "List the steps in the execution. These are the steps in the pipeline that have been resolved by the step executor service." ] }, { "cell_type": "code", "execution_count": 29, "id": "6293013c", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:27:11.287922Z", "iopub.status.busy": "2022-07-13T16:27:11.287440Z", "iopub.status.idle": "2022-07-13T16:27:11.552799Z", "shell.execute_reply": "2022-07-13T16:27:11.553188Z" }, "papermill": { "duration": 0.311444, "end_time": "2022-07-13T16:27:11.553327", "exception": false, "start_time": "2022-07-13T16:27:11.241883", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[{'StepName': 'AbaloneTransform',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 21, 28, 497000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 26, 41, 240000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'TransformJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:transform-job/pipelines-d84ewltjmxhe-abalonetransform-hjf5dpst8n'}}},\n", " {'StepName': 
'AbaloneRegisterModel-RegisterModel',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 21, 27, 227000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 21, 27, 908000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:model-package/abalonemodelpackagegroupname/1'}}},\n", " {'StepName': 'AbaloneCreateModel-CreateModel',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 21, 27, 227000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 21, 28, 88000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:model/pipelines-d84ewltjmxhe-abalonecreatemodel-c-dgvk2pwxwn'}}},\n", " {'StepName': 'AbaloneMSECond',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 21, 25, 972000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 21, 26, 697000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'Condition': {'Outcome': 'True'}}},\n", " {'StepName': 'AbaloneEval',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 16, 42, 221000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 21, 25, 344000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:processing-job/pipelines-d84ewltjmxhe-abaloneeval-gq8u48ijdl'}}},\n", " {'StepName': 'AbaloneTrain',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 13, 8, 547000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 16, 41, 447000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'TrainingJob': {'Arn': 
'arn:aws:sagemaker:us-west-2:000000000000:training-job/pipelines-d84ewltjmxhe-abalonetrain-3kyzvpyx11'}}},\n", " {'StepName': 'AbaloneProcess',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 8, 2, 216000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 13, 7, 881000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:processing-job/pipelines-d84ewltjmxhe-abaloneprocess-alx6e7utl3'}}}]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "execution.list_steps()" ] }, { "cell_type": "markdown", "id": "f1b6e6aa", "metadata": { "papermill": { "duration": 0.043169, "end_time": "2022-07-13T16:27:11.634677", "exception": false, "start_time": "2022-07-13T16:27:11.591508", "status": "completed" }, "tags": [] }, "source": [ "### Examining the Evaluation\n", "\n", "Examine the resulting model evaluation after the pipeline completes. Download the resulting `evaluation.json` file from S3 and print the report." 
] }, { "cell_type": "code", "execution_count": 30, "id": "f151e408", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:27:11.740982Z", "iopub.status.busy": "2022-07-13T16:27:11.740105Z", "iopub.status.idle": "2022-07-13T16:27:12.193225Z", "shell.execute_reply": "2022-07-13T16:27:12.192820Z" }, "papermill": { "duration": 0.514239, "end_time": "2022-07-13T16:27:12.193344", "exception": false, "start_time": "2022-07-13T16:27:11.679105", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'regression_metrics': {'mse': {'standard_deviation': 2.2157255951663437,\n", " 'value': 4.913856830660247}}}\n" ] } ], "source": [ "from pprint import pprint\n", "\n", "\n", "evaluation_json = sagemaker.s3.S3Downloader.read_file(\n", " \"{}/evaluation.json\".format(\n", " step_eval.arguments[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\"S3Uri\"]\n", " )\n", ")\n", "pprint(json.loads(evaluation_json))" ] }, { "cell_type": "markdown", "id": "08eec00d", "metadata": { "papermill": { "duration": 0.036632, "end_time": "2022-07-13T16:27:12.266645", "exception": false, "start_time": "2022-07-13T16:27:12.230013", "status": "completed" }, "tags": [] }, "source": [ "### Lineage\n", "\n", "Review the lineage of the artifacts generated by the pipeline." 
] }, { "cell_type": "code", "execution_count": 31, "id": "ac629990", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:27:12.347880Z", "iopub.status.busy": "2022-07-13T16:27:12.347015Z", "iopub.status.idle": "2022-07-13T16:27:49.232675Z", "shell.execute_reply": "2022-07-13T16:27:49.233495Z" }, "papermill": { "duration": 36.928591, "end_time": "2022-07-13T16:27:49.233763", "exception": false, "start_time": "2022-07-13T16:27:12.305172", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'StepName': 'AbaloneProcess', 'StartTime': datetime.datetime(2022, 7, 13, 16, 8, 2, 216000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2022, 7, 13, 16, 13, 7, 881000, tzinfo=tzlocal()), 'StepStatus': 'Succeeded', 'AttemptCount': 0, 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:processing-job/pipelines-d84ewltjmxhe-abaloneprocess-alx6e7utl3'}}}\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name/SourceDirectionTypeAssociation TypeLineage Type
0s3://...16-07-51-228/input/code/preprocessing.pyInputDataSetContributedToartifact
1s3://...000000000000/abalone/abalone-dataset.csvInputDataSetContributedToartifact
224661...om/sagemaker-scikit-learn:0.23-1-cpu-py3InputImageContributedToartifact
3s3://...cess-2022-07-13-16-07-51-228/output/testOutputDataSetProducedartifact
4s3://...022-07-13-16-07-51-228/output/validationOutputDataSetProducedartifact
5s3://...ess-2022-07-13-16-07-51-228/output/trainOutputDataSetProducedartifact
\n", "
" ], "text/plain": [ " Name/Source Direction Type \\\n", "0 s3://...16-07-51-228/input/code/preprocessing.py Input DataSet \n", "1 s3://...000000000000/abalone/abalone-dataset.csv Input DataSet \n", "2 24661...om/sagemaker-scikit-learn:0.23-1-cpu-py3 Input Image \n", "3 s3://...cess-2022-07-13-16-07-51-228/output/test Output DataSet \n", "4 s3://...022-07-13-16-07-51-228/output/validation Output DataSet \n", "5 s3://...ess-2022-07-13-16-07-51-228/output/train Output DataSet \n", "\n", " Association Type Lineage Type \n", "0 ContributedTo artifact \n", "1 ContributedTo artifact \n", "2 ContributedTo artifact \n", "3 Produced artifact \n", "4 Produced artifact \n", "5 Produced artifact " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "{'StepName': 'AbaloneTrain', 'StartTime': datetime.datetime(2022, 7, 13, 16, 13, 8, 547000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2022, 7, 13, 16, 16, 41, 447000, tzinfo=tzlocal()), 'StepStatus': 'Succeeded', 'AttemptCount': 0, 'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:training-job/pipelines-d84ewltjmxhe-abalonetrain-3kyzvpyx11'}}}\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name/SourceDirectionTypeAssociation TypeLineage Type
0s3://...022-07-13-16-07-51-228/output/validationInputDataSetContributedToartifact
1s3://...ess-2022-07-13-16-07-51-228/output/trainInputDataSetContributedToartifact
224661...naws.com/sagemaker-xgboost:1.0-1-cpu-py3InputImageContributedToartifact
3s3://...loneTrain-3kYZvpYx11/output/model.tar.gzOutputModelProducedartifact
\n", "
" ], "text/plain": [ " Name/Source Direction Type \\\n", "0 s3://...022-07-13-16-07-51-228/output/validation Input DataSet \n", "1 s3://...ess-2022-07-13-16-07-51-228/output/train Input DataSet \n", "2 24661...naws.com/sagemaker-xgboost:1.0-1-cpu-py3 Input Image \n", "3 s3://...loneTrain-3kYZvpYx11/output/model.tar.gz Output Model \n", "\n", " Association Type Lineage Type \n", "0 ContributedTo artifact \n", "1 ContributedTo artifact \n", "2 ContributedTo artifact \n", "3 Produced artifact " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "{'StepName': 'AbaloneEval', 'StartTime': datetime.datetime(2022, 7, 13, 16, 16, 42, 221000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2022, 7, 13, 16, 21, 25, 344000, tzinfo=tzlocal()), 'StepStatus': 'Succeeded', 'AttemptCount': 0, 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:processing-job/pipelines-d84ewltjmxhe-abaloneeval-gq8u48ijdl'}}}\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name/SourceDirectionTypeAssociation TypeLineage Type
0s3://...13-16-07-53-265/input/code/evaluation.pyInputDataSetContributedToartifact
1s3://...cess-2022-07-13-16-07-51-228/output/testInputDataSetContributedToartifact
2s3://...loneTrain-3kYZvpYx11/output/model.tar.gzInputModelContributedToartifact
324661...naws.com/sagemaker-xgboost:1.0-1-cpu-py3InputImageContributedToartifact
4s3://...022-07-13-16-07-53-265/output/evaluationOutputDataSetProducedartifact
\n", "
" ], "text/plain": [ " Name/Source Direction Type \\\n", "0 s3://...13-16-07-53-265/input/code/evaluation.py Input DataSet \n", "1 s3://...cess-2022-07-13-16-07-51-228/output/test Input DataSet \n", "2 s3://...loneTrain-3kYZvpYx11/output/model.tar.gz Input Model \n", "3 24661...naws.com/sagemaker-xgboost:1.0-1-cpu-py3 Input Image \n", "4 s3://...022-07-13-16-07-53-265/output/evaluation Output DataSet \n", "\n", " Association Type Lineage Type \n", "0 ContributedTo artifact \n", "1 ContributedTo artifact \n", "2 ContributedTo artifact \n", "3 ContributedTo artifact \n", "4 Produced artifact " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "{'StepName': 'AbaloneMSECond', 'StartTime': datetime.datetime(2022, 7, 13, 16, 21, 25, 972000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2022, 7, 13, 16, 21, 26, 697000, tzinfo=tzlocal()), 'StepStatus': 'Succeeded', 'AttemptCount': 0, 'Metadata': {'Condition': {'Outcome': 'True'}}}\n" ] }, { "data": { "text/plain": [ "None" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "{'StepName': 'AbaloneCreateModel-CreateModel', 'StartTime': datetime.datetime(2022, 7, 13, 16, 21, 27, 227000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2022, 7, 13, 16, 21, 28, 88000, tzinfo=tzlocal()), 'StepStatus': 'Succeeded', 'AttemptCount': 0, 'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:model/pipelines-d84ewltjmxhe-abalonecreatemodel-c-dgvk2pwxwn'}}}\n" ] }, { "data": { "text/plain": [ "None" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "{'StepName': 'AbaloneRegisterModel-RegisterModel', 'StartTime': datetime.datetime(2022, 7, 13, 16, 21, 27, 227000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2022, 7, 13, 16, 21, 27, 908000, tzinfo=tzlocal()), 'StepStatus': 'Succeeded', 'AttemptCount': 0, 'Metadata': {'RegisterModel': {'Arn': 
'arn:aws:sagemaker:us-west-2:000000000000:model-package/abalonemodelpackagegroupname/1'}}}\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name/SourceDirectionTypeAssociation TypeLineage Type
0s3://...loneTrain-3kYZvpYx11/output/model.tar.gzInputModelContributedToartifact
124661...naws.com/sagemaker-xgboost:1.0-1-cpu-py3InputImageContributedToartifact
2abalonemodelpackagegroupname-1-PendingManualAp...InputApprovalContributedToaction
3AbaloneModelPackageGroupName-1657729287-aws-mo...OutputModelGroupAssociatedWithcontext
\n", "
" ], "text/plain": [ " Name/Source Direction Type \\\n", "0 s3://...loneTrain-3kYZvpYx11/output/model.tar.gz Input Model \n", "1 24661...naws.com/sagemaker-xgboost:1.0-1-cpu-py3 Input Image \n", "2 abalonemodelpackagegroupname-1-PendingManualAp... Input Approval \n", "3 AbaloneModelPackageGroupName-1657729287-aws-mo... Output ModelGroup \n", "\n", " Association Type Lineage Type \n", "0 ContributedTo artifact \n", "1 ContributedTo artifact \n", "2 ContributedTo action \n", "3 AssociatedWith context " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "{'StepName': 'AbaloneTransform', 'StartTime': datetime.datetime(2022, 7, 13, 16, 21, 28, 497000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2022, 7, 13, 16, 26, 41, 240000, tzinfo=tzlocal()), 'StepStatus': 'Succeeded', 'AttemptCount': 0, 'Metadata': {'TransformJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:transform-job/pipelines-d84ewltjmxhe-abalonetransform-hjf5dpst8n'}}}\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name/SourceDirectionTypeAssociation TypeLineage Type
0s3://...loneTrain-3kYZvpYx11/output/model.tar.gzInputModelContributedToartifact
124661...naws.com/sagemaker-xgboost:1.0-1-cpu-py3InputImageContributedToartifact
2s3://...1695447989/abalone/abalone-dataset-batchInputDataSetContributedToartifact
3s3://...-us-west-2-000000000000/AbaloneTransformOutputDataSetProducedartifact
\n", "
" ], "text/plain": [ " Name/Source Direction Type \\\n", "0 s3://...loneTrain-3kYZvpYx11/output/model.tar.gz Input Model \n", "1 24661...naws.com/sagemaker-xgboost:1.0-1-cpu-py3 Input Image \n", "2 s3://...1695447989/abalone/abalone-dataset-batch Input DataSet \n", "3 s3://...-us-west-2-000000000000/AbaloneTransform Output DataSet \n", "\n", " Association Type Lineage Type \n", "0 ContributedTo artifact \n", "1 ContributedTo artifact \n", "2 ContributedTo artifact \n", "3 Produced artifact " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import time\n", "from sagemaker.lineage.visualizer import LineageTableVisualizer\n", "\n", "\n", "viz = LineageTableVisualizer(sagemaker.session.Session())\n", "for execution_step in reversed(execution.list_steps()):\n", " print(execution_step)\n", " display(viz.show(pipeline_execution_step=execution_step))\n", " time.sleep(5)" ] }, { "cell_type": "markdown", "id": "76c5a401", "metadata": { "papermill": { "duration": 0.045193, "end_time": "2022-07-13T16:27:49.331609", "exception": false, "start_time": "2022-07-13T16:27:49.286416", "status": "completed" }, "tags": [] }, "source": [ "### Parametrized Executions\n", "\n", "You can run additional executions of the pipeline and specify different pipeline parameters. The `parameters` argument is a dictionary containing parameter names, and where the values are used to override the defaults values.\n", "\n", "Based on the performance of the model, you might want to kick off another pipeline execution on a compute-optimized instance type and set the model approval status to \"Approved\" automatically. This means that the model package version generated by the `RegisterModel` step is automatically ready for deployment through CI/CD pipelines, such as with SageMaker Projects." 
] }, { "cell_type": "code", "execution_count": 32, "id": "78b12dc2", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:27:49.433540Z", "iopub.status.busy": "2022-07-13T16:27:49.432536Z", "iopub.status.idle": "2022-07-13T16:27:50.039997Z", "shell.execute_reply": "2022-07-13T16:27:50.040356Z" }, "papermill": { "duration": 0.660009, "end_time": "2022-07-13T16:27:50.040517", "exception": false, "start_time": "2022-07-13T16:27:49.380508", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "execution = pipeline.start(\n", " parameters=dict(\n", " ModelApprovalStatus=\"Approved\",\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 33, "id": "c053eb92", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:27:50.127383Z", "iopub.status.busy": "2022-07-13T16:27:50.126899Z", "iopub.status.idle": "2022-07-13T16:45:59.346748Z", "shell.execute_reply": "2022-07-13T16:45:59.346269Z" }, "papermill": { "duration": 1089.265129, "end_time": "2022-07-13T16:45:59.346876", "exception": false, "start_time": "2022-07-13T16:27:50.081747", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "execution.wait()" ] }, { "cell_type": "code", "execution_count": 34, "id": "e498d8a3", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:45:59.460546Z", "iopub.status.busy": "2022-07-13T16:45:59.459990Z", "iopub.status.idle": "2022-07-13T16:45:59.682468Z", "shell.execute_reply": "2022-07-13T16:45:59.682907Z" }, "papermill": { "duration": 0.290742, "end_time": "2022-07-13T16:45:59.683052", "exception": false, "start_time": "2022-07-13T16:45:59.392310", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[{'StepName': 'AbaloneTransform',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 40, 37, 410000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 45, 35, 417000, 
tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'TransformJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:transform-job/pipelines-ib5u1ar77ell-abalonetransform-ybfqj1idhf'}}},\n", " {'StepName': 'AbaloneRegisterModel-RegisterModel',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 40, 35, 659000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 40, 36, 745000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:model-package/abalonemodelpackagegroupname/2'}}},\n", " {'StepName': 'AbaloneCreateModel-CreateModel',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 40, 35, 659000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 40, 36, 840000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:model/pipelines-ib5u1ar77ell-abalonecreatemodel-c-z2inp9am2j'}}},\n", " {'StepName': 'AbaloneMSECond',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 40, 34, 694000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 40, 34, 978000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'Condition': {'Outcome': 'True'}}},\n", " {'StepName': 'AbaloneEval',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 35, 44, 913000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 40, 34, 151000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:processing-job/pipelines-ib5u1ar77ell-abaloneeval-d1olj9t5kb'}}},\n", " {'StepName': 'AbaloneTrain',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 32, 39, 527000, tzinfo=tzlocal()),\n", " 'EndTime': 
datetime.datetime(2022, 7, 13, 16, 35, 43, 780000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:training-job/pipelines-ib5u1ar77ell-abalonetrain-hqep8kakhv'}}},\n", " {'StepName': 'AbaloneProcess',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 27, 50, 852000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 32, 38, 578000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:processing-job/pipelines-ib5u1ar77ell-abaloneprocess-pfwnwtmeef'}}}]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "execution.list_steps()" ] }, { "cell_type": "markdown", "id": "c1f9631d", "metadata": { "papermill": { "duration": 0.043074, "end_time": "2022-07-13T16:45:59.771340", "exception": false, "start_time": "2022-07-13T16:45:59.728266", "status": "completed" }, "tags": [] }, "source": [ "Apart from that, you might also want to adjust the MSE threshold to a smaller value and raise the bar for the accuracy of the registered model. 
In this case you can override the MSE threshold like the following:" ] }, { "cell_type": "code", "execution_count": 35, "id": "ff7f6820", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:45:59.870874Z", "iopub.status.busy": "2022-07-13T16:45:59.869936Z", "iopub.status.idle": "2022-07-13T16:46:00.219548Z", "shell.execute_reply": "2022-07-13T16:46:00.219917Z" }, "papermill": { "duration": 0.403974, "end_time": "2022-07-13T16:46:00.220070", "exception": false, "start_time": "2022-07-13T16:45:59.816096", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "execution = pipeline.start(parameters=dict(MseThreshold=3.0))" ] }, { "cell_type": "markdown", "id": "c07f95d7", "metadata": { "papermill": { "duration": 0.078875, "end_time": "2022-07-13T16:46:00.342112", "exception": false, "start_time": "2022-07-13T16:46:00.263237", "status": "completed" }, "tags": [] }, "source": [ "If the MSE threshold is not satisfied, the pipeline execution enters the `FailStep` and is marked as failed." 
] }, { "cell_type": "code", "execution_count": 36, "id": "1f244f42", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:46:00.433733Z", "iopub.status.busy": "2022-07-13T16:46:00.433177Z", "iopub.status.idle": "2022-07-13T16:59:07.315832Z", "shell.execute_reply": "2022-07-13T16:59:07.316625Z" }, "papermill": { "duration": 786.932424, "end_time": "2022-07-13T16:59:07.316785", "exception": false, "start_time": "2022-07-13T16:46:00.384361", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Waiter PipelineExecutionComplete failed: Waiter encountered a terminal failure state: For expression \"PipelineExecutionStatus\" we matched expected path: \"Failed\"\n" ] } ], "source": [ "try:\n", " execution.wait()\n", "except Exception as error:\n", " print(error)" ] }, { "cell_type": "code", "execution_count": 37, "id": "62081cff", "metadata": { "execution": { "iopub.execute_input": "2022-07-13T16:59:07.436018Z", "iopub.status.busy": "2022-07-13T16:59:07.435161Z", "iopub.status.idle": "2022-07-13T16:59:07.610216Z", "shell.execute_reply": "2022-07-13T16:59:07.610648Z" }, "papermill": { "duration": 0.238686, "end_time": "2022-07-13T16:59:07.610792", "exception": false, "start_time": "2022-07-13T16:59:07.372106", "status": "completed" }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[{'StepName': 'AbaloneMSEFail',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 58, 50, 492000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 58, 51, 63000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Failed',\n", " 'AttemptCount': 0,\n", " 'FailureReason': 'Execution failed due to MSE > 3.0',\n", " 'Metadata': {'Fail': {'ErrorMessage': 'Execution failed due to MSE > 3.0'}}},\n", " {'StepName': 'AbaloneMSECond',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 58, 49, 539000, tzinfo=tzlocal()),\n", " 'EndTime': 
datetime.datetime(2022, 7, 13, 16, 58, 49, 926000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'Condition': {'Outcome': 'False'}}},\n", " {'StepName': 'AbaloneEval',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 54, 6, 136000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 58, 49, 30000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:processing-job/pipelines-hswk7n1v1oju-abaloneeval-z6banrjbgd'}}},\n", " {'StepName': 'AbaloneTrain',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 50, 55, 273000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 54, 5, 654000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:training-job/pipelines-hswk7n1v1oju-abalonetrain-ig59hd6pjy'}}},\n", " {'StepName': 'AbaloneProcess',\n", " 'StartTime': datetime.datetime(2022, 7, 13, 16, 46, 1, 213000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 7, 13, 16, 50, 54, 782000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-west-2:000000000000:processing-job/pipelines-hswk7n1v1oju-abaloneprocess-npvk1jznbk'}}}]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "execution.list_steps()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-pipelines|tabular|abalone_build_train_deploy|sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.ipynb)\n" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "papermill": { "default_parameters": {}, "duration": 3085.109473, "end_time": "2022-07-13T16:59:08.278091", "environment_variables": {}, "exception": null, "input_path": "sagemaker-pipelines-preprocess-train-evaluate-batch-transform.ipynb", "output_path": "/opt/ml/processing/output/sagemaker-pipelines-preprocess-train-evaluate-batch-transform-2022-07-13-15-54-21.ipynb", "parameters": { "kms_key": "arn:aws:kms:us-west-2:000000000000:1234abcd-12ab-34cd-56ef-1234567890ab" }, "start_time": "2022-07-13T16:07:43.168618", "version": "2.3.4" } }, "nbformat": 4, "nbformat_minor": 5 }