{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Build a machine learning workflow using Step Functions and SageMaker\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "1. [Introduction](#Introduction)\n", "1. [Setup](#Setup)\n", "1. [Build a machine learning workflow](#Build-a-machine-learning-workflow)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "This notebook describes using the AWS Step Functions Data Science SDK to create and manage workflows. The Step Functions SDK is an open source library that allows data scientists to easily create and execute machine learning workflows using AWS Step Functions and Amazon SageMaker. For more information, see the following.\n", "* [AWS Step Functions](https://aws.amazon.com/step-functions/)\n", "* [AWS Step Functions Developer Guide](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html)\n", "* [AWS Step Functions Data Science SDK](https://aws-step-functions-data-science-sdk.readthedocs.io)\n", "\n", "In this notebook we will use the SDK to create steps, link them together to create a workflow, and execute the workflow in AWS Step Functions. The first tutorial shows how to create an ML pipeline workflow, and the second shows how to run multiple experiments in parallel." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import sys\n", "\n", "!{sys.executable} -m pip install --upgrade stepfunctions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "### Add a policy to your SageMaker role in IAM\n", "\n", "**If you are running this notebook on an Amazon SageMaker notebook instance**, the IAM role assumed by your notebook instance needs permission to create and run workflows in AWS Step Functions. To provide this permission to the role, do the following.\n", "\n", "1. Open the Amazon [SageMaker console](https://console.aws.amazon.com/sagemaker/). \n", "2. Select **Notebook instances** and choose the name of your notebook instance\n", "3. Under **Permissions and encryption** select the role ARN to view the role on the IAM console\n", "4. Choose **Attach policies** and search for `AWSStepFunctionsFullAccess`.\n", "5. Select the check box next to `AWSStepFunctionsFullAccess` and choose **Attach policy**\n", "\n", "If you are running this notebook in a local environment, the SDK will use your configured AWS CLI configuration. For more information, see [Configuring the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).\n", "\n", "Next, create an execution role in IAM for Step Functions. \n", "\n", "### Create an execution role for Step Functions\n", "\n", "You need an execution role so that you can create and execute workflows in Step Functions.\n", "\n", "1. Go to the [IAM console](https://console.aws.amazon.com/iam/)\n", "2. Select **Roles** and then **Create role**.\n", "3. 
Under **Choose the service that will use this role** select **Step Functions**\n", "4. Choose **Next** until you can enter a **Role name**\n", "5. Enter a name such as `AmazonSageMaker-StepFunctionsWorkflowExecutionRole` and then select **Create role**\n", "\n", "\n", "Attach a policy to the role you created. The following steps attach a policy that provides full access to Step Functions, however as a good practice you should only provide access to the resources you need. \n", "\n", "1. Under the **Permissions** tab, click **Add inline policy**\n", "2. Enter the following in the **JSON** tab\n", "\n", "```json\n", "{\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Action\": [\n", " \"sagemaker:CreateTransformJob\",\n", " \"sagemaker:DescribeTransformJob\",\n", " \"sagemaker:StopTransformJob\",\n", " \"sagemaker:CreateTrainingJob\",\n", " \"sagemaker:DescribeTrainingJob\",\n", " \"sagemaker:StopTrainingJob\",\n", " \"sagemaker:CreateHyperParameterTuningJob\",\n", " \"sagemaker:DescribeHyperParameterTuningJob\",\n", " \"sagemaker:StopHyperParameterTuningJob\",\n", " \"sagemaker:CreateModel\",\n", " \"sagemaker:CreateEndpointConfig\",\n", " \"sagemaker:CreateEndpoint\",\n", " \"sagemaker:DeleteEndpointConfig\",\n", " \"sagemaker:DeleteEndpoint\",\n", " \"sagemaker:UpdateEndpoint\",\n", " \"sagemaker:ListTags\",\n", " \"lambda:InvokeFunction\",\n", " \"sqs:SendMessage\",\n", " \"sns:Publish\",\n", " \"ecs:RunTask\",\n", " \"ecs:StopTask\",\n", " \"ecs:DescribeTasks\",\n", " \"dynamodb:GetItem\",\n", " \"dynamodb:PutItem\",\n", " \"dynamodb:UpdateItem\",\n", " \"dynamodb:DeleteItem\",\n", " \"batch:SubmitJob\",\n", " \"batch:DescribeJobs\",\n", " \"batch:TerminateJob\",\n", " \"glue:StartJobRun\",\n", " \"glue:GetJobRun\",\n", " \"glue:GetJobRuns\",\n", " \"glue:BatchStopJobRun\"\n", " ],\n", " \"Resource\": \"*\"\n", " },\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Action\": [\n", " \"iam:PassRole\"\n", " ],\n", " \"Resource\": \"*\",\n", " \"Condition\": {\n", " \"StringEquals\": {\n", " \"iam:PassedToService\": \"sagemaker.amazonaws.com\"\n", " }\n", " }\n", " },\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Action\": [\n", " \"events:PutTargets\",\n", " \"events:PutRule\",\n", " \"events:DescribeRule\"\n", " ],\n", " \"Resource\": [\n", " \"arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTrainingJobsRule\",\n", " \"arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTransformJobsRule\",\n", " \"arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTuningJobsRule\",\n", " \"arn:aws:events:*:*:rule/StepFunctionsGetEventsForECSTaskRule\",\n", " \"arn:aws:events:*:*:rule/StepFunctionsGetEventsForBatchJobsRule\"\n", " ]\n", " }\n", " ]\n", "}\n", "```\n", "\n", "3. Choose **Review policy** and give the policy a name such as `AmazonSageMaker-StepFunctionsWorkflowExecutionPolicy`\n", "4. Choose **Create policy**. You will be redirected to the details page for the role.\n", "5. 
Copy the **Role ARN** at the top of the **Summary**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure execution roles" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import sagemaker\n", "\n", "# SageMaker Execution Role\n", "# You can use sagemaker.get_execution_role() if running inside a SageMaker notebook instance\n", "sagemaker_execution_role = (\n", "    sagemaker.get_execution_role()\n", ")  # Replace with your role ARN if not in an Amazon SageMaker notebook\n", "\n", "# Paste the AmazonSageMaker-StepFunctionsWorkflowExecutionRole ARN from above\n", "workflow_execution_role = \"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import the required modules" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "import time\n", "import random\n", "import uuid\n", "import logging\n", "import stepfunctions\n", "import io\n", "\n", "from sagemaker import image_uris\n", "from stepfunctions import steps\n", "from stepfunctions.steps import TrainingStep, ModelStep, TransformStep\n", "from stepfunctions.inputs import ExecutionInput\n", "from stepfunctions.workflow import Workflow\n", "from stepfunctions.template import TrainingPipeline\n", "from stepfunctions.template.utils import replace_parameters_with_jsonpath\n", "\n", "session = sagemaker.Session()\n", "stepfunctions.set_stream_logger(level=logging.INFO)\n", "\n", "region = boto3.Session().region_name\n", "bucket = session.default_bucket()\n", "prefix = \"sagemaker/DEMO-xgboost-regression\"\n", "bucket_path = \"https://s3-{}.amazonaws.com/{}\".format(region, bucket)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare the dataset\n", "\n", "The following cell defines utility methods to split a dataset into train, validation, and test datasets. It then defines methods to upload them to an Amazon S3 bucket."
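, "\n", "\n", "The split performed by `data_split` below is random on every run. If you want reproducible splits across runs, one option is to seed Python's `random` module before calling it, for example:\n", "\n", "```python\n", "import random\n", "\n", "random.seed(42)  # any fixed seed makes data_split produce the same partitions each run\n", "```"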
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def data_split(\n", " FILE_DATA,\n", " FILE_TRAIN,\n", " FILE_VALIDATION,\n", " FILE_TEST,\n", " PERCENT_TRAIN,\n", " PERCENT_VALIDATION,\n", " PERCENT_TEST,\n", "):\n", " data = [l for l in open(FILE_DATA, \"r\")]\n", " train_file = open(FILE_TRAIN, \"w\")\n", " valid_file = open(FILE_VALIDATION, \"w\")\n", " tests_file = open(FILE_TEST, \"w\")\n", "\n", " num_of_data = len(data)\n", " num_train = int((PERCENT_TRAIN / 100.0) * num_of_data)\n", " num_valid = int((PERCENT_VALIDATION / 100.0) * num_of_data)\n", " num_tests = int((PERCENT_TEST / 100.0) * num_of_data)\n", "\n", " data_fractions = [num_train, num_valid, num_tests]\n", " split_data = [[], [], []]\n", "\n", " rand_data_ind = 0\n", "\n", " for split_ind, fraction in enumerate(data_fractions):\n", " for i in range(fraction):\n", " rand_data_ind = random.randint(0, len(data) - 1)\n", " split_data[split_ind].append(data[rand_data_ind])\n", " data.pop(rand_data_ind)\n", "\n", " for l in split_data[0]:\n", " train_file.write(l)\n", "\n", " for l in split_data[1]:\n", " valid_file.write(l)\n", "\n", " for l in split_data[2]:\n", " tests_file.write(l)\n", "\n", " train_file.close()\n", " valid_file.close()\n", " tests_file.close()\n", "\n", "\n", "def write_to_s3(fobj, bucket, key):\n", " return (\n", " boto3.Session(region_name=region)\n", " .resource(\"s3\")\n", " .Bucket(bucket)\n", " .Object(key)\n", " .upload_fileobj(fobj)\n", " )\n", "\n", "\n", "def upload_to_s3(bucket, channel, filename):\n", " fobj = open(filename, \"rb\")\n", " key = prefix + \"/\" + channel\n", " url = \"s3://{}/{}/{}\".format(bucket, key, filename)\n", " print(\"Writing to {}\".format(url))\n", " write_to_s3(fobj, bucket, key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook uses the XGBoost algorithm to train and host a regression model. We use the [Abalone data](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html) originally from the [UCI data repository](https://archive.ics.uci.edu/ml/datasets/abalone). More details about the original dataset can be found [here](https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.names). In the libsvm converted [version](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html), the nominal feature (Male/Female/Infant) has been converted into a real valued feature. Age of abalone is to be predicted from eight physical measurements. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "FILE_DATA = \"abalone\"\n", "s3 = boto3.client(\"s3\")\n", "s3.download_file(\n", " f\"sagemaker-example-files-prod-{region}\",\n", " \"datasets/tabular/uci_abalone/abalone.libsvm\",\n", " FILE_DATA,\n", ")\n", "\n", "# split the downloaded data into train/test/validation files\n", "FILE_TRAIN = \"abalone.train\"\n", "FILE_VALIDATION = \"abalone.validation\"\n", "FILE_TEST = \"abalone.test\"\n", "PERCENT_TRAIN = 70\n", "PERCENT_VALIDATION = 15\n", "PERCENT_TEST = 15\n", "data_split(\n", " FILE_DATA,\n", " FILE_TRAIN,\n", " FILE_VALIDATION,\n", " FILE_TEST,\n", " PERCENT_TRAIN,\n", " PERCENT_VALIDATION,\n", " PERCENT_TEST,\n", ")\n", "\n", "# upload the files to the S3 bucket\n", "upload_to_s3(bucket, \"train\", FILE_TRAIN)\n", "upload_to_s3(bucket, \"validation\", FILE_VALIDATION)\n", "upload_to_s3(bucket, \"test\", FILE_TEST)\n", "\n", "train_s3_file = bucket_path + \"/\" + prefix + \"/train\"\n", "validation_s3_file = bucket_path + \"/\" + prefix + \"/validation\"\n", "test_s3_file = bucket_path + \"/\" + prefix + \"/test\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure the AWS Sagemaker estimator" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "xgb = sagemaker.estimator.Estimator(\n", " image_uris.retrieve(\"xgboost\", region, \"1.2-1\"),\n", " sagemaker_execution_role,\n", " train_instance_count=1,\n", " train_instance_type=\"ml.m4.4xlarge\",\n", " train_volume_size=5,\n", " output_path=bucket_path + \"/\" + prefix + \"/single-xgboost\",\n", " sagemaker_session=session,\n", ")\n", "\n", "xgb.set_hyperparameters(\n", " objective=\"reg:linear\",\n", " num_round=50,\n", " max_depth=5,\n", " eta=0.2,\n", " gamma=4,\n", " min_child_weight=6,\n", " subsample=0.7,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Build a machine learning workflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use a workflow to create a machine learning pipeline. The AWS Data Science Workflows SDK provides several AWS SageMaker workflow steps that you can use to construct an ML pipeline. In this tutorial you will use the Train and Transform steps.\n", "\n", "* [**TrainingStep**](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.TrainingStep) - Starts a Sagemaker training job and outputs the model artifacts to S3.\n", "* [**ModelStep**](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.ModelStep) - Creates a model on SageMaker using the model artifacts from S3.\n", "* [**TransformStep**](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.TransformStep) - Starts a SageMaker transform job\n", "* [**EndpointConfigStep**](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.EndpointConfigStep) - Defines an endpoint configuration on SageMaker.\n", "* [**EndpointStep**](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.EndpointStep) - Deploys the trained model to the configured endpoint." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define the input schema for a workflow execution\n", "\n", "The [**ExecutionInput**](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/placeholders.html#stepfunctions.inputs.ExecutionInput) API defines the options to dynamically pass information to a workflow at runtime.\n", "\n", "The following cell defines the fields that must be passed to your workflow when starting an execution.\n", "\n", "While the workflow is usually static after it is defined, you may want to pass values dynamically that are used by steps in your workflow. To help with this, the SDK provides a way to create placeholders when you define your workflow. These placeholders can be dynamically assigned values when you execute your workflow.\n", "\n", "ExecutionInput values are accessible to each step of your workflow. You have the ability to define a schema for this placeholder collection, as shown in the cell below. When you execute your workflow the SDK will verify if the dynamic input conforms to the schema you defined." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# SageMaker expects unique names for each job, model and endpoint.\n", "# If these names are not unique the execution will fail. Pass these\n", "# dynamically for each execution using placeholders.\n", "execution_input = ExecutionInput(schema={\"JobName\": str, \"ModelName\": str, \"EndpointName\": str})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the training step \n", "\n", "In the following cell we create the training step and pass the estimator we defined above. See [TrainingStep](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.TrainingStep) in the AWS Step Functions Data Science SDK documentation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "training_step = steps.TrainingStep(\n", " \"Train Step\",\n", " estimator=xgb,\n", " data={\n", " \"train\": sagemaker.TrainingInput(train_s3_file, content_type=\"text/libsvm\"),\n", " \"validation\": sagemaker.TrainingInput(validation_s3_file, content_type=\"text/libsvm\"),\n", " },\n", " job_name=execution_input[\"JobName\"],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the model step \n", "\n", "In the following cell we define a model step that will create a model in SageMaker using the artifacts created during the TrainingStep. See [ModelStep](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.ModelStep) in the AWS Step Functions Data Science SDK documentation.\n", "\n", "The model creation step typically follows the training step. The Step Functions SDK provides the [get_expected_model](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.TrainingStep.get_expected_model) method in the TrainingStep class to provide a reference for the trained model artifacts. Please note that this method is only useful when the ModelStep directly follows the TrainingStep." 
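, "\n", "\n", "If your model step does not directly follow the training step, one alternative is to construct a `sagemaker.model.Model` yourself from a known artifact location and pass that to the `ModelStep`. A minimal sketch (the S3 URI below is a placeholder you would replace with a real `model.tar.gz` path):\n", "\n", "```python\n", "from sagemaker.model import Model\n", "\n", "# hypothetical: build a Model from existing artifacts instead of get_expected_model()\n", "manual_model = Model(\n", "    image_uri=image_uris.retrieve(\"xgboost\", region, \"1.2-1\"),\n", "    model_data=\"s3://<your-bucket>/<your-prefix>/model.tar.gz\",  # placeholder path\n", "    role=sagemaker_execution_role,\n", "    sagemaker_session=session,\n", ")\n", "\n", "model_step_from_artifacts = steps.ModelStep(\n", "    \"Save model from existing artifacts\",\n", "    model=manual_model,\n", "    model_name=execution_input[\"ModelName\"],\n", ")\n", "```"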
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "model_step = steps.ModelStep(\n", " \"Save model\", model=training_step.get_expected_model(), model_name=execution_input[\"ModelName\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the transform step\n", "\n", "In the following cell we create the transform step. See [TransformStep](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.TransformStep) in the AWS Step Functions Data Science SDK documentation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "transform_step = steps.TransformStep(\n", " \"Transform Input Dataset\",\n", " transformer=xgb.transformer(instance_count=1, instance_type=\"ml.m5.large\"),\n", " job_name=execution_input[\"JobName\"],\n", " model_name=execution_input[\"ModelName\"],\n", " data=test_s3_file,\n", " content_type=\"text/libsvm\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an endpoint configuration step\n", "\n", "In the following cell we create an endpoint configuration step. See [EndpointConfigStep](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.EndpointConfigStep) in the AWS Step Functions Data Science SDK documentation.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "endpoint_config_step = steps.EndpointConfigStep(\n", " \"Create Endpoint Config\",\n", " endpoint_config_name=execution_input[\"ModelName\"],\n", " model_name=execution_input[\"ModelName\"],\n", " initial_instance_count=1,\n", " instance_type=\"ml.m5.large\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an endpoint\n", "\n", "In the following cell we create a step to deploy the trained model to an endpoint in AWS SageMaker. See [EndpointStep](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.EndpointStep) in the AWS Step Functions Data Science SDK documentation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "endpoint_step = steps.EndpointStep(\n", " \"Create Endpoint\",\n", " endpoint_name=execution_input[\"EndpointName\"],\n", " endpoint_config_name=execution_input[\"ModelName\"],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Chain together steps for your workflow\n", "\n", "Create your workflow definition by chaining the steps together. See [Chain](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.states.Chain) in the AWS Step Functions Data Science SDK documentation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "workflow_definition = steps.Chain(\n", " [training_step, model_step, transform_step, endpoint_config_step, endpoint_step]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create your workflow using the workflow definition above, and render the graph with [render_graph](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.render_graph)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "workflow = Workflow(\n", " name=\"MyTrainTransformDeploy_v1\",\n", " definition=workflow_definition,\n", " role=workflow_execution_role,\n", " execution_input=execution_input,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "workflow.render_graph()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the workflow in AWS Step Functions with [create](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.create)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "workflow.create()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the workflow with [execute](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.execute)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "execution = workflow.execute(\n", " inputs={\n", " \"JobName\": \"regression-{}\".format(\n", " uuid.uuid1().hex\n", " ), # Each Sagemaker Job requires a unique name\n", " \"ModelName\": \"regression-{}\".format(uuid.uuid1().hex), # Each Model requires a unique name,\n", " \"EndpointName\": \"regression-{}\".format(\n", " uuid.uuid1().hex\n", " ), # Each Endpoint requires a unique name,\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Render workflow progress with the [render_progress](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Execution.render_progress).\n", "\n", "This generates a snapshot of the current state of your workflow as it executes. This is a static image. Run the cell again to check progress. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "execution.render_progress()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use [list_events](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Execution.list_events) to list all events in the workflow execution." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "execution.list_events()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use [list_executions](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.list_executions) to list all executions for a specific workflow." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "workflow.list_executions(html=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use [list_workflows](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.list_workflows) to list all workflows in your AWS account." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "Workflow.list_workflows(html=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. 
The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/step-functions-data-science-sdk|machine_learning_workflow_abalone|machine_learning_workflow_abalone.ipynb)\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 
256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, 
"_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, 
"hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }