{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SageMaker Pipelines EMR Step With Cluster Lifecycle Management\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "This notebook shows how to run an [EMR step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-emr) from a SageMaker Pipeline. The example requires a cluster configuration, which the EMR step uses to create an EMR cluster, run the required job, and finally terminate the cluster.\n", "\n", "The steps in this pipeline include:\n", "* Preprocess the Abalone dataset with PySpark on EMR\n", "* Train an XGBoost model\n", "* Evaluate the model performance\n", "* Register the model in the Model Registry if its mean squared error is at or below the threshold" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "1. [Prerequisites](#Prerequisites)\n", "1. [Setup](#Setup)\n", "1. [Configuration Setup](#Configuration-Setup)\n", "1. [Parameters](#Parameters)\n", "1. [Data Preparation](#Data-Preparation)\n", "1. [Model Training and Evaluation](#Model-Training-and-Evaluation)\n", "1. [Model Registry](#Model-registry)\n", "1. [Execute the Pipeline](#Execute-the-pipeline)\n", "1. 
[Cleanup](#Cleanup)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "To run this notebook you will need:\n", "* EMR roles\n", "* IAM policies which enable the notebook to run a step on an Amazon EMR cluster \n", " \n", "#### EMR roles\n", "You'll need:\n", "* Service role for Amazon EMR (EMR role) - this is passed as the `ServiceRole` parameter\n", "* Service role for cluster EC2 instances (EC2 instance profile) - this is passed as the `JobFlowRole` parameter \n", "\n", "See ['EMR IAM roles'](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html) for more details.\n", "\n", "#### IAM policy\n", "The notebook execution role should have policies which enable the notebook to run a step on an Amazon EMR cluster. The Amazon managed policy `AmazonSageMakerPipelinesIntegrations` should be added to the notebook execution role.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup \n", "\n", "### Setup Dependencies" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%pip install --upgrade sagemaker" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "import json\n", "\n", "import boto3\n", "import sagemaker\n", "import sagemaker.session\n", "\n", "from sagemaker.estimator import Estimator\n", "from sagemaker.inputs import TrainingInput\n", "from sagemaker.model_metrics import (\n", " MetricsSource,\n", " ModelMetrics,\n", ")\n", "from sagemaker.processing import (\n", " ProcessingInput,\n", " ProcessingOutput,\n", " ScriptProcessor,\n", ")\n", "from sagemaker.sklearn.processing import SKLearnProcessor\n", "from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo\n", "from sagemaker.workflow.condition_step import (\n", " ConditionStep,\n", ")\n", "from sagemaker.workflow.functions import (\n", " JsonGet,\n", ")\n", "from sagemaker.workflow.parameters 
import (\n", "    ParameterInteger,\n", "    ParameterString,\n", ")\n", "from sagemaker.workflow.pipeline import Pipeline\n", "from sagemaker.workflow.properties import PropertyFile\n", "from sagemaker.workflow.steps import (\n", "    ProcessingStep,\n", "    TrainingStep,\n", ")\n", "from sagemaker.workflow.model_step import ModelStep\n", "from sagemaker.model import Model\n", "\n", "from sagemaker.workflow.emr_step import EMRStep, EMRStepConfig\n", "from sagemaker.workflow.pipeline_context import PipelineSession" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configuration Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, configure the objects the pipeline needs: the session objects from the SageMaker Python SDK, the execution role, the region, and the default S3 bucket used for inputs and outputs." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Create the SageMaker Session\n", "\n", "sagemaker_session = sagemaker.Session()\n", "role = sagemaker.get_execution_role()\n", "region = sagemaker_session.boto_region_name\n", "boto_session = boto3.Session(region_name=region)\n", "sagemaker_client = sagemaker_session.sagemaker_client\n", "default_bucket = sagemaker_session.default_bucket()\n", "\n", "\n", "account = boto_session.client(\"sts\").get_caller_identity()[\"Account\"]\n", "\n", "\n", "pipeline_session = PipelineSession(\n", "    boto_session=boto_session,\n", "    sagemaker_client=sagemaker_client,\n", "    default_bucket=default_bucket,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parameters\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "model_package_group_name = \"AbalonePackageGroup\"\n", "pipeline_name = \"EMRStepPipeline\"\n", "base_job_prefix = \"emr-step-pipeline\"\n", "processing_instance_type = \"ml.m5.xlarge\"\n",
"training_instance_type = \"ml.m5.xlarge\"\n", "BASE_DIR = \"code\"\n", "\n", "job_flow_role = f\"arn:aws:iam::{account}:instance-profile/EMR_EC2_DefaultRole\"\n", "service_role = f\"arn:aws:iam::{account}:role/EMR_DefaultRole_V2\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Define variables and parameters needed for the Pipeline steps\n", "# parameters for pipeline execution\n", "\n", "processing_instance_count = ParameterInteger(name=\"ProcessingInstanceCount\", default_value=1)\n", "model_approval_status = ParameterString(\n", "    name=\"ModelApprovalStatus\", default_value=\"PendingManualApproval\"\n", ")\n", "input_data = ParameterString(\n", "    name=\"InputDataUrl\",\n", "    default_value=f\"s3://sagemaker-example-files-prod-{region}/datasets/tabular/uci_abalone/abalone.csv\",\n", ")\n", "\n", "output_path = f\"s3://{default_bucket}/{base_job_prefix}/prep\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Preparation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A PySpark job on EMR prepares the data for the training job. The script `preprocess.py` featurizes the dataset and splits it into train, validation, and test datasets.\n", "\n", "The output of this step is used as the input to the TrainingStep.\n", "\n", "The dataset you use is the [UCI Machine Learning Abalone Dataset](https://archive.ics.uci.edu/ml/datasets/abalone) [1]. The aim of this task is to determine the age of an abalone snail from its physical measurements. 
At its core, this is a regression problem.\n", "\n", "The dataset contains several features: length (the longest shell measurement), diameter (the diameter perpendicular to length), height (the height with meat in the shell), whole_weight (the weight of whole abalone), shucked_weight (the weight of meat), viscera_weight (the gut weight after bleeding), shell_weight (the weight after being dried), sex ('M', 'F', 'I' where 'I' is Infant), and rings (integer).\n", "\n", "The number of rings turns out to be a good approximation for age (age is rings + 1.5). Obtaining this number, however, requires cutting the shell through the cone, staining the section, and counting the rings under a microscope, which is time-consuming. The other physical measurements are easier to determine, so you use the dataset to build a model that predicts the variable rings from them.\n", "\n", "The pipeline reads the dataset directly from the S3 location given by the `InputDataUrl` parameter defined above, so no separate upload is needed.\n", "\n", "> [1] Dua, D. and Graff, C. (2019). [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml). Irvine, CA: University of California, School of Information and Computer Science." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%mkdir code" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%writefile code/preprocess.py\n", "\n", "from pyspark.ml.feature import (\n", "    Imputer,\n", "    OneHotEncoder,\n", "    StandardScaler,\n", "    StringIndexer,\n", "    VectorAssembler,\n", ")\n", "from pyspark.sql import SparkSession\n", "from pyspark.sql import functions as F\n", "from pyspark.ml.functions import vector_to_array\n", "from pyspark.sql.types import StructType, StructField, StringType, DoubleType\n", "from pyspark.ml import Pipeline\n", "from argparse import ArgumentParser\n", "\n", "\n", "def process(args):\n", "    print(\"Starting spark session\")\n", "    spark = SparkSession.builder.appName(\"preprocess\").getOrCreate()\n", "    spark.conf.set(\"mapreduce.fileoutputcommitter.marksuccessfuljobs\", \"false\")\n", "\n", "    schema = StructType(\n", "        [\n", "            StructField(\"sex\", StringType(), True),\n", "            StructField(\"length\", DoubleType(), True),\n", "            StructField(\"diameter\", DoubleType(), True),\n", "            StructField(\"height\", DoubleType(), True),\n", "            StructField(\"whole_weight\", DoubleType(), True),\n", "            StructField(\"shucked_weight\", DoubleType(), True),\n", "            StructField(\"viscera_weight\", DoubleType(), True),\n", "            StructField(\"shell_weight\", DoubleType(), True),\n", "            StructField(\"rings\", DoubleType(), True),\n", "        ]\n", "    )\n", "\n", "    print(\"Reading source data\")\n", "    df = spark.read.csv(args.input, schema=schema).fillna(\"missing\", subset=[\"sex\"])\n", "\n", "    numerical_features = [\n", "        \"length\",\n", "        \"diameter\",\n", "        \"height\",\n", "        \"whole_weight\",\n", "        \"shucked_weight\",\n", "        \"viscera_weight\",\n", "        \"shell_weight\",\n", "    ]\n", "    print(\"Performing feature engineering\")\n", "    pipeline = Pipeline(\n", "        stages=[\n",
"            StringIndexer(inputCol=\"sex\", outputCol=\"cat_sex\"),\n", "            OneHotEncoder(\n", "                inputCols=[\"cat_sex\"],\n", "                outputCols=[\"feature_sex\"],\n", "                dropLast=False,\n", "            ),\n", "            Imputer(\n", "                inputCols=numerical_features,\n", "                outputCols=[f\"impute_{c}\" for c in numerical_features],\n", "                strategy=\"median\",\n", "            ),\n", "            VectorAssembler(\n", "                inputCols=[f\"impute_{c}\" for c in numerical_features], outputCol=\"vector\"\n", "            ),\n", "            StandardScaler(inputCol=\"vector\", outputCol=\"features\"),\n", "        ]\n", "    )\n", "    print(\"Fitting transformers\")\n", "    model = pipeline.fit(df)\n", "    print(\"Transforming source data\")\n", "    df_out = (\n", "        model.transform(df)\n", "        .select(\n", "            \"rings\",\n", "            vector_to_array(F.col(\"features\")).alias(\"features\"),\n", "            vector_to_array(F.col(\"feature_sex\")).alias(\"feature_sex\"),\n", "        )\n", "        .select(\n", "            [F.col(\"rings\")]\n", "            + [F.col(\"features\")[idx] for idx in range(len(numerical_features))]\n", "            + [F.col(\"feature_sex\")[idx] for idx in range(3)]\n", "        )\n", "    )\n", "\n", "    # shuffle\n", "    # split train/test/valid\n", "    # write out to csvs without headers or indices\n", "    print(\"Writing train/valid/test splits\")\n", "    train, valid, test = df_out.orderBy(F.rand()).randomSplit([0.7, 0.15, 0.15])\n", "    prefix = args.output\n", "    train.repartition(1).write.mode(\"overwrite\").csv(f\"{prefix}/train\")\n", "    valid.repartition(1).write.mode(\"overwrite\").csv(f\"{prefix}/valid\")\n", "    test.repartition(1).write.mode(\"overwrite\").csv(f\"{prefix}/test\")\n", "\n", "    spark.stop()\n", "\n", "\n", "if __name__ == \"__main__\":\n", "    parser = ArgumentParser()\n", "    parser.add_argument(\"--input\")\n", "    parser.add_argument(\"--output\")\n", "    args, _ = parser.parse_known_args()\n", "    process(args)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "script = sagemaker_session.upload_data(\"code/preprocess.py\", key_prefix=f\"{base_job_prefix}/app\")" ] 
}, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Preprocess the data with a PySpark script that splits the dataset\n", "# into train, validation, and test datasets.\n", "# The script runs as a step in a job flow on EMR.\n", "emr_config = EMRStepConfig(\n", "    jar=\"command-runner.jar\",\n", "    args=[\n", "        \"spark-submit\",\n", "        \"--deploy-mode\",\n", "        \"cluster\",\n", "        script,\n", "        \"--input\",\n", "        input_data,\n", "        \"--output\",\n", "        output_path,\n", "    ],\n", ")\n", "\n", "\n", "step_emr = EMRStep(\n", "    name=\"EMRStep\",\n", "    cluster_id=None,\n", "    step_config=emr_config,\n", "    display_name=\"Preprocess\",\n", "    description=\"preprocess data for XGBoost\",\n", "    cluster_config={\n", "        \"Applications\": [\n", "            {\n", "                \"Name\": \"Spark\",\n", "            }\n", "        ],\n", "        \"Instances\": {\n", "            \"InstanceGroups\": [\n", "                {\"InstanceRole\": \"MASTER\", \"InstanceCount\": 1, \"InstanceType\": \"m5.2xlarge\"},\n", "                {\"InstanceRole\": \"CORE\", \"InstanceCount\": 2, \"InstanceType\": \"m5.2xlarge\"},\n", "            ]\n", "        },\n", "        \"BootstrapActions\": [],\n", "        \"ReleaseLabel\": \"emr-6.6.0\",\n", "        \"JobFlowRole\": job_flow_role,\n", "        \"ServiceRole\": service_role,\n", "    },\n", ")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Model Training and Evaluation\n", "\n", "We will now train an XGBoost model with the SageMaker Python SDK, using the output of the EMR step as the training input." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# training step for generating model artifacts\n", "model_path = f\"s3://{sagemaker_session.default_bucket()}/{base_job_prefix}/AbaloneTrain\"\n", "image_uri = sagemaker.image_uris.retrieve(\n", "    framework=\"xgboost\",\n", "    region=region,\n", "    version=\"1.0-1\",\n", "    py_version=\"py3\",\n", "    instance_type=training_instance_type,\n", ")\n", "xgb_train = Estimator(\n", "    image_uri=image_uri,\n", "    instance_type=training_instance_type,\n", "    instance_count=1,\n", "    output_path=model_path,\n", "    base_job_name=f\"{base_job_prefix}/abalone-train\",\n", "    sagemaker_session=pipeline_session,\n", "    role=role,\n", ")\n", "xgb_train.set_hyperparameters(\n", "    objective=\"reg:linear\",\n", "    num_round=50,\n", "    max_depth=5,\n", "    eta=0.2,\n", "    gamma=4,\n", "    min_child_weight=6,\n", "    subsample=0.7,\n", "    silent=0,\n", ")\n", "step_args = xgb_train.fit(\n", "    inputs={\n", "        \"train\": TrainingInput(\n", "            s3_data=f\"{output_path}/train\",\n", "            content_type=\"text/csv\",\n", "        ),\n", "        \"validation\": TrainingInput(\n", "            s3_data=f\"{output_path}/valid\",\n", "            content_type=\"text/csv\",\n", "        ),\n", "    },\n", ")\n", "step_train = TrainingStep(\n", "    name=\"TrainAbaloneModel\",\n", "    step_args=step_args,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "step_train.add_depends_on([step_emr])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Evaluating the model\n", "\n", "Use a processing job to evaluate the model produced by the TrainingStep. The evaluation report feeds a condition step: if the mean squared error is at most 6.0, the model is registered." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%writefile code/evaluate.py\n", "\n", "\"\"\"Evaluation script for measuring mean squared error.\"\"\"\n", "import json\n", "import logging\n", "import pathlib\n", "import pickle\n", "import tarfile\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import xgboost\n", "import glob\n", "\n", "from sklearn.metrics import mean_squared_error\n", "\n", "logger = logging.getLogger()\n", "logger.setLevel(logging.INFO)\n", "logger.addHandler(logging.StreamHandler())\n", "\n", "\n", "if __name__ == \"__main__\":\n", "    logger.debug(\"Starting evaluation.\")\n", "    model_path = \"/opt/ml/processing/model/model.tar.gz\"\n", "    with tarfile.open(model_path) as tar:\n", "        tar.extractall(path=\".\")\n", "\n", "    logger.debug(\"Loading xgboost model.\")\n", "    # Close the model file once the pickled model is loaded\n", "    with open(\"xgboost-model\", \"rb\") as f:\n", "        model = pickle.load(f)\n", "\n", "    logger.debug(\"Reading test data.\")\n", "\n", "    test_path = \"/opt/ml/processing/test/\"\n", "    test_file = glob.glob(f\"{test_path}/*.csv\")[0]\n", "    df = pd.read_csv(test_file, header=None)\n", "\n", "    logger.debug(\"Splitting test data into labels and features.\")\n", "    y_test = df.iloc[:, 0].to_numpy()\n", "    df.drop(df.columns[0], axis=1, inplace=True)\n", "    X_test = xgboost.DMatrix(df.values)\n", "\n", "    logger.info(\"Performing predictions against test data.\")\n", "    predictions = model.predict(X_test)\n", "\n", "    logger.debug(\"Calculating mean squared error.\")\n", "    mse = mean_squared_error(y_test, predictions)\n", "    std = np.std(y_test - predictions)\n", "    report_dict = {\n", "        \"regression_metrics\": {\n", "            \"mse\": {\"value\": mse, \"standard_deviation\": std},\n", "        },\n", "    }\n", "\n", "    output_dir = \"/opt/ml/processing/evaluation\"\n", "    pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)\n", "\n", "    logger.info(\"Writing out evaluation report with mse: %f\", mse)\n", "    evaluation_path = f\"{output_dir}/evaluation.json\"\n",
"    with open(evaluation_path, \"w\") as f:\n", "        f.write(json.dumps(report_dict))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "script_eval = ScriptProcessor(\n", "    image_uri=image_uri,\n", "    command=[\"python3\"],\n", "    instance_type=processing_instance_type,\n", "    instance_count=1,\n", "    base_job_name=f\"{base_job_prefix}/script-abalone-eval\",\n", "    sagemaker_session=pipeline_session,\n", "    role=role,\n", ")\n", "step_args = script_eval.run(\n", "    inputs=[\n", "        ProcessingInput(\n", "            source=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n", "            destination=\"/opt/ml/processing/model\",\n", "        ),\n", "        ProcessingInput(\n", "            source=f\"{output_path}/test\",\n", "            destination=\"/opt/ml/processing/test\",\n", "        ),\n", "    ],\n", "    outputs=[\n", "        ProcessingOutput(output_name=\"evaluation\", source=\"/opt/ml/processing/evaluation\"),\n", "    ],\n", "    code=os.path.join(BASE_DIR, \"evaluate.py\"),\n", ")\n", "evaluation_report = PropertyFile(\n", "    name=\"AbaloneEvaluationReport\",\n", "    output_name=\"evaluation\",\n", "    path=\"evaluation.json\",\n", ")\n", "step_eval = ProcessingStep(\n", "    name=\"EvaluateAbaloneModel\",\n", "    step_args=step_args,\n", "    property_files=[evaluation_report],\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "step_eval.add_depends_on([step_emr])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model registry" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# register model step that will be conditionally executed\n", "model_metrics = ModelMetrics(\n", "    model_statistics=MetricsSource(\n", "        s3_uri=\"{}/evaluation.json\".format(\n", "            step_eval.arguments[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\"S3Uri\"]\n", "        ),\n", "        content_type=\"application/json\",\n", "    )\n", ")\n", "model = Model(\n", "    image_uri=image_uri,\n", "    
model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n", "    sagemaker_session=pipeline_session,\n", "    role=role,\n", ")\n", "step_args = model.register(\n", "    content_types=[\"text/csv\"],\n", "    response_types=[\"text/csv\"],\n", "    inference_instances=[\"ml.t2.medium\", \"ml.m5.large\"],\n", "    transform_instances=[\"ml.m5.large\"],\n", "    model_package_group_name=model_package_group_name,\n", "    approval_status=model_approval_status,\n", "    model_metrics=model_metrics,\n", ")\n", "step_register = ModelStep(\n", "    name=\"RegisterAbaloneModel\",\n", "    step_args=step_args,\n", ")\n", "\n", "# condition step for evaluating model quality and branching execution\n", "cond_lte = ConditionLessThanOrEqualTo(\n", "    left=JsonGet(\n", "        step_name=step_eval.name,\n", "        property_file=evaluation_report,\n", "        json_path=\"regression_metrics.mse.value\",\n", "    ),\n", "    right=6.0,\n", ")\n", "step_cond = ConditionStep(\n", "    name=\"CheckMSEAbaloneEvaluation\",\n", "    conditions=[cond_lte],\n", "    if_steps=[step_register],\n", "    else_steps=[],\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Use the same pipeline name across executions for cache usage.\n", "# Only pipeline Parameter objects are listed; the instance types are plain strings.\n", "\n", "pipeline = Pipeline(\n", "    name=pipeline_name,\n", "    parameters=[\n", "        processing_instance_count,\n", "        model_approval_status,\n", "        input_data,\n", "    ],\n", "    steps=[step_emr, step_train, step_eval, step_cond],\n", "    sagemaker_session=pipeline_session,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Execute the pipeline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "definition = json.loads(pipeline.definition())\n", "definition" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "pipeline.upsert(role_arn=role)" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "execution = pipeline.start()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "execution.wait()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleanup\n", "Running the following cell will delete the following resources created in this notebook\n", "* SageMaker Pipeline\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete the Pipeline\n", "sagemaker_client.delete_pipeline(PipelineName=pipeline_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-pipelines|tabular|emr-step|sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, 
"hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, 
"memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": 
"Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, 
"_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { 
"_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", 
"builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }