{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Parameterize spark configuration in pipeline PySparkProcessor execution\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "---" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "In this example, we demonstrate how we can parameterize spark-configuration in different pipeline PySparkProcessor executions. This example is an extended version of [Specifying additional Spark configuration](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing.html#Example-4:-Specifying-additional-Spark-configuration) example in [Distributed Data Processing using Apache Spark and SageMaker Processing](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing.html#Distributed-Data-Processing-using-Apache-Spark-and-SageMaker-Processing). Here we are creating a simple pipeline with one processing step to demonstrate spark-configuration parameterization capabilities in sagemaker pipeline PySparkProcessor. This could be useful to pipeline users who want to define different spark-configuraitons for different pipeline PySparkProcessor executions." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Prerequisites\n", "To learn about how we can create pipeline, follow this [tutorial](https://docs.aws.amazon.com/sagemaker/latest/dg/define-pipeline.html)\n", "\n", "## Pipeline Creation\n", "The following is the step-by-step process to demonstrate parameterization capabilities in pipeline PySparkProcessor\n", "\n", "#### Step-1: Install the latest SageMaker Python SDK" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install -U \"sagemaker>2.0\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Restart your notebook kernel after upgrading the SDK" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Step-2: Setup Environment" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "\n", "from sagemaker.workflow.pipeline_context import PipelineSession\n", "\n", "sagemaker_session = PipelineSession()\n", "role = sagemaker.get_execution_role()\n", "default_bucket = sagemaker_session.default_bucket()\n", "region = sagemaker.Session().boto_region_name" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Create prefix for parametrize-spark-config-pysparkprocessor-demo" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from time import gmtime, strftime\n", "\n", "# Upload the raw input dataset to a unique S3 location\n", "timestamp_prefix = strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "prefix = \"sagemaker/parametrize-spark-config-pysparkprocessor-demo/{}\".format(timestamp_prefix)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Step-3: Prepare Input Data\n", "In this example, we process [Abalone Data Set](https://archive.ics.uci.edu/ml/datasets/abalone) using PySpark script. We download the data locally and upload it to our Amazon S3 bucket for data processing." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!mkdir -p data\n", "local_path = \"./data/abalone-dataset.csv\"\n", "\n", "s3 = boto3.resource(\"s3\")\n", "s3.Bucket(f\"sagemaker-example-files-prod-{region}\").download_file(\n", " \"datasets/tabular/uci_abalone/abalone.csv\", local_path\n", ")\n", "\n", "input_prefix_abalone = \"{}/input/raw/abalone\".format(prefix)\n", "input_preprocessed_prefix_abalone = \"{}/input/preprocessed/abalone\".format(prefix)\n", "\n", "sagemaker_session.upload_data(\n", " path=local_path, bucket=default_bucket, key_prefix=input_prefix_abalone\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Step-4: Upload default spark-configuraiton\n", "Upload default spark-configuration to Amazon S3" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "\n", "def upload_to_s3(bucket, prefix, body):\n", " s3_object = s3.Object(bucket, prefix)\n", " s3_object.put(Body=body)\n", "\n", "\n", "default_spark_configuration = [\n", " {\n", " \"Classification\": \"spark-defaults\",\n", " \"Properties\": {\"spark.executor.memory\": \"2g\", \"spark.executor.cores\": \"1\"},\n", " }\n", "]\n", "default_spark_conf_prefix = \"{}/spark/conf/cores_1/configuration.json\".format(prefix)\n", "default_spark_configuration_object_s3_uri = \"s3://{}/{}\".format(\n", " default_bucket, default_spark_conf_prefix\n", ")\n", "\n", "upload_to_s3(default_bucket, default_spark_conf_prefix, json.dumps(default_spark_configuration))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Step-5: Define Pipeline Parameters\n", "If no SparkConfigS3Uri is provided to the pipeline execution, the pipeline uses the pre-uploaded default_spark_configuration as a default spark-config." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from sagemaker.workflow.parameters import ParameterString\n", "\n", "spark_config_s3_uri = ParameterString(\n", " name=\"SparkConfigS3Uri\",\n", " default_value=default_spark_configuration_object_s3_uri,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Step-6: Write the PySpark script\n", "\n", "We create a PySpark script similar to this [example](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing.ipynb). The source for a preprocessing script is in the cell below. This script does some basic feature engineering on a raw input dataset. In this example, the dataset is the Abalone Data Set and the code below performs string indexing, one hot encoding, vector assembly, and combines them into a pipeline to perform these transformations in order. The script then does an 80-20 split to produce training and validation datasets as output. 
" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "!mkdir -p code" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%writefile ./code/preprocess.py\n", "from __future__ import print_function\n", "from __future__ import unicode_literals\n", "\n", "import argparse\n", "import csv\n", "import os\n", "import shutil\n", "import sys\n", "import time\n", "\n", "import pyspark\n", "from pyspark.sql import SparkSession\n", "from pyspark.ml import Pipeline\n", "from pyspark.ml.feature import (\n", " OneHotEncoder,\n", " StringIndexer,\n", " VectorAssembler,\n", " VectorIndexer,\n", ")\n", "from pyspark.sql.functions import *\n", "from pyspark.sql.types import (\n", " DoubleType,\n", " StringType,\n", " StructField,\n", " StructType,\n", ")\n", "\n", "\n", "def extract(row):\n", " return (row[0],) + tuple(row[1].toArray().tolist())\n", "\n", "\n", "def main():\n", " parser = argparse.ArgumentParser(description=\"app inputs and outputs\")\n", " parser.add_argument(\"--s3_input_bucket\", type=str, help=\"s3 input bucket\")\n", " parser.add_argument(\"--s3_input_key_prefix\", type=str, help=\"s3 input key prefix\")\n", " parser.add_argument(\"--s3_output_bucket\", type=str, help=\"s3 output bucket\")\n", " parser.add_argument(\"--s3_output_key_prefix\", type=str, help=\"s3 output key prefix\")\n", " args = parser.parse_args()\n", "\n", " spark = SparkSession.builder.appName(\"PySparkApp\").getOrCreate()\n", "\n", " # This is needed to save RDDs which is the only way to write nested Dataframes into CSV format\n", " spark.sparkContext._jsc.hadoopConfiguration().set(\n", " \"mapred.output.committer.class\", \"org.apache.hadoop.mapred.FileOutputCommitter\"\n", " )\n", "\n", " # Defining the schema corresponding to the input data. 
The input data does not contain the headers\n", " schema = StructType(\n", " [\n", " StructField(\"sex\", StringType(), True),\n", " StructField(\"length\", DoubleType(), True),\n", " StructField(\"diameter\", DoubleType(), True),\n", " StructField(\"height\", DoubleType(), True),\n", " StructField(\"whole_weight\", DoubleType(), True),\n", " StructField(\"shucked_weight\", DoubleType(), True),\n", " StructField(\"viscera_weight\", DoubleType(), True),\n", " StructField(\"shell_weight\", DoubleType(), True),\n", " StructField(\"rings\", DoubleType(), True),\n", " ]\n", " )\n", "\n", " # Downloading the data from S3 into a Dataframe\n", " total_df = spark.read.csv(\n", " (\n", " \"s3://\"\n", " + os.path.join(args.s3_input_bucket, args.s3_input_key_prefix, \"abalone-dataset.csv\")\n", " ),\n", " header=False,\n", " schema=schema,\n", " )\n", "\n", " # StringIndexer on the sex column, which has categorical values\n", " sex_indexer = StringIndexer(inputCol=\"sex\", outputCol=\"indexed_sex\")\n", "\n", " # one-hot encoding is performed on the string-indexed sex column (indexed_sex)\n", " sex_encoder = OneHotEncoder(inputCol=\"indexed_sex\", outputCol=\"sex_vec\")\n", "\n", " # vector-assembler will bring all the features into a 1D vector for us to save easily into CSV format\n", " assembler = VectorAssembler(\n", " inputCols=[\n", " \"sex_vec\",\n", " \"length\",\n", " \"diameter\",\n", " \"height\",\n", " \"whole_weight\",\n", " \"shucked_weight\",\n", " \"viscera_weight\",\n", " \"shell_weight\",\n", " ],\n", " outputCol=\"features\",\n", " )\n", "\n", " # The pipeline consists of the steps added above\n", " pipeline = Pipeline(stages=[sex_indexer, sex_encoder, assembler])\n", "\n", " # This step trains the feature transformers\n", " model = pipeline.fit(total_df)\n", "\n", " # This step transforms the dataset with information obtained from the previous fit\n", " transformed_total_df = model.transform(total_df)\n", "\n", " # Split the overall dataset into 80-20 training and validation\n", " (train_df, validation_df) = transformed_total_df.randomSplit([0.8, 0.2])\n", "\n", " # Convert the train dataframe to RDD to save in CSV format and upload to S3\n", " train_rdd = train_df.rdd.map(lambda x: (x.rings, x.features))\n", "\n", " train_rdd.map(extract).toDF().write.mode(\"overwrite\").option(\"header\", False).csv(\n", " \"s3://\" + os.path.join(args.s3_output_bucket, args.s3_output_key_prefix, \"train\")\n", " )\n", "\n", " # Convert the validation dataframe to RDD to save in CSV format and upload to S3\n", " validation_rdd = validation_df.rdd.map(lambda x: (x.rings, x.features))\n", "\n", " validation_rdd.map(extract).toDF().write.mode(\"overwrite\").option(\"header\", False).csv(\n", " \"s3://\" + os.path.join(args.s3_output_bucket, args.s3_output_key_prefix, \"validation\")\n", " )\n", "\n", "\n", "if __name__ == \"__main__\":\n", " main()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Step-7: Create PySparkProcessor\n", "Create an instance of a PySparkProcessor to pass to the processing step."
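] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "A note on the mechanism: the processor's built-in `configuration` argument is serialized when the pipeline is defined, so it cannot reference a pipeline parameter. This notebook therefore passes the configuration file to the job as a ProcessingInput whose source is the SparkConfigS3Uri parameter and whose destination is the configuration channel that the Spark container reads on startup. The sketch below only prints that channel; it relies on private SDK attributes (the same ones used in the next cell), which may change between SDK versions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.spark.processing import _SparkProcessorBase\n", "\n", "# Private SDK attributes describing the Spark container's configuration input channel.\n", "# The next cell points a ProcessingInput at this destination so the container picks up\n", "# the configuration file referenced by the SparkConfigS3Uri pipeline parameter.\n", "conf_input_name = _SparkProcessorBase._conf_container_input_name\n", "conf_destination = f\"{_SparkProcessorBase._conf_container_base_path}{conf_input_name}\"\n", "print(conf_input_name, conf_destination)"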
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.spark.processing import PySparkProcessor\n", "from sagemaker.processing import ProcessingInput\n", "from sagemaker.spark.processing import _SparkProcessorBase\n", "\n", "pyspark_processor = PySparkProcessor(\n", " base_job_name=\"sm-spark\",\n", " framework_version=\"3.1\",\n", " role=role,\n", " instance_count=2,\n", " instance_type=\"ml.m5.xlarge\",\n", " max_runtime_in_seconds=1200,\n", " sagemaker_session=sagemaker_session,\n", ")\n", "\n", "step_args = pyspark_processor.run(\n", " \"./code/preprocess.py\",\n", " inputs=[\n", " ProcessingInput(\n", " source=spark_config_s3_uri,\n", " destination=f\"{pyspark_processor._conf_container_base_path}{pyspark_processor._conf_container_input_name}\",\n", " input_name=_SparkProcessorBase._conf_container_input_name,\n", " )\n", " ],\n", " arguments=[\n", " \"--s3_input_bucket\",\n", " default_bucket,\n", " \"--s3_input_key_prefix\",\n", " input_prefix_abalone,\n", " \"--s3_output_bucket\",\n", " default_bucket,\n", " \"--s3_output_key_prefix\",\n", " input_preprocessed_prefix_abalone,\n", " ],\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Step-8: Create ProcessingStep and Pipeline \n", "Create a processing step. This step takes in the PySparkProcessor, the input and output channels, and the ./code/preprocess.py script that we created. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.workflow.steps import ProcessingStep\n", "from sagemaker.workflow.pipeline import Pipeline\n", "\n", "spark_step_process = ProcessingStep(name=\"AbaloneSparkProcess\", step_args=step_args)\n", "\n", "pipeline_name = f\"AbalonePipeline-Spark\"\n", "pipeline = Pipeline(\n", " name=pipeline_name, parameters=[spark_config_s3_uri], steps=[spark_step_process]\n", ")\n", "\n", "pipeline.upsert(role_arn=role)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now, we have successfully created a sagemaker pipeline with a PySparkProcessor. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Pipeline Executions\n", "### Execute pipeline with default spark-configuration\n", "If no SparkConfigS3Uri parameter value is provided, pipeline execution uses default_spark_configuration_object_s3_uri as a default spark-configuration. In the following execution example, we execute PySparkProcessor with default spark-configuration." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Execute pipeline with default pre-uploaded spark-config\n", "execution_with_default_spark_configuration = pipeline.start()\n", "\n", "# Describe the pipeline execution.\n", "execution_with_default_spark_configuration.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Wait for the execution to complete.\n", "execution_with_default_spark_configuration.wait()\n", "\n", "# List the steps in the execution.\n", "execution_with_default_spark_configuration.list_steps()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can verify that PySparkProcessor is using the default spark-configuration by looking into the CloudWatch logs." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "![default configuration](img/default_conf.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Execute pipeline with a new spark-configuraiton\n", "We upload a new spark-configuration to Amazon S3 and use it in the next pipeline execution" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "spark_configuraitons_cores_2 = [\n", " {\n", " \"Classification\": \"spark-defaults\",\n", " \"Properties\": {\"spark.executor.memory\": \"2g\", \"spark.executor.cores\": \"2\"},\n", " }\n", "]\n", "\n", "spark_conf_prefix = \"{}/spark/conf/cores_2/configuration.json\".format(prefix)\n", "spark_configuration_object_s3_uri = \"s3://{}/{}\".format(default_bucket, spark_conf_prefix)\n", "upload_to_s3(default_bucket, spark_conf_prefix, json.dumps(spark_configuraitons_cores_2))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Execute pipeline with newly uploaded spark-config\n", "execution_spark_conf_spark_executor_cores_2 = pipeline.start(\n", " parameters=dict(\n", " SparkConfigS3Uri=spark_configuration_object_s3_uri,\n", " )\n", ")\n", "\n", "# Describe the pipeline execution.\n", "execution_spark_conf_spark_executor_cores_2.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Wait for the execution to complete.\n", "execution_spark_conf_spark_executor_cores_2.wait()\n", "\n", "# List the steps in the execution.\n", "execution_spark_conf_spark_executor_cores_2.list_steps()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can verify that PySparkProcessor is using the newly provided spark-configuration by looking into the CloudWatch logs. \n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "![default configuration](img/spark_conf_executor_cores_2.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-spark|parameterize-spark-config-pysparkprocessor-pipeline|parameterize-spark-config-pysparkprocessor-pipeline.ipynb)\n" ] } ], "metadata": { "instance_type": "ml.t3.medium", "interpreter": { "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" }, "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }