{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Amazon SageMaker Workshop\n",
    "### _**Pipelines**_\n",
    "\n",
    "---\n",
    "In this part of the workshop we will all our previous work from the labs and will automate the whole ML workflow. With that we can make the whole process more robust and any updates to the data preparation, modeling, evaluation, inference and monitoring will be put into production faster and more reliable.\n",
    "\n",
    "---\n",
    "\n",
    "## Contents\n",
    "\n",
    "a. [Background](#background) - Getting the work from previous labs.\n",
    "\n",
    "b. [Create the training pipeline](#Create_pipeline) - featuring [SageMaker Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html)\n",
    "1. [Creating data preparation step](#dataprep_step)\n",
    "2. [Creating training step](#train_step)\n",
    "3. [Creating evaluation step](#eval_step)\n",
    "4. [Creating approve and register model steps](#appr_model_reg_step)\n",
    "5. [Finish the pipeline](#end_creation_pipe)\n",
    "\n",
    "d. [Create the end-to-end solution automatically](#SM_Projects) - Create end-to-end ML solutions with CI/CD (featuring [SageMaker Projects](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects.html))\n",
    "1. customize the project with our pipeline and code\n",
    "2. trigger training pipeline\n",
    "3. trigger deployment pipeline\n",
    "\n",
    "---\n",
    "<a id='background'></a>\n",
    "\n",
    "## Background\n",
    "\n",
    "In the previous labs we created multiple resources to prepare the data (_2-DataPrep_), train the model (_3-Modeling_), evaluate model performance (_4-Evaluation_), deploy and customize inference logic (_4-Deployment/RealTime_) and monitor the deployed model (_5-Monitoring_).\n",
    "\n",
    "Now it's time to **bring everything together**!\n",
    "\n",
    "We will create a pipeline with 5 steps:\n",
    "\n",
    "1. Data preparation\n",
    "2. Training\n",
    "3. Evaluation\n",
    "4. Approve model\n",
    "5. Save to model registry step\n",
    "\n",
    "We will build our pipeline iterating little by little.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### - if you _skipped_ some/all of the previous labs, follow instructions:\n",
    "\n",
    "   - **run this [notebook](./config/pre_setup.ipynb)**\n",
    "\n",
    "---\n",
    "Load all variables (and modules) for this lab:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r bucket\n",
    "%store -r prefix\n",
    "%store -r region\n",
    "%store -r docker_image_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bucket, prefix, region, docker_image_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Supress default INFO logging\n",
    "import logging\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.ERROR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker \n",
    "role = sagemaker.get_execution_role()\n",
    "sagemaker_session = sagemaker.session.Session()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id=\"Create_pipeline\"></a>\n",
    "# Create the training pipeline with SageMaker Pipelines\n",
    "\n",
    "<a id=\"dataprep_step\"></a>\n",
    "\n",
    "## 1. Create data preparation step\n",
    "\n",
    "Get the raw data location and the S3 URI where our code for data preparation was stored:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r s3uri_raw\n",
    "%store -r s3_dataprep_code_uri\n",
    "s3uri_raw, s3_dataprep_code_uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.workflow.steps import (\n",
    "    ProcessingStep,\n",
    "    TrainingStep,\n",
    ")\n",
    "from sagemaker.processing import (\n",
    "    ProcessingInput,\n",
    "    ProcessingOutput,\n",
    "    ScriptProcessor,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This first step will receive some inputs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.workflow.parameters import (\n",
    "    ParameterInteger,\n",
    "    ParameterString,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameters for data preparation step\n",
    "input_data = ParameterString(\n",
    "    name=\"InputDataUrl\",\n",
    "    default_value=s3uri_raw # S3 URI where we stored the raw data\n",
    ")\n",
    "processing_instance_count = ParameterInteger(\n",
    "    name=\"ProcessingInstanceCount\", default_value=1\n",
    ")\n",
    "processing_instance_type = ParameterString(\n",
    "    name=\"ProcessingInstanceType\", default_value=\"ml.m5.xlarge\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from my_labs_solutions.dataprep_solution import get_dataprep_processor\n",
    "sklearn_processor = get_dataprep_processor(processing_instance_type, processing_instance_count, role)\n",
    "sklearn_processor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Processing step for feature engineering\n",
    "step_process = ProcessingStep(\n",
    "    name=\"CustomerChurnProcess\",  # choose any name\n",
    "    processor=sklearn_processor,\n",
    "    outputs=[\n",
    "        ProcessingOutput(output_name=\"train\", source=\"/opt/ml/processing/train\"),\n",
    "        ProcessingOutput(\n",
    "            output_name=\"validation\", source=\"/opt/ml/processing/validation\"\n",
    "        ),\n",
    "        ProcessingOutput(output_name=\"test\", source=\"/opt/ml/processing/test\"),\n",
    "    ],\n",
    "    code=s3_dataprep_code_uri,\n",
    "    job_arguments=[\"--input-data\", input_data],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the first iteration of the Pipeline\n",
    "\n",
    "We will create a simple pipeline that receives some inputs and just have 1 data preparation step:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from time import strftime, gmtime\n",
    "from sagemaker.workflow.pipeline import Pipeline\n",
    "from sagemaker.workflow.pipeline_experiment_config import PipelineExperimentConfig\n",
    "from sagemaker.workflow.step_collections import RegisterModel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can associate SageMaker Experiments with Pipelines to help track multiple moving pieces (ML hyperparameters, data, artifacts, plots, metrics, etc. - a.k.a. [ML lineage tracking](https://docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html)) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Experiment configs\n",
    "create_date = lambda: strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
    "\n",
    "experiment_name=f\"pipeline-customer-churn-prediction-xgboost-{create_date()}\"\n",
    "trial_name=f\"pipeline-framework-trial-{create_date()}\"\n",
    "pipeline_name = f\"ChurnMLPipeline\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline_experiment_config = PipelineExperimentConfig(\n",
    "    experiment_name = experiment_name,\n",
    "    trial_name = trial_name\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pipeline with just input parameters and 1 step for data prep\n",
    "pipeline = Pipeline(\n",
    "        name=pipeline_name,\n",
    "        parameters=[\n",
    "            input_data,\n",
    "            processing_instance_type,\n",
    "            processing_instance_count,\n",
    "        ],\n",
    "        steps=[step_process],\n",
    "        sagemaker_session=sagemaker_session,\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Validate that pipeline was configured correctly and load its definition\n",
    "import json\n",
    "json.loads(pipeline.definition())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "source": [
    "#### Ok, looks good. Let's create the pipeline:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline.upsert(role_arn=role)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Go to the pipeline and see its DAG:\n",
    "\n",
    "<img src=\"./media/sm-pipeline.png\" width=\"50%\">\n",
    "\n",
    "2. Right-click the ChurnMLPipeline -> `Open pipeline details`.\n",
    "\n",
    "Check its DAG (with just the data prep step:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-1.png\" width=\"50%\">\n",
    "\n",
    "3. Click on `Parameters` to see the default parameter inputs for a execution:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-1-params.png\" width=\"100%\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Remember that we set the inputs as:\n",
    "```python\n",
    "# Parameters for data preparation step\n",
    "input_data = ParameterString(\n",
    "    name=\"InputDataUrl\",\n",
    "    default_value=s3uri_raw # S3 URI where we stored the raw data\n",
    ")\n",
    "processing_instance_count = ParameterInteger(\n",
    "    name=\"ProcessingInstanceCount\", default_value=1\n",
    ")\n",
    "processing_instance_type = ParameterString(\n",
    "    name=\"ProcessingInstanceType\", default_value=\"ml.m5.xlarge\"\n",
    ")\n",
    "```\n",
    "\n",
    "**Let's programatically execute the pipeline with defaults:**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "execution = pipeline.start()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "execution.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "execution.list_steps()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If we wanted to wait for execution to end:\n",
    "# execution.wait()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "4. Right-click the `Executions` tab:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-1-exec-list.png\" width=\"100%\">\n",
    "\n",
    "5. Select the only execution (should be in status \"Executing\") and double click on it:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-1-exec.png\" width=\"60%\">\n",
    "\n",
    "6. Wait for a few minutes (for the data preparation step and the SageMaker Processing Job under the hood to finish):\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-1-exec-succ.png\" width=\"60%\">\n",
    "\n",
    "7. If you go to `Experiments and trials` tab you will see that SageMaker Pipelines created an experiment called `churnmlpipeline`.\n",
    "\n",
    "Also if we select our data prep Processing job, we can see that it correctly created 3 dataset as output: `train`, `validation` and `test`:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-1-exec-outs.png\" width=\"100%\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "<a id=\"train_step\"></a>\n",
    "\n",
    "# 2. Create modeling step"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r s3_modeling_code_uri\n",
    "%store -r train_script_name\n",
    "s3_modeling_code_uri, train_script_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from my_labs_solutions.modeling_solution import get_modeling_estimator\n",
    "\n",
    "xgb_train = get_modeling_estimator(bucket,\n",
    "                                   prefix,\n",
    "                                   s3_modeling_code_uri, \n",
    "                                   docker_image_name,\n",
    "                                   role,\n",
    "                                   entry_point_script = train_script_name)\n",
    "xgb_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.inputs import TrainingInput"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "step_train = TrainingStep(\n",
    "    name=\"CustomerChurnTrain\",\n",
    "    estimator=xgb_train,\n",
    "    inputs={\n",
    "        \"train\": TrainingInput(\n",
    "                    s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                        \"train\"\n",
    "                    ].S3Output.S3Uri,\n",
    "                    content_type=\"text/csv\"\n",
    "                 ),\n",
    "        \"validation\": TrainingInput(\n",
    "                    s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                        \"validation\"\n",
    "                    ].S3Output.S3Uri,\n",
    "                    content_type=\"text/csv\"\n",
    "                 )\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that we can link one step's output to other steps input by accessing the properties:\n",
    "```python\n",
    "# Get output from processing step with key `train`\n",
    "step_process.properties.ProcessingOutputConfig.Outputs[\"train\"].S3Output.S3Uri\n",
    "```"
   ]
  },
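  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea concrete, here is a minimal plain-Python sketch (it does not use the SageMaker SDK) of how an execution order can be derived once each step declares which other steps' outputs it consumes. The `depends_on` dictionary is a hand-written, hypothetical stand-in for the references that are created when one step reads another step's `properties`; the step names mirror the ones used in this lab.\n",
    "\n",
    "```python\n",
    "from graphlib import TopologicalSorter  # standard library, Python 3.9+\n",
    "\n",
    "# Hypothetical dependency map: step name -> steps whose outputs it reads\n",
    "depends_on = {\n",
    "    'CustomerChurnProcess': set(),\n",
    "    'CustomerChurnTrain': {'CustomerChurnProcess'},\n",
    "}\n",
    "\n",
    "# A step can only start after every step it depends on has finished\n",
    "execution_order = list(TopologicalSorter(depends_on).static_order())\n",
    "print(execution_order)\n",
    "```\n",
    "\n",
    "This only illustrates the dependency idea; the actual scheduling is done by the SageMaker Pipelines service."
   ]
  },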
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the second iteration of the Pipeline (updating the definition)\n",
    "\n",
    "We will update the pipeline adding an input parameter for the training Step and also the training Step itself, resulting in a pipeline with 2 step:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add an input parameter to define the training instance type\n",
    "training_instance_type = ParameterString(\n",
    "    name=\"TrainingInstanceType\", default_value=\"ml.m5.xlarge\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline = Pipeline(\n",
    "    name=pipeline_name,\n",
    "    parameters=[\n",
    "        input_data,\n",
    "        processing_instance_type,\n",
    "        processing_instance_count,\n",
    "        training_instance_type,\n",
    "    ],\n",
    "    steps=[step_process, step_train],\n",
    "    sagemaker_session=sagemaker_session,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update the pipeline\n",
    "pipeline.upsert(role_arn=role)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. If we go to the pipeline and click on the refresh button, we see now its 2 steps and the new input parameter:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-2.png\" width=\"70%\">\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-2-params.png\" width=\"100%\">\n",
    "\n",
    "2. Now, let's execute the new pipeline in the Studio UI. Click on `Start an execution`:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-2-exec.png\" width=\"80%\">\n",
    "\n",
    "3. The default input configurations should appear in the Studio UI. Click on `Start`:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-2-exec-man.png\" width=\"80%\">\n",
    "\n",
    "4. Refreshing the executions we should see:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-2-exec-man2.png\" width=\"80%\">\n",
    "\n",
    "5. Click on `View details` or select the execution in the list (status \"Executing\") and double click:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-2-exec-man3.png\" width=\"80%\">\n",
    "\n",
    "6. Wait a few minutes to the data prep Processing job and the training job finish. You should see this:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-2-exec-succ.png\" width=\"80%\">\n",
    "\n",
    "If you click on the training step and select `Outputs` you will also be able to see the final training and validation log losses."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "<a id=\"eval_step\"></a>\n",
    "\n",
    "# 3. Create evaluation step"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from my_labs_solutions.evaluation_solution import get_evaluation_processor\n",
    "script_eval = get_evaluation_processor(docker_image_name, role)\n",
    "script_eval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.workflow.properties import PropertyFile"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "evaluation_report = PropertyFile(\n",
    "    name=\"EvaluationReport\",\n",
    "    output_name=\"evaluation\",\n",
    "    path=\"evaluation.json\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r s3_evaluation_code_uri\n",
    "s3_evaluation_code_uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Processing step for evaluation\n",
    "step_eval = ProcessingStep(\n",
    "        name=\"CustomerChurnEval\",\n",
    "        processor=script_eval,\n",
    "        inputs=[\n",
    "            ProcessingInput(\n",
    "                source=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n",
    "                destination=\"/opt/ml/processing/model\",\n",
    "            ),\n",
    "            ProcessingInput(\n",
    "                source=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                    \"test\"\n",
    "                ].S3Output.S3Uri,\n",
    "                destination=\"/opt/ml/processing/test\",\n",
    "            ),\n",
    "        ],\n",
    "        outputs=[\n",
    "            ProcessingOutput(\n",
    "                output_name=\"evaluation\", source=\"/opt/ml/processing/evaluation\"\n",
    "            ),\n",
    "        ],\n",
    "        code=s3_evaluation_code_uri,\n",
    "        property_files=[evaluation_report],\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, notice that we get the model from the training step and also the test dataset from the data preparation step:\n",
    "```python\n",
    "# Get output model artifact from training step\n",
    "step_train.properties.ModelArtifacts.S3ModelArtifacts\n",
    "\n",
    "# Get the test dataset - the output of data preparation step with key `test`\n",
    "step_process.properties.ProcessingOutputConfig.Outputs[\"test\"].S3Output.S3Uri\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the third iteration of the Pipeline (updating the definition)\n",
    "\n",
    "We will update the pipeline adding the evaluation step, resulting in a pipeline with 3 step: data prep, training and evaluation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline = Pipeline(\n",
    "    name=pipeline_name,\n",
    "    parameters=[\n",
    "        input_data,\n",
    "        processing_instance_type,\n",
    "        processing_instance_count,\n",
    "        training_instance_type,\n",
    "    ],\n",
    "    steps=[step_process, step_train, step_eval],\n",
    "    sagemaker_session=sagemaker_session,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update the pipeline\n",
    "pipeline.upsert(role_arn=role)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. If we go to the pipeline and click on the refresh button again:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-3.png\" width=\"70%\">\n",
    "\n",
    "2. Now, _**let's execute the new pipeline programatically here...**_\n",
    "\n",
    "> ### Wait! You must be wondering...\n",
    "> Do I have to keep re-running everything from beginning every time?!\n",
    "\n",
    "No, you don't. \n",
    "\n",
    "**SageMaker Pipelines can cache results from previous step.** Hence, the executions will be a lot faster and you won't keep spending money with steps that would generate in the exact same outputs!\n",
    "\n",
    "Let's run this 3rd iteration of the pipeline caching both data preparation and training steps:"
   ]
  },
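  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough mental model of step caching, the plain-Python sketch below reuses a previous result when a step is called again with identical arguments before the cache entry expires. This is a simplified illustration under our own assumptions, not how the SageMaker Pipelines service is implemented; `run_step`, `fake_job` and the S3 URI are hypothetical names made up for the example.\n",
    "\n",
    "```python\n",
    "import time\n",
    "\n",
    "_cache = {}  # (step name, frozen arguments) -> (finished_at, outputs)\n",
    "\n",
    "def run_step(name, arguments, work, expire_after_seconds, now=None):\n",
    "    '''Run work(arguments) unless a fresh result for identical arguments exists.'''\n",
    "    now = time.time() if now is None else now\n",
    "    key = (name, tuple(sorted(arguments.items())))\n",
    "    hit = _cache.get(key)\n",
    "    if hit is not None and now - hit[0] <= expire_after_seconds:\n",
    "        return hit[1], True  # cache hit: reuse the previous outputs\n",
    "    outputs = work(arguments)\n",
    "    _cache[key] = (now, outputs)\n",
    "    return outputs, False  # cache miss: the step really ran\n",
    "\n",
    "args = {'input_data': 's3://example-bucket/raw.csv'}  # hypothetical URI\n",
    "fake_job = lambda a: 'outputs for ' + a['input_data']\n",
    "thirty_minutes = 30 * 60\n",
    "\n",
    "first = run_step('CustomerChurnProcess', args, fake_job, thirty_minutes, now=0)\n",
    "second = run_step('CustomerChurnProcess', args, fake_job, thirty_minutes, now=60)\n",
    "expired = run_step('CustomerChurnProcess', args, fake_job, thirty_minutes, now=60 + 31 * 60)\n",
    "print(first[1], second[1], expired[1])\n",
    "```\n",
    "\n",
    "The second call is served from the cache, while the call made after the expiry window runs the work again."
   ]
  },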
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.workflow.steps import CacheConfig\n",
    "\n",
    "# Cache for 30 minutes\n",
    "cache_config = CacheConfig(enable_caching=True, expire_after=\"T30m\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minor change in data preparation steps\n",
    "step_process = ProcessingStep(\n",
    "    name=\"CustomerChurnProcess\",  # choose any name\n",
    "    processor=sklearn_processor,\n",
    "    outputs=[\n",
    "        ProcessingOutput(output_name=\"train\", source=\"/opt/ml/processing/train\"),\n",
    "        ProcessingOutput(\n",
    "            output_name=\"validation\", source=\"/opt/ml/processing/validation\"\n",
    "        ),\n",
    "        ProcessingOutput(output_name=\"test\", source=\"/opt/ml/processing/test\"),\n",
    "    ],\n",
    "    code=s3_dataprep_code_uri,\n",
    "    job_arguments=[\"--input-data\", input_data],\n",
    "    cache_config=cache_config\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minor change in data training steps\n",
    "step_train = TrainingStep(\n",
    "    name=\"CustomerChurnTrain\",\n",
    "    estimator=xgb_train,\n",
    "    inputs={\n",
    "        \"train\": TrainingInput(\n",
    "                    s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                        \"train\"\n",
    "                    ].S3Output.S3Uri,\n",
    "                    content_type=\"text/csv\"\n",
    "                 ),\n",
    "        \"validation\": TrainingInput(\n",
    "                    s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                        \"validation\"\n",
    "                    ].S3Output.S3Uri,\n",
    "                    content_type=\"text/csv\"\n",
    "                 )\n",
    "    },\n",
    "    cache_config=cache_config\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Processing step for evaluation\n",
    "step_eval = ProcessingStep(\n",
    "        name=\"CustomerChurnEval\",\n",
    "        processor=script_eval,\n",
    "        inputs=[\n",
    "            ProcessingInput(\n",
    "                source=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n",
    "                destination=\"/opt/ml/processing/model\",\n",
    "            ),\n",
    "            ProcessingInput(\n",
    "                source=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                    \"test\"\n",
    "                ].S3Output.S3Uri,\n",
    "                destination=\"/opt/ml/processing/test\",\n",
    "            ),\n",
    "        ],\n",
    "        outputs=[\n",
    "            ProcessingOutput(\n",
    "                output_name=\"evaluation\", source=\"/opt/ml/processing/evaluation\"\n",
    "            ),\n",
    "        ],\n",
    "        code=s3_evaluation_code_uri,\n",
    "        property_files=[evaluation_report],\n",
    "        cache_config=cache_config\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Update the pipeline definition and pipelines with the cache configuration of 30 min for all 3 steps:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline = Pipeline(\n",
    "    name=pipeline_name,\n",
    "    parameters=[\n",
    "        input_data,\n",
    "        processing_instance_type,\n",
    "        processing_instance_count,\n",
    "        training_instance_type,\n",
    "    ],\n",
    "    steps=[step_process, step_train, step_eval],\n",
    "    sagemaker_session=sagemaker_session,\n",
    ")\n",
    "\n",
    "# Update the pipeline\n",
    "pipeline.upsert(role_arn=role)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "execution = pipeline.start()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, so you should see:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-3-exec.png\" width=\"80%\">\n",
    "\n",
    "This is the first execution with the cache configuration (we will have to wait one more time).\n",
    "\n",
    "3. Wait a few minutes to the data prep step, the training step and evaluation step finish:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-3-exec-succ.png\" width=\"80%\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "<a id=\"appr_model_reg_step\"></a>\n",
    "\n",
    "# 4. Create approve model and save to model registry steps"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.workflow.conditions import (\n",
    "    ConditionGreaterThanOrEqualTo,\n",
    ")\n",
    "from sagemaker.workflow.condition_step import (\n",
    "    ConditionStep,\n",
    "    JsonGet,\n",
    ")\n",
    "from sagemaker.workflow.step_collections import RegisterModel\n",
    "from sagemaker.model_metrics import (\n",
    "    MetricsSource,\n",
    "    ModelMetrics,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add new input parameter for the model registration step:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_approval_status = ParameterString(\n",
    "    name=\"ModelApprovalStatus\",\n",
    "    default_value=\"PendingManualApproval\",  # ModelApprovalStatus can be set to a default of \"Approved\" if you don't want manual approval.\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create register model step:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Model metrics that will be associated with RegisterModel step\n",
    "'''\n",
    "model_metrics = ModelMetrics(\n",
    "    model_statistics=MetricsSource(\n",
    "        s3_uri=\"{}/evaluation.json\".format(\n",
    "            step_eval.arguments[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\n",
    "                \"S3Uri\"\n",
    "            ]\n",
    "        ),\n",
    "        content_type=\"application/json\",\n",
    "    )\n",
    ")\n",
    "'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_package_group_name=\"CustomerChurnPackageGroup\"\n",
    "\n",
    "# Register model step that will be conditionally executed\n",
    "step_register = RegisterModel(\n",
    "    name=\"CustomerChurnRegisterModel\",\n",
    "    estimator=xgb_train,\n",
    "    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n",
    "    content_types=[\"text/csv\"],\n",
    "    response_types=[\"text/csv\"],\n",
    "    inference_instances=[\"ml.t2.medium\", \"ml.m5.large\"],\n",
    "    transform_instances=[\"ml.m5.large\"],\n",
    "    model_package_group_name=model_package_group_name,\n",
    "    approval_status=model_approval_status,\n",
    "    #model_metrics=model_metrics,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create condition step for **accuracy above 0.8**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Condition step for evaluating model quality and branching execution\n",
    "cond_lte = ConditionGreaterThanOrEqualTo(  # You can change the condition here\n",
    "    left=JsonGet(\n",
    "        step=step_eval,\n",
    "        property_file=evaluation_report,\n",
    "        json_path=\"binary_classification_metrics.accuracy.value\",  # This should follow the structure of your report_dict defined in the evaluate.py file.\n",
    "    ),\n",
    "    right=0.8,  # You can change the threshold here\n",
    ")\n",
    "step_cond = ConditionStep(\n",
    "    name=\"CustomerChurnAccuracyCond\",\n",
    "    conditions=[cond_lte],\n",
    "    if_steps=[step_register],\n",
    "    else_steps=[],\n",
    ")"
   ]
  },
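  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plain-Python sketch below mimics what the condition does conceptually: it follows the dot-separated `json_path` through the evaluation report and compares the value against the threshold. The `report` dictionary and the `json_get` helper are hypothetical stand-ins written for this illustration (the real report is produced by the evaluation script, and the real lookup is done by the service), so treat it as an assumption-laden sketch rather than the SDK's behavior.\n",
    "\n",
    "```python\n",
    "# Hypothetical report with the same nesting as the json_path used above\n",
    "report = {'binary_classification_metrics': {'accuracy': {'value': 0.95}}}\n",
    "\n",
    "def json_get(document, json_path):\n",
    "    '''Follow a dot-separated path through nested dictionaries.'''\n",
    "    value = document\n",
    "    for key in json_path.split('.'):\n",
    "        value = value[key]\n",
    "    return value\n",
    "\n",
    "accuracy = json_get(report, 'binary_classification_metrics.accuracy.value')\n",
    "threshold = 0.8\n",
    "register_model = accuracy >= threshold  # greater than or equal, as in the condition\n",
    "print(accuracy, register_model)\n",
    "```\n",
    "\n",
    "If the path does not match the structure of the report, the lookup fails, which is why the `json_path` has to follow the report written by the evaluation script."
   ]
  },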
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the forth and final iteration of the Pipeline (updating the definition)\n",
    "\n",
    "We will update the pipeline the final approve model (contidion) and save model steps, resulting in a pipeline with 5 steps: data prep, training, evaluation, approval, save to registry steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline = Pipeline(\n",
    "        name=pipeline_name,\n",
    "        parameters=[\n",
    "            input_data,\n",
    "            processing_instance_type,\n",
    "            processing_instance_count,\n",
    "            training_instance_type,\n",
    "            model_approval_status,\n",
    "        ],\n",
    "        steps=[step_process, step_train, step_eval, step_cond],\n",
    "        sagemaker_session=sagemaker_session,\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline.upsert(role_arn=role)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start final execution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "execution = pipeline.start()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='end_creation_pipe'></a>\n",
    "    \n",
    "# 5. End of pipeline creation\n",
    "\n",
    "With step caching enabled, steps whose inputs have not changed should now run faster.\n",
    "\n",
    "Once the execution has finished, let's get the final result of the pipeline by reading the evaluation report:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "evaluation_json = sagemaker.s3.S3Downloader.read_file(\"{}/evaluation.json\".format(\n",
    "    step_eval.arguments[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\"S3Uri\"]\n",
    "))\n",
    "json.loads(evaluation_json)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see the following in the execution:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ.png\" width=\"80%\">\n",
    "\n",
    "If we go to the register model step we can see that the model passed the accuracy condition (accuracy was 0.95 > 0.8):\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ2.png\" width=\"80%\">\n",
    "\n",
    "Since the model passed the condition, it was saved in the **SageMaker Model Registry**:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ3.png\" width=\"40%\">\n",
    "\n",
    "Select our model group:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ4.png\" width=\"40%\">\n",
    "\n",
    "Open its details:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ5.png\" width=\"40%\">\n",
    "\n",
    "Select the model that was saved in the Model Registry:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ6.png\" width=\"100%\">\n",
    "\n",
    "\n",
    "Here we can see the model details (if you click on the metrics you can visualize the AUC and accuracy metrics).\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ7.png\" width=\"100%\">\n",
    "\n",
    "Now, we can manually approve the model:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ-approve.png\" width=\"40%\">\n",
    "\n",
    "After selecting approve and clicking `Update status`, the model status will be updated:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ8.png\" width=\"100%\">\n",
    "\n",
    "We can also see the metrics of the model we just approved:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ9.png\" width=\"100%\">\n",
    "\n",
    "and its details:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-succ10.png\" width=\"100%\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run again with caching and changed input parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: to override the default input parameters, pass them to start():\n",
    "\n",
    "execution = pipeline.start(\n",
    "    parameters=dict(\n",
    "        ProcessingInstanceType=\"ml.c5.xlarge\",\n",
    "        ModelApprovalStatus=\"Approved\", # Would approve automatically\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With caching, this run should be even faster (check the `Elapsed time`):\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-override.png\" width=\"80%\">\n",
    "\n",
    "\n",
    "Since we overrode the input `ModelApprovalStatus` to \"Approved\", this time the model will be approved automatically and saved to the Model Registry:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-override2.png\" width=\"80%\">\n",
    "\n",
    "Let's compare the models. Select both, right-click and then choose `Compare models`:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-override3.png\" width=\"80%\">\n",
    "\n",
    "Since both executions used the same data and code, the two models have the same metrics:\n",
    "\n",
    "<img src=\"./media/sm-pipe-iter-final-exec-override4.png\" width=\"80%\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: to stop the pipeline execution, uncomment the line below:\n",
    "# execution.stop()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: to delete the whole pipeline, uncomment the line below:\n",
    "# pipeline.delete()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's put the whole pipeline code into a Python script:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile my_labs_solutions/pipeline_definition.py\n",
    "import os\n",
    "import json\n",
    "from time import strftime, gmtime\n",
    "\n",
    "import sagemaker\n",
    "from sagemaker.inputs import TrainingInput\n",
    "from sagemaker.workflow.steps import (\n",
    "    ProcessingStep,\n",
    "    TrainingStep,\n",
    ")\n",
    "from sagemaker.processing import (\n",
    "    ProcessingInput,\n",
    "    ProcessingOutput,\n",
    "    ScriptProcessor,\n",
    ")\n",
    "from sagemaker.workflow.parameters import (\n",
    "    ParameterInteger,\n",
    "    ParameterString,\n",
    ")\n",
    "\n",
    "from sagemaker.workflow.pipeline import Pipeline\n",
    "from sagemaker.workflow.pipeline_experiment_config import PipelineExperimentConfig\n",
    "from sagemaker.workflow.step_collections import RegisterModel\n",
    "from sagemaker.workflow.properties import PropertyFile\n",
    "from sagemaker.workflow.steps import CacheConfig\n",
    "from sagemaker.workflow.conditions import (\n",
    "    ConditionGreaterThanOrEqualTo,\n",
    ")\n",
    "from sagemaker.workflow.condition_step import (\n",
    "    ConditionStep,\n",
    "    JsonGet,\n",
    ")\n",
    "from sagemaker.model_metrics import (\n",
    "    MetricsSource,\n",
    "    ModelMetrics,\n",
    ")\n",
    "\n",
    "from .dataprep_solution import get_dataprep_processor\n",
    "from .modeling_solution import get_modeling_estimator\n",
    "from .evaluation_solution import get_evaluation_processor\n",
    "\n",
    "BASE_DIR = os.path.dirname(os.path.realpath(__file__))\n",
    "\n",
    "def get_my_solutions_vars():\n",
    "    vars_path = os.path.join(\".\", \"pipelines\", \"my_labs_solutions\", \"my-solution-vars.json\")\n",
    "\n",
    "    with open(vars_path, \"rb\") as f:\n",
    "        my_vars = json.loads(f.read())\n",
    "        \n",
    "    return my_vars\n",
    "\n",
    "def get_pipeline(region,\n",
    "                 role=None,\n",
    "                 default_bucket=None,\n",
    "                 model_package_group_name=\"MLOpsCustomerChurnPackageGroup\",  # Choose any name\n",
    "                 pipeline_name=\"MLOpsFinalChurnMLPipeline\",  # You can find your pipeline name in the Studio UI (project -> Pipelines -> name)\n",
    "                 base_job_prefix=\"CustomerChurn\",  # Choose any name\n",
    "                ) -> Pipeline:\n",
    "    \n",
    "    # Get config vars\n",
    "    my_vars = get_my_solutions_vars()\n",
    "    bucket = my_vars[\"bucket\"]\n",
    "    prefix = my_vars[\"prefix\"]\n",
    "    region = my_vars[\"region\"]\n",
    "    docker_image_name = my_vars[\"docker_image_name\"]\n",
    "    s3uri_raw = my_vars[\"s3uri_raw\"]\n",
    "    s3_dataprep_code_uri = my_vars[\"s3_dataprep_code_uri\"]\n",
    "    s3_modeling_code_uri = my_vars[\"s3_modeling_code_uri\"]\n",
    "    train_script_name = my_vars[\"train_script_name\"]\n",
    "    s3_evaluation_code_uri = my_vars[\"s3_evaluation_code_uri\"]\n",
    "    role = my_vars[\"role\"]\n",
    "\n",
    "    sagemaker_session = sagemaker.session.Session()\n",
    "\n",
    "    # Parameters for data preparation step\n",
    "    input_data = ParameterString(\n",
    "        name=\"InputDataUrl\",\n",
    "        default_value=s3uri_raw # S3 URI where we stored the raw data\n",
    "    )\n",
    "    processing_instance_count = ParameterInteger(\n",
    "        name=\"ProcessingInstanceCount\", default_value=1\n",
    "    )\n",
    "    processing_instance_type = ParameterString(\n",
    "        name=\"ProcessingInstanceType\", default_value=\"ml.m5.xlarge\"\n",
    "    )\n",
    "\n",
    "    # Add an input parameter to define the training instance type\n",
    "    training_instance_type = ParameterString(\n",
    "        name=\"TrainingInstanceType\", default_value=\"ml.m5.xlarge\"\n",
    "    )\n",
    "    model_approval_status = ParameterString(\n",
    "        name=\"ModelApprovalStatus\",\n",
    "        default_value=\"PendingManualApproval\",  # ModelApprovalStatus can be set to a default of \"Approved\" if you don't want manual approval.\n",
    "    )\n",
    "\n",
    "\n",
    "    # Cache for 30 minutes\n",
    "    cache_config = CacheConfig(enable_caching=True, expire_after=\"T30m\")\n",
    "\n",
    "    sklearn_processor = get_dataprep_processor(processing_instance_type, processing_instance_count, role)\n",
    "\n",
    "    # Processing step for feature engineering\n",
    "    step_process = ProcessingStep(\n",
    "        name=\"CustomerChurnProcess\",  # choose any name\n",
    "        processor=sklearn_processor,\n",
    "        outputs=[\n",
    "            ProcessingOutput(output_name=\"train\", source=\"/opt/ml/processing/train\"),\n",
    "            ProcessingOutput(\n",
    "                output_name=\"validation\", source=\"/opt/ml/processing/validation\"\n",
    "            ),\n",
    "            ProcessingOutput(output_name=\"test\", source=\"/opt/ml/processing/test\"),\n",
    "        ],\n",
    "        code=s3_dataprep_code_uri,\n",
    "        job_arguments=[\"--input-data\", input_data],\n",
    "        cache_config=cache_config\n",
    "    )\n",
    "\n",
    "\n",
    "    xgb_train = get_modeling_estimator(bucket,\n",
    "                                       prefix,\n",
    "                                       s3_modeling_code_uri, \n",
    "                                       docker_image_name,\n",
    "                                       role,\n",
    "                                       entry_point_script = train_script_name)\n",
    "\n",
    "\n",
    "    step_train = TrainingStep(\n",
    "        name=\"CustomerChurnTrain\",\n",
    "        estimator=xgb_train,\n",
    "        inputs={\n",
    "            \"train\": TrainingInput(\n",
    "                        s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                            \"train\"\n",
    "                        ].S3Output.S3Uri,\n",
    "                        content_type=\"text/csv\"\n",
    "                     ),\n",
    "            \"validation\": TrainingInput(\n",
    "                        s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                            \"validation\"\n",
    "                        ].S3Output.S3Uri,\n",
    "                        content_type=\"text/csv\"\n",
    "                     )\n",
    "        },\n",
    "        cache_config=cache_config\n",
    "    )     \n",
    "\n",
    "\n",
    "    evaluation_report = PropertyFile(\n",
    "        name=\"EvaluationReport\",\n",
    "        output_name=\"evaluation\",\n",
    "        path=\"evaluation.json\",\n",
    "    )\n",
    "\n",
    "    script_eval = get_evaluation_processor(docker_image_name, role)\n",
    "\n",
    "    # Processing step for evaluation\n",
    "    step_eval = ProcessingStep(\n",
    "            name=\"CustomerChurnEval\",\n",
    "            processor=script_eval,\n",
    "            inputs=[\n",
    "                ProcessingInput(\n",
    "                    source=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n",
    "                    destination=\"/opt/ml/processing/model\",\n",
    "                ),\n",
    "                ProcessingInput(\n",
    "                    source=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                        \"test\"\n",
    "                    ].S3Output.S3Uri,\n",
    "                    destination=\"/opt/ml/processing/test\",\n",
    "                ),\n",
    "            ],\n",
    "            outputs=[\n",
    "                ProcessingOutput(\n",
    "                    output_name=\"evaluation\", source=\"/opt/ml/processing/evaluation\"\n",
    "                ),\n",
    "            ],\n",
    "            code=s3_evaluation_code_uri,\n",
    "            property_files=[evaluation_report],\n",
    "            cache_config=cache_config\n",
    "    )\n",
    "\n",
    "\n",
    "    # Model metrics that will be associated with RegisterModel step\n",
    "    model_metrics = ModelMetrics(\n",
    "        model_statistics=MetricsSource(\n",
    "            s3_uri=\"{}/evaluation.json\".format(\n",
    "                step_eval.arguments[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\n",
    "                    \"S3Uri\"\n",
    "                ]\n",
    "            ),\n",
    "            content_type=\"application/json\",\n",
    "        )\n",
    "    )\n",
    "\n",
    "    #model_package_group_name=\"CustomerChurnPackageGroup\"\n",
    "\n",
    "    # Register model step that will be conditionally executed\n",
    "    step_register = RegisterModel(\n",
    "        name=\"CustomerChurnRegisterModel\",\n",
    "        estimator=xgb_train,\n",
    "        model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n",
    "        content_types=[\"text/csv\"],\n",
    "        response_types=[\"text/csv\"],\n",
    "        inference_instances=[\"ml.t2.medium\", \"ml.m5.large\"],\n",
    "        transform_instances=[\"ml.m5.large\"],\n",
    "        model_package_group_name=model_package_group_name,\n",
    "        approval_status=model_approval_status,\n",
    "        model_metrics=model_metrics,\n",
    "    )\n",
    "\n",
    "\n",
    "    # Condition step for evaluating model quality and branching execution\n",
    "    cond_lte = ConditionGreaterThanOrEqualTo(  # You can change the condition here\n",
    "        left=JsonGet(\n",
    "            step=step_eval,\n",
    "            property_file=evaluation_report,\n",
    "            json_path=\"binary_classification_metrics.accuracy.value\",  # This should follow the structure of your report_dict defined in the evaluate.py file.\n",
    "        ),\n",
    "        right=0.8,  # You can change the threshold here\n",
    "    )\n",
    "    step_cond = ConditionStep(\n",
    "        name=\"CustomerChurnAccuracyCond\",\n",
    "        conditions=[cond_lte],\n",
    "        if_steps=[step_register],\n",
    "        else_steps=[],\n",
    "    )\n",
    "\n",
    "\n",
    "\n",
    "    # Experiment configs\n",
    "    create_date = lambda: strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
    "\n",
    "    experiment_name=f\"pipeline-customer-churn-prediction-xgboost-{create_date()}\"\n",
    "    trial_name=f\"pipeline-framework-trial-{create_date()}\"\n",
    "\n",
    "    pipeline_experiment_config = PipelineExperimentConfig(\n",
    "        experiment_name = experiment_name,\n",
    "        trial_name = trial_name\n",
    "    )\n",
    "\n",
    "\n",
    "    pipeline = Pipeline(\n",
    "            name=pipeline_name,\n",
    "            parameters=[\n",
    "                input_data,\n",
    "                processing_instance_type,\n",
    "                processing_instance_count,\n",
    "                training_instance_type,\n",
    "                model_approval_status,\n",
    "            ],\n",
    "            steps=[step_process, step_train, step_eval, step_cond],\n",
    "            sagemaker_session=sagemaker_session,\n",
    "        )\n",
    "    \n",
    "    return pipeline\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='SM_Projects'></a>\n",
    "# Customizing the Build/Train/Deploy MLOps Project Template\n",
    "\n",
    "SageMaker Projects introduce MLOps templates that automatically provision the underlying resources needed to enable \n",
    "CI/CD capabilities for your Machine Learning Development Lifecycle (MLDC). You can use a number of built-in \n",
    "templates or create your own custom templates.\n",
    "\n",
    "In this workshop we will use one of the **pre-built MLOps templates** to bootstrap your ML project and establish a CI/CD \n",
    "pattern from seed code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### MLOps Template for Build, Train, and Deploy\n",
    "\n",
    "> Imagine now that you are a data scientist who just joined the company. You need to get access to the ML resources.\n",
    "\n",
    "To get started with SageMaker Projects, [they must be first enabled in the SageMaker Studio console](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-studio-updates.html). \n",
    "This can be done for existing users or while creating new ones:\n",
    "\n",
    "<img src=\"media/enable_projects.png\">\n",
    "\n",
    "Within Amazon SageMaker Studio, you can now select “Projects” from a drop-down menu on the “Components and registries” \n",
    "tab as shown below:\n",
    "\n",
    "<img src=\"media/select_projects.png\">\n",
    "\n",
    "From the projects page you’ll have the option to launch a pre-configured SageMaker MLOps template. Click on `Create project` and we'll select the build, train and deploy template:\n",
    "\n",
    "<img src=\"media/create_project.png\">\n",
    "\n",
    "Name the project `ChurnProject`.\n",
    "\n",
    "> NOTE: Launching this template will kick off a model building pipeline by default and will train a regression model. This will incur a small cost.\n",
    "\n",
    "Once the project is created from the MLOps template, the following architecture will be deployed:\n",
    "\n",
    "<img src=\"media/deep_dive.png\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modifying the IAM Role for CodeBuild\n",
    "\n",
    "Attach the AdminRole to the CodeBuild role."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modifying the Seed Code for Custom Use Case\n",
    "\n",
    "After your project has been created, the architecture shown above will be deployed and the visualization of the \n",
    "Pipeline will be available in the “Pipelines” drop down menu within SageMaker Studio.\n",
    "\n",
    "In order to modify the seed code from this launched template, we’ll first need to clone the AWS CodeCommit \n",
    "repositories to our local SageMaker Studio instance. From the list of projects, select the one that was just \n",
    "created. Under the “Repositories” tab you can select the hyperlinks to locally clone the AWS CodeCommit repos:\n",
    "\n",
    "<img src=\"media/clone_repos.png\">\n",
    "\n",
    "\n",
    "### Clone the `...modelbuild` repo (click on `clone repo...`)\n",
    "\n",
    "The SageMaker project template creates these repositories.\n",
    "\n",
    "In the `...-modelbuild` repository there's the code for preprocessing, training, and evaluating the model. This pre-built template includes another example for a regression model related to the [UCI Abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone):\n",
    "\n",
    "<img src=\"media/repo_directory.png\">\n",
    "\n",
    "\n",
    "**In our case we want to create a pipeline for predicting Churn (previous labs).** We can modify these files in order to solve our own customer churn use-case."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "---\n",
    "\n",
    "### Modifying the code for the Churn problem\n",
    "\n",
    "This is the sample structure of the Project (Abalone):\n",
    "\n",
    "<img src=\"media/repo_directory.png\" width=\"40%\">\n",
    "\n",
    "\n",
    "#### Let's use everything we just built:\n",
    "\n",
    "In the `...modelbuild` repo:\n",
    "\n",
    "1. Replace `codebuild-buildspec.yml` in your current Studio project (Abalone) with the one found in [modelbuild/codebuild-buildspec.yml](modelbuild/codebuild-buildspec.yml) (Churn).\n",
    "\n",
    "The final `codebuild-buildspec.yml` should look like this (with the comment on the first line):\n",
    "\n",
    "<img src=\"media/buildspec.png\" width=\"60%\">\n",
    "\n",
    "\n",
    "2. Go to `pipelines` and delete the `abalone` directory.\n",
    "\n",
    "<img src=\"media/dir_del.png\" width=\"40%\">\n",
    "\n",
    "\n",
    "3. Cut the `my_labs_solutions` directory and paste it into the `pipelines` directory of the `...modelbuild` repo.\n",
    "\n",
    "<img src=\"media/dir_cut.png\" width=\"40%\">\n",
    "\n",
    "<img src=\"media/dir_paste.png\" width=\"40%\">\n",
    "\n",
    "In the end the `...modelbuild` repo should look like this:\n",
    "\n",
    "<img src=\"media/dir_structure.png\" width=\"40%\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Trigger a new training Pipeline Execution through git commit\n",
    "\n",
    "By committing these changes to the AWS CodeCommit repository (easily done in the SageMaker Studio source control tab), a \n",
    "new Pipeline execution will be triggered, since an EventBridge rule monitors the repository for commits. After a few moments, \n",
    "we can monitor the execution by selecting our Pipeline inside of the SageMaker Project.\n",
    "\n",
    "Go to the `pipelines` directory of the `...modelbuild` repo. Click on the git symbol to commit and push the changes:\n",
    "\n",
    "<img src=\"media/git_push.png\">\n",
    "\n",
    "This triggers the training pipeline. Go to the `“Pipelines”` tab inside of the SageMaker Project and click on our only pipeline. You'll see:\n",
    "\n",
    "<img src=\"media/execute_pipeline.png\">\n",
    "\n",
    "Select the most recent execution:\n",
    "\n",
    "<img src=\"media/dag.png\">\n",
    "\n",
    "\n",
    "## Trigger the ModelDeploy Pipeline\n",
    "\n",
    "Once the train pipeline is completed, we can go to our `“Model groups”` tab inside of the SageMaker Project and inspect the metadata attached to the model artifacts. If everything looks good, we can manually approve the model:\n",
    "\n",
    "<img src=\"media/model_metrics.png\">\n",
    "\n",
    "<img src=\"media/approve_model.png\">\n",
    "\n",
    "This approval will trigger the ModelDeploy pipeline (in CodePipeline):\n",
    "\n",
    "<img src=\"media/execute_pipeline_deploy.png\">\n",
    "\n",
    "After we deploy to a staging environment and run some tests, we will have to **approve the deployment to production** by approving in the `ApproveDeployment` stage:\n",
    "\n",
    "<img src=\"media/approve_deploy_prod.png\">\n",
    "\n",
    "\n",
    "\n",
    "Finally, if we go back to Studio, we will see the Production endpoint for real time inference.\n",
    "\n",
    "<img src=\"media/endpoints.png\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   }
  ],
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}