{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6eb825bb-e3c2-4838-b73c-528e1fc25735",
   "metadata": {},
   "source": [
    "# Step 4: Add a model building CI/CD pipeline\n",
    "\n",
    "In this step you create an automated CI/CD pipeline for model building using [Amazon SageMaker Projects](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects.html). \n",
    "\n",
    "![](img/six-steps-4.png)\n",
    "\n",
    "You are going to use a [SageMaker-provided MLOps project template for model building and training](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates-sm.html#sagemaker-projects-templates-code-commit) to provision a CI/CD workflow automation with [AWS CodePipeline](https://aws.amazon.com/codepipeline/) and an [AWS CodeCommit](https://aws.amazon.com/codecommit/) code repository.\n",
    "\n",
    "SageMaker project templates offer you the following choice of code repositories, workflow automation tools, and pipeline stages:\n",
    "- **Code repository**: AWS CodeCommit or third-party Git repositories such as GitHub and Bitbucket\n",
    "- **CI/CD workflow automation**: AWS CodePipeline or Jenkins\n",
    "- **Pipeline stages**: Model building and training, model deployment, or both"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42102a0b-a706-43b3-9f23-59f7084123f1",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import boto3\n",
    "import sagemaker \n",
    "from time import gmtime, strftime, sleep"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c0dd762-ba2a-4141-b937-80cfccf98faf",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%store -r \n",
    "\n",
    "%store\n",
    "\n",
    "try:\n",
    "    initialized\n",
    "except NameError:\n",
    "    print(\"+++++++++++++++++++++++++++++++++++++++++++++++++\")\n",
    "    print(\"[ERROR] YOU HAVE TO RUN 00-start-here notebook   \")\n",
    "    print(\"+++++++++++++++++++++++++++++++++++++++++++++++++\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aaeaa68c-a2aa-4da5-8577-42d7dff2067a",
   "metadata": {},
   "source": [
    "## Create an MLOps project\n",
    "⭐ You can create a project programmatically in this notebook - Option 1 or in Studio - Option 2. Option 1 is recommended as it requires no manual input. Option 2 is given to demonstrate [**Create Project** UX flow](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-create.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd7061c4-b0f5-48a9-999c-1b0b84c8a4b9",
   "metadata": {},
   "source": [
    "### Option 1: Create project programmatically\n",
    "Use `boto3` to create an MLOps project via a SageMaker API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1002ef46-7906-44fe-a874-78c3607d827a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "sm = boto3.client(\"sagemaker\")\n",
    "sc = boto3.client(\"servicecatalog\")\n",
    "\n",
    "sc_provider_name = \"Amazon SageMaker\"\n",
    "sc_product_name = \"MLOps template for model building and training\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a30b2c04-8c60-4f67-abef-284e47072110",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "p_ids = [p['ProductId'] for p in sc.search_products(\n",
    "    Filters={\n",
    "        'FullTextSearch': [sc_product_name]\n",
    "    },\n",
    ")['ProductViewSummaries'] if p[\"Name\"]==sc_product_name]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "579caafe-27c5-4a82-84c4-f364675ca7c7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "p_ids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2cef972c-ff89-49fb-a498-35f8a4768d25",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# If you get any exception from this code, go to the Option 2 and create a project in Studio UX\n",
    "if not len(p_ids):\n",
    "    raise Exception(\"No Amazon SageMaker ML Ops products found!\")\n",
    "elif len(p_ids) > 1:\n",
    "    raise Exception(\"Too many matching Amazon SageMaker ML Ops products found!\")\n",
    "else:\n",
    "    product_id = p_ids[0]\n",
    "    print(f\"ML Ops product id: {product_id}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b51c8dc-57e7-4ff1-a137-3f4e93d9594c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "provisioning_artifact_id = sorted(\n",
    "    [i for i in sc.list_provisioning_artifacts(\n",
    "        ProductId=product_id\n",
    "    )['ProvisioningArtifactDetails'] if i['Guidance']=='DEFAULT'],\n",
    "    key=lambda d: d['Name'], reverse=True)[0]['Id']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f65aa4b1-c848-45ce-9f51-b5c4b70175b1",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "provisioning_artifact_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed952193-a0cf-48c8-b776-351f6452adc8",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "project_name = f\"model-build-{strftime('%m-%d-%H-%M-%S', gmtime())}\"\n",
    "project_parameters = [] # This SageMaker built-in project template doesn't have any parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22e7f228-8ca1-4c93-8131-8901acf75165",
   "metadata": {},
   "source": [
    "Finally, create a SageMaker project from the service catalog product template:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4928192d-4580-4f23-ab89-749937365cfb",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# create SageMaker project\n",
    "r = sm.create_project(\n",
    "    ProjectName=project_name,\n",
    "    ProjectDescription=\"Model build project\",\n",
    "    ServiceCatalogProvisioningDetails={\n",
    "        'ProductId': product_id,\n",
    "        'ProvisioningArtifactId': provisioning_artifact_id,\n",
    "    },\n",
    ")\n",
    "\n",
    "print(r)\n",
    "project_id = r[\"ProjectId\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a82b313e-9025-42cb-8f7a-a617ed771e3e",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\"> 💡 <strong> Wait until project creation is completed </strong>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0afc704f",
   "metadata": {},
   "source": [
    "\n",
    "<img src=\"data:image/svg+xml;base64,Cjxzdmcgd2lkdGg9IjgwMCIgaGVpZ2h0PSIxMjUiIHZpZXdCb3g9IjAgMCA4MDAgMTI1IiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPgogICAgPGRlZnM+CiAgICAgICAgPGxpbmVhckdyYWRpZW50IGlkPSJmYWRlR3JhZGllbnQiIHgxPSIwIiB4Mj0iMSI+CiAgICAgICAgICAgIDxzdG9wIG9mZnNldD0iMCUiIHN0b3AtY29sb3I9IiNGMEYwRjAiLz4KICAgICAgICAgICAgPHN0b3Agb2Zmc2V0PSIxMDAlIiBzdG9wLWNvbG9yPSIjRjBGMEYwIiBzdG9wLW9wYWNpdHk9IjAiLz4KICAgICAgICA8L2xpbmVhckdyYWRpZW50PgogICAgICAgIDxtYXNrIGlkPSJmYWRlTWFzayI+CiAgICAgICAgICAgIDxyZWN0IHg9IjAiIHk9IjAiIHdpZHRoPSI3NTAiIGhlaWdodD0iMTI1IiBmaWxsPSJ3aGl0ZSIvPgogICAgICAgICAgICA8cmVjdCB4PSI3NTAiIHk9IjAiIHdpZHRoPSI1MCIgaGVpZ2h0PSIxMjUiIGZpbGw9InVybCgjZmFkZUdyYWRpZW50KSIvPgogICAgICAgIDwvbWFzaz4KICAgIDwvZGVmcz4KICAgIDxwYXRoIGQ9Ik0zLDUwIEE1MCw1MCAwIDAgMSA1MywzIEw3OTcsMyBMNzk3LDk3IEw5Nyw5NyBMNTAsMTE1IEwzLDk3IFoiIGZpbGw9IiNGMEYwRjAiIHN0cm9rZT0iI0UwRTBFMCIgc3Ryb2tlLXdpZHRoPSIxIiBtYXNrPSJ1cmwoI2ZhZGVNYXNrKSIvPgogICAgPGNpcmNsZSBjeD0iNTAiIGN5PSI1MCIgcj0iMzAiIGZpbGw9IiM1N2M0ZjgiIHN0cm9rZT0iIzU3YzRmOCIgc3Ryb2tlLXdpZHRoPSIxIi8+CiAgICA8Y2lyY2xlIGN4PSI1MCIgY3k9IjUwIiByPSIyNSIgZmlsbD0iI0YwRjBGMCIvPgogICAgPGxpbmUgeDE9IjUwIiB5MT0iNTAiIHgyPSI1MCIgeTI9IjMwIiBzdHJva2U9IiM1N2M0ZjgiIHN0cm9rZS13aWR0aD0iMyIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIi8+CiAgICA8bGluZSB4MT0iNTAiIHkxPSI1MCIgeDI9IjY1IiB5Mj0iNTAiIHN0cm9rZT0iIzU3YzRmOCIgc3Ryb2tlLXdpZHRoPSIzIiBzdHJva2UtbGluZWNhcD0icm91bmQiLz4KICAgIDx0ZXh0IHg9IjEwMCIgeT0iMzQiIGZvbnQtZmFtaWx5PSJBcmlhbCwgc2Fucy1zZXJpZiIgZm9udC1zaXplPSIxNCIgZmlsbD0iIzMzMzMzMyI+VGhlIG5leHQgY2VsbCBtYXkgdGFrZSBhIGZldyBtaW51dGVzIHRvIHJ1bi48L3RleHQ+Cjwvc3ZnPgo=\" alt=\"Time alert open medium\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af1af2b6-2324-41f4-8a5a-0cf178b9ec23",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "while sm.describe_project(ProjectName=project_name)['ProjectStatus'] != 'CreateCompleted':\n",
    "    print(\"Waiting for project creation completion\")\n",
    "    sleep(10)\n",
    "    \n",
    "print(f\"MLOps project {project_name} creation completed\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58ee7720",
   "metadata": {},
   "source": [
    "<img src=\"data:image/svg+xml;base64,Cjxzdmcgd2lkdGg9IjgwMCIgaGVpZ2h0PSI1MCIgdmlld0JveD0iMCAwIDgwMCA1MCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KICAgIDxkZWZzPgogICAgICAgIDxsaW5lYXJHcmFkaWVudCBpZD0iZmFkZUdyYWRpZW50IiB4MT0iMCIgeDI9IjEiPgogICAgICAgICAgICA8c3RvcCBvZmZzZXQ9IjAlIiBzdG9wLWNvbG9yPSIjRjBGMEYwIi8+CiAgICAgICAgICAgIDxzdG9wIG9mZnNldD0iMTAwJSIgc3RvcC1jb2xvcj0iI0YwRjBGMCIgc3RvcC1vcGFjaXR5PSIwIi8+CiAgICAgICAgPC9saW5lYXJHcmFkaWVudD4KICAgICAgICA8bWFzayBpZD0iZmFkZU1hc2siPgogICAgICAgICAgICA8cmVjdCB4PSIwIiB5PSIwIiB3aWR0aD0iNzUwIiBoZWlnaHQ9IjUwIiBmaWxsPSJ3aGl0ZSIvPgogICAgICAgICAgICA8cmVjdCB4PSI3NTAiIHk9IjAiIHdpZHRoPSI1MCIgaGVpZ2h0PSI1MCIgZmlsbD0idXJsKCNmYWRlR3JhZGllbnQpIi8+CiAgICAgICAgPC9tYXNrPgogICAgPC9kZWZzPgogICAgPHBhdGggZD0iTTI1LDUwIFEwLDUwIDAsMjUgTDUwLDMgTDk3LDI1IEw3OTcsMjUgTDc5Nyw1MCBMMjUsNTAgWiIgZmlsbD0iI0YwRjBGMCIgc3Ryb2tlPSIjRTBFMEUwIiBzdHJva2Utd2lkdGg9IjEiIG1hc2s9InVybCgjZmFkZU1hc2spIi8+Cjwvc3ZnPgo=\" alt=\"Time alert close\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54b2bb29-1051-4ad4-a050-c5366e018f74",
   "metadata": {},
   "source": [
    "### End of Option 1: Create project programmatically\n",
    "Now you have instanciated a project template in your SageMaker environment. You can go to the section **Configure the MLOps project**.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7792b450-40ea-47f8-9449-528148ae43f5",
   "metadata": {},
   "source": [
    "### Option 2: Create a project in Studio\n",
    "<div class=\"alert alert-info\"> 💡 <strong> Skip this section if you created a project programmatically </strong>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26a0bcfd-94e3-4892-8199-6ca06593c0af",
   "metadata": {},
   "source": [
    "To [create a project](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-create.html) in Studio:\n",
    "\n",
    "1. In the Studio sidebar, choose the **Home** icon.\n",
    "2. Select **Deployments** from the menu, and then select **Projects**.\n",
    "3. Choose **Create project**.\n",
    "    - The **Create project** tab opens displaying a list of available project templates.\n",
    "4. For **SageMaker project templates**, choose **SageMaker templates**. \n",
    "5. Choose **MLOps template for model building and training**\n",
    "6. Choose **Select project template**.\n",
    "\n",
    "![](img/create-mlops-project.png)\n",
    "\n",
    "![](img/create-mlops-project-2.png)\n",
    "\n",
    "The **Create project** tab changes to display **Project details**.\n",
    "\n",
    "![](img/project-details.png)\n",
    "\n",
    "Enter the following information:\n",
    "- For **Project details**, enter a name and description for your project. Note the name requirements.\n",
    "- Optionally, add tags, which are key-value pairs that you can use to track your projects.\n",
    "\n",
    "Choose **Create project** and wait for the project to appear in the Projects list."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19f8a881-3e06-4497-8dce-e73dc6923ce4",
   "metadata": {},
   "source": [
    "### Resolve issues with project creation\n",
    "\n",
    "#### Project creation process stuck in pending\n",
    "If after 5 minutes the project creation banner is still on, close the Studio browser window and sign in Studio again.\n",
    "\n",
    "![](img/project-creation-pending.png)\n",
    "\n",
    "#### Error messages\n",
    "❗ If you see an error message similar to:\n",
    "```\n",
    "Your project couldn't be created\n",
    "Studio encountered an error when creating your project. Try recreating the project again.\n",
    "\n",
    "CodeBuild is not authorized to perform: sts:AssumeRole on arn:aws:iam::XXXX:role/service-role/AmazonSageMakerServiceCatalogProductsCodeBuildRole (Service: AWSCodeBuild; Status Code: 400; Error Code: InvalidInputException; Request ID: 4cf59a54-0c59-476a-a970-0ac656db4402; Proxy: null)\n",
    "```\n",
    "\n",
    "see steps 5-6 of [SageMaker Studio Permissions Required to Use Projects](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-studio-updates.html). Make sure you have all required project roles listed in the **Apps** card under **Projects**. \n",
    "\n",
    "💡 If you don't have these roles, you must follow [Shut Down and Update SageMaker Studio and Studio Apps](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tasks-update.html) instructions to update the domain. You must shutdown both JupyterServer and KernelGateway apps. After you shutdown all apps, go to Amazon SageMaker console, choose **Domains**, click on your domain from the list, and choose **Domain Settings**. Choose **Configure app** on the **Apps** card. Click through all **Next** in configuration panes and choose **Submit**. This will update the domain and create all needed project roles automatically.\n",
    "\n",
    "Alternatively, you can create the required roles programmaticaly by using the provided CloudFormation template [`cfn-templates/sagemaker-project-templates-roles.yaml`](cfn-templates/sagemaker-project-templates-roles.yaml). \n",
    "Run in the repository clone directory from the command line terminal where you have the corresponding permissions:\n",
    "\n",
    "```sh\n",
    "aws cloudformation deploy \\\n",
    "    --template-file cfn-templates/sagemaker-project-templates-roles.yaml \\\n",
    "    --stack-name sagemaker-project-template-roles \\\n",
    "    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \\\n",
    "    --parameter-overrides \\\n",
    "    CreateCloudFormationRole=YES \\\n",
    "    CreateCodeBuildRole=YES \\\n",
    "    CreateCodePipelineRole=YES \\\n",
    "    CreateEventsRole=YES \\\n",
    "    CreateProductsExecutionRole=YES \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0675444-29b3-4db6-bf0c-cfec72426c41",
   "metadata": {},
   "source": [
    "### End of Option 2: Create a project in Studio\n",
    "Now when you have the project created, move to the section **Configure the MLOps project**.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "976166a9-0769-4db8-924d-270b25d91acb",
   "metadata": {},
   "source": [
    "## Configure the MLOps project\n",
    "The project takes about 3-5 min to be created. The project runs a provided default model building pipeline automatically as soon as it has been created.\n",
    "The project templates deploys the following architecture in your AWS account:\n",
    "\n",
    "![](img/mlops-model-build-train.png)\n",
    "\n",
    "The main components are:\n",
    "1. The project template is made available through SageMaker Projects and AWS Service Catalog portfolio\n",
    "2. A CodePipeline pipeline with two stages - `Source` to download the source code from a CodeCommit repository and `Build` to create and execute a SageMaker pipeline\n",
    "3. A default SageMaker pipeline with model build, train, and register workflow\n",
    "4. A seed code repository in CodeCommit with a provided default version of the scaffolding code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7edd547c-b9ec-4902-b798-2e4ac6cae2c1",
   "metadata": {},
   "source": [
    "This project contains all the required code and the insfrastructure to implement an automated CI/CD pipeline. \n",
    "To start using the project with your pipeline, you need to complete the following steps:\n",
    "1. Clone the project CodeCommit repository to your home directory on Studio EFS\n",
    "2. Replace the ML pipeline implementation sample code with your pipeline construction code, as implemented in the step 3 notebook\n",
    "3. Modify the `codebuild-buildspec.yml` file to reference the correct Python module name and to set project parameters\n",
    "\n",
    "Next sections guide you through these steps. For detailed instructions and a hands-on example, refer to the development guide [SageMaker MLOps Project Walkthrough](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e398fd92-8e76-43a0-9b61-3bb5291d880f",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 1. Clone the project seed code to the Studio file system\n",
    "1. Choose **Home** in the Studio sidebar\n",
    "2. Select **Deployments** and then select **Projects**\n",
    "3. Click on the name of the project you created to open the project details tab\n",
    "4. In the project tab, choose **Repositories**\n",
    "5. In the **Local path** column for the repository choose **clone repo....**\n",
    "6. In the dialog box that appears choose **Clone Repository**\n",
    "\n",
    "![](img/select-project.png)\n",
    "\n",
    "![](img/clone-project-repo.png)\n",
    "\n",
    "When clone of the repository is complete, the local path appears in the **Local path** column. Choose the path to open the local folder that contains the repository code in Studio."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ccfe9536-fd10-4b43-9d63-bfc01090afa7",
   "metadata": {},
   "source": [
    "### 2. Replace pipeline construction code\n",
    "If you used the option 1 `boto3` to create an MLOps project, the `project_name` and `project_id` are set automatically. You can run the following code cell to print the values. If you followed the UX instructions to create a project, you must set the `project_name` manually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ff9414b-2dbd-4aa5-a96f-e43e2159b66f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    print(project_name)\n",
    "    print(project_id)\n",
    "except NameError:\n",
    "    print(\"+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\")\n",
    "    print(\"You must set the project_name manually in the following code cell\")\n",
    "    print(\"+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d7e208b-c041-4b9d-ac59-7bc6dc072688",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# project_name = \"<ENTER THE NAME OF THE CREATED PROJECT>\"\n",
    "r = sm.describe_project(ProjectName=project_name)\n",
    "project_id = r['ProjectId']\n",
    "project_arn = r['ProjectArn']\n",
    "project_folder = f\"{project_name}-{project_id}/sagemaker-{project_name}-{project_id}-modelbuild\"\n",
    "\n",
    "print(project_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2982e5ab-98d0-4c44-b1c5-32dffbc1a4e3",
   "metadata": {},
   "source": [
    "The following steps are required to customize the project. The next code cell executes the required steps, you don't need to do anything manually, the following text for your information only.\n",
    "\n",
    "- In Studio file browser navigate to the project's code repository folder, which looks like `<project-name>-<project-id>/sagemaker-<project-name>-<project-id>-modelbuild`.\n",
    "- Rename the file `codebuild-buildspec.yml` to `codebuild-buildspec-original.yml`.\n",
    "- Navigate to the `pipelines` folder inside the project's code repository folder and rename the `abalone` folder to `fromideatoprod`.\n",
    "- Rename the `pipeline.py` file in the `fromideatoprod` folder to `pipeline-original.py`.\n",
    "- Copy the `preprocessing.py` and `evaluation.py` files that you created in the step 2 and 3 notebooks from the `amazon-sagemaker-from-idea-to-production` folder to the `pipelines/fromideatoprod` folder in the project's code repository folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d6b5651-c5b9-466f-bc63-5a02a2f2841f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# if you local path for the workshop folder is different, set the correct absolute path here\n",
    "workshop_folder = \"amazon-sagemaker-from-idea-to-production\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3e2ac0e-53fe-4e29-b362-ca4ef28aede0",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!mv ~/{project_folder}/codebuild-buildspec.yml ~/{project_folder}/codebuild-buildspec-original.yml\n",
    "!mv ~/{project_folder}/pipelines/abalone ~/{project_folder}/pipelines/fromideatoprod\n",
    "!mv ~/{project_folder}/pipelines/fromideatoprod/pipeline.py ~/{project_folder}/pipelines/fromideatoprod/pipeline-original.py\n",
    "!cp ~/{workshop_folder}/preprocessing.py ~/{project_folder}/pipelines/fromideatoprod/\n",
    "!cp ~/{workshop_folder}/evaluation.py ~/{project_folder}/pipelines/fromideatoprod/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59e4e13d-8956-4b69-9834-53a88ea46048",
   "metadata": {},
   "source": [
    "Execute the following cell to write pipeline construction code to the file `pipeline.py`. Re-use the code from the step 3 notebook as the function `get_pipeline()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e15c1c4-e04e-45f1-8c48-6f87e46e6a13",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%writefile pipeline.py\n",
    "\n",
    "import pandas as pd\n",
    "import json\n",
    "import boto3\n",
    "import pathlib\n",
    "import io\n",
    "import sagemaker\n",
    "\n",
    "from sagemaker.deserializers import CSVDeserializer\n",
    "from sagemaker.serializers import CSVSerializer\n",
    "\n",
    "from sagemaker.workflow.pipeline_context import PipelineSession\n",
    "from sagemaker.xgboost.estimator import XGBoost\n",
    "from sagemaker.sklearn.processing import SKLearnProcessor\n",
    "from sagemaker.processing import (\n",
    "    ProcessingInput, \n",
    "    ProcessingOutput, \n",
    "    ScriptProcessor\n",
    ")\n",
    "from sagemaker.inputs import TrainingInput\n",
    "\n",
    "from sagemaker.workflow.pipeline import Pipeline\n",
    "from sagemaker.workflow.steps import (\n",
    "    ProcessingStep, \n",
    "    TrainingStep, \n",
    "    CreateModelStep,\n",
    "    CacheConfig\n",
    ")\n",
    "from sagemaker.workflow.check_job_config import CheckJobConfig\n",
    "from sagemaker.workflow.parameters import (\n",
    "    ParameterInteger, \n",
    "    ParameterFloat, \n",
    "    ParameterString, \n",
    "    ParameterBoolean\n",
    ")\n",
    "from sagemaker.workflow.clarify_check_step import (\n",
    "    ModelBiasCheckConfig, \n",
    "    ClarifyCheckStep, \n",
    "    ModelExplainabilityCheckConfig\n",
    ")\n",
    "from sagemaker import Model\n",
    "from sagemaker.inputs import CreateModelInput\n",
    "from sagemaker.workflow.model_step import ModelStep\n",
    "from sagemaker.workflow.fail_step import FailStep\n",
    "from sagemaker.workflow.conditions import (\n",
    "    ConditionGreaterThan,\n",
    "    ConditionGreaterThanOrEqualTo\n",
    ")\n",
    "from sagemaker.workflow.properties import PropertyFile\n",
    "from sagemaker.workflow.condition_step import ConditionStep\n",
    "from sagemaker.workflow.functions import (\n",
    "    Join,\n",
    "    JsonGet\n",
    ")\n",
    "from sagemaker.workflow.lambda_step import (\n",
    "    LambdaStep,\n",
    "    LambdaOutput,\n",
    "    LambdaOutputTypeEnum,\n",
    ")\n",
    "from sagemaker.lambda_helper import Lambda\n",
    "\n",
    "from sagemaker.model_metrics import (\n",
    "    MetricsSource, \n",
    "    ModelMetrics, \n",
    "    FileSource\n",
    ")\n",
    "from sagemaker.drift_check_baselines import DriftCheckBaselines\n",
    "\n",
    "from sagemaker.image_uris import retrieve\n",
    "import os\n",
    "\n",
    "BASE_DIR = os.path.dirname(os.path.realpath(__file__))\n",
    "\n",
    "def get_sagemaker_client(region):\n",
    "     \"\"\"Gets the sagemaker client.\n",
    "\n",
    "        Args:\n",
    "            region: the aws region to start the session\n",
    "            default_bucket: the bucket to use for storing the artifacts\n",
    "\n",
    "        Returns:\n",
    "            `sagemaker.session.Session instance\n",
    "        \"\"\"\n",
    "     boto_session = boto3.Session(region_name=region)\n",
    "     sagemaker_client = boto_session.client(\"sagemaker\")\n",
    "     return sagemaker_client\n",
    "\n",
    "\n",
    "def get_session(region, default_bucket):\n",
    "    \"\"\"Gets the sagemaker session based on the region.\n",
    "\n",
    "    Args:\n",
    "        region: the aws region to start the session\n",
    "        default_bucket: the bucket to use for storing the artifacts\n",
    "\n",
    "    Returns:\n",
    "        `sagemaker.session.Session instance\n",
    "    \"\"\"\n",
    "\n",
    "    boto_session = boto3.Session(region_name=region)\n",
    "\n",
    "    sagemaker_client = boto_session.client(\"sagemaker\")\n",
    "    runtime_client = boto_session.client(\"sagemaker-runtime\")\n",
    "    return sagemaker.session.Session(\n",
    "        boto_session=boto_session,\n",
    "        sagemaker_client=sagemaker_client,\n",
    "        sagemaker_runtime_client=runtime_client,\n",
    "        default_bucket=default_bucket,\n",
    "    )\n",
    "\n",
    "def get_pipeline_session(region, default_bucket):\n",
    "    \"\"\"Gets the pipeline session based on the region.\n",
    "\n",
    "    Args:\n",
    "        region: the aws region to start the session\n",
    "        default_bucket: the bucket to use for storing the artifacts\n",
    "\n",
    "    Returns:\n",
    "        PipelineSession instance\n",
    "    \"\"\"\n",
    "\n",
    "    boto_session = boto3.Session(region_name=region)\n",
    "    sagemaker_client = boto_session.client(\"sagemaker\")\n",
    "\n",
    "    return PipelineSession(\n",
    "        boto_session=boto_session,\n",
    "        sagemaker_client=sagemaker_client,\n",
    "        default_bucket=default_bucket,\n",
    "    )\n",
    "\n",
    "def get_pipeline_custom_tags(new_tags, region, sagemaker_project_arn=None):\n",
    "    try:\n",
    "        sm_client = get_sagemaker_client(region)\n",
    "        response = sm_client.list_tags(\n",
    "            ResourceArn=sagemaker_project_arn)\n",
    "        project_tags = response[\"Tags\"]\n",
    "        for project_tag in project_tags:\n",
    "            new_tags.append(project_tag)\n",
    "    except Exception as e:\n",
    "        print(f\"Error getting project tags: {e}\")\n",
    "    return new_tags\n",
    "\n",
    "\n",
    "def get_pipeline(\n",
    "    region,\n",
    "    sagemaker_project_arn=None,\n",
    "    sagemaker_project_id=None,\n",
    "    role=None,\n",
    "    default_bucket=None,\n",
    "    input_data_url=None,\n",
    "    bucket_prefix=\"from-idea-to-prod/xgboost\",\n",
    "    model_package_group_name=\"from-idea-to-prod-model-group\",\n",
    "    pipeline_name=\"from-idea-to-prod-pipeline\",\n",
    "    base_job_prefix=\"from-idea-to-prod-pipeline\",\n",
    "    processing_instance_type=\"ml.c5.xlarge\",\n",
    "    training_instance_type=\"ml.m5.xlarge\",\n",
    "    test_score_threshold=0.75,\n",
    "):\n",
    "    \"\"\"Gets a SageMaker ML Pipeline instance.\n",
    "\n",
    "    Args:\n",
    "        region: AWS region to create and run the pipeline.\n",
    "        role: IAM role to create and run steps and pipeline.\n",
    "        default_bucket: the bucket to use for storing the artifacts\n",
    "\n",
    "    Returns:\n",
    "        an instance of a pipeline\n",
    "    \"\"\"\n",
    "    sagemaker_session = get_session(region, default_bucket)\n",
    "    if role is None:\n",
    "        role = sagemaker.session.get_execution_role(sagemaker_session)\n",
    "\n",
    "    session = get_pipeline_session(region, default_bucket)\n",
    "    sm = session.sagemaker_client\n",
    "\n",
    "    # Set S3 urls for processed data\n",
    "    train_s3_url = f\"s3://{default_bucket}/{bucket_prefix}/train\"\n",
    "    validation_s3_url = f\"s3://{default_bucket}/{bucket_prefix}/validation\"\n",
    "    test_s3_url = f\"s3://{default_bucket}/{bucket_prefix}/test\"\n",
    "    baseline_s3_url = f\"s3://{default_bucket}/{bucket_prefix}/baseline\"\n",
    "    evaluation_s3_url = f\"s3://{default_bucket}/{bucket_prefix}/evaluation\"\n",
    "    prediction_baseline_s3_url = f\"s3://{default_bucket}/{bucket_prefix}/prediction_baseline\"\n",
    "    \n",
    "    # Set S3 url for model artifact\n",
    "    output_s3_url = f\"s3://{default_bucket}/{bucket_prefix}/output\"\n",
    "\n",
    "    # Parameters for pipeline execution\n",
    "    # Set processing instance type\n",
    "    process_instance_type_param = ParameterString(\n",
    "        name=\"ProcessingInstanceType\",\n",
    "        default_value=processing_instance_type,\n",
    "    )\n",
    "\n",
    "    # Set training instance type\n",
    "    train_instance_type_param = ParameterString(\n",
    "        name=\"TrainingInstanceType\",\n",
    "        default_value=training_instance_type,\n",
    "    )\n",
    "\n",
    "    # Set training instance count\n",
    "    train_instance_count_param = ParameterInteger(\n",
    "        name=\"TrainingInstanceCount\",\n",
    "        default_value=1\n",
    "    )\n",
    "\n",
    "    # Set model approval param\n",
    "    model_approval_status_param = ParameterString(\n",
    "        name=\"ModelApprovalStatus\", default_value=\"PendingManualApproval\"\n",
    "    )\n",
    "\n",
    "    # Minimal threshold for model performance on the test dataset\n",
    "    test_score_threshold_param = ParameterFloat(\n",
    "        name=\"TestScoreThreshold\", \n",
    "        default_value=0.75\n",
    "    )\n",
    "\n",
    "    # Set S3 url for input dataset\n",
    "    input_s3_url_param = ParameterString(\n",
    "        name=\"InputDataUrl\",\n",
    "        default_value=input_data_url,\n",
    "    )\n",
    "    \n",
    "    # Define step cache config\n",
    "    cache_config = CacheConfig(\n",
    "        enable_caching=True,\n",
    "        expire_after=\"P30d\" # 30-day\n",
    "    )\n",
    "    \n",
    "    # processing step for feature engineering\n",
    "    sklearn_processor = SKLearnProcessor(\n",
    "        framework_version=\"0.23-1\",\n",
    "        role=role,\n",
    "        instance_type=process_instance_type_param,\n",
    "        instance_count=1,\n",
    "        base_job_name=f\"{pipeline_name}/preprocess\",\n",
    "        sagemaker_session=session,\n",
    "    )\n",
    "    \n",
    "    processing_inputs=[\n",
    "        ProcessingInput(source=input_s3_url_param, destination=\"/opt/ml/processing/input\")\n",
    "    ]\n",
    "\n",
    "    processing_outputs=[\n",
    "        ProcessingOutput(output_name=\"train_data\", source=\"/opt/ml/processing/output/train\", \n",
    "                         destination=train_s3_url),\n",
    "        ProcessingOutput(output_name=\"validation_data\", source=\"/opt/ml/processing/output/validation\",\n",
    "                         destination=validation_s3_url),\n",
    "        ProcessingOutput(output_name=\"test_data\", source=\"/opt/ml/processing/output/test\",\n",
    "                         destination=test_s3_url),\n",
    "        ProcessingOutput(output_name=\"baseline_data\", source=\"/opt/ml/processing/output/baseline\", \n",
    "                         destination=baseline_s3_url),\n",
    "    ]\n",
    "\n",
    "    processor_args = sklearn_processor.run(\n",
    "        inputs=processing_inputs,\n",
    "        outputs=processing_outputs,\n",
    "        code=os.path.join(BASE_DIR, \"preprocessing.py\"),\n",
    "        # arguments = ['arg1', 'arg2'],\n",
    "    )\n",
    "\n",
    "    # Define processing step\n",
    "    step_process = ProcessingStep(\n",
    "        name=f\"{pipeline_name}-preprocess-data\",\n",
    "        step_args=processor_args,\n",
    "        cache_config = cache_config\n",
    "    )\n",
    "\n",
    "    # Training step for generating model artifacts\n",
    "    xgboost_image_uri = sagemaker.image_uris.retrieve(\n",
    "        \"xgboost\",\n",
    "        region=region, \n",
    "        version=\"1.5-1\")\n",
    "\n",
    "    # Instantiate an XGBoost estimator object\n",
    "    estimator = sagemaker.estimator.Estimator(\n",
    "        image_uri=xgboost_image_uri,\n",
    "        role=role, \n",
    "        instance_type=train_instance_type_param,\n",
    "        instance_count=train_instance_count_param,\n",
    "        output_path=output_s3_url,\n",
    "        sagemaker_session=session,\n",
    "        base_job_name=f\"{pipeline_name}/train\",\n",
    "    )\n",
    "\n",
    "    # Define algorithm hyperparameters\n",
    "    estimator.set_hyperparameters(\n",
    "        num_round=150, # the number of rounds to run the training\n",
    "        max_depth=5, # maximum depth of a tree\n",
    "        eta=0.5, # step size shrinkage used in updates to prevent overfitting\n",
    "        alpha=2.5, # L1 regularization term on weights\n",
    "        objective=\"binary:logistic\",\n",
    "        eval_metric=\"auc\", # evaluation metrics for validation data\n",
    "        subsample=0.8, # subsample ratio of the training instance\n",
    "        colsample_bytree=0.8, # subsample ratio of columns when constructing each tree\n",
    "        min_child_weight=3, # minimum sum of instance weight (hessian) needed in a child\n",
    "        early_stopping_rounds=10, # the model trains until the validation score stops improving\n",
    "        verbosity=1, # verbosity of printing messages\n",
    "    )\n",
    "\n",
    "    training_inputs = {\n",
    "        \"train\": TrainingInput(\n",
    "            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                \"train_data\"\n",
    "            ].S3Output.S3Uri,\n",
    "            content_type=\"text/csv\",\n",
    "        ),\n",
    "        \"validation\": TrainingInput(\n",
    "            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[\n",
    "                \"validation_data\"\n",
    "            ].S3Output.S3Uri,\n",
    "            content_type=\"text/csv\",\n",
    "        ),\n",
    "    }\n",
    "\n",
    "    training_args = estimator.fit(training_inputs)\n",
    "\n",
    "    # Define training step\n",
    "    step_train = TrainingStep(\n",
    "        name=f\"{pipeline_name}-train\",\n",
    "        step_args=training_args,\n",
    "        cache_config = cache_config\n",
    "    )\n",
    "    \n",
    "    # Evaluation step\n",
    "    script_processor = ScriptProcessor(\n",
    "        image_uri=xgboost_image_uri,\n",
    "        role=role,\n",
    "        command=[\"python3\"],\n",
    "        instance_type=process_instance_type_param,\n",
    "        instance_count=1,\n",
    "        base_job_name=f\"{pipeline_name}/evaluate\",\n",
    "        sagemaker_session=session,\n",
    "    )\n",
    "\n",
    "    eval_inputs=[\n",
    "        ProcessingInput(source=step_train.properties.ModelArtifacts.S3ModelArtifacts, \n",
    "                        destination=\"/opt/ml/processing/model\"),\n",
    "        ProcessingInput(source=step_process.properties.ProcessingOutputConfig.Outputs[\"test_data\"].S3Output.S3Uri, \n",
    "                        destination=\"/opt/ml/processing/test\"),\n",
    "    ]\n",
    "\n",
    "    eval_outputs=[\n",
    "        ProcessingOutput(output_name=\"evaluation\", source=\"/opt/ml/processing/evaluation\", \n",
    "                         destination=evaluation_s3_url),\n",
    "        ProcessingOutput(output_name=\"prediction_baseline_data\", source=\"/opt/ml/processing/output/prediction_baseline\", \n",
    "                         destination=prediction_baseline_s3_url),\n",
    "    ]\n",
    "\n",
    "    eval_args = script_processor.run(\n",
    "        inputs=eval_inputs,\n",
    "        outputs=eval_outputs,\n",
    "        code=os.path.join(BASE_DIR, \"evaluation.py\"),\n",
    "    )\n",
    "\n",
    "    evaluation_report = PropertyFile(\n",
    "        name=\"ModelEvaluationReport\", output_name=\"evaluation\", path=\"evaluation.json\"\n",
    "    )\n",
    "\n",
    "    step_eval = ProcessingStep(\n",
    "        name=f\"{pipeline_name}-evaluate-model\",\n",
    "        step_args=eval_args,\n",
    "        property_files=[evaluation_report],\n",
    "        cache_config = cache_config\n",
    "    )\n",
    "    \n",
    "    # Define register step\n",
    "    model = Model(\n",
    "        image_uri=xgboost_image_uri,        \n",
    "        model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,\n",
    "        sagemaker_session=session,\n",
    "        role=role,\n",
    "    )\n",
    "\n",
    "    model_metrics = ModelMetrics(\n",
    "        model_statistics=MetricsSource(\n",
    "            s3_uri=\"{}/evaluation.json\".format(\n",
    "                step_eval.arguments[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\"S3Uri\"]\n",
    "            ),\n",
    "            content_type=\"application/json\",\n",
    "        )\n",
    "    )\n",
    "\n",
    "    register_args = model.register(\n",
    "        content_types=[\"text/csv\"],\n",
    "        response_types=[\"text/csv\"],\n",
    "        inference_instances=[\"ml.t2.medium\", \"ml.m5.xlarge\", \"ml.m5.large\"],\n",
    "        transform_instances=[\"ml.m5.xlarge\", \"ml.m5.large\"],\n",
    "        model_package_group_name=model_package_group_name,\n",
    "        approval_status=model_approval_status_param,\n",
    "        model_metrics=model_metrics,\n",
    "    )\n",
    "\n",
    "    step_register = ModelStep(\n",
    "        name=f\"{pipeline_name}-register\",\n",
    "        step_args=register_args\n",
    "    )\n",
    "\n",
    "    # Fail step\n",
    "    step_fail = FailStep(\n",
    "        name=f\"{pipeline_name}-fail\",\n",
    "        error_message=Join(on=\" \", values=[\"Execution failed due to AUC Score >\", test_score_threshold_param]),\n",
    "    )\n",
    "    \n",
    "    # Condition step\n",
    "    cond_lte = ConditionGreaterThan(\n",
    "        left=JsonGet(\n",
    "            step_name=step_eval.name,\n",
    "            property_file=evaluation_report,\n",
    "            json_path=\"classification_metrics.auc_score.value\",\n",
    "        ),\n",
    "        right=test_score_threshold_param,\n",
    "    )\n",
    "\n",
    "    step_cond = ConditionStep(\n",
    "        name=f\"{pipeline_name}-check-test-score\",\n",
    "        conditions=[cond_lte],\n",
    "        if_steps=[step_register],\n",
    "        else_steps=[step_fail],\n",
    "    )\n",
    "    \n",
    "    # Pipeline instance\n",
    "    pipeline = Pipeline(\n",
    "        name=pipeline_name,\n",
    "        parameters=[\n",
    "            process_instance_type_param,\n",
    "            train_instance_type_param,\n",
    "            train_instance_count_param,\n",
    "            model_approval_status_param,\n",
    "            test_score_threshold_param,\n",
    "            input_s3_url_param,\n",
    "        ],\n",
    "        steps=[step_process, step_train, step_eval, step_cond],\n",
    "        sagemaker_session=session,\n",
    "    )\n",
    "    \n",
    "    return pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1856dd52-e4e9-455a-9d80-e4f0bdd02793",
   "metadata": {},
   "source": [
    "Copy this `pipeline.py` file from the workshop folder to the `pipelines/fromideatoprod` folder in the project's code repository folder:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c7032f9-7a3b-4c13-af03-70ee4476514d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!cp ~/{workshop_folder}/pipeline.py ~/{project_folder}/pipelines/fromideatoprod/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dd662c7-0411-4b2f-9234-e60ba81e2bc0",
   "metadata": {},
   "source": [
    "#### Attach the model package group to the project\n",
    "Project-owned resources are automatically tagged with `sagemaker:project-name` and `sagemaker:project-id` tags for cost control, attribute-based security control, and governance. \n",
    "Since the model package group already exists in the model registry, you need to tag it to attach to this project. The following code cell calls `AddTags` API to set project tags to the model package group."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f74be4dc-c7ed-4fc0-918d-851caaf8fa78",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "model_package_group_arn = sm.describe_model_package_group(ModelPackageGroupName=model_package_group_name).get(\"ModelPackageGroupArn\")\n",
    "\n",
    "if model_package_group_arn:\n",
    "    print(f\"Adding tags {project_arn.split('/')[-1]} and {project_id} for model package group {model_package_group_arn}\")\n",
    "    r = sm.add_tags(\n",
    "        ResourceArn=model_package_group_arn,\n",
    "        Tags=[\n",
    "            {\n",
    "                'Key': 'sagemaker:project-name',\n",
    "                'Value': project_arn.split(\"/\")[-1]\n",
    "            },\n",
    "            {\n",
    "                'Key': 'sagemaker:project-id',\n",
    "                'Value': project_id\n",
    "            },\n",
    "        ]\n",
    "    )\n",
    "    print(r)\n",
    "else:\n",
    "    print(f\"The model package group {model_package_group_name} doesn't exist\")\n",
    "    \n",
    "sm.list_tags(ResourceArn=model_package_group_arn)[\"Tags\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f760f4bd-c5d5-4bf7-b56c-11e70fa981e1",
   "metadata": {},
   "source": [
    "### 3. Modify the build specification file\n",
    "You must modify the `codebuild-buildspec.yml` file in the project folder to reflect the new name of Python module with your pipeline and set project-specific parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07928bfd-570a-4be9-a21e-7a1160e23b71",
   "metadata": {},
   "source": [
    "First, print the value of `input_s3_url` variable with the S3 path to the source dataset. You must pass this value to the pipeline:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "736ce50f-b4a6-4b84-9c4e-f4a9f2c30711",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "input_s3_url"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c12f8aa7-6bb2-40bc-824f-4dd70f8d5036",
   "metadata": {},
   "source": [
    "Second, replace the value of the `input_data_url` parameter in the following code cell with the printed value of `input_s3_url`. \n",
    "\n",
    "First, locate the parameter `kwargs` in the code snippet in the following code cell starting with `%%writefile codebuild-buildspec.yml`:\n",
    "\n",
    "```\n",
    "--kwargs \"{\\\"region\\\":\\\"${AWS_REGION}\\\",\\\"sagemaker_project_arn\\\":\\\"${SAGEMAKER_PROJECT_ARN}\\\",\n",
    "\\\"role\\\":\\\"${SAGEMAKER_PIPELINE_ROLE_ARN}\\\",\\\"default_bucket\\\":\\\"${ARTIFACT_BUCKET}\\\",\n",
    "\\\"input_data_url\\\":\\\"s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/input/bank-additional-full.csv\\\"}\"\n",
    "```\n",
    "\n",
    "and replace the value of `input_data_url` at the very end of the string with the value printed by the previous cell.\n",
    "\n",
    "Finally, execute the cell to create a build spec file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6713dcbc-4145-402a-92a2-e17b5b44c6c7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%writefile codebuild-buildspec.yml\n",
    "\n",
    "version: 0.2\n",
    "\n",
    "phases:\n",
    "  install:\n",
    "    runtime-versions:\n",
    "      python: 3.8\n",
    "    commands:\n",
    "      - pip install --upgrade --force-reinstall . \"awscli>1.20.30\"\n",
    "  \n",
    "  build:\n",
    "    commands:\n",
    "      - export PYTHONUNBUFFERED=TRUE\n",
    "      - export SAGEMAKER_PROJECT_NAME_ID=\"${SAGEMAKER_PROJECT_NAME}-${SAGEMAKER_PROJECT_ID}\"\n",
    "      - |\n",
    "        run-pipeline --module-name pipelines.fromideatoprod.pipeline \\\n",
    "          --role-arn $SAGEMAKER_PIPELINE_ROLE_ARN \\\n",
    "          --tags \"[{\\\"Key\\\":\\\"sagemaker:project-name\\\", \\\"Value\\\":\\\"${SAGEMAKER_PROJECT_NAME}\\\"}, {\\\"Key\\\":\\\"sagemaker:project-id\\\", \\\"Value\\\":\\\"${SAGEMAKER_PROJECT_ID}\\\"}]\" \\\n",
    "          --kwargs \"{\\\"region\\\":\\\"${AWS_REGION}\\\",\\\"sagemaker_project_arn\\\":\\\"${SAGEMAKER_PROJECT_ARN}\\\",\\\"sagemaker_project_id\\\":\\\"${SAGEMAKER_PROJECT_ID}\\\",\\\"role\\\":\\\"${SAGEMAKER_PIPELINE_ROLE_ARN}\\\",\\\"default_bucket\\\":\\\"${ARTIFACT_BUCKET}\\\",\\\"input_data_url\\\":\\\"s3://sagemaker-us-east-1-462832133259/from-idea-to-prod/xgboost/input/bank-additional-full.csv\\\"}\"\n",
    "      - echo \"Create/Update of the SageMaker Pipeline and execution completed.\""
   ]
  },
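  {
   "cell_type": "markdown",
   "id": "5a1c9e7b-3d42-4f68-8b0a-6e2f1d9c4a10",
   "metadata": {},
   "source": [
    "The `--kwargs` value in the build spec above is a JSON object in which every double quote is escaped for the shell, which makes manual edits error-prone. As a minimal sketch, and not part of the project template, you could generate the escaped string in Python instead. The helper name `build_kwargs_argument` and the placeholder S3 URL are assumptions for illustration only:\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "\n",
    "def build_kwargs_argument(input_data_url: str) -> str:\n",
    "    \"\"\"Return the escaped JSON string to place after --kwargs (hypothetical helper).\"\"\"\n",
    "    kwargs = {\n",
    "        # The ${...} values are left as-is so that CodeBuild expands them at build time\n",
    "        \"region\": \"${AWS_REGION}\",\n",
    "        \"sagemaker_project_arn\": \"${SAGEMAKER_PROJECT_ARN}\",\n",
    "        \"sagemaker_project_id\": \"${SAGEMAKER_PROJECT_ID}\",\n",
    "        \"role\": \"${SAGEMAKER_PIPELINE_ROLE_ARN}\",\n",
    "        \"default_bucket\": \"${ARTIFACT_BUCKET}\",\n",
    "        \"input_data_url\": input_data_url,\n",
    "    }\n",
    "    # Compact separators match the style used in the build spec\n",
    "    compact = json.dumps(kwargs, separators=(\",\", \":\"))\n",
    "    # Escape the double quotes so the JSON survives inside a double-quoted shell string\n",
    "    return '\"' + compact.replace('\"', '\\\\\"') + '\"'\n",
    "\n",
    "\n",
    "# Placeholder URL: pass the value of input_s3_url instead\n",
    "print(build_kwargs_argument(\"s3://<your-bucket>/from-idea-to-prod/xgboost/input/bank-additional-full.csv\"))\n",
    "```\n",
    "\n",
    "You can then paste the printed string after `--kwargs` in the build spec instead of editing the escaped JSON by hand."
   ]
  },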
  {
   "cell_type": "markdown",
   "id": "ffd4a948-8a46-4d75-b15a-c15d3d4873c4",
   "metadata": {},
   "source": [
    "Copy the `codebuild-buildspec.yml` file from the workshop folder to the project's code repository folder:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f521809e-b0fe-46a7-b49f-5afea0ab6ad2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!cp ~/{workshop_folder}/codebuild-buildspec.yml ~/{project_folder}/codebuild-buildspec.yml"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00a152ac-2237-45d9-a1a9-1b3d1738e607",
   "metadata": {},
   "source": [
    "To summurize, you have just done three changes in the build spec file:\n",
    "1. Modified the `run-pipeline` `--module-name` parameter value from `pipelines.abalone.pipeline` to the new path `pipelines.fromideatoprod.pipeline`\n",
    "2. Removed some parameters from the `kwargs` list to make use of `get_pipeline()` function default parameter values\n",
    "3. Added an Amazon S3 url to the source data to the `kwargs` parameter list"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e71c081b-75f2-4c06-a892-8348e9d0cd91",
   "metadata": {},
   "source": [
    "### 4. Fix the `setup.py` file\n",
    "Finally, open the `setup.py` file in the project's code repository folder and replace the line `required_packages = [\"sagemaker==2.XX.0\"]` with `required_packages = [\"sagemaker\"]`. Save your changes.\n",
    "\n",
    "Why did you do this change? The pinned sagemaker library version is a bug and is going to be fixed in future releases of the built-in SageMaker project templates. For now you fix this template file manually. Keep in mind, that the built-in project templates are for your convenience only and to demostrate how to use SageMaker project mechanism to package and provision your own custom MLOps projects.\n",
    "\n",
    "Now you are ready to launch the CI/CD model building pipeline."
   ]
  },
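  {
   "cell_type": "markdown",
   "id": "8d2f4b61-7c09-4e35-a1d8-0b9e6f3c5a27",
   "metadata": {},
   "source": [
    "The `setup.py` edit described above can also be made programmatically. The following is a sketch under assumptions, not part of the project template: the helper name `unpin_sagemaker` is hypothetical, and the pattern assumes the pin is written as `sagemaker==<version>`, as in the line quoted above:\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "\n",
    "def unpin_sagemaker(setup_py_text: str) -> str:\n",
    "    \"\"\"Replace a pinned 'sagemaker==<version>' requirement with 'sagemaker'.\"\"\"\n",
    "    # [\\w.]+ matches version strings made of letters, digits, underscores, and dots\n",
    "    return re.sub(r\"sagemaker==[\\w.]+\", \"sagemaker\", setup_py_text)\n",
    "\n",
    "\n",
    "line = 'required_packages = [\"sagemaker==2.XX.0\"]'\n",
    "print(unpin_sagemaker(line))\n",
    "\n",
    "# Example usage on the real file (the path is a placeholder):\n",
    "# from pathlib import Path\n",
    "# setup_py = Path(\"<PROJECT-CODE-REPOSITORY-FOLDER>/setup.py\")\n",
    "# setup_py.write_text(unpin_sagemaker(setup_py.read_text()))\n",
    "```\n",
    "\n",
    "Review the resulting file before you commit it, because the pattern rewrites every matching pin in the text."
   ]
  },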
  {
   "cell_type": "markdown",
   "id": "9fcfd6d3-2116-42e9-988b-e377fcd4b912",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4ee8789-1d8f-4a06-a958-8c244fa9a869",
   "metadata": {},
   "source": [
    "## Run CI/CD model building pipeline\n",
    "❗ Make sure you are in the local folder that contains the **repository code** in Studio file browser. The folder name looks like `sagemaker-<project-name>-<project-id>-modelbuild`.\n",
    "\n",
    "Open a Studio system terminal via the Studio menu **File** > **New** > **Terminal** and enter the following commands. Keep `user.email` and `user.name` or replace with your data.\n",
    "```sh\n",
    "cd ~/<PROJECT-FOLDER>/<PROJECT-CODE-REPOSITORY-FOLDER>\n",
    "\n",
    "git config --global user.email \"you@example.com\"\n",
    "git config --global user.name \"Your Name\"\n",
    "  \n",
    "git add -A\n",
    "git commit -am \"customize project\"\n",
    "git push\n",
    "```\n",
    "\n",
    "You an also work with git command via Git option on the Studio [left sidebar](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-ui.html).\n",
    "\n",
    "After pushing your code changes, the MLOps project initiates a run of the CodePipeline pipeline that updates and executes the SageMaker model building pipeline. This new pipeline execution creates a new model version in the model package group in the SageMaker model registry.\n",
    "\n",
    "You can follow up the execution of the pipeline in **Home** > **Pipelines**.\n",
    "\n",
    "Wait until the pipeline execution finishes. The execution takes about 12 minutes to complete."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a30f9944-4659-4e53-9d99-10a58eac4cdf",
   "metadata": {},
   "source": [
    "## View the details of a new model version\n",
    "After the pipeline execution finished, a new model version must be registered in the model registry. To see the model version details:\n",
    "\n",
    "1. In the Studio sidebar, choose the **Home** icon\n",
    "2. Chose **Models** and then elect **Model registry** from the list\n",
    "3. Click on the name of the model package group you created in the step 3 notebook (`from-idea-to-prod-model-group`) to open the model group\n",
    "4. In the list of model versions, double-click on the latest version of the model\n",
    "\n",
    "![](img/model-package-group.png)\n",
    "\n",
    "![](img/model-package-group-2.png)\n",
    "\n",
    "On the model version tab that opens, you can browse activity, [model version details](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-details.html), and [data lineage](https://docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html). \n",
    "\n",
    "![](img/model-version-details.png)\n",
    "\n",
    "In a real-world project you add various model attributes and additional model version metadata such as [model quality metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html), [explainability](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-explainability.html) and [bias](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-post-training-bias.html) reports, load test data, and [inference recommender](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender.html)."
   ]
  },
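  {
   "cell_type": "markdown",
   "id": "e47a1c90-5b3e-4d72-9f16-2a8c0d7b6e34",
   "metadata": {},
   "source": [
    "You can also look up the latest model version programmatically. The following sketch uses the `list_model_packages` call of the boto3 SageMaker client. The helper name `get_latest_model_package` is hypothetical, and the usage example assumes the `sm` client and the `model_package_group_name` variable used earlier in this notebook:\n",
    "\n",
    "```python\n",
    "def get_latest_model_package(sm_client, group_name):\n",
    "    \"\"\"Return the summary of the most recently created model version, or None.\"\"\"\n",
    "    response = sm_client.list_model_packages(\n",
    "        ModelPackageGroupName=group_name,\n",
    "        SortBy=\"CreationTime\",\n",
    "        SortOrder=\"Descending\",\n",
    "        MaxResults=1,\n",
    "    )\n",
    "    packages = response[\"ModelPackageSummaryList\"]\n",
    "    return packages[0] if packages else None\n",
    "\n",
    "\n",
    "# Example usage, assuming `sm` and `model_package_group_name` exist:\n",
    "# latest = get_latest_model_package(sm, model_package_group_name)\n",
    "# if latest:\n",
    "#     print(latest[\"ModelPackageArn\"], latest[\"ModelApprovalStatus\"])\n",
    "```\n",
    "\n",
    "With the pipeline's default `ModelApprovalStatus` parameter value, a newly registered version is expected to show the `PendingManualApproval` status."
   ]
  },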
  {
   "cell_type": "markdown",
   "id": "0b462a7b-d27c-401a-919f-0785617d1361",
   "metadata": {},
   "source": [
    "## Summary\n",
    "In this notebook you implement a CI/CD pipeline with the following features:\n",
    "- Model building ML pipeline is under the source control in a CodeCommit repository\n",
    "- Every push into the code commit repository launches a new CodeBuild build which upserts and executes the ML pipeline\n",
    "- SageMaker project is a logical construct in Studio which has the metadata about related ML pipelines, repositories, models, experiments, and inference endpoints"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86fcc951-0ea2-43f3-89f0-5b2982334d11",
   "metadata": {},
   "source": [
    "## Continue with the step 5\n",
    "open the step 5 [notebook](05-deploy.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4afba0bb-5a1d-4399-b3bc-3664be5e40b9",
   "metadata": {},
   "source": [
    "## Further development ideas for your real-world projects\n",
    "- You can use a SageMaker-provided [MLOps template for model building, training, and deployment with third-party Git repositories using Jenkins](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates-sm.html#sagemaker-projects-templates-git-jenkins)\n",
    "- Create a [custom SageMaker project template](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates-custom.html) to cover your specific project requirements"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06e55501-6c9c-40a7-89f3-04530e147a26",
   "metadata": {},
   "source": [
    "## Additional resources\n",
    "- [Amazon SageMaker Pipelines lab in SageMaker Immersion Day](https://catalog.us-east-1.prod.workshops.aws/workshops/63069e26-921c-4ce1-9cc7-dd882ff62575/en-US/lab6)\n",
    "- [Enhance your machine learning development by using a modular architecture with Amazon SageMaker projects](https://aws.amazon.com/blogs/machine-learning/enhance-your-machine-learning-development-by-using-a-modular-architecture-with-amazon-sagemaker-projects/)\n",
    "- [Dive deep into automating MLOps](https://www.youtube.com/watch?v=3_cHnk9VSfQ)\n",
    "- [SageMaker MLOps Project Walkthrough](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html)\n",
    "- [`aws-samples` GitHub repository with custom project templates examples](https://github.com/aws-samples/sagemaker-custom-project-templates)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53f175fc-3eea-4777-9047-329c5098eed7",
   "metadata": {},
   "source": [
    "# Shutdown kernel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "feb3ee45-d774-4d7d-8954-6f5479e20090",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%html\n",
    "\n",
    "<p><b>Shutting down your kernel for this notebook to release resources.</b></p>\n",
    "<button class=\"sm-command-button\" data-commandlinker-command=\"kernelmenu:shutdown\" style=\"display:none;\">Shutdown Kernel</button>\n",
    "        \n",
    "<script>\n",
    "try {\n",
    "    els = document.getElementsByClassName(\"sm-command-button\");\n",
    "    els[0].click();\n",
    "}\n",
    "catch(err) {\n",
    "    // NoOp\n",
    "}    \n",
    "</script>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b10fb645-0789-4fc4-a662-1502d7b30fd2",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   }
  ],
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}