{
"cells": [
{
"cell_type": "markdown",
"id": "b23cf44f",
"metadata": {},
"source": [
"# Train a XGBoost regression model on Amazon SageMaker and host inference as an API on a Docker container running on AWS App Runner\n",
"\n",
"[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a fully managed end-to-end Machine Learning (ML) service. With SageMaker, you have the option of using the built-in algorithms or you can bring your own algorithms and frameworks to train your models. After training, you can deploy the models in [one of two ways](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html) for inference - persistent endpoint or batch transform.\n",
"\n",
"With a persistent inference endpoint, you get a fully-managed real-time HTTPS endpoint hosted on either CPU or GPU based EC2 instances. It supports features like auto scaling, data capture, model monitoring and also provides cost-effective GPU support using [Amazon Elastic Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html). It also supports hosting multiple models using multi-model endpoints that provide A/B testing capability. You can monitor the endpoint using [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/). In addition to all these, you can use [Amazon SageMaker Pipelines](https://aws.amazon.com/sagemaker/pipelines/) which provides a purpose-built, easy-to-use Continuous Integration and Continuous Delivery (CI/CD) service for Machine Learning.\n",
"\n",
"There are use cases where you may want to host the ML model on a real-time inference endpoint that is cost-effective and do not require all the capabilities provided by the SageMaker persistent inference endpoint. These may involve,\n",
"* simple models\n",
"* models whose sizes are lesser than 200 MB\n",
"* models that are invoked sparsely and do not need inference instances running all the time\n",
"* models that do not need to be re-trained and re-deployed frequently\n",
"* models that do not need GPUs for inference\n",
"\n",
"In these cases, you can take the trained ML model and host it as an API on a Docker container on [AWS App Runner](https://aws.amazon.com/apprunner/). This will be cost-effective as compared to having real-time inference instances and still provide a fully-managed and scalable solution.\n",
"\n",
"[AWS App Runner](https://aws.amazon.com/apprunner/) is a fully managed service that makes it easy for developers to quickly deploy containerized web applications and APIs, at scale and with no prior infrastructure experience required. App Runner automatically builds and deploys the web application and load balances traffic with encryption. App Runner also scales up or down automatically to meet your traffic needs.\n",
"\n",
"This notebook demonstrates this solution by using SageMaker's [built-in XGBoost algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) to train a regression model on the [California Housing dataset](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html). It loads the trained model as a Python3 [pickle](https://docs.python.org/3/library/pickle.html) object in a Python3 [Flask](https://flask.palletsprojects.com/en/1.1.x/) app script in a Docker container to be hosted as an API on [AWS App Runner](https://aws.amazon.com/apprunner/).\n",
"\n",
"**Warning:** The Python3 [pickle](https://docs.python.org/3/library/pickle.html) module is not secure. Only unpickle data you trust. Keep this in mind if you decide to get the trained ML model file from somewhere instead of building your own model.\n",
"\n",
"**Note:**\n",
"\n",
"* This notebook should only be run from within a SageMaker notebook instance as it references SageMaker native APIs. The underlying OS of the notebook instance can either be Amazon Linux v1 or v2.\n",
"* At the time of writing this notebook, the most relevant latest version of the Jupyter notebook kernel for this notebook was `conda_python3` and this came built-in with SageMaker notebooks.\n",
"* This notebook uses CPU based instances for training.\n",
"* If you already have a trained model that can be loaded as a Python3 [pickle](https://docs.python.org/3/library/pickle.html) object, then you can skip the training step in this notebook and directly upload the model file to S3 and update the code in this notebook's cells accordingly.\n",
"* In this notebook, the ML model generated in the training step has not been tuned as that is not the intent of this demo.\n",
"* At the time of writing this notebook, [AWS App Runner](https://aws.amazon.com/apprunner/) was a new service and supported only in a few regions. To know if this service is available in a specific region, either refer the [documentation](https://docs.aws.amazon.com/apprunner/index.html) or check in the [AWS console](https://aws.amazon.com/console/).\n",
"* At the time of writing this notebook, AWS App Runner supported only public endpoints.\n",
"* If you intend to deploy this to Production with access security, then you have to build the authentication and authorization layers in the container prior to letting the call go to the inference script. This is not covered in this notebook.\n",
"* This notebook will create resources in the same AWS account and in the same region where this notebook is running.\n",
"* Users of this notebook require `root` access to install/update required software. This is set by default when you create the notebook. For more info, refer [here](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-root-access.html).\n",
"\n",
"**Table of Contents:**\n",
"\n",
"1. [Complete prerequisites](#Complete%20prerequisites)\n",
"\n",
" 1. [Check and configure access to the Internet](#Check%20and%20configure%20access%20to%20the%20Internet)\n",
"\n",
" 2. [Check and upgrade required software versions](#Check%20and%20upgrade%20required%20software%20versions)\n",
" \n",
" 3. [Check and configure security permissions](#Check%20and%20configure%20security%20permissions)\n",
"\n",
" 4. [Organize imports](#Organize%20imports)\n",
" \n",
" 5. [Create common objects](#Create%20common%20objects)\n",
"\n",
"2. [Prepare the data](#Prepare%20the%20data)\n",
"\n",
" 1. [Create the local directories](#Create%20the%20local%20directories)\n",
" \n",
" 2. [Load the dataset and view the details](#Load%20the%20dataset%20and%20view%20the%20details)\n",
" \n",
" 3. [(Optional) Visualize the dataset](#(Optional)%20Visualize%20the%20dataset)\n",
" \n",
" 4. [Split the dataset into train, validate and test sets](#Split%20the%20dataset%20into%20train,%20validate%20and%20test%20sets)\n",
" \n",
" 5. [Standardize the datasets](#Standardize%20the%20datasets)\n",
" \n",
" 6. [Save the prepared datasets locally](#Save%20the%20prepared%20datasets%20locally)\n",
" \n",
" 7. [Upload the prepared datasets to S3](#Upload%20the%20prepared%20datasets%20to%20S3)\n",
"\n",
"3. [Perform training](#Perform%20training)\n",
"\n",
" 1. [Set the training parameters](#Set%20the%20training%20parameters)\n",
" \n",
" 2. [(Optional) Delete previous checkpoints](#(Optional)%20Delete%20previous%20checkpoints)\n",
" \n",
" 3. [Run the training job](#Run%20the%20training%20job)\n",
"\n",
"4. [Create and push the Docker container to an Amazon ECR repository](#Create%20and%20push%20the%20Docker%20container%20to%20an%20Amazon%20ECR%20repository)\n",
"\n",
" 1. [Retrieve the model pickle file](#Retrieve%20the%20model%20pickle%20file)\n",
" \n",
" 2. [(Optional) Test the model pickle file](#(Optional)%20Test%20the%20model%20pickle%20file)\n",
" \n",
" 3. [View the inference script](#View%20the%20inference%20script)\n",
" \n",
" 4. [Create the Dockerfile](#Create%20the%20Dockerfile)\n",
" \n",
" 5. [Create the container](#Create%20the%20container)\n",
" \n",
" 6. [Create the private repository in ECR](#Create%20the%20private%20repository%20in%20ECR)\n",
" \n",
" 7. [Push the container to ECR](#Push%20the%20container%20to%20ECR)\n",
"\n",
"5. [Deploy and test on AWS App Runner](#Deploy%20and%20test%20on%20AWS%20App%20Runner)\n",
" \n",
" 1. [Create the App Runner service](#Create%20the%20App%20Runner%20service)\n",
" \n",
" 2. [Test the App Runner service](#Test%20the%20App%20Runner%20service)\n",
"\n",
"6. [Cleanup](#Cleanup)\n",
"\n",
" 1. [Cleanup App Runner resources](#Cleanup%20App%20Runner%20resources)\n",
" \n",
" 2. [Cleanup ECR repository](#Cleanup%20ECR%20repository)\n",
" \n",
" 3. [Cleanup S3 objects](#Cleanup%20S3%20objects)\n"
]
},
{
"cell_type": "markdown",
"id": "f48a80cd",
"metadata": {},
"source": [
"## 1. Complete prerequisites \n",
"\n",
"Check and complete the prerequisites."
]
},
{
"cell_type": "markdown",
"id": "42ac2978",
"metadata": {},
"source": [
"### A. Check and configure access to the Internet \n",
"\n",
"This notebook requires outbound access to the Internet to download the required software updates and to make calls to the container hosted as an AWS App Runner service. You can either provide direct Internet access (default) or provide Internet access through a VPC. For more information on this, refer [here](https://docs.aws.amazon.com/sagemaker/latest/dg/appendix-notebook-and-internet-access.html)."
]
},
{
"cell_type": "markdown",
"id": "bada0393",
"metadata": {},
"source": [
"### B. Check and upgrade required software versions \n",
"\n",
"This notebook requires:\n",
"* [SageMaker Python SDK version 2.x](https://sagemaker.readthedocs.io/en/stable/v2.html)\n",
"* [Python 3.6.x](https://www.python.org/downloads/release/python-360/)\n",
"* [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)\n",
"* [AWS Command Line Interface](https://aws.amazon.com/cli/)\n",
"* [Docker](https://www.docker.com/)\n",
"* [XGBoost Python module](https://xgboost.readthedocs.io/en/latest/python/python_intro.html)\n",
"* [cURL](https://curl.se/)"
]
},
{
"cell_type": "markdown",
"id": "9b306633",
"metadata": {},
"source": [
"Capture the version of the OS on which this notebook is running."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1854bd86",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"from subprocess import Popen\n",
"\n",
"p = Popen(['cat','/etc/system-release'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)\n",
"os_cmd_output, os_cmd_error = p.communicate()\n",
"if len(os_cmd_error) > 0:\n",
" print('Notebook OS command returned error :: {}'.format(os_cmd_error))\n",
" os_version = ''\n",
"else:\n",
" if os_cmd_output.find('Amazon Linux release 2') >= 0:\n",
" os_version = 'ALv2'\n",
" elif os_cmd_output.find('Amazon Linux AMI release 2018.03') >= 0:\n",
" os_version = 'ALv1'\n",
" else:\n",
" os_version = ''\n",
"print('Notebook OS version : {}'.format(os_version))"
]
},
{
"cell_type": "markdown",
"id": "d684bd9b",
"metadata": {},
"source": [
"**Note:** When running the following cell, if you get 'module not found' errors, then uncomment the appropriate installation commands and install the modules. Also, uncomment and run the kernel shutdown command. When the kernel comes back, comment out the installation and kernel shutdown commands and run the following cell. Now, you should not see any errors."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c0f4a00",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"\n",
"Last tested versions:\n",
"\n",
"\n",
"On Amazon Linux v1 (ALv1) notebook:\n",
"-----------------------------------\n",
"SageMaker Python SDK version : 2.54.0\n",
"Python version : 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) \n",
"[GCC 9.3.0]\n",
"Boto3 version : 1.18.27\n",
"XGBoost Python module version : 1.4.2\n",
"AWS CLI version : aws-cli/1.20.21 Python/3.6.13 Linux/4.14.238-125.422.amzn1.x86_64 botocore/1.21.27\n",
"Docker version : 19.03.13-ce, build 4484c46\n",
"\n",
"\n",
"On Amazon Linux v2 (ALv2) notebook:\n",
"-----------------------------------\n",
"SageMaker Python SDK version : 2.59.1\n",
"Python version : 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) \n",
"[GCC 9.3.0]\n",
"Boto3 version : 1.18.36\n",
"XGBoost Python module version : 1.4.2\n",
"AWS CLI version : aws-cli/1.20.24 Python/3.6.13 Linux/4.14.243-185.433.amzn2.x86_64 botocore/1.21.36\n",
"Docker version : 20.10.7, build f0df350\n",
"Amazon ECR Docker Credential Helper : 0.6.3\n",
"\n",
"\"\"\"\n",
"\n",
"import boto3\n",
"import IPython\n",
"import os\n",
"import sagemaker\n",
"import sys\n",
"try:\n",
" import xgboost as xgb\n",
"except ModuleNotFoundError:\n",
" # Install XGBoost and restart kernel\n",
" print('Installing XGBoost module...')\n",
" !{sys.executable} -m pip install -U xgboost\n",
" IPython.Application.instance().kernel.do_shutdown(True)\n",
"\n",
"# Install/upgrade the Sagemaker SDK, Boto3 and XGBoost and restart kernel\n",
"#!{sys.executable} -m pip install -U sagemaker boto3 xgboost\n",
"#IPython.Application.instance().kernel.do_shutdown(True)\n",
"\n",
"# Get the current installed version of Sagemaker SDK, Python, Boto3 and XGBoost\n",
"print('SageMaker Python SDK version : {}'.format(sagemaker.__version__))\n",
"print('Python version : {}'.format(sys.version))\n",
"print('Boto3 version : {}'.format(boto3.__version__))\n",
"print('XGBoost Python module version : {}'.format(xgb.__version__))\n",
"\n",
"# Get the AWS CLI version\n",
"print('AWS CLI version : ')\n",
"!aws --version"
]
},
{
"cell_type": "markdown",
"id": "27323694",
"metadata": {},
"source": [
"**Docker:**\n",
"\n",
"Docker should be pre-installed in the SageMaker notebook instance. Verify it by running the `docker --version` command. If Docker is not installed, you can install it by uncommenting the install command in the following cell. You will require `sudo` rights to install."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "194b7169",
"metadata": {},
"outputs": [],
"source": [
"# Verify if docker is installed\n",
"!docker --version\n",
"\n",
"# Install docker\n",
"#!sudo yum --assumeyes install docker"
]
},
{
"cell_type": "markdown",
"id": "bf548ca3",
"metadata": {},
"source": [
"**cURL:**\n",
"\n",
"cURL should be pre-installed in the SageMaker notebook instance. Verify it by running the `curl --version` command. If cURL is not installed, you can install it by uncommenting the install command in the following cell. You will require `sudo` rights to install."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a5155784",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"Last tested version:\n",
"curl 7.71.1 (x86_64-conda-linux-gnu) libcurl/7.71.1 OpenSSL/1.1.1j zlib/1.2.11 libssh2/1.9.0 nghttp2/1.43.0\n",
"Release-Date: 2020-07-01\n",
"Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp \n",
"Features: AsynchDNS GSS-API HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz NTLM NTLM_WB SPNEGO SSL TLS-SRP UnixSockets\n",
"\"\"\"\n",
"\n",
"# Verify if curl is installed\n",
"!curl --version\n",
"\n",
"# Install curl\n",
"#!sudo yum --assumeyes install curl"
]
},
{
"cell_type": "markdown",
"id": "c894bcf3",
"metadata": {},
"source": [
"**Additional prerequisite (when notebook is running on Amazon Linux v2):**\n",
"\n",
"Install and configure the [Amazon ECR credential helper](https://github.com/awslabs/amazon-ecr-credential-helper). This makes it easier to store and use Docker credentials for use with Amazon ECR private registries."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "78737e88",
"metadata": {},
"outputs": [],
"source": [
"if os_version == 'ALv2':\n",
" # Install\n",
" !sudo yum --assumeyes install amazon-ecr-credential-helper\n",
" # Verify installation\n",
" print('Amazon ECR Docker Credential Helper version : ')\n",
" !docker-credential-ecr-login version\n",
" # Create the .docker directory if it doesn't exist\n",
" !mkdir -p ~/.docker\n",
" # Configure\n",
" !printf \"{\\\\n\\\\t\\\"credsStore\\\": \\\"ecr-login\\\"\\\\n}\" > ~/.docker/config.json\n",
" # Verify configuration\n",
" !cat ~/.docker/config.json"
]
},
{
"cell_type": "markdown",
"id": "27123a92",
"metadata": {},
"source": [
"### C. Check and configure security permissions \n",
"\n",
"Users of this notebook require `root` access to install/update required software. This is set by default when you create the notebook. For more info, refer [here](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-root-access.html).\n",
"\n",
"This notebook uses the IAM role attached to the underlying notebook instance. This role should have the following permissions,\n",
"\n",
"1. Full access to the S3 bucket that will be used to store training and output data.\n",
"2. Full access to launch training instances.\n",
"3. Access to create CloudWatch Log Groups.\n",
"4. Access to write to CloudWatch Logs and CloudWatch Metrics.\n",
"5. Access to create, delete and write to Amazon ECR private registries.\n",
"6. Access to create, update and delete AWS App Runner services.\n",
"\n",
"To view the name of this role, run the following cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1770505e",
"metadata": {},
"outputs": [],
"source": [
"print(sagemaker.get_execution_role())"
]
},
{
"cell_type": "markdown",
"id": "0fad6d35",
"metadata": {},
"source": [
"This notebook creates an [image-based service](https://docs.aws.amazon.com/apprunner/latest/dg/service-source-image.html) on [AWS App Runner](https://docs.aws.amazon.com/apprunner/latest/dg/what-is-apprunner.html). This service requires the following service roles in IAM,\n",
"\n",
"1. Access role with the following permissions:\n",
" * Read access to Amazon ECR.\n",
"2. Instance role with the following permissions:\n",
" * Access to create CloudWatch Log Groups.\n",
" * Access to write to CloudWatch Logs and CloudWatch Metrics.\n",
"\n",
"For more information on this, refer [here](https://docs.aws.amazon.com/apprunner/latest/dg/security_iam_service-with-iam.html#security_iam_service-with-iam-roles)."
]
},
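  {
   "cell_type": "markdown",
   "id": "3f9d2c1a",
   "metadata": {},
   "source": [
    "The following is a rough, optional sketch of how the ECR access role could be created with Boto3. The trust principal and the AWS managed policy name used below are assumptions - verify them against the App Runner IAM documentation linked above before using them. The instance role can be created the same way with a `tasks.apprunner.amazonaws.com` trust principal and a policy granting the CloudWatch permissions listed above. The resulting role ARNs are used later in this notebook.\n",
    "\n",
    "```python\n",
    "# Sketch only - not executed as part of this notebook's flow\n",
    "import json\n",
    "import boto3\n",
    "\n",
    "iam_client = boto3.client('iam')\n",
    "\n",
    "# Trust policy allowing App Runner to assume the access role (assumed principal)\n",
    "assume_role_policy = {\n",
    "    'Version': '2012-10-17',\n",
    "    'Statement': [{\n",
    "        'Effect': 'Allow',\n",
    "        'Principal': {'Service': 'build.apprunner.amazonaws.com'},\n",
    "        'Action': 'sts:AssumeRole'\n",
    "    }]\n",
    "}\n",
    "\n",
    "# Create the access role and attach the assumed AWS managed policy for ECR read access\n",
    "create_role_response = iam_client.create_role(\n",
    "    RoleName='apprunner-ecr-access-role',\n",
    "    AssumeRolePolicyDocument=json.dumps(assume_role_policy))\n",
    "iam_client.attach_role_policy(\n",
    "    RoleName='apprunner-ecr-access-role',\n",
    "    PolicyArn='arn:aws:iam::aws:policy/service-role/AWSAppRunnerServicePolicyForECRAccess')\n",
    "\n",
    "print(create_role_response['Role']['Arn'])\n",
    "```"
   ]
  },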
{
"cell_type": "markdown",
"id": "1fef3814",
"metadata": {},
"source": [
"### D. Organize imports \n",
"\n",
"Organize all the library and module imports for later use."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85424300",
"metadata": {},
"outputs": [],
"source": [
"from io import StringIO\n",
"import json\n",
"import logging\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pickle\n",
"import pandas as pd\n",
"from sagemaker.inputs import TrainingInput\n",
"import seaborn as sns\n",
"import sklearn.model_selection\n",
"from sklearn.preprocessing import StandardScaler\n",
"import tarfile\n",
"import time"
]
},
{
"cell_type": "markdown",
"id": "c3ab3c6f",
"metadata": {},
"source": [
"### E. Create common objects \n",
"\n",
"Create common objects to be used in future steps in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5df25e7",
"metadata": {},
"outputs": [],
"source": [
"# Specify the S3 bucket name\n",
"s3_bucket = ''\n",
"\n",
"# Create the S3 Boto3 resource\n",
"s3_resource = boto3.resource('s3')\n",
"s3_bucket_resource = s3_resource.Bucket(s3_bucket)\n",
"\n",
"# Create the SageMaker Boto3 client\n",
"sm_client = boto3.client('sagemaker')\n",
"\n",
"# Create the Amazon ECR client\n",
"ecr_client = boto3.client('ecr')\n",
"\n",
"# Create the AWS App Runner client\n",
"apprunner_client = boto3.client('apprunner')\n",
"\n",
"# Get the AWS region name\n",
"region_name = sagemaker.Session().boto_region_name\n",
"\n",
"# Base name to be used to create resources\n",
"nb_name = 'sm-xgboost-ca-housing-apprunner-model-hosting'\n",
"\n",
"# Names of various resources\n",
"train_job_name = 'train-{}'.format(nb_name)\n",
"\n",
"# Names of local sub-directories in the notebook file system\n",
"data_dir = os.path.join(os.getcwd(), 'data/{}'.format(nb_name))\n",
"train_dir = os.path.join(os.getcwd(), 'data/{}/train'.format(nb_name))\n",
"val_dir = os.path.join(os.getcwd(), 'data/{}/validate'.format(nb_name))\n",
"test_dir = os.path.join(os.getcwd(), 'data/{}/test'.format(nb_name))\n",
"\n",
"# Location of the datasets file in the notebook file system\n",
"dataset_csv_file = os.path.join(os.getcwd(), 'datasets/california_housing.csv')\n",
"\n",
"# Container artifacts directory in the notebook file system\n",
"container_artifacts_dir = os.path.join(os.getcwd(), 'container-artifacts/{}'.format(nb_name))\n",
"\n",
"# Location of the Python3 Flask script (containing the inference code) and it's corresponding\n",
"# requirements.txt in the notebook file system\n",
"container_script_file_name = 'container_sm_xgboost_ca_housing_inference.py'\n",
"container_script_req_file_name = 'container_sm_xgboost_ca_housing_inference_requirements.txt'\n",
"container_script_file = os.path.join(os.getcwd(), 'scripts/{}'.format(container_script_file_name))\n",
"container_script_req_file = os.path.join(os.getcwd(), 'scripts/{}'.format(container_script_req_file_name))\n",
"\n",
"# Sub-folder names in S3\n",
"train_dir_s3_prefix = '{}/data/train'.format(nb_name)\n",
"val_dir_s3_prefix = '{}/data/validate'.format(nb_name)\n",
"test_dir_s3_prefix = '{}/data/test'.format(nb_name)\n",
"\n",
"# Location in S3 where the model checkpoint will be stored\n",
"model_checkpoint_s3_path = 's3://{}/{}/checkpoint/'.format(s3_bucket, nb_name)\n",
"\n",
"# Location in S3 where the trained model will be stored\n",
"model_output_s3_path = 's3://{}/{}/output/'.format(s3_bucket, nb_name)\n",
"\n",
"# Names of the model tar file and extracted file - these are dependent on the\n",
"# framework and algorithm you used to train the model. This notebook uses\n",
"# SageMaker's built-in XGBoost algorithm and that will have the names as follows:\n",
"model_tar_file_name = 'model.tar.gz'\n",
"extracted_model_file_name = 'xgboost-model'\n",
"\n",
"# Container details\n",
"container_image_name = nb_name\n",
"container_registry_url_prefix = ''\n",
"\n",
"# App Runner details\n",
"## Service details\n",
"## Note: The ARN of the App Runner service will be generated when the service is created\n",
"apprunner_service_arn = 'TBD'\n",
"apprunner_service_name = 'sm-xgboost-ca-housing-model'\n",
"apprunner_cpu = '1 vCPU'\n",
"apprunner_memory = '2 GB'\n",
"apprunner_port = '8080'\n",
"apprunner_access_role = ''\n",
"apprunner_instance_role = ''\n",
"## Healthcheck details\n",
"apprunner_healthcheck_protocol = 'HTTP'\n",
"apprunner_healthcheck_path = '/healthcheck'\n",
"apprunner_healthcheck_interval_in_seconds = 10\n",
"apprunner_healthcheck_timeout_in_seconds = 10\n",
"apprunner_healthcheck_healthy_threshold = 3\n",
"apprunner_healthcheck_unhealthy_threshold = 3"
]
},
{
"cell_type": "markdown",
"id": "8b3a4d08",
"metadata": {},
"source": [
"## 2. Prepare the data \n",
"\n",
"The [California Housing dataset](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html) consists of 20,640 observations on housing prices with 9 economic covariates. These covariates are,\n",
"\n",
"* MedianHouseValue\n",
"* MedianIncome\n",
"* HousingMedianAge\n",
"* TotalRooms\n",
"* TotalBedrooms\n",
"* Population\n",
"* Households\n",
"* Latitude\n",
"* Longitude\n",
"\n",
"This dataset has been downloaded to the local `datasets` directory and modified as a CSV file with the feature names in the first row. This will be used in this notebook.\n",
"\n",
"The following steps will help with preparing the datasets for training, validation and testing."
]
},
{
"cell_type": "markdown",
"id": "f1259af0",
"metadata": {},
"source": [
"### A) Create the local directories \n",
"\n",
"Create the directories in the local system where the dataset will be copied to and processed."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89720ba9",
"metadata": {},
"outputs": [],
"source": [
"# Create the local directories if they don't exist\n",
"os.makedirs(data_dir, exist_ok=True)\n",
"os.makedirs(train_dir, exist_ok=True)\n",
"os.makedirs(val_dir, exist_ok=True)\n",
"os.makedirs(test_dir, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"id": "0310b251",
"metadata": {},
"source": [
"### B) Load the dataset and view the details \n",
"\n",
"Check if the CSV file exists in the `datasets` directory and load it into a Pandas DataFrame. Finally, print the details of the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d00eea8c",
"metadata": {},
"outputs": [],
"source": [
"# Check if the dataset file exists and proceed\n",
"if os.path.exists(dataset_csv_file):\n",
" print('Dataset CSV file \\'{}\\' exists.'.format(dataset_csv_file))\n",
" # Load the data into a Pandas DataFrame\n",
" pd_data_frame = pd.read_csv(dataset_csv_file)\n",
" # Print the first 5 records\n",
" #print(pd_data_frame.head(5))\n",
" # Describe the dataset\n",
" print(pd_data_frame.describe())\n",
"else:\n",
" print('Dataset CSV file \\'{}\\' does not exist.'.format(dataset_csv_file))"
]
},
{
"cell_type": "markdown",
"id": "eca369f5",
"metadata": {},
"source": [
"### C) (Optional) Visualize the dataset \n",
"\n",
"Display the distributions in the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8f59d9d",
"metadata": {},
"outputs": [],
"source": [
"# Print the correlation matrix\n",
"plt.figure(figsize=(11, 7))\n",
"sns.heatmap(cbar=False, annot=True, data=(pd_data_frame.corr() * 100), cmap='coolwarm')\n",
"plt.title('% Correlation Matrix')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "c6deb712",
"metadata": {},
"source": [
"### D) Split the dataset into train, validate and test sets \n",
"\n",
"Split the dataset into train, validate and test sets after shuffling. Split further into x and y sets."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d940577f",
"metadata": {},
"outputs": [],
"source": [
"# Split into train and test datasets after shuffling\n",
"train, test = sklearn.model_selection.train_test_split(pd_data_frame, test_size=0.2,\n",
" random_state=35, shuffle=True)\n",
"# Split the train dataset further into train and validation datasets after shuffling\n",
"train, val = sklearn.model_selection.train_test_split(train, test_size=0.1,\n",
" random_state=25, shuffle=True)\n",
"\n",
"# Define functions to get x and y columns\n",
"def get_x(df):\n",
" return df[['median_income','housing_median_age','total_rooms','total_bedrooms',\n",
" 'population','households','latitude','longitude']]\n",
"def get_y(df):\n",
" return df[['median_house_value']]\n",
"\n",
"# Load the x and y columns for train, validation and test datasets\n",
"x_train = get_x(train)\n",
"y_train = get_y(train)\n",
"x_val = get_x(val)\n",
"y_val = get_y(val)\n",
"x_test = get_x(test)\n",
"y_test = get_y(test)\n",
"\n",
"# Summarize the datasets\n",
"print(\"x_train shape:\", x_train.shape)\n",
"print(\"y_train shape:\", y_train.shape)\n",
"print(\"x_val shape:\", x_val.shape)\n",
"print(\"y_val shape:\", y_val.shape)\n",
"print(\"x_test shape:\", x_test.shape)\n",
"print(\"y_test shape:\", y_test.shape)"
]
},
{
"cell_type": "markdown",
"id": "1186830d",
"metadata": {},
"source": [
"### E) Standardize the datasets \n",
"\n",
"* Standardize the x columns of the train dataset using the `fit_transform()` function of `StandardScaler`.\n",
"* Standardize the x columns of the validate and test datasets using the `transform()` function of `StandardScaler`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5f8cd0c4",
"metadata": {},
"outputs": [],
"source": [
"# Standardize the dataset\n",
"scaler = StandardScaler()\n",
"x_train = scaler.fit_transform(x_train)\n",
"x_val = scaler.transform(x_val)\n",
"x_test = scaler.transform(x_test)"
]
},
{
"cell_type": "markdown",
"id": "90ee5315",
"metadata": {},
"source": [
"### F) Save the prepared datasets locally \n",
"\n",
"Save the prepared train, validate and test datasets to local directories. Prior to saving, concatenate x and y columns as needed. Create the directories if they don't exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83e4a3bb",
"metadata": {},
"outputs": [],
"source": [
"# Save the prepared dataset (in numpy format) to the local directories as csv files\n",
"\n",
"np.savetxt(os.path.join(train_dir, 'train.csv'),\n",
" np.concatenate((y_train.to_numpy(), x_train), axis=1), delimiter=',')\n",
"np.savetxt(os.path.join(train_dir, 'train_x.csv'), x_train)\n",
"np.savetxt(os.path.join(train_dir, 'train_y.csv'), y_train.to_numpy())\n",
"\n",
"np.savetxt(os.path.join(val_dir, 'validate.csv'),\n",
" np.concatenate((y_val.to_numpy(), x_val), axis=1), delimiter=',')\n",
"np.savetxt(os.path.join(val_dir, 'validate_x.csv'), x_val)\n",
"np.savetxt(os.path.join(val_dir, 'validate_y.csv'), y_val.to_numpy())\n",
"\n",
"np.savetxt(os.path.join(test_dir, 'test.csv'),\n",
" np.concatenate((y_test.to_numpy(), x_test), axis=1), delimiter=',')\n",
"np.savetxt(os.path.join(test_dir, 'test_x.csv'), x_test)\n",
"np.savetxt(os.path.join(test_dir, 'test_y.csv'), y_test.to_numpy())"
]
},
{
"cell_type": "markdown",
"id": "89a41a68",
"metadata": {},
"source": [
"### G) Upload the prepared datasets to S3 \n",
"\n",
"Upload the datasets from the local directories to appropriate sub-directories in the specified S3 bucket."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c514fcd4",
"metadata": {},
"outputs": [],
"source": [
"# Upload the data to S3\n",
"train_dir_s3_path = sagemaker.Session().upload_data(path='./data/{}/train/'.format(nb_name),\n",
" bucket=s3_bucket,\n",
" key_prefix=train_dir_s3_prefix)\n",
"val_dir_s3_path = sagemaker.Session().upload_data(path='./data/{}/validate/'.format(nb_name),\n",
" bucket=s3_bucket,\n",
" key_prefix=val_dir_s3_prefix)\n",
"test_dir_s3_path = sagemaker.Session().upload_data(path='./data/{}/test/'.format(nb_name),\n",
" bucket=s3_bucket,\n",
" key_prefix=test_dir_s3_prefix)\n",
"\n",
"# Capture the S3 locations of the uploaded datasets\n",
"train_s3_path = '{}/train.csv'.format(train_dir_s3_path)\n",
"train_x_s3_path = '{}/train_x.csv'.format(train_dir_s3_path)\n",
"train_y_s3_path = '{}/train_y.csv'.format(train_dir_s3_path)\n",
"val_s3_path = '{}/validate.csv'.format(val_dir_s3_path)\n",
"val_x_s3_path = '{}/validate_x.csv'.format(val_dir_s3_path)\n",
"val_y_s3_path = '{}/validate_y.csv'.format(val_dir_s3_path)\n",
"test_s3_path = '{}/test.csv'.format(test_dir_s3_path)\n",
"test_x_s3_path = '{}/test_x.csv'.format(test_dir_s3_path)\n",
"test_y_s3_path = '{}/test_y.csv'.format(test_dir_s3_path)"
]
},
{
"cell_type": "markdown",
"id": "9622728e",
"metadata": {},
"source": [
"## 3. Perform training \n",
"\n",
"In this step, SageMaker's [built-in XGBoost algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) is used to train a regression model on the [California Housing dataset](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html).\n",
"\n",
"Note: This model has not been tuned as that is not the intent of this demo."
]
},
{
"cell_type": "markdown",
"id": "09d98056",
"metadata": {},
"source": [
"### A) Set the training parameters \n",
"\n",
"1. Inputs - S3 location of the training and validation data.\n",
"2. Hyperparameters.\n",
"3. Training instance details:\n",
"\n",
" 1. Instance count\n",
" \n",
" 2. Instance type\n",
" \n",
" 3. The max run time of the training job\n",
" \n",
" 4. (Optional) Use Spot instances. For more info, refer [here](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html).\n",
" \n",
" 5. (Optional) The max wait for Spot instances, if using Spot. This should be larger than the max run time.\n",
" \n",
"4. Base job name\n",
"5. Appropriate local and S3 directories that will be used by the training job."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fcf9bd2d",
"metadata": {},
"outputs": [],
"source": [
"# Set the input data input along with their content types\n",
"train_input = TrainingInput(train_s3_path, content_type='text/csv')\n",
"val_input = TrainingInput(val_s3_path, content_type='text/csv')\n",
"inputs = {'train':train_input, 'validation':val_input}\n",
"\n",
"# Set the hyperparameters\n",
"hyperparameters = {\n",
" 'objective':'reg:squarederror',\n",
" 'max_depth':'6',\n",
" 'eta':'0.3',\n",
" 'alpha':'3',\n",
" 'colsample_bytree':'0.7',\n",
" 'num_round':'100'}\n",
"\n",
"# Set the instance count, instance type, volume size, options to use Spot instances and other parameters\n",
"train_instance_count = 1\n",
"train_instance_type = 'ml.m5.xlarge'\n",
"train_instance_volume_size_in_gb = 5\n",
"#use_spot_instances = True\n",
"#spot_max_wait_time_in_seconds = 5400\n",
"use_spot_instances = False\n",
"spot_max_wait_time_in_seconds = None\n",
"max_run_time_in_seconds = 3600\n",
"algorithm_name = 'xgboost'\n",
"algorithm_version = '1.2-2'\n",
"py_version = 'py37'\n",
"# Get the container image URI for the specified parameters\n",
"container_image_uri = sagemaker.image_uris.retrieve(framework=algorithm_name,\n",
" region=region_name,\n",
" version=algorithm_version,\n",
" py_version=py_version,\n",
" instance_type=train_instance_type,\n",
" image_scope='training')\n",
"\n",
"# Set the training container related parameters\n",
"container_log_level = logging.INFO\n",
"\n",
"# Location where the model checkpoints will be stored locally in the container before being uploaded to S3\n",
"model_checkpoint_local_dir = '/opt/ml/checkpoints/'\n",
"\n",
"# Location where the trained model will be stored locally in the container before being uploaded to S3\n",
"model_local_dir = '/opt/ml/model'"
]
},
{
"cell_type": "markdown",
"id": "3ebd7db7",
"metadata": {},
"source": [
"### B) (Optional) Delete previous checkpoints \n",
"\n",
"If model checkpoints from previous trainings are found in the S3 checkpoint location specified in the previous step, then training will resume from those checkpoints. In order to start a fresh training, run the following code cell to delete all checkpoint objects from S3."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "014bd048",
"metadata": {},
"outputs": [],
"source": [
"# Delete the checkpoints if you want to train from the beginning; else ignore this code cell\n",
"for checkpoint_file in s3_bucket_resource.objects.filter(Prefix='{}/checkpoint/'.format(nb_name)):\n",
" checkpoint_file_key = checkpoint_file.key\n",
" print('Deleting {} ...'.format(checkpoint_file_key))\n",
" s3_resource.Object(s3_bucket_resource.name, checkpoint_file_key).delete()"
]
},
{
"cell_type": "markdown",
"id": "82a33b96",
"metadata": {},
"source": [
"### C) Run the training job \n",
"\n",
"Prepare the `estimator` and call the `fit()` method. This will pull the container containing the specified version of the algorithm in the AWS region and run the training job in the specified type of EC2 instance(s). The training data will be pulled from the specified location in S3 and training results and checkpoints will be written to the specified locations in S3.\n",
"\n",
"Note: SageMaker Debugger is disabled."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b2b0654",
"metadata": {},
"outputs": [],
"source": [
"# Create the estimator\n",
"estimator = sagemaker.estimator.Estimator(\n",
" image_uri=container_image_uri,\n",
" checkpoint_local_path=model_checkpoint_local_dir,\n",
" checkpoint_s3_uri=model_checkpoint_s3_path,\n",
" model_dir=model_local_dir,\n",
" output_path=model_output_s3_path,\n",
" instance_type=train_instance_type,\n",
" instance_count=train_instance_count,\n",
" use_spot_instances=use_spot_instances,\n",
" max_wait=spot_max_wait_time_in_seconds,\n",
" max_run=max_run_time_in_seconds,\n",
" hyperparameters=hyperparameters,\n",
" role=sagemaker.get_execution_role(),\n",
" base_job_name=train_job_name,\n",
" framework_version=algorithm_version,\n",
" py_version=py_version,\n",
" container_log_level=container_log_level,\n",
" script_mode=False,\n",
" debugger_hook_config=False,\n",
" disable_profiler=True)\n",
"\n",
"# Perform the training\n",
"estimator.fit(inputs, wait=True)"
]
},
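  {
   "cell_type": "markdown",
   "id": "5b1e2a9c",
   "metadata": {},
   "source": [
    "Once training completes, the trained model artifact is available in S3. As a quick check, you can print the training job name and the S3 URI of the model artifact (this is the `model.tar.gz` file that is downloaded and unpacked in the next section), for example:\n",
    "\n",
    "```python\n",
    "# Print the training job name and the S3 URI of the trained model artifact\n",
    "print('Training job name : {}'.format(estimator.latest_training_job.name))\n",
    "print('Model artifact S3 URI : {}'.format(estimator.model_data))\n",
    "```"
   ]
  },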
{
"cell_type": "markdown",
"id": "ff486ad0",
"metadata": {},
"source": [
"## 4. Create and push the Docker container to an Amazon ECR repository \n",
"\n",
"In this step, we will create a Docker container containing the generated model along with its dependencies. If you bring a pre-trained model, you can upload it to S3 and use it to build the container. The following steps contains instructions for doing so."
]
},
{
"cell_type": "markdown",
"id": "f7a083d0",
"metadata": {},
"source": [
"### A) Retrieve the model pickle file \n",
"\n",
"* The model file generated using SageMaker's [built-in XGBoost algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) will be a Python pickle file zipped up in a tar file named `model.tar.gz`. The S3 URI for this file will be available in the `model_data` attribute of the `estimator` object created in the training step.\n",
"\n",
"* If you bring your pre-trained model, you have to specify the S3 URI appropriately in the following cell.\n",
"\n",
"* The zip file needs to be downloaded from S3 and extracted.\n",
"\n",
"* The name of the extracted pickle file will depend on the framework and algorithm that was used to train the model. In this notebook example, we have used SageMaker's [built-in XGBoost algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) and so the pickle file will be named `xgboost-model`. You will see this when the model tar file is extracted."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "921b6cd4",
"metadata": {},
"outputs": [],
"source": [
"# Create the container artifacts directory if it doesn't exist\n",
"os.makedirs(container_artifacts_dir, exist_ok=True)\n",
"\n",
"# Set the file paths\n",
"model_tar_file_s3_path_suffix = '{}/output/{}/output/{}'.format(nb_name,\n",
" estimator.latest_training_job.name,\n",
" model_tar_file_name)\n",
"model_tar_file_local_path = '{}/{}'.format(container_artifacts_dir, model_tar_file_name)\n",
"extracted_model_file_local_path = '{}/{}'.format(container_artifacts_dir, extracted_model_file_name)\n",
"\n",
"# Delete old model files if they exist\n",
"if os.path.exists(model_tar_file_local_path):\n",
" os.remove(model_tar_file_local_path)\n",
"if os.path.exists(extracted_model_file_local_path):\n",
" os.remove(extracted_model_file_local_path)\n",
"\n",
"# Download the model tar file from S3\n",
"s3_bucket_resource.download_file(model_tar_file_s3_path_suffix, model_tar_file_local_path)\n",
"\n",
"# Extract the model tar file and retrieve the model pickle file\n",
"with tarfile.open(model_tar_file_local_path, \"r:gz\") as tar:\n",
" tar.extractall(path=container_artifacts_dir)"
]
},
{
"cell_type": "markdown",
"id": "4711351d",
"metadata": {},
"source": [
"### B) (Optional) Test the model pickle file \n",
"\n",
"The code in the following cell entirely depends on the framework and algorithm that was used to train the model. The extracted Python3 pickle file will contain the appropriate object name. If you are bringing your own model file, you have to change this cell appropriately."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f7654f6",
"metadata": {},
"outputs": [],
"source": [
"# Load the model pickle file as a pickle object\n",
"pickle_file_path = extracted_model_file_local_path\n",
"with open(pickle_file_path, 'rb') as pkl_file:\n",
" model = pickle.load(pkl_file)\n",
"\n",
"# Run a prediction against the model loaded as a pickle object\n",
"# by sending the first record of the test dataset\n",
"test_pred_x_df = pd.read_csv(StringIO(','.join(map(str, x_test[0]))), sep=',', header=None)\n",
"test_pred_x = xgb.DMatrix(test_pred_x_df.values)\n",
"print('Input for prediction = {}'.format(test_pred_x_df.values))\n",
"print('Predicted value = {}'.format(model.predict(test_pred_x)[0]))\n",
"print('Actual value = {}'.format(y_test.values[0][0]))\n",
"print('Note: There may be a huge difference between the actual and predicted values as the model has not been tuned in the training step.')"
]
},
{
"cell_type": "markdown",
"id": "38eaefee",
"metadata": {},
"source": [
"### C) View the inference script \n",
"\n",
"The inference script is a Python3 [Flask](https://flask.palletsprojects.com/en/1.1.x/) app script that contains the following logic:\n",
"* Initialize the Flask web app server.\n",
"* Load the ML model pickle object into memory.\n",
"* Run the Flask web app server.\n",
"* Parse the request sent to the web app server.\n",
"* Run the prediction.\n",
"* Format the response to match with the parameter specified in the request.\n",
"* Return the response.\n",
"* Implement the healthcheck logic to return a success on invocation. This has to be called by the service hosting this container to perform health checks.\n",
"\n",
"The request should be in the following format:\n",
"\n",
"`{\n",
" \"response_content_type\": \"\",\n",
" \"pred_x_csv\": \"\"\n",
"}`\n",
"\n",
"This script will be packaged into the container that will be built in the upcoming steps.\n",
"\n",
"You can view the script by running the following code cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa793dff",
"metadata": {},
"outputs": [],
"source": [
"# View the Python3 Flask script (containing the inference code)\n",
"!cat {container_script_file}"
]
},
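  {
   "cell_type": "markdown",
   "id": "9e4d7a21",
   "metadata": {},
   "source": [
    "For reference, the following is a minimal sketch of the kind of Flask app described above. It is not the script packaged into the container - that is the script printed by the previous cell, which may differ - but it illustrates the same flow, assuming the environment variables set in the Dockerfile created in the next steps:\n",
    "\n",
    "```python\n",
    "# Minimal sketch of a Flask inference app for the XGBoost model pickle file\n",
    "import os\n",
    "import pickle\n",
    "from io import StringIO\n",
    "\n",
    "import pandas as pd\n",
    "import xgboost as xgb\n",
    "from flask import Flask, jsonify, request\n",
    "\n",
    "app = Flask(__name__)\n",
    "\n",
    "# Load the ML model pickle object into memory at startup\n",
    "with open(os.environ['MODEL_PICKLE_FILE_PATH'], 'rb') as pkl_file:\n",
    "    model = pickle.load(pkl_file)\n",
    "\n",
    "@app.route('/healthcheck', methods=['GET'])\n",
    "def healthcheck():\n",
    "    # Called by the service hosting this container to perform health checks\n",
    "    return 'OK', 200\n",
    "\n",
    "@app.route('/', methods=['POST'])\n",
    "def predict():\n",
    "    # Parse the request sent to the web app server\n",
    "    payload = request.get_json(force=True)\n",
    "    pred_x_df = pd.read_csv(StringIO(payload['pred_x_csv']), sep=',', header=None)\n",
    "    # Run the prediction\n",
    "    prediction = model.predict(xgb.DMatrix(pred_x_df.values))\n",
    "    # Format the response to match the content type specified in the request\n",
    "    if payload.get('response_content_type') == 'application/json':\n",
    "        return jsonify({'predicted_value': float(prediction[0])})\n",
    "    return str(prediction[0])\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    app.run(host=os.environ.get('FLASK_SERVER_HOSTNAME', '0.0.0.0'),\n",
    "            port=int(os.environ.get('FLASK_SERVER_PORT', '8080')))\n",
    "```"
   ]
  },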
{
"cell_type": "markdown",
"id": "e813a177",
"metadata": {},
"source": [
"### D) Create the Dockerfile \n",
"\n",
"In this step, we will create a [Dockerfile](https://docs.docker.com/engine/reference/builder/) which is required to build our [Docker](https://www.docker.com/) container containing the model pickle file, an inference script and its dependencies.\n",
"\n",
"In order to create the container, we will use the [Amazon Linux 2 container image](https://gallery.ecr.aws/amazonlinux/amazonlinux) available in the [Amazon ECR public registry](https://aws.amazon.com/ecr/) as the base image. As this is a public registry, you do not require any credentials or permissions to download it.\n",
"\n",
"Note: At the time of writing this notebook, this image was based on [Amazon Linux 2](https://aws.amazon.com/amazon-linux-2/). Depending on the specific version you intend to use, you can suffix container image URL with the specific version after the `:` character in the following cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae485dae",
"metadata": {},
"outputs": [],
"source": [
"# Copy the inference script and requirements.txt to the container-artifacts directory\n",
"!cp -pr {container_script_file} {container_artifacts_dir}/server.py\n",
"!cp -pr {container_script_req_file} {container_artifacts_dir}/requirements.txt\n",
"\n",
"# Create the Dockerfile content\n",
"dockerfile_content_lines = []\n",
"dockerfile_content_lines.append('# syntax=docker/dockerfile:1\\n\\n')\n",
"dockerfile_content_lines.append('# Use Amazon Linux 2 as the base image\\n')\n",
"dockerfile_content_lines.append('FROM public.ecr.aws/amazonlinux/amazonlinux:latest\\n\\n')\n",
"dockerfile_content_lines.append('# Setup the working directory\\n')\n",
"dockerfile_content_lines.append('WORKDIR /\\n\\n')\n",
"dockerfile_content_lines.append('# Install Python3\\n')\n",
"dockerfile_content_lines.append('RUN yum -y install python3\\n\\n')\n",
"dockerfile_content_lines.append('# Upgrade pip\\n')\n",
"dockerfile_content_lines.append('RUN pip3 install --upgrade pip\\n\\n')\n",
"dockerfile_content_lines.append('# Setup the Python virtual env to run the inference script\\n')\n",
"dockerfile_content_lines.append('RUN python3 -m venv /opt/appenv\\n\\n')\n",
"dockerfile_content_lines.append('# Install the Python packages required for the inference script in the virtual env\\n')\n",
"dockerfile_content_lines.append('COPY requirements.txt .\\n')\n",
"dockerfile_content_lines.append('RUN /opt/appenv/bin/pip install -r requirements.txt\\n\\n')\n",
"dockerfile_content_lines.append('# Copy the extracted model file and the inference script\\n')\n",
"dockerfile_content_lines.append('COPY ')\n",
"dockerfile_content_lines.append(extracted_model_file_name)\n",
"dockerfile_content_lines.append(' ./\\n')\n",
"dockerfile_content_lines.append('COPY server.py ./\\n\\n')\n",
"dockerfile_content_lines.append('# Specify the ENV variables\\n')\n",
"dockerfile_content_lines.append('ENV MODEL_PICKLE_FILE_PATH=')\n",
"dockerfile_content_lines.append(extracted_model_file_name)\n",
"dockerfile_content_lines.append('\\n')\n",
"dockerfile_content_lines.append('ENV FLASK_SERVER_LOG_LEVEL=DEBUG\\n')\n",
"dockerfile_content_lines.append('ENV FLASK_SERVER_HOSTNAME=0.0.0.0\\n')\n",
"dockerfile_content_lines.append('ENV FLASK_SERVER_PORT=')\n",
"dockerfile_content_lines.append(apprunner_port)\n",
"dockerfile_content_lines.append('\\n')\n",
"dockerfile_content_lines.append('ENV FLASK_SERVER_DEBUG=True\\n\\n')\n",
"dockerfile_content_lines.append('# Specify the command to run the inference script as a Flask app\\n')\n",
"dockerfile_content_lines.append('ENTRYPOINT [\"/opt/appenv/bin/python\", \"server.py\"]')\n",
"\n",
"# Create the Dockerfile\n",
"dockerfile_local_path = '{}/Dockerfile'.format(container_artifacts_dir)\n",
"with open(dockerfile_local_path, 'wt') as file:\n",
" file.write(''.join(dockerfile_content_lines))\n",
" \n",
"# Print the contents of the generated Dockerfile\n",
"!cat {dockerfile_local_path}"
]
},
{
"cell_type": "markdown",
"id": "8108a5ec",
"metadata": {},
"source": [
"### E) Create the container \n",
"\n",
"Create the Docker container using the `docker build` command. Specify the container image name and point to the container-artifacts directory that contains all the files to build the container.\n",
"\n",
"Note: You may see warning messages when the container is built with the Dockerfile that we created in the prior step. These warnings will be around installing the Python packages that are required by the inference script. You can choose to either ignore or fix them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe02c474",
"metadata": {},
"outputs": [],
"source": [
"# Create the Docker container\n",
"!docker build -t {container_image_name} {container_artifacts_dir}"
]
},
{
"cell_type": "markdown",
"id": "a1bb277e",
"metadata": {},
"source": [
"### F) Create the private repository in ECR \n",
"\n",
"In order to create an image-based service in AWS App Runner, the container image should exist in a container registry. In this notebook, we will create and use an [Amazon ECR](https://aws.amazon.com/ecr/) private repository for this purpose.\n",
"\n",
"In this step, we will check if the private repository in Amazon ECR that we intend to create already exists or not. If it does not exist, we will create it with the repository name the same as the container image name.\n",
"\n",
"Note: When creating the repository, setting the `scanOnPush` parameter to `True` will automatically initiate a vulnerability scan on the container image that is pushed to the repository. For more info on image scanning, refer [here](https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9368ee5b",
"metadata": {},
"outputs": [],
"source": [
"# Check if the ECR repository exists already; if not, then create it\n",
"try:\n",
" ecr_client.describe_repositories(repositoryNames=[container_image_name])\n",
" print('ECR repository {} already exists.'.format(container_image_name))\n",
"except ecr_client.exceptions.RepositoryNotFoundException:\n",
" print('ECR repository {} does not exist.'.format(container_image_name))\n",
" print('Creating ECR repository {}...'.format(container_image_name))\n",
" # Create the ECR repository - here we use the container image name for the repository name\n",
" ecr_client.create_repository(repositoryName=container_image_name,\n",
" imageScanningConfiguration={\n",
" 'scanOnPush': True\n",
" })\n",
" print('Completed creating ECR repository {}.'.format(container_image_name))"
]
},
{
"cell_type": "markdown",
"id": "a7a7e0e1",
"metadata": {},
"source": [
"### G) Push the container to ECR \n",
"\n",
"In this step, we will push the container to a private registry that we created in Amazon ECR.\n",
"\n",
"When using an Amazon ECR private registry, you must authenticate your Docker client to your private registry so that you can use the `docker push` and `docker pull` commands to push and pull images to and from the repositories in that registry. For more information about this, refer [here](https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html).\n",
"\n",
"1. If this notebook instance is running on Amazon Linux v1, the authentication happens through an authorization token generated by an AWS CLI command in the following code cell. This token will be automatically deleted when the code cell completes execution.\n",
"2. If this notebook instance is running on Amazon Linux v2, the authentication happens through temporary credentials generated based on the IAM role attached to this notebook. For this, you have to complete the prerequisite mentioned in the first step of this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d12e888e",
"metadata": {},
"outputs": [],
"source": [
"# Set the image names\n",
"source_image_name = '{}:latest'.format(container_image_name)\n",
"target_image_name = '{}/{}:latest'.format(container_registry_url_prefix, container_image_name)\n",
"\n",
"if os_version == 'ALv1':\n",
" # Get the private registry credentials using an authorization token\n",
" !aws ecr get-login-password --region {region_name} | docker login --username AWS --password-stdin {container_registry_url_prefix}\n",
"\n",
"# Tag the container\n",
"!docker tag {source_image_name} {target_image_name}\n",
"\n",
"# Push the container to the specified registry in Amazon ECR\n",
"!docker push {target_image_name}\n",
"\n",
"if os_version == 'ALv1':\n",
" # Delete the Docker credentials file\n",
" print('\\nDeleting the generated Docker credentials file...')\n",
" !rm /home/ec2-user/.docker/config.json\n",
" print('Completed deleting the generated Docker credentials file.')\n",
" # Verify the delete\n",
" print('Verifying the delete of the generated Docker credentials file...')\n",
" !cat /home/ec2-user/.docker/config.json\n",
" print('Completed verifying the delete of the generated Docker credentials file.')"
]
},
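  {
   "cell_type": "markdown",
   "id": "b4c7f310",
   "metadata": {},
   "source": [
    "Optionally, you can confirm that the image is now in the private repository by describing it with the ECR client created earlier. This is a small sketch and assumes the image was pushed with the `latest` tag as done above:\n",
    "\n",
    "```python\n",
    "# Confirm that the pushed image exists in the ECR private repository\n",
    "describe_images_response = ecr_client.describe_images(repositoryName=container_image_name,\n",
    "                                                      imageIds=[{'imageTag': 'latest'}])\n",
    "print('Image pushed at : {}'.format(describe_images_response['imageDetails'][0]['imagePushedAt']))\n",
    "```"
   ]
  },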
{
"cell_type": "markdown",
"id": "ff2a0337",
"metadata": {},
"source": [
"## 5. Deploy and test on AWS App Runner \n",
"\n",
"In this step, we will create an [image-based service](https://docs.aws.amazon.com/apprunner/latest/dg/service-source-image.html) on [AWS App Runner](https://docs.aws.amazon.com/apprunner/latest/dg/what-is-apprunner.html) using the Docker container that was created in the previous step and test it."
]
},
{
"cell_type": "markdown",
"id": "a1c42302",
"metadata": {},
"source": [
"### A) Create the App Runner service \n",
"\n",
"In this step, we will check if the App Runner service that we intend to create already exists or not. If it does not exist, we will create it.\n",
"\n",
"Note:\n",
"\n",
"* At the time of writing this notebook, AWS App Runner supported only public endpoints.\n",
"* If you intend to deploy this to Production with access security, then you have to build the authentication and authorization layers in the container prior to letting the call go to the inference script. This is not covered in this notebook.\n",
"* We have not configured this App Runner service to use a custom domain name. If you require it, refer to the instructions [here](https://docs.aws.amazon.com/apprunner/latest/dg/manage-custom-domains.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90a5f03e",
"metadata": {},
"outputs": [],
"source": [
"# Check if the App Runner service exists already; if not, then create it\n",
"create_service_flag = False\n",
"try:\n",
" describe_service_response = apprunner_client.describe_service(ServiceArn=apprunner_service_arn)\n",
" service_status = describe_service_response['Service']['Status']\n",
" print('App Runner service \\'{}\\' already exists and is in status \\'{}\\'.'.format(apprunner_service_name,\n",
" service_status))\n",
" if service_status == 'DELETED':\n",
" create_service_flag = True\n",
" else:\n",
" apprunner_service_arn = describe_service_response['Service']['ServiceArn']\n",
" apprunner_service_url = describe_service_response['Service']['ServiceUrl']\n",
"except (apprunner_client.exceptions.ResourceNotFoundException,\n",
" apprunner_client.exceptions.InvalidRequestException):\n",
" print('App Runner service {} does not exist.'.format(apprunner_service_name))\n",
" create_service_flag = True\n",
"\n",
"# Create the service based on the determined condition\n",
"if create_service_flag == True:\n",
" print('Creating App Runner service {}...'.format(apprunner_service_name))\n",
" create_service_response = apprunner_client.create_service(ServiceName=apprunner_service_name,\n",
" SourceConfiguration={\n",
" 'ImageRepository':{\n",
" 'ImageIdentifier':target_image_name,\n",
" 'ImageConfiguration':{\n",
" 'Port':apprunner_port\n",
" },\n",
" 'ImageRepositoryType':'ECR'\n",
" },\n",
" 'AutoDeploymentsEnabled':False,\n",
" 'AuthenticationConfiguration':{\n",
" 'AccessRoleArn':apprunner_access_role\n",
" }\n",
" },\n",
" InstanceConfiguration={\n",
" 'Cpu':apprunner_cpu,\n",
" 'Memory':apprunner_memory,\n",
" 'InstanceRoleArn':apprunner_instance_role\n",
" },\n",
" HealthCheckConfiguration={\n",
" 'Protocol':apprunner_healthcheck_protocol,\n",
" 'Path':apprunner_healthcheck_path,\n",
" 'Interval':apprunner_healthcheck_interval_in_seconds,\n",
" 'Timeout':apprunner_healthcheck_timeout_in_seconds,\n",
" 'HealthyThreshold':apprunner_healthcheck_healthy_threshold,\n",
" 'UnhealthyThreshold':apprunner_healthcheck_unhealthy_threshold\n",
" })\n",
" apprunner_service_arn = create_service_response['Service']['ServiceArn']\n",
" apprunner_service_url = create_service_response['Service']['ServiceUrl']\n",
" print('App Runner service status = {}'.format(create_service_response['Service']['Status']))\n",
" \n",
"# Print the service details\n",
"print('App Runner service ARN = {}'.format(apprunner_service_arn))\n",
"print('App Runner service URL = {}'.format(apprunner_service_url))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "401eb51e",
"metadata": {},
"outputs": [],
"source": [
"# Sleep every 10 seconds and print the status of the AWS App Runner service\n",
"# until it goes to CREATE_FAILED, RUNNING, DELETED, DELETE_FAILED or PAUSED state\n",
"while True:\n",
" describe_service_response = apprunner_client.describe_service(ServiceArn=apprunner_service_arn)\n",
" describe_service_status = describe_service_response['Service']['Status']\n",
" print('App Runner service status = {}'.format(describe_service_status))\n",
" if describe_service_status in {'CREATE_FAILED', 'RUNNING', 'DELETED', 'DELETE_FAILED', 'PAUSED'}:\n",
" break\n",
" time.sleep(10)"
]
},
{
"cell_type": "markdown",
"id": "677532ef",
"metadata": {},
"source": [
"### B) Test the App Runner service \n",
"\n",
"In this step, we will test the App Runner service that we created in the previous step by invoking it synchronously. For this, we will invoke the Python3 [Flask](https://flask.palletsprojects.com/en/1.1.x/) app script running in the container by using the Public URL of the App Runner service.\n",
"\n",
"Invoke the endpoint by making a HTTP POST call with the first record of the test dataset as a CSV string. The request should be in the following format:\n",
"\n",
"`{\n",
" \"response_content_type\": \"\",\n",
" \"pred_x_csv\": \"\"\n",
"}`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0150323b",
"metadata": {},
"outputs": [],
"source": [
"# Set the request payload\n",
"x_test_request_payload_csv = ','.join(map(str, x_test[0]))\n",
"x_test_request_payload = '{' + '\"response_content_type\": \"application/json\",\"pred_x_csv\":\"{}\"'.format(x_test_request_payload_csv) + '}'\n",
"# Print the request\n",
"print('Request payload:\\n')\n",
"print(x_test_request_payload)\n",
"\n",
"# Invoke the App Runner service and print the response\n",
"apprunner_service_full_url = 'https://{}/'.format(apprunner_service_url)\n",
"print('\\nResponse:\\n')\n",
"!curl -X POST -H 'Content-Type: application/json' --data '{x_test_request_payload}' {apprunner_service_full_url}"
]
},
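  {
   "cell_type": "markdown",
   "id": "d2a6e85f",
   "metadata": {},
   "source": [
    "If you prefer to invoke the service from Python instead of cURL, a sketch using the `requests` library (assuming it is available in this kernel) could look like the following, reusing the payload and URL variables set in the previous cell:\n",
    "\n",
    "```python\n",
    "# Invoke the App Runner service using the requests library\n",
    "import requests\n",
    "\n",
    "response = requests.post(apprunner_service_full_url,\n",
    "                         headers={'Content-Type': 'application/json'},\n",
    "                         data=x_test_request_payload)\n",
    "print('HTTP status code : {}'.format(response.status_code))\n",
    "print('Response body : {}'.format(response.text))\n",
    "```"
   ]
  },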
{
"cell_type": "markdown",
"id": "0965db8a",
"metadata": {},
"source": [
"## 6. Cleanup \n",
"\n",
"As a best practice, you should delete resources and S3 objects when no longer required. This will help you avoid incurring unncessary costs.\n",
"\n",
"This step will cleanup the resources and S3 objects created by this notebook.\n",
"\n",
"Note: Apart from these resources, there will be Docker containers and related images created in the notebook instance that is running this Jupyter notebook. As they are already part of the notebook instance, you do not need to delete them. If you decide to delete them, then go to the Terminal of the Jupyter notebook and and run appropriate `docker` commands."
]
},
{
"cell_type": "markdown",
"id": "22ac1349",
"metadata": {},
"source": [
"### A) Cleanup App Runner resources "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e09b6532",
"metadata": {},
"outputs": [],
"source": [
"# Delete the App Runner service\n",
"apprunner_client.delete_service(ServiceArn=apprunner_service_arn)"
]
},
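  {
   "cell_type": "markdown",
   "id": "e7f1c4b8",
   "metadata": {},
   "source": [
    "Deleting the service is asynchronous. If you want to wait until the deletion completes, you can poll the service status in the same way as after service creation, for example:\n",
    "\n",
    "```python\n",
    "# Sleep every 10 seconds and print the status of the AWS App Runner service\n",
    "# until it goes to the DELETED or DELETE_FAILED state\n",
    "while True:\n",
    "    describe_service_response = apprunner_client.describe_service(ServiceArn=apprunner_service_arn)\n",
    "    describe_service_status = describe_service_response['Service']['Status']\n",
    "    print('App Runner service status = {}'.format(describe_service_status))\n",
    "    if describe_service_status in {'DELETED', 'DELETE_FAILED'}:\n",
    "        break\n",
    "    time.sleep(10)\n",
    "```"
   ]
  },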
{
"cell_type": "markdown",
"id": "2a83e0de",
"metadata": {},
"source": [
"### B) Cleanup ECR repository "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85113038",
"metadata": {},
"outputs": [],
"source": [
"# Delete the ECR private repository\n",
"try:\n",
" ecr_client.delete_repository(repositoryName=container_image_name, force=True)\n",
" print('ECR repository {} deleted.'.format(container_image_name))\n",
"except ecr_client.exceptions.RepositoryNotFoundException:\n",
" print('ECR repository {} does not exist.'.format(container_image_name))"
]
},
{
"cell_type": "markdown",
"id": "5b4e8ffa",
"metadata": {},
"source": [
"### C) Cleanup S3 objects "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a2939cfe",
"metadata": {},
"outputs": [],
"source": [
"# Delete data from S3 bucket\n",
"for file in s3_bucket_resource.objects.filter(Prefix='{}/'.format(nb_name)):\n",
" file_key = file.key\n",
" print('Deleting {} ...'.format(file_key))\n",
" s3_resource.Object(s3_bucket_resource.name, file_key).delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "311d41bf",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}