{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab: Bring your own custom container with Amazon SageMaker"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"### Background\n",
"Here, we'll show how to bring your docker cotainer that packages your environment and code. We showcase the [decision tree](http://scikit-learn.org/stable/modules/tree.html) algorithm from the widely used [scikit-learn](http://scikit-learn.org/stable/) machine learning package. The example is purposefully fairly trivial since the point is to show the surrounding structure that you'll want to add to your own container so you can bring it to Amazon SageMaker for training and hosting.\n",
"\n",
"\n",
"### High-level overview\n",
"\n",
"The following diagram shows how you typically train and deploy a model with Amazon SageMaker:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Diagram: the high-level workflow for training and deploying a model with Amazon SageMaker)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The area labeled SageMaker highlights the two components of SageMaker: model training and model deployment. The area labeled [EC2 container registry](https://aws.amazon.com/ecr/) is where we store, manage, and deploy our Docker container images. The training data and model artifacts are stored in S3 bucket. \n",
"\n",
"In this lab, we use a single image to support both model training and hosting for simplicity. Sometimes you’ll want separate images for training and hosting because they have different requirements. \n",
"\n",
"The high-level steps include:\n",
"1. **Building the container** - We walk through the different components of the containers and inspect the docker file. Then we build and push the container to ECR. \n",
"2. **Setup & Upload Data** - Once our container is built and registered. We ready sagemaker and upload the data to S3. \n",
"3. **Model Training** - Create a training job using SageMaker Python SDK. It will pull data from S3 and use the container we built. \n",
"4. **Model Deployment** - Once training is complete, deploy our model to a HTTP endpoint using SageMaker Python SDK. \n",
"5. **Run Inferences** - Run predictions to test our model.\n",
"6. **Cleanup**\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the container\n",
"[Docker](https://aws.amazon.com/docker/#:~:text=Docker%20is%20a%20software%20platform,test%2C%20and%20deploy%20applications%20quickly.&text=Running%20Docker%20on%20AWS%20provides,distributed%20applications%20at%20any%20scale.) packages software into standardized units called [containers](https://aws.amazon.com/containers/) that have everything the software needs to run including libraries, system tools, code, and runtime. Using Docker, you can quickly deploy and scale applications into any environment and know your code will run.\n",
"\n",
"\n",
"Amazon SageMaker uses Docker to allow users to train and deploy arbitrary algorithms. More details on [how to use docker containers with sagemaker](https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Walkthrough of the container directory\n",
"You can find the source code of the sample container we are using in [this GitHub repository](https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/scikit_bring_your_own). \n",
"\n",
"The container directory contains all the components you need to package for SageMaker:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
".\n",
"|-- Dockerfile\n",
"|-- build_and_push.sh\n",
"|-- local_test\n",
"`-- decision_trees\n",
" |-- nginx.conf\n",
" |-- predictor.py\n",
" |-- serve\n",
" |-- train\n",
" `-- wsgi.py\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let’s discuss each of these in turn:\n",
"\n",
"- `Dockerfile` describes how to build your Docker container image. More details below.\n",
"- `build_and_push.sh` is a script that uses the Dockerfile to build your container images and then pushes it to ECR. We’ll invoke the commands directly later in this notebook, but you can just copy and run the script for your own algorithms.\n",
"- `local_test` is a directory that shows how to test your new container on any computer that can run Docker, including an Amazon SageMaker notebook instance. Using this method, you can quickly iterate using small datasets to eliminate any structural bugs before you use the container with Amazon SageMaker. Testing is not the focus of this lab, but feel free to checkout the example at your own time. \n",
"- `decision_trees` is the directory which contains the files that will be installed in the container.\n",
"\n",
"In this simple application, we only install five files in the container. These five show the standard structure of our Python containers, although you are free to choose a different toolset or programming language and therefore could have a different layout.\n",
"\n",
"The files that we’ll put in the container are:\n",
"\n",
"- `nginx.conf` is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.\n",
"- `predictor.py` is the program that actually implements the Flask web server and the decision tree predictions for this app. You’ll want to customize the actual prediction parts to your application. Since this algorithm is simple, we do all the processing here in this file, but you may choose to have separate files for implementing your custom logic.\n",
"- `serve` is the program started when the container is started for hosting. It simply launches the gunicorn server which runs multiple instances of the Flask app defined in predictor.py. You should be able to take this file as-is.\n",
"- `train` is the program that is invoked when the container is run for training. You will modify this program to implement your training algorithm.\n",
"- `wsgi.py` is a small wrapper used to invoke the Flask app. You should be able to take this file as-is.\n",
"\n",
"In summary, the two files you will probably want to change for your application are `train` and `predictor.py`"
]
},
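{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the container contract concrete, here is a minimal, hypothetical sketch of a `train` entry point. It is not the actual file from the sample repository; it only illustrates the paths SageMaker mounts into the container (`/opt/ml/input/data/<channel>` for input data, `/opt/ml/model` for artifacts, `/opt/ml/output/failure` for error reporting):\n",
"\n",
"```python\n",
"#!/usr/bin/env python\n",
"# Minimal sketch of a SageMaker 'train' entry point (illustrative only).\n",
"import os\n",
"import sys\n",
"import traceback\n",
"\n",
"import joblib\n",
"import pandas as pd\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"input_path = '/opt/ml/input/data/training'  # data for the 'training' channel\n",
"model_path = '/opt/ml/model'                # artifacts saved here are uploaded to S3\n",
"failure_path = '/opt/ml/output/failure'     # write errors here if training fails\n",
"\n",
"def train():\n",
"    # Concatenate every CSV that SageMaker copied into the channel directory.\n",
"    files = [os.path.join(input_path, f) for f in os.listdir(input_path)]\n",
"    data = pd.concat([pd.read_csv(f, header=None) for f in files])\n",
"    y, X = data.iloc[:, 0], data.iloc[:, 1:]  # first column holds the label\n",
"    clf = DecisionTreeClassifier().fit(X, y)\n",
"    joblib.dump(clf, os.path.join(model_path, 'model.joblib'))\n",
"\n",
"if __name__ == '__main__':\n",
"    try:\n",
"        train()\n",
"        sys.exit(0)\n",
"    except Exception:\n",
"        with open(failure_path, 'w') as f:\n",
"            f.write(traceback.format_exc())\n",
"        sys.exit(255)\n",
"```\n",
"\n",
"Similarly, `predictor.py` must answer the two HTTP routes that SageMaker hosting calls: `GET /ping` for health checks and `POST /invocations` for predictions. A hypothetical skeleton of the Flask app:\n",
"\n",
"```python\n",
"# Minimal sketch of the Flask app behind /ping and /invocations (illustrative only).\n",
"import io\n",
"\n",
"import flask\n",
"import joblib\n",
"import pandas as pd\n",
"\n",
"app = flask.Flask(__name__)\n",
"model = joblib.load('/opt/ml/model/model.joblib')\n",
"\n",
"@app.route('/ping', methods=['GET'])\n",
"def ping():\n",
"    # Return 200 when the container is healthy and the model is loaded.\n",
"    return flask.Response(status=200)\n",
"\n",
"@app.route('/invocations', methods=['POST'])\n",
"def invocations():\n",
"    # The client-side CSVSerializer sends a text/csv body.\n",
"    data = pd.read_csv(io.StringIO(flask.request.data.decode('utf-8')), header=None)\n",
"    preds = model.predict(data)\n",
"    return flask.Response('\\n'.join(str(p) for p in preds) + '\\n', mimetype='text/csv')\n",
"```"
]
},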
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install packages\n",
"Please choose `Python 3 (Base Python 3.0)` kernel to proceed.\n",
"\n",
"We will first install the prerequisite packages:\n",
"- [**aiobotocore**](https://aiobotocore.readthedocs.io/en/latest/): adds async support for AWS services with [botocore](https://github.com/boto/botocore).\n",
"- [**sagemaker-studio-image-build**](https://pypi.org/project/sagemaker-studio-image-build/): CLI for building Docker images in SageMaker Studio using [AWS CodeBuild](https://aws.amazon.com/codebuild/)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pip in /opt/conda/lib/python3.10/site-packages (23.1.2)\n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"awscli 1.22.76 requires s3transfer<0.6.0,>=0.5.0, but you have s3transfer 0.6.1 which is incompatible.\n",
"boto3 1.26.147 requires botocore<1.30.0,>=1.29.147, but you have botocore 1.24.21 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0m\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"sagemaker 2.145.0 requires boto3<2.0,>=1.26.28, but you have boto3 1.21.21 which is incompatible.\n",
"sagemaker-studio-analytics-extension 0.0.18 requires boto3<2.0,>=1.26.49, but you have boto3 1.21.21 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0m\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"aiobotocore 2.3.0 requires botocore<1.24.22,>=1.24.21, but you have botocore 1.29.148 which is incompatible.\n",
"awscli 1.22.76 requires botocore==1.24.21, but you have botocore 1.29.148 which is incompatible.\n",
"awscli 1.22.76 requires s3transfer<0.6.0,>=0.5.0, but you have s3transfer 0.6.1 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0mCollecting awswrangler==2.14.0\n",
" Downloading awswrangler-2.14.0-py3-none-any.whl (226 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m226.6/226.6 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: boto3<2.0.0,>=1.20.17 in /opt/conda/lib/python3.10/site-packages (from awswrangler==2.14.0) (1.26.148)\n",
"Requirement already satisfied: botocore<2.0.0,>=1.23.17 in /opt/conda/lib/python3.10/site-packages (from awswrangler==2.14.0) (1.29.148)\n",
"Collecting jsonpath-ng<2.0.0,>=1.5.3 (from awswrangler==2.14.0)\n",
" Downloading jsonpath_ng-1.5.3-py3-none-any.whl (29 kB)\n",
"Requirement already satisfied: numpy<2.0.0,>=1.21.0 in /opt/conda/lib/python3.10/site-packages (from awswrangler==2.14.0) (1.24.2)\n",
"Requirement already satisfied: openpyxl<3.1.0,>=3.0.0 in /opt/conda/lib/python3.10/site-packages (from awswrangler==2.14.0) (3.0.10)\n",
"Collecting opensearch-py<2.0.0,>=1.0.0 (from awswrangler==2.14.0)\n",
" Downloading opensearch_py-1.1.0-py2.py3-none-any.whl (207 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m207.5/207.5 kB\u001b[0m \u001b[31m4.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: pandas<1.4.0,>=1.2.0 in /opt/conda/lib/python3.10/site-packages (from awswrangler==2.14.0) (1.3.5)\n",
"Collecting pg8000<1.23.0,>=1.16.0 (from awswrangler==2.14.0)\n",
" Downloading pg8000-1.22.1-py3-none-any.whl (33 kB)\n",
"Collecting progressbar2<4.0.0,>=3.53.3 (from awswrangler==2.14.0)\n",
" Downloading progressbar2-3.55.0-py2.py3-none-any.whl (26 kB)\n",
"Collecting pyarrow<6.1.0,>=2.0.0 (from awswrangler==2.14.0)\n",
" Downloading pyarrow-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25.6 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m25.6/25.6 MB\u001b[0m \u001b[31m34.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n",
"\u001b[?25hCollecting pymysql<1.1.0,>=0.9.0 (from awswrangler==2.14.0)\n",
" Downloading PyMySQL-1.0.3-py3-none-any.whl (43 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.7/43.7 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting redshift-connector<2.1.0,>=2.0.889 (from awswrangler==2.14.0)\n",
" Downloading redshift_connector-2.0.911-py3-none-any.whl (112 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m112.6/112.6 kB\u001b[0m \u001b[31m2.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n",
"\u001b[?25hCollecting requests-aws4auth<2.0.0,>=1.1.1 (from awswrangler==2.14.0)\n",
" Downloading requests_aws4auth-1.2.3-py2.py3-none-any.whl (24 kB)\n",
"Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /opt/conda/lib/python3.10/site-packages (from boto3<2.0.0,>=1.20.17->awswrangler==2.14.0) (0.10.0)\n",
"Requirement already satisfied: s3transfer<0.7.0,>=0.6.0 in /opt/conda/lib/python3.10/site-packages (from boto3<2.0.0,>=1.20.17->awswrangler==2.14.0) (0.6.1)\n",
"Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /opt/conda/lib/python3.10/site-packages (from botocore<2.0.0,>=1.23.17->awswrangler==2.14.0) (2.8.2)\n",
"Requirement already satisfied: urllib3<1.27,>=1.25.4 in /opt/conda/lib/python3.10/site-packages (from botocore<2.0.0,>=1.23.17->awswrangler==2.14.0) (1.26.15)\n",
"Requirement already satisfied: ply in /opt/conda/lib/python3.10/site-packages (from jsonpath-ng<2.0.0,>=1.5.3->awswrangler==2.14.0) (3.11)\n",
"Requirement already satisfied: decorator in /opt/conda/lib/python3.10/site-packages (from jsonpath-ng<2.0.0,>=1.5.3->awswrangler==2.14.0) (5.1.1)\n",
"Requirement already satisfied: six in /opt/conda/lib/python3.10/site-packages (from jsonpath-ng<2.0.0,>=1.5.3->awswrangler==2.14.0) (1.16.0)\n",
"Requirement already satisfied: et_xmlfile in /opt/conda/lib/python3.10/site-packages (from openpyxl<3.1.0,>=3.0.0->awswrangler==2.14.0) (1.1.0)\n",
"Requirement already satisfied: certifi in /opt/conda/lib/python3.10/site-packages (from opensearch-py<2.0.0,>=1.0.0->awswrangler==2.14.0) (2022.12.7)\n",
"Requirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.10/site-packages (from pandas<1.4.0,>=1.2.0->awswrangler==2.14.0) (2022.1)\n",
"Collecting scramp>=1.4.1 (from pg8000<1.23.0,>=1.16.0->awswrangler==2.14.0)\n",
" Downloading scramp-1.4.4-py3-none-any.whl (13 kB)\n",
"Collecting python-utils>=2.3.0 (from progressbar2<4.0.0,>=3.53.3->awswrangler==2.14.0)\n",
" Downloading python_utils-3.6.0-py2.py3-none-any.whl (25 kB)\n",
"Requirement already satisfied: beautifulsoup4<5.0.0,>=4.7.0 in /opt/conda/lib/python3.10/site-packages (from redshift-connector<2.1.0,>=2.0.889->awswrangler==2.14.0) (4.11.1)\n",
"Requirement already satisfied: requests<3.0.0,>=2.23.0 in /opt/conda/lib/python3.10/site-packages (from redshift-connector<2.1.0,>=2.0.889->awswrangler==2.14.0) (2.28.2)\n",
"Requirement already satisfied: lxml>=4.6.5 in /opt/conda/lib/python3.10/site-packages (from redshift-connector<2.1.0,>=2.0.889->awswrangler==2.14.0) (4.9.2)\n",
"Requirement already satisfied: packaging in /opt/conda/lib/python3.10/site-packages (from redshift-connector<2.1.0,>=2.0.889->awswrangler==2.14.0) (21.3)\n",
"Requirement already satisfied: setuptools in /opt/conda/lib/python3.10/site-packages (from redshift-connector<2.1.0,>=2.0.889->awswrangler==2.14.0) (67.6.1)\n",
"Requirement already satisfied: soupsieve>1.2 in /opt/conda/lib/python3.10/site-packages (from beautifulsoup4<5.0.0,>=4.7.0->redshift-connector<2.1.0,>=2.0.889->awswrangler==2.14.0) (2.3.1)\n",
"Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from python-utils>=2.3.0->progressbar2<4.0.0,>=3.53.3->awswrangler==2.14.0) (4.3.0)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests<3.0.0,>=2.23.0->redshift-connector<2.1.0,>=2.0.889->awswrangler==2.14.0) (2.0.4)\n",
"Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests<3.0.0,>=2.23.0->redshift-connector<2.1.0,>=2.0.889->awswrangler==2.14.0) (3.3)\n",
"Collecting asn1crypto>=1.5.1 (from scramp>=1.4.1->pg8000<1.23.0,>=1.16.0->awswrangler==2.14.0)\n",
" Downloading asn1crypto-1.5.1-py2.py3-none-any.whl (105 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m105.0/105.0 kB\u001b[0m \u001b[31m2.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.10/site-packages (from packaging->redshift-connector<2.1.0,>=2.0.889->awswrangler==2.14.0) (3.0.9)\n",
"Installing collected packages: asn1crypto, scramp, python-utils, pymysql, pyarrow, opensearch-py, jsonpath-ng, requests-aws4auth, progressbar2, pg8000, redshift-connector, awswrangler\n",
" Attempting uninstall: pyarrow\n",
" Found existing installation: pyarrow 11.0.0\n",
" Uninstalling pyarrow-11.0.0:\n",
" Successfully uninstalled pyarrow-11.0.0\n",
"Successfully installed asn1crypto-1.5.1 awswrangler-2.14.0 jsonpath-ng-1.5.3 opensearch-py-1.1.0 pg8000-1.22.1 progressbar2-3.55.0 pyarrow-6.0.1 pymysql-1.0.3 python-utils-3.6.0 redshift-connector-2.0.911 requests-aws4auth-1.2.3 scramp-1.4.4\n"
]
}
],
"source": [
"# cell 00\n",
"!pip install --root-user-action=ignore --upgrade pip \n",
"!pip install --root-user-action=ignore -q s3fs==2022.5.0\n",
"!pip install --root-user-action=ignore -q boto3==1.21.21\n",
"!pip install --root-user-action=ignore -q botocore==1.24.21\n",
"!pip install --root-user-action=ignore -q awscli==1.22.76\n",
"!pip install --root-user-action=ignore -Uq aiobotocore==2.3.0\n",
"!pip install --root-user-action=ignore -q pandas==1.3.5\n",
"!pip install --root-user-action=ignore -q sagemaker-studio-image-build\n",
"!pip install --root-user-action=ignore awswrangler==2.14.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will then unzip and copy over the files we need:\n",
"- `scikit_bring_your_own/container` → `lab03_container`\n",
"- `scikit_bring_your_own/data` → `lab03_data` "
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# cell 01\n",
"!unzip -q scikit_bring_your_own.zip\n",
"!mv scikit_bring_your_own/data/ ./lab03_data/\n",
"!mv scikit_bring_your_own/container/ ./lab03_container/\n",
"!rm -rf scikit_bring_your_own"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Dockerfile\n",
"The `Dockerfile` describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A Docker container running is quite a bit lighter than a full operating system, however, because it takes advantage of Linux on the host machine for the basic operations.\n",
"\n",
"For the Python science stack, we will start from a standard Ubuntu installation and run the normal tools to install the things needed by `scikit-learn`. Finally, we add the code that implements our specific algorithm to the container and set up the right environment to run under."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look of what's inside our `Dockerfile`:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[37m# Build an image that can do training and inference in SageMaker\u001b[39;49;00m\n",
"\u001b[37m# This is a Python 3 image that uses the nginx, gunicorn, flask stack\u001b[39;49;00m\n",
"\u001b[37m# for serving inferences in a stable way.\u001b[39;49;00m\n",
"\n",
"\u001b[34mFROM\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[33mubuntu:18.04\u001b[39;49;00m\n",
"\n",
"\u001b[34mMAINTAINER\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[33mAmazon AI \u001b[39;49;00m\n",
"\n",
"\n",
"\u001b[34mRUN\u001b[39;49;00m\u001b[37m \u001b[39;49;00mapt-get -y update && apt-get install -y --no-install-recommends \u001b[33m\\\u001b[39;49;00m\n",
" wget \u001b[33m\\\u001b[39;49;00m\n",
" python3-pip \u001b[33m\\\u001b[39;49;00m\n",
" python3-setuptools \u001b[33m\\\u001b[39;49;00m\n",
" nginx \u001b[33m\\\u001b[39;49;00m\n",
" ca-certificates \u001b[33m\\\u001b[39;49;00m\n",
" && rm -rf /var/lib/apt/lists/*\n",
"\n",
"\u001b[34mRUN\u001b[39;49;00m\u001b[37m \u001b[39;49;00mln -s /usr/bin/python3 /usr/bin/python\n",
"\u001b[34mRUN\u001b[39;49;00m\u001b[37m \u001b[39;49;00mln -s /usr/bin/pip3 /usr/bin/pip\n",
"\n",
"\u001b[37m# Here we get all python packages.\u001b[39;49;00m\n",
"\u001b[37m# There's substantial overlap between scipy and numpy that we eliminate by\u001b[39;49;00m\n",
"\u001b[37m# linking them together. Likewise, pip leaves the install caches populated which uses\u001b[39;49;00m\n",
"\u001b[37m# a significant amount of space. These optimizations save a fair amount of space in the\u001b[39;49;00m\n",
"\u001b[37m# image, which reduces start up time.\u001b[39;49;00m\n",
"\u001b[34mRUN\u001b[39;49;00m\u001b[37m \u001b[39;49;00mpip --no-cache-dir install \u001b[31mnumpy\u001b[39;49;00m==\u001b[34m1\u001b[39;49;00m.16.2 \u001b[31mscipy\u001b[39;49;00m==\u001b[34m1\u001b[39;49;00m.2.1 scikit-learn==\u001b[34m0\u001b[39;49;00m.20.2 pandas flask gunicorn\n",
"\n",
"\u001b[37m# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard\u001b[39;49;00m\n",
"\u001b[37m# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE\u001b[39;49;00m\n",
"\u001b[37m# keeps Python from writing the .pyc files which are unnecessary in this case. We also update\u001b[39;49;00m\n",
"\u001b[37m# PATH so that the train and serve programs are found when the container is invoked.\u001b[39;49;00m\n",
"\n",
"\u001b[34mENV\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[31mPYTHONUNBUFFERED\u001b[39;49;00m=TRUE\n",
"\u001b[34mENV\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[31mPYTHONDONTWRITEBYTECODE\u001b[39;49;00m=TRUE\n",
"\u001b[34mENV\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[31mPATH\u001b[39;49;00m=\u001b[33m\"\u001b[39;49;00m\u001b[33m/opt/program:\u001b[39;49;00m\u001b[33m${\u001b[39;49;00m\u001b[31mPATH\u001b[39;49;00m\u001b[33m}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\n",
"\n",
"\u001b[37m# Set up the program in the image\u001b[39;49;00m\n",
"\u001b[34mCOPY\u001b[39;49;00m\u001b[37m \u001b[39;49;00mdecision_trees /opt/program\n",
"\u001b[34mWORKDIR\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[33m/opt/program\u001b[39;49;00m\n"
]
}
],
"source": [
"!pygmentize lab03_container/Dockerfile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building and registering the container"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **NOTE** *In the next cell, if you run into IAM permission issue related to CodeBuild, make sure that you follow the Prerequisites steps outlined in the [immersion day lab instructions](https://catalog.us-east-1.prod.workshops.aws/v2/workshops/63069e26-921c-4ce1-9cc7-dd882ff62575/en-US/lab3/option2#prerequisites)*"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
".................[Container] 2023/06/06 19:51:11 Waiting for agent ping\n",
"\n",
"[Container] 2023/06/06 19:51:12 Waiting for DOWNLOAD_SOURCE\n",
"[Container] 2023/06/06 19:51:14 Phase is DOWNLOAD_SOURCE\n",
"[Container] 2023/06/06 19:51:14 CODEBUILD_SRC_DIR=/codebuild/output/src005335491/src\n",
"[Container] 2023/06/06 19:51:14 YAML location is /codebuild/output/src005335491/src/buildspec.yml\n",
"[Container] 2023/06/06 19:51:14 Setting HTTP client timeout to higher timeout for S3 source\n",
"[Container] 2023/06/06 19:51:14 Processing environment variables\n",
"[Container] 2023/06/06 19:51:14 No runtime version selected in buildspec.\n",
"[Container] 2023/06/06 19:51:14 Moving to directory /codebuild/output/src005335491/src\n",
"[Container] 2023/06/06 19:51:14 Configuring ssm agent with target id: codebuild:cf1c1f8e-d80f-4e88-b8ba-dec8479f7fba\n",
"[Container] 2023/06/06 19:51:14 Successfully updated ssm agent configuration\n",
"[Container] 2023/06/06 19:51:14 Registering with agent\n",
"[Container] 2023/06/06 19:51:14 Phases found in YAML: 3\n",
"[Container] 2023/06/06 19:51:14 POST_BUILD: 3 commands\n",
"[Container] 2023/06/06 19:51:14 PRE_BUILD: 9 commands\n",
"[Container] 2023/06/06 19:51:14 BUILD: 4 commands\n",
"[Container] 2023/06/06 19:51:14 Phase complete: DOWNLOAD_SOURCE State: SUCCEEDED\n",
"[Container] 2023/06/06 19:51:14 Phase context status code: Message:\n",
"[Container] 2023/06/06 19:51:14 Entering phase INSTALL\n",
"[Container] 2023/06/06 19:51:14 Phase complete: INSTALL State: SUCCEEDED\n",
"[Container] 2023/06/06 19:51:14 Phase context status code: Message:\n",
"[Container] 2023/06/06 19:51:14 Entering phase PRE_BUILD\n",
"[Container] 2023/06/06 19:51:14 Running command echo Logging in to Amazon ECR...\n",
"Logging in to Amazon ECR...\n",
"\n",
"[Container] 2023/06/06 19:51:14 Running command $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)\n",
"WARNING! Using --password via the CLI is insecure. Use --password-stdin.\n",
"WARNING! Your password will be stored unencrypted in /root/.docker/config.json.\n",
"Configure a credential helper to remove this warning. See\n",
"https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
"\n",
"Login Succeeded\n",
"\n",
"[Container] 2023/06/06 19:51:16 Running command $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION --registry-ids 763104351884)\n",
"WARNING! Using --password via the CLI is insecure. Use --password-stdin.\n",
"WARNING! Your password will be stored unencrypted in /root/.docker/config.json.\n",
"Configure a credential helper to remove this warning. See\n",
"https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
"\n",
"Login Succeeded\n",
"\n",
"[Container] 2023/06/06 19:51:17 Running command $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION --registry-ids 217643126080)\n",
"WARNING! Using --password via the CLI is insecure. Use --password-stdin.\n",
"WARNING! Your password will be stored unencrypted in /root/.docker/config.json.\n",
"Configure a credential helper to remove this warning. See\n",
"https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
"\n",
"Login Succeeded\n",
"\n",
"[Container] 2023/06/06 19:51:17 Running command $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION --registry-ids 727897471807)\n",
"WARNING! Using --password via the CLI is insecure. Use --password-stdin.\n",
"WARNING! Your password will be stored unencrypted in /root/.docker/config.json.\n",
"Configure a credential helper to remove this warning. See\n",
"https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
"\n",
"Login Succeeded\n",
"\n",
"[Container] 2023/06/06 19:51:18 Running command $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION --registry-ids 626614931356)\n",
"WARNING! Using --password via the CLI is insecure. Use --password-stdin.\n",
"WARNING! Your password will be stored unencrypted in /root/.docker/config.json.\n",
"Configure a credential helper to remove this warning. See\n",
"https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
"\n",
"Login Succeeded\n",
"\n",
"[Container] 2023/06/06 19:51:18 Running command $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION --registry-ids 683313688378)\n",
"WARNING! Using --password via the CLI is insecure. Use --password-stdin.\n",
"WARNING! Your password will be stored unencrypted in /root/.docker/config.json.\n",
"Configure a credential helper to remove this warning. See\n",
"https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
"\n",
"Login Succeeded\n",
"\n",
"[Container] 2023/06/06 19:51:19 Running command $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION --registry-ids 520713654638)\n",
"WARNING! Using --password via the CLI is insecure. Use --password-stdin.\n",
"WARNING! Your password will be stored unencrypted in /root/.docker/config.json.\n",
"Configure a credential helper to remove this warning. See\n",
"https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
"\n",
"Login Succeeded\n",
"\n",
"[Container] 2023/06/06 19:51:19 Running command $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION --registry-ids 462105765813)\n",
"WARNING! Using --password via the CLI is insecure. Use --password-stdin.\n",
"WARNING! Your password will be stored unencrypted in /root/.docker/config.json.\n",
"Configure a credential helper to remove this warning. See\n",
"https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
"\n",
"Login Succeeded\n",
"\n",
"[Container] 2023/06/06 19:51:20 Phase complete: PRE_BUILD State: SUCCEEDED\n",
"[Container] 2023/06/06 19:51:20 Phase context status code: Message:\n",
"[Container] 2023/06/06 19:51:20 Entering phase BUILD\n",
"[Container] 2023/06/06 19:51:20 Running command echo Build started on `date`\n",
"Build started on Tue Jun 6 19:51:20 UTC 2023\n",
"\n",
"[Container] 2023/06/06 19:51:20 Running command echo Building the Docker image...\n",
"Building the Docker image...\n",
"\n",
"[Container] 2023/06/06 19:51:20 Running command docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .\n",
"Sending build context to Docker daemon 103.9kB\n",
"Step 1/11 : FROM ubuntu:18.04\n",
"18.04: Pulling from library/ubuntu\n",
"toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit\n",
"\n",
"[Container] 2023/06/06 19:51:20 Command did not exit successfully docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG . exit status 1\n",
"[Container] 2023/06/06 19:51:20 Phase complete: BUILD State: FAILED\n",
"[Container] 2023/06/06 19:51:20 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .. Reason: exit status 1\n",
"[Container] 2023/06/06 19:51:20 Entering phase POST_BUILD\n",
"[Container] 2023/06/06 19:51:20 Running command echo Build completed on `date`\n",
"Build completed on Tue Jun 6 19:51:20 UTC 2023\n",
"\n",
"[Container] 2023/06/06 19:51:20 Running command echo Pushing the Docker image...\n",
"Pushing the Docker image...\n",
"\n",
"[Container] 2023/06/06 19:51:20 Running command docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG\n",
"The push refers to repository [607789152815.dkr.ecr.us-east-1.amazonaws.com/sagemaker-decision-trees]\n",
"An image does not exist locally with the tag: 607789152815.dkr.ecr.us-east-1.amazonaws.com/sagemaker-decision-trees\n",
"\n",
"[Container] 2023/06/06 19:51:20 Command did not exit successfully docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG exit status 1\n",
"[Container] 2023/06/06 19:51:20 Phase complete: POST_BUILD State: FAILED\n",
"[Container] 2023/06/06 19:51:20 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG. Reason: exit status 1\n",
"\n",
"Image URI: 607789152815.dkr.ecr.us-east-1.amazonaws.com/sagemaker-decision-trees:latest\n"
]
}
],
"source": [
"%%sh\n",
"# cell 02\n",
"\n",
"cd lab03_container\n",
"\n",
"chmod +x decision_trees/train\n",
"chmod +x decision_trees/serve\n",
"\n",
"sm-docker build . --repository sagemaker-decision-trees:latest"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup & Upload Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup the Environment \n",
"Here we specify a bucket to use and the role that will be used for working with SageMaker.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# cell 03\n",
"\n",
"S3_prefix = 'DEMO-scikit-byo-iris'\n",
"\n",
"# Define IAM role\n",
"import boto3\n",
"import re\n",
"\n",
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sagemaker import get_execution_role\n",
"\n",
"role = get_execution_role()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The session remembers our connection parameters to SageMaker. We’ll use it to perform all of our SageMaker operations."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# cell 04\n",
"\n",
"import sagemaker as sage\n",
"from time import gmtime, strftime\n",
"\n",
"sess = sage.Session()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upload data to S3 Bucket"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When training large models with huge amounts of data, you’ll typically use big data tools, like Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3. For the purposes of this example, we’re using some the [classic Iris dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) in the `lab03_data` directory. \n",
"\n",
"We can use use the tools provided by the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) to upload the data to a default bucket."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# cell 05\n",
"\n",
"WORK_DIRECTORY = 'lab03_data'\n",
"\n",
"data_location = sess.upload_data(WORK_DIRECTORY, \n",
" key_prefix=S3_prefix)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Training"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to use SageMaker to fit our algorithm, we create an [`estimator`](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:\n",
"\n",
"- `image_uri (str)` - The [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) path where the docker image is registered. This is constructed in the shell commands in *cell 06*.\n",
"- `role (str)` - SageMaker IAM role as obtained above in *cell 03*.\n",
"- `instance_count (int)` - number of machines to use for training.\n",
"- `instance_type (str)` - the type of machine to use for training.\n",
"- `output_path (str)` - where the model artifact will be written.\n",
"- `sagemaker_session (sagemaker.session.Session)` - the SageMaker session object that we defined in *cell 04*.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we use `estimator.fit()` method to train against the data that we uploaded.\n",
"The API calls the Amazon SageMaker `CreateTrainingJob` API to start model training. The API uses configuration you provided to create the `estimator` and the specified input training data to send the `CreatingTrainingJob` request to Amazon SageMaker."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:sagemaker:Creating training-job with name: sagemaker-decision-trees-2023-06-06-19-51-50-719\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"2023-06-06 19:51:51 Starting - Starting the training job...\n",
"2023-06-06 19:52:08 Starting - Preparing the instances for training......\n",
"2023-06-06 19:53:22 Downloading - Downloading input data...\n",
"2023-06-06 19:53:53 Training - Training image download completed. Training in progress.\n",
"2023-06-06 19:53:53 Uploading - Uploading generated training model\u001b[34mStarting the training.\u001b[0m\n",
"\u001b[34mTraining complete.\u001b[0m\n",
"\n",
"2023-06-06 19:54:03 Completed - Training job completed\n",
"Training seconds: 42\n",
"Billable seconds: 42\n"
]
}
],
"source": [
"# cell 06\n",
"\n",
"account = sess.boto_session.client('sts').get_caller_identity()['Account']\n",
"region = sess.boto_session.region_name\n",
"image_uri = '{}.dkr.ecr.{}.amazonaws.com/sagemaker-decision-trees:latest'.format(account, region)\n",
"\n",
"tree = sage.estimator.Estimator(image_uri,\n",
" role, \n",
" instance_count=1, \n",
" instance_type='ml.c4.2xlarge',\n",
" output_path=\"s3://{}/output\".format(sess.default_bucket()),\n",
" sagemaker_session=sess)\n",
"\n",
"file_location = data_location + '/iris.csv'\n",
"tree.fit(file_location)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Deployment\n",
"You can use a trained model to get real time predictions using HTTP endpoint. Follow these steps to walk you through the process.\n",
"\n",
"After the model training successfully completes, you can call the [`estimator.deploy()` method](https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.Estimator.deploy). The `deploy()` method creates a deployable model, configures the SageMaker hosting services endpoint, and launches the endpoint to host the model. \n",
"\n",
"The method uses the following configurations:\n",
"- `initial_instance_count (int)` – The number of instances to deploy the model.\n",
"- `instance_type (str)` – The type of instances that you want to operate your deployed model.\n",
"- `serializer (int)` – Serialize input data of various formats (a NumPy array, list, file, or buffer) to a CSV-formatted string in this example. \n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:sagemaker:Creating model with name: sagemaker-decision-trees-2023-06-06-19-54-43-556\n",
"INFO:sagemaker:Creating endpoint-config with name sagemaker-decision-trees-2023-06-06-19-54-43-556\n",
"INFO:sagemaker:Creating endpoint with name sagemaker-decision-trees-2023-06-06-19-54-43-556\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------!"
]
}
],
"source": [
"# cell 07\n",
"\n",
"from sagemaker.serializers import CSVSerializer\n",
"predictor = tree.deploy(initial_instance_count=1, \n",
" instance_type='ml.m4.xlarge', \n",
" serializer=CSVSerializer())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run Inferences\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparing test data\n",
"In order to do some predictions, we’ll extract some of the data we used for training and do predictions against it. This is, of course, bad statistical practice, but an easy way to see how the mechanism works."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"s3://sagemaker-us-east-1-607789152815/DEMO-scikit-byo-iris/iris.csv\n"
]
}
],
"source": [
"print(file_location)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
" 4 | \n",
"
\n",
" \n",
" \n",
" \n",
" 137 | \n",
" virginica | \n",
" 6.4 | \n",
" 3.1 | \n",
" 5.5 | \n",
" 1.8 | \n",
"
\n",
" \n",
" 132 | \n",
" virginica | \n",
" 6.4 | \n",
" 2.8 | \n",
" 5.6 | \n",
" 2.2 | \n",
"
\n",
" \n",
" 143 | \n",
" virginica | \n",
" 6.8 | \n",
" 3.2 | \n",
" 5.9 | \n",
" 2.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3 4\n",
"137 virginica 6.4 3.1 5.5 1.8\n",
"132 virginica 6.4 2.8 5.6 2.2\n",
"143 virginica 6.8 3.2 5.9 2.3"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# cell 08\n",
"import awswrangler as wr\n",
"shape=wr.s3.read_csv(file_location, header=None)\n",
"\n",
"# shape=pd.read_csv(file_location, header=None)\n",
"shape.sample(3)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
" 4 | \n",
"
\n",
" \n",
" \n",
" \n",
" 34 | \n",
" 4.9 | \n",
" 3.1 | \n",
" 1.5 | \n",
" 0.2 | \n",
"
\n",
" \n",
" 25 | \n",
" 5.0 | \n",
" 3.0 | \n",
" 1.6 | \n",
" 0.2 | \n",
"
\n",
" \n",
" 64 | \n",
" 5.6 | \n",
" 2.9 | \n",
" 3.6 | \n",
" 1.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 1 2 3 4\n",
"34 4.9 3.1 1.5 0.2\n",
"25 5.0 3.0 1.6 0.2\n",
"64 5.6 2.9 3.6 1.3"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# cell 09\n",
"\n",
"# drop the label column in the training set\n",
"shape.drop(shape.columns[[0]],axis=1,inplace=True)\n",
"shape.sample(3)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"# cell 10\n",
"\n",
"import itertools\n",
"\n",
"a = [50*i for i in range(3)]\n",
"b = [40+i for i in range(10)]\n",
"indices = [i+j for i,j in itertools.product(a,b)]\n",
"\n",
"test_data=shape.iloc[indices[:-1]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predictions\n",
"\n",
"Prediction is as easy as calling `predict` with the `predictor` we got back from `deploy` and the data we want to do predictions with. The serializers take care of doing the data conversions for us."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"setosa\n",
"setosa\n",
"setosa\n",
"setosa\n",
"setosa\n",
"setosa\n",
"setosa\n",
"setosa\n",
"setosa\n",
"setosa\n",
"versicolor\n",
"versicolor\n",
"versicolor\n",
"versicolor\n",
"versicolor\n",
"versicolor\n",
"versicolor\n",
"versicolor\n",
"versicolor\n",
"versicolor\n",
"virginica\n",
"virginica\n",
"virginica\n",
"virginica\n",
"virginica\n",
"virginica\n",
"virginica\n",
"virginica\n",
"virginica\n",
"\n"
]
}
],
"source": [
"# cell 11\n",
"\n",
"print(predictor.predict(test_data.values).decode('utf-8'))"
]
},
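{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `predictor` wraps an ordinary SageMaker endpoint, so any client with AWS credentials can call it directly. As a sketch of what the SDK does under the hood (assuming the endpoint created above is still in service), the equivalent boto3 call would look like this:\n",
"\n",
"```python\n",
"import boto3\n",
"\n",
"runtime = boto3.client('sagemaker-runtime')\n",
"\n",
"# Build the same text/csv payload the CSVSerializer would have produced.\n",
"payload = '\\n'.join(','.join(str(v) for v in row) for row in test_data.values)\n",
"\n",
"response = runtime.invoke_endpoint(\n",
"    EndpointName=predictor.endpoint_name,\n",
"    ContentType='text/csv',\n",
"    Body=payload)\n",
"\n",
"print(response['Body'].read().decode('utf-8'))\n",
"```"
]
},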
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleanup\n",
"After completing the lab, use these steps to [delete the endpoint through AWS Console](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-cleanup.html) or simply run the following code\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:sagemaker:Deleting endpoint with name: sagemaker-decision-trees-2023-06-06-19-54-43-556\n"
]
}
],
"source": [
"# cell 12\n",
"sess.delete_endpoint(predictor.endpoint_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remove the container artifacts and data we downloaded."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"# cell 13\n",
"!rm -rf lab03_container lab03_data"
]
}
],
"metadata": {
"availableInstances": [
{
"_defaultOrder": 0,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 4,
"name": "ml.t3.medium",
"vcpuNum": 2
},
{
"_defaultOrder": 1,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.t3.large",
"vcpuNum": 2
},
{
"_defaultOrder": 2,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.t3.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 3,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.t3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 4,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.m5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 5,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.m5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 6,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.m5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 7,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.m5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 8,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.m5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 9,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.m5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 10,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.m5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 11,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.m5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 12,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.m5d.large",
"vcpuNum": 2
},
{
"_defaultOrder": 13,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.m5d.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 14,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.m5d.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 15,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.m5d.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 16,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.m5d.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 17,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.m5d.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 18,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.m5d.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 19,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.m5d.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 20,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": true,
"memoryGiB": 0,
"name": "ml.geospatial.interactive",
"supportedImageNames": [
"sagemaker-geospatial-v1-0"
],
"vcpuNum": 0
},
{
"_defaultOrder": 21,
"_isFastLaunch": true,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 4,
"name": "ml.c5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 22,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.c5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 23,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.c5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 24,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.c5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 25,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 72,
"name": "ml.c5.9xlarge",
"vcpuNum": 36
},
{
"_defaultOrder": 26,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 96,
"name": "ml.c5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 27,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 144,
"name": "ml.c5.18xlarge",
"vcpuNum": 72
},
{
"_defaultOrder": 28,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.c5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 29,
"_isFastLaunch": true,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.g4dn.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 30,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.g4dn.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 31,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.g4dn.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 32,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.g4dn.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 33,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.g4dn.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 34,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.g4dn.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 35,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 61,
"name": "ml.p3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 36,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 244,
"name": "ml.p3.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 37,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 488,
"name": "ml.p3.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 38,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.p3dn.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 39,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.r5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 40,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.r5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 41,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.r5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 42,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.r5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 43,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.r5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 44,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.r5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 45,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 512,
"name": "ml.r5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 46,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.r5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 47,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.g5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 48,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.g5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 49,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.g5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 50,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.g5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 51,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.g5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 52,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.g5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 53,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.g5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 54,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.g5.48xlarge",
"vcpuNum": 192
}
],
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (Data Science 3.0)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}