{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Kubeflow Fairing Kaniko Cloud Builder on AWS\n",
    "\n",
    "## Requirements\n",
    "\n",
    "  * You must be running Kubeflow 1.0 or newer on EKS\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create AWS secret in kubernetes and grant aws access to your notebook\n",
    "\n",
    "> Note: Once IAM for Service Account is merged in 1.0.1, we don't have to use credentials\n",
    "\n",
    "1. Please create an AWS secret in current namespace. \n",
    "\n",
    "> Note: To get base64 string, try `echo -n $AWS_ACCESS_KEY_ID | base64`. \n",
    "> Make sure you have `AmazonEC2ContainerRegistryFullAccess` and `AmazonS3FullAccess` for this experiment. Pods will use credentials to talk to AWS services."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "# Replace placeholder with your own AWS credentials\n",
    "AWS_ACCESS_KEY_ID='<YOUR AWS ACCESS KEY>'\n",
    "AWS_SECRET_ACCESS_KEY='<YOUR AWS SECRET KEY>'\n",
    "\n",
    "kubectl create secret generic aws-secret --from-literal=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} --from-literal=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. Attach `AmazonEC2ContainerRegistryFullAccess` and `AmazonS3FullAccess` to EKS node group role and grant AWS access to notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Verify you have access to AWS services\n",
    "\n",
    "* The cell below checks that this notebook was spawned with credentials to access AWS S3 and ECR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import os\n",
    "import uuid\n",
    "from importlib import reload\n",
    "import boto3\n",
    "\n",
    "# Set REGION for s3 bucket and elastic contaienr registry\n",
    "AWS_REGION='us-west-2'\n",
    "boto3.client('s3', region_name=AWS_REGION).list_buckets()\n",
    "boto3.client('ecr', region_name=AWS_REGION).describe_repositories()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install Required Libraries\n",
    "\n",
    "Import the libraries required to train this model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from src.kaniko import notebook_setup\n",
    "reload(notebook_setup)\n",
    "notebook_setup.notebook_setup()\n",
    "\n",
    "# Force a reload of kubeflow; since kubeflow is a multi namespace module\n",
    "# it looks like doing this in notebook_setup may not be sufficient\n",
    "import kubeflow\n",
    "reload(kubeflow)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configure ECR Docker Registry For Kubeflow Fairing\n",
    "\n",
    "* In order to build docker images from your notebook we need a docker registry where the images will be stored\n",
    "* Below you set some variables specifying a [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/)\n",
    "* Kubeflow Fairing provides a utility function to guess the name of your AWS account"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from kubeflow import fairing   \n",
    "from kubeflow.fairing import utils as fairing_utils\n",
    "from kubeflow.fairing.preprocessors import base as base_preprocessor\n",
    "from kubeflow.tfjob.api import tf_job_client as tf_job_client_module\n",
    "import yaml\n",
    "\n",
    "# Setting up AWS Elastic Container Registry (ECR) for storing output containers\n",
    "# You can use any docker container registry istead of ECR\n",
    "AWS_ACCOUNT_ID=fairing.cloud.aws.guess_account_id()\n",
    "AWS_ACCOUNT_ID = boto3.client('sts').get_caller_identity().get('Account')\n",
    "DOCKER_REGISTRY = '{}.dkr.ecr.{}.amazonaws.com'.format(AWS_ACCOUNT_ID, AWS_REGION)\n",
    "\n",
    "namespace = fairing_utils.get_current_k8s_namespace()\n",
    "\n",
    "logging.info(f\"Running in aws region {AWS_REGION}, account {AWS_ACCOUNT_ID}\")\n",
    "logging.info(f\"Running in namespace {namespace}\")\n",
    "logging.info(f\"Using docker registry {DOCKER_REGISTRY}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use Kubeflow fairing to build the docker image\n",
    "\n",
    "* You will use kubeflow fairing's kaniko builder to build a docker image that includes all your dependencies\n",
    "  * You use kaniko because you want to be able to run `pip` to install dependencies\n",
    "  * Kaniko gives you the flexibility to build images from Dockerfiles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from kubeflow.fairing.builders import cluster\n",
    "\n",
    "# output_map is a map of extra files to add to the notebook.\n",
    "# It is a map from source location to the location inside the context.\n",
    "output_map =  {\n",
    "    \"./src/kaniko/Dockerfile.model\": \"Dockerfile\",\n",
    "    \"./src/kaniko/model.py\": \"model.py\"\n",
    "}\n",
    "\n",
    "preprocessor = base_preprocessor.BasePreProcessor(\n",
    "    command=[\"python\"], # The base class will set this.\n",
    "    input_files=[],\n",
    "    path_prefix=\"/app\", # irrelevant since we aren't preprocessing any files\n",
    "    output_map=output_map)\n",
    "\n",
    "preprocessor.preprocess()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a new ECR repository to host model image\n",
    "ecr_repo_name = 'fairing-kaniko'\n",
    "!aws ecr create-repository --repository-name $ecr_repo_name --region=$AWS_REGION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use a Tensorflow image as the base image\n",
    "# We use a custom Dockerfile \n",
    "cluster_builder = cluster.cluster.ClusterBuilder(registry=DOCKER_REGISTRY,\n",
    "                                                 base_image=\"\", # base_image is set in the Dockerfile\n",
    "                                                 preprocessor=preprocessor,\n",
    "                                                 image_name=\"fairing-kaniko\",\n",
    "                                                 dockerfile_path=\"Dockerfile\",\n",
    "                                                 pod_spec_mutators=[fairing.cloud.aws.add_aws_credentials_if_exists, fairing.cloud.aws.add_ecr_config],\n",
    "                                                 context_source=cluster.s3_context.S3ContextSource(region=AWS_REGION))\n",
    "cluster_builder.build()\n",
    "logging.info(f\"Built image {cluster_builder.image_tag}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a S3 Bucket\n",
    "\n",
    "* Create a S3 bucket to store our models and other results.\n",
    "* Since we are running in python we use the python client libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "from botocore.exceptions import ClientError\n",
    "\n",
    "bucket = f\"{AWS_ACCOUNT_ID}-fairing-kaniko\"\n",
    "\n",
    "def create_bucket(bucket_name):\n",
    "    \"\"\"Create an S3 bucket in a specified region\n",
    "\n",
    "    :param bucket_name: Bucket to create\n",
    "    :return: True if bucket created, else False\n",
    "    \"\"\"\n",
    "\n",
    "    # Create bucket\n",
    "    try:\n",
    "        s3_client = boto3.client('s3')\n",
    "        s3_client.create_bucket(Bucket=bucket_name)\n",
    "    except ClientError as e:\n",
    "        logging.error(e)\n",
    "        return False\n",
    "    return True\n",
    "\n",
    "create_bucket(bucket)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distributed training by using Fairing Kaniko image\n",
    "\n",
    "* We will train the model by using TFJob to run a distributed training job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from src.kaniko.tfjob_spec_provider import tfj_spec \n",
    "train_name=f\"mnist-train-{uuid.uuid4().hex[:4]}\"\n",
    "train_spec = tfj_spec(train_name=train_name,\n",
    "                    num_ps=1,\n",
    "                    num_workers=2,\n",
    "                    model_dir=f\"s3://{bucket}/mnist\",\n",
    "                    export_path=f\"s3://{bucket}/mnist/export\",\n",
    "                    train_steps=200,\n",
    "                    batch_size=100,\n",
    "                    learning_rate=.01,\n",
    "                    image=cluster_builder.image_tag,\n",
    "                    AWS_REGION=AWS_REGION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the training job from Kaniko Image\n",
    "\n",
    "* You could write the spec to a YAML file and then do `kubectl apply -f {FILE}`\n",
    "* Since you are running in jupyter you will use the TFJob client\n",
    "* You will run the TFJob in a namespace created by a Kubeflow profile\n",
    "  * The namespace will be the same namespace you are running the notebook in\n",
    "  * Creating a profile ensures the namespace is provisioned with service accounts and other resources needed for Kubeflow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tf_job_client = tf_job_client_module.TFJobClient()\n",
    "tf_job_body = yaml.safe_load(train_spec)\n",
    "tf_job = tf_job_client.create(tf_job_body, namespace=namespace)  \n",
    "\n",
    "logging.info(f\"Created job {namespace}.{train_name}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Check the job\n",
    "\n",
    "* Above you used the python SDK for TFJob to check the status\n",
    "* You can also use kubectl get the status of your job\n",
    "* The job conditions will tell you whether the job is running, succeeded or failed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!kubectl get tfjobs -o yaml {train_name}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean Up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "!aws ecr delete-repository --repository-name $ecr_repo_name --force --region=$AWS_REGION\n",
    "!aws s3 rb s3://$bucket --force"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "source": [],
    "metadata": {
     "collapsed": false
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}