{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SageMaker기반으로 도커컨테이너를 직접 빌드하여 머신러닝 분류기 학습/배포하기\n",
    "\n",
    "본 예제코드는 아래 소스에서 code-pipeline 부분을 제외하고 재구성하였습니다.\n",
    "- 소스 : https://github.com/awslabs/amazon-sagemaker-mlops-workshop (lab 01)\n",
    "\n",
    "본 예제에서는 도커이미지를 직접 생성하여 학습과 모델 배포를 진행할 것입니다. 실습에 사용할 모델은 scikit-learn(https://scikit-learn.org/)의 **Random Forest Tree** 를 이용하며, 붓꽃(iris)의 품종을 분류하는 기초적인 문제입니다. 실험에 사용할 데이터셋은 Iris (http://archive.ics.uci.edu/ml/datasets/iris)를 이용합니다. 모델 코드는 매우 간단한 예제이므로 실행환경과 진행방식에 보다 집중해 주십시오. \n",
    "\n",
    "Scikit-lean 컨테이너는 SageMaker에서 관리형으로 제공하고 있습니다. (https://docs.aws.amazon.com/sagemaker/latest/dg/sklearn.html) 본 예제는 컨테이너를 이용하는 방법을 보여주기 위한 예제입니다. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PART 1 - 도커 이미지 빌드 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 학습용 스크립트 작성\n",
    "\n",
    "Scikit-learn을 사용하는 간단한 예제입니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile train.py\n",
    "import os\n",
    "import sys\n",
    "import pandas as pd\n",
    "import re\n",
    "import joblib\n",
    "import json\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "\n",
    "def load_dataset(path):\n",
    "    # Take the set of files and read them all into a single pandas dataframe\n",
    "    files = [ os.path.join(path, file) for file in os.listdir(path) ]\n",
    "    \n",
    "    if len(files) == 0:\n",
    "        raise ValueError(\"Invalid # of files in dir: {}\".format(path))\n",
    "\n",
    "    raw_data = [ pd.read_csv(file, sep=\",\", header=None ) for file in files ]\n",
    "    data = pd.concat(raw_data)\n",
    "\n",
    "    # labels are in the first column\n",
    "    y = data.iloc[:,0]\n",
    "    X = data.iloc[:,1:]\n",
    "    return X,y\n",
    "    \n",
    "def start(args):\n",
    "    print(\"Training mode\")\n",
    "\n",
    "    try:\n",
    "        X_train, y_train = load_dataset(args.train)\n",
    "        X_test, y_test = load_dataset(args.validation)\n",
    "        \n",
    "        hyperparameters = {\n",
    "            \"max_depth\": args.max_depth,\n",
    "            \"verbose\": 1, # show all logs\n",
    "            \"n_jobs\": args.n_jobs,\n",
    "            \"n_estimators\": args.n_estimators\n",
    "        }\n",
    "        print(\"Training the classifier\")\n",
    "        model = RandomForestClassifier()\n",
    "        model.set_params(**hyperparameters)\n",
    "        model.fit(X_train, y_train)\n",
    "        print(\"Score: {}\".format( model.score(X_test, y_test)) )\n",
    "        joblib.dump(model, open(os.path.join(args.model_dir, \"iris_model.pkl\"), \"wb\"))\n",
    "    \n",
    "    except Exception as e:\n",
    "        # Write out an error file. This will be returned as the failureReason in the\n",
    "        # DescribeTrainingJob result.\n",
    "        trc = traceback.format_exc()\n",
    "        with open(os.path.join(args.output_dir, \"failure\"), \"w\") as s:\n",
    "            s.write(\"Exception during training: \" + str(e) + \"\\\\n\" + trc)\n",
    "            \n",
    "        # Printing this causes the exception to be in the training job logs, as well.\n",
    "        print(\"Exception during training: \" + str(e) + \"\\\\n\" + trc, file=sys.stderr)\n",
    "        \n",
    "        # A non-zero exit code causes the training job to be marked as Failed.\n",
    "        sys.exit(255)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 추론용 핸들러(handler)를 만듭니다. \n",
    "\n",
    "SageMaker 추론 툴킷을 이용합니다. SageMaker 추론 툴킷은 SageMaker에서 머신러닝의 추론코드를 보다 쉽게 작성할 수 있도록 도와줍니다. \n",
    "\n",
    "- SageMaker 추론 툴킷 참조 : https://github.com/aws/sagemaker-inference-toolkit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile handler.py\n",
    "import os\n",
    "import sys\n",
    "import joblib\n",
    "from sagemaker_inference.default_inference_handler import DefaultInferenceHandler\n",
    "from sagemaker_inference.default_handler_service import DefaultHandlerService\n",
    "from sagemaker_inference import content_types, errors, transformer, encoder, decoder\n",
    "\n",
    "class HandlerService(DefaultHandlerService, DefaultInferenceHandler):\n",
    "    def __init__(self):\n",
    "        op = transformer.Transformer(default_inference_handler=self)\n",
    "        super(HandlerService, self).__init__(transformer=op)\n",
    "    \n",
    "    ## Loads the model from the disk\n",
    "    def default_model_fn(self, model_dir):\n",
    "        model_filename = os.path.join(model_dir, \"iris_model.pkl\")\n",
    "        return joblib.load(open(model_filename, \"rb\"))\n",
    "    \n",
    "    ## Parse and check the format of the input data\n",
    "    def default_input_fn(self, input_data, content_type):\n",
    "        if content_type != \"text/csv\":\n",
    "            raise Exception(\"Invalid content-type: %s\" % content_type)\n",
    "        return decoder.decode(input_data, content_type).reshape(1,-1)\n",
    "    \n",
    "    ## Run our model and do the prediction\n",
    "    def default_predict_fn(self, payload, model):\n",
    "        return model.predict( payload ).tolist()\n",
    "    \n",
    "    ## Gets the prediction output and format it to be returned to the user\n",
    "    def default_output_fn(self, prediction, accept):\n",
    "        if accept != \"text/csv\":\n",
    "            raise Exception(\"Invalid accept: %s\" % accept)\n",
    "        return encoder.encode(prediction, accept)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 컨테이너의 entrypoint 스크립트를 작성합니다. \n",
    "\n",
    "main함수에서 매개변수 설정을 위해 **SageMaker 학습 툴킷**(https://github.com/aws/sagemaker-training-toolkit)을 사용하였으며 추론 서빙을 위해서 **SageMaker 추론 툴킷**을 이용하고 있습니다. 이들을 사용하면 보다 간결하게 코드를 작성할 수 있습니다.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile main.py\n",
    "import train\n",
    "import argparse\n",
    "import sys\n",
    "import os\n",
    "import traceback\n",
    "from sagemaker_inference import model_server\n",
    "from sagemaker_training import environment\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    if len(sys.argv) < 2 or ( not sys.argv[1] in [ \"serve\", \"train\" ] ):\n",
    "        raise Exception(\"Invalid argument: you must inform 'train' for training mode or 'serve' predicting mode\") \n",
    "        \n",
    "    if sys.argv[1] == \"train\":\n",
    "        \n",
    "        env = environment.Environment()\n",
    "        \n",
    "        parser = argparse.ArgumentParser()\n",
    "        # https://github.com/aws/sagemaker-training-toolkit/blob/master/ENVIRONMENT_VARIABLES.md\n",
    "        parser.add_argument(\"--max-depth\", type=int, default=10)\n",
    "        parser.add_argument(\"--n-jobs\", type=int, default=env.num_cpus)\n",
    "        parser.add_argument(\"--n-estimators\", type=int, default=120)\n",
    "        \n",
    "        # reads input channels training and testing from the environment variables\n",
    "        parser.add_argument(\"--train\", type=str, default=env.channel_input_dirs[\"train\"])\n",
    "        parser.add_argument(\"--validation\", type=str, default=env.channel_input_dirs[\"validation\"])\n",
    "\n",
    "        parser.add_argument(\"--model-dir\", type=str, default=env.model_dir)\n",
    "        parser.add_argument(\"--output-dir\", type=str, default=env.output_dir)\n",
    "        \n",
    "        args,unknown = parser.parse_known_args()\n",
    "        train.start(args)\n",
    "    else:\n",
    "        model_server.start_model_server(handler_service=\"serving.handler\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 Dockerfile을 작성합니다.\n",
    "\n",
    "컨테이너에 설치하는 패키지에 주목하십시오. 컨테이너를 학습/추론용으로 모두 사용하기 위해 **SageMaker Inference Toolkit** (https://github.com/aws/sagemaker-inference-toolkit) 과 **SageMaker Training Toolkit** (https://github.com/aws/sagemaker-training-toolkit)을 설치하고 있습니다. 추론을 위해서는 multi-model-server를 이용하며 API 호출을 처리할 수 있는 웹서비스 형태로 모델을 서빙하게 됩니다. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile Dockerfile\n",
    "FROM python:3.7-buster\n",
    "\n",
    "# Set a docker label to advertise multi-model support on the container\n",
    "LABEL com.amazonaws.sagemaker.capabilities.multi-models=false\n",
    "# Set a docker label to enable container to use SAGEMAKER_BIND_TO_PORT environment variable if present\n",
    "LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true\n",
    "\n",
    "RUN apt-get update -y && apt-get -y install --no-install-recommends default-jdk\n",
    "RUN rm -rf /var/lib/apt/lists/*\n",
    "\n",
    "RUN pip --no-cache-dir install multi-model-server sagemaker-inference sagemaker-training\n",
    "RUN pip --no-cache-dir install pandas numpy scipy scikit-learn\n",
    "\n",
    "ENV PYTHONUNBUFFERED=TRUE\n",
    "ENV PYTHONDONTWRITEBYTECODE=TRUE\n",
    "ENV PYTHONPATH=\"/opt/ml/code:${PATH}\"\n",
    "\n",
    "COPY main.py /opt/ml/code/main.py\n",
    "COPY train.py /opt/ml/code/train.py\n",
    "COPY handler.py /opt/ml/code/serving/handler.py\n",
    "\n",
    "ENTRYPOINT [\"python\", \"/opt/ml/code/main.py\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PART 2 - 로컬 테스트: 로컬에서 이미지를 빌드하고 테스트를 수행합니다. \n",
    "### 2.1 로컬에서 이미지 빌드\n",
    "\n",
    "SageMaker Jupyter 노트북은 이미 **도커** 환경이 설치되어 있어 별도 작업없이 바로 사용가능합니다. \n",
    "\n",
    "조금전 준비한 Dockerfile을 이용하여 도커이미지를 빌드합니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!docker build -f Dockerfile -t iris_model:1.0 ."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 이제 학습과 모델 배포를 위한 알고리즘 이미지가 준비되었습니다. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 다음은 데이터셋을 준비하겠습니다.\n",
    "\n",
    "iris 데이터셋을 학습과 검증용으로 나누고 csv파일로 저장합니다. 이 파일들은 SageMaker환경과의 공유를 위해 S3버킷으로 업로드될 것입니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -rf input\n",
    "!mkdir -p input/data/train\n",
    "!mkdir -p input/data/validation\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from sklearn import datasets\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "iris = datasets.load_iris()\n",
    "\n",
    "dataset = np.insert(iris.data, 0, iris.target,axis=1)\n",
    "\n",
    "df = pd.DataFrame(data=dataset, columns=[\"iris_id\"] + iris.feature_names)\n",
    "X = df.iloc[:,1:]\n",
    "y = df.iloc[:,0]\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\n",
    "\n",
    "train_df = X_train.copy()\n",
    "train_df.insert(0, \"iris_id\", y_train)\n",
    "train_df.to_csv(\"input/data/train/training.csv\", sep=\",\", header=None, index=None)\n",
    "\n",
    "test_df = X_test.copy()\n",
    "test_df.insert(0, \"iris_id\", y_test)\n",
    "test_df.to_csv(\"input/data/validation/testing.csv\", sep=\",\", header=None, index=None)\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 준비한 도커이미지와 데이처를 이용하여 로컬에서 학습작업을 실행해 봅니다. \n",
    "\n",
    "다음 코드들은 로컬에서 SageMaker와 유사한 환경을 시뮬레이션합니다. docker run 실행시 파일 경로의 매핑에 주의합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -rf input/config && mkdir -p input/config"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile input/config/hyperparameters.json\n",
    "{\"max_depth\": 20, \"n_jobs\": 4, \"n_estimators\": 120}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile input/config/resourceconfig.json\n",
    "{\"current_host\": \"localhost\", \"hosts\": [\"algo-1-kipw9\"]}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile input/config/inputdataconfig.json\n",
    "{\"train\": {\"TrainingInputMode\": \"File\"}, \"validation\": {\"TrainingInputMode\": \"File\"}}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "!rm -rf model/\n",
    "!mkdir -p model\n",
    "\n",
    "print( \"Training...\")\n",
    "!docker run --rm --name \"my_model\" \\\n",
    "    -v \"$PWD/model:/opt/ml/model\" \\\n",
    "    -v \"$PWD/output:/opt/ml/output\" \\\n",
    "    -v \"$PWD/input:/opt/ml/input\" iris_model:1.0 train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 다음은 추론 서빙을 테스트해봅니다. \n",
    "\n",
    "아래 코드는 SageMaker에 의해 생성되는 엔드포인트를 시뮬레이션합니다. \n",
    "\n",
    "다음 셀의 코드 실행하면 API서비스가 실행되면서 Jupyter notebook의 동작이 멈출 것입니다. 웹서비스는 8080 포트를 사용합니다. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!docker run --rm --name \"my_model\" \\\n",
    "    -p 8080:8080 \\\n",
    "    -v \"$PWD/model:/opt/ml/model\" \\\n",
    "    -v \"$PWD/input:/opt/ml/input\" iris_model:1.0 serve"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 위 셀을 실행한 후 두번째 노트북 [TEST NOTEBOOK](02_Testing%20our%20local%20model%20server.ipynb)을 오픈하고 테스트를 실행합니다.\n",
    "\n",
    "> 테스트가 끝나면 노트북 위쪽 메뉴에서 **STOP** 버튼을 클릭합니다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 이제 SageMaker 환경에서 이 컨테이너를 이용하여 학습과 배포를 진행해 봅니다.\n",
    "\n",
    "AWS CodePipeline과 CodeBuild, CodeCommit을 이용하여 형상관리를 gkr 프로세스를 자동화하는 과정으로 진행하시려면 아래 원본 소스를 참고하십시오.\n",
    "- https://github.com/awslabs/amazon-sagemaker-mlops-workshop "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "이번에는 ECR에 배포된 알고리즘 이미지를 이용하여, SageMaker Estimator(https://sagemaker.readthedocs.io/en/stable/estimators.html)로 학습을 실행해 보겠습니다. 로컬에서 먼저 테스트를 진행한 다음 SageMaker환경에서 학습을 실행해 보겠습니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import json\n",
    "from sagemaker import get_execution_role\n",
    "\n",
    "role = get_execution_role()\n",
    "sagemaker_session = sagemaker.Session()\n",
    "bucket = sagemaker_session.default_bucket()\n",
    "prefix='mlops/iris'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "iris-model을 ECR로 배포합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!chmod +x build_and_push.sh\n",
    "!./build_and_push.sh iris-model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 데이터셋 업로드\n",
    "\n",
    "앞단계에서 학습과 검증 데이터셋을 생성하였습니다. 이 파일을 S3로 업로드합니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_path = sagemaker_session.upload_data(path='input/data/train', key_prefix='iris-model/input/train')\n",
    "test_path = sagemaker_session.upload_data(path='input/data/validation', key_prefix='iris-model/input/validation')\n",
    "print(\"Train: %s\\nValidation: %s\" % (train_path, test_path) )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## 이제 Sagemaker Estimator를 생성하고 학습과 배포를 실행해 봅니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the estimator\n",
    "# iris-model:test is the name of the container created in the previous notebook\n",
    "# By the local codebuild test. An image with that name:tag was pushed to the ECR.\n",
    "iris = sagemaker.estimator.Estimator('iris-model:latest',\n",
    "                                    role,\n",
    "                                    instance_count=1, \n",
    "                                    instance_type='local',\n",
    "                                    output_path='s3://{}/{}/output'.format(bucket, prefix))\n",
    "hyperparameters = {\n",
    "    'max_depth': 20,\n",
    "    'n_jobs': 4,\n",
    "    'n_estimators': 120\n",
    "}\n",
    "\n",
    "print(hyperparameters)\n",
    "iris.set_hyperparameters(**hyperparameters)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`instance_type`을 `local`로 지정했으므로 `.fit()`을 호출시 SageMaker가 아닌 *로컬 도커 데몬* 환경에서 새로운 학습작업이 실행될 것입니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris.fit({'train': train_path, 'validation': test_path })"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "다음 명령은 로컬 도커 데몬에 새로운 추론용 컨테이너를 실행하고 이를 테스트할 수 있도록 predictor를 리턴합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris_predictor = iris.deploy(initial_instance_count=1, instance_type='local')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "이제 로컬에 배포된 predictor(https://sagemaker.readthedocs.io/en/stable/predictors.html)를 이용하여 추론 테스트를 진행합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import random\n",
    "from sagemaker.serializers import CSVSerializer\n",
    "from sagemaker.deserializers import CSVDeserializer\n",
    "\n",
    "# configure the predictor to do everything for us\n",
    "iris_predictor.serializer = CSVSerializer()\n",
    "iris_predictor.deserializer = CSVDeserializer()\n",
    "\n",
    "# load the testing data from the validation csv\n",
    "validation = pd.read_csv('input/data/validation/testing.csv', header=None)\n",
    "idx = random.randint(0,len(validation)-5)\n",
    "req = validation.iloc[idx:idx+5].values\n",
    "\n",
    "# cut a sample with 5 lines from our dataset and then split the label from the features.\n",
    "X = req[:,1:].tolist()\n",
    "y = req[:,0].tolist()\n",
    "\n",
    "# call the local endpoint\n",
    "for features,label in zip(X,y):\n",
    "    prediction = iris_predictor.predict(features)\n",
    "\n",
    "    # compare the results\n",
    "    print(\"RESULT: {} == {} ? {}\".format( label, prediction[0][0], str(label) == str(prediction[0][0])) )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 테스트가 완료되면 엔드포인트를 삭제합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris_predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## (Local이 아닌) 클라우드 환경에서 Sagemaker Estimator를 생성하고 학습과 배포를 실행해 봅니다.\n",
    "\n",
    "Estimator의 첫번째 인자인 `image_uri`를 full ecr name으로 지정하고 `instance_type`을 `local`이 아닌 `ml.c4.xlarge`로 바꾸어서 실행합니다. 나머지 설정은 동일합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "\n",
    "client = boto3.client(\"sts\")\n",
    "account = client.get_caller_identity()[\"Account\"]\n",
    "\n",
    "my_session = boto3.session.Session()\n",
    "region = my_session.region_name\n",
    "\n",
    "algorithm_name = \"iris-model\"\n",
    "\n",
    "ecr_image = \"{}.dkr.ecr.{}.amazonaws.com/{}:latest\".format(account, region, algorithm_name)\n",
    "\n",
    "print(ecr_image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the estimator\n",
    "# iris-model:test is the name of the container created in the previous notebook\n",
    "# By the local codebuild test. An image with that name:tag was pushed to the ECR.\n",
    "iris = sagemaker.estimator.Estimator(ecr_image,\n",
    "                                    role,\n",
    "                                    instance_count=1, \n",
    "                                    instance_type='ml.c4.xlarge',\n",
    "                                    output_path='s3://{}/{}/output'.format(bucket, prefix))\n",
    "hyperparameters = {\n",
    "    'max_depth': 20,\n",
    "    'n_jobs': 4,\n",
    "    'n_estimators': 120\n",
    "}\n",
    "\n",
    "print(hyperparameters)\n",
    "iris.set_hyperparameters(**hyperparameters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris.fit({'train': train_path, 'validation': test_path })"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "클라우드로 배포하고 엔드포인트를 생성합니다. `instance_type`을 `local`이 아닌 `ml.c4.xlarge`로 변경하였습니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris_predictor = iris.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "클라우드에 배포된 predictor(https://sagemaker.readthedocs.io/en/stable/predictors.html)를 이용하여 추론 테스트를 진행합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import random\n",
    "from sagemaker.serializers import CSVSerializer\n",
    "from sagemaker.deserializers import CSVDeserializer\n",
    "\n",
    "# configure the predictor to do everything for us\n",
    "iris_predictor.serializer = CSVSerializer()\n",
    "iris_predictor.deserializer = CSVDeserializer()\n",
    "\n",
    "# load the testing data from the validation csv\n",
    "validation = pd.read_csv('input/data/validation/testing.csv', header=None)\n",
    "idx = random.randint(0,len(validation)-5)\n",
    "req = validation.iloc[idx:idx+5].values\n",
    "\n",
    "# cut a sample with 5 lines from our dataset and then split the label from the features.\n",
    "X = req[:,1:].tolist()\n",
    "y = req[:,0].tolist()\n",
    "\n",
    "# call the local endpoint\n",
    "for features,label in zip(X,y):\n",
    "    prediction = iris_predictor.predict(features)\n",
    "\n",
    "    # compare the results\n",
    "    print(\"RESULT: {} == {} ? {}\".format( label, prediction[0][0], str(label) == str(prediction[0][0])) )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 테스트가 완료되면 엔드포인트를 삭제합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris_predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## 클라우드에서 Batch Transform 작업을 실행해 봅니다.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(X).to_csv('test.csv', header=None, index=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!head input/data/validation/testing.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 cp data/validation/testing.csv s3://leonkang-datalake-tokyo/iris-testin/test.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_path = 's3://leonkang-datalake-tokyo/iris-testin/'\n",
    "output_path = 's3://leonkang-datalake-tokyo/iris-testout/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_path = 's3://{}/iris-model/output/batchout'.format(bucket)\n",
    "print(test_path)\n",
    "print(output_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "transformer = iris.transformer(instance_count=1,\n",
    "                               instance_type='ml.m4.xlarge',\n",
    "                               output_path=output_path,\n",
    "                               assemble_with='Line',\n",
    "                               accept='text/csv',\n",
    "                               strategy='SingleRecord'\n",
    "                              )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "transformer.transform(test_path, content_type='text/csv', split_type='Line', input_filter='$[1:]')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 ls {output_path}/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 cp {output_path}/testing.csv.out test_out.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat test_out.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}