{ "cells": [ { "cell_type": "markdown", "id": "3bba35b9", "metadata": { "tags": [] }, "source": [ "# 1. SageMaker Training with Experiments and Processing For AutoGluon" ] }, { "cell_type": "markdown", "id": "18ee4107", "metadata": {}, "source": [ "## 학습 작업의 실행 노트북 개요\n", "\n", "- SageMaker Training에 SageMaker 실험을 추가하여 여러 실험의 결과를 비교할 수 있습니다.\n", " - [작업 실행 시 필요 라이브러리 import](#작업-실행-시-필요-라이브러리-import)\n", " - [SageMaker 세션과 Role, 사용 버킷 정의](#SageMaker-세션과-Role,-사용-버킷-정의)\n", " - [하이퍼파라미터 정의](#하이퍼파라미터-정의)\n", " - [학습 실행 작업 정의](#학습-실행-작업-정의)\n", " - 학습 코드 명\n", " - 학습 코드 폴더 명\n", " - 학습 코드가 사용한 Framework 종류, 버전 등\n", " - 학습 인스턴스 타입과 개수\n", " - SageMaker 세션\n", " - 학습 작업 하이퍼파라미터 정의\n", " - 학습 작업 산출물 관련 S3 버킷 설정 등\n", " - [학습 데이터셋 지정](#학습-데이터셋-지정)\n", " - 학습에 사용하는 데이터셋의 S3 URI 지정\n", " - [SageMaker 실험 설정](#SageMaker-실험-설정)\n", " - [학습 실행](#학습-실행)\n", " - [데이터 세트 설명](#데이터-세트-설명)\n", " - [실험 결과 보기](#실험-결과-보기)\n", " - [Evaluation 하기](#Evaluation-하기)" ] }, { "cell_type": "markdown", "id": "f30d8298", "metadata": {}, "source": [ "### 작업 실행 시 필요 라이브러리 import" ] }, { "cell_type": "code", "execution_count": null, "id": "3c742efe", "metadata": {}, "outputs": [], "source": [ "!pip install -U sagemaker-experiments" ] }, { "cell_type": "code", "execution_count": null, "id": "c1607693", "metadata": {}, "outputs": [], "source": [ "import os\n", "import sys\n", "import json\n", "import pandas as pd\n", "import boto3\n", "import sagemaker" ] }, { "cell_type": "code", "execution_count": null, "id": "e7602bfe-4c6a-40a9-b329-3e4a4189118d", "metadata": {}, "outputs": [], "source": [ "sys.path.append('./src')" ] }, { "cell_type": "code", "execution_count": null, "id": "5852afa7", "metadata": {}, "outputs": [], "source": [ "from ag_model import (\n", " AutoGluonTraining,\n", " AutoGluonInferenceModel,\n", " AutoGluonTabularPredictor,\n", " AutoGluonFramework\n", ")" ] }, { "cell_type": "markdown", "id": "35672ce2", "metadata": {}, "source": [ "### SageMaker 세션과 Role, 사용 버킷 정의" ] }, { "cell_type": "code", "execution_count": null, "id": "dea6c642", "metadata": {}, "outputs": [], "source": [ "sagemaker_session = sagemaker.session.Session()\n", "region = sagemaker_session._region_name\n", "role = sagemaker.get_execution_role()" ] }, { "cell_type": "code", "execution_count": null, "id": "9900af2c", "metadata": {}, "outputs": [], "source": [ "bucket = sagemaker_session.default_bucket()\n", "code_location = f's3://{bucket}/autogluon/code'\n", "output_path = f's3://{bucket}/autogluon/output'" ] }, { "cell_type": "markdown", "id": "51337b88", "metadata": {}, "source": [ "### 하이퍼파라미터 정의" ] }, { "cell_type": "code", "execution_count": null, "id": "20d3a907", "metadata": {}, "outputs": [], "source": [ "hyperparameters = {\n", " \"config_name\" : \"config-med.yaml\"\n", "}" ] }, { "cell_type": "markdown", "id": "f80ad1e7", "metadata": {}, "source": [ "### 학습 데이터셋 지정" ] }, { "cell_type": "code", "execution_count": null, "id": "f1637db8", "metadata": {}, "outputs": [], "source": [ "data_path=f's3://{bucket}/autogluon/dataset'\n", "config_path = f's3://{bucket}/autogluon/config'\n", "!aws s3 sync ../data/dataset/ $data_path\n", "!aws s3 sync ./config/ $config_path\n", "\n", "data_path" ] }, { "cell_type": "markdown", "id": "ac99c65e", "metadata": {}, "source": [ "### 학습 실행 작업 정의" ] }, { "cell_type": "code", "execution_count": null, "id": "6f139415", "metadata": {}, "outputs": [], "source": [ "instance_count = 1\n", "instance_type = \"ml.m5.large\"\n", "# instance_type = 'local'\n", "max_run = 1*60*60\n", "\n", "use_spot_instances = False\n", "if use_spot_instances:\n", " max_wait = 1*60*60\n", "else:\n", " max_wait = None" ] }, { "cell_type": "code", "execution_count": null, "id": "8bf7fbfa", "metadata": {}, "outputs": [], "source": [ "if instance_type == 'local':\n", " from sagemaker.local import LocalSession\n", " \n", " sagemaker_session = LocalSession()\n", " sagemaker_session.config = {'local': {'local_code': True}}\n", " local_data_path = \"file://\" + os.getcwd().replace('/lab_1_training', '') + \"/data/dataset\"\n", " \n", " data_channels = {\n", " \"inputdata\": local_data_path, \n", " \"config\" : \"file://\" + os.getcwd() + '/config'\n", " }\n", " \n", "else:\n", " sess = boto3.Session()\n", " sagemaker_session = sagemaker.Session()\n", " sm = sess.client('sagemaker')\n", " \n", " data_channels = {\n", " \"inputdata\": data_path, \n", " \"config\" : config_path\n", " }" ] }, { "cell_type": "code", "execution_count": null, "id": "e1f35afb", "metadata": {}, "outputs": [], "source": [ "ag_estimator = AutoGluonTraining(\n", " entry_point=\"autogluon_starter_script.py\",\n", " source_dir=os.getcwd() + \"/src\",\n", " role=role,\n", " # region=region,\n", " sagemaker_session=sagemaker_session,\n", " output_path=output_path,\n", " code_location=code_location,\n", " hyperparameters=hyperparameters,\n", " instance_count=instance_count,\n", " instance_type=instance_type,\n", " framework_version=\"0.4\",\n", " py_version=\"py38\",\n", " max_run=max_run,\n", " use_spot_instances=use_spot_instances, # spot instance 활용\n", " max_wait=max_wait,\n", ")" ] }, { "cell_type": "markdown", "id": "283aa610", "metadata": {}, "source": [ "### SageMaker 실험 설정" ] }, { "cell_type": "code", "execution_count": null, "id": "787a6ad6", "metadata": {}, "outputs": [], "source": [ "experiment_name='autogluon-poc-1'" ] }, { "cell_type": "code", "execution_count": null, "id": "912e6127", "metadata": {}, "outputs": [], "source": [ "from smexperiments.experiment import Experiment\n", "from smexperiments.trial import Trial\n", "from time import strftime" ] }, { "cell_type": "code", "execution_count": null, "id": "f94d1228", "metadata": {}, "outputs": [], "source": [ "def create_experiment(experiment_name):\n", " try:\n", " sm_experiment = Experiment.load(experiment_name)\n", " except:\n", " sm_experiment = Experiment.create(experiment_name=experiment_name)" ] }, { "cell_type": "code", "execution_count": null, "id": "60d7bec6", "metadata": {}, "outputs": [], "source": [ "def create_trial(experiment_name):\n", " create_date = strftime(\"%m%d-%H%M%s\")\n", "\n", " sm_trial = Trial.create(trial_name=f'{experiment_name}-{create_date}',\n", " experiment_name=experiment_name)\n", "\n", " job_name = f'{sm_trial.trial_name}'\n", " return job_name" ] }, { "cell_type": "markdown", "id": "fd8b8389", "metadata": {}, "source": [ "### 학습 실행" ] }, { "cell_type": "code", "execution_count": null, "id": "d8007ff9", "metadata": {}, "outputs": [], "source": [ "data_channels" ] }, { "cell_type": "code", "execution_count": null, "id": "f6c73653", "metadata": {}, "outputs": [], "source": [ "create_experiment(experiment_name)\n", "job_name = create_trial(experiment_name)\n", "\n", "ag_estimator.fit(inputs = data_channels,\n", " job_name = job_name,\n", " experiment_config={\n", " 'TrialName': job_name,\n", " 'TrialComponentDisplayName': job_name,\n", " },\n", " wait=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "9f081402", "metadata": { "scrolled": true }, "outputs": [], "source": [ "ag_estimator.logs()" ] }, { "cell_type": "markdown", "id": "e97a251b", "metadata": { "tags": [] }, "source": [ "### 실험 결과 보기\n", "위의 실험한 결과를 확인 합니다.\n", "- 각각의 훈련잡의 시도에 대한 훈련 사용 데이터, 모델 입력 하이퍼 파라미터, 모델 평가 지표, 모델 아티펙트 결과 위치 등의 확인이 가능합니다.\n", "- **아래의 모든 내용은 SageMaker Studio 를 통해서 직관적으로 확인이 가능합니다.**" ] }, { "cell_type": "code", "execution_count": null, "id": "8cabceb7", "metadata": {}, "outputs": [], "source": [ "!rm -rf ./autogluon/\n", "!mkdir -p ./autogluon/result\n", "!aws s3 cp {ag_estimator.model_data} ./autogluon/" ] }, { "cell_type": "code", "execution_count": null, "id": "e5e66fcb", "metadata": {}, "outputs": [], "source": [ "!ls -alF ./autogluon/model.tar.gz" ] }, { "cell_type": "code", "execution_count": null, "id": "efeed7c9", "metadata": {}, "outputs": [], "source": [ "!tar -xzf ./autogluon/model.tar.gz -C ./autogluon/result/" ] }, { "cell_type": "markdown", "id": "7d46856c", "metadata": {}, "source": [ "### Endpoint Deployment" ] }, { "cell_type": "code", "execution_count": null, "id": "dd059e54", "metadata": {}, "outputs": [], "source": [ "instance_type = \"ml.m5.2xlarge\"\n", "# instance_type = 'local'" ] }, { "cell_type": "code", "execution_count": null, "id": "eea75911", "metadata": {}, "outputs": [], "source": [ "if instance_type == 'local':\n", " from sagemaker.local import LocalSession\n", " sagemaker_session = LocalSession()\n", " sagemaker_session.config = {'local': {'local_code': True}}\n", "else:\n", " sess = boto3.Session()\n", " sagemaker_session = sagemaker.Session()" ] }, { "cell_type": "code", "execution_count": null, "id": "9c2fba7c", "metadata": {}, "outputs": [], "source": [ "model = AutoGluonInferenceModel(\n", " source_dir=os.getcwd() + \"/src\",\n", " entry_point=\"autogluon_serve.py\",\n", " model_data=ag_estimator.model_data,\n", " instance_type=instance_type,\n", " role=role,\n", " sagemaker_session=sagemaker_session,\n", " # region=region,\n", " framework_version=\"0.4\",\n", " py_version=\"py38\",\n", " predictor_cls=AutoGluonTabularPredictor\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "86800334", "metadata": {}, "outputs": [], "source": [ "from sagemaker.serializers import CSVSerializer\n", "\n", "predictor = model.deploy(\n", " initial_instance_count=1, serializer=CSVSerializer(), instance_type=instance_type\n", ")" ] }, { "cell_type": "markdown", "id": "545ce070", "metadata": {}, "source": [ "### Predict on unlabeled test data\n", "\n", "Remove target variable (`fraud`) from the data and get predictions for a sample of 100 rows using the deployed endpoint." ] }, { "cell_type": "code", "execution_count": null, "id": "cdf7e184", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"../data/dataset/test.csv\")\n", "data = df.drop(columns=\"fraud\")[:100].values" ] }, { "cell_type": "code", "execution_count": null, "id": "58406884", "metadata": {}, "outputs": [], "source": [ "preds = predictor.predict(data)\n", "pred_df = pd.DataFrame(json.loads(preds))" ] }, { "cell_type": "code", "execution_count": null, "id": "1be1218b", "metadata": {}, "outputs": [], "source": [ "pred_df['fraud'].reset_index(drop=True, inplace=True)\n", "df[\"fraud\"][:len(pred_df)].reset_index(drop=True, inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "f3e05dbc", "metadata": {}, "outputs": [], "source": [ "p = pd.DataFrame({\"preds\": pred_df['fraud'], \"actual\": df[\"fraud\"][: len(pred_df)]})\n", "p.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "5f2de2ec", "metadata": {}, "outputs": [], "source": [ "print(f\"{(p.preds==p.actual).astype(int).sum()}/{len(p)} are correct\")" ] }, { "cell_type": "markdown", "id": "e2730696", "metadata": {}, "source": [ "### Cleanup Endpoint" ] }, { "cell_type": "code", "execution_count": null, "id": "8c98890a", "metadata": {}, "outputs": [], "source": [ "# predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "id": "5ee66a9a", "metadata": { "tags": [] }, "source": [ "# Batch Transform\n", "\n", "학습된 모델을 호스트된 엔드포인트에 배포하는 것은 출시 이후 SageMaker에서 사용할 수 있으며 웹 사이트나 모바일 앱과 같은 서비스에 실시간 예측을 제공하는 좋은 방법입니다. 그러나 지연 시간을 최소화하는 것이 문제가 되지 않는 대규모 데이터 세트에서 학습된 모델에서 예측을 생성하는 것이 목표라면 배치 변환 기능이 더 쉽고, 더 확장 가능하며, 더 적절할 수 있다.\n", "\n", "[Read more about Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html)." ] }, { "cell_type": "code", "execution_count": null, "id": "3632031c", "metadata": {}, "outputs": [], "source": [ "instance_type = \"ml.m5.2xlarge\"" ] }, { "cell_type": "code", "execution_count": null, "id": "46805ef6", "metadata": {}, "outputs": [], "source": [ "# model = AutoGluonInferenceModel(\n", "# source_dir=os.getcwd() + \"/src\",\n", "# entry_point=\"autogluon_serve.py\",\n", "# model_data=ag_estimator.model_data,\n", "# instance_type=instance_type,\n", "# role=role,\n", "# sagemaker_session=sagemaker_session,\n", "# region=region,\n", "# framework_version=\"0.4\",\n", "# py_version=\"py38\", \n", "# predictor_cls=AutoGluonTabularPredictor,\n", "# )" ] }, { "cell_type": "code", "execution_count": null, "id": "5ab031b2", "metadata": {}, "outputs": [], "source": [ "transformer = model.transformer(\n", " instance_count=1,\n", " instance_type=instance_type,\n", " strategy=\"MultiRecord\",\n", " max_payload=6,\n", " max_concurrent_transforms=1,\n", " output_path=output_path,\n", " accept=\"application/json\",\n", " assemble_with=\"Line\",\n", ")\n" ] }, { "cell_type": "markdown", "id": "cc2f1966", "metadata": {}, "source": [ "Prepare data for batch transform" ] }, { "cell_type": "code", "execution_count": null, "id": "0e72eec3", "metadata": {}, "outputs": [], "source": [ "pd.read_csv(f\"../data/dataset/test.csv\")[:100].to_csv(\"../data/dataset/test_no_header.csv\", header=False, index=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "94edaa3b", "metadata": {}, "outputs": [], "source": [ "test_input = transformer.sagemaker_session.upload_data(\n", " path=os.path.join(\"../data/dataset\", \"test_no_header.csv\"), key_prefix=f\"{bucket}/autogluon/dataset\"\n", ")\n", "test_input" ] }, { "cell_type": "code", "execution_count": null, "id": "c53e5ef6", "metadata": { "scrolled": true }, "outputs": [], "source": [ "transformer.transform(\n", " test_input,\n", " input_filter=\"$[1:]\", # filter-out target variable\n", " split_type=\"Line\",\n", " content_type=\"text/csv\",\n", " output_filter=\"$['fraud']\", # keep only prediction class in the output\n", ")\n", "\n", "transformer.wait()" ] }, { "cell_type": "markdown", "id": "f4ef88f4", "metadata": {}, "source": [ "batch transform 결과를 다운로드 받습니다." ] }, { "cell_type": "code", "execution_count": null, "id": "0008ce7a", "metadata": {}, "outputs": [], "source": [ "!rm -rf ./autogluon_batch_result\n", "!mkdir ./autogluon_batch_result" ] }, { "cell_type": "code", "execution_count": null, "id": "0510a91d", "metadata": {}, "outputs": [], "source": [ "transformer.output_path" ] }, { "cell_type": "code", "execution_count": null, "id": "0d955359", "metadata": {}, "outputs": [], "source": [ "!aws s3 cp {transformer.output_path}/test_no_header.csv.out ./autogluon_batch_result/" ] }, { "cell_type": "code", "execution_count": null, "id": "eaedc464", "metadata": {}, "outputs": [], "source": [ "p = pd.concat(\n", " [\n", " pd.read_json(\"./autogluon_batch_result/test_no_header.csv.out\", orient=\"index\")\n", " .sort_index()\n", " .rename(columns={0: \"preds\"}),\n", " pd.read_csv(\"../data/dataset/test.csv\")[[\"fraud\"]].iloc[:100].rename(columns={\"fraud\": \"actual\"}),\n", " ],\n", " axis=1,\n", ")\n", "p.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "1d5e1ea5", "metadata": {}, "outputs": [], "source": [ "print(f\"{(p.preds==p.actual).astype(int).sum()}/{len(p)} are correct\")" ] }, { "cell_type": "markdown", "id": "946f9df3", "metadata": { "tags": [] }, "source": [ "### Processing Evaluation 하기\n", "SageMaker Processing을 이용하여 Evalution을 수행하는 코드를 동작할 수 있습니다. MLOps에서 Processing을 적용하면 전처리, Evaluation 등을 serverless로 동작할 수 있습니다." ] }, { "cell_type": "code", "execution_count": null, "id": "488bcaeb", "metadata": {}, "outputs": [], "source": [ "from sagemaker.processing import FrameworkProcessor\n", "from sagemaker.processing import ProcessingInput, ProcessingOutput\n", "from sagemaker.estimator import Framework" ] }, { "cell_type": "code", "execution_count": null, "id": "caa61af9", "metadata": {}, "outputs": [], "source": [ "instance_count = 1\n", "instance_type = \"ml.m5.large\"\n", "# instance_type = 'local'" ] }, { "cell_type": "code", "execution_count": null, "id": "ad69b234", "metadata": {}, "outputs": [], "source": [ "from sagemaker import image_uris\n", "\n", "image_uri = image_uris.retrieve(\n", " \"autogluon\",\n", " region=region,\n", " version=\"0.4\",\n", " py_version=\"py38\",\n", " image_scope=\"training\",\n", " instance_type=instance_type,\n", ")\n", "image_uri\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d7d3acc6", "metadata": {}, "outputs": [], "source": [ "script_eval = FrameworkProcessor(\n", " AutoGluonFramework,\n", " framework_version=\"0.4\",\n", " role=role,\n", " py_version=\"py38\",\n", " image_uri=image_uri,\n", " instance_type=instance_type,\n", " instance_count=instance_count\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "22ffaeb7", "metadata": {}, "outputs": [], "source": [ "detect_outputpath = f's3://{bucket}/autogluon/processing'" ] }, { "cell_type": "code", "execution_count": null, "id": "131e5774", "metadata": {}, "outputs": [], "source": [ "source_dir='src'\n", "\n", "if instance_type == 'local':\n", " from sagemaker.local import LocalSession\n", " from pathlib import Path\n", " \n", " sagemaker_session = LocalSession()\n", " sagemaker_session.config = {'local': {'local_code': True}}\n", " source_dir = f'{Path.cwd()}/src'\n", " s3_test_path=f'../data/dataset/test.csv'\n", "else:\n", " sagemaker_session = sagemaker.Session()\n", " s3_test_path = data_path + '/test.csv'" ] }, { "cell_type": "code", "execution_count": null, "id": "9fd72722", "metadata": {}, "outputs": [], "source": [ "create_experiment(experiment_name)\n", "job_name = create_trial(experiment_name)\n", "\n", "script_eval.run(\n", " code=\"autogluon_evaluation.py\",\n", " source_dir=source_dir,\n", " inputs=[ProcessingInput(source=s3_test_path, input_name=\"test_data\", destination=\"/opt/ml/processing/test\"),\n", " ProcessingInput(source=ag_estimator.model_data, input_name=\"model_weight\", destination=\"/opt/ml/processing/model\")\n", " ],\n", " outputs=[\n", " ProcessingOutput(source=\"/opt/ml/processing/output\", output_name='evaluation', destination=detect_outputpath + \"/\" + job_name),\n", " ],\n", " job_name=job_name,\n", " experiment_config={\n", " 'TrialName': job_name,\n", " 'TrialComponentDisplayName': job_name,\n", " },\n", " wait=False\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "b464ac2e", "metadata": {}, "outputs": [], "source": [ "script_eval.latest_job.wait()" ] }, { "cell_type": "markdown", "id": "89048fdb-cd5a-47b1-ae04-8d0786b1a02a", "metadata": {}, "source": [ "### Code repository 생성 및 push\n", "현재 사용하는 노트북의 iam role에 IAMFullAccess을 추가한 이후에 아래 작업을 수행합니다." ] }, { "cell_type": "code", "execution_count": null, "id": "7094727d-5dfb-4081-a13d-cb9bfe440ec7", "metadata": {}, "outputs": [], "source": [ "from sagemaker import get_execution_role" ] }, { "cell_type": "code", "execution_count": null, "id": "b134f758", "metadata": {}, "outputs": [], "source": [ "iam_client = boto3.client('iam')\n", "\n", "role=get_execution_role()\n", "base_role_name=role.split('/')[-1]\n", "\n", "iam_client.attach_role_policy(\n", " RoleName=base_role_name,\n", " PolicyArn='arn:aws:iam::aws:policy/AWSCodeCommitFullAccess'\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "692b1c06", "metadata": {}, "outputs": [], "source": [ "codecommit = boto3.client('codecommit')\n", "repository_name = 'autogluon_code'\n", "\n", "try:\n", " response = codecommit.create_repository(\n", " repositoryName=repository_name,\n", " repositoryDescription='Data Scientists share their training code using this Repository'\n", " )\n", "except:\n", " \n", " print(\"Repository already exists\")\n", " response = codecommit.get_repository(\n", " repositoryName=repository_name\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "97457ce5-fd55-416c-9654-02f30c872fbb", "metadata": {}, "outputs": [], "source": [ "codecommit_repo = response['repositoryMetadata']['cloneUrlHttp']\n", "codecommit_repo" ] }, { "cell_type": "code", "execution_count": null, "id": "640b4135-8c3b-4c27-8944-f74ae1de49b6", "metadata": {}, "outputs": [], "source": [ "!git init\n", "!git remote add repo_codecommit $codecommit_repo\n", "!git checkout -b main\n", "!git add ./config ./src ./1.SageMaker-Training+Experiments+Processing-AutoGluon.ipynb\n", "!git commit -m \"autogluon-update\"\n", "!git push --set-upstream repo_codecommit main" ] }, { "cell_type": "code", "execution_count": null, "id": "d47b4c77-8203-4db6-8b8f-bb31b35e68ed", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 5 }