{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# [Module 3.1] Developing the HPO Step (Tuning Step)\n", "\n", "This notebook follows the outline below. Running it end to end takes about 5 to 10 minutes.\n", "\n", "- 1. Model tuning overview\n", "- 2. Load the basic libraries\n", "- 3. Check the preprocessed file used for training\n", "- 4. Create the model building pipeline step\n", "- 5. Define and run the final pipeline by combining parameters, steps, and conditions\n", "- 6. Check the HPO job run\n", " \n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Model tuning overview\n", "\n", "Amazon SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset with the algorithm and the hyperparameter ranges that you specify. It then chooses the hyperparameter values that produced the best-performing model, as measured by the metric that you choose.\n", "\n", "\n", "\n", "- References\n", "    - Developer guide: [Perform Automatic Model Tuning with SageMaker](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/automatic-model-tuning.html)\n", "    - Official SageMaker sample: [HPO starter code](https://github.com/aws/amazon-sagemaker-examples/blob/master/hyperparameter_tuning/xgboost_direct_marketing/hpo_xgboost_direct_marketing_sagemaker_python_sdk.ipynb)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Load the basic libraries\n", "\n", "Load the SageMaker libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "import pandas as pd\n", "import os\n", "\n", "#region = boto3.Session().region_name\n", "sagemaker_session = sagemaker.session.Session()\n", "role = sagemaker.get_execution_role()\n", "sm_client = boto3.client(\"sagemaker\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.1 Load the notebook variables\n", "\n", "Restore the variables saved with `%store` by the earlier notebooks. Names used below without being defined here, such as `train_preproc_data_uri`, `bucket`, and `project_prefix`, come from this restore.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%store -r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Check the preprocessed file used for training\n", "Check the preprocessed data stored in S3 that the training jobs use later." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2022-03-21 12:16:19 682602 sagemaker-webinar-pipeline-advanced/preporc/train.csv\n" ] } ], "source": [ "! aws s3 ls {train_preproc_data_uri} --recursive" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "


	fraud	vehicle_claim	total_claim_amount	customer_age	months_as_customer	num_claims_past_year	num_insurers_past_5_years	policy_deductable	policy_annual_premium	customer_zip	...	collision_type_missing	incident_severity_Major	incident_severity_Minor	incident_severity_Totaled	authorities_contacted_Ambulance	authorities_contacted_Fire	authorities_contacted_None	authorities_contacted_Police	police_report_available_No	police_report_available_Yes
0	0	8913.668763	80513.668763	54	94	0	1	750	3000	99207	...	0	0	1	0	0	0	1	0	1	0
1	0	19746.724395	26146.724395	41	165	0	1	750	2950	95632	...	0	0	0	1	0	0	0	1	0	1
2	0	11652.969918	22052.969918	57	155	0	1	750	3000	93203	...	0	0	1	0	0	0	0	1	0	1
3	0	11260.930936	115960.930936	39	80	0	1	750	3000	85208	...	0	0	1	0	0	0	1	0	1	0
4	0	27987.704652	31387.704652	39	60	0	1	750	3000	91792	...	0	1	0	0	0	0	0	1	1	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
3995	0	18052.611626	67152.611626	42	103	1	1	750	3000	93654	...	0	0	1	0	0	0	1	0	1	0
3996	0	34949.202468	51749.202468	23	6	0	3	750	3000	94305	...	0	0	0	1	1	0	0	0	1	0
3997	0	4063.701410	9963.701410	44	35	0	2	750	2550	95476	...	0	0	1	0	0	0	0	1	0	1
3998	0	17390.520451	20490.520451	22	38	0	1	750	3000	90680	...	0	1	0	0	0	0	0	1	0	1
3999	0	2501.811593	8401.811593	57	74	0	1	900	2650	98029	...	0	1	0	0	0	0	0	1	0	1

\n", "

4000 rows × 59 columns

\n", "

" ], "text/plain": [ " fraud vehicle_claim total_claim_amount customer_age \\\n", "0 0 8913.668763 80513.668763 54 \n", "1 0 19746.724395 26146.724395 41 \n", "2 0 11652.969918 22052.969918 57 \n", "3 0 11260.930936 115960.930936 39 \n", "4 0 27987.704652 31387.704652 39 \n", "... ... ... ... ... \n", "3995 0 18052.611626 67152.611626 42 \n", "3996 0 34949.202468 51749.202468 23 \n", "3997 0 4063.701410 9963.701410 44 \n", "3998 0 17390.520451 20490.520451 22 \n", "3999 0 2501.811593 8401.811593 57 \n", "\n", " months_as_customer num_claims_past_year num_insurers_past_5_years \\\n", "0 94 0 1 \n", "1 165 0 1 \n", "2 155 0 1 \n", "3 80 0 1 \n", "4 60 0 1 \n", "... ... ... ... \n", "3995 103 1 1 \n", "3996 6 0 3 \n", "3997 35 0 2 \n", "3998 38 0 1 \n", "3999 74 0 1 \n", "\n", " policy_deductable policy_annual_premium customer_zip ... \\\n", "0 750 3000 99207 ... \n", "1 750 2950 95632 ... \n", "2 750 3000 93203 ... \n", "3 750 3000 85208 ... \n", "4 750 3000 91792 ... \n", "... ... ... ... ... \n", "3995 750 3000 93654 ... \n", "3996 750 3000 94305 ... \n", "3997 750 2550 95476 ... \n", "3998 750 3000 90680 ... \n", "3999 900 2650 98029 ... \n", "\n", " collision_type_missing incident_severity_Major \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 1 \n", "... ... ... \n", "3995 0 0 \n", "3996 0 0 \n", "3997 0 0 \n", "3998 0 1 \n", "3999 0 1 \n", "\n", " incident_severity_Minor incident_severity_Totaled \\\n", "0 1 0 \n", "1 0 1 \n", "2 1 0 \n", "3 1 0 \n", "4 0 0 \n", "... ... ... \n", "3995 1 0 \n", "3996 0 1 \n", "3997 1 0 \n", "3998 0 0 \n", "3999 0 0 \n", "\n", " authorities_contacted_Ambulance authorities_contacted_Fire \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "... ... ... \n", "3995 0 0 \n", "3996 1 0 \n", "3997 0 0 \n", "3998 0 0 \n", "3999 0 0 \n", "\n", " authorities_contacted_None authorities_contacted_Police \\\n", "0 1 0 \n", "1 0 1 \n", "2 0 1 \n", "3 1 0 \n", "4 0 1 \n", "... ... ... 
\n", "3995 1 0 \n", "3996 0 0 \n", "3997 0 1 \n", "3998 0 1 \n", "3999 0 1 \n", "\n", " police_report_available_No police_report_available_Yes \n", "0 1 0 \n", "1 0 1 \n", "2 0 1 \n", "3 1 0 \n", "4 1 0 \n", "... ... ... \n", "3995 1 0 \n", "3996 1 0 \n", "3997 0 1 \n", "3998 0 1 \n", "3999 0 1 \n", "\n", "[4000 rows x 59 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_prep_df = pd.read_csv(train_preproc_data_uri)\n", "train_prep_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4. Create the model building pipeline step\n", "- See the tuning step section of the developer guide: [Tuning Step](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/build-and-manage-steps.html#step-type-tuning)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.1 Create the model building pipeline parameters\n", "\n", "\n", "This notebook uses the following parameters.\n", "\n", "* `train_instance_type` - the `ml.*` instance type used for the training jobs\n", "* `train_instance_count` - the number of `ml.*` instances used for the training jobs\n", "* `input_data` - the S3 URI of the input data\n", "\n", "\n", "\n", "The variables used in each pipeline step are defined as pipeline parameters.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sagemaker.workflow.parameters import (\n", "    ParameterInteger,\n", "    ParameterString,\n", ")\n", "\n", "train_instance_type = ParameterString(\n", "    name=\"TrainingInstanceType\",\n", "    default_value=\"ml.m5.xlarge\"\n", ")\n", "\n", "\n", "train_instance_count = ParameterInteger(\n", "    name=\"TrainInstanceCount\",\n", "    default_value=1\n", ")\n", "\n", "input_data = ParameterString(\n", "    name=\"InputData\",\n", "    default_value=train_preproc_data_uri,\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.2 Set the static hyperparameters" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "fraud_sum: 131 , non_fraud_sum: 3869, class_weight: 29\n" ] } ], "source": [ "from src.p_utils import 
get_pos_scale_weight\n", "class_weight = get_pos_scale_weight(train_prep_df, label='fraud')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "hyperparameters = {\n", "    \"scale_pos_weight\" : class_weight, \n", "    \"objective\": \"binary:logistic\",\n", "    \"num_round\": \"100\",\n", "}\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.3 Create the Estimator\n", "\n", "Creating the Estimator takes several arguments. The main ones are:\n", "- The user training script, \"xgboost_script.py\"\n", "- `estimator_output_path`, the path where the model artifact is stored once training finishes. If it is not specified, the artifact is saved to the default path.\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "estimator_output_path: \n", " s3://sagemaker-us-east-1-051065130547/sagemaker-webinar-pipeline-advanced/tuning_jobs\n" ] } ], "source": [ "from sagemaker.xgboost.estimator import XGBoost\n", "\n", "estimator_output_path = f's3://{bucket}/{project_prefix}/tuning_jobs'\n", "print(\"estimator_output_path: \\n\", estimator_output_path)\n", "\n", "xgb_estimator = XGBoost(\n", "    entry_point = \"xgboost_script.py\",\n", "    source_dir = \"src\",\n", "    output_path = estimator_output_path,\n", "    hyperparameters = hyperparameters,\n", "    role = role,\n", "    instance_count = train_instance_count,\n", "    instance_type = train_instance_type,\n", "    framework_version = \"1.0-1\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to the standard XGBoost hyperparameters, `scale_pos_weight` weights the positive class when the labels are imbalanced. It is set from the ratio of label 0 to label 1 rows; in the output above, 3869 non-fraud rows against 131 fraud rows give a weight of 29." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.4 Set the hyperparameter ranges to tune\n", "Here we tune `eta`, `min_child_weight`, `alpha`, and `max_depth`." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from sagemaker.tuner import (\n", "    IntegerParameter,\n", "    CategoricalParameter,\n", "    ContinuousParameter,\n", "    HyperparameterTuner,\n", ")\n", "\n", "\n", "hyperparameter_ranges = {\n", "    \"eta\": ContinuousParameter(0, 1),\n", "    \"min_child_weight\": ContinuousParameter(1, 10),\n", "    \"alpha\": ContinuousParameter(0, 2),\n", "    \"max_depth\": IntegerParameter(1, 10),\n", "}\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.5 Configure and create the tuner\n", "- `xgb_estimator`: the estimator defined above\n", "- `objective_metric_name = \"validation:auc\"`: the metric to optimize\n", "    - This metric must be defined and logged by the training script.\n", "- `hyperparameter_ranges`: the ranges of the hyperparameters to tune\n", "- `max_jobs`\n", "    - The total number of training jobs.\n", "- `max_parallel_jobs`\n", "    - The number of training jobs to run in parallel. (An error can occur depending on your resource limits. If it does, reduce this value.)\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "objective_metric_name = \"validation:auc\"\n", "\n", "pipeline_tuner = HyperparameterTuner(\n", "    xgb_estimator, objective_metric_name, hyperparameter_ranges, \n", "    max_jobs=5,\n", "    max_parallel_jobs=5,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.6 Define the tuning step \n", "\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from sagemaker.inputs import TrainingInput\n", "from sagemaker.workflow.steps import TuningStep\n", "\n", "step_tuning = TuningStep(\n", "    name = \"Fraud-Advance-HPO\",\n", "    tuner = pipeline_tuner,\n", "    inputs={\n", "        \"train\": TrainingInput(\n", "            s3_data= input_data,\n", "            content_type=\"text/csv\"\n", "        ),\n", "    },\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5. 
Define and run the final pipeline by combining parameters, steps, and conditions\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.1 Define the pipeline\n", "The pipeline definition takes the following four arguments.\n", "- Pipeline name\n", "- Pipeline parameters\n", "- Pipeline experiment configuration\n", "- Step definitions " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "from sagemaker.workflow.pipeline import Pipeline\n", "\n", "from sagemaker.workflow.execution_variables import ExecutionVariables\n", "from sagemaker.workflow.pipeline_experiment_config import PipelineExperimentConfig\n", "\n", "project_hpo_prefix = project_prefix + \"-HPO-Step\"\n", "\n", "pipeline_name = project_hpo_prefix\n", "pipeline = Pipeline(\n", "    name=pipeline_name,\n", "    parameters=[\n", "        train_instance_type, \n", "        train_instance_count, \n", "        input_data,\n", "    ], \n", "    pipeline_experiment_config=PipelineExperimentConfig(\n", "        ExecutionVariables.PIPELINE_NAME,\n", "        ExecutionVariables.PIPELINE_EXECUTION_ID\n", "    ), \n", "    steps=[step_tuning],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.2 Inspect the pipeline definition\n", "The pipeline defined above is expressed in JSON format." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Version': '2020-12-01',\n", " 'Metadata': {},\n", " 'Parameters': [{'Name': 'TrainingInstanceType',\n", " 'Type': 'String',\n", " 'DefaultValue': 'ml.m5.xlarge'},\n", " {'Name': 'TrainInstanceCount', 'Type': 'Integer', 'DefaultValue': 1},\n", " {'Name': 'InputData',\n", " 'Type': 'String',\n", " 'DefaultValue': 's3://sagemaker-us-east-1-051065130547/sagemaker-webinar-pipeline-advanced/preporc/train.csv'}],\n", " 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},\n", " 'TrialName': {'Get': 'Execution.PipelineExecutionId'}},\n", " 'Steps': [{'Name': 'Fraud-Advance-HPO',\n", " 'Type': 'Tuning',\n", " 'Arguments': {'HyperParameterTuningJobConfig': {'Strategy': 'Bayesian',\n", " 'ResourceLimits': {'MaxNumberOfTrainingJobs': 5,\n", " 'MaxParallelTrainingJobs': 5},\n", " 'TrainingJobEarlyStoppingType': 'Off',\n", " 'HyperParameterTuningJobObjective': {'Type': 'Maximize',\n", " 'MetricName': 'validation:auc'},\n", " 'ParameterRanges': {'ContinuousParameterRanges': [{'Name': 'eta',\n", " 'MinValue': '0',\n", " 'MaxValue': '1',\n", " 'ScalingType': 'Auto'},\n", " {'Name': 'min_child_weight',\n", " 'MinValue': '1',\n", " 'MaxValue': '10',\n", " 'ScalingType': 'Auto'},\n", " {'Name': 'alpha',\n", " 'MinValue': '0',\n", " 'MaxValue': '2',\n", " 'ScalingType': 'Auto'}],\n", " 'CategoricalParameterRanges': [],\n", " 'IntegerParameterRanges': [{'Name': 'max_depth',\n", " 'MinValue': '1',\n", " 'MaxValue': '10',\n", " 'ScalingType': 'Auto'}]}},\n", " 'TrainingJobDefinition': {'StaticHyperParameters': {'scale_pos_weight': '29',\n", " 'objective': '\"binary:logistic\"',\n", " 'num_round': '\"100\"',\n", " 'sagemaker_submit_directory': '\"s3://sagemaker-us-east-1-051065130547/sagemaker-xgboost-2022-03-21-12-26-39-683/source/sourcedir.tar.gz\"',\n", " 'sagemaker_program': '\"xgboost_script.py\"',\n", " 'sagemaker_container_log_level': '20',\n", " 
'sagemaker_job_name': '\"sagemaker-xgboost-2022-03-21-12-26-39-683\"',\n", " 'sagemaker_region': '\"us-east-1\"',\n", " 'sagemaker_estimator_class_name': '\"XGBoost\"',\n", " 'sagemaker_estimator_module': '\"sagemaker.xgboost.estimator\"'},\n", " 'RoleArn': 'arn:aws:iam::051065130547:role/sagemaker-notebook-SageMakerIamRole-13SLYUPDCYIY9',\n", " 'OutputDataConfig': {'S3OutputPath': 's3://sagemaker-us-east-1-051065130547/sagemaker-webinar-pipeline-advanced/tuning_jobs'},\n", " 'ResourceConfig': {'InstanceCount': {'Get': 'Parameters.TrainInstanceCount'},\n", " 'InstanceType': {'Get': 'Parameters.TrainingInstanceType'},\n", " 'VolumeSizeInGB': 30},\n", " 'StoppingCondition': {'MaxRuntimeInSeconds': 86400},\n", " 'AlgorithmSpecification': {'TrainingInputMode': 'File',\n", " 'TrainingImage': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3'},\n", " 'InputDataConfig': [{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',\n", " 'S3Uri': {'Get': 'Parameters.InputData'},\n", " 'S3DataDistributionType': 'FullyReplicated'}},\n", " 'ContentType': 'text/csv',\n", " 'ChannelName': 'train'}]}}}]}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "\n", "definition = json.loads(pipeline.definition())\n", "definition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.3 Submit the pipeline definition and run it \n", "\n", "Submit the pipeline definition to the pipeline service. AWS uses the role passed along with it to create the pipeline and run each step of the workflow. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "pipeline.upsert(role_arn=role)\n", "execution = pipeline.start()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the execution status of the workflow." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'PipelineArn': 'arn:aws:sagemaker:us-east-1:051065130547:pipeline/sagemaker-webinar-pipeline-advanced-hpo-step',\n", " 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:051065130547:pipeline/sagemaker-webinar-pipeline-advanced-hpo-step/execution/16u4wi1cvgf6',\n", " 'PipelineExecutionDisplayName': 'execution-1647865601970',\n", " 'PipelineExecutionStatus': 'Executing',\n", " 'PipelineExperimentConfig': {'ExperimentName': 'sagemaker-webinar-pipeline-advanced-hpo-step',\n", " 'TrialName': '16u4wi1cvgf6'},\n", " 'CreationTime': datetime.datetime(2022, 3, 21, 12, 26, 41, 866000, tzinfo=tzlocal()),\n", " 'LastModifiedTime': datetime.datetime(2022, 3, 21, 12, 26, 41, 866000, tzinfo=tzlocal()),\n", " 'CreatedBy': {},\n", " 'LastModifiedBy': {},\n", " 'ResponseMetadata': {'RequestId': 'bfeb52a7-c92f-40a8-8dd9-5fb0ab1ac838',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'x-amzn-requestid': 'bfeb52a7-c92f-40a8-8dd9-5fb0ab1ac838',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '573',\n", " 'date': 'Mon, 21 Mar 2022 12:26:42 GMT'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "execution.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.4 Wait for the pipeline execution to finish" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "execution.wait()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.5 View the pipeline execution step history" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'StepName': 'Fraud-Advance-HPO',\n", " 'StartTime': datetime.datetime(2022, 3, 21, 12, 26, 42, 640000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 3, 21, 12, 30, 34, 662000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 
'Metadata': {'TuningJob': {'Arn': 'arn:aws:sagemaker:us-east-1:051065130547:hyper-parameter-tuning-job/16u4wi1cvgf6-fraud-a-clcktdpyo0'}}}]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "execution.list_steps()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 6. Check the HPO job run\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.1 Check the tuning step result in SageMaker Studio\n", "Clicking the HPO step in the graph of the executed pipeline shows the details below.\n", "- Input shows the settings defined for the tuner above.\n", "- Output shows the results after the HPO job has run." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![hpo-result.png](img/hpo-result.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.2 Check the run result with the Python SDK" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the name of the most recently created tuning job. This assumes that no other tuning job has been started in the account since the pipeline ran." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "list_tuning_job = sm_client.list_hyper_parameter_tuning_jobs(\n", "    SortBy = 'CreationTime',\n", "    SortOrder = 'Descending'\n", ")\n", "latest_tuner_job_name = list_tuning_job['HyperParameterTuningJobSummaries'][0]['HyperParameterTuningJobName']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Print the best training job\n", "- Describes the best-performing training job among those that ran and shows the final hyperparameter values it used." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 training jobs have completed\n", "Best model found so far:\n", "{'CreationTime': datetime.datetime(2022, 3, 21, 12, 26, 52, tzinfo=tzlocal()),\n", " 'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:auc',\n", " 'Value': 0.8003000020980835},\n", " 'ObjectiveStatus': 'Succeeded',\n", " 'TrainingEndTime': datetime.datetime(2022, 3, 21, 12, 29, 52, tzinfo=tzlocal()),\n", " 'TrainingJobArn': 
'arn:aws:sagemaker:us-east-1:051065130547:training-job/16u4wi1cvgf6-fraud-a-clcktdpyo0-003-d6725723',\n", " 'TrainingJobName': '16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-003-d6725723',\n", " 'TrainingJobStatus': 'Completed',\n", " 'TrainingStartTime': datetime.datetime(2022, 3, 21, 12, 28, 15, tzinfo=tzlocal()),\n", " 'TunedHyperParameters': {'alpha': '1.7510442116057328',\n", " 'eta': '0.9045527299383493',\n", " 'max_depth': '4',\n", " 'min_child_weight': '8.499031101347908'}}\n" ] } ], "source": [ "from pprint import pprint\n", "\n", "# run this cell to check current status of hyperparameter tuning job\n", "tuning_job_result = sm_client.describe_hyper_parameter_tuning_job(\n", "    HyperParameterTuningJobName=latest_tuner_job_name\n", ")\n", "\n", "status = tuning_job_result[\"HyperParameterTuningJobStatus\"]\n", "if status != \"Completed\":\n", "    print(\"Reminder: the tuning job has not been completed.\")\n", "\n", "job_count = tuning_job_result[\"TrainingJobStatusCounters\"][\"Completed\"]\n", "print(\"%d training jobs have completed\" % job_count)\n", "is_minimize = (\n", "    tuning_job_result[\"HyperParameterTuningJobConfig\"][\"HyperParameterTuningJobObjective\"][\"Type\"] != \"Maximize\"\n", ")\n", "objective_name = tuning_job_result[\"HyperParameterTuningJobConfig\"][\"HyperParameterTuningJobObjective\"][\"MetricName\"]\n", "\n", "if tuning_job_result.get(\"BestTrainingJob\", None):\n", "    print(\"Best model found so far:\")\n", "    pprint(tuning_job_result[\"BestTrainingJob\"])\n", "else:\n", "    print(\"No training jobs have reported results yet.\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check the results of all training jobs in the tuning run\n", "- Shown in order of the `FinalObjectiveValue` metric." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of training jobs with valid objective: 5\n", "{'lowest': 0.7838000059127808, 'highest': 0.8003000020980835}\n" ] }, { "name": "stderr", "output_type": 
"stream", "text": [ "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/ipykernel/__main__.py:13: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.\n" ] }, { "data": { "text/html": [ "


	alpha	eta	max_depth	min_child_weight	TrainingJobName	TrainingJobStatus	FinalObjectiveValue	TrainingStartTime	TrainingEndTime	TrainingElapsedTimeSeconds
2	1.751044	0.904553	4.0	8.499031	16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-003-d6725723	Completed	0.8003	2022-03-21 12:28:15+00:00	2022-03-21 12:29:52+00:00	97.0
0	1.149466	0.732784	6.0	8.921073	16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-005-336f104a	Completed	0.7910	2022-03-21 12:28:24+00:00	2022-03-21 12:29:47+00:00	83.0
1	1.887661	0.732418	4.0	4.856367	16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-004-1c6910eb	Completed	0.7905	2022-03-21 12:28:22+00:00	2022-03-21 12:29:44+00:00	82.0
3	1.965165	0.397501	6.0	6.336754	16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-002-50212b84	Completed	0.7887	2022-03-21 12:28:22+00:00	2022-03-21 12:29:58+00:00	96.0
4	0.341912	0.571159	10.0	1.933035	16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-001-e3a8d495	Completed	0.7838	2022-03-21 12:28:27+00:00	2022-03-21 12:29:49+00:00	82.0

\n", "

" ], "text/plain": [ " alpha eta max_depth min_child_weight \\\n", "2 1.751044 0.904553 4.0 8.499031 \n", "0 1.149466 0.732784 6.0 8.921073 \n", "1 1.887661 0.732418 4.0 4.856367 \n", "3 1.965165 0.397501 6.0 6.336754 \n", "4 0.341912 0.571159 10.0 1.933035 \n", "\n", " TrainingJobName TrainingJobStatus \\\n", "2 16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-003-d6725723 Completed \n", "0 16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-005-336f104a Completed \n", "1 16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-004-1c6910eb Completed \n", "3 16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-002-50212b84 Completed \n", "4 16u4wi1cvgf6-Fraud-A-cLCKTDpYO0-001-e3a8d495 Completed \n", "\n", " FinalObjectiveValue TrainingStartTime TrainingEndTime \\\n", "2 0.8003 2022-03-21 12:28:15+00:00 2022-03-21 12:29:52+00:00 \n", "0 0.7910 2022-03-21 12:28:24+00:00 2022-03-21 12:29:47+00:00 \n", "1 0.7905 2022-03-21 12:28:22+00:00 2022-03-21 12:29:44+00:00 \n", "3 0.7887 2022-03-21 12:28:22+00:00 2022-03-21 12:29:58+00:00 \n", "4 0.7838 2022-03-21 12:28:27+00:00 2022-03-21 12:29:49+00:00 \n", "\n", " TrainingElapsedTimeSeconds \n", "2 97.0 \n", "0 83.0 \n", "1 82.0 \n", "3 96.0 \n", "4 82.0 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "tuner_df = sagemaker.HyperparameterTuningJobAnalytics(latest_tuner_job_name)\n", "\n", "full_df = tuner_df.dataframe()\n", "df = full_df  # fall back to the unfiltered frame if no job has reported yet\n", "\n", "if len(full_df) > 0:\n", "    df = full_df[full_df[\"FinalObjectiveValue\"] > -float(\"inf\")]\n", "    if len(df) > 0:\n", "        df = df.sort_values(\"FinalObjectiveValue\", ascending=is_minimize)\n", "        print(\"Number of training jobs with valid objective: %d\" % len(df))\n", "        print({\"lowest\": min(df[\"FinalObjectiveValue\"]), \"highest\": max(df[\"FinalObjectiveValue\"])})\n", "        # None means no width limit, so TrainingJobName is not truncated (a negative value is deprecated since pandas 1.0)\n", "        pd.set_option(\"display.max_colwidth\", None)\n", "    else:\n", "        print(\"No training jobs have reported valid results yet.\")\n", "\n", "df" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.m5.large", "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }