{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# [모듈 4.1] 평가 및 조건 단계 개발 \n", "\n", "이 노트북은 아래와 같은 목차로 진행 됩니다. 전체를 모두 실행시에 완료 시간은 약 5분-10분 소요 됩니다.\n", "\n", "- 1. 모델 평가 및 조건 단계 개요\n", "- 2. 기본 라이브러리 로딩\n", "- 3. 모델 아피텍트 위치 확인\n", "- 4. 모델 빌딩 파이프라인 의 스텝(Step) 생성\n", "- 5. 파리마터, 단계, 조건을 조합하여 최종 파이프라인 정의 및 실행\n", "- 6. 파이프라인 실행 확인 하기\n", " \n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. 모델 평가 및 조건 단계 개요\n", "\n", "- 모델 처리 및 조건 단계의 개발자 가이드 \n", " - [모델 처리 스텝](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing)\n", " - [모델 조건 단계](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. 기본 라이브러리 로딩\n", "\n", "세이지 메이커 관련 라이브러리를 로딩 합니다." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "import pandas as pd\n", "import os\n", "\n", "#region = boto3.Session().region_name\n", "sagemaker_session = sagemaker.session.Session()\n", "role = sagemaker.get_execution_role()\n", "sm_client = boto3.client(\"sagemaker\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.1 노트북 변수 로딩\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%store -r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. 모델 아피텍트 위치 확인" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "train_model_artifact: \n", " s3://sagemaker-us-east-1-051065130547/sagemaker-webinar-pipeline-advanced/training_jobs/pipelines-twg45ptdqzid-Fraud-Advance-Train-lzWNxOzDEB/output/model.tar.gz\n" ] } ], "source": [ "print(\"train_model_artifact: \\n\", train_model_artifact)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4. 모델 빌딩 파이프라인 의 스텝(Step) 생성\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.1 모델 빌딩 파이프라인 변수 생성\n", "\n", "본 노트북에서 사용하는 파라미터는 다음과 같습니다.\n", "\n", "* `processing_instance_type` - 프로세싱 작업에서 사용할 `ml.*` 인스턴스 타입\n", "* `processing_instance_count` - 프로세싱 작업에서 사용할 `ml.*` 인스턴스 갯수\n", "* `input_data` - 입력데이터에 대한 S3 버킷 URI\n", "\n", "\n", "\n", "파이프라인의 각 스텝에서 사용할 변수를 파라미터 변수로서 정의 합니다.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sagemaker.workflow.parameters import (\n", " ParameterInteger,\n", " ParameterString,\n", ")\n", "\n", "processing_instance_count = ParameterInteger(\n", " name=\"ProcessingInstanceCount\",\n", " default_value=1\n", ")\n", "processing_instance_type = ParameterString(\n", " name=\"ProcessingInstanceType\",\n", " default_value=\"ml.m5.xlarge\"\n", ")\n", "train_model_artifact_path = ParameterString(\n", " name=\"TrainModelArtifactPath\",\n", " default_value= train_model_artifact\n", ")\n", "test_preproc_data_path = ParameterString(\n", " name=\"TestPreprocDataPath\",\n", " default_value= test_preproc_data_uri\n", ")\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.2 SKLearn Processor 생성\n", "\n", "SKLearn Processor 생성시에 인자가 필요 합니다.\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#from sagemaker.processing import ScriptProcessor\n", "from sagemaker.sklearn.processing import SKLearnProcessor\n", "from sagemaker.processing import ProcessingInput, ProcessingOutput\n", "\n", "script_eval = SKLearnProcessor(\n", " framework_version= \"0.23-1\",\n", " role=role,\n", " instance_type=processing_instance_type,\n", " instance_count= processing_instance_count,\n", " base_job_name=\"script-fraud-scratch-eval\",\n", " )\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.3 Property 파일 정의\n", "\n", "- PropertyFile 은 step_eval 단계가 실행 한 후에 모델 평가 지표의 결과 파일 내용을 정의하는데 사용 됩니다.\n", "\n", "```\n", "형식:\n", " = PropertyFile(\n", " name=\"\",\n", " output_name=\"\",\n", " path=\"\"\n", ")\n", "예시:\n", "evaluation_report = PropertyFile(\n", " name=\"EvaluationReport\",\n", " output_name=\"evaluation\",\n", " path=\"evaluation.json\"\n", ")\n", "```\n", "\n", "\n", "- 위의 PropertyFile 의 output_name=\"evaluation\" 이고, 파일 이름은 evaluation.json\" 라는 것을 의미 합니다. \"evaluation.json\" 파일 안에는 아래의 값이 저장이 됩니다.\n", "\n", "```\n", " report_dict = {\n", " \"binary_classification_metrics\": {\n", " \"auc\": {\n", " \"value\": ,\n", " \"standard_deviation\" : \"NaN\",\n", " },\n", " },\n", " }\n", "```\n", "\n", "\n", "- 최종적으로 evaluation.json 안의 \\ 값이 추후 (조건 스텝) 에 사용이 됩니다.\n", "- step_eval 이 실행이 되면 `evaluation.json` 이 S3에 저장이 됩니다.\n", "\n", "\n", "\n", "#### 참고\n", "- 참고 자료: [Property Files and JsonGet](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/build-and-manage-propertyfile.html)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from sagemaker.workflow.properties import PropertyFile\n", "from sagemaker.workflow.steps import ProcessingStep\n", "\n", "from sagemaker.workflow.properties import PropertyFile\n", "\n", "\n", "evaluation_report = PropertyFile(\n", " name=\"EvaluationReport\",\n", " output_name=\"evaluation\",\n", " path=\"evaluation.json\"\n", ")\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.4 모델 평가 스텝 정의" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "step_eval = ProcessingStep(\n", " name=\"Fraud-Advance-Eval\",\n", " processor=script_eval,\n", " inputs=[\n", " ProcessingInput(\n", " source= train_model_artifact_path,\n", " destination=\"/opt/ml/processing/model\"\n", " ),\n", " ProcessingInput(\n", " source= test_preproc_data_path,\n", " destination=\"/opt/ml/processing/test\"\n", " )\n", " ],\n", " outputs=[\n", " ProcessingOutput(output_name=\"evaluation\", source=\"/opt/ml/processing/evaluation\"),\n", " ],\n", " code=\"src/evaluation.py\",\n", " property_files=[evaluation_report],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.5 조건 단계 정의\n", "- 조건 단계에서 사용하는 ConditionLessThanOrEqualTo 에서 evaluation.json 을 로딩하여 내용을 확인\n", "\n", "```\n", "\n", "형식:\n", "cond_lte = ConditionLessThanOrEqualTo(\n", " left=JsonGet(\n", " step=step_eval,\n", " property_file=,\n", " json_path=\"test_metrics.roc.value\",\n", " ),\n", " right=6.0\n", ")\n", "\n", "에시:\n", "cond_lte = ConditionLessThanOrEqualTo(\n", " left=JsonGet(\n", " step=step_eval,\n", " property_file=evaluation_report,\n", " json_path=\"binary_classification_metrics.auc.value\",\n", " ),\n", " right=6.0\n", ")\n", "\n", "\n", "```\n", "\n", "- property_file=evaluation_report 는 위의 모델 평가 스텝에서 정의한 PropertyFile-evaluation_report 를 사용합니다. evaluation_report 에서 정의한 evaluation.json 파일 안의 \"binary_classification_metrics.auc.value\" 의 값을 사용한다는 것을 의미 합니다.\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The class JsonGet has been renamed in sagemaker>=2.\n", "See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.\n" ] } ], "source": [ "from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo\n", "from sagemaker.workflow.condition_step import (\n", " ConditionStep,\n", " JsonGet,\n", ")\n", "\n", "\n", "cond_lte = ConditionLessThanOrEqualTo(\n", " left=JsonGet(\n", " step=step_eval,\n", " property_file=evaluation_report,\n", " json_path=\"binary_classification_metrics.auc.value\",\n", " ),\n", " right=6.0\n", ")\n", "\n", "step_cond = ConditionStep(\n", " name=\"Fraud-Advance-Condition\",\n", " conditions=[cond_lte],\n", " if_steps=[], \n", " else_steps=[], \n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5. 파리마터, 단계, 조건을 조합하여 최종 파이프라인 정의 및 실행\n", "\n", "\n", "이제 지금까지 생성한 단계들을 하나의 파이프라인으로 조합하고 실행하도록 하겠습니다.\n", "\n", "파이프라인은 name, parameters, steps 속성이 필수적으로 필요합니다. \n", "여기서 파이프라인의 이름은 (account, region) 조합에 대하여 유일(unique))해야 합니다.\n", "우리는 또한 여기서 Experiment 설정을 추가 하여, 실험에 등록 합니다.\n", "\n", "주의:\n", "\n", "- 정의에 사용한 모든 파라미터가 존재해야 합니다.\n", "- 파이프라인으로 전달된 단계(step)들은 실행순서와는 무관합니다. SageMaker Pipeline은 단계가 실행되고 완료될 수 있도록 의존관계를를 해석합니다." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.1 파이프라인 정의\n", "파이프라인 정의시에 아래 3개의 인자를 제공합니다.\n", "- 파이프라인 이름\n", "- 파이프라인 파라미터\n", "- 파이프라인 실험 설정\n", "- 스텝 정의 " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from sagemaker.workflow.pipeline import Pipeline\n", "\n", "pipeline_name = project_prefix + \"-Eval-Cond-Step\"\n", "pipeline = Pipeline(\n", " name=pipeline_name,\n", " parameters=[\n", " processing_instance_type, \n", " processing_instance_count,\n", " train_model_artifact_path,\n", " test_preproc_data_path, \n", " ],\n", " steps=[step_eval, step_cond],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.2 파이프라인 정의 확인\n", "위에서 정의한 파이프라인 정의는 Json 형식으로 정의 되어 있습니다." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Version': '2020-12-01',\n", " 'Metadata': {},\n", " 'Parameters': [{'Name': 'ProcessingInstanceType',\n", " 'Type': 'String',\n", " 'DefaultValue': 'ml.m5.xlarge'},\n", " {'Name': 'ProcessingInstanceCount', 'Type': 'Integer', 'DefaultValue': 1},\n", " {'Name': 'TrainModelArtifactPath',\n", " 'Type': 'String',\n", " 'DefaultValue': 's3://sagemaker-us-east-1-051065130547/sagemaker-webinar-pipeline-advanced/training_jobs/pipelines-twg45ptdqzid-Fraud-Advance-Train-lzWNxOzDEB/output/model.tar.gz'},\n", " {'Name': 'TestPreprocDataPath',\n", " 'Type': 'String',\n", " 'DefaultValue': 's3://sagemaker-us-east-1-051065130547/sagemaker-webinar-pipeline-advanced/preporc/test.csv'}],\n", " 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},\n", " 'TrialName': {'Get': 'Execution.PipelineExecutionId'}},\n", " 'Steps': [{'Name': 'Fraud-Advance-Eval',\n", " 'Type': 'Processing',\n", " 'Arguments': {'ProcessingResources': {'ClusterConfig': {'InstanceType': {'Get': 'Parameters.ProcessingInstanceType'},\n", " 'InstanceCount': {'Get': 'Parameters.ProcessingInstanceCount'},\n", " 'VolumeSizeInGB': 30}},\n", " 'AppSpecification': {'ImageUri': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3',\n", " 'ContainerEntrypoint': ['python3',\n", " '/opt/ml/processing/input/code/evaluation.py']},\n", " 'RoleArn': 'arn:aws:iam::051065130547:role/sagemaker-notebook-SageMakerIamRole-13SLYUPDCYIY9',\n", " 'ProcessingInputs': [{'InputName': 'input-1',\n", " 'AppManaged': False,\n", " 'S3Input': {'S3Uri': {'Get': 'Parameters.TrainModelArtifactPath'},\n", " 'LocalPath': '/opt/ml/processing/model',\n", " 'S3DataType': 'S3Prefix',\n", " 'S3InputMode': 'File',\n", " 'S3DataDistributionType': 'FullyReplicated',\n", " 'S3CompressionType': 'None'}},\n", " {'InputName': 'input-2',\n", " 'AppManaged': False,\n", " 'S3Input': {'S3Uri': {'Get': 'Parameters.TestPreprocDataPath'},\n", " 'LocalPath': '/opt/ml/processing/test',\n", " 'S3DataType': 'S3Prefix',\n", " 'S3InputMode': 'File',\n", " 'S3DataDistributionType': 'FullyReplicated',\n", " 'S3CompressionType': 'None'}},\n", " {'InputName': 'code',\n", " 'AppManaged': False,\n", " 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-051065130547/Fraud-Advance-Eval-955a33142036ab10f5b2dc96b96cd633/input/code/evaluation.py',\n", " 'LocalPath': '/opt/ml/processing/input/code',\n", " 'S3DataType': 'S3Prefix',\n", " 'S3InputMode': 'File',\n", " 'S3DataDistributionType': 'FullyReplicated',\n", " 'S3CompressionType': 'None'}}],\n", " 'ProcessingOutputConfig': {'Outputs': [{'OutputName': 'evaluation',\n", " 'AppManaged': False,\n", " 'S3Output': {'S3Uri': 's3://sagemaker-us-east-1-051065130547/Fraud-Advance-Eval-955a33142036ab10f5b2dc96b96cd633/output/evaluation',\n", " 'LocalPath': '/opt/ml/processing/evaluation',\n", " 'S3UploadMode': 'EndOfJob'}}]}},\n", " 'PropertyFiles': [{'PropertyFileName': 'EvaluationReport',\n", " 'OutputName': 'evaluation',\n", " 'FilePath': 'evaluation.json'}]},\n", " {'Name': 'Fraud-Advance-Condition',\n", " 'Type': 'Condition',\n", " 'Arguments': {'Conditions': [{'Type': 'LessThanOrEqualTo',\n", " 'LeftValue': {'Std:JsonGet': {'PropertyFile': {'Get': 'Steps.Fraud-Advance-Eval.PropertyFiles.EvaluationReport'},\n", " 'Path': 'binary_classification_metrics.auc.value'}},\n", " 'RightValue': 6.0}],\n", " 'IfSteps': [],\n", " 'ElseSteps': []}}]}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "\n", "\n", "definition = json.loads(pipeline.definition())\n", "definition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.3 파이프라인 정의를 제출하고 실행하기 \n", "\n", "파이프라인 정의를 파이프라인 서비스에 제출합니다. 함께 전달되는 역할(role)을 이용하여 AWS에서 파이프라인을 생성하고 작업의 각 단계를 실행할 것입니다. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "pipeline.upsert(role_arn=role)\n", "execution = pipeline.start()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'PipelineArn': 'arn:aws:sagemaker:us-east-1:051065130547:pipeline/sagemaker-webinar-pipeline-advanced-eval-cond-step',\n", " 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:051065130547:pipeline/sagemaker-webinar-pipeline-advanced-eval-cond-step/execution/0fi1kuucungy',\n", " 'PipelineExecutionDisplayName': 'execution-1647866257006',\n", " 'PipelineExecutionStatus': 'Executing',\n", " 'CreationTime': datetime.datetime(2022, 3, 21, 12, 37, 36, 862000, tzinfo=tzlocal()),\n", " 'LastModifiedTime': datetime.datetime(2022, 3, 21, 12, 37, 36, 862000, tzinfo=tzlocal()),\n", " 'CreatedBy': {},\n", " 'LastModifiedBy': {},\n", " 'ResponseMetadata': {'RequestId': '5cc58385-816c-4cb5-9eea-6146976c146a',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'x-amzn-requestid': '5cc58385-816c-4cb5-9eea-6146976c146a',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '465',\n", " 'date': 'Mon, 21 Mar 2022 12:37:36 GMT'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "execution.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.4 파이프라인 실행 기다리기" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "execution.wait()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.5 파이프라인 실행 단계 기록 보기" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'StepName': 'Fraud-Advance-Condition',\n", " 'StartTime': datetime.datetime(2022, 3, 21, 12, 42, 46, 431000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 3, 21, 12, 42, 46, 854000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'Condition': {'Outcome': 'True'}}},\n", " {'StepName': 'Fraud-Advance-Eval',\n", " 'StartTime': datetime.datetime(2022, 3, 21, 12, 37, 37, 845000, tzinfo=tzlocal()),\n", " 'EndTime': datetime.datetime(2022, 3, 21, 12, 42, 45, 468000, tzinfo=tzlocal()),\n", " 'StepStatus': 'Succeeded',\n", " 'AttemptCount': 0,\n", " 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:051065130547:processing-job/pipelines-0fi1kuucungy-fraud-advance-eval-wy4eo0oxka'}}}]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "execution.list_steps()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 6. 파이프라인 실행 확인 하기\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.1 모델 평가 단계 결과 확인 하기\n", "모델 평가의 결과로서 evaluation.json 이 S3에 저장 됩니다." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![eval_step_result.png](img/eval_step_result.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.2 조건 스텝 결과 확인 하기\n", "- 조건 스템의 결과로서 `True` 제공 되었습니다." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![condition-result.png](img/condition-result.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.m5.large", "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }