{ "cells": [ { "cell_type": "markdown", "id": "baa6707a", "metadata": {}, "source": [ "# Step Function 기반 MLOps 구축하기" ] }, { "cell_type": "markdown", "id": "f3e7f7f7", "metadata": {}, "source": [ "## 1. 사전 준비 과정\n", "### 1.1 MLOps 구현을 위한 Acount 정보 가져오기" ] }, { "cell_type": "code", "execution_count": null, "id": "fed40e0a", "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import json\n", "from sagemaker import get_execution_role\n", "from time import strftime\n", "import calendar\n", "import time" ] }, { "cell_type": "code", "execution_count": null, "id": "d5012d06", "metadata": {}, "outputs": [], "source": [ "iam_client = boto3.client('iam')\n", "role=get_execution_role()\n", "base_role_name=role.split('/')[-1]" ] }, { "cell_type": "code", "execution_count": null, "id": "585e20ba", "metadata": {}, "outputs": [], "source": [ "sts_client = boto3.client(\"sts\")\n", "account_id = sts_client.get_caller_identity()['Account']\n", "sess = boto3.Session()\n", "region = sess.region_name" ] }, { "cell_type": "code", "execution_count": null, "id": "ea451d84-a5b9-4ada-9a0d-c1b9ed88463e", "metadata": {}, "outputs": [], "source": [ "role" ] }, { "cell_type": "markdown", "id": "671b960a", "metadata": {}, "source": [ "### 1.2 MLOps에서 활용할 Policy 설정하기\n", "\n", "해당 HOL에서 구현할 아키텍처에 필요한 managed policy를 아래와 같이 정의합니다. Role을 별도 생성하셔도 되지만 HOL의 편의성을 위해 SageMaker Notebook/Studio와 동일한 Role에 policy를 추가하여 계속 활용합니다." ] }, { "cell_type": "code", "execution_count": null, "id": "e5bf4d27", "metadata": {}, "outputs": [], "source": [ "iam_client.attach_role_policy(\n", " RoleName=base_role_name,\n", " PolicyArn='arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess'\n", ")\n", "iam_client.attach_role_policy(\n", " RoleName=base_role_name,\n", " PolicyArn='arn:aws:iam::aws:policy/AmazonEventBridgeFullAccess'\n", ")\n", "iam_client.attach_role_policy(\n", " RoleName=base_role_name,\n", " PolicyArn='arn:aws:iam::aws:policy/AWSLambda_FullAccess'\n", ")\n", "iam_client.attach_role_policy(\n", " RoleName=base_role_name,\n", " PolicyArn='arn:aws:iam::aws:policy/AWSCodeCommitFullAccess'\n", ")\n", "iam_client.attach_role_policy(\n", " RoleName=base_role_name,\n", " PolicyArn='arn:aws:iam::aws:policy/SecretsManagerReadWrite'\n", ")" ] }, { "cell_type": "markdown", "id": "a4bb7a04", "metadata": {}, "source": [ "### 1.3 CodeCommit 생성\n", "CodeCommit 콘솔에 가서 CodeCommit을 생성합니다. 학습 시 사용했던 CodeCommit을 활용하셔도 됩니다.\n", "\n", "\n", "![codecommit-intro.png](../figures/codecommit-intro.png)" ] }, { "cell_type": "markdown", "id": "4042828b", "metadata": {}, "source": [ "### 1.4 CodeCommit 관련 Credentials 생성 및 Secret Manager에 저장하기" ] }, { "cell_type": "markdown", "id": "788044d9", "metadata": {}, "source": [ "#### - CodeCommit Credentials" ] }, { "cell_type": "code", "execution_count": null, "id": "5010638f", "metadata": {}, "outputs": [], "source": [ "user_name = 'XXXXXX' ## ==> IAM에서 사용자 아이디 확인\n", "codecommit_cred = 'codecommit-cred-'+user_name\n", "codecommit_cred" ] }, { "cell_type": "code", "execution_count": null, "id": "df458649", "metadata": {}, "outputs": [], "source": [ "try:\n", " response = iam_client.list_service_specific_credentials(\n", " UserName=user_name,\n", " ServiceName='codecommit.amazonaws.com'\n", " )\n", " if len(response['ServiceSpecificCredentials']) > 0:\n", " response = iam_client.delete_service_specific_credential(\n", " UserName=user_name,\n", " ServiceSpecificCredentialId=response['ServiceSpecificCredentials'][-1]['ServiceSpecificCredentialId']\n", " )\n", "except:\n", " print(\"Create new codecommit crendentials\")\n", " pass\n", "finally:\n", " response = iam_client.create_service_specific_credential(\n", " UserName=user_name,\n", " ServiceName='codecommit.amazonaws.com'\n", " )\n", " ServiceUserName = response['ServiceSpecificCredential']['ServiceUserName']\n", " ServicePassword = response['ServiceSpecificCredential']['ServicePassword']\n", "print(f\"ServiceUserName : {ServiceUserName} \\nServicePassword : {ServicePassword}\")" ] }, { "cell_type": "markdown", "id": "ee1ff7f1", "metadata": {}, "source": [ "#### - Secret Manager (Optional)\n", "CodeCommit의 Credentials 정보를 Secret Manager에 Key, Value로 넣어놓고 안전하게 사용합니다." ] }, { "cell_type": "code", "execution_count": null, "id": "eee1fc46", "metadata": {}, "outputs": [], "source": [ "sec_client = boto3.client('secretsmanager')" ] }, { "cell_type": "code", "execution_count": null, "id": "a4be61fd", "metadata": {}, "outputs": [], "source": [ "secret_string = json.dumps({\n", " \"username\": ServiceUserName,\n", " \"password\": ServicePassword\n", " })" ] }, { "cell_type": "code", "execution_count": null, "id": "095883c5", "metadata": {}, "outputs": [], "source": [ "sec_list = [[sec_name['Name'], sec_name['ARN']] for sec_name in sec_client.list_secrets()['SecretList'] if sec_name['Name'] == codecommit_cred]\n", "\n", "if len(sec_list) == 0:\n", " sec_response = sec_client.create_secret(\n", " Name=codecommit_cred,\n", " Description='This credential uses git_config for SageMaker in Lambda',\n", " SecretString=secret_string,\n", " Tags=[\n", " {\n", " 'Key': 'codecommit-name',\n", " 'Value': 'codecommit_credentials'\n", " },\n", " ]\n", " )\n", " sec_arn = sec_response['ARN']\n", " print(f'sec_arn : {sec_arn}')\n", "else:\n", " print(f'sec_arn : {sec_list[0][1]}')\n", " sec_response = sec_client.update_secret(\n", " SecretId=sec_list[0][1],\n", " SecretString=secret_string\n", " )\n", " sec_arn = sec_list[0][1]" ] }, { "cell_type": "code", "execution_count": null, "id": "9fc9e5fa-164e-4368-829f-6628c6d6d38c", "metadata": {}, "outputs": [], "source": [ "%store sec_arn" ] }, { "cell_type": "markdown", "id": "66fc911f", "metadata": {}, "source": [ "## 2. MLOps 구성하기" ] }, { "cell_type": "markdown", "id": "b1fd9e44", "metadata": {}, "source": [ "### 2.1 Create Step functions\n", "\n", "- step_functions 폴더 내의 mlops-yolov5.json을 import하여 기존 구성한 base definition을 활용합니다.\n", "\n", "![step-functions.png](../figures/step-functions.png)" ] }, { "cell_type": "markdown", "id": "68845db6", "metadata": {}, "source": [ "### 2.2 Step functions의 Role에 대한 정책(Policy) 추가\n", "- 기본적으로 Lambda를 이용하여 AWS 의 SageMaker 등을 활용할 예정이므로 Step functions을 실행하는 Role에는 LambdaFullAccess 정책이 추가되어야 합니다.\n", "\n", "![step-functions-role.png](../figures/step-functions-role.png)" ] }, { "cell_type": "markdown", "id": "655b4220", "metadata": {}, "source": [ "## 3. Step Functions에 활용할 Lambda 함수 생성\n", "### 3.1 Start-Training-Job의 Lambda 생성" ] }, { "cell_type": "code", "execution_count": null, "id": "00d8ea6e", "metadata": {}, "outputs": [], "source": [ "!mkdir ./1-Start-Training-Job" ] }, { "cell_type": "code", "execution_count": null, "id": "897419be", "metadata": {}, "outputs": [], "source": [ "%%writefile 1-Start-Training-Job/sm_training_job.py\n", "\n", "import json\n", "import boto3\n", "import os\n", "from time import strftime\n", "import subprocess\n", "import sagemaker\n", "\n", "import datetime\n", "import glob\n", "import os\n", "import time\n", "import warnings\n", "\n", "from sagemaker.pytorch import PyTorch\n", "\n", "from smexperiments.experiment import Experiment\n", "from smexperiments.trial import Trial\n", "\n", "import base64\n", "from botocore.exceptions import ClientError\n", "\n", "\n", "instance_type = os.environ[\"INSTANCE_TYPE\"]\n", "\n", "\n", "def lambda_handler(event, context):\n", "\n", " role = 'arn:aws:iam::${account_id}:role/service-role/AmazonSageMaker-ExecutionRole-{}' ### <== 1. Role 추가\n", " \n", " sagemaker_session = sagemaker.Session()\n", " \n", " experiment_name = 'yolov5-poc-exp1' ### <== 2. Experiment 명\n", " \n", " instance_count = 1\n", " do_spot_training = False\n", " max_wait = None\n", " max_run = 2*60*60\n", " \n", " ## SageMaker Experiments Setting\n", " try:\n", " sm_experiment = Experiment.load(experiment_name)\n", " except:\n", " sm_experiment = Experiment.create(\n", " experiment_name=experiment_name,\n", " tags=[{'Key': 'model-name', 'Value': 'yolov5'}]\n", " ) \n", " \n", " ## Trials Setting\n", " create_date = strftime(\"%m%d-%H%M%s\")\n", " \n", " sm_trial = Trial.create(trial_name=f'{experiment_name}-{create_date}',\n", " experiment_name=experiment_name)\n", "\n", " job_name = f'{sm_trial.trial_name}'\n", " \n", " #TODO: use the bucket name\n", "\n", " bucket = '' ### <== 3. 사용할 Bucket 명\n", " code_location = f's3://{bucket}/yolov5/sm_codes'\n", " output_path = f's3://{bucket}/yolov5/output' \n", " \n", " # TODO\n", " metric_definitions = [\n", " {'Name': 'Precision', 'Regex': r'all\\s+[0-9.]+\\s+[0-9.]+\\s+([0-9.]+)'},\n", " {'Name': 'Recall', 'Regex': r'all\\s+[0-9.]+\\s+[0-9.]+\\s+[0-9.]+\\s+([0-9.]+)'},\n", " {'Name': 'mAP@.5', 'Regex': r'all\\s+[0-9.]+\\s+[0-9.]+\\s+[0-9.]+\\s+[0-9.]+\\s+([0-9.]+)'},\n", " {'Name': 'mAP@.5:.95', 'Regex': r'all\\s+[0-9.]+\\s+[0-9.]+\\s+[0-9.]+\\s+[0-9.]+\\s+[0-9.]+\\s+([0-9.]+)'}\n", " ]\n", " \n", " hyperparameters = {\n", " 'data': 'data_sm.yaml',\n", " 'cfg': 'yolov5s.yaml',\n", " 'weights': 'weights/yolov5s.pt', # Transfer learning\n", " 'batch-size': 64,\n", " 'epochs': 1,\n", " 'project': '/opt/ml/model',\n", " 'workers': 0, # To avoid shm OOM issue\n", " 'freeze': 10, # For transfer learning, freeze all Layers except for the final output convolution layers.\n", " }\n", " \n", " \n", " s3_data_path = f's3://{bucket}/dataset/BCCD'\n", " checkpoint_s3_uri = f's3://{bucket}/poc_yolov5/checkpoints'\n", "\n", " if do_spot_training:\n", " max_wait = max_run\n", "\n", " \n", " secret=get_secret()\n", " \n", " ## \n", " codecommit_repo = f'https://git-codecommit.${region}.amazonaws.com/v1/repos/${git_repo_name}' ### <== 4. source codecommit repository\n", " \n", " git_config = {'repo': codecommit_repo,\n", " 'branch': 'main',\n", " 'username': secret['username'],\n", " 'password': secret['password']}\n", " \n", " source_dir = 'yolov5'\n", " \n", " estimator = PyTorch(\n", " entry_point='train.py',\n", " source_dir=source_dir,\n", " git_config=git_config,\n", " role=role,\n", " sagemaker_session=sagemaker_session,\n", " framework_version='1.10',\n", " py_version='py38',\n", " instance_count=instance_count,\n", " instance_type=instance_type,\n", " # volume_size=1024,\n", " code_location = code_location,\n", " output_path=output_path,\n", " metric_definitions=metric_definitions,\n", " hyperparameters=hyperparameters,\n", " # distribution=distribution,\n", " # disable_profiler=True,\n", " # debugger_hook_config=False,\n", " max_run=max_run,\n", " use_spot_instances=do_spot_training,\n", " max_wait=max_wait,\n", " checkpoint_s3_uri=checkpoint_s3_uri,\n", " )\n", " \n", " estimator.fit(\n", " inputs={'inputdata': s3_data_path},\n", " job_name=job_name,\n", " wait=False,\n", " )\n", " \n", " event['training_job_name'] = job_name\n", " event['stage'] = 'Training'\n", " \n", " return event\n", " \n", "\n", "def get_secret():\n", " secret_name = f\"arn:aws:secretsmanager:${region}:${account-id}:secret:${secret-manager-name}\" ### <== 5. Secret Manager ARN 정보\n", " region_name = \"ap-northeast-2\" ### <== 6. region 명\n", "\n", " secret = {}\n", " # Create a Secrets Manager client\n", " session = boto3.session.Session()\n", " client = session.client(\n", " service_name='secretsmanager',\n", " region_name=region_name\n", " )\n", "\n", " # In this sample we only handle the specific exceptions for the 'GetSecretValue' API.\n", " # See https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html\n", " # We rethrow the exception by default.\n", "\n", " get_secret_value_response = client.get_secret_value(\n", " SecretId=secret_name\n", " )\n", " \n", " if 'SecretString' in get_secret_value_response:\n", " secret = get_secret_value_response['SecretString']\n", " secret = json.loads(secret)\n", " else:\n", " print(\"secret is not defined. Checking the Secrets Manager\")\n", "\n", " return secret\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "03541842", "metadata": {}, "outputs": [], "source": [ "%%writefile 1-Start-Training-Job/Dockerfile\n", "\n", "# Define function directory\n", "ARG FUNCTION_DIR=\"/function\"\n", "\n", "FROM python:buster as build-image\n", "\n", "# Install aws-lambda-cpp build dependencies\n", "RUN apt-get update && \\\n", " apt-get install -y \\\n", " g++ \\\n", " make \\\n", " cmake \\\n", " unzip \\\n", " git \\\n", " libcurl4-openssl-dev\n", "\n", "# Include global arg in this stage of the build\n", "ARG FUNCTION_DIR\n", "# Create function directory\n", "RUN mkdir -p ${FUNCTION_DIR}\n", "\n", "# Copy function code\n", "COPY sm_training_job.py ${FUNCTION_DIR}\n", "# COPY git_lambda ${FUNCTION_DIR}/git_lambda\n", "# COPY yolov5 ${FUNCTION_DIR}/yolov5\n", "\n", "# Install the runtime interface client\n", "RUN pip install \\\n", " --target ${FUNCTION_DIR} \\\n", " awslambdaric sagemaker smdebug sagemaker-experiments\n", "\n", "# Multi-stage build: grab a fresh copy of the base image\n", "FROM python:buster\n", "\n", "# Include global arg in this stage of the build\n", "ARG FUNCTION_DIR\n", "# Set working directory to function root directory\n", "WORKDIR ${FUNCTION_DIR}\n", "\n", "# Copy in the build image dependencies\n", "COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}\n", "\n", "ENTRYPOINT [ \"/usr/local/bin/python\", \"-m\", \"awslambdaric\" ]\n", "CMD [ \"sm_training_job.lambda_handler\" ]" ] }, { "cell_type": "markdown", "id": "504bd044", "metadata": {}, "source": [ "#### - Role 정보" ] }, { "cell_type": "code", "execution_count": null, "id": "c14ed4d8", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "role = sagemaker.get_execution_role()\n", "role" ] }, { "cell_type": "markdown", "id": "180839e6", "metadata": {}, "source": [ "#### - Secret Manager ARN 정보" ] }, { "cell_type": "code", "execution_count": null, "id": "c802a490", "metadata": {}, "outputs": [], "source": [ "sec_arn = sec_response['ARN']\n", "%store sec_arn\n", "sec_arn" ] }, { "cell_type": "code", "execution_count": null, "id": "c100bd19", "metadata": {}, "outputs": [], "source": [ "%store -r" ] }, { "cell_type": "code", "execution_count": null, "id": "a0ad1f0a", "metadata": {}, "outputs": [], "source": [ "print(f\"bucket : {bucket}\\ncodecommit_repo : {codecommit_repo}\")" ] }, { "cell_type": "markdown", "id": "6270d1a5", "metadata": {}, "source": [ "#### - Training job의 lambda_function 생성\n", "- 기존 Yolov5의 학습작업의 실행 노트북에서 작성한 학습 클러스터의 정의와 실행 명령어를 그대로 사용하시면 됩니다.\n", "- 학습 코드는 CodeCommit에 push 된 코드를 가져와서 학습을 실행하게 됩니다." ] }, { "cell_type": "markdown", "id": "f4964f07", "metadata": {}, "source": [ "#### - Lambda Container Image 생성 후 ECR Push하기" ] }, { "cell_type": "code", "execution_count": null, "id": "e48bc37b", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "cd ./1-Start-Training-Job\n", "echo $(pwd)\n", "container_name=lambda-yolo5-training\n", "account=$(aws sts get-caller-identity --query Account --output text)\n", "\n", "# Get the region defined in the current configuration (default to us-west-2 if none defined)\n", "region=$(aws configure get region)\n", "# region=${region:-us-west-2}\n", "\n", "fullname=\"${account}.dkr.ecr.${region}.amazonaws.com/${container_name}:1.0\"\n", "\n", "# If the repository doesn't exist in ECR, create it.\n", "aws ecr describe-repositories --repository-names \"${container_name}\" > /dev/null 2>&1\n", "if [ $? -ne 0 ]\n", "then\n", " aws ecr create-repository --repository-name \"${container_name}\" > /dev/null\n", "fi\n", "\n", "# # Get the login command from ECR and execute it directly\n", "# $(aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin \"763104351884.dkr.ecr.us-west-2.amazonaws.com\")\n", "\n", "# Build the docker image locally with the image name and then push it to ECR\n", "# with the full name.\n", "docker build -f Dockerfile -t ${fullname} .\n", "# docker tag ${container_name} ${fullname}\n", "\n", "# Get the login command from ECR and execute it directly\n", "$(aws ecr get-login --region ${region} --no-include-email)\n", "docker push ${fullname}" ] }, { "cell_type": "markdown", "id": "0ac88a96", "metadata": {}, "source": [ "#### - Lambda 함수 생성\n", "![lambda-container.png](../figures/lambda-container.png)\n" ] }, { "cell_type": "markdown", "id": "c2792a64", "metadata": {}, "source": [ "#### - Lambda Role 설정\n", "Role은 기본적인 Role을 생성하신 후, 필요한 Policy를 추가하도록 합니다.\n", "- SecretsManagerReadWrite\n", "- AmazonSageMakerFullAccess\n", "- AWSLambdaBasicExecutionRole\n", "\n", "![lambda-role-setting.png](../figures/lambda-role-setting.png)" ] }, { "cell_type": "markdown", "id": "66dbec0d", "metadata": {}, "source": [ "#### - Lambda start-training-job의 설정 변경\n", "![lambda-start-training-job-config.png](../figures/lambda-start-training-job-config.png)" ] }, { "cell_type": "markdown", "id": "68a55a26", "metadata": {}, "source": [ "### 3.2 Check Status Training\n", "lambda에서 수행되는 함수를 zip으로 묶어서 upload를 합니다." ] }, { "cell_type": "code", "execution_count": null, "id": "ffdd96a1", "metadata": {}, "outputs": [], "source": [ "!mkdir ./2-Check-Status-Training" ] }, { "cell_type": "code", "execution_count": null, "id": "5ae6ddb7", "metadata": {}, "outputs": [], "source": [ "%%writefile 2-Check-Status-Training/lambda_function.py\n", "\n", "import json\n", "import boto3\n", "import os\n", "\n", "sagemaker = boto3.client('sagemaker')\n", "\n", "def lambda_handler(event, context):\n", " stage = event['stage']\n", "\n", " if stage == 'Training':\n", " training_job_name = event['training_job_name']\n", " training_details = describe_training_job(training_job_name)\n", " print(training_details)\n", "\n", " status = training_details['TrainingJobStatus']\n", " if status == 'Completed':\n", " s3_output_path = training_details['OutputDataConfig']['S3OutputPath']\n", " model_data_url = os.path.join(s3_output_path, training_details['TrainingJobName'], 'output/model.tar.gz')\n", "\n", " event['message'] = 'Training job \"{}\" complete. Model data uploaded to \"{}\"'.format(training_job_name, model_data_url)\n", " event['model_data_url'] = model_data_url\n", " event['training_job'] = training_details['TrainingJobName']\n", " elif status == 'Failed':\n", " failure_reason = training_details['FailureReason']\n", " event['message'] = 'Training job failed. {}'.format(failure_reason)\n", " \n", " event['status'] = status\n", " \n", " print(event)\n", " \n", " return event\n", "\n", "def describe_training_job(name):\n", " \"\"\" Describe SageMaker training job identified by input name.\n", " Args:\n", " name (string): Name of SageMaker training job to describe.\n", " Returns:\n", " (dict)\n", " Dictionary containing metadata and details about the status of the training job.\n", " \"\"\"\n", " try:\n", " response = sagemaker.describe_training_job(\n", " TrainingJobName = name\n", " )\n", " except Exception as e:\n", " print(e)\n", " print('Unable to describe training job.')\n", " raise(e)\n", " \n", " return response\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7bbfcc89", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "cd ./2-Check-Status-Training\n", "rm -rf check-status-training.zip\n", "zip -r check-status-training.zip lambda_function.py" ] }, { "cell_type": "code", "execution_count": null, "id": "23b74bf2", "metadata": {}, "outputs": [], "source": [ "%store -r\n", "print(f\"bucket : {bucket}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "6ce68d79", "metadata": {}, "outputs": [], "source": [ "!aws s3 cp 2-Check-Status-Training/check-status-training.zip s3://$bucket/lambda_function/" ] }, { "cell_type": "markdown", "id": "93e05970", "metadata": {}, "source": [ "### 3.3 Check Accuracy\n", "lambda에서 수행되는 함수를 zip으로 묶어서 upload를 합니다." ] }, { "cell_type": "code", "execution_count": null, "id": "94387e8e", "metadata": {}, "outputs": [], "source": [ "!mkdir ./3-Check-Accuracy" ] }, { "cell_type": "code", "execution_count": null, "id": "4c2221e6", "metadata": {}, "outputs": [], "source": [ "%%writefile 3-Check-Accuracy/lambda_function.py\n", "\n", "\n", "import json\n", "import boto3\n", "import tarfile\n", "from io import BytesIO\n", "import os\n", "import pickle\n", "from io import StringIO\n", "import csv\n", "\n", "\n", "s3 = boto3.client('s3')\n", "sm = boto3.client('sagemaker')\n", "s3_resource = boto3.resource('s3')\n", "\n", "\n", "acc_col_num = os.environ['ACC_COL_NUM']\n", "bucket = os.environ['BUCKET']\n", "\n", "\n", "def lambda_handler(event, context):\n", " # print(event) \n", " model_data_url = event['model_data_url']\n", " # bucket = event['bucket']\n", " key_value = model_data_url.split(bucket)[1][1:]\n", " print(key_value)\n", " tar_file_obj = s3.get_object(Bucket=bucket, Key=key_value)\n", " tar_content = tar_file_obj ['Body'].read()\n", " \n", " accuracy = 0\n", " \n", " with tarfile.open(fileobj = BytesIO(tar_content)) as tar:\n", " for tar_resource in tar:\n", " if (tar_resource.isfile()):\n", " if \"results.csv\" in tar_resource.name:\n", " inner_file_bytes = tar.extractfile(tar_resource).read()\n", " file_data = inner_file_bytes.decode('utf-8')\n", " file = StringIO(file_data)\n", " csv_data = csv.reader(file, delimiter=\",\")\n", "\n", " max_line = len(list(csv_data))\n", "\n", " file = StringIO(file_data)\n", " csv_data = csv.reader(file, delimiter=\",\")\n", "\n", " line_count = 0\n", "\n", " for row in csv_data:\n", " line_count += 1\n", " if line_count == max_line:\n", " accuracy = row[int(acc_col_num)].lstrip()\n", " \n", " print(\"accuracy is \" + accuracy)\n", " \n", " desired_accuracy = event['desired_accuracy']\n", " \n", " if accuracy > desired_accuracy:\n", " event['train_result'] = \"PASS\"\n", " print(\"PASS\")\n", " else:\n", " event['train_result'] = \"FAIL\"\n", " print(\"FAIL\")\n", "\n", " return event\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a4a4fd36", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "cd ./3-Check-Accuracy\n", "rm -rf check-accuracy.zip\n", "zip -r check-accuracy.zip lambda_function.py" ] }, { "cell_type": "code", "execution_count": null, "id": "5354168a", "metadata": {}, "outputs": [], "source": [ "!aws s3 cp 3-Check-Accuracy/check-accuracy.zip s3://$bucket/lambda_function/" ] }, { "cell_type": "markdown", "id": "81d78a1a", "metadata": {}, "source": [ "#### - lambda-check-accuracy 설정 추가\n", "\n", "- check하는 시간이 3초 이상 소요되므로 20초 정도로 변경합니다.\n", "- 또한, 버킷 이름을 추가해 주시면 됩니다.\n", "\n", "![lambda-check-accuracy-setting.png](../figures/lambda-check-accuracy-setting.png)" ] }, { "cell_type": "markdown", "id": "7ad19079", "metadata": {}, "source": [ "### 3.4 Reister Model\n", "- 학습이 완료된 이후 설정한 desired accuracy를 넘는 모델 Artifacts는 Model Registry에 등록하게 됩니다." ] }, { "cell_type": "code", "execution_count": null, "id": "4a495974", "metadata": {}, "outputs": [], "source": [ "!mkdir 4-Register-Model" ] }, { "cell_type": "code", "execution_count": null, "id": "cbef03a2-73d0-4bf7-b0e8-2ce31a67d01d", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "image_uri = sagemaker.image_uris.retrieve(framework='pytorch', \n", " image_scope='training',\n", " version='1.10',\n", " instance_type='ml.c5.2xlarge', \n", " region=region)\n", "image_uri" ] }, { "cell_type": "code", "execution_count": null, "id": "7f88145b", "metadata": {}, "outputs": [], "source": [ "%%writefile 4-Register-Model/lambda_function.py\n", "\n", "import json\n", "import boto3\n", "import botocore\n", "import os\n", "\n", "\n", "sm_client = boto3.client(\"sagemaker\")\n", "\n", "model_package_group_name = os.environ['MODEL_PACKAGE_GROUP_NAME']\n", "model_package_group_desc = os.environ['MODEL_PACKAGE_GROUP_DESC']\n", "\n", "\n", "def lambda_handler(event, context):\n", " \n", " modelpackage_inference_specification = {\n", " \"InferenceSpecification\": {\n", " \"Containers\": [\n", " {\n", " \"Image\": '763104351884.dkr.ecr.ap-northeast-2.amazonaws.com/pytorch-training:1.10-cpu-py38',\n", " }\n", " ],\n", " \"SupportedContentTypes\": [ \"application/x-image\" ],\n", " \"SupportedResponseMIMETypes\": [ \"application/x-image\" ],\n", " }\n", " }\n", " \n", " model_data_url = event['model_data_url'] \n", " \n", " \n", " # Specify the model data\n", " modelpackage_inference_specification[\"InferenceSpecification\"][\"Containers\"][0][\"ModelDataUrl\"]=model_data_url\n", " \n", " create_model_package_input_dict = {\n", " \"ModelPackageGroupName\" : model_package_group_name,\n", " \"ModelPackageDescription\" : model_package_group_desc,\n", " \"ModelApprovalStatus\" : \"PendingManualApproval\"\n", " }\n", "\n", " create_model_package_input_dict.update(modelpackage_inference_specification)\n", " modelpackage_inference_specification[\"InferenceSpecification\"][\"Containers\"][0]\n", " \n", " try:\n", " create_mode_package_response = sm_client.create_model_package(**create_model_package_input_dict)\n", " except botocore.exceptions.ClientError as ce:\n", " # When model package group does not exit\n", " print('Model package grop does not exist. Creating a new one')\n", " if ce.operation_name == \"CreateModelPackage\":\n", " if ce.response[\"Error\"][\"Message\"] == \"Model Package Group does not exist.\":\n", " # Create model package group\n", " create_model_package_group_response = sm_client.create_model_package_group(\n", " ModelPackageGroupName=model_package_group_name,\n", " ModelPackageGroupDescription=model_package_group_desc,\n", " )\n", " \n", " create_mode_package_response = sm_client.create_model_package(**create_model_package_input_dict)\n", " \n", " return event\n" ] }, { "cell_type": "code", "execution_count": null, "id": "af20577d", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "cd ./4-Register-Model\n", "rm -rf register-model.zip\n", "zip -r register-model.zip lambda_function.py" ] }, { "cell_type": "code", "execution_count": null, "id": "0930603a", "metadata": {}, "outputs": [], "source": [ "!aws s3 cp ./4-Register-Model/register-model.zip s3://$bucket/lambda_function/" ] }, { "cell_type": "markdown", "id": "86ddac7c", "metadata": {}, "source": [ "#### - lambda-register-model의 설정 추가\n", "\n", "- 추론에 사용된 ECR의 Container URI, 생성할 Model Packagegroup의 name과 Description을 환경변수에 추가합니다." ] }, { "cell_type": "markdown", "id": "ac8fe48a", "metadata": {}, "source": [ "![lambda-register-model-setting.png](../figures/lambda-register-model-setting.png)" ] }, { "cell_type": "markdown", "id": "6cc6cb51", "metadata": {}, "source": [ "## 4. Lambda-Step-Functions-Trigger\n", "- Step Function을 실행하는 Lambda 함수를 생성합니다.\n", "- 이 Lambda 함수를 통해 S3의 object가 추가된 경우 또는 CodeCommit에 신규 학습 코드가 push 된 경우 자동으로 Step Functions을 실행할 수 있습니다." ] }, { "cell_type": "code", "execution_count": null, "id": "3493cb0a", "metadata": {}, "outputs": [], "source": [ "!mkdir 5-Step-Functions-Trigger" ] }, { "cell_type": "code", "execution_count": null, "id": "6233e6d3", "metadata": {}, "outputs": [], "source": [ "%%writefile 5-Step-Functions-Trigger/lambda_function.py\n", "\n", "import json\n", "import boto3\n", "import os\n", "\n", "\n", "s3 = boto3.client('s3')\n", "sf = boto3.client('stepfunctions')\n", "\n", "\n", "state_machine_arn = os.environ['STATE_MACHINE_ARN']\n", "desired_accuracy = os.environ['DESIRED_ACCURACY']\n", "\n", "\n", "def lambda_handler(event, context):\n", " \n", " print(event)\n", " json_string = {\n", " \"desired_accuracy\": desired_accuracy\n", " }\n", " \n", " # json_content = json.loads(json_string)\n", " print(json_string)\n", " \n", " sf.start_execution(\n", " stateMachineArn = state_machine_arn,\n", " input = json.dumps(json_string))\n", " \n", " return event\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c757ecc9", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "cd ./5-Step-Functions-Trigger\n", "rm -rf step-functions-trigger.zip\n", "zip -r step-functions-trigger.zip lambda_function.py" ] }, { "cell_type": "code", "execution_count": null, "id": "47ae0a33", "metadata": {}, "outputs": [], "source": [ "!aws s3 cp ./5-Step-Functions-Trigger/step-functions-trigger.zip s3://$bucket/lambda_function/" ] }, { "cell_type": "markdown", "id": "7da611c9", "metadata": {}, "source": [ "### - Lambda step functions trigger의 설정 변경\n", "\n", "![lambda-step-functions-trigger-config.png](../figures/lambda-step-functions-trigger-config.png)" ] }, { "cell_type": "markdown", "id": "525e372b", "metadata": {}, "source": [ "### - Trigger 추가\n", "Trigger를 추가할 경우 아래와 같이 추가를 할 수 있습니다.\n", "\n", "\n", "![step-functions-trigger.png](../figures/step-functions-trigger.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "67f19130", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 5 }