{ "cells": [ { "cell_type": "markdown", "id": "attended-township", "metadata": {}, "source": [ "# XGBoost simple example (SageMaker version)\n", "\n", "Source: https://www.datacamp.com/community/tutorials/xgboost-in-python" ] }, { "cell_type": "markdown", "id": "romantic-extent", "metadata": {}, "source": [ "### Load the data\n", "\n", "This notebook uses the same dataset as the [simple XGBoost example](warmingup1.xgboost_simple.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "id": "retained-apparatus", "metadata": {}, "outputs": [], "source": [ "# load_boston was removed in scikit-learn 1.2, so this cell needs scikit-learn < 1.2\n", "from sklearn.datasets import load_boston\n", "import pandas as pd\n", "import numpy as np\n", "\n", "boston = load_boston()\n", "data = pd.DataFrame(boston.data)\n", "data.columns = boston.feature_names\n", "data.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "thrown-dating", "metadata": {}, "outputs": [], "source": [ "print(boston.DESCR)" ] }, { "cell_type": "markdown", "id": "stainless-sample", "metadata": {}, "source": [ "### Split into training/test datasets & upload data to S3" ] }, { "cell_type": "code", "execution_count": null, "id": "breathing-rotation", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "\n", "sess = sagemaker.Session()\n", "bucket = sess.default_bucket() # replace with an existing bucket if needed\n", "prefix = 'sagemaker/DEMO-boston-sm' # prefix used for all data stored within the bucket\n", "\n", "# Define IAM role\n", "import boto3\n", "from sagemaker import get_execution_role\n", "\n", "role = get_execution_role()" ] }, { "cell_type": "markdown", "id": "lesbian-advertiser", "metadata": {}, "source": [ "To use the XGBoost algorithm provided by SageMaker, build the dataset so that the label is in the first column, then upload it to S3. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "flush-scheduling", "metadata": {}, "outputs": [], "source": [ "data['y'] = boston.target\n", "# Label column first, then a sequential 70% / 20% / 10% split into train / validation / test\n", "labeled = pd.concat([data['y'], data.iloc[:, :-1]], axis=1)\n", "train_end, valid_end = int(len(data) * 0.7), int(len(data) * 0.9)\n", "train_df, valid_df, test_df = labeled.iloc[:train_end], labeled.iloc[train_end:valid_end], labeled.iloc[valid_end:]\n", "train_df.to_csv('boston_train.csv', index=False, header=False)\n", "valid_df.to_csv('boston_valid.csv', index=False, header=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "sexual-carol", "metadata": {}, "outputs": [], "source": [ "import os\n", "boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('boston_train.csv')\n", "boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')).upload_file('boston_valid.csv')" ] }, { "cell_type": "markdown", "id": "aging-bradley", "metadata": {}, "source": [ "### Regression training with SageMaker XGBoost\n" ] }, { "cell_type": "code", "execution_count": null, "id": "shaped-machinery", "metadata": {}, "outputs": [], "source": [ "from sagemaker import image_uris\n", "container = image_uris.retrieve('xgboost', region=sess.boto_region_name, version='latest')\n", "\n", "s3_input_train = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')\n", "s3_input_valid = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')\n" ] }, { "cell_type": "markdown", "id": "subjective-minnesota", "metadata": {}, "source": [ "Run the training in the cloud using SageMaker. 