{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# XGBoost simple example\n",
"\n",
"source : https://www.datacamp.com/community/tutorials/xgboost-in-python"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install xgboost"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 데이터 로드"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_boston\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"boston = load_boston()\n",
"data = pd.DataFrame(boston.data)\n",
"data.columns = boston.feature_names\n",
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(boston.DESCR)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 학습/테스트 데이터셋 분리\n",
"\n",
"_아래 힌트를 참고하여 다음 셀의 TO DO 를 완성하세요._\n",
"\n",
"\n",
" 힌트
\n",
" \n",
" ```python\n",
" X_train, X_test, y_train, y_test = train_test_split(data, boston.target, test_size=0.2, random_state=123)\n",
" ```\n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_train"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### XGBoost를 이용한 Regression 학습\n",
"\n",
"- 파라미터 참조 : https://xgboost.readthedocs.io/en/latest/parameter.html#general-parameters\n",
"- 주요 파리미터\n",
" + objective : determines the loss function to be used like `reg:linear` for regression problems, `reg:logistic` for classification problems with only decision, `binary:logistic` for classification problems with probability.\n",
" + colsample_bytree : percentage of features used per tree. High value can lead to overfitting.\n",
" + learning_rate : step size shrinkage used to prevent overfitting. Range is [0,1]\n",
" + max_depth : determines how deeply each tree is allowed to grow during any boosting round.\n",
" + alpha : L1 regularization on leaf weights. A large value leads to more regularization.\n",
" + n_estimators : number of trees you want to build."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import xgboost as xgb\n",
"\n",
"xg_reg = xgb.XGBRegressor(objective ='reg:squarederror', colsample_bytree = 0.3, learning_rate = 0.1,\n",
" max_depth = 5, alpha = 10, n_estimators = 10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_아래 힌트를 참고하여 다음 셀의 TO DO 를 완성하세요._\n",
"\n",
"\n",
" 힌트
\n",
" \n",
" ```python\n",
" model = xg_reg.fit(X_train,y_train)\n",
" ```\n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = xg_reg.fit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.get_booster().get_score(importance_type='weight')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 예측\n",
"\n",
"_아래 힌트를 참고하여 다음 셀의 TO DO 를 완성하세요._\n",
"\n",
"\n",
" 힌트
\n",
" \n",
" ```python\n",
" preds = xg_reg.predict(X_test)\n",
" ```\n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"preds = xg_reg.predict()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import mean_squared_error\n",
"\n",
"rmse = np.sqrt(mean_squared_error(y_test, preds))\n",
"print(\"RMSE: %f\" % (rmse))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"plt.plot(preds)\n",
"plt.plot(y_test)\n",
"plt.legend(['pred','real'])\n",
"plt.title('Prediction vs Real price')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}