{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting Product Success from Review Data\n", "\n", "_**Using XGBoost to Predict Whether Sales Will Exceed the \"Hit\" Threshold [(original)](https://github.com/jihys/sagemaker-workshop-0809/blob/master/module2-video-game-sales-xgboost.ipynb)**_\n", "\n", "---\n", "\n", "## Contents\n", "\n", "1. [Background](#Background)\n", "1. [Setup](#Setup)\n", "1. [Data](#Data)\n", "1. [Train](#Train)\n", "1. [Host](#Host)\n", "1. [Evaluation](#Evaluation)\n", "1. [Extensions](#Extensions)\n", "1. [Cleanup](#Cleanup)\n", "\n", "\n", "## Background\n", "\n", "Word of mouth in the form of user reviews, critic reviews, and social media comments can often provide insight into whether a product will ultimately succeed. In the video game industry in particular, reviews and ratings can have a large effect on a game's success. However, not every game with bad reviews fails, and not every game with good reviews becomes a hit. To predict hit games, a machine learning algorithm needs to take advantage of reviews together with a variety of other potentially relevant data features.\n", "\n", "This notebook uses the [Video Game Sales with Ratings](https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings) dataset from Kaggle. The dataset contains data from [Metacritic](http://www.metacritic.com/browse/games/release-date/available), including user reviews as well as critic reviews, ESRB ratings, and other attributes. Both user reviews and critic reviews take the form of rating scores, on a scale of 0 to 10 or 0 to 100. This is convenient, but a significant problem with the dataset is that it is relatively small.\n", "\n", "Dealing with small datasets like this one is a common problem in machine learning. The problem is often compounded by imbalances between the classes in the small dataset. In such situations, ensemble learning is a good choice. This notebook focuses on using XGBoost, an ensemble learner, to build a classifier that determines whether a game will be a hit.\n", "\n", "\n", "## Setup\n", "\n", "Start with the following setup steps:\n", "\n", "- Import the various Python libraries needed for the exercise.\n", "- Create a SageMaker session object for the tasks in this notebook and obtain the AWS Region.\n", "- Set the S3 bucket and prefix to use for training and model data.\n", "- Get the IAM role defined for S3 data access from the SageMaker notebook instance.
\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import timeit\n", "start_time = timeit.default_timer()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "isConfigCell": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Bucket: \n", "role: arn:aws:iam::415373942856:role/service-role/AmazonSageMaker-ExecutionRole-20191024T194435\n" ] } ], "source": [ "import numpy as np \n", "import pandas as pd \n", "import matplotlib.pyplot as plt \n", "from IPython.display import Image \n", "from IPython.display import display \n", "from sklearn.datasets import dump_svmlight_file \n", "from time import gmtime, strftime \n", "import sys \n", "import math \n", "import json\n", "import boto3\n", "import sagemaker\n", "\n", "session = sagemaker.Session()\n", "region = session.boto_region_name\n", "\n", "#bucket = session.default_bucket()\n", "bucket=''\n", "prefix = 'sagemaker/videogames-xgboost'\n", "role = sagemaker.get_execution_role()\n", "\n", "print('Bucket: {}'.format(bucket))\n", "print('role: {}'.format(role))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Data\n", "\n", "First, download the dataset from a public S3 bucket to this notebook instance. The file will appear in the same directory as this notebook.\n", "Then take a first look at the data." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
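<table><thead><tr><th></th><th>Name</th><th>Platform</th><th>Year_of_Release</th><th>Genre</th><th>Publisher</th><th>NA_Sales</th><th>EU_Sales</th><th>JP_Sales</th><th>Other_Sales</th><th>Global_Sales</th><th>Critic_Score</th><th>Critic_Count</th><th>User_Score</th><th>User_Count</th><th>Developer</th><th>Rating</th></tr></thead><tbody>
<tr><th>0</th><td>Wii Sports</td><td>Wii</td><td>2006.0</td><td>Sports</td><td>Nintendo</td><td>41.36</td><td>28.96</td><td>3.77</td><td>8.45</td><td>82.53</td><td>76.0</td><td>51.0</td><td>8</td><td>322.0</td><td>Nintendo</td><td>E</td></tr>
<tr><th>1</th><td>Super Mario Bros.</td><td>NES</td><td>1985.0</td><td>Platform</td><td>Nintendo</td><td>29.08</td><td>3.58</td><td>6.81</td><td>0.77</td><td>40.24</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>2</th><td>Mario Kart Wii</td><td>Wii</td><td>2008.0</td><td>Racing</td><td>Nintendo</td><td>15.68</td><td>12.76</td><td>3.79</td><td>3.29</td><td>35.52</td><td>82.0</td><td>73.0</td><td>8.3</td><td>709.0</td><td>Nintendo</td><td>E</td></tr>
<tr><th>3</th><td>Wii Sports Resort</td><td>Wii</td><td>2009.0</td><td>Sports</td><td>Nintendo</td><td>15.61</td><td>10.93</td><td>3.28</td><td>2.95</td><td>32.77</td><td>80.0</td><td>73.0</td><td>8</td><td>192.0</td><td>Nintendo</td><td>E</td></tr>
<tr><th>4</th><td>Pokemon Red/Pokemon Blue</td><td>GB</td><td>1996.0</td><td>Role-Playing</td><td>Nintendo</td><td>11.27</td><td>8.89</td><td>10.22</td><td>1.00</td><td>31.37</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>5</th><td>Tetris</td><td>GB</td><td>1989.0</td><td>Puzzle</td><td>Nintendo</td><td>23.20</td><td>2.26</td><td>4.22</td><td>0.58</td><td>30.26</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>6</th><td>New Super Mario Bros.</td><td>DS</td><td>2006.0</td><td>Platform</td><td>Nintendo</td><td>11.28</td><td>9.14</td><td>6.50</td><td>2.88</td><td>29.80</td><td>89.0</td><td>65.0</td><td>8.5</td><td>431.0</td><td>Nintendo</td><td>E</td></tr>
<tr><th>7</th><td>Wii Play</td><td>Wii</td><td>2006.0</td><td>Misc</td><td>Nintendo</td><td>13.96</td><td>9.18</td><td>2.93</td><td>2.84</td><td>28.92</td><td>58.0</td><td>41.0</td><td>6.6</td><td>129.0</td><td>Nintendo</td><td>E</td></tr>
<tr><th>8</th><td>New Super Mario Bros. Wii</td><td>Wii</td><td>2009.0</td><td>Platform</td><td>Nintendo</td><td>14.44</td><td>6.94</td><td>4.70</td><td>2.24</td><td>28.32</td><td>87.0</td><td>80.0</td><td>8.4</td><td>594.0</td><td>Nintendo</td><td>E</td></tr>
<tr><th>9</th><td>Duck Hunt</td><td>NES</td><td>1984.0</td><td>Shooter</td><td>Nintendo</td><td>26.93</td><td>0.63</td><td>0.28</td><td>0.47</td><td>28.31</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>16709</th><td>15 Days</td><td>PC</td><td>2009.0</td><td>Adventure</td><td>DTP Entertainment</td><td>0.00</td><td>0.01</td><td>0.00</td><td>0.00</td><td>0.01</td><td>63.0</td><td>6.0</td><td>5.8</td><td>8.0</td><td>DTP Entertainment</td><td>NaN</td></tr>
<tr><th>16710</th><td>Men in Black II: Alien Escape</td><td>GC</td><td>2003.0</td><td>Shooter</td><td>Infogrames</td><td>0.01</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.01</td><td>NaN</td><td>NaN</td><td>tbd</td><td>NaN</td><td>Atari</td><td>T</td></tr>
<tr><th>16711</th><td>Aiyoku no Eustia</td><td>PSV</td><td>2014.0</td><td>Misc</td><td>dramatic create</td><td>0.00</td><td>0.00</td><td>0.01</td><td>0.00</td><td>0.01</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>16712</th><td>Woody Woodpecker in Crazy Castle 5</td><td>GBA</td><td>2002.0</td><td>Platform</td><td>Kemco</td><td>0.01</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.01</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>16713</th><td>SCORE International Baja 1000: The Official Game</td><td>PS2</td><td>2008.0</td><td>Racing</td><td>Activision</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.01</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>16714</th><td>Samurai Warriors: Sanada Maru</td><td>PS3</td><td>2016.0</td><td>Action</td><td>Tecmo Koei</td><td>0.00</td><td>0.00</td><td>0.01</td><td>0.00</td><td>0.01</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>16715</th><td>LMA Manager 2007</td><td>X360</td><td>2006.0</td><td>Sports</td><td>Codemasters</td><td>0.00</td><td>0.01</td><td>0.00</td><td>0.00</td><td>0.01</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>16716</th><td>Haitaka no Psychedelica</td><td>PSV</td><td>2016.0</td><td>Adventure</td><td>Idea Factory</td><td>0.00</td><td>0.00</td><td>0.01</td><td>0.00</td><td>0.01</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>16717</th><td>Spirits &amp; Spells</td><td>GBA</td><td>2003.0</td><td>Platform</td><td>Wanadoo</td><td>0.01</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.01</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>16718</th><td>Winning Post 8 2016</td><td>PSV</td><td>2016.0</td><td>Simulation</td><td>Tecmo Koei</td><td>0.00</td><td>0.00</td><td>0.01</td><td>0.00</td><td>0.01</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
</tbody></table>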
\n", "

16719 rows × 16 columns

\n", "
" ], "text/plain": [ " Name Platform \\\n", "0 Wii Sports Wii \n", "1 Super Mario Bros. NES \n", "2 Mario Kart Wii Wii \n", "3 Wii Sports Resort Wii \n", "4 Pokemon Red/Pokemon Blue GB \n", "5 Tetris GB \n", "6 New Super Mario Bros. DS \n", "7 Wii Play Wii \n", "8 New Super Mario Bros. Wii Wii \n", "9 Duck Hunt NES \n", "... ... ... \n", "16709 15 Days PC \n", "16710 Men in Black II: Alien Escape GC \n", "16711 Aiyoku no Eustia PSV \n", "16712 Woody Woodpecker in Crazy Castle 5 GBA \n", "16713 SCORE International Baja 1000: The Official Game PS2 \n", "16714 Samurai Warriors: Sanada Maru PS3 \n", "16715 LMA Manager 2007 X360 \n", "16716 Haitaka no Psychedelica PSV \n", "16717 Spirits & Spells GBA \n", "16718 Winning Post 8 2016 PSV \n", "\n", " Year_of_Release Genre Publisher NA_Sales EU_Sales \\\n", "0 2006.0 Sports Nintendo 41.36 28.96 \n", "1 1985.0 Platform Nintendo 29.08 3.58 \n", "2 2008.0 Racing Nintendo 15.68 12.76 \n", "3 2009.0 Sports Nintendo 15.61 10.93 \n", "4 1996.0 Role-Playing Nintendo 11.27 8.89 \n", "5 1989.0 Puzzle Nintendo 23.20 2.26 \n", "6 2006.0 Platform Nintendo 11.28 9.14 \n", "7 2006.0 Misc Nintendo 13.96 9.18 \n", "8 2009.0 Platform Nintendo 14.44 6.94 \n", "9 1984.0 Shooter Nintendo 26.93 0.63 \n", "... ... ... ... ... ... 
\n", "16709 2009.0 Adventure DTP Entertainment 0.00 0.01 \n", "16710 2003.0 Shooter Infogrames 0.01 0.00 \n", "16711 2014.0 Misc dramatic create 0.00 0.00 \n", "16712 2002.0 Platform Kemco 0.01 0.00 \n", "16713 2008.0 Racing Activision 0.00 0.00 \n", "16714 2016.0 Action Tecmo Koei 0.00 0.00 \n", "16715 2006.0 Sports Codemasters 0.00 0.01 \n", "16716 2016.0 Adventure Idea Factory 0.00 0.00 \n", "16717 2003.0 Platform Wanadoo 0.01 0.00 \n", "16718 2016.0 Simulation Tecmo Koei 0.00 0.00 \n", "\n", " JP_Sales Other_Sales Global_Sales Critic_Score Critic_Count \\\n", "0 3.77 8.45 82.53 76.0 51.0 \n", "1 6.81 0.77 40.24 NaN NaN \n", "2 3.79 3.29 35.52 82.0 73.0 \n", "3 3.28 2.95 32.77 80.0 73.0 \n", "4 10.22 1.00 31.37 NaN NaN \n", "5 4.22 0.58 30.26 NaN NaN \n", "6 6.50 2.88 29.80 89.0 65.0 \n", "7 2.93 2.84 28.92 58.0 41.0 \n", "8 4.70 2.24 28.32 87.0 80.0 \n", "9 0.28 0.47 28.31 NaN NaN \n", "... ... ... ... ... ... \n", "16709 0.00 0.00 0.01 63.0 6.0 \n", "16710 0.00 0.00 0.01 NaN NaN \n", "16711 0.01 0.00 0.01 NaN NaN \n", "16712 0.00 0.00 0.01 NaN NaN \n", "16713 0.00 0.00 0.01 NaN NaN \n", "16714 0.01 0.00 0.01 NaN NaN \n", "16715 0.00 0.00 0.01 NaN NaN \n", "16716 0.01 0.00 0.01 NaN NaN \n", "16717 0.00 0.00 0.01 NaN NaN \n", "16718 0.01 0.00 0.01 NaN NaN \n", "\n", " User_Score User_Count Developer Rating \n", "0 8 322.0 Nintendo E \n", "1 NaN NaN NaN NaN \n", "2 8.3 709.0 Nintendo E \n", "3 8 192.0 Nintendo E \n", "4 NaN NaN NaN NaN \n", "5 NaN NaN NaN NaN \n", "6 8.5 431.0 Nintendo E \n", "7 6.6 129.0 Nintendo E \n", "8 8.4 594.0 Nintendo E \n", "9 NaN NaN NaN NaN \n", "... ... ... ... ... 
\n", "16709 5.8 8.0 DTP Entertainment NaN \n", "16710 tbd NaN Atari T \n", "16711 NaN NaN NaN NaN \n", "16712 NaN NaN NaN NaN \n", "16713 NaN NaN NaN NaN \n", "16714 NaN NaN NaN NaN \n", "16715 NaN NaN NaN NaN \n", "16716 NaN NaN NaN NaN \n", "16717 NaN NaN NaN NaN \n", "16718 NaN NaN NaN NaN \n", "\n", "[16719 rows x 16 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data_filename = 'Video_Games_Sales_as_at_22_Dec_2016.csv'\n", "data_bucket = 'sagemaker-workshop-pdx'\n", "\n", "s3 = boto3.resource('s3')\n", "s3.Bucket(data_bucket).download_file(raw_data_filename, 'raw_data.csv')\n", "\n", "data = pd.read_csv('./raw_data.csv')\n", "pd.set_option('display.max_rows', 20) \n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before proceeding further, we need to decide on a target to predict. Video game development budgets can run into the tens of millions of dollars, so it is critical for game publishers to publish \"hit\" games in order to recoup their costs and make a profit. As the target, we define a \"hit\" as a game with global sales of more than 1 million units (the `Global_Sales` column is expressed in millions). " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "data['y'] = (data['Global_Sales'] > 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the target is defined, let's look at the imbalance between the \"hit\" and \"not a hit\" classes: " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYcAAAD8CAYAAACcjGjIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEjBJREFUeJzt3X+snuV93/H3Z3bJjy6JTThjie3sWIuVyrA2pWeELWq1hsqYpKqRRiNYVJzMiv8oWZuuUwKdNm+kSESdxoqWULnBw1RRCKKpsBoS6hKydFshHBLCzzDOIIntQTiJDWmXlczsuz/O5e6Jr3N8zPMc/Bxy3i/p0XPf3+u67vu6pUd8zv3LpKqQJGnQ3xj3BCRJy4/hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpM7qcU9gWGeccUZNTk6OexqS9LJy3333faeqJhbr97INh8nJSaanp8c9DUl6WUnyzZPp52UlSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLnZfuG9Cgmr/jsuKegZeob17xr3FOQlgXPHCRJHcNBktQxHCRJHcNBktRZNByS7EnyTJKH5mn7zSSV5Iy2niTXJZlJ8kCScwb6bk/yePtsH6j/TJIH25jrkmSpDk6SNJyTOXO4Edh6fDHJBmAL8K2B8oXApvbZCVzf+p4O7ALeBpwL7Eqyto25Hnj/wLhuX5KkU2vRcKiqLwGH52m6FvgQUAO1bcBNNeduYE2SNwAXAPur6nBVHQH2A1tb22ur6u6qKuAm4KLRDkmSNKqh7jkk2QYcqqqvHde0DjgwsH6w1U5UPzhPXZI0Ri/6JbgkrwZ+i7lLSqdUkp3MXa7iTW9606nevSStGMOcOfxdYCPwtSTfANYDX0nyt4FDwIaBvutb7UT19fPU51VVu6tqqqqmJiYW/f9jS5KG9KLDoaoerKq/VVWTVTXJ3KWgc6rqaWAfcFl7auk84Lmqegq4A9iSZG27Eb0FuKO1fS/Jee0ppcuA25bo2CRJQzqZR1k/Bfw58JYkB5PsOEH324EngBng94FfBaiqw8BHgHvb56pWo/X5RBvzP4DPDXcokqSlsug9h6q6dJH2yYHlAi5foN8eYM889Wng7MXmIUk6dXxDWpLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSZ1FwyHJniTPJHlooPY7Sb6e5IEkf5RkzUDblUlmkjyW5IKB+tZWm0lyxUB9Y5J7Wv3TSU5bygOUJL14J3PmcCOw9bjafuDsqvpJ4L8DVwIk2QxcApzVxnw8yaokq4CPARcCm4FLW1+AjwLXVtWbgSPAjpGOSJI0skXDoaq+BBw+rvYnVXW0rd4NrG/L24Cbq+r5qnoSmAHObZ+Zqnqiqn4A3AxsSxLgHcCtbfxe4KIRj0mSNKKluOfwT4HPteV1wIGBtoOttlD99cCzA0FzrD6vJDuTTCeZnp2dXYKpS5LmM1I4JPmXwFHgk0sznROrqt1VNVVVUxMTE6dil5K0Iq0edmCS9wK/CJxfVdXKh4ANA93WtxoL1L8LrEmyup09DPaXJI3JUGcOSbYCHwJ+qaq+P9C0D7gkySuSbAQ2AV8G7gU2tSeTTmPupvW+Fip3ARe38duB24Y7FEnSUjmZR1k/Bfw58JYkB5PsAP4j8Bpgf5L7k/weQFU9DNwCPAJ8Hri8ql5oZwUfAO4AHgVuaX0BPgz88yQzzN2DuGFJj1CS9KItelmpqi6dp7zgf8Cr6mrg6nnqtwO3z1N/grmnmSRJy4RvSEuSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEi
SOoaDJKljOEiSOouGQ5I9SZ5J8tBA7fQk+5M83r7XtnqSXJdkJskDSc4ZGLO99X88yfaB+s8kebCNuS5JlvogJUkvzsmcOdwIbD2udgVwZ1VtAu5s6wAXApvaZydwPcyFCbALeBtwLrDrWKC0Pu8fGHf8viRJp9ii4VBVXwIOH1feBuxty3uBiwbqN9Wcu4E1Sd4AXADsr6rDVXUE2A9sbW2vraq7q6qAmwa2JUkak2HvOZxZVU+15aeBM9vyOuDAQL+DrXai+sF56pKkMRr5hnT7i7+WYC6LSrIzyXSS6dnZ2VOxS0lakYYNh2+3S0K072da/RCwYaDf+lY7UX39PPV5VdXuqpqqqqmJiYkhpy5JWsyw4bAPOPbE0XbgtoH6Ze2ppfOA59rlpzuALUnWthvRW4A7Wtv3kpzXnlK6bGBbkqQxWb1YhySfAv4RcEaSg8w9dXQNcEuSHcA3gXe37rcD7wRmgO8D7wOoqsNJPgLc2/pdVVXHbnL/KnNPRL0K+Fz7SJLGaNFwqKpLF2g6f56+BVy+wHb2AHvmqU8DZy82D0nSqeMb0pKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkzkjhkOQ3kjyc5KEkn0ryyiQbk9yTZCbJp5Oc1vq+oq3PtPbJge1c2eqPJblgtEOSJI1q6HBIsg74NWCqqs4GVgGXAB8Frq2qNwNHgB1tyA7gSKtf2/qRZHMbdxawFfh4klXDzkuSNLpRLyutBl6VZDXwauAp4B3Ara19L3BRW97W1mnt5ydJq99cVc9X1ZPADHDuiPOSJI1g6HCoqkPAvwO+xVwoPAfcBzxbVUdbt4PAura8DjjQxh5t/V8/WJ9njCRpDEa5rLSWub/6NwJvBH6cuctCL5kkO5NMJ5menZ19KXclSSvaKJeVfgF4sqpmq+r/AJ8B3g6saZeZANYDh9ryIWADQGt/HfDdwfo8Y35IVe2uqqmqmpqYmBhh6pKkExklHL4FnJfk1e3ewfnAI8BdwMWtz3bgtra8r63T2r9QVdXql7SnmTYCm4AvjzAvSdKIVi/eZX5VdU+SW4GvAEeBrwK7gc8CNyf57Va7oQ25AfiDJDPAYeaeUKKqHk5yC3PBchS4vKpeGHZekqTRDR0OAFW1C9h1XPkJ5nnaqKr+CvjlBbZzNXD1KHORJC0d35CWJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHVGCocka5LcmuTrSR5N8g+SnJ5kf5LH2/fa1jdJrksyk+SBJOcMbGd76/94ku2jHpQkaTSjnjn8LvD5qvoJ4KeAR4ErgDurahNwZ1sHuBDY1D47gesBkpwO7ALeBpwL7DoWKJKk8Rg6HJK8Dvg54AaAqvpBVT0LbAP2tm57gYva8jbgpppzN7AmyRuAC4D9VXW4qo4A+4Gtw85LkjS6Uc4cNgKzwH9K8tUkn0jy48CZVfVU6/M0cGZbXgccGBh/sNUWqkuSxmSUcFgNnANcX1U/Dfwv/v8lJACqqoAaYR8/JMnOJNNJpmdnZ5dqs5Kk44wSDgeBg1V1T1u/lbmw+Ha7XET7fqa1HwI2DIxf32oL1TtVtbuqpqpqamJiYoSpS5JOZOhwqKqngQNJ3tJK5wOPAPuAY08cbQdua8v7gMvaU0vnAc+1y093AFuSrG03ore0miRpTFaPOP6fAZ9MchrwBPA+5gLnliQ7gG8C7259bwfeCcwA3299qarDST4C3Nv6XVVVh0eclyRpBCOFQ1XdD0zN03T+PH0LuHyB7ewB9owyF0nS0vENaUlSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3C
QJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHVGDockq5J8Nckft/WNSe5JMpPk00lOa/VXtPWZ1j45sI0rW/2xJBeMOidJ0miW4szh14FHB9Y/ClxbVW8GjgA7Wn0HcKTVr239SLIZuAQ4C9gKfDzJqiWYlyRpSCOFQ5L1wLuAT7T1AO8Abm1d9gIXteVtbZ3Wfn7rvw24uaqer6ongRng3FHmJUkazahnDv8B+BDwf9v664Fnq+poWz8IrGvL64ADAK39udb/r+vzjPkhSXYmmU4yPTs7O+LUJUkLGTockvwi8ExV3beE8zmhqtpdVVNVNTUxMXGqditJK87qEca+HfilJO8EXgm8FvhdYE2S1e3sYD1wqPU/BGwADiZZDbwO+O5A/ZjBMZKkMRj6zKGqrqyq9VU1ydwN5S9U1XuAu4CLW7ftwG1teV9bp7V/oaqq1S9pTzNtBDYBXx52XpKk0Y1y5rCQDwM3J/lt4KvADa1+A/AHSWaAw8wFClX1cJJbgEeAo8DlVfXCSzAvSdJJWpJwqKovAl9sy08wz9NGVfVXwC8vMP5q4OqlmIskaXS+IS1J6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6gwdDkk2JLkrySNJHk7y661+epL9SR5v32tbPUmuSzKT5IEk5wxsa3vr/3iS7aMfliRpFKOcORwFfrOqNgPnAZcn2QxcAdxZVZuAO9s6wIXApvbZCVwPc2EC7ALeBpwL7DoWKJKk8Rg6HKrqqar6Slv+C+BRYB2wDdjbuu0FLmrL24Cbas7dwJokbwAuAPZX1eGqOgLsB7YOOy9J0uhWL8VGkkwCPw3cA5xZVU+1pqeBM9vyOuDAwLCDrbZQXVqxJq/47LinoGXqG9e865TsZ+Qb0kn+JvCHwAer6nuDbVVVQI26j4F97UwynWR6dnZ2qTYrSTrOSOGQ5MeYC4ZPVtVnWvnb7XIR7fuZVj8EbBgYvr7VFqp3qmp3VU1V1dTExMQoU5ckncAoTysFuAF4tKr+/UDTPuDYE0fbgdsG6pe1p5bOA55rl5/uALYkWdtuRG9pNUnSmIxyz+HtwK8ADya5v9V+C7gGuCXJDuCbwLtb2+3AO4EZ4PvA+wCq6nCSjwD3tn5XVdXhEeYlSRrR0OFQVf8FyALN58/Tv4DLF9jWHmDPsHORJC0t35CWJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHWWTTgk2ZrksSQzSa4Y93wkaSVbFuGQZBXwMeBCYDNwaZLN452VJK1cyyIcgHOBmap6oqp+ANwMbBvznCRpxVou4bAOODCwfrDVJEljsHrcE3gxkuwEdrbVv0zy2Djn8yPkDOA7457EcpCPjnsGWoC/0WYJfqN/52Q6LZdwOARsGFhf32o/pKp2A7tP1aRWiiTTVTU17nlIC/E3euotl8tK9wKbkmxMchpwCbBvzHOSpBVrWZw5VNXRJB8A7gBWAXuq6uExT0uSVqxlEQ4AVXU7cPu457FCealOy52/0VMsVTXuOUiSlpnlcs9BkrSMGA4rQJL3JnnjCOP/TZJ/sUDbf2vfk0n+ybD7kOCvf0cPzVO/KskvtOUPJnn1qZ/dymI4rAzvBYYOhxOpqn/YFicBw0Eviar611X1p231g4Dh8BIzHF5m2l9Wjyb5/SQPJ/mTJK9qbW9NcneSB5L8UZK1SS4GpoBPJrn/WN+B7b0/yb1JvpbkD0/wF9nmJF9M8kSSXxsY/5dt8RrgZ9s+fuMlOHStHKuO/30nuTHJxe2390bgriR3jXuiP8oMh5enTcDHquos4FngH7f6TcCHq+ongQeBXVV1KzANvKe
q3lpV//u4bX2mqv5+Vf0U8CiwY4F9/gRwAXP/DtauJD92XPsVwJ+1fVw76gFqRVvo901VXQf8T+Dnq+rnxzS/FcFweHl6sqrub8v3AZNJXgesqar/3Op7gZ87iW2dneTPkjwIvAc4a4F+n62q56vqO8AzwJkjzF86ke73Pca5rFiGw8vT8wPLLzDa+yo3Ah+oqr8H/Fvgladgn9KJ+FtbBgyHHxFV9RxwJMnPttKvAMfOIv4CeM0CQ18DPNUuE71nhCmcaB/SUvK3dgoYDj9atgO/k+QB4K3AVa1+I/B7892QBv4VcA/wX4Gvj7DvB4AX2o1tb0jrpbQb+Lw3pF9aviEtSep45iBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqTO/wNKOBKMYUtaYAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.bar(['not a hit', 'hit'], data['y'].value_counts())\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "놀라울 것도 없이, 단지 일부의 게임만이 \"hits\"로 간주되어집니다. 다음으로 타켓에 예측력이 높은 피처를 선택할 것입니다. 리뷰 스코어들과 글로벌 매출을 플로팅을 시작하여 이러한 스코어들이 매출에 영향을 미치는지 확인합니다. 명확성을 위해 로그스케일을 사용합니다. " ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY8AAAEPCAYAAAC6Kkg/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJztnXmcFPWZ/z9PVR9zAMMw3MxwmAFcUEAdRQPy80iMUdTsiiYa1/2tiSa7McnGRE3WNV6/HEbNYTSb4LEJ0dUQkhVBchl0EYLHqMMZxAkoMyDXMAzM1Uf19/dHdzXV1VXdVd1VXdU9z/v18iVdU/X9PlU9833q+5wkhADDMAzD2EHyWgCGYRim/GDlwTAMw9iGlQfDMAxjG1YeDMMwjG1YeTAMwzC2YeXBMAzD2IaVB8MwDGMbVh4MwzCMbVh5MAzDMLZh5cEwDMPYJuC1AG4xevRoMXXqVK/FYBiGKRvefPPNw0KIMVbOrVjlMXXqVLS2tnotBsMwTNlARO9bPbcslAcRfQLApQBGAHhCCPFHj0ViGIYZ0njm8yCiJ4noIBFt1R2/mIjeIaJ2Ivo6AAghnhNC3Ajg8wA+6YW8DMMwzAm8dJj/HMDF2gNEJAN4FMDHAcwCcA0RzdKc8h+pnzMMwzAe4pnyEEKsA3BEd/gsAO1CiF1CiCiAZwFcQUnuB/A7IcRbpZaVYRiGycRvobqTAHRoPnemjn0RwEcALCGiz5tdTEQ3EVErEbUeOnTIXUkZhmGGMGXhMBdCPAzgYQvnLQWwFABaWlq4RSLDMJ7R1RtBZ/cAGuur0TAs7LU4juM35bEXQJPmc2PqmGWI6DIAlzU3NzspF8MwjGVWtu3F7b/ZjKAkIZZI4HtXzsHl8yZ5LZaj+M1s9QaA6UQ0jYhCAD4F4Hk7AwghVgkhbqqrq3NFQIZhmFx09UZw+282YzCWwPFIHIOxBG77zWZ09Ua8Fs1RvAzVfQbARgAziaiTiD4jhIgDuBnAHwD8FcByIcQ2r2RkGIaxS2f3AIJS5tIalCR0dg94JJE7eGa2EkJcY3J8DYA1hY7LZiuGYbyksb4asUQi41gskUBjfbVHErmD38xWRcNmK4ZhvKRhWBjfu3IOqoIShocDqApK+N6VcyrOae43hznDMEzZc/m8SVjQPJqjrcoJNlsxDOMHGoaFK1JpqLDZimEYhrFNxSkPhmEYxn0qTnkQ0WVEtLSnp8drURiGYSqWilMebLZiGIZxn4pTHgzDMIz7sPJgGIZhbFNxyoN9HgzDMO5TccqDfR4MwzDuU3HKg2EYhnEfVh4MwzCMbVh5MAzDlIiu3gg2dRytiN4eXNuKYRimBFRad8GK23mww5xhGD+g3WVUYnfBitt5MAzDeI1+l/GF85oRlCQM4kSTKLW7YLlW3mXlwTAM4yDaXYaqLB55qR2AyDgvqijoGYiiqzdSlgqk4sxWDMMwuXDbaW3UwzwkS7j5/OmoCkqoDcmQCVASAl94+m0suH8tnm/b64osbsI7D4ZhhgylcFqb9TC/dv5kjKoN4Z5V26EIAAI4HokDAG77zWYsaB5dVjuQitt5cHkShmGMKJXT2qyHOQDc98J2RJVE1jWq/6OcqLidhxBiFYBVLS0tN3otC8Mw/kE1J5XCaW3Uw3xTx9Gs+VViiQQa66sdlcFtKk55MAzDGGFmTnJr0db3MDeaHwDCAcL3rpxTViYroALNVgzDMEaYmZOcWLStOOH184cDEv7l/5yEx65vwYLm0UXLUGpICJH/rDKkpaVFtLa2ei0GwzAe0dUbyTAb5Tte6Jh2nfDqGFv39uC+F7b7KuOciN4UQrRYOZfNVgzDVBy5FnS9OamYMRc0j87K6TCKnDJSOp9cutH0umIUXKlg5cEwTEVhlKRXbC
is2ZhL/7ElywkuE+GlHQdx/slj0TAsbKh0pjTUmjrv17cfLosaWOzzYBimojBK0is2FNZsTEBkOcH7ogruXrUNC+5fi6dfe98wPLg2JBs672tDctnUwGLlwTBMReFGVJXZmLMn1qWd4LUhOf2z3oiCwVgC96zaDpko47qgJKEvqhg67/uiiuOKzy0qzmzFJdkZZmijRjXdpjP9FOM7yDWmmtPx0o6DuHvVNvRGlPR1QZkQixsrsrlNI7NyQbp6IyUNJy4GjrZiGKYiccPpnGvMrt4IFty/FoOxE4t/OEC4YcE0PLlhN0KybMmH8Xzb3iwlVSqfh51oK1YeDMMwDqFd+AdicRARqgIyokoCN5/fjGvnTzZUZHql5FW0FSsPsPJgGKY4Cl3Au3oj2LbvGG5c1oqIxmRVFZSw4fYLACBjXD91GOQ8D4Ypc8ohzt/vFPMMi1nQG4aFUVcdREiWMpRHUJLw9Gt78JOX29Pj3rl4Fu5bvd3RsOJSwcqDYXyGn95E7VDIYu2WkizmGTqRJ2IUnRVVFDz6Ujsi8RPj3rNqO4JSdjRWOXQY5FBdhvEAs1pI5drremXbXiy4fy2ue/w1y82NCrnGCsU+QyfyRIzqaN18/nSE5MxxJQKicSXjmF+jq/TwzoNhSkyut+JSlg13ikLe1N3IAlfp7B6ASGT6chUlgW37jmHRjDF5r3cqT0Rblr02JGNfzwAi8XjGOYOxBIIyISAJVAcDSVPWpbPSisqv3znAOw+GKSn53opLXTbcCQp5U3cjC1ylNiQjomQqj1gCuHFZq6XdTaHVd412kw3Dwnivqw+LH1mPG3/RiqiSfV1MEZCI8NWLZuCWj8zAfS9sd3w35ga882CYEpJvZ+FGgpvbFKLw3FSSfVEFVUEpI98CACLxhOXdjVEzp1yY7Sa1Lwu5iCoC9//+HQzEktqlHJznrDwYpoRYWTTtLlxeU4jCc1NJ5lJAdkyAVqvvGpngvvbrTZg1YUS63IhR90A9quLQIhLCtybLslAeRHQSgDsA1AkhlngtD8MUitVFs9Cy4V5RiMJzS0mqz/hrv27LMhM5tbtpP3AcbR1HMa9ppKGCiCoCl/x4Pe66bJZh90CrRBSRUTPLT3imPIjoSQCLARwUQpyiOX4xgB8BkAE8LoT4rhBiF4DPENEKb6RlKp1S5lWU287CKoUoPLeUpABAJCEkA1ElgbBMIKmwdq/6341vPrcFy17dk/751S2NhgoiGk/gvtXbceels3DfC9uRUBKI2tQjVcFkEUU/4uXO4+cAHgGwTD1ARDKARwF8FEAngDeI6HkhxHZPJGSGBF7kVZTbzqKcUM1I2gQ9QYQXbl6I5nHDLV2vKgttb42oouCaMydnKA4AWN7aiSvmTsDKTR9kjSUSAqdMqsOG2y/ASzsO4q7nt9lSBkIkdx6bOo767kXDM+UhhFhHRFN1h88C0J7aaYCIngVwBQBWHowruBkyyniDUVBCWLb2Bq99kYgqCSiJBOKJEw7sn2983/C6NVv3Gx5XzU4Nw8I4/+Sx+I+VW23dSzwBXPyjdekwXj8ljPotVHcSgA7N504Ak4iogYh+CuA0IvqG2cVEdBMRtRJR66FDh9yWlakAjHICVCflUMUsgbFcKDSSSx9GHYknFYcV9Ml/KlqzU8OwMK4+ozHj55eeMh5Xt2Qe0yacKwmBeAK+TBgtC4e5EKILwOctnLcUwFIgWRjRbbmY8scoJ8DPTkq3KdfSKFpUh/mtKzZBJgmKsBbJZbRjscLVLZPw27fM8zEa66tTxRJ78PRrmSav32/bj4BO8SRyrFx+Shj1m/LYC6BJ87kxdcwy3AyKsYNRToCfnZRuUkkmvOT6SwABEJm1o8yCI4x2LEGZQBCGyX1Xnj4J15zZhP6YYqg8woGkg171mxAIuvcUKAII2Khs7qeEUb8pjzcATCeiaUgqjU8BuNbOAEKIVQBWtbS03OiCfEyFYfaH6Jc/0FLiVmkUJyLZ7Ixh5D
BXlaDWAa7fWZmFUS9oHo3HX9mF//zfXRnzrGzbi1Wb9oGIssxbNSEZP73udMyeWJfVIEqPfucrUVJphWQZA7E4hACqgjIUIXyVMOplqO4zAM4DMJqIOgHcJYR4gohuBvAHJEN1nxRCbPNKRqbyKceMbrdwI+vbCTOY3THMlODGv3Xh1l9vQlQRpjsrszDqi0+ZgF+++n5Gi9mkwhBQ9zlaEkJgYl01XtpxMKuHeT4kAu65bDZef68bqzbtRTCQdN7ffflsX5kQK64ZlMZsdeO7777rtThMmcD9M5I42QLVqC2r2hDJTsl2u2MYXROQAIkIUd1b/vBwAE99dj7mNo3MK8f8b7+Y14FeHZAgCLj6jEYsf7MTAYkyFE4xBCTgtX//iKu/n0O6GRSbrZhC4LyLJE4mMDphBitkDP1uMqooSAhkKQ7AeR/CDQun4u9Pa8TiR9ZnmapqQzLiiQQ++ndjsWbLAZtu+eROx2pl4FJQccqDYRh/4IQZrNAxtEqw40gfbluxBTElcwcQkjMzznP1EX/6tT2WwnZ/tm4XEgJZCq8mJOGKeRPxqzc6sHrLAYt3n83e7v6Cr3WailMeHG3FMIXjZKiuE/6kQsNu1WvXtx/GbSsynecAEApIWPPFExnn+vtWzU5qsmDUYsJHPAH85//ugqxzc/RHE/jVGx1Z0VZ2OdIX8U22ecX5PFRaWlpEa2ur12IwTNnghI/CbNxizGAr2/bithWbIUsEJSHwwBJrCs3ofoBkCO0DS+amx2g/cByXPPyKoVmrUFRZnSYckBCS3cvBsePz8FuGOcMwHuFWg6aGYWHMbRpZkOLQht32R5V0Tw6jLGt9ZrzR/dSEZDx2fUt60V3ZtheX/Hi9o4oDAAIEhF3IM43E/dOemM1WDMMA8L6LodEOxarD3MjctqB5dNb9qCG0mzqOIhZX0qG7TqPP3XADr7PNK27nIYRYJYS4qa6uzmtRGMZX5KtZVWj7VSdY2bYXC+5fm9V+1YpCM2vtCyDjfsIBwgUzx+DSH7+CTy7diCU/e9VQcQSlpNnJ73idbV5xOw+GYbKx6gj3otdIvrIo+ZzuuYpbqvfz9Gt78OhL7VizVY10Mt4ZhAISvn/VXHx1eRv8WqCmNuSPbHNWHgxT4ditWVXqnJdci3/DsHBehWaluOVPXm7PirjSE5IJDy6Zg5PHDy+J2akQgjLhP687A7MnjvA82qrizFZEdBkRLe3p6fFaFIYpGCfLorvlCNdTqMxWFv9cTne1uKUWbXHLbfuOQUJuM1QoIGHNl87F5fMmGY7nF6oCMuqqg54rDqACdx6cYc6UO06XRbfqN8hnqsp1jl7mOy+dhVMm1VkyfRVb2djI7p9IJJWPGuZrtuvQtqdVcz78XBQzEld80y6g4pQHw5QzbpRFz+c3sKKscp1jJPMdz21FdVBCQoiMnAojnKhs/IXzmvHjtTshBBBLAJJEuOThV6AkhGFi3qWnjMdXPjoDfVElS8GpTZv07Wb9ABFh8SPrfdFnhZUHw/iIQmo5Wdk1XD5vEmZNGIG2jqOY1zQy/ZZtRVnlO8esidJAaidxy/K2nMqvmEx0VamJRGbPjVwl0AHgz+8cxFc+OiPjmPocY3EFz77RYXKlt6g7KD/0WWHlwTA+wm6uhVUTl9l5VpRVvnOMZNZipaBfIVFeWqVmF5EQuOTH6xGWT5Qj+VVrB4RJAUW/4XWOB8AOc4bxFXZyLczyG/QOa6Pzbl2xCet2HkRtSM6rrPIpNK3MVQGzJSX/gmzkFNc74bWfjQIBrBJRBKKabO1lr+5BJC7KQnEA3ud4ABW482CHOVPuWH0Lt2riMjovEhf4/FNvISEErm5pxPLWTlOTkRWzkirztn3H8JlfvIGYZhEOyoTZE60n7aqKYeveHtz3wvYTxQp1ct65eFbOHY8RYZkgkPSJFLJj8QNqe1uvI664MCLDlClWCxmaFQjUXrP65oWGzmOtPwWA4b/1i9jzbXtxawGFDI
ET5jUrTZRCMuFrF83E91/cCZEQlnIzAjLhZ58+HZ976k1LJdZlAohg6dxS8C//5yR89tyTXFMcrjSDIqLvAfh/AAYA/B7AHABfEUI8VZCUDMMUhZUdgbr433npLNz3wnZIIPTHMhfloJQMi9V30zPzk+TzsxSapd5+4LitWlNRReDBP+3EXZfNwikT6xCLK7j2iddzlk+PK8kdl2JBGQQlQBGwdG4pIAB/N8H75EAVyzsPImoTQswjor8HsBjALQDWCSHmuilgofDOgxkqmEVbZeVeLJ6Fpvoa3LisNSPvQb/zAIBt+3pS54ms8/Rd8oop2641Ud2zervlvhlatPOrbXSt7kTMCEgEImSY3/yATMDrd7jXitatNrTquZcC+LUQoodsNnZnGMZ5jMqJGIXX3rd6OzbcfgEeWJK5W7n6jGTb1KAkYTCuQAiBUEDOUBxAcofS1nG06NayKlZNVFUBCQmRQEwxdrurpUwAYEpDLZ664Sxc+8TrsOKkNyPuQi8OJ1AEsPFvXVg8d6LXothSHquJaAeSZqt/IaIxAAbdEatwuCQ7Uw4U2yApH7mc6VqzUm1ITu8ktOfGDbK7Y4kE5jWNdKRsu5UwWwmALBMkiRCPm6uCiCLw6q4u/ODFnQhKEiJKAlShvlwgmTeTEMLzJEHLcW5CiK8D+DCAFiFEDEA/gCvcEqxQuCQ743fMyo87iZXw2rlNI9EXVfKGu9YE5XTIcPO44ZZCifPVubISZptA0mzUH1VyOqxlAh760850KHI0nvBtYUMniCrC80ZQgD2HeQ2AfwUwGcBNACYCmAlgtTuiMUxloI9Ycrr8iBFWs7bzJfjJBPz0HzOruOZziFtJXMw3rx0UAVRJQNSR0coDPyQJ2jFb/ReAN5HcfQDAXgC/BisPZohjp2DgF85rdsxnkA8rUU9aJSMTZRUjlCQyLP9tVrbdam2ufPPaIRyQ0Bf1SUhUiSi3JMEPCSE+SUTXAIAQop/YY84McewWDHzkpXborfduLgRWenOoSualHQdx96ptGc7rqoBsS7HZqc2Va147JBzawZQDVTKQAOHOS2d5HrJrJ7c/SkTVSP3mE9GHAHhrdGMYD8lXHsTIrh+SJdx8/nRPWr3momFYGOefPDYrysiuYjMyR0UVBT0D0QwbffuB41jR2oHuvijOP3ksBmPWFcfMcbUZn/9uwgjL15Y7g0oyY/++F7a74iuzg52dx11IJgc2EdHTABYA+L9uCMUw5UAhBQNjiQSunT8Z186fnGFOcjv6ygrFVLfVjqEvZx5PAF94+u30eK3vHcn4+dUtjUgaMaw5ud850Jfxefu+Y5blqwRUE53XlXUtKw8hxJ+I6C0AZyOZ7PhlIcRh1yRjGJ9jtWCg2WJsp5+G05gpK7vZ4fpxunojWP5mZ8Y5SkLgeCQOAPiaQQb58tZO1ASlghPy4pUbWJUTmchTp3le5UFEp+sOfZD6/2QimiyEeMt5sRjG/9gpGGi2GLvR/Ckfye56myCTBEUkspo1af0kuWpbPf3aHjz60rsIyXK6e2A4IEHO4QqNmyiIqF9qgJQRMcVbp7mVncdDOX4mAFzgkCyOwEmCTD6cNBFZjWgqpjKuk/J29Ubw1eVtqbyJpJ/BrFmTdkc0EIuDiFAVkDEYV5DQdOiLxJO7ijue24rakJwzcspMRdz2sZNx/+93GHb9KxVBCSCSykaR3XXZbH+H6gohzi+FIE7BJdn9hx/s+SpumIisRDSZkc/0ZUVeO893275jWQl3Rs2ajHZEgEBMieccv5CQ20tOHYfacACyJEHxcOEWIul/GTs8hB/8ud0zOaxw+dwJ+PTZUzyVwVY/DyI6BcAsAFXqMSHEMqeFYioHL+z5ZrhlIjKy+1t1hucyfVmR1/7zNX61PzYQxaaOo2kZzVrLOsnpk0di694erNt5GGu2HDA85/K5E7Bmy/6S1JqKC+Dp1/cgJPs/A+H5TR9g/knv49PzvVMgdjLM7wJwHpLKYw
2AjwNYD4CVB2OIF/b8XBTSHzwf+sVb37Do6jMasfzNzpyLu5npK5+8hTzf2RPrEJQpwzktEXDL8k0ISCd8IAuaRzuWAW7GW3uOAkiG8uqRCfjRp05D06gavLTjUNrhXgrKpZvgPau24+LZ4z3bzdvJ81gC4EIA+4UQ/wxgLgAuIMWYYpTnoC5+XmC3P3i++kxGeR7LNu7J/PzqnrxtYgHjFqz55C3k+TYMC+Ohq+YiHJBQE5LTb9lRRaA/piASF7hleRsApGtYVZu0lq0OSAgHJHz1ozPwrb8/BeEA5WhDaw9FAONHJMOde0uoOMqJoEye/S0B9sxWA0KIBBHFiWgEgIMAmlySi6kA7C7WVijGf6KaiG7VRBqZ5TFYMQcVYtqxs9PJF81V6PPV7nQ6jvTh5mfaMn6u+kDU81Zt2oe7V23PGuf2j5+My+ZORMOwMFa27QWQrIAblAlCCIQDuZ3n+bjm8ddw60UziyisXtkoCeFptJWd14RWIhoJ4DEka1y9BWCjK1IxFYG6+DmVTe1ENdrkQkTJTCUY27bzZY6rFFLcz67yvHzeJKy+eSHuumwWVt+8MCukttDnq+50RlQHDX++69BxdPVG0DAsjMvmTkRQ5wcIypRWHOrzisQT6I8qiCkCsiThX8/7kOHYVjcnMUXgO7/bYe3kIURVAL6oTGAnSfBfU//8KRH9HsAIIcRmd8RiKoVcoax2dhFO+E+0i5yK0RhWfSNGO4Msn4fus90/eLdavqoY+UAA4ME/7sR3f/9Oer6Hrpqb1Zc8l28mIBGODhjXub3x3JPwX395D0EpGRa75PRGrHirM+N7UeFdRzZxQbjtohme9/OwkiQ4BcBRIURP6vP5AD4B4H0i2iGEGEqVkJkCMApltRsl5ISz2+oYdsxBRov3ly+ckfOzVexUqC30DVT1gdy6YjMkAgZSzZnUIoXqfPr7BJCOzjJ6Xn1RBf/9WofhnBeePBZXnt6Ito6jmNc0EvW1IfyqdY/huUw2cUXg22uSO7KbFhnv7kqBlZ3HcgB/D6CHiOYhWYb9O0g6zH8C4LPuicdUIoXsIpzwn1gdw26NJ/3ine+zFbp6I3hpx8GsbG2nyrdrd325qttq51P/M1L8RuXVzfwdS372KmSJUBOUEVUU3LBwGszSOwhAKCAZ7kqGOt9eswO14YBn4bpWlEe1EGJf6t/XAXhSCPEQEUkA2nJcxzCGFJJV7VTRvlw5Fdr5ijUHFYO2t7d+AXaifLvZru/8k8fiP1ZuzThXXxF3274e3LZiEyJxkf7+vvbrTVjzpXOx4fYL8NKOg7jjf7bk7eSnrXf1ny/vMj0vIFsvmDgU8TJc14ry0L76XADgGwCQirxyRagsAYhqkdzlRAG8LIR4uiQTM65QaFa1Ewu60Rhm8xVjDioUs97etSEZihBFO0nNdn2zJoxAX1TBnZfOwn0vbE+XJIkrAjf+4k3EEwkQAaGAjIiuEmFUEbjkx+vx4JI5mNc00tkWsAmBGRPqsGVfj3NjVhABCZ4VR7SiPNYS0XIkCyLWA1gLAEQ0AUV0fiSiJwEsBnBQCHGK5vjFAH4EQAbwuBDiuwD+AcAKIcQqIvoVAFYeZUwxWdVOLOj6wn9+SGRUdz49A7GsXVltWMY9l83G+SePLbp8u9GuTySSi39YTn4Xdy6ehab6GvzTk69DACdKhgggbmKKisaTEWlL/7EFVUEpS/kVSkyAFUcO4op34bpWlMe/AfgkgAkAFgohYqnj4wHcUcTcPwfwCDQZ6kQkA3gUwEcBdAJ4g4ieB9AIYEvqtMIDxxnfUGhWtdPYnc+NOl3anU9USUDR7cqUhEgrjmLLvRjt+pI7BYFoyq9w76pt+MzCabaNRcmERTYxlRIviyNaKYwoADxrcPxt7Wci2iiEOMfqxEKIdUQ0VXf4LADtQohdqTGfBXAFkoqkEUkfizMprIznGO0iGuurMRDLzCgeiMVde7uy44h3sk6XqoRqQ3LWzicoE8
IBpEud26l1lQ+jZk0BCRnFEiNxgcde2Z13rOzr4th1qBe3fGQGvv/iTkgg9NvoEMjY498vOdnT4oi2CiPmoSr/KXmZBEAb39cJYD6AhwE8QkSXAlhldjER3QTgJgCYPHmyA+IwXqDvKuemb82qI95J85ZWCUWURFYlWQnAQ1fNxWAsgXlNI9E8bjgAZ3ZlRs2ajAKZ8jVmCkjAly+cgUdfbkdQktAbiSOqAHev+iuAZEHDmeOG44E/7rQkF2OP2rCM+dMaPJXBSeXh2n5VCNEH4J8tnLcUwFIAaGlp4f1zGdLZPYCqgJxR+rsqILvqFLTiiHfKnGZc6jyTiCLw1V9vRkjO3OE4Ea7c2T0AoatQW0g8UzwBfPyU8bh2/mRs/FsXbn4mwxCB5zd9gJC83+aojFW8Lk0C+M8EtBeZ9bIaU8cYn5OviKBV3KiHZQWjwoRuyGVUzNCISDy7NIoT5V5qQ3JWNFQ+xSEbiBsOENo6klVxB01MU+VSnbbcCMrwvDQJ4OzOwwnbwhsAphPRNCSVxqcAXGtLCO4kWHKc9AU4kc/hBnbk0jvV2w8cT2dTF1IPS7vDsdLWNtcOqi+q2I6GMkrgi8QF7l61DXc8twXXnMkm4lJBAH73pUVpU6ansiT94Q4MRHSKEGJr/jPT5z+DZH+Q0QAOALhLCPEEEV0C4IdIhuo+KYT4ViHytLS0iNbW1kIuZWzQ1RvBgvvXZixGVUEJG26/wNEGS34hn1x6RXrmlHq80t6V/vn150xGy5RRGUqoZUo91mvOkQjQWpasPk+rXQf135cdqmTCoMGOQiaC4tBawphz5WkT8e+XznLtb4KI3hRCtFg6N5/yIKLjMN7ZEpLBWCPsi+gemp3Hje+++67X4pQNhS7WmzqO4rrHX8to1jM8HMBTn52PuU0j3RDVt1hdmF/8yiLU14bS0VYX/2hdhtNaomTElTbaKt9OzooSV7/jrXt70omARsorFzeeOw3L/vKeaSJgQCJHu/6dPa0eg7EE2jo51wMAhoVlxBPCtY6cdpSHlVBd7/dHNuAe5vYpxuzklY/CT+RK8DOireMolrQ0oWFYGOt2HsqKdkqIZLRV06hay8o8n0Nf/x3fuXgWTplYh9qQjMWPrLd0nxIBH5s1LmcYr9PtYt94rztVooQBsgtWlkVJdhUiGovMHuZcDrOMKTYE1a8+ilKRL8HnTlGaAAAgAElEQVTPiHmaHdne7n7Dc44Pxk13bka7xFxK3Og7vm/1dmy4/QJbDa0kAl7bfSTveU6iCCBUUDxYZSEBGd+QTORZWRIVOz3MLwfwEICJSHYRnALgrwBmuyNaYbDD3B5OhKB6WUTQS4wW5aS5SSAgyVBEAmdNHZXl89A6OyNx40gls+Mr2/bi1l+3gSBBIIEHr5qXrsNlpsQ3dRxFTLe9icUT6e/LqgM/ngAeXttu6VwniZiV3B1C6J9AX1TB1n09npqG7ew87gNwNoAXhRCnpfp6XOeOWIXDZit7OGV2crqIoF8d5lqMFG+yQEeqU6EgXNXShC9fOB3r3j2MRdNHo0WX2LWweQyS72AwOJ5JV28E//ZsW+odPDnnl59tSxc1XNA8Or2b0D63WFyB3kWhiORxrdLRllM3w6w0+pcvbMbP1u0yHEMmZM1vi6G96TDlnue3oqm+GrMn1nnyN2InzyMmhOgCIBGRJIR4CYAlxwrjX5xuFesETrSbLQVmdaKiikB/VEEknsAty9vw6Sdex39teA/XPfl61r00jxuO68/JDHW9/pzJqK8NZeXNbPxbV9Y6KgB8/OFX0s/q91uzE/Pe6zI2janHL583CRtuvwD3XznH8LwqCz6H6WOHY/XNC3Hrx2ZmxewX6wbhfYcxUQX4/FNvefY3YmfncZSIhgF4BcDTRHQQQJ87YhUOm63s4yezk1+q3FpBbyqKxBUkhMhwgMcTQDyRSL+xG93LvVeciuvPnprOBdn2wTEsuH9tVgDD4d5BQzliikhn5N/x3N
\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "viz = data.filter(['User_Score','Critic_Score', 'Global_Sales'], axis=1)\n", "viz['User_Score'] = pd.Series(viz['User_Score'].apply(pd.to_numeric, errors='coerce'))\n", "viz['User_Score'] = viz['User_Score'].mask(np.isnan(viz[\"User_Score\"]), viz['Critic_Score'] / 10.0)\n", "viz.plot(kind='scatter', logx=True, logy=True, x='Critic_Score', y='Global_Sales')\n", "viz.plot(kind='scatter', logx=True, logy=True, x='User_Score', y='Global_Sales')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "리뷰 스코어들과 매출 사이의 관계에 대한 우리의 직관이 타당해보입니다. 우리는 또한 데이터셋으로부터 다른 관련 피처들을 추출할 수 있습니다. 예를 들면, ESRB raing은 'E'가 있는 게임이 일반적으로 성인 등급에 대한 연령 제한이 있는 \"M\"게임보다 더 넓은 사용자에 도달하기 때문에 영향이 더 미치지만, 다른 피처들, 장르 (shooter혹은 action)에 따라 M-rating 게임들 또한 큰 히트를 칠 수 있습니다. 모델은 이러한 관계들과 다른 것들을 배우게 될 것입니다. \n", "\n", "다음으로, 데이터셋의 피처들을 살펴보면 제외해야 할 몇개의 컬럼을 식별할 수 있습니다. 예를 들면 매출 숫자를 지정하는 5개의 컬럼이 있습니다: 이 숫자는 우리가 예측할 타켓과 직접적으로 연관이 있기 때문에 삭제해야 합니다. 게임 이름과 같은 다른 피처들도 관련이 없을 수 있습니다. " ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "data = data.drop(['Name', 'Year_of_Release', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales', 'Critic_Count', 'User_Count', 'Developer'], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "컬럼 숫자를 줄였으니 이제 얼마나 많은 컬럼에 데이터가 누락되었는지 체크해 보겠습니다:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Platform 0\n", "Genre 2\n", "Publisher 54\n", "Critic_Score 8582\n", "User_Score 6704\n", "Rating 6769\n", "y 0\n", "dtype: int64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "이 데이터셋의 Kaggle의 개요에서 언급한 것과 같이, 많은 리뷰 rating이 누락되어 있습니다. 불행하게도 그것들은 우리가 예측의 위해 의존하는 중요한 피처들이고 그들을 대체하기 위한 방법이 없기 때문에 해당 피처가 누락된 행은 삭제할 필요가 있습니다. 
" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "data = data.dropna()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "이제 User_Score 컬럼에 있는 문제를 해결할 것입니다: 그것은 'tbd'라는 문자열을 포함하고 있기 때문에 명백히 숫자라고 할 수 없습니다. User_Score는 범주형 피처보다는 숫자타입이 더 적합하기 때문에 문자열 타입을 숫자형으로 변환할 필요가 있고 일시적으로 tbd값들을 NaN으로 채울 것입니다. 그 다음으로 새로운 NaN 값들을 어떻게 해야할지 결정해야만 합니다. 우리는 이미 많은 행들을 버렸기 때문에 이 행들을 복구해야 합니다. 첫번째 근사치로서 Critic_Score 컬럼을 이용할 것이며, User Score는 Critic Score를 따라가는 경향이 있기 때문에 Critic Score를 10으로 나눈 값을 사용할 것입니다. \n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "data['User_Score'] = data['User_Score'].apply(pd.to_numeric, errors='coerce')\n", "data['User_Score'] = data['User_Score'].mask(np.isnan(data[\"User_Score\"]), data['Critic_Score'] / 10.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "최종 데이터 전처리로서 범주형 피처들을 one-hot encdoing 메서드를 사용하여 숫자로 변환하는 것을 포함합니다." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Critic_ScoreUser_ScorePlatform_3DSPlatform_DCPlatform_DSPlatform_GBAPlatform_GCPlatform_PCPlatform_PSPlatform_PS2...Publisher_inXile EntertainmentRating_AORating_ERating_E10+Rating_K-ARating_MRating_RPRating_Ty_noy_yes
076.08.000000000...0010000001
282.08.300000000...0010000001
380.08.000000000...0010000001
689.08.500100000...0010000001
758.06.600000000...0010000001
\n", "

5 rows × 333 columns

\n", "
" ], "text/plain": [ " Critic_Score User_Score Platform_3DS Platform_DC Platform_DS \\\n", "0 76.0 8.0 0 0 0 \n", "2 82.0 8.3 0 0 0 \n", "3 80.0 8.0 0 0 0 \n", "6 89.0 8.5 0 0 1 \n", "7 58.0 6.6 0 0 0 \n", "\n", " Platform_GBA Platform_GC Platform_PC Platform_PS Platform_PS2 ... \\\n", "0 0 0 0 0 0 ... \n", "2 0 0 0 0 0 ... \n", "3 0 0 0 0 0 ... \n", "6 0 0 0 0 0 ... \n", "7 0 0 0 0 0 ... \n", "\n", " Publisher_inXile Entertainment Rating_AO Rating_E Rating_E10+ \\\n", "0 0 0 1 0 \n", "2 0 0 1 0 \n", "3 0 0 1 0 \n", "6 0 0 1 0 \n", "7 0 0 1 0 \n", "\n", " Rating_K-A Rating_M Rating_RP Rating_T y_no y_yes \n", "0 0 0 0 0 0 1 \n", "2 0 0 0 0 0 1 \n", "3 0 0 0 0 0 1 \n", "6 0 0 0 0 0 1 \n", "7 0 0 0 0 0 1 \n", "\n", "[5 rows x 333 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "if data['y'].dtype == bool:\n", " data['y'] = data['y'].apply(lambda y: 'yes' if y == True else 'no')\n", "model_data = pd.get_dummies(data)\n", "\n", "model_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "모델의 오퍼피팅을 막기 위해 데이터를 3개의 그룹으로 랜덤하게 나눌 것입니다. 구체적으로 이 모델은 데이터의 70%에 대해 학습할 것입니다. 그 다음 데이터의 20%에 대해 평가하여 새로운 데이터의 정확도를 추정합니다. 최종 테스트 데이터셋으로서 나머지 10%를 남겨둘 것입니다. " ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "train_data, validation_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data)), int(0.9 * len(model_data))]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Amazon SageMaker의 XGBoost는 CSV와 libSVM의 입력 데이터를 지원합니다. 우리는 여기서 피쳐와 타켓 변수를 별도의 파라미터로 제공하는 libSVM을 사용할 것입니다. 무작위 순서변경으로 인한 오정렬 문제를 피하기 위해서 위의 셀에서 분리가 완료 후 수행합니다. 학습 전에 마지막 단계로, Amazon SageMaker에서의 학습을 위한 입력으로서, 결과 파일들을 S3에 복사할 것입니다. 
" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "dump_svmlight_file(X=train_data.drop(['y_no', 'y_yes'], axis=1), y=train_data['y_yes'], f='train.libsvm')\n", "dump_svmlight_file(X=validation_data.drop(['y_no', 'y_yes'], axis=1), y=validation_data['y_yes'], f='validation.libsvm')\n", "dump_svmlight_file(X=test_data.drop(['y_no', 'y_yes'], axis=1), y=test_data['y_yes'], f='test.libsvm')\n", "\n", "s3.Bucket(bucket).Object(prefix + '/train/train.libsvm').upload_file('train.libsvm')\n", "s3.Bucket(bucket).Object(prefix + '/validation/validation.libsvm').upload_file('validation.libsvm')\n", "\n", "s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='libsvm')\n", "s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='libsvm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## 훈련\n", "\n", "우리의 데이터는 XGBoost 모델을 학습하기 위한 준비가 되었습니다. XGBoost 알고리즘은 많은 튜닝 가능한 하이퍼파라미터를 가지고 있습니다. 이러한 하이퍼파라미터의 일부는 다음과 같습니다. 처음에는 몇 가지만 사용하도록 할 것입니다. \n", "\n", "- `max_depth`: 트리의 최대 깊이. 주의사항으로, 이 값이 너무 작으면 데이터가 언더피트의 가능성이 있고, 반면 증가하면 모델이 더욱 복잡해지고 오버피트의 가능성이 있습니다.(즉, 고전적인 Bias-Variance의 트레이드오프)\n", "- `eta`: 오버피트 방지를 위한 업데이트에서 사용되는 단계 크기 축소 \n", "- `eval_metric`: 검증 데이터에 대한 평가 메트릭. 불균형이 있는 이와 같은 데이터는 데이터의 경우는 AUC 매트릭을 사용합니다. \n", "- `scale_pos_weight`: 양과 음의 가중치의 균형을 제어하여 불균형한 클래스를 갖는 데이터셋에 유용합니다.\n", "\n", "먼저 우리는 Amazon SageMaker Estimator 객체를 위한 파라미터를 와 알고리즘 자체의 하이퍼파라미터를 설정할 것입니다. Amazon SageMaker Python SDK의 Estimator 객체는 최소한의 코드로 Training job을 설정할 수 있는 편리한 방법입니다. 
" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training job videogames-xgboost-2019-11-04-12-49-17\n" ] } ], "source": [ "job_name = 'videogames-xgboost-' + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "print(\"Training job\", job_name)\n", "\n", "from sagemaker.amazon.amazon_estimator import get_image_uri\n", "\n", "container = get_image_uri(region, 'xgboost', '0.90-1')\n", "\n", "xgb = sagemaker.estimator.Estimator(container,\n", " role, \n", " base_job_name=job_name,\n", " train_instance_count=1, \n", " train_instance_type='ml.c5.xlarge',\n", " output_path='s3://{}/{}/output'.format(bucket, prefix),\n", " sagemaker_session=session)\n", "\n", "xgb.set_hyperparameters(max_depth=3,\n", " eta=0.1,\n", " subsample=0.5,\n", " eval_metric='auc',\n", " objective='binary:logistic',\n", " scale_pos_weight=2.0,\n", " num_round=100)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "다음으로, 우리는 Trainig job을 실행할 것입니다. Training job을 위한 하드웨어는 노트북 인스턴스와 별개이며 Amazon SageMaker에서 관리합니다. Amazon SageMaker는 훈련 Cluster를 설정 및 작업 완료 시에 해제하는 것과 같은 무거운 작업을 수행합니다. 다음 한줄의 코드로 Training job이 시작됩니다. 
" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2019-11-04 13:07:06 Starting - Starting the training job...\n", "2019-11-04 13:07:08 Starting - Launching requested ML instances......\n", "2019-11-04 13:08:37 Starting - Preparing the instances for training......\n", "2019-11-04 13:09:31 Downloading - Downloading input data\n", "2019-11-04 13:09:31 Training - Downloading the training image...\n", "2019-11-04 13:09:59 Uploading - Uploading generated training model\n", "2019-11-04 13:09:59 Completed - Training job completed\n", "\u001b[31mINFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training\u001b[0m\n", "\u001b[31mINFO:sagemaker-containers:Failed to parse hyperparameter eval_metric value auc to Json.\u001b[0m\n", "\u001b[31mReturning the value itself\u001b[0m\n", "\u001b[31mINFO:sagemaker-containers:Failed to parse hyperparameter objective value binary:logistic to Json.\u001b[0m\n", "\u001b[31mReturning the value itself\u001b[0m\n", "\u001b[31mINFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[31mINFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode\u001b[0m\n", "\u001b[31m[13:09:49] 5614x331 matrix with 33684 entries loaded from /opt/ml/input/data/train\u001b[0m\n", "\u001b[31m[13:09:49] 1604x331 matrix with 9624 entries loaded from /opt/ml/input/data/validation\u001b[0m\n", "\u001b[31mINFO:root:Single node training.\u001b[0m\n", "\u001b[31mINFO:root:Train matrix has 5614 rows\u001b[0m\n", "\u001b[31mINFO:root:Validation matrix has 1604 rows\u001b[0m\n", "\u001b[31m[0]#011train-auc:0.778801#011validation-auc:0.765479\u001b[0m\n", "\u001b[31m[1]#011train-auc:0.79049#011validation-auc:0.776439\u001b[0m\n", "\u001b[31m[2]#011train-auc:0.794468#011validation-auc:0.779476\u001b[0m\n", "\u001b[31m[3]#011train-auc:0.800483#011validation-auc:0.788066\u001b[0m\n", 
"\u001b[31m[4]#011train-auc:0.807711#011validation-auc:0.792379\u001b[0m\n", "\u001b[31m[5]#011train-auc:0.809722#011validation-auc:0.793961\u001b[0m\n", "\u001b[31m[6]#011train-auc:0.810795#011validation-auc:0.792471\u001b[0m\n", "\u001b[31m[7]#011train-auc:0.812122#011validation-auc:0.794129\u001b[0m\n", "\u001b[31m[8]#011train-auc:0.814009#011validation-auc:0.795064\u001b[0m\n", "\u001b[31m[9]#011train-auc:0.814834#011validation-auc:0.795198\u001b[0m\n", "\u001b[31m[10]#011train-auc:0.81763#011validation-auc:0.799186\u001b[0m\n", "\u001b[31m[11]#011train-auc:0.819472#011validation-auc:0.800404\u001b[0m\n", "\u001b[31m[12]#011train-auc:0.821013#011validation-auc:0.802302\u001b[0m\n", "\u001b[31m[13]#011train-auc:0.82369#011validation-auc:0.80253\u001b[0m\n", "\u001b[31m[14]#011train-auc:0.824144#011validation-auc:0.803563\u001b[0m\n", "\u001b[31m[15]#011train-auc:0.826624#011validation-auc:0.806042\u001b[0m\n", "\u001b[31m[16]#011train-auc:0.828072#011validation-auc:0.808865\u001b[0m\n", "\u001b[31m[17]#011train-auc:0.831149#011validation-auc:0.80937\u001b[0m\n", "\u001b[31m[18]#011train-auc:0.832595#011validation-auc:0.808052\u001b[0m\n", "\u001b[31m[19]#011train-auc:0.83325#011validation-auc:0.808756\u001b[0m\n", "\u001b[31m[20]#011train-auc:0.834194#011validation-auc:0.808151\u001b[0m\n", "\u001b[31m[21]#011train-auc:0.836241#011validation-auc:0.810824\u001b[0m\n", "\u001b[31m[22]#011train-auc:0.837778#011validation-auc:0.813709\u001b[0m\n", "\u001b[31m[23]#011train-auc:0.838762#011validation-auc:0.814457\u001b[0m\n", "\u001b[31m[24]#011train-auc:0.839512#011validation-auc:0.813747\u001b[0m\n", "\u001b[31m[25]#011train-auc:0.840195#011validation-auc:0.814684\u001b[0m\n", "\u001b[31m[26]#011train-auc:0.842162#011validation-auc:0.816531\u001b[0m\n", "\u001b[31m[27]#011train-auc:0.843906#011validation-auc:0.81694\u001b[0m\n", "\u001b[31m[28]#011train-auc:0.84562#011validation-auc:0.817372\u001b[0m\n", 
"\u001b[31m[29]#011train-auc:0.846485#011validation-auc:0.818337\u001b[0m\n", "\u001b[31m[30]#011train-auc:0.848618#011validation-auc:0.820157\u001b[0m\n", "\u001b[31m[31]#011train-auc:0.849204#011validation-auc:0.820631\u001b[0m\n", "\u001b[31m[32]#011train-auc:0.851065#011validation-auc:0.821782\u001b[0m\n", "\u001b[31m[33]#011train-auc:0.851897#011validation-auc:0.822088\u001b[0m\n", "\u001b[31m[34]#011train-auc:0.853238#011validation-auc:0.822793\u001b[0m\n", "\u001b[31m[35]#011train-auc:0.854162#011validation-auc:0.822355\u001b[0m\n", "\u001b[31m[36]#011train-auc:0.855103#011validation-auc:0.822992\u001b[0m\n", "\u001b[31m[37]#011train-auc:0.856192#011validation-auc:0.82483\u001b[0m\n", "\u001b[31m[38]#011train-auc:0.856972#011validation-auc:0.825236\u001b[0m\n", "\u001b[31m[39]#011train-auc:0.857321#011validation-auc:0.826683\u001b[0m\n", "\u001b[31m[40]#011train-auc:0.857793#011validation-auc:0.827152\u001b[0m\n", "\u001b[31m[41]#011train-auc:0.85975#011validation-auc:0.828857\u001b[0m\n", "\u001b[31m[42]#011train-auc:0.861199#011validation-auc:0.828673\u001b[0m\n", "\u001b[31m[43]#011train-auc:0.861739#011validation-auc:0.829101\u001b[0m\n", "\u001b[31m[44]#011train-auc:0.862806#011validation-auc:0.829095\u001b[0m\n", "\u001b[31m[45]#011train-auc:0.863004#011validation-auc:0.829155\u001b[0m\n", "\u001b[31m[46]#011train-auc:0.864234#011validation-auc:0.829926\u001b[0m\n", "\u001b[31m[47]#011train-auc:0.865389#011validation-auc:0.830902\u001b[0m\n", "\u001b[31m[48]#011train-auc:0.866072#011validation-auc:0.83073\u001b[0m\n", "\u001b[31m[49]#011train-auc:0.866688#011validation-auc:0.83173\u001b[0m\n", "\u001b[31m[50]#011train-auc:0.867109#011validation-auc:0.832104\u001b[0m\n", "\u001b[31m[51]#011train-auc:0.867848#011validation-auc:0.832753\u001b[0m\n", "\u001b[31m[52]#011train-auc:0.868071#011validation-auc:0.832541\u001b[0m\n", "\u001b[31m[53]#011train-auc:0.869171#011validation-auc:0.83358\u001b[0m\n", 
"\u001b[31m[54]#011train-auc:0.870338#011validation-auc:0.834622\u001b[0m\n", "\u001b[31m[55]#011train-auc:0.87096#011validation-auc:0.835333\u001b[0m\n", "\u001b[31m[56]#011train-auc:0.871977#011validation-auc:0.836434\u001b[0m\n", "\u001b[31m[57]#011train-auc:0.872843#011validation-auc:0.83723\u001b[0m\n", "\u001b[31m[58]#011train-auc:0.872993#011validation-auc:0.837513\u001b[0m\n", "\u001b[31m[59]#011train-auc:0.873447#011validation-auc:0.837555\u001b[0m\n", "\u001b[31m[60]#011train-auc:0.873873#011validation-auc:0.836535\u001b[0m\n", "\u001b[31m[61]#011train-auc:0.874091#011validation-auc:0.836567\u001b[0m\n", "\u001b[31m[62]#011train-auc:0.8745#011validation-auc:0.836862\u001b[0m\n", "\u001b[31m[63]#011train-auc:0.874914#011validation-auc:0.837149\u001b[0m\n", "\u001b[31m[64]#011train-auc:0.875699#011validation-auc:0.83781\u001b[0m\n", "\u001b[31m[65]#011train-auc:0.876221#011validation-auc:0.838345\u001b[0m\n", "\u001b[31m[66]#011train-auc:0.8768#011validation-auc:0.838519\u001b[0m\n", "\u001b[31m[67]#011train-auc:0.87719#011validation-auc:0.838754\u001b[0m\n", "\u001b[31m[68]#011train-auc:0.877336#011validation-auc:0.838593\u001b[0m\n", "\u001b[31m[69]#011train-auc:0.877831#011validation-auc:0.839167\u001b[0m\n", "\u001b[31m[70]#011train-auc:0.878327#011validation-auc:0.839834\u001b[0m\n", "\u001b[31m[71]#011train-auc:0.878621#011validation-auc:0.839761\u001b[0m\n", "\u001b[31m[72]#011train-auc:0.878942#011validation-auc:0.839909\u001b[0m\n", "\u001b[31m[73]#011train-auc:0.879695#011validation-auc:0.841542\u001b[0m\n", "\u001b[31m[74]#011train-auc:0.879939#011validation-auc:0.841853\u001b[0m\n", "\u001b[31m[75]#011train-auc:0.880311#011validation-auc:0.841607\u001b[0m\n", "\u001b[31m[76]#011train-auc:0.880835#011validation-auc:0.842055\u001b[0m\n", "\u001b[31m[77]#011train-auc:0.881272#011validation-auc:0.842345\u001b[0m\n", "\u001b[31m[78]#011train-auc:0.882109#011validation-auc:0.842463\u001b[0m\n", 
"\u001b[31m[79]#011train-auc:0.882501#011validation-auc:0.843188\u001b[0m\n", "\u001b[31m[80]#011train-auc:0.882929#011validation-auc:0.843293\u001b[0m\n", "\u001b[31m[81]#011train-auc:0.883472#011validation-auc:0.84352\u001b[0m\n", "\u001b[31m[82]#011train-auc:0.883691#011validation-auc:0.843611\u001b[0m\n", "\u001b[31m[83]#011train-auc:0.884096#011validation-auc:0.843169\u001b[0m\n", "\u001b[31m[84]#011train-auc:0.884764#011validation-auc:0.842866\u001b[0m\n", "\u001b[31m[85]#011train-auc:0.885542#011validation-auc:0.842436\u001b[0m\n", "\u001b[31m[86]#011train-auc:0.885901#011validation-auc:0.842554\u001b[0m\n", "\u001b[31m[87]#011train-auc:0.886121#011validation-auc:0.842114\u001b[0m\n", "\u001b[31m[88]#011train-auc:0.886387#011validation-auc:0.842419\u001b[0m\n", "\u001b[31m[89]#011train-auc:0.886605#011validation-auc:0.843055\u001b[0m\n", "\u001b[31m[90]#011train-auc:0.886894#011validation-auc:0.842851\u001b[0m\n", "\u001b[31m[91]#011train-auc:0.886822#011validation-auc:0.843002\u001b[0m\n", "\u001b[31m[92]#011train-auc:0.887083#011validation-auc:0.843142\u001b[0m\n", "\u001b[31m[93]#011train-auc:0.887122#011validation-auc:0.842454\u001b[0m\n", "\u001b[31m[94]#011train-auc:0.88751#011validation-auc:0.842648\u001b[0m\n", "\u001b[31m[95]#011train-auc:0.887956#011validation-auc:0.843361\u001b[0m\n", "\u001b[31m[96]#011train-auc:0.888247#011validation-auc:0.843686\u001b[0m\n", "\u001b[31m[97]#011train-auc:0.88874#011validation-auc:0.845334\u001b[0m\n", "\u001b[31m[98]#011train-auc:0.889659#011validation-auc:0.844977\u001b[0m\n", "\u001b[31m[99]#011train-auc:0.889981#011validation-auc:0.84512\u001b[0m\n", "Training seconds: 43\n", "Billable seconds: 43\n" ] } ], "source": [ "xgb.fit({'train': s3_input_train, 'validation': s3_input_validation})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## 호스트\n", "\n", "이제 우리의 데이터로 XGBoost 알고리즘으로 훈련이 되었으므로, 훈련된 모델을 Amazon SageMaker에서 호스팅된 endpoint에 다음과 같은 간단한 코드 한줄로 배포가 가능합니다. 
" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "---------------------------------------------------------------------------------------!" ] } ], "source": [ "xgb_predictor = xgb.deploy(initial_instance_count=1,\n", " instance_type='ml.m5.xlarge')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 평가\n", "\n", "이제 우리가 호스팅하는 endpoint가 있으므로, 우리는 이것으로 예측을 생성할 수 있습니다. 보다 구체적으로, 우리의 모델이 아직 보지 못한 데이터를 얼마나 잘 생성하는지 이해하기 위해서 테스트 데이터셋에서 예측을 생성해 보기로 하겠습니다. \n", "\n", "기계학습 모델의 성ㅇ능 비교하기 위한 많은 방법이 있습니다. 우리는 간단히 게임이 \"hit\" (`1`) 이거나 혹은 아니인지(`0`)에 대한 실제와 예측 값을 비교할 것입니다. 그 다음 혼동 행렬을 만들어서 모델이 각 범주에서 얼마나 많은 테스트 데이터의 포인트 수를 예측했는지와 얼마나 많은 테스트 데이터의 포인트가 실제 각 범주에 속했는지를 볼 것입니다. " ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "xgb_predictor.content_type = 'text/x-libsvm'\n", "xgb_predictor.deserializer = None\n", "\n", "def do_predict(data):\n", " payload = '\\n'.join(data)\n", " response = xgb_predictor.predict(payload).decode('utf-8')\n", " result = response.split(',')\n", " preds = [float((num)) for num in result]\n", " preds = [round(num) for num in preds]\n", "\n", " return preds\n", "\n", "def batch_predict(data, batch_size):\n", " items = len(data)\n", " arrs = []\n", " \n", " for offset in range(0, items, batch_size):\n", " if offset+batch_size < items:\n", " results = do_predict(data[offset:(offset+batch_size)])\n", " arrs.extend(results)\n", " else:\n", " arrs.extend(do_predict(data[offset:items]))\n", " sys.stdout.write('.')\n", " return(arrs)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".........\n", "error rate=0.144458\n", "CPU times: user 44.3 ms, sys: 505 µs, total: 44.8 ms\n", "Wall time: 353 ms\n" ] } ], "source": [ "%%time\n", "import json\n", "\n", "with open('test.libsvm', 'r') as f:\n", " payload = 
f.read().strip()\n", "\n", "# Each libsvm row starts with its label, so the first token is the ground truth.\n", "test_data = payload.split('\\n')\n", "labels = [int(line.split(' ')[0]) for line in test_data]\n", "preds = batch_predict(test_data, 100)\n", "\n", "errors = sum(1 for pred, label in zip(preds, labels) if pred != label)\n", "print('\\nerror rate=%f' % (errors / float(len(preds))))" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
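<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th>col_0</th>\n", "      <th>0</th>\n", "      <th>1</th>\n", "    </tr>\n", "    <tr>\n", "      <th>row_0</th>\n", "      <th></th>\n", "      <th></th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>616</td>\n", "      <td>55</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>61</td>\n", "      <td>71</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>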
" ], "text/plain": [ "col_0 0 1\n", "row_0 \n", "0 616 55\n", "1 61 71" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(index=np.array(labels), columns=np.array(preds))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "우리의 매트릭에 의해 테스트셋에서 \"hits\"인 것은 132개의 게임이었고, 모델은 정확히 70개가 넘게 식별했습니다. 반면 대략적인 에러율은 13%입니다. False Negative와 True Positive의 양은 하이퍼파라미터의 scale_pos_weight 값을 증가시킴으로서 True Positive로 유리하게 이동될 수 있습니다. 물론 이러한 증가는 정확도 감소와 에러율 증가, 그리고 추가적인 False Positive를 초래할 수 있습니다. 이러한 trade-off를 궁극적으로 어떻게 할것인가는 False Positive와 False Negative 등의 상대적 비용에 기반한 비즈니스상의 결정입니다. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## 확장\n", "\n", "이 XGBoost 모델은 게임이 리뷰와 다른 특성을 기반으로 하여 hit가 될지 여부를 에측하기 위한 출발점일 뿐입니다. 모델 성능을 향상하기위한 몇개의 가능한 방법이 있습니다. 첫째로, 우선 더 많은 데이터를 모아야 하고 가능하면 기존 누락 필드를 실제 정보로 채워야만 합니다. 또다른 가능성은 Amazon SageMaker의 Automatic Model Tuning기능을 이용한 추가적인 하이퍼파라미터 튜닝입니다. 이 기능의 예제는 [hyperparameter tuning directory of the SageMaker Examples GitHub repository](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/hyperparameter_tuning) 와 Amazon SageMaker 노트북 인스턴스의 **SageMaker Examples** 탭에서 찾을 수 있습니다. 또한 앙상블 학습은 종종 불균형 데이터셋을 잘 처리하지만 다운샘플링이나 합성 데이터증강이나 다른 접근법과 같은, 불균형을 완화할 수 있는 기술을 살펴볼 가치가 있을 수 있습니다. \n", ". \n", "\n", "---\n", "## 정리\n", "\n", "이 노트북이 끝나면 아래 셀을 실행시켜 주세요. 
이것은 생성했던 endpoint를 제거하여 남겨진 유휴인스턴스에 요금이 발생되지 않도록 하기 위합니다.\n" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "session.delete_endpoint(xgb_predictor.endpoint)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." }, "nbformat": 4, "nbformat_minor": 4 }