{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![MLU Logo](../../data/MLU_Logo.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Amazon Access Samples Data Set\n",
    " \n",
    " Let's apply our boosting algorithm to a real dataset! We are going to use the __Amazon Access Samples dataset__. \n",
    " \n",
    " We download this dataset from the UCI Machine Learning Repository via this [link](https://archive.ics.uci.edu/ml/datasets/Amazon+Access+Samples). Dua, D. and Graff, C. (2019). [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml). Irvine, CA: University of California, School of Information and Computer Science.\n",
    "\n",
    " \n",
    "__Dataset description:__\n",
    "\n",
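    "Employees need to request certain resources to fulfill their daily duties. This dataset consists of anonymized historical records of employee IT access requests. The fields are described below; the processed file created later in this notebook drops the `PERSON_` prefix from the column names.\n",
    "\n",
    "#### Column Descriptions\n",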
    "\n",
    "* __ACTION__: 1 if access to the resource was granted (`add_access`), 0 if it was removed (`remove_access`).\n",
    "* __RESOURCE__: An ID for each resource\n",
    "* __PERSON_MGR_ID__: ID of the user's manager\n",
    "* __PERSON_ROLLUP_1__: User grouping ID\n",
    "* __PERSON_ROLLUP_2__: User grouping ID\n",
    "* __PERSON_DEPTNAME__: Department ID\n",
    "* __PERSON_BUSINESS_TITLE__: Title ID\n",
    "* __PERSON_JOB_FAMILY__: Job family ID \n",
    "* __PERSON_JOB_CODE__: Job code ID \n",
    "\n",
    "Our task is to build a machine learning model that can automatically provision an employee's access to company resources given employee profile information and the resource requested."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -q -r ../../requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Download and process the dataset\n",
    "\n",
    "In this section, we download the dataset and process it. The dataset consists of two files; the following code cells combine them into a single file. One of the files is large (4.8GB), so make sure you have enough storage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2021-11-03 18:04:45--  https://archive.ics.uci.edu/ml/machine-learning-databases/00216/amzn-anon-access-samples.tgz\n",
      "Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252\n",
      "Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 12268509 (12M) [application/x-httpd-php]\n",
      "Saving to: ‘amzn-anon-access-samples.tgz’\n",
      "\n",
      "amzn-anon-access-sa 100%[===================>]  11.70M  16.2MB/s    in 0.7s    \n",
      "\n",
      "2021-11-03 18:04:46 (16.2 MB/s) - ‘amzn-anon-access-samples.tgz’ saved [12268509/12268509]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "! wget https://archive.ics.uci.edu/ml/machine-learning-databases/00216/amzn-anon-access-samples.tgz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "amzn-anon-access-samples-2.0.csv\n",
      "amzn-anon-access-samples-history-2.0.csv\n"
     ]
    }
   ],
   "source": [
    "! tar -zxvf amzn-anon-access-samples.tgz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have the following files:\n",
    "* __amzn-anon-access-samples-2.0.csv__: Employee profile data.\n",
    "* __amzn-anon-access-samples-history-2.0.csv__: Resource provision history.\n",
    "\n",
    "Below, we first read the amzn-anon-access-samples-2.0.csv file (it is a large file, so we read it in chunks) and keep a subset of the employee fields."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import random \n",
    "\n",
    "person_fields = [\"PERSON_ID\", \"PERSON_MGR_ID\",\n",
    "                 \"PERSON_ROLLUP_1\", \"PERSON_ROLLUP_2\",\n",
    "                 \"PERSON_DEPTNAME\", \"PERSON_BUSINESS_TITLE\",\n",
    "                 \"PERSON_JOB_FAMILY\", \"PERSON_JOB_CODE\"]\n",
    "\n",
    "# Map each PERSON_ID to its profile fields, reading in chunks to limit memory use\n",
    "people = {}\n",
    "for chunk in pd.read_csv('amzn-anon-access-samples-2.0.csv', usecols=person_fields, chunksize=5000):\n",
    "    for index, row in chunk.iterrows():\n",
    "        people[row[\"PERSON_ID\"]] = [row[\"PERSON_MGR_ID\"], row[\"PERSON_ROLLUP_1\"],\n",
    "                                    row[\"PERSON_ROLLUP_2\"], row[\"PERSON_DEPTNAME\"],\n",
    "                                    row[\"PERSON_BUSINESS_TITLE\"], row[\"PERSON_JOB_FAMILY\"],\n",
    "                                    row[\"PERSON_JOB_CODE\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
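    "Now, let's read the resource provision history file and build our dataset. For each employee and resource pair, we keep the pair only if its history contains a single type of action: `add_access` pairs become positive examples (label 1) and `remove_access` pairs become negative examples (label 0)."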
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "add_access_data = []\n",
    "remove_access_data = []\n",
    "\n",
    "df = pd.read_csv('amzn-anon-access-samples-history-2.0.csv')\n",
    "\n",
    "# Loop through unique logins (employee ids)\n",
    "for login in df[\"LOGIN\"].unique():\n",
    "    login_df = df[df[\"LOGIN\"]==login].copy()\n",
    "    # Keep (login, target) pairs whose history contains only one type of action\n",
    "    for target in login_df[\"TARGET_NAME\"].unique():\n",
    "        login_target_df = login_df[login_df[\"TARGET_NAME\"]==target]\n",
    "        unique_actions = login_target_df[\"ACTION\"].unique()\n",
    "        if((len(unique_actions)==1) and (unique_actions[0]==\"remove_access\")):\n",
    "            remove_access_data.append([0, target] + people[login])\n",
    "        elif((len(unique_actions)==1) and (unique_actions[0]==\"add_access\")):\n",
    "            add_access_data.append([1, target] + people[login])\n",
    "\n",
    "# Set the random seed for reproducibility\n",
    "random.seed(30)\n",
    "\n",
    "# Keep a random sample of 8000 add_access records\n",
    "add_access_data = random.sample(add_access_data, 8000)\n",
    "\n",
    "# Add them together\n",
    "data = add_access_data + remove_access_data\n",
    "\n",
    "# Let's shuffle it\n",
    "random.shuffle(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's save this data so that we can use it later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(data, columns=[\"ACTION\", \"RESOURCE\",\n",
    "                                 \"MGR_ID\", \"ROLLUP_1\",\n",
    "                                 \"ROLLUP_2\", \"DEPTNAME\",\n",
    "                                 \"BUSINESS_TITLE\", \"JOB_FAMILY\",\n",
    "                                 \"JOB_CODE\"])\n",
    "\n",
    "df.to_csv(\"data.csv\", index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is what our data looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ACTION</th>\n",
       "      <th>RESOURCE</th>\n",
       "      <th>MGR_ID</th>\n",
       "      <th>ROLLUP_1</th>\n",
       "      <th>ROLLUP_2</th>\n",
       "      <th>DEPTNAME</th>\n",
       "      <th>BUSINESS_TITLE</th>\n",
       "      <th>JOB_FAMILY</th>\n",
       "      <th>JOB_CODE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>9802</td>\n",
       "      <td>43122</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>33467</td>\n",
       "      <td>45383</td>\n",
       "      <td>11</td>\n",
       "      <td>33326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>10617</td>\n",
       "      <td>36504</td>\n",
       "      <td>33416</td>\n",
       "      <td>33689</td>\n",
       "      <td>36505</td>\n",
       "      <td>41299</td>\n",
       "      <td>33430</td>\n",
       "      <td>33326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>9446</td>\n",
       "      <td>35624</td>\n",
       "      <td>33316</td>\n",
       "      <td>34256</td>\n",
       "      <td>35625</td>\n",
       "      <td>41014</td>\n",
       "      <td>33461</td>\n",
       "      <td>33326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>11065</td>\n",
       "      <td>34326</td>\n",
       "      <td>33299</td>\n",
       "      <td>34397</td>\n",
       "      <td>38458</td>\n",
       "      <td>38459</td>\n",
       "      <td>33678</td>\n",
       "      <td>33289</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>11149</td>\n",
       "      <td>40640</td>\n",
       "      <td>33283</td>\n",
       "      <td>40641</td>\n",
       "      <td>40642</td>\n",
       "      <td>40643</td>\n",
       "      <td>33291</td>\n",
       "      <td>33431</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ACTION  RESOURCE  MGR_ID  ROLLUP_1  ROLLUP_2  DEPTNAME  BUSINESS_TITLE  \\\n",
       "0       1      9802   43122         2         3     33467           45383   \n",
       "1       1     10617   36504     33416     33689     36505           41299   \n",
       "2       1      9446   35624     33316     34256     35625           41014   \n",
       "3       1     11065   34326     33299     34397     38458           38459   \n",
       "4       1     11149   40640     33283     40641     40642           40643   \n",
       "\n",
       "   JOB_FAMILY  JOB_CODE  \n",
       "0          11     33326  \n",
       "1       33430     33326  \n",
       "2       33461     33326  \n",
       "3       33678     33289  \n",
       "4       33291     33431  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete the previously downloaded files\n",
    "! rm amzn-anon-access-samples-2.0.csv amzn-anon-access-samples-history-2.0.csv amzn-anon-access-samples.tgz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. CatBoost\n",
    "\n",
    "Let's use CatBoost on this dataset. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from catboost import CatBoostClassifier\n",
    "\n",
    "data = pd.read_csv(\"data.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ACTION</th>\n",
       "      <th>RESOURCE</th>\n",
       "      <th>MGR_ID</th>\n",
       "      <th>ROLLUP_1</th>\n",
       "      <th>ROLLUP_2</th>\n",
       "      <th>DEPTNAME</th>\n",
       "      <th>BUSINESS_TITLE</th>\n",
       "      <th>JOB_FAMILY</th>\n",
       "      <th>JOB_CODE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>9802</td>\n",
       "      <td>43122</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>33467</td>\n",
       "      <td>45383</td>\n",
       "      <td>11</td>\n",
       "      <td>33326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>10617</td>\n",
       "      <td>36504</td>\n",
       "      <td>33416</td>\n",
       "      <td>33689</td>\n",
       "      <td>36505</td>\n",
       "      <td>41299</td>\n",
       "      <td>33430</td>\n",
       "      <td>33326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>9446</td>\n",
       "      <td>35624</td>\n",
       "      <td>33316</td>\n",
       "      <td>34256</td>\n",
       "      <td>35625</td>\n",
       "      <td>41014</td>\n",
       "      <td>33461</td>\n",
       "      <td>33326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>11065</td>\n",
       "      <td>34326</td>\n",
       "      <td>33299</td>\n",
       "      <td>34397</td>\n",
       "      <td>38458</td>\n",
       "      <td>38459</td>\n",
       "      <td>33678</td>\n",
       "      <td>33289</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>11149</td>\n",
       "      <td>40640</td>\n",
       "      <td>33283</td>\n",
       "      <td>40641</td>\n",
       "      <td>40642</td>\n",
       "      <td>40643</td>\n",
       "      <td>33291</td>\n",
       "      <td>33431</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>9799</td>\n",
       "      <td>35395</td>\n",
       "      <td>2</td>\n",
       "      <td>33730</td>\n",
       "      <td>33731</td>\n",
       "      <td>63077</td>\n",
       "      <td>33657</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1</td>\n",
       "      <td>9674</td>\n",
       "      <td>45034</td>\n",
       "      <td>33521</td>\n",
       "      <td>34979</td>\n",
       "      <td>34979</td>\n",
       "      <td>51381</td>\n",
       "      <td>33526</td>\n",
       "      <td>33326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1</td>\n",
       "      <td>10561</td>\n",
       "      <td>34023</td>\n",
       "      <td>33316</td>\n",
       "      <td>34024</td>\n",
       "      <td>34025</td>\n",
       "      <td>37456</td>\n",
       "      <td>33430</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1</td>\n",
       "      <td>9667</td>\n",
       "      <td>43085</td>\n",
       "      <td>2</td>\n",
       "      <td>33365</td>\n",
       "      <td>37239</td>\n",
       "      <td>43086</td>\n",
       "      <td>33370</td>\n",
       "      <td>33326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1</td>\n",
       "      <td>3164</td>\n",
       "      <td>36605</td>\n",
       "      <td>2</td>\n",
       "      <td>33730</td>\n",
       "      <td>33902</td>\n",
       "      <td>36850</td>\n",
       "      <td>33461</td>\n",
       "      <td>33431</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ACTION  RESOURCE  MGR_ID  ROLLUP_1  ROLLUP_2  DEPTNAME  BUSINESS_TITLE  \\\n",
       "0       1      9802   43122         2         3     33467           45383   \n",
       "1       1     10617   36504     33416     33689     36505           41299   \n",
       "2       1      9446   35624     33316     34256     35625           41014   \n",
       "3       1     11065   34326     33299     34397     38458           38459   \n",
       "4       1     11149   40640     33283     40641     40642           40643   \n",
       "5       0      9799   35395         2     33730     33731           63077   \n",
       "6       1      9674   45034     33521     34979     34979           51381   \n",
       "7       1     10561   34023     33316     34024     34025           37456   \n",
       "8       1      9667   43085         2     33365     37239           43086   \n",
       "9       1      3164   36605         2     33730     33902           36850   \n",
       "\n",
       "   JOB_FAMILY  JOB_CODE  \n",
       "0          11     33326  \n",
       "1       33430     33326  \n",
       "2       33461     33326  \n",
       "3       33678     33289  \n",
       "4       33291     33431  \n",
       "5       33657         9  \n",
       "6       33526     33326  \n",
       "7       33430         9  \n",
       "8       33370     33326  \n",
       "9       33461     33431  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dataset is highly imbalanced: class 1 far outnumbers class 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    8000\n",
       "0     152\n",
       "Name: ACTION, dtype: int64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[\"ACTION\"].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's separate the input features from the target."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = data['ACTION']\n",
    "X = data.drop(columns='ACTION')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use 15% of the data for validation, stratifying the split so that both sets keep the same class ratio."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "X_train, X_valid, y_train, y_valid = train_test_split(X,\n",
    "                                                      y,\n",
    "                                                      test_size=0.15,\n",
    "                                                      random_state=136,\n",
    "                                                      stratify=y\n",
    "                                                     )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the dataset is imbalanced, we calculate class weights (inversely proportional to class frequency) and then fit the CatBoost classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4b7684525653426cb0acbfa9641c97b4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "MetricVisualizer(layout=Layout(align_self='stretch', height='500px'))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Learning rate set to 0.102945\n",
      "0:\tlearn: 0.5361259\ttest: 0.5393786\tbest: 0.5393786 (0)\ttotal: 58.9ms\tremaining: 11.7s\n",
      "199:\tlearn: 0.9924433\ttest: 0.7973391\tbest: 0.8949858 (62)\ttotal: 2.32s\tremaining: 0us\n",
      "\n",
      "bestTest = 0.8949858117\n",
      "bestIteration = 62\n",
      "\n",
      "Shrink model to first 63 iterations.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<catboost.core.CatBoostClassifier at 0x7f0b94752630>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class_weight_0 = (sum(y_train==0) + sum(y_train==1))/sum(y_train==0)\n",
    "class_weight_1 = (sum(y_train==0) + sum(y_train==1))/sum(y_train==1)\n",
    "\n",
    "params = {'loss_function':'Logloss', # Some others: CrossEntropy\n",
    "          'eval_metric':'F1', # Some others: Accuracy, Precision, Recall, AUC\n",
    "          'verbose': 200, # output training process at every 200 iterations\n",
    "          'random_seed': 13,\n",
    "          'iterations': 200,\n",
    "          'class_weights': [class_weight_0, class_weight_1]\n",
    "         }\n",
    "\n",
    "# All input features are categorical\n",
    "cat_features = [0, 1, 2, 3, 4, 5, 6, 7]\n",
    "cb_classifier = CatBoostClassifier(**params)\n",
    "cb_classifier.fit(X_train, y_train,\n",
    "          eval_set=(X_valid, y_valid), # data to validate on\n",
    "          use_best_model=True, \n",
    "          plot=True, # It plots a nice visual for the training process\n",
    "          cat_features=cat_features)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see the overall performance on the validation set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.17      0.87      0.28        23\n",
      "           1       1.00      0.92      0.96      1200\n",
      "\n",
      "    accuracy                           0.92      1223\n",
      "   macro avg       0.58      0.89      0.62      1223\n",
      "weighted avg       0.98      0.92      0.94      1223\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from sklearn.metrics import classification_report\n",
    "\n",
    "y_pred = cb_classifier.predict(X_valid)\n",
    "\n",
    "print(classification_report(y_valid, np.round(y_pred)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}