{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cbb67c68",
   "metadata": {},
   "source": [
    "## Input Redshift Cluster Endpoint and User\n",
    "\n",
    "Please input your Amazon Redshift Cluster endpoint and existing database user"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d4f69e6f",
   "metadata": {},
   "outputs": [],
   "source": [
    "REDSHIFT_ENDPOINT = 'redshift-cluster.xxxxxxxxxx.us-east-1.redshift.amazonaws.com:5439/dev'\n",
    "REDSHIFT_USER=\"awsuser\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c61a9220",
   "metadata": {},
   "source": [
    "## Setup Run SQL function using Redshift Data API to get SQL query output directly into pandas dataframe\n",
    "\n",
    "In this step, we are creating function run_sql, which we will use to get SQL query output directly into pandas dataframe. We will also use this function to run DDL statements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "fc57f8b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import time\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "session = boto3.session.Session()\n",
    "region = session.region_name\n",
    "\n",
    "\n",
    "def run_sql(sql_text):\n",
    "    client = boto3.client(\"redshift-data\")\n",
    "    res = client.execute_statement(Database=REDSHIFT_ENDPOINT.split('/')[1], DbUser=REDSHIFT_USER, Sql=sql_text,\n",
    "                                   ClusterIdentifier=REDSHIFT_ENDPOINT.split('.')[0])\n",
    "    query_id = res[\"Id\"]\n",
    "    while True:\n",
    "        time.sleep(1)\n",
    "        status_description = client.describe_statement(Id=query_id)\n",
    "        status = status_description[\"Status\"]\n",
    "        if status == \"FAILED\":\n",
    "            raise Exception('SQL query failed:' + query_id + \": \" + status_description[\"Error\"])\n",
    "        elif status == \"FINISHED\":\n",
    "            if status_description['ResultRows']>0:\n",
    "                results = client.get_statement_result(Id=query_id)\n",
    "                column_labels = []\n",
    "                for i in range(len(results[\"ColumnMetadata\"])): column_labels.append(results[\"ColumnMetadata\"][i]['label'])\n",
    "                records = []\n",
    "                for record in results.get('Records'):\n",
    "                    records.append([list(rec.values())[0] for rec in record])\n",
    "                df = pd.DataFrame(np.array(records), columns=column_labels)\n",
    "                return df\n",
    "            else:\n",
    "                return query_id"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "248da963",
   "metadata": {},
   "source": [
    "## Create User and Grant Permissions - Optional\n",
    "\n",
    "As the database adminstrator, you may skip the permissions section. Otherwise, you can create users and grant them permissions with the Principle of Least Privilege in mind.  \n",
    "\n",
    "\n",
    "If demouser exists with privilege, please revoke before dropping the user:\n",
    "\n",
    "```sql\n",
    "revoke all on schema demo_ml from demouser;\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9105bed",
   "metadata": {},
   "outputs": [],
   "source": [
    "permissions_one_sql = \"\"\"\n",
    "\n",
    "DROP USER IF EXISTS demouser;\n",
    "\n",
    "create user demouser with password '<password>';\n",
    "\n",
    "GRANT CREATE MODEL TO demouser;\n",
    "\n",
    "\"\"\"\n",
    "\n",
    "for sql_text in permissions_one_sql.split(\";\")[:-1]:\n",
    "    run_sql(sql_text);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "200106f6",
   "metadata": {},
   "source": [
    "## Data Preparation \n",
    "\n",
    "Data preparation script to be ran on Amazon Redshift\n",
    "\n",
    "**Note**: Please change `<accountId>` to your AWS Account Id down in the script below\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "648aacfd",
   "metadata": {},
   "outputs": [],
   "source": [
    "setup_script=\"\"\"\n",
    "\n",
    "DROP SCHEMA IF EXISTS DEMO_ML CASCADE;\n",
    "\n",
    "CREATE SCHEMA DEMO_ML;\n",
    "\n",
    "DROP TABLE IF EXISTS demo_ml.customer_activity;\n",
    "\n",
    "CREATE TABLE demo_ml.customer_activity (\n",
    "state varchar(2), \n",
    "account_length int, \n",
    "area_code int,\n",
    "phone varchar(8), \n",
    "intl_plan varchar(3), \n",
    "vMail_plan varchar(3),\n",
    "vMail_message int, \n",
    "day_mins float, \n",
    "day_calls int, \n",
    "day_charge float,\n",
    "total_charge float,\n",
    "eve_mins float, \n",
    "eve_calls int, \n",
    "eve_charge float, \n",
    "night_mins float,\n",
    "night_calls int, \n",
    "night_charge float, \n",
    "intl_mins float, \n",
    "intl_calls int,\n",
    "intl_charge float, \n",
    "cust_serv_calls int, \n",
    "churn varchar(6),\n",
    "record_date date);\n",
    "\n",
    "COPY DEMO_ML.customer_activity \n",
    "FROM 's3://redshift-downloads/redshift-ml/customer_activity/' \n",
    "IAM_ROLE 'arn:aws:iam::<accountId>:role/RedshiftML' IGNOREHEADER 1 CSV\n",
    "region 'us-east-1';\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7391214",
   "metadata": {},
   "source": [
    "## Run data preparation script in Amazon Redshift "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3c9bba54",
   "metadata": {},
   "outputs": [],
   "source": [
    "for sql_text in setup_script.strip().split(\";\")[:-1]:\n",
    "    run_sql(sql_text);\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15956453",
   "metadata": {},
   "source": [
    "## Granting Permissions - Optional\n",
    "\n",
    "Create demo user\n",
    "Grant create model permissions to `demouser`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3078e264",
   "metadata": {},
   "outputs": [],
   "source": [
    "permissions_two_sql = \"\"\"\n",
    "\n",
    "GRANT SELECT on demo_ml.customer_activity TO demouser;\n",
    "\n",
    "GRANT CREATE, USAGE ON SCHEMA demo_ml TO demouser;\n",
    "\n",
    "\n",
    "\"\"\"\n",
    "\n",
    "for sql_text in permissions_two_sql.split(\";\")[:-1]:\n",
    "    run_sql(sql_text);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82cf9a92",
   "metadata": {},
   "source": [
    "## Read SQL output from Pandas Dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d0d4d66d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th>account_length</th>\n",
       "      <th>area_code</th>\n",
       "      <th>phone</th>\n",
       "      <th>intl_plan</th>\n",
       "      <th>vmail_plan</th>\n",
       "      <th>vmail_message</th>\n",
       "      <th>day_mins</th>\n",
       "      <th>day_calls</th>\n",
       "      <th>day_charge</th>\n",
       "      <th>...</th>\n",
       "      <th>eve_charge</th>\n",
       "      <th>night_mins</th>\n",
       "      <th>night_calls</th>\n",
       "      <th>night_charge</th>\n",
       "      <th>intl_mins</th>\n",
       "      <th>intl_calls</th>\n",
       "      <th>intl_charge</th>\n",
       "      <th>cust_serv_calls</th>\n",
       "      <th>churn</th>\n",
       "      <th>record_date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "      <td>382-4657</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>25</td>\n",
       "      <td>265.1</td>\n",
       "      <td>110</td>\n",
       "      <td>45.07</td>\n",
       "      <td>...</td>\n",
       "      <td>16.78</td>\n",
       "      <td>244.7</td>\n",
       "      <td>91</td>\n",
       "      <td>11.01</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.7</td>\n",
       "      <td>1</td>\n",
       "      <td>False.</td>\n",
       "      <td>2020-08-24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OH</td>\n",
       "      <td>107</td>\n",
       "      <td>415</td>\n",
       "      <td>371-7191</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>26</td>\n",
       "      <td>161.6</td>\n",
       "      <td>123</td>\n",
       "      <td>27.47</td>\n",
       "      <td>...</td>\n",
       "      <td>16.62</td>\n",
       "      <td>254.4</td>\n",
       "      <td>103</td>\n",
       "      <td>11.45</td>\n",
       "      <td>13.7</td>\n",
       "      <td>3</td>\n",
       "      <td>3.7</td>\n",
       "      <td>1</td>\n",
       "      <td>False.</td>\n",
       "      <td>2019-09-23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NJ</td>\n",
       "      <td>137</td>\n",
       "      <td>415</td>\n",
       "      <td>358-1921</td>\n",
       "      <td>no</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>243.4</td>\n",
       "      <td>114</td>\n",
       "      <td>41.38</td>\n",
       "      <td>...</td>\n",
       "      <td>10.3</td>\n",
       "      <td>162.6</td>\n",
       "      <td>104</td>\n",
       "      <td>7.32</td>\n",
       "      <td>12.2</td>\n",
       "      <td>5</td>\n",
       "      <td>3.29</td>\n",
       "      <td>0</td>\n",
       "      <td>False.</td>\n",
       "      <td>2020-03-09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "      <td>375-9999</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>299.4</td>\n",
       "      <td>71</td>\n",
       "      <td>50.9</td>\n",
       "      <td>...</td>\n",
       "      <td>5.26</td>\n",
       "      <td>196.9</td>\n",
       "      <td>89</td>\n",
       "      <td>8.86</td>\n",
       "      <td>6.6</td>\n",
       "      <td>7</td>\n",
       "      <td>1.78</td>\n",
       "      <td>2</td>\n",
       "      <td>False.</td>\n",
       "      <td>2019-07-08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "      <td>330-6626</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>166.7</td>\n",
       "      <td>113</td>\n",
       "      <td>28.34</td>\n",
       "      <td>...</td>\n",
       "      <td>12.61</td>\n",
       "      <td>186.9</td>\n",
       "      <td>121</td>\n",
       "      <td>8.41</td>\n",
       "      <td>10.1</td>\n",
       "      <td>3</td>\n",
       "      <td>2.73</td>\n",
       "      <td>3</td>\n",
       "      <td>False.</td>\n",
       "      <td>2020-02-14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>AL</td>\n",
       "      <td>118</td>\n",
       "      <td>510</td>\n",
       "      <td>391-8027</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>223.4</td>\n",
       "      <td>98</td>\n",
       "      <td>37.98</td>\n",
       "      <td>...</td>\n",
       "      <td>18.75</td>\n",
       "      <td>203.9</td>\n",
       "      <td>118</td>\n",
       "      <td>9.18</td>\n",
       "      <td>6.3</td>\n",
       "      <td>6</td>\n",
       "      <td>1.7</td>\n",
       "      <td>0</td>\n",
       "      <td>False.</td>\n",
       "      <td>2019-07-28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>MA</td>\n",
       "      <td>121</td>\n",
       "      <td>510</td>\n",
       "      <td>355-9993</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>24</td>\n",
       "      <td>218.2</td>\n",
       "      <td>88</td>\n",
       "      <td>37.09</td>\n",
       "      <td>...</td>\n",
       "      <td>29.62</td>\n",
       "      <td>212.6</td>\n",
       "      <td>118</td>\n",
       "      <td>9.57</td>\n",
       "      <td>7.5</td>\n",
       "      <td>7</td>\n",
       "      <td>2.03</td>\n",
       "      <td>3</td>\n",
       "      <td>False.</td>\n",
       "      <td>2019-06-07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>MO</td>\n",
       "      <td>147</td>\n",
       "      <td>415</td>\n",
       "      <td>329-9001</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>157.0</td>\n",
       "      <td>79</td>\n",
       "      <td>26.69</td>\n",
       "      <td>...</td>\n",
       "      <td>8.76</td>\n",
       "      <td>211.8</td>\n",
       "      <td>96</td>\n",
       "      <td>9.53</td>\n",
       "      <td>7.1</td>\n",
       "      <td>6</td>\n",
       "      <td>1.92</td>\n",
       "      <td>0</td>\n",
       "      <td>False.</td>\n",
       "      <td>2020-08-22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>LA</td>\n",
       "      <td>117</td>\n",
       "      <td>408</td>\n",
       "      <td>335-4719</td>\n",
       "      <td>no</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>184.5</td>\n",
       "      <td>97</td>\n",
       "      <td>31.37</td>\n",
       "      <td>...</td>\n",
       "      <td>29.89</td>\n",
       "      <td>215.8</td>\n",
       "      <td>90</td>\n",
       "      <td>9.71</td>\n",
       "      <td>8.7</td>\n",
       "      <td>4</td>\n",
       "      <td>2.35</td>\n",
       "      <td>1</td>\n",
       "      <td>False.</td>\n",
       "      <td>2020-04-10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>WV</td>\n",
       "      <td>141</td>\n",
       "      <td>415</td>\n",
       "      <td>330-8173</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>37</td>\n",
       "      <td>258.6</td>\n",
       "      <td>84</td>\n",
       "      <td>43.96</td>\n",
       "      <td>...</td>\n",
       "      <td>18.87</td>\n",
       "      <td>326.4</td>\n",
       "      <td>97</td>\n",
       "      <td>14.69</td>\n",
       "      <td>11.2</td>\n",
       "      <td>5</td>\n",
       "      <td>3.02</td>\n",
       "      <td>0</td>\n",
       "      <td>False.</td>\n",
       "      <td>2020-06-06</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10 rows × 23 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "  state account_length area_code     phone intl_plan vmail_plan vmail_message  \\\n",
       "0    KS            128       415  382-4657        no        yes            25   \n",
       "1    OH            107       415  371-7191        no        yes            26   \n",
       "2    NJ            137       415  358-1921        no         no             0   \n",
       "3    OH             84       408  375-9999       yes         no             0   \n",
       "4    OK             75       415  330-6626       yes         no             0   \n",
       "5    AL            118       510  391-8027       yes         no             0   \n",
       "6    MA            121       510  355-9993        no        yes            24   \n",
       "7    MO            147       415  329-9001       yes         no             0   \n",
       "8    LA            117       408  335-4719        no         no             0   \n",
       "9    WV            141       415  330-8173       yes        yes            37   \n",
       "\n",
       "  day_mins day_calls day_charge  ... eve_charge night_mins night_calls  \\\n",
       "0    265.1       110      45.07  ...      16.78      244.7          91   \n",
       "1    161.6       123      27.47  ...      16.62      254.4         103   \n",
       "2    243.4       114      41.38  ...       10.3      162.6         104   \n",
       "3    299.4        71       50.9  ...       5.26      196.9          89   \n",
       "4    166.7       113      28.34  ...      12.61      186.9         121   \n",
       "5    223.4        98      37.98  ...      18.75      203.9         118   \n",
       "6    218.2        88      37.09  ...      29.62      212.6         118   \n",
       "7    157.0        79      26.69  ...       8.76      211.8          96   \n",
       "8    184.5        97      31.37  ...      29.89      215.8          90   \n",
       "9    258.6        84      43.96  ...      18.87      326.4          97   \n",
       "\n",
       "  night_charge intl_mins intl_calls intl_charge cust_serv_calls   churn  \\\n",
       "0        11.01      10.0          3         2.7               1  False.   \n",
       "1        11.45      13.7          3         3.7               1  False.   \n",
       "2         7.32      12.2          5        3.29               0  False.   \n",
       "3         8.86       6.6          7        1.78               2  False.   \n",
       "4         8.41      10.1          3        2.73               3  False.   \n",
       "5         9.18       6.3          6         1.7               0  False.   \n",
       "6         9.57       7.5          7        2.03               3  False.   \n",
       "7         9.53       7.1          6        1.92               0  False.   \n",
       "8         9.71       8.7          4        2.35               1  False.   \n",
       "9        14.69      11.2          5        3.02               0  False.   \n",
       "\n",
       "  record_date  \n",
       "0  2020-08-24  \n",
       "1  2019-09-23  \n",
       "2  2020-03-09  \n",
       "3  2019-07-08  \n",
       "4  2020-02-14  \n",
       "5  2019-07-28  \n",
       "6  2019-06-07  \n",
       "7  2020-08-22  \n",
       "8  2020-04-10  \n",
       "9  2020-06-06  \n",
       "\n",
       "[10 rows x 23 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = run_sql(\"SELECT * FROM demo_ml.customer_activity;\");\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "49fd6fe7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th>account_length</th>\n",
       "      <th>area_code</th>\n",
       "      <th>phone</th>\n",
       "      <th>intl_plan</th>\n",
       "      <th>vmail_plan</th>\n",
       "      <th>vmail_message</th>\n",
       "      <th>day_mins</th>\n",
       "      <th>day_calls</th>\n",
       "      <th>day_charge</th>\n",
       "      <th>...</th>\n",
       "      <th>eve_charge</th>\n",
       "      <th>night_mins</th>\n",
       "      <th>night_calls</th>\n",
       "      <th>night_charge</th>\n",
       "      <th>intl_mins</th>\n",
       "      <th>intl_calls</th>\n",
       "      <th>intl_charge</th>\n",
       "      <th>cust_serv_calls</th>\n",
       "      <th>churn</th>\n",
       "      <th>record_date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>...</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>51</td>\n",
       "      <td>212</td>\n",
       "      <td>3</td>\n",
       "      <td>3333</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>46</td>\n",
       "      <td>1667</td>\n",
       "      <td>119</td>\n",
       "      <td>1667</td>\n",
       "      <td>...</td>\n",
       "      <td>1440</td>\n",
       "      <td>1591</td>\n",
       "      <td>120</td>\n",
       "      <td>933</td>\n",
       "      <td>162</td>\n",
       "      <td>21</td>\n",
       "      <td>162</td>\n",
       "      <td>10</td>\n",
       "      <td>2</td>\n",
       "      <td>520</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>WV</td>\n",
       "      <td>105</td>\n",
       "      <td>415</td>\n",
       "      <td>389-2540</td>\n",
       "      <td>no</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>159.5</td>\n",
       "      <td>102</td>\n",
       "      <td>27.12</td>\n",
       "      <td>...</td>\n",
       "      <td>16.12</td>\n",
       "      <td>191.4</td>\n",
       "      <td>105</td>\n",
       "      <td>9.66</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.7</td>\n",
       "      <td>1</td>\n",
       "      <td>False.</td>\n",
       "      <td>2020-07-18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>106</td>\n",
       "      <td>43</td>\n",
       "      <td>1655</td>\n",
       "      <td>1</td>\n",
       "      <td>3010</td>\n",
       "      <td>2411</td>\n",
       "      <td>2411</td>\n",
       "      <td>8</td>\n",
       "      <td>78</td>\n",
       "      <td>8</td>\n",
       "      <td>...</td>\n",
       "      <td>11</td>\n",
       "      <td>8</td>\n",
       "      <td>84</td>\n",
       "      <td>15</td>\n",
       "      <td>62</td>\n",
       "      <td>668</td>\n",
       "      <td>62</td>\n",
       "      <td>1181</td>\n",
       "      <td>2850</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4 rows × 23 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       state account_length area_code     phone intl_plan vmail_plan  \\\n",
       "count   3333           3333      3333      3333      3333       3333   \n",
       "unique    51            212         3      3333         2          2   \n",
       "top       WV            105       415  389-2540        no         no   \n",
       "freq     106             43      1655         1      3010       2411   \n",
       "\n",
       "       vmail_message day_mins day_calls day_charge  ... eve_charge night_mins  \\\n",
       "count           3333     3333      3333       3333  ...       3333       3333   \n",
       "unique            46     1667       119       1667  ...       1440       1591   \n",
       "top                0    159.5       102      27.12  ...      16.12      191.4   \n",
       "freq            2411        8        78          8  ...         11          8   \n",
       "\n",
       "       night_calls night_charge intl_mins intl_calls intl_charge  \\\n",
       "count         3333         3333      3333       3333        3333   \n",
       "unique         120          933       162         21         162   \n",
       "top            105         9.66      10.0          3         2.7   \n",
       "freq            84           15        62        668          62   \n",
       "\n",
       "       cust_serv_calls   churn record_date  \n",
       "count             3333    3333        3333  \n",
       "unique              10       2         520  \n",
       "top                  1  False.  2020-07-18  \n",
       "freq              1181    2850          16  \n",
       "\n",
       "[4 rows x 23 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f20f0e09",
   "metadata": {},
   "source": [
    "## Run Create Model statement to create a new ML model with Redshift ML\n",
    "\n",
    "Please replace `<accountId>` with your AWS account Id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "60f9c33f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'60d56320-0586-4b4e-bd9e-b0435870b3b8'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resp = run_sql(\"\"\"\n",
    "CREATE MODEL demo_ml.customer_churn_model\n",
    "  FROM (SELECT state,\n",
    "               area_code,\n",
    "               total_charge/account_length AS average_daily_spend, \n",
    "               cust_serv_calls/account_length AS average_daily_cases,\n",
    "               churn \n",
    "          FROM demo_ml.customer_activity\n",
    "         WHERE record_date < '2020-01-01'\n",
    "     )\n",
    "  TARGET churn\n",
    "FUNCTION predict_customer_churn\n",
    "IAM_ROLE 'arn:aws:iam::<accountId>:role/RedshiftML'\n",
    "SETTINGS (\n",
    "  S3_BUCKET 'redshiftml-<accountId>'\n",
    ")\n",
    ";\n",
    "\"\"\")\n",
    "\n",
    "resp"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5f27a8c",
   "metadata": {},
   "source": [
    "## Check the status on your ML model \n",
    "\n",
    "You can check the status of your models by running the `SHOW MODEL` command from your SQL prompt.\n",
    "\n",
    "Continuously check `Model State` and once it has been set to `Ready`, continue to the next step. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "4714fcec",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Key</th>\n",
       "      <th>Value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Model Name</td>\n",
       "      <td>customer_churn_model</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Schema Name</td>\n",
       "      <td>demo_ml</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Owner</td>\n",
       "      <td>awsuser</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Creation Time</td>\n",
       "      <td>Mon, 09.08.2021 23:21:38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Model State</td>\n",
       "      <td>READY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Training Job Status</td>\n",
       "      <td>MaxAutoMLJobRuntimeReached</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>validation:f1_binary</td>\n",
       "      <td>0.271910</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Estimated Cost</td>\n",
       "      <td>4.907083</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>TRAINING DATA:</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    Key                       Value\n",
       "0            Model Name        customer_churn_model\n",
       "1           Schema Name                     demo_ml\n",
       "2                 Owner                     awsuser\n",
       "3         Creation Time    Mon, 09.08.2021 23:21:38\n",
       "4           Model State                       READY\n",
       "5   Training Job Status  MaxAutoMLJobRuntimeReached\n",
       "6  validation:f1_binary                    0.271910\n",
       "7        Estimated Cost                    4.907083\n",
       "8                                                  \n",
       "9        TRAINING DATA:                            "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = run_sql('SHOW MODEL demo_ml.customer_churn_model;')\n",
    "\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c1dc6c0",
   "metadata": {},
   "source": [
    "## Evaluate your model performance \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "f3151f73",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>accountid</th>\n",
       "      <th>churn</th>\n",
       "      <th>predicted</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>510355-9993</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>510394-8006</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>510386-2923</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>415373-2782</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>408357-3817</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>406</th>\n",
       "      <td>510380-3186</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>407</th>\n",
       "      <td>408347-9995</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>408</th>\n",
       "      <td>510340-9013</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>409</th>\n",
       "      <td>408362-8378</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>410</th>\n",
       "      <td>415348-3830</td>\n",
       "      <td>False.</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>411 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       accountid   churn predicted\n",
       "0    510355-9993  False.     True.\n",
       "1    510394-8006  False.     True.\n",
       "2    510386-2923  False.     True.\n",
       "3    415373-2782  False.     True.\n",
       "4    408357-3817  False.     True.\n",
       "..           ...     ...       ...\n",
       "406  510380-3186  False.     True.\n",
       "407  408347-9995  False.     True.\n",
       "408  510340-9013  False.     True.\n",
       "409  408362-8378  False.     True.\n",
       "410  415348-3830  False.     True.\n",
       "\n",
       "[411 rows x 3 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = run_sql(\"\"\"\n",
    "WITH infer_data AS (\n",
    "  SELECT area_code || phone accountid, churn,\n",
    "    demo_ml.predict_customer_churn( \n",
    "          state,\n",
    "          area_code, \n",
    "          total_charge/account_length , \n",
    "          cust_serv_calls/account_length ) AS predicted\n",
    "  FROM demo_ml.customer_activity\n",
    " WHERE record_date <  '2020-01-01'\n",
    ")\n",
    "SELECT * FROM infer_data where churn!=predicted;\n",
    "\"\"\")\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4dad9eaf",
   "metadata": {},
   "source": [
    "### Evaluation\n",
    "\n",
    "You can see the F1 value for the example model customer_churn_model in the output of the `SHOW MODEL` command. The F1 amount signifies the statistical measure of the precision and recall of all the classes in the model. The value ranges between 0–1; the higher the score, the better the accuracy of the model.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "71bfc35b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Key</th>\n",
       "      <th>Value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Model Name</td>\n",
       "      <td>customer_churn_model</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Schema Name</td>\n",
       "      <td>demo_ml</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Owner</td>\n",
       "      <td>awsuser</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Creation Time</td>\n",
       "      <td>Mon, 09.08.2021 23:21:38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Model State</td>\n",
       "      <td>READY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Training Job Status</td>\n",
       "      <td>MaxAutoMLJobRuntimeReached</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>validation:f1_binary</td>\n",
       "      <td>0.271910</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Estimated Cost</td>\n",
       "      <td>4.907083</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>TRAINING DATA:</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    Key                       Value\n",
       "0            Model Name        customer_churn_model\n",
       "1           Schema Name                     demo_ml\n",
       "2                 Owner                     awsuser\n",
       "3         Creation Time    Mon, 09.08.2021 23:21:38\n",
       "4           Model State                       READY\n",
       "5   Training Job Status  MaxAutoMLJobRuntimeReached\n",
       "6  validation:f1_binary                    0.271910\n",
       "7        Estimated Cost                    4.907083\n",
       "8                                                  \n",
       "9        TRAINING DATA:                            "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = run_sql('SHOW MODEL demo_ml.customer_churn_model;')\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df4bc004",
   "metadata": {},
   "source": [
    "## Invoke your ML model for inference\n",
    "\n",
    "\n",
    "You can use your SQL function to apply the ML model to your data in queries, reports, and dashboards. For example, you can run the predict_customer_churn SQL function on new customer data in Amazon Redshift regularly to predict customers at risk of churning and feed this information to sales and marketing teams so they can take preemptive actions, such as sending these customers an offer designed to retain them.\n",
    "\n",
    "For example, you can run the following query to predict which customers in area code 408 might churn and the output shows the account ID and whether the account is predicted to remain active: \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "b00a93e3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>accountid</th>\n",
       "      <th>predictedactive</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>408335-4719</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>408350-8884</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>408393-7984</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>408418-6412</td>\n",
       "      <td>True.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>408383-1121</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>499</th>\n",
       "      <td>408404-5283</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>500</th>\n",
       "      <td>408398-3632</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>501</th>\n",
       "      <td>408340-9449</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>502</th>\n",
       "      <td>408406-6304</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>503</th>\n",
       "      <td>408368-8555</td>\n",
       "      <td>False.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>504 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       accountid predictedactive\n",
       "0    408335-4719           True.\n",
       "1    408350-8884           True.\n",
       "2    408393-7984          False.\n",
       "3    408418-6412           True.\n",
       "4    408383-1121          False.\n",
       "..           ...             ...\n",
       "499  408404-5283          False.\n",
       "500  408398-3632          False.\n",
       "501  408340-9449          False.\n",
       "502  408406-6304          False.\n",
       "503  408368-8555          False.\n",
       "\n",
       "[504 rows x 2 columns]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = run_sql(\"\"\"\n",
    "SELECT area_code || phone accountid, \n",
    "       demo_ml.predict_customer_churn( \n",
    "          state,\n",
    "          area_code, \n",
    "          total_charge/account_length , \n",
    "          cust_serv_calls/account_length )\n",
    "          AS \"predictedActive\"\n",
    "  FROM demo_ml.customer_activity\n",
    " WHERE area_code='408' and record_date > '2020-01-01';\n",
    "\"\"\")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2e97046",
   "metadata": {},
   "source": [
    "## Granting Permissions - Optional\n",
    "\n",
    "The following code grants the EXECUTE privilege to users such as your marketing_analyst_grp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "750585e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = run_sql('GRANT EXECUTE demo_ml.predict_customer_churn TO marketing_analyst_grp')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68e33ded",
   "metadata": {},
   "source": [
    "## Cost Control \n",
    "\n",
    "If the `SELECT` query of `CREATE MODEL` produces 10,000 records for training and each record has five columns, the number of cells in the training data is 50,000. You can control the training cost by setting the `MAX_CELLS`.\n",
    "\n",
    "Please replace `<accountId>` with your AWS account Id\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4a74661",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = run_sql(\"\"\"\n",
    "CREATE MODEL demo_ml.customer_churn_model\n",
    "FROM (SELECT state,\n",
    "             area_code,\n",
    "             total_charge/account_length AS average_daily_spend, \n",
    "             cust_serv_calls/account_length AS average_daily_cases,\n",
    "             churn \n",
    "      FROM demo_ml.customer_activity\n",
    "      WHERE account_length > 120 \n",
    "     )\n",
    "TARGET churn\n",
    "FUNCTION predict_customer_churn\n",
    "IAM_ROLE 'arn:aws:iam::<acountId>:role/RedshiftML'\n",
    "SETTINGS (\n",
    "  S3_BUCKET 'redshiftml_<accountId>',\n",
    "   MAX_CELLS 10000\n",
    ")\n",
    ";\n",
    "\"\"\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}