{ "cells": [ { "cell_type": "markdown", "id": "cbb67c68", "metadata": {}, "source": [ "## Input Redshift Cluster Endpoint and User\n", "\n", "Please input your Amazon Redshift Cluster endpoint and existing database user" ] }, { "cell_type": "code", "execution_count": 1, "id": "d4f69e6f", "metadata": {}, "outputs": [], "source": [ "REDSHIFT_ENDPOINT = 'redshift-cluster.xxxxxxxxxx.us-east-1.redshift.amazonaws.com:5439/dev'\n", "REDSHIFT_USER=\"awsuser\"" ] }, { "cell_type": "markdown", "id": "c61a9220", "metadata": {}, "source": [ "## Setup Run SQL function using Redshift Data API to get SQL query output directly into pandas dataframe\n", "\n", "In this step, we are creating function run_sql, which we will use to get SQL query output directly into pandas dataframe. We will also use this function to run DDL statements" ] }, { "cell_type": "code", "execution_count": 2, "id": "fc57f8b6", "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import time\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "session = boto3.session.Session()\n", "region = session.region_name\n", "\n", "\n", "def run_sql(sql_text):\n", " client = boto3.client(\"redshift-data\")\n", " res = client.execute_statement(Database=REDSHIFT_ENDPOINT.split('/')[1], DbUser=REDSHIFT_USER, Sql=sql_text,\n", " ClusterIdentifier=REDSHIFT_ENDPOINT.split('.')[0])\n", " query_id = res[\"Id\"]\n", " while True:\n", " time.sleep(1)\n", " status_description = client.describe_statement(Id=query_id)\n", " status = status_description[\"Status\"]\n", " if status == \"FAILED\":\n", " raise Exception('SQL query failed:' + query_id + \": \" + status_description[\"Error\"])\n", " elif status == \"FINISHED\":\n", " if status_description['ResultRows']>0:\n", " results = client.get_statement_result(Id=query_id)\n", " column_labels = []\n", " for i in range(len(results[\"ColumnMetadata\"])): column_labels.append(results[\"ColumnMetadata\"][i]['label'])\n", " records = []\n", " for record in results.get('Records'):\n", " records.append([list(rec.values())[0] for rec in record])\n", " df = pd.DataFrame(np.array(records), columns=column_labels)\n", " return df\n", " else:\n", " return query_id" ] }, { "cell_type": "markdown", "id": "248da963", "metadata": {}, "source": [ "## Create User and Grant Permissions - Optional\n", "\n", "As the database adminstrator, you may skip the permissions section. Otherwise, you can create users and grant them permissions with the Principle of Least Privilege in mind. \n", "\n", "\n", "If demouser exists with privilege, please revoke before dropping the user:\n", "\n", "```sql\n", "revoke all on schema demo_ml from demouser;\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c9105bed", "metadata": {}, "outputs": [], "source": [ "permissions_one_sql = \"\"\"\n", "\n", "DROP USER IF EXISTS demouser;\n", "\n", "create user demouser with password '';\n", "\n", "GRANT CREATE MODEL TO demouser;\n", "\n", "\"\"\"\n", "\n", "for sql_text in permissions_one_sql.split(\";\")[:-1]:\n", " run_sql(sql_text);" ] }, { "cell_type": "markdown", "id": "200106f6", "metadata": {}, "source": [ "## Data Preparation \n", "\n", "Data preparation script to be ran on Amazon Redshift\n", "\n", "**Note**: Please change `` to your AWS Account Id down in the script below\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "648aacfd", "metadata": {}, "outputs": [], "source": [ "setup_script=\"\"\"\n", "\n", "DROP SCHEMA IF EXISTS DEMO_ML CASCADE;\n", "\n", "CREATE SCHEMA DEMO_ML;\n", "\n", "DROP TABLE IF EXISTS demo_ml.customer_activity;\n", "\n", "CREATE TABLE demo_ml.customer_activity (\n", "state varchar(2), \n", "account_length int, \n", "area_code int,\n", "phone varchar(8), \n", "intl_plan varchar(3), \n", "vMail_plan varchar(3),\n", "vMail_message int, \n", "day_mins float, \n", "day_calls int, \n", "day_charge float,\n", "total_charge float,\n", "eve_mins float, \n", "eve_calls int, \n", "eve_charge float, \n", "night_mins float,\n", "night_calls int, \n", "night_charge float, \n", "intl_mins float, \n", "intl_calls int,\n", "intl_charge float, \n", "cust_serv_calls int, \n", "churn varchar(6),\n", "record_date date);\n", "\n", "COPY DEMO_ML.customer_activity \n", "FROM 's3://redshift-downloads/redshift-ml/customer_activity/' \n", "IAM_ROLE 'arn:aws:iam:::role/RedshiftML' IGNOREHEADER 1 CSV\n", "region 'us-east-1';\n", "\"\"\"" ] }, { "cell_type": "markdown", "id": "e7391214", "metadata": {}, "source": [ "## Run data preparation script in Amazon Redshift " ] }, { "cell_type": "code", "execution_count": 4, "id": "3c9bba54", "metadata": {}, "outputs": [], "source": [ "for sql_text in setup_script.strip().split(\";\")[:-1]:\n", " run_sql(sql_text);\n" ] }, { "cell_type": "markdown", "id": "15956453", "metadata": {}, "source": [ "## Granting Permissions - Optional\n", "\n", "Create demo user\n", "Grant create model permissions to `demouser`" ] }, { "cell_type": "code", "execution_count": null, "id": "3078e264", "metadata": {}, "outputs": [], "source": [ "permissions_two_sql = \"\"\"\n", "\n", "GRANT SELECT on demo_ml.customer_activity TO demouser;\n", "\n", "GRANT CREATE, USAGE ON SCHEMA demo_ml TO demouser;\n", "\n", "\n", "\"\"\"\n", "\n", "for sql_text in permissions_two_sql.split(\";\")[:-1]:\n", " run_sql(sql_text);" ] }, { "cell_type": "markdown", "id": "82cf9a92", "metadata": {}, "source": [ "## Read SQL output from Pandas Dataframe" ] }, { "cell_type": "code", "execution_count": 5, "id": "d0d4d66d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
stateaccount_lengtharea_codephoneintl_planvmail_planvmail_messageday_minsday_callsday_charge...eve_chargenight_minsnight_callsnight_chargeintl_minsintl_callsintl_chargecust_serv_callschurnrecord_date
0KS128415382-4657noyes25265.111045.07...16.78244.79111.0110.032.71False.2020-08-24
1OH107415371-7191noyes26161.612327.47...16.62254.410311.4513.733.71False.2019-09-23
2NJ137415358-1921nono0243.411441.38...10.3162.61047.3212.253.290False.2020-03-09
3OH84408375-9999yesno0299.47150.9...5.26196.9898.866.671.782False.2019-07-08
4OK75415330-6626yesno0166.711328.34...12.61186.91218.4110.132.733False.2020-02-14
5AL118510391-8027yesno0223.49837.98...18.75203.91189.186.361.70False.2019-07-28
6MA121510355-9993noyes24218.28837.09...29.62212.61189.577.572.033False.2019-06-07
7MO147415329-9001yesno0157.07926.69...8.76211.8969.537.161.920False.2020-08-22
8LA117408335-4719nono0184.59731.37...29.89215.8909.718.742.351False.2020-04-10
9WV141415330-8173yesyes37258.68443.96...18.87326.49714.6911.253.020False.2020-06-06
\n", "

10 rows × 23 columns

\n", "
" ], "text/plain": [ " state account_length area_code phone intl_plan vmail_plan vmail_message \\\n", "0 KS 128 415 382-4657 no yes 25 \n", "1 OH 107 415 371-7191 no yes 26 \n", "2 NJ 137 415 358-1921 no no 0 \n", "3 OH 84 408 375-9999 yes no 0 \n", "4 OK 75 415 330-6626 yes no 0 \n", "5 AL 118 510 391-8027 yes no 0 \n", "6 MA 121 510 355-9993 no yes 24 \n", "7 MO 147 415 329-9001 yes no 0 \n", "8 LA 117 408 335-4719 no no 0 \n", "9 WV 141 415 330-8173 yes yes 37 \n", "\n", " day_mins day_calls day_charge ... eve_charge night_mins night_calls \\\n", "0 265.1 110 45.07 ... 16.78 244.7 91 \n", "1 161.6 123 27.47 ... 16.62 254.4 103 \n", "2 243.4 114 41.38 ... 10.3 162.6 104 \n", "3 299.4 71 50.9 ... 5.26 196.9 89 \n", "4 166.7 113 28.34 ... 12.61 186.9 121 \n", "5 223.4 98 37.98 ... 18.75 203.9 118 \n", "6 218.2 88 37.09 ... 29.62 212.6 118 \n", "7 157.0 79 26.69 ... 8.76 211.8 96 \n", "8 184.5 97 31.37 ... 29.89 215.8 90 \n", "9 258.6 84 43.96 ... 18.87 326.4 97 \n", "\n", " night_charge intl_mins intl_calls intl_charge cust_serv_calls churn \\\n", "0 11.01 10.0 3 2.7 1 False. \n", "1 11.45 13.7 3 3.7 1 False. \n", "2 7.32 12.2 5 3.29 0 False. \n", "3 8.86 6.6 7 1.78 2 False. \n", "4 8.41 10.1 3 2.73 3 False. \n", "5 9.18 6.3 6 1.7 0 False. \n", "6 9.57 7.5 7 2.03 3 False. \n", "7 9.53 7.1 6 1.92 0 False. \n", "8 9.71 8.7 4 2.35 1 False. \n", "9 14.69 11.2 5 3.02 0 False. \n", "\n", " record_date \n", "0 2020-08-24 \n", "1 2019-09-23 \n", "2 2020-03-09 \n", "3 2019-07-08 \n", "4 2020-02-14 \n", "5 2019-07-28 \n", "6 2019-06-07 \n", "7 2020-08-22 \n", "8 2020-04-10 \n", "9 2020-06-06 \n", "\n", "[10 rows x 23 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = run_sql(\"SELECT * FROM demo_ml.customer_activity;\");\n", "df.head(10)" ] }, { "cell_type": "code", "execution_count": 6, "id": "49fd6fe7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
stateaccount_lengtharea_codephoneintl_planvmail_planvmail_messageday_minsday_callsday_charge...eve_chargenight_minsnight_callsnight_chargeintl_minsintl_callsintl_chargecust_serv_callschurnrecord_date
count3333333333333333333333333333333333333333...3333333333333333333333333333333333333333
unique5121233333224616671191667...1440159112093316221162102520
topWV105415389-2540nono0159.510227.12...16.12191.41059.6610.032.71False.2020-07-18
freq10643165513010241124118788...118841562668621181285016
\n", "

4 rows × 23 columns

\n", "
" ], "text/plain": [ " state account_length area_code phone intl_plan vmail_plan \\\n", "count 3333 3333 3333 3333 3333 3333 \n", "unique 51 212 3 3333 2 2 \n", "top WV 105 415 389-2540 no no \n", "freq 106 43 1655 1 3010 2411 \n", "\n", " vmail_message day_mins day_calls day_charge ... eve_charge night_mins \\\n", "count 3333 3333 3333 3333 ... 3333 3333 \n", "unique 46 1667 119 1667 ... 1440 1591 \n", "top 0 159.5 102 27.12 ... 16.12 191.4 \n", "freq 2411 8 78 8 ... 11 8 \n", "\n", " night_calls night_charge intl_mins intl_calls intl_charge \\\n", "count 3333 3333 3333 3333 3333 \n", "unique 120 933 162 21 162 \n", "top 105 9.66 10.0 3 2.7 \n", "freq 84 15 62 668 62 \n", "\n", " cust_serv_calls churn record_date \n", "count 3333 3333 3333 \n", "unique 10 2 520 \n", "top 1 False. 2020-07-18 \n", "freq 1181 2850 16 \n", "\n", "[4 rows x 23 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()\n" ] }, { "cell_type": "markdown", "id": "f20f0e09", "metadata": {}, "source": [ "## Run Create Model statement to create a new ML model with Redshift ML\n", "\n", "Please replace `` with your AWS account Id" ] }, { "cell_type": "code", "execution_count": 8, "id": "60f9c33f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'60d56320-0586-4b4e-bd9e-b0435870b3b8'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resp = run_sql(\"\"\"\n", "CREATE MODEL demo_ml.customer_churn_model\n", " FROM (SELECT state,\n", " area_code,\n", " total_charge/account_length AS average_daily_spend, \n", " cust_serv_calls/account_length AS average_daily_cases,\n", " churn \n", " FROM demo_ml.customer_activity\n", " WHERE record_date < '2020-01-01'\n", " )\n", " TARGET churn\n", "FUNCTION predict_customer_churn\n", "IAM_ROLE 'arn:aws:iam:::role/RedshiftML'\n", "SETTINGS (\n", " S3_BUCKET 'redshiftml-'\n", ")\n", ";\n", "\"\"\")\n", "\n", "resp" ] }, { "cell_type": "markdown", "id": "b5f27a8c", "metadata": {}, "source": [ "## Check the status on your ML model \n", "\n", "You can check the status of your models by running the `SHOW MODEL` command from your SQL prompt.\n", "\n", "Continuously check `Model State` and once it has been set to `Ready`, continue to the next step. " ] }, { "cell_type": "code", "execution_count": 10, "id": "4714fcec", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
KeyValue
0Model Namecustomer_churn_model
1Schema Namedemo_ml
2Ownerawsuser
3Creation TimeMon, 09.08.2021 23:21:38
4Model StateREADY
5Training Job StatusMaxAutoMLJobRuntimeReached
6validation:f1_binary0.271910
7Estimated Cost4.907083
8
9TRAINING DATA:
\n", "
" ], "text/plain": [ " Key Value\n", "0 Model Name customer_churn_model\n", "1 Schema Name demo_ml\n", "2 Owner awsuser\n", "3 Creation Time Mon, 09.08.2021 23:21:38\n", "4 Model State READY\n", "5 Training Job Status MaxAutoMLJobRuntimeReached\n", "6 validation:f1_binary 0.271910\n", "7 Estimated Cost 4.907083\n", "8 \n", "9 TRAINING DATA: " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = run_sql('SHOW MODEL demo_ml.customer_churn_model;')\n", "\n", "df.head(10)" ] }, { "cell_type": "markdown", "id": "6c1dc6c0", "metadata": {}, "source": [ "## Evaluate your model performance \n" ] }, { "cell_type": "code", "execution_count": 11, "id": "f3151f73", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
accountidchurnpredicted
0510355-9993False.True.
1510394-8006False.True.
2510386-2923False.True.
3415373-2782False.True.
4408357-3817False.True.
............
406510380-3186False.True.
407408347-9995False.True.
408510340-9013False.True.
409408362-8378False.True.
410415348-3830False.True.
\n", "

411 rows × 3 columns

\n", "
" ], "text/plain": [ " accountid churn predicted\n", "0 510355-9993 False. True.\n", "1 510394-8006 False. True.\n", "2 510386-2923 False. True.\n", "3 415373-2782 False. True.\n", "4 408357-3817 False. True.\n", ".. ... ... ...\n", "406 510380-3186 False. True.\n", "407 408347-9995 False. True.\n", "408 510340-9013 False. True.\n", "409 408362-8378 False. True.\n", "410 415348-3830 False. True.\n", "\n", "[411 rows x 3 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = run_sql(\"\"\"\n", "WITH infer_data AS (\n", " SELECT area_code || phone accountid, churn,\n", " demo_ml.predict_customer_churn( \n", " state,\n", " area_code, \n", " total_charge/account_length , \n", " cust_serv_calls/account_length ) AS predicted\n", " FROM demo_ml.customer_activity\n", " WHERE record_date < '2020-01-01'\n", ")\n", "SELECT * FROM infer_data where churn!=predicted;\n", "\"\"\")\n", "df" ] }, { "cell_type": "markdown", "id": "4dad9eaf", "metadata": {}, "source": [ "### Evaluation\n", "\n", "You can see the F1 value for the example model customer_churn_model in the output of the `SHOW MODEL` command. The F1 amount signifies the statistical measure of the precision and recall of all the classes in the model. The value ranges between 0–1; the higher the score, the better the accuracy of the model.\n", "\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "71bfc35b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
KeyValue
0Model Namecustomer_churn_model
1Schema Namedemo_ml
2Ownerawsuser
3Creation TimeMon, 09.08.2021 23:21:38
4Model StateREADY
5Training Job StatusMaxAutoMLJobRuntimeReached
6validation:f1_binary0.271910
7Estimated Cost4.907083
8
9TRAINING DATA:
\n", "
" ], "text/plain": [ " Key Value\n", "0 Model Name customer_churn_model\n", "1 Schema Name demo_ml\n", "2 Owner awsuser\n", "3 Creation Time Mon, 09.08.2021 23:21:38\n", "4 Model State READY\n", "5 Training Job Status MaxAutoMLJobRuntimeReached\n", "6 validation:f1_binary 0.271910\n", "7 Estimated Cost 4.907083\n", "8 \n", "9 TRAINING DATA: " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = run_sql('SHOW MODEL demo_ml.customer_churn_model;')\n", "df.head(10)" ] }, { "cell_type": "markdown", "id": "df4bc004", "metadata": {}, "source": [ "## Invoke your ML model for inference\n", "\n", "\n", "You can use your SQL function to apply the ML model to your data in queries, reports, and dashboards. For example, you can run the predict_customer_churn SQL function on new customer data in Amazon Redshift regularly to predict customers at risk of churning and feed this information to sales and marketing teams so they can take preemptive actions, such as sending these customers an offer designed to retain them.\n", "\n", "For example, you can run the following query to predict which customers in area code 408 might churn and the output shows the account ID and whether the account is predicted to remain active: \n" ] }, { "cell_type": "code", "execution_count": 13, "id": "b00a93e3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
accountidpredictedactive
0408335-4719True.
1408350-8884True.
2408393-7984False.
3408418-6412True.
4408383-1121False.
.........
499408404-5283False.
500408398-3632False.
501408340-9449False.
502408406-6304False.
503408368-8555False.
\n", "

504 rows × 2 columns

\n", "
" ], "text/plain": [ " accountid predictedactive\n", "0 408335-4719 True.\n", "1 408350-8884 True.\n", "2 408393-7984 False.\n", "3 408418-6412 True.\n", "4 408383-1121 False.\n", ".. ... ...\n", "499 408404-5283 False.\n", "500 408398-3632 False.\n", "501 408340-9449 False.\n", "502 408406-6304 False.\n", "503 408368-8555 False.\n", "\n", "[504 rows x 2 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = run_sql(\"\"\"\n", "SELECT area_code || phone accountid, \n", " demo_ml.predict_customer_churn( \n", " state,\n", " area_code, \n", " total_charge/account_length , \n", " cust_serv_calls/account_length )\n", " AS \"predictedActive\"\n", " FROM demo_ml.customer_activity\n", " WHERE area_code='408' and record_date > '2020-01-01';\n", "\"\"\")\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "f2e97046", "metadata": {}, "source": [ "## Granting Permissions - Optional\n", "\n", "The following code grants the EXECUTE privilege to users such as your marketing_analyst_grp" ] }, { "cell_type": "code", "execution_count": null, "id": "750585e8", "metadata": {}, "outputs": [], "source": [ "df = run_sql('GRANT EXECUTE demo_ml.predict_customer_churn TO marketing_analyst_grp')" ] }, { "cell_type": "markdown", "id": "68e33ded", "metadata": {}, "source": [ "## Cost Control \n", "\n", "If the `SELECT` query of `CREATE MODEL` produces 10,000 records for training and each record has five columns, the number of cells in the training data is 50,000. You can control the training cost by setting the `MAX_CELLS`.\n", "\n", "Please replace `` with your AWS account Id\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f4a74661", "metadata": {}, "outputs": [], "source": [ "df = run_sql(\"\"\"\n", "CREATE MODEL demo_ml.customer_churn_model\n", "FROM (SELECT state,\n", " area_code,\n", " total_charge/account_length AS average_daily_spend, \n", " cust_serv_calls/account_length AS average_daily_cases,\n", " churn \n", " FROM demo_ml.customer_activity\n", " WHERE account_length > 120 \n", " )\n", "TARGET churn\n", "FUNCTION predict_customer_churn\n", "IAM_ROLE 'arn:aws:iam:::role/RedshiftML'\n", "SETTINGS (\n", " S3_BUCKET 'redshiftml_',\n", " MAX_CELLS 10000\n", ")\n", ";\n", "\"\"\")" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 }