{ "cells": [ { "cell_type": "markdown", "id": "df276921", "metadata": {}, "source": [ "# Churn Prediction with Text and Interpretability" ] }, { "cell_type": "markdown", "id": "1c1f7791", "metadata": {}, "source": [ "This notebook runs the entire churn prediction pipeline, from data preparation to model evaluation and interpretation.\n", "\n", "Alternatively, everything can be run from the terminal (see README.md).\n", "\n", "Prerequisite: the dataset has been created (see README.md)." ] }, { "cell_type": "markdown", "id": "032bb6ce", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": 1, "id": "a32ade35", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "from matplotlib import pyplot as plt\n", "\n", "# Work from the scripts directory so the project modules below can be imported\n", "os.chdir(\"../scripts\")\n", "\n", "import preprocess\n", "import train\n", "import interpret" ] }, { "cell_type": "code", "execution_count": 3, "id": "9d500e76", "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "id": "4b8b1336", "metadata": {}, "source": [ "### Load and Prepare the Data" ] }, { "cell_type": "code", "execution_count": 4, "id": "94c30e10", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

| | churn | chat_log | state | account_length | area_code | international_plan | voice_mail_plan | number_vmail_messages | total_day_minutes | total_day_calls | ... | total_eve_minutes | total_eve_calls | total_eve_charge | total_night_minutes | total_night_calls | total_night_charge | total_intl_minutes | total_intl_calls | total_intl_charge | number_customer_service_calls |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | no | Customer: Well, the only thing that I'm consid... | CT | 134 | area_code_408 | no | no | 0 | 177.2 | 91 | ... | 228.7 | 105 | 19.44 | 194.3 | 113 | 8.74 | 8.9 | 3 | 2.40 | 2 |
| 1 | yes | Customer: Well, I just want to be able to canc... | WV | 78 | area_code_408 | no | no | 0 | 226.3 | 88 | ... | 306.2 | 81 | 26.03 | 200.9 | 120 | 9.04 | 7.8 | 11 | 2.11 | 1 |
| 2 | no | Customer: I would like data.\\nTelCom Agent: Ok... | IN | 88 | area_code_415 | no | no | 0 | 183.5 | 93 | ... | 170.5 | 80 | 14.49 | 193.8 | 88 | 8.72 | 8.3 | 5 | 2.24 | 3 |

3 rows × 21 columns

| | keyword | sim | chg | count | joint |
|---|---|---|---|---|---|
| 0 | voicemail | 90.076553 | 0.006695 | 5 | 0.628774 |
| 1 | cancel | 61.187359 | -0.081321 | 168 | 0.545633 |
| 2 | sick | 74.789459 | -0.127242 | 1 | 0.538919 |
| 3 | turnover | 60.896118 | -0.286630 | 1 | 0.533321 |
| 4 | disappointed | 70.248131 | -0.091740 | 5 | 0.522520 |
| 5 | spam | 77.940460 | -0.000022 | 3 | 0.506429 |
| 6 | bored | 78.131271 | -0.038932 | 1 | 0.502213 |
| 7 | unhappy | 65.601990 | -0.024782 | 37 | 0.493910 |
| 8 | frustrated | 73.496033 | -0.006023 | 5 | 0.486930 |
| 9 | mistake | 66.247879 | -0.093309 | 3 | 0.470609 |
| 10 | late | 65.971649 | -0.003873 | 18 | 0.458023 |
| 11 | error | 63.123695 | -0.065609 | 7 | 0.448412 |
| 12 | faulty | 55.486092 | -0.239817 | 1 | 0.448232 |
| 13 | angry | 70.865860 | -0.002967 | 3 | 0.444018 |
| 14 | backlog | 74.808548 | 0.000020 | 1 | 0.442185 |
| 15 | customer | 52.072552 | -0.006985 | 480 | 0.439737 |
| 16 | lag | 69.912178 | -0.004650 | 3 | 0.436584 |
| 17 | overpay | 65.573105 | -0.093846 | 1 | 0.429261 |
| 18 | disconnect | 64.002014 | -0.010485 | 11 | 0.429104 |
| 19 | incompetence | 73.107452 | -0.000754 | 1 | 0.427229 |