{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Learning for Telecom with Naive Bayes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Machine Learning for CallDisconnectReason is a notebook that demonstrates how to explore the CDR dataset and classify CallDisconnectReason with the Spark ML Naive Bayes algorithm.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pyspark.sql.types import *\n", "from pyspark.sql import SparkSession\n", "from sagemaker import get_execution_role\n", "import sagemaker_pyspark\n", "\n", "\n", "role = get_execution_role()\n", "\n", "# Configure Spark to use the SageMaker Spark dependency jars\n", "jars = sagemaker_pyspark.classpath_jars()\n", "\n", "classpath = \":\".join(jars)\n", "\n", "# Run Spark in local mode, using all available cores\n", "spark = SparkSession.builder.config(\"spark.driver.extraClassPath\", classpath)\\\n", " .master(\"local[*]\").getOrCreate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions. 
By retrieving only the data that your application needs, S3 Select can deliver large performance gains; in many cases, as much as a 400% improvement.\n", "\n", "- _We first read the CDR dataset, which has already been processed by AWS Glue and stored in compressed Parquet format._\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "cdr_start_loc = \"<%CDRStartFile%>\"\n", "cdr_stop_loc = \"<%CDRStopFile%>\"\n", "cdr_start_sample_loc = \"<%CDRStartSampleFile%>\"\n", "cdr_stop_sample_loc = \"<%CDRStopSampleFile%>\"\n", "\n", "# Note: .parquet() sets the data source format to Parquet, overriding the\n", "# preceding .format(\"s3select\") call, so this is a standard Parquet read.\n", "df = spark.read.format(\"s3select\").parquet(cdr_stop_sample_loc)\n", "df.createOrReplaceTempView(\"cdr\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "22413" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the call service duration of every STOP record\n", "durationDF = spark.sql(\"SELECT _c13 as CallServiceDuration FROM cdr where _c0 = 'STOP'\")\n", "durationDF.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploration of Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- _This section explores and visualizes the dataset used for processing. 
Here we create a bar chart representation of CallServiceDuration from CDR dataset._" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "durationpd = durationDF.toPandas().astype(int) " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAESCAYAAAD+GW7gAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAHP1JREFUeJzt3XmQVfXd5/H3l+6WNoAii8TQDI0JwQ1ZBOPExK1HgkrUgBpTqUipJYnyuGTRMMkfyTNjFZqnMklMzELGFFjihIhjaSLjxAXjaGIeQQwg5FGwQBsVGgQUEbHp7/xxf91cmtt9b9/1LJ9X1a0+96y/s/w+59zfOfe2uTsiIpJ8/WpdABERqQ4FvohISijwRURSQoEvIpISCnwRkZRQ4IuIpIQCX0QkJRT4IiIpocAXEUmJ+loXAGDYsGHe3Nxc62KIiMTKypUrt7v78ELHj0TgNzc3s2LFiloXQ0QkVsxsc1/GV5OOiEhKKPBFRFJCgS8ikhKRaMMXkYyPPvqI1tZW9u3bV+uiSIQ0NjbS1NREQ0NDSfNR4ItESGtrK4MGDaK5uRkzq3VxJALcnR07dtDa2sqYMWNKmldBTTpmtsnM1pjZS2a2IvQbYmaPm9mr4e8xob+Z2V1mtsHMVpvZ5JJKKJIi+/btY+jQoQp76WJmDB06tCyf+vrShn+uu0909ynh/TzgSXcfCzwZ3gNcAIwNrznAr0oupUiKKOylu3IdE6XctL0EWBS6FwGXZvW/1zOeBwab2XElLEdERMqg0DZ8B/5sZg78xt0XACPc/a0w/G1gROgeCbyRNW1r6PcWItInzfMeLev8Nt1xUd5x3n77bW655RZeeOEFBg8ezIgRI/jpT3/Kpz/96ZzjDxw4kD179rBp0yZmzJjB2rVr2bt3L9dddx2rV6/G3Rk8eDCPPfYYAwcOLHkdLrzwQu6//34GDx7cp+kWLlzIrbfeSlNTE3v27OH444/nBz/4AZ/97GdLLhPArl27uP/++7nhhhsAePPNN7nppptYunRpWeZfDoVe4X/O3SeTaa6Za2ZnZQ/0zH9C79N/QzezOWa2wsxWtLW19WXSROmpQpe7oosUwt350pe+xDnnnMPGjRtZuXIl8+fPZ+vWrX2az89+9jNGjBjBmjVrWLt2Lffcc0+fnjA5cOBAj8OWLVvW57Dv9OUvf5lVq1bx6quvMm/ePGbOnMn69esLnr69vb3HYbt27eKXv/xl1/tPfOITJYf96tZdJU3fXUGB7+5bwt9twEPA6cDWzqaa8HdbGH0LMCpr8qbQr/s8F7j7FHefMnx4wT8FISIVtHz5choaGvjGN77R1W/ChAlMmjSJlpYWJk+ezPjx43n44Yd7nc9bb73FyJEju96PGzeO/v37A3Dfffdx+umnM3HiRL7+9a93hfvAgQP59re/zYQJE5g/fz6XX3551/RPP/00M2bMADI/xbJ9+3YA7r33Xk499VQmTJjA1772NQDa2tqYNWsWU6dOZ
erUqTz33HM5y3juuecyZ84cFixYAMA555zT9RMv27dvp/P3vRYuXMjFF1/MeeedR0tLC3v27Mm5LebNm8fGjRuZOHEit956K5s2beKUU04BMjfjr776asaPH8+kSZNYvnx517xnzpzJ9OnTGTt2LLfddluv27VUeZt0zGwA0M/d3wvd04D/BjwCzAbuCH87j4BHgH8xs98DnwF2ZzX9iEiErV27ltNOO+2w/o2NjTz00EMcddRRbN++nTPOOIOLL764x5uJ11xzDdOmTWPp0qW0tLQwe/Zsxo4dy/r161myZAnPPfccDQ0N3HDDDSxevJirrrqK999/n8985jP8+Mc/pr29neOPP57333+fAQMGsGTJEq688spDlvHyyy9z++2389e//pVhw4bxzjvvAHDzzTfzzW9+k8997nO8/vrrfOELX+jxKn7y5Mn85je/ybtdXnzxRVavXs2QIUNob2/PuS3uuOMO1q5dy0svvQTApk2buqa/++67MTPWrFnDP//5T6ZNm8Yrr7wCwEsvvcSqVavo378/48aN48Ybb2TUqFG5ilGyQtrwRwAPhR1bD9zv7o+Z2QvAH8zsWmAzcEUYfxlwIbAB2AtcXfZSi0hVuTvf+973eOaZZ+jXrx9btmxh69atfPzjH885/sSJE3nttdf485//zBNPPMHUqVP529/+xpNPPsnKlSuZOnUqAB988AHHHnssAHV1dcyaNQuA+vp6pk+fzh//+Ecuu+wyHn30UX70ox8dsoynnnqKyy+/nGHDhgEwZMgQAJ544gnWrVvXNd67777Lnj17elyvQpx//vld8+9pW/Tm2Wef5cYbbwTghBNOYPTo0V2B39LSwtFHHw3ASSedxObNm2sX+O7+GjAhR/8dQEuO/g7MLUvpRFKied6jBd1QLdXq1l2c2tRz+/fJJ5+cs9158eLFtLW1sXLlShoaGmhubs77XPjAgQOZOXMmM2fOpF+/fixbtowjjjiC2bNnM3/+/MPGb2xspK6uruv9lVdeyS9+8QuGDBnClClTGDRoUEHr2NHRwfPPP09jY2PecVetWsWJJ54IZE4yHR0dAIet24ABA7q6i9kWvels6oLMSa+3+wSl0m/plIFusEpSnHfeeXz44Ydd7doAq1evZvPmzRx77LE0NDSwfPlyNm/u/Vd5n3vuOXbu3AnA/v37WbduHaNHj6alpYWlS5eybVvmlt8777zT47zOPvtsXnzxRX77298e1pzTWdYHHniAHTt2dM0LYNq0afz85z/vGq+ziaW7v/zlLyxYsIDrrrsOyNwbWLlyJUCvN1t3796dc1sMGjSI9957L+c0n//851m8eDEAr7zyCq+//jrjxo3rcRmVop9WkIqp1lVruUSxvOUuT76nPsyMhx56iFtuuYU777yTxsZGmpub+eEPf8hNN93E+PHjmTJlCieccEKv89m4cSPXX3897k5HRwcXXXQRs2bNwsy4/fbbmTZtGh0dHTQ0NHD33XczevTow+ZRV1fHjBkzWLhwIYsWLTps+Mknn8z3v/99zj77bOrq6pg0aRILFy7krrvuYu7cuZx66qm0t7dz1lln8etf/xqAJUuW8Oyzz7J3717GjBnDgw8+2HWF/53vfIcrrriCBQsWcNFFPW/3r371q3zxi19k/PjxHH/iqV3bYujQoZx55pmccsopXHDBBcyde7Ch44YbbuD6669n/Pjx1NfXs3DhwkOu7KvG3Wv+Ou200zzORn/3T2WftpR5RkXc1qGW5e1c9rp16yq6nH+8sbOi80+bSm/P7PnnOjaAFd6HrFWTTpD0Zpmkr5+I5JeKwFfYSS309bjTcSqVlorAF4kTL/BRwUoo9zc7pTzKdUykLvB1FZVMSdmvjY2N7Nixo6ahnwTFnriieMLz8Hv4hTxmmo+e0pFIPp2SVk1NTbS2tlKp35fauvMD1r93ZNHD46LY9ejrdJXeXlt3fkDDe0d2/cerUinwRSKkoaGh5P9q1JsL8pzc8
w2Pi2LXo6/TVXp7lXv+iWjSKfXjfFKaA0REepOIwI+SpJ88kr5+IkkWucBXoIiIVEbkAl9Eok0XZfGlwBeJoGJDVWEsvVHglyjuFSzu5ZfDaZ/GT7X2mQJfRBJBJ7r8FPgiZabgkahS4ItI0XRyixcFvohISijwq0BXQSISBQp8SZ0onICjUAZJHwW+pIqCtna07XvXPO/Rim8jBb5IARRWkgQKfBGRlFDgl5GuAkUkyiIf+ArRDG2H4mi7FU/bLnkiH/hSXarkEiU6HstLgS+JpbCQQvV0rCTtGFLgS03FoUKVUsY4rF8ucS239E6BL5GkwBEpPwV+kRRIEiU6HuOlVvtLgS8ikhIFB76Z1ZnZKjP7U3g/xsz+bmYbzGyJmR0R+vcP7zeE4c2VKbqkga5cS6dtKJ36coV/M7A+6/2dwE/c/VPATuDa0P9aYGfo/5MwnoiI1FhBgW9mTcBFwP8M7w04D1gaRlkEXBq6LwnvCcNbwvgiutqUopTjuNGxV/gV/k+B24CO8H4osMvd28P7VmBk6B4JvAEQhu8O4+elHSIiUjl5A9/MZgDb3H1lORdsZnPMbIWZrWhrayvnrCuiFicjnQAlTuJ4vMaxzKUo5Ar/TOBiM9sE/J5MU87PgMFmVh/GaQK2hO4twCiAMPxoYEf3mbr7Anef4u5Thg8fXtJK5JK2HSlSCdWsR2n5tmst5Q18d/+v7t7k7s3AlcBT7v5VYDlwWRhtNvBw6H4kvCcMf8rdvayljggdiOmi/S3lVu1jqpTn8L8LfMvMNpBpo78n9L8HGBr6fwuYV1oRRUTiLwoXDPX5RznI3Z8Gng7drwGn5xhnH3B5GcomIlIWUQjbKEj8N221o0VEMhIf+KXQyUKkulTnKkuB30dJOCCTsA4i0ncKfBGRlEhV4OvKVkTSLFWBn4tOAiLJojrds0gGvnaYiEj5RTLwRUSk/BT4Ekv6FCjSd7EKfFXy5NK+Fam8WAV+0in0RKSSFPjSZ5U8MemkJ1I5CnwRAXSyTQMFfkSosknaVOuYV906SIEvUgEKGYkiBX6NlBoISQyUJK6TSJQo8EVE+iDfhUmUL1wU+CIiKZG4wC/k7BrlM3BSaBuXh7ajlFPiAl+Ko2BJJ+33dFHgR5wqZOFqta20jyQuFPgiEaMTiFSKAl9EpMI6T+K1Ppkr8CWyal05RJJGgS8ikhIKfBGRlFDgi4ikhAJfUkn3BySNFPgiIhFXrgsUBb6IFCRKn4pqXZZaL79YCvwqiutBkkuS1iVbUtdL4q8cx6YCX0QkJRT4IlJW+pQUXXkD38wazezfzewfZvaymf1r6D/GzP5uZhvMbImZHRH69w/vN4ThzZVdBakUVdz40z6UbIVc4X8InOfuE4CJwHQzOwO4E/iJu38K2AlcG8a/FtgZ+v8kjFcxaT+g077+hdJ26pm2TXrkDXzP2BPeNoSXA+cBS0P/RcClofuS8J4wvMXMrGwlTiFVSJH4iWK9LagN38zqzOwlYBvwOLAR2OXu7WGUVmBk6B4JvAEQhu8Ghpaz0FEUxZ0rUiuqD9FUUOC7+wF3nwg0AacDJ5S6YDObY2YrzGxFW1tbqbMTkQTRCaMy+vSUjrvvApYD/xkYbGb1YVATsCV0bwFGAYThRwM7csxrgbtPcfcpw4cPL7L4Uk6qZBJl3Y9PHa99V8hTOsPNbHDoPhI4H1hPJvgvC6PNBh4O3Y+E94ThT7m7l7PQIuVSbGgobCSOCrnCPw5YbmargReAx939T8B3gW+Z2QYybfT3hPHvAYaG/t8C5pW/2CLxohNEukR1f9fnG8HdVwOTcvR/jUx7fvf++4DLy1I6EREpG33TtsqieuYXkeRT4IuIpIQCX0QkJRT4IiIxUkqzsAJfRKQIcbwfl8jAj+OOECmGjnXpi0QGfjmpQolIUijwpWg6GdaGtrsUS4EvIlWnk1ZtxCLwdXDUjra9SHLEIvDTpLeAVfhqG0j0xOmYVOBL7MWpwmWLa
7klvhT4IlI1OsnVlgJfyiaNlTmu6xzXcpdTGreBAl9SL40VX9JJgS8iEiGVvABR4CdcWq9ek7DecVmHSpUzLusfJwp8EZECJOEEpMAXkUSEmeSnwJeqU7iI1IYCP4IUiCLJV4t6rsAXSQhdKEg+CvyEUGUXkXwU+CJyiEpfPOjiJL9KbSMFfgqogokIKPBjR+EttaDjLhkU+BWkSiJJpuM7fhT4FRLnyhDnsoukRTH1NFGBr6ASiZZC6qTqbfUkKvBFRKRnCnwRkZRQ4IuIpETewDezUWa23MzWmdnLZnZz6D/EzB43s1fD32NCfzOzu8xsg5mtNrPJlV4Jiaeott1GtVwipSrkCr8d+La7nwScAcw1s5OAecCT7j4WeDK8B7gAGBtec4Bflb3UIiKBTtCFyxv47v6Wu78Yut8D1gMjgUuARWG0RcClofsS4F7PeB4YbGbHlb3kkjpRr9hRL5/kFqf9VmpZ+9SGb2bNwCTg78AId38rDHobGBG6RwJvZE3WGvqJiEgNFRz4ZjYQeBC4xd3fzR7m7g54XxZsZnPMbIWZrWhra8s5Ti3OvHE620v86XiTaioo8M2sgUzYL3b3/x16b+1sqgl/t4X+W4BRWZM3hX6HcPcF7j7F3acMHz682PKnWhzCIg5lrDVtI6mWQp7SMeAeYL27/4+sQY8As0P3bODhrP5Xhad1zgB2ZzX9iJRNOYNSoStpUF/AOGcCXwPWmNlLod/3gDuAP5jZtcBm4IowbBlwIbAB2AtcXdYS15BCIXq0TwqXhm2VhnUsRd7Ad/dnAethcEuO8R2YW2K5ak4HjogkTay/aatQFhEpXKwDX0SkN7ooPJQCX6SKFEBSSwr8HFQpy0/bVCpJx1dhFPgiIimhwK+wpF95JH39KiFu2yxu5a2VOGwnBb6ISEoo8BMoDlcakixJOuaStC7dKfBFRMok6icLBb5ITEU9XCR6FPhFUEUTkThS4ItIbOhiqzQK/BqK28HbW3njti7VFIVtE4UySO0p8EVEUkKBLyKSEokOfH2MFZFCpSEvEh34IiJykAJfai4NV1alisr/79W+ijcFvkROlEIlSmURKVVkAl8VS0SksiIR+Gu27K51EWKt1ifLWi9fRAoTicCvhTSEVBrWUUQKF8vAV5BVn7a5SPzFMvAlPwW0iHSnwBeRstGFRrQp8EVEUkKBLyKSEgp8kQRS00p1xWV7K/BTpPtBGZeDVCQq4l5nUhP4cd9RIiKlSk3gi4iknQI/AvTpI5q0XyRpFPgJo5BKNu1fKUXewDez35nZNjNbm9VviJk9bmavhr/HhP5mZneZ2QYzW21mkytZeBGRnujkeLhCrvAXAtO79ZsHPOnuY4Enw3uAC4Cx4TUH+FV5iikikkzVPDHlDXx3fwZ4p1vvS4BFoXsRcGlW/3s943lgsJkdV67CiohI8Yptwx/h7m+F7reBEaF7JPBG1nitoZ+UUbU/quqjsUgylHzT1t0d8L5OZ2ZzzGyFma04sPfwf4CikBERKa9iA39rZ1NN+Lst9N8CjMoaryn0O4y7L3D3Ke4+pe5jRxdZjHjSySwZtB8lbooN/EeA2aF7NvBwVv+rwtM6ZwC7s5p+Ik8VWKJIx6WUS32+EczsfwHnAMPMrBX4AXAH8AczuxbYDFwRRl8GXAhsAPYCV1egzCISAzpRRU/ewHf3r/QwqCXHuA7MLbVQSaSDv3pqua2rtWwdT8Vrnvcom+64KHHLKoS+aSsikhKxCXxd0YiIlCY2gS/ppZO9SHko8CVRqnFy0AlI4iq2gR/lShflsomUg47xeIpt4IukTZJDNsnrFiUKfBGRPJJyQlLgi4ikhAJfRFInDlfslSijAl9iIQ4VVKQSynnsK/AlJwWsSPIo8Aug8BORJEh14CvIRSRNUh34IiJposAXEUkJBb6ISEoo8CXxdK9GoqSWx6MCX0QkJRT4Ijkk4VNBEtZBcit23yrwRURSQoEvItKLJH1SUuCLSM0kK
UzjQIEvIpGik0DlKPBTppaVKQkVuZR1SML6S7zFLvBVaXqmbSNpoOO8eLELfBGRSkvqSUWBLxUX1coTxXIVU6YorodEU2ICXwe95KNjRNIuMYEvItEVx5NtHMucjwJfyiqJlUQkKRT4Iimlk3P6KPBFRCooSifWigS+mU03s/8wsw1mNq8Sy5DoitIBLiIHlT3wzawOuBu4ADgJ+IqZnVTu5Uhpah3KtV6+VE+U9nWUylILlbjCPx3Y4O6vuft+4PfAJRVYjqRUFCttFMuUBtrufVOJwB8JvJH1vjX0ExGRGqqv1YLNbA4wp/P95jtn1KooIiJxdVpfRq5E4G8BRmW9bwr9DuHuC4AFAGbmFSiHiIhkqUSTzgvAWDMbY2ZHAFcCj1RgOSIi0gdlv8J393Yz+xfg/wJ1wO/c/eVyL0dERPrG3GvfmmJmHbUug4hIHLl7wS01kQh8ERGpPP20gohISijwRURSourP4Ycnd35M5nHNN4DJwEQyT/LMdvePql0mEZE0qHobvpm1c/CTRQeZJ3nIeg9gQHbBrJdZOplHQW9x9+d7We5fyXxJoaGAeUoyPAh8RRcRUk5m9gXgBDLZs8fdf5fVfzKwFTgZ2O7u80P/mWR+W2wc0Egm67YA/wCagU8CR4ZFeHi1AoOAgcBbwHFk8utAmHaZu9/Wp7LXIPB1l1iq7RZqXzlXh2nHcfCig6zhg8l84p4V5jEZGAD80d3/3q3Mw4CXgRFZZZ6c3R3GWeDu/1HAtF8BTgXGkLkAawf+Fv4eB3yKg60BB8LfLcBRoaw7gKFhnAPAM2HaT4Z5dl5cdQD7gTZgeBj3PeDjYfh+4N9D/6YwPVnTfgRsAkYD7wBXAFPI7L/XgfvcfUdY32tC/+3AXuBNd3/AzG4Gzg3r9J+y1mlJ2DYncvCLo4OIjx3uPizfSAp8EZEEcPe8rRa6aSsikhIKfBGRlFDgi4jEX0G/VlCLn0duJ3NjKF97k5N5qmYD8BTwAHBnBcvl3f7C4WX0rP59vRfhOebXQXEn3d7K1dtyc5WhWtNCeda3p/2jp66k2to5NAs6byzXc/D4ryNzM9qBPcCq0D2UzI36Y8LwfwL/D5hE5iZyIzCVzKPrbwMTgI3Am2RuNo8I8/88sNXdO2+m9yopN227z1OhIJ06yFQgozaV89NknkjpD+wm8zTP2jCvZjJPhmT/nHj2Ezydjyx3lvmjbuvRL4zjHDyZdoTh/bKmtTBt52PQ3bfDvlC2bWSegmki8xRLO7CTzFNHa8n8I6OBwCdCuTcBHwvbZj3wAXB8GO/DMO+nwzY5IWzrkWSe/FlH5imY4cCaUIaxZJ7YceBdMk8KvUPmaZ3jQrmOCdu8Lsyv88q2czsQ1v2DrO3TuU3qwnI+JPN0UOcTQRPD+uwN8/g/Ydxm4L9wMDeimh8vAPvc/ax8I9Yy8Dt3moiIlCjqT+ko7EVEqkg3bUVEUqJWga+vuouIlE9BT+lUPfDd3dz9iGovV0SkAryP3R0c/DmOjl76e7f+PU17AFjt7gU1kdesSSfcYEjCzyyUsg5JWP++qMX69rVCVrJy9jatU/vjIdf6dPY/wOHrk6u7ktP2tK2K2a/FlvkAsDfk12NknuZ6L2s++7O627Pm05HV//wwLLtf96cJe9o+52ZN60A/d59AgWrdhh/Vx5z6opR1SML690Ut1veDClXO7t/JKKRy9laxO8OkncwPfRmZRyE7y5vrdSDr1R5eG8K0rxQwbfd+HwGnZ80rez27Z0X3oHyfzGOqxUz7BjA+x7Tebdpc22p2+Bd/bWF9O8fZn9XdfV07l/vfyTwK2ttyrVt3HfCx8LThdDKPlg4K4/QDjsjqrg/j98t61QFPkHn8tj70sxyvfjledWQec+2c1gAzsw4zi+Zz+F0L1o+oiYiUS0chzTq1vsIXEZESRb4NP8sBMt+Kuybc0M31sb/zI5eIiBSpZk06xahwM1B2m2wlxq+EWpWhe1u0SBp1tpv35
UuknfeLOu8l9O82DHquW53TXg3cl90/3MfIK26Bfz3wyzyjKYyku+ybhNWunB3dpitkWjhY5mK+kV7stJ03sqHvZa7VtFBc8MLBG9Xdl9vTMp3MbwsNIBOyRxIzsQr83ugmsPSkkN8YiTszux+4pNjp3X1Akct9lsyPxRXjgLsPLnLaijOznWR+GK6cPuTwk1pZFHKcxyrwFepSZRWrnCLlFvUfTxOJOoW9JIoCX8ptH7X/xqiI5FCL/3gVVQV9Uy2Hd8lsx0YOfgPwk8C/AZdy8B9a9AvD1gCTOfRke8Dd681sP4f+A4zsstVB5mObmf0rcD1wdLfxOv/pQ/cr01JP7J3rlf2tw+7DLZTvSOhq2z0ta5zuv5+0v4f+xdLFi8RBvh85q+hxHKs2fJGemNlu4Kgyz7amlbPK9HRbft3DMkrhucvdh+YbSYEviWdmBf10rEgOsTkJFnLTVk06kgaxqbQilaTAl0TQI7si+SWpDVJERHqhK3wRkeIU+2RfJbxeyEi6aSsikhJq0hERSQkFvohISijwRURSQoEvIpISCnwRkZT4/7PRVCP4zVgoAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "durationpd.plot(kind='bar',stacked=True,width=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- _We can represent the data and visualize with a box plot. The box extends from the lower to upper quartile values of the data, with a line at the median._" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAD/NJREFUeJzt3Xus33V9x/HnSyreZwscO9ZS67QLY14ATxiK2QScEdwGZniLG9V0dBfcWHRzYJYwkl1wXpia6SziVowboGBoCFFIATeXgBS530YlEOiAFigoElDwvT/Op/O3ek7P7/ScXw98+nwkJ7/P9/P5fL/f9+8XeP2+/fxuqSokSf16znwXIEkaLYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1LkF810AwD777FPLly+f7zIk6VnlmmuuebCqxqab94wI+uXLl7Nhw4b5LkOSnlWS3D3MPJduJKlzBr0kdc6gl6TOGfSS1DmDXpI6N1TQJ7kryY1JrkuyofXtleTSJHe020WtP0k+k2RjkhuSHDzKOyBJ2rGZXNEfXlUHVtV42z4ZWF9VK4D1bRvgKGBF+1sNfH6uipUkzdxslm6OAda29lrg2IH+s2vClcDCJPvO4jySpFkY9gNTBVySpIAvVNUaYHFV3dfG7wcWt/YS4J6Bfe9tffcN9JFkNRNX/Cxbtmznqpdm6LTTTtsl5zn11FN3yXmkYQwb9G+qqk1JXgZcmuS2wcGqqvYkMLT2ZLEGYHx83F8o1y4x4wD+ZODD/uepZ7ehlm6qalO73Qx8HTgEeGDbkky73dymbwL2G9h9aeuTJM2DaYM+yYuSvGRbG3grcBOwDljZpq0ELmztdcDx7d03hwKPDizxSJJ2sWGWbhYDX0+ybf6/VdU3klwNnJdkFXA38K42/2LgaGAj8DjwgTmvWpI0tGmDvqruBF43Sf9DwJGT9Bdw4pxUJ0maNT8ZK0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjo3dNAn2SPJtUkuatuvSHJVko1Jzk2yZ+t/Xtve2MaXj6Z0SdIwZnJFfxJw68D2x4AzqupVwFZgVetfBWxt/We0eZKkeTJU0CdZCrwd+GLbDnAE8LU2ZS1wbGsf07Zp40e2+ZKkeTDsFf0/Ah8BftK29wYeqaqn2va9wJLWXgLcA9DGH23zJUnzYNqgT/KbwOaqumYuT5xkdZINSTZs2bJlLg8tSRowzBX9YcBvJ7kLOIeJJZtPAwuTLGhzlgKbWnsTsB9AG3
8p8ND2B62qNVU1XlXjY2Njs7oTkqSpTRv0VXVKVS2tquXAe4DLqup9wOXAcW3aSuDC1l7Xtmnjl1VVzWnVkqShzeZ99H8JfCjJRibW4M9q/WcBe7f+DwEnz65ESdJsLJh+yk9V1RXAFa19J3DIJHOeAN45B7VJkuaAn4yVpM4Z9JLUOYNekjpn0EtS52b0Yqz0TLLXSXux9fGtIz1H7Q85YfTf4LHohYt4+NMPj/w82j0Z9HrW2vr4VurM0X9EY1d8CGRXPJlo9+XSjSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1btqgT/L8JN9Jcn2Sm5Oc1vpfkeSqJBuTnJtkz9b/vLa9sY0vH+1dkCTtyDBX9E8CR1TV64ADgbclORT4GHBGVb0K2AqsavNXAVtb/xltniRpnkwb9DXhsbb53PZXwBHA11r/WuDY1j6mbdPGj0ySOatYkjQjQ63RJ9kjyXXAZuBS4HvAI1X1VJtyL7CktZcA9wC08UeBvSc55uokG5Js2LJly+zuhSRpSkMFfVU9XVUHAkuBQ4D9Z3viqlpTVeNVNT42Njbbw0mSpjCjd91U1SPA5cAbgIVJFrShpcCm1t4E7AfQxl8KPDQn1UqSZmyYd92MJVnY2i8AfgO4lYnAP65NWwlc2Nrr2jZt/LKqqrksWpI0vAXTT2FfYG2SPZh4Yjivqi5KcgtwTpK/Aa4FzmrzzwK+nGQj8DDwnhHULUka0rRBX1U3AAdN0n8nE+v12/c/AbxzTqqTJM2an4yVpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknq3DBfUyw9M33x4+SLn5jvKubIx+HM+a5BvTLo9ez1+39BndnHb9rkhAB/Pt9lqFMu3UhS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktS5aYM+yX5JLk9yS5Kbk5zU+vdKcmmSO9rtotafJJ9JsjHJDUkOHvWdkCRNbZgr+qeAD1fVAcChwIlJDgBOBtZX1QpgfdsGOApY0f5WA5+f86olSUObNuir6r6q+m5r/wC4FVgCHAOsbdPWAse29jHA2TXhSmBhkn3nvHJJ0lBmtEafZDlwEHAVsLiq7mtD9wOLW3sJcM/Abve2PknSPBg66JO8GDgf+LOq+v7gWFUVMKMf70yyOsmGJBu2bNkyk10lSTMwVNAneS4TIf+VqrqgdT+wbUmm3W5u/ZuA/QZ2X9r6/p+qWlNV41U1PjY2trP1S5KmMcy7bgKcBdxaVZ8aGFoHrGztlcCFA/3Ht3ffHAo8OrDEI0naxRYMMecw4PeAG5Nc1/o+CpwOnJdkFXA38K42djFwNLAReBz4wJxWLEmakWmDvqq+DWSK4SMnmV/AibOsS5I0R/xkrCR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXML5rsAaTZyQkZ6/NofcttITwHAohcuGv1JtNsy6PWsVWfW6E/yyeya80gj5NKNJHXOoJekzhn0ktQ5g16SOmfQS1Lnpg36JF9KsjnJTQN9eyW5NMkd7XZR60+SzyTZmOSGJAePsnhJ0vSGuaL/V+Bt2/WdDKyvqhXA+rYNcBSwov2tBj4/N2VKknbWtEFfVf8BPLxd9zHA2tZeCxw70H92TbgSWJhk37kqVpI0czu7Rr+4qu5r7fuBxa29BLhnYN69rU+SNE9m/WJsVR
Uw448OJlmdZEOSDVu2bJltGZKkKexs0D+wbUmm3W5u/ZuA/QbmLW19P6Oq1lTVeFWNj42N7WQZkqTp7GzQrwNWtvZK4MKB/uPbu28OBR4dWOKRJM2Dab/ULMm/A28G9klyL3AqcDpwXpJVwN3Au9r0i4GjgY3A48AHRlCzJGkGpg36qnrvFENHTjK3gBNnW5Qkae74yVhJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjo3kqBP8rYktyfZmOTkUZxDkjScOQ/6JHsA/wQcBRwAvDfJAXN9HknScEZxRX8IsLGq7qyqHwHnAMeM4DySpCEsGMExlwD3DGzfC/zq9pOSrAZWAyxbtmwEZUg/67TTTpvhHn8NM94HTj311BnvI43KKIJ+KFW1BlgDMD4+XvNVh3YvBrB2R6NYutkE7DewvbT1SZLmwSiC/mpgRZJXJNkTeA+wbgTnkSQNYc6XbqrqqSQfBL4J7AF8qapunuvzSJKGM5I1+qq6GLh4FMeWJM2Mn4yVpM4Z9JLUOYNekjpn0EtS51I1/59VSrIFuHu+65AmsQ/w4HwXIU3h5VU1Nt2kZ0TQS89USTZU1fh81yHNhks3ktQ5g16SOmfQSzu2Zr4LkGbLNXpJ6pxX9JLUOYNeI5Xk55Ock+R7Sa5JcnGSX9rB/Mfa7fIkN7X2C5N8JcmNSW5K8u0kL56j+i5OsnAn9nt/ki1Jrk1yR5JvJnnjXNTUjr8wyR8PbP9Ckq/N1fG1ezHoNTJJAnwduKKqXllVrwdOARbP8FAnAQ9U1Wuq6tXAKuDHM6hjj6nGquroqnpkhvVsc25VHVRVK4DTgQuS/PIM6trRlwouBP4v6Kvqf6rquJ2sU7s5g16jdDjw46r6520dVXU9cG2S9Um+267Sp/tN4X0Z+PGaqrq9qp4ESPK7Sb6T5LokX9gW6kkeS/LJJNcDpyT56rb9k7w5yUWtfVeSfVr7+CQ3JLk+yZdb31iS85Nc3f4Om6zAqrqciRduV7f9rkgy3tr7JLmrtd+fZF2Sy4D1SV48xWNxOvDKdr8+vt2/cJ6f5F/a/GuTHD5w7AuSfKP9K+MfpnlctZuYt58S1G7h1cA1k/Q/Abyjqr7fQvbKJOtq6ncGfAm4JMlxwHpgbVXd0a6e3w0cVlU/TvI54H3A2cCLgKuq6sPtyvnOJC+qqh+2fc4ZPEGSXwH+CnhjVT2YZK829GngjKr6dpJlTPzOwlRX7d8F/mCIx+Vg4LVV9XCr7WceC+Bk4NVVdWCrb/nA/icCVVWvSbJ/e2y2LYcdCBwEPAncnuSzVTX4G87aDRn0mg8B/i7JrwE/YeIH5RcD9082uaquS/KLwFuBtwBXJ3kDcCTw+rYN8AJgc9vtaeD8tv9TSb4B/FZb53478JHtTnME8NWqerDt83DrfwtwQDs+wM/t4PWBTNG/vUsHjj/VY7EjbwI+2+q8LcndwLagX19VjwIkuQV4OWDQ7+YMeo3SzcBk68rvA8aA17cr8buA5+/oQFX1GHABE+vgPwGOBn7ExNX9KZPs8kRVPT2wfQ7wQeBhYENV/WDI+/Ac4NCqemKwcyD4Bx0E3NraT/HTpdHt79sPB9ozfiym8eRA+2n8f1y4Rq/Rugx4XpLV2zqSvJaJq8zNLdgOb9tTSnJYkkWtvSdwABNfgrceOC7Jy9rYXkmmOta3mFgyOYHtlm0Gan1nkr23Hav1XwL8yUAtB05R468zsT5/Zuu6i4l/bcDkT3bbvJTJH4sfAC+ZYp//ZOIJgrZkswy4fQfn0G7OoNfItDX3dwBvycTbK28G/p6Jn5kcT3IjcDxw2zSHeiXwrTb/WmADcH5V3cLEuvolSW
4ALmXihdvJankauAg4qt1uP34z8LftPNcDn2pDf9pqvaEthfzhwG7vbi+W/jfwUeB3qmrbFf0ngD9Kci0T34A5la9M9lhU1UPAf2Xi7aQf326fzwHPafucC7x/24vT0mT8ZKwkdc4reknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1Ln/hcrk/sucqClCgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "color = dict(boxes='DarkGreen', whiskers='DarkOrange',\n", " medians='DarkBlue', caps='Gray')\n", "\n", "durationpd.plot.box(color=color, sym='r+')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from pyspark.sql.functions import col\n", "durationDF = durationDF.withColumn(\"CallServiceDuration\", col(\"CallServiceDuration\").cast(DoubleType())) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- _We can represent the data and visualize the data with histograms partitioned in different bins._" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([48., 0., 0., ..., 0., 0., 44.]),\n", " array([ 1. , 1.02226386, 1.04452773, ..., 499.95547227,\n", " 499.97773614, 500. ]),\n", " )" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD8CAYAAABn919SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADu9JREFUeJzt3FGMXFd9x/HvrzYhNFASJ4tlxVAHYQXloUnQKiQKqiAhKNCK5CGKiBBdVa78AlVQkcBppVaR+hBeCFSqUC1C8QOFhACyFSHANUFVpSqwJgaSmNQmSkQs27tAApQHqOHfhz1Ot+4uM7szs2sffz/S6N5z7pmd/9lMfnt95t5JVSFJOvf93noXIEkaDwNdkjphoEtSJwx0SeqEgS5JnTDQJakTBrokdWJgoCe5MsmhRY+fJ/lgkk1J9ic50raXrEXBkqSlZSU3FiXZABwD3gy8H/hpVd2XZBdwSVV9ZDJlSpIGWWmgvwP4u6q6McnTwFur6niSLcA3q+rK3/X8yy67rLZt2zZSwZJ0vjl48OCPq2pq0LiNK/y57wE+1/Y3V9Xxtn8C2LzUE5LsBHYCvO51r2N2dnaFLylJ57ckzw0zbugPRZNcALwb+MKZx2rhNH/JU/2q2l1V01U1PTU18A+MJGmVVnKVyzuB71TVydY+2ZZaaNu5cRcnSRreSgL9Lv53uQVgHzDT9meAveMqSpK0ckMFepKLgFuALy3qvg+4JckR4O2tLUlaJ0N9KFpVvwQuPaPvJ8DNkyhKkrRy3ikqSZ0w0CWpEwa6JHXCQJekThjoktQJA12SOmGgS1InDHRJ6oSBLkmdMNAlqRMGuiR1wkCXpE4Y6JLUCQNdkjphoEtSJwx0SeqEgS5JnTDQJakTBrokdcJAl6ROGOiS1AkDXZI6MVSgJ7k4ycNJfpDkcJIbkmxKsj/Jkba9ZNLFSpKWN+wZ+ieAr1bVG4GrgcPALuBAVW0HDrS2JGmdDAz0JK8G/hh4A
KCqfl1VLwK3AXvasD3A7ZMqUpI02DBn6FcA88A/J3k8yaeSXARsrqrjbcwJYPNST06yM8lsktn5+fnxVC1J+n+GCfSNwJuAT1bVtcAvOWN5paoKqKWeXFW7q2q6qqanpqZGrVeStIxhAv154Pmqeqy1H2Yh4E8m2QLQtnOTKVGSNIyBgV5VJ4AfJbmydd0MPAXsA2Za3wywdyIVSpKGsnHIcX8JfDbJBcAzwJ+z8MfgoSQ7gOeAOydToiRpGEMFelUdAqaXOHTzeMuRJK2Wd4pKUicMdEnqhIEuSZ0w0CWpEwa6JHXCQJekThjoktQJA12SOmGgS1InDHRJ6oSBLumck3uz3iWclQx0SeqEgS5JnTDQJakTBrokdcJAl6ROGOiS1AkDXZI6YaBLUicMdOkc5002Os1Al6ROGOiS1ImNwwxK8izwC+A3wKmqmk6yCXgQ2AY8C9xZVS9MpkxJ0iArOUN/W1VdU1XTrb0LOFBV24EDrS1JWiejLLncBuxp+3uA20cvR5K0WsMGegFfT3Iwyc7Wt7mqjrf9E8DmpZ6YZGeS2SSz8/PzI5YrSVrOUGvowFuq6liS1wD7k/xg8cGqqiS11BOrajewG2B6enrJMZKk0Q11hl5Vx9p2DvgycB1wMskWgLadm1SRkqTBBgZ6kouSvOr0PvAO4AlgHzDThs0AeydVpCRpsGGWXDYDX05yevy/VNVXk3wbeCjJDuA54M7JlSlJGmRgoFfVM8DVS/T/BLh5EkVJklbOO0UlqRMGuiR1wkCXpE4Y6JLUCQNdkjphoEtSJwx0SeqEgS5JnTDQJakTBrokdcJAl6ROGOiS1AkDXZI6YaBrzeTerHcJUtcMdEnqhIEuSZ0w0NeJyw+Sxs1Al6ROGOiS1AkDXZI6YaBLUicMdEnqxNCBnmRDkseTPNLaVyR5LMnRJA8muWByZUqSBlnJGfrdwOFF7Y8C91fVG4AXgB3jLEyStDJDBXqSrcCfAJ9q7QA3AQ+3IXuA2ydRoCRpOMOeoX8c+DDw29a+FHixqk619vPA5Us9McnOJLNJZufn50cqdq1404+kc9HAQE/yp8BcVR1czQtU1e6qmq6q6ampqdX8CEnSEDYOMeZG4N1J3gVcCPwB8Ang4iQb21n6VuDY5MqUJA0y8Ay9qu6pqq1VtQ14D/CNqnov8ChwRxs2A+ydWJWSpIFGuQ79I8BfJTnKwpr6A+MpSZK0GsMsubykqr4JfLPtPwNcN/6SJEmr4Z2ikrSEc/FqNwNdkjphoEtSJwx0SeqEgS5JnTDQJakTBrokdcJAl6ROGOiS1AkDXZI6YaCrK+fi3X3SuBjoktQJA12SOmGgS1InDHRJ6oSBLkmdMNAlqRMGuiR1wkCXpE4Y6JLUCQNdkjoxMNCTXJjkW0m+m+TJJPe2/iuSPJbkaJIHk1ww+XIlScsZ5gz9V8BNVXU1cA1wa5LrgY8C91fVG4AXgB2TK1OSNMjAQK8F/9WaL2uPAm4CHm79e4DbJ1KhJGkoQ62hJ9mQ5BAwB+wHfgi8WFWn2pDngcsnU6IkaRhDBXpV/aaqrgG2AtcBbxz2BZLsTDKbZHZ+fn6VZUqSBlnRVS5V9SLwKHADcHGSje3QVuDYMs/ZXVXTVTU9NTU1UrGSpOUNc5XLVJKL2/4rgFuAwywE+x1t2Aywd1JFSpIG2zh4CFuAPUk2sPAH4KGqeiTJU8Dnk/w98DjwwATrlCQNMDDQq+p7wLVL9D/Dwnq6JOks4J2iktQJA12SOmGgS1InDHRJ6oSBLkmdMNAlqRMGuiR1wkCXpE4Y6JLUCQNdkjphoEtSJwx0SeqEgS5JnTDQJakTBrokdcJAl6ROGOiS1AkDXZI6YaBLUicMdEnqhIEuSZ0w0CWpEwMDPclrkzya5KkkTya5u/VvSrI/yZG2vWTy5UqSljPMGfop4ENVdRVwPfD+JFcBu4ADVbUdONDakqR1MjDQq+p4VX2n7f8COAxcDtwG7GnD9gC3T
6pISdJgK1pDT7INuBZ4DNhcVcfboRPA5rFWJklakaEDPckrgS8CH6yqny8+VlUF1DLP25lkNsns/Pz8SMVKkpY3VKAneRkLYf7ZqvpS6z6ZZEs7vgWYW+q5VbW7qqaranpqamocNUuSljDMVS4BHgAOV9XHFh3aB8y0/Rlg7/jLkyQNa+MQY24E3gd8P8mh1vfXwH3AQ0l2AM8Bd06mREnSMAYGelX9O5BlDt883nIkSavlnaKS1AkDXZI6YaBLUicMdEnqhIEuSZ0w0CWpEwa6JHXCQJekThjoktQJA12SOmGgS1InDHRJ6oSBLkmdMNAlqRMGuiR1wkCXpE4Y6JLUCQNdkjphoEtSJwx0SeqEgS5JnTDQJakTAwM9yaeTzCV5YlHfpiT7kxxp20smW6YkaZBhztA/A9x6Rt8u4EBVbQcOtLYkaR0NDPSq+jfgp2d03wbsaft7gNvHXJckaYVWu4a+uaqOt/0TwOYx1SNJWqWRPxStqgJqueNJdiaZTTI7Pz8/6stJkpax2kA/mWQLQNvOLTewqnZX1XRVTU9NTa3y5SRJg6w20PcBM21/Btg7nnIkSas1zGWLnwP+A7gyyfNJdgD3AbckOQK8vbUlSeto46ABVXXXModuHnMtkqQReKeoJHXCQJekThjoktQJA12SOmGgS1InDHRJ6oSBLkmdMNAlqRMGuiR1wkCXpE4Y6JLUCQNdkjphoEtSJwx0SeqEgS5JnTDQJakTBrokdcJAl6ROGOiS1AkDXZI6YaBLUicMdEnqxEiBnuTWJE8nOZpk17iKkiSt3KoDPckG4B+BdwJXAXcluWpchUmSVmaUM/TrgKNV9UxV/Rr4PHDbeMqSJK3UKIF+OfCjRe3nW58kaR2kqlb3xOQO4Naq+ovWfh/w5qr6wBnjdgI7W/NK4OlVvNxlwI9XVei5yzmfH5zz+WHUOf9hVU0NGrRxhBc4Brx2UXtr6/s/qmo3sHuE1yHJbFVNj/IzzjXO+fzgnM8PazXnUZZcvg1sT3JFkguA9wD7xlOWJGmlVn2GXlWnknwA+BqwAfh0VT05tsokSSsyypILVfUV4CtjquV3GWnJ5hzlnM8Pzvn8sCZzXvWHopKks4u3/ktSJ876QO/16wWSfDrJXJInFvVtSrI/yZG2vaT1J8k/tN/B95K8af0qX70kr03yaJKnkjyZ5O7W3+28k1yY5FtJvtvmfG/rvyLJY21uD7YLC0jy8tY+2o5vW8/6VyvJhiSPJ3mktbueL0CSZ5N8P8mhJLOtb03f22d1oHf+9QKfAW49o28XcKCqtgMHWhsW5r+9PXYCn1yjGsftFPChqroKuB54f/vv2fO8fwXcVFVXA9cAtya5HvgocH9VvQF4AdjRxu8AXmj997dx56K7gcOL2r3P97S3VdU1iy5RXNv3dlWdtQ/gBuBri9r3APesd11jnN824IlF7aeBLW1/C/B02/8n4K6lxp3LD2AvcMv5Mm/g94HvAG9m4SaTja3/pfc5C1eN3dD2N7ZxWe/aVzjPrSyE103AI0B6nu+ieT8LXHZG35q+t8/qM3TOv68X2FxVx9v+CWBz2+/u99D+aX0t8Bidz7stPxwC5oD9wA+BF6vqVBuyeF4vzbkd/xlw6dpWPLKPAx8Gftval9L3fE8r4OtJDrY75GGN39sjXbaoyamqStLlJUhJXgl8EfhgVf08yUvHepx3Vf0GuCbJxcCXgTeuc0kTk+RPgbmqOpjkretdzxp7S1UdS/IaYH+SHyw+uBbv7bP9DH2orxfoyMkkWwDadq71d/N7SPIyFsL8s1X1pdbd/bwBqupF4FEWlhwuTnL6hGrxvF6aczv+auAna1zqKG4E3p3kWRa+gfUm4BP0O9+XVNWxtp1j4Q/3dazxe/tsD/Tz7esF9gEzbX+GhTXm0/1/1j4Zvx742aJ/xp0zsnAq/gBwuKo+tuhQt/NOMtXOzEnyChY+MzjMQrDf0YadOefTv4s7gG9UW2Q9F1TVP
VW1taq2sfD/6zeq6r10Ot/TklyU5FWn94F3AE+w1u/t9f4gYYgPGt4F/CcL645/s971jHFenwOOA//NwvrZDhbWDg8AR4B/BTa1sWHhap8fAt8Hpte7/lXO+S0srDN+DzjUHu/qed7AHwGPtzk/Afxt63898C3gKPAF4OWt/8LWPtqOv3695zDC3N8KPHI+zLfN77vt8eTprFrr97Z3ikpSJ872JRdJ0pAMdEnqhIEuSZ0w0CWpEwa6JHXCQJekThjoktQJA12SOvE//nYp6uWc4sYAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# RDD.histogram(n) splits the value range into n evenly spaced buckets and\n", "# returns (bucket boundaries, counts); here n is the number of rows.\n", "bins, counts = durationDF.select('CallServiceDuration').rdd.flatMap(lambda x: x).histogram(durationDF.count())\n", "plt.hist(bins[:-1], bins=bins, weights=counts, color=['green'])" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+------------------+--------------+-------------+--------------------+\n", "| Accounting_ID|Calling_Number|Called_Number|CallDisconnectReason|\n", "+------------------+--------------+-------------+--------------------+\n", "|0x00016E0F5BDACAF7| 9645000046| 3512000046| 16|\n", "|0x00016E0F36A4A836| 9645000048| 3512000048| 16|\n", "|0x00016E0F4C261126| 9645000050| 3512000050| 16|\n", "|0x00016E0F4A446638| 9645000052| 3512000052| 16|\n", "|0x00016E0F4040CE81| 9645000054| 3512000054| 16|\n", "|0x00016E0F4D522D63| 9645000055| 3512000055| 16|\n", "|0x00016E0F5854A088| 9645000057| 3512000057| 16|\n", "|0x00016E0F7DFDA482| 9645000060| 3512000060| 16|\n", "|0x00016E0F65D65F76| 9645000062| 3512000062| 16|\n", "|0x00016E0F2378A4AE| 9645000064| 3512000064| 16|\n", "|0x00016E0F5003BC72| 9645000066| 3512000066| 16|\n", "| 0x00016E0F44702AB| 9645000067| 3512000067| 16|\n", "|0x00016E0F500EED75| 9645000069| 3512000069| 16|\n", "|0x00016E0F38D99C7D| 9645000071| 3512000071| 16|\n", "|0x00016E0F4D14C078| 9645000074| 3512000074| 16|\n", "|0x00016E0F4116E96C| 9645000075| 3512000075| 16|\n", "|0x00016E0F1F5CDE40| 9645000077| 3512000077| 16|\n", "|0x00016E0F1BFE3E2A| 9645000079| 3512000079| 16|\n", "|0x00016E0F7E203CC9| 9645000081| 3512000081| 16|\n", "| 0x00016E0F5B43F12| 9645000084| 3512000084| 16|\n", "+------------------+--------------+-------------+--------------------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "sqlDF = spark.sql(\"SELECT _c2 as Accounting_ID, _c19 as Calling_Number, _c20 as Called_Number, _c14 as CallDisconnectReason FROM cdr 
where _c0 = 'STOP'\")\n", "sqlDF.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Featurization" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml.feature import StringIndexer\n", "\n", "# Encode each string column as numeric category indices; with handleInvalid set to\n", "# \"skip\", rows whose values cannot be indexed are dropped.\n", "accountIndexer = StringIndexer(inputCol=\"Accounting_ID\", outputCol=\"AccountingIDIndex\")\n", "accountIndexer.setHandleInvalid(\"skip\")\n", "tempdf1 = accountIndexer.fit(sqlDF).transform(sqlDF)\n", "\n", "callingNumberIndexer = StringIndexer(inputCol=\"Calling_Number\", outputCol=\"Calling_NumberIndex\")\n", "callingNumberIndexer.setHandleInvalid(\"skip\")\n", "tempdf2 = callingNumberIndexer.fit(tempdf1).transform(tempdf1)\n", "\n", "calledNumberIndexer = StringIndexer(inputCol=\"Called_Number\", outputCol=\"Called_NumberIndex\")\n", "calledNumberIndexer.setHandleInvalid(\"skip\")\n", "tempdf3 = calledNumberIndexer.fit(tempdf2).transform(tempdf2)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "StringIndexer_16798a0b6913" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pyspark.ml.feature import StringIndexer\n", "# Convert the target column into numeric label indices (by default, the most frequent value gets index 0)\n", "labelIndexer = StringIndexer(inputCol=\"CallDisconnectReason\", outputCol=\"label\")\n", "labelIndexer.setHandleInvalid(\"skip\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(16944, 5469)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pyspark.sql.functions import rand\n", "\n", "# Split the indexed data into training (75%) and test (25%) sets\n", "trainingFraction = 0.75\n", "testingFraction = 1 - trainingFraction\n", "seed = 1234\n", "\n", "trainData, testData = tempdf3.randomSplit([trainingFraction, testingFraction], seed=seed)\n", "\n", "# Cache both splits because later cells reuse them; count() materializes the cache\n", "trainData.cache()\n", "testData.cache()\n", "trainData.count(), testData.count()" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "# Analyzing the label distribution\n", "\n", "- We analyze the distribution of our target labels using a histogram where 16 represents Normal_Call_Clearing." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEWCAYAAAB8LwAVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAGTZJREFUeJzt3Xm0XXV99/H3h4RRQIZExCCEImJRHxDjhENRnCfoKopTReURXVXR1gl99AGtWK1dqKi10qJEREWQAiIPilFxRoLMIIJUFASJlFGoBPg+f+zflUPcSU5yc+655L5fa5119v7t6Xtucu/n7Om3U1VIkrSsdcZdgCRpejIgJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwI3acl+W6S/z3Vy7bln5zk0tVdvmd9/y/Jfm34VUl+sAbX/fIk31xT69PMYEBoWkjyqyRPH3cdE5IckmRpklva6xdJPplk64l5qur7VbXTkOv6wsrmq6rnVNXCNVD7/CSVZPbAuo+pqmdOdt2aWQwIafmOrapNgC2AvwYeCJw9GBJrQjr+Lmra8T+lprUkmyc5JcmSJDe04W2WmW2HJD9NcnOSk5JsMbD845P8KMmNSc5Lsseq1lBVS6vqImBfYAnw1rbuPZJcNbCtdya5uu1xXJpkzyTPBt4N7Jvk1iTntXm/m+TQJD8EbgP+oueQV9pey01Jfp5kz4EJ99rjWmYv5Xvt/ca2zScse8gqye5JzmrrPivJ7gPTvpvkH5P8sH2WbyaZs6o/N933GRCa7tYBPgdsB2wL3A58cpl5Xgm8BtgauBM4HCDJPODrwAfo9gLeBnw1ydzVKaSq7gJOAp687LQkOwFvBB7T9jqeBfyqqk4DPki3N7JxVe0ysNjfAgcAmwBX9mzyccAvgTnAwcAJg+G3Ak9p75u1bf54mVq3oPu5HA5sCRwGfD3JlgOzvQx4NfAAYD26n51mGANC01pVXV9VX62q26rqFuBQ4K+Wme3oqrqwqv4AvBd4cZJZwCuAU6vq1Kq6u6pOBxYDz51ESb+lC5tl3QWsD+ycZN2q+lVV/XIl6zqqqi6qqjuramnP9OuAj7U9mGOBS4HnTaL2Cc8DLquqo9u2vwT8HHjBwDyfq6pfVNXtwFeAXdfAdnUfY0BoWkuyUZLPJLkyyc10h082awEw4TcDw1cC69J9694OeFE7vHRjkhuBJ9HtaayuecB/L9tYVZcDbwEOAa5L8uUkD1rJun6zkulX171707wSWNk6h/Eg/nyP5Uq6zzbh2oHh24CN18B2dR9jQGi6eyuwE/C4qtqUew6fZGCeBw8MbwssBX5P9wf46KrabOB1v6r60OoU0k4kvwD4ft/0qvpiVT2JLpgK+PDEpOWscmVdKc9LMvg5t6XbgwH4A7DRwLQHrsJ6f9tqHLQtcPVKltMMY0BoOlk3yQYDr9l0x+dvpzvhugXdsfhlvSLJzkk2At4PHN/OF3wBeEGSZyWZ1da5R89J7hVKMjvJXwJfovtDfFjPPDsleVqS9YH/aTXf3Sb/Dpi/GlcqPQA4MMm6SV4E/CVwapt2LvCSNm0BsM/Ackvatv9iOes9FXhokpe1z7YvsDNwyirWp7WcAaHp5FS6P6wTr0OAjwEb0u0R/AQ4rWe5o4Gj6A6LbAAcCFBVvwH2oruKaAndHsXbGf7
//b5JbgVuAk4GrgceXVW/7Zl3feBDrc5r6f64v6tNO669X5/kZ0NuG+BMYMe2zkOBfarq+jbtvcAOwA3A+4AvTixUVbe1+X/YDq09fnClbR3Pp9s7ux54B/D8qvr9KtSmGSA+MEiS1Mc9CElSLwNCktTLgJAk9TIgJEm9Zq98lulrzpw5NX/+/HGXIUn3KWefffbvq2qlXc7cpwNi/vz5LF68eNxlSNJ9SpK+vr/+jIeYJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwISVKvkQVEks8muS7JhQNtWyQ5Pcll7X3z1p4khye5PMn5SXYbVV2SpOGMcg/iKODZy7QdBCyqqh2BRW0c4Dl03RrvSPeM3k+PsC5J0hBGFhBV9T3+/NGMewEL2/BCYO+B9s9X5yd0j5SczGMhJUmTNNV3Um9VVde04WuBrdrwPO79fN6rWts1LCPJAXR7GWy77bajq1TS6N3riapaJVPwLJ+xnaRuD2Nf5U9YVUdU1YKqWjB37kq7EpEkraapDojfTRw6au/XtfarufeD57fBB6hL0lhNdUCcDOzXhvcDThpof2W7munxwE0Dh6IkSWMwsnMQSb4E7AHMSXIVcDDdQ92/kmR/4ErgxW32U4HnApcDtwGvHlVdkqThjCwgquqly5m0Z8+8BbxhVLVIkladd1JLknoZEJKkXgaEJKmXASFJ6mVASJJ6GRCSpF4GhCSplwEhSeplQEiSehkQkqReBoQkqZcBIUnqZUBIknoZEJKkXgaEJKmXASFJ6mVASJJ6GRCSpF4GhCSplwEhSeplQEiSehkQkqReBoQkqZcBIUnqZUBIknoZEJKkXgaEJKmXASFJ6mVASJJ6GRCSpF4GhCSplwEhSeo1loBI8vdJLkpyYZIvJdkgyfZJzkxyeZJjk6w3jtokSZ0pD4gk84ADgQVV9QhgFvAS4MPAR6vqIcANwP5TXZsk6R7jOsQ0G9gwyWxgI+Aa4GnA8W36QmDvMdUmSWIMAVFVVwP/AvyaLhhuAs4GbqyqO9tsVwHz+pZPckCSxUkWL1myZCpKlqQZaRyHmDYH9gK2Bx4E3A949rDLV9URVbWgqhbMnTt3RFVKksZxiOnpwH9V1ZKqWgqcADwR2KwdcgLYBrh6DLVJkppxBMSvgccn2ShJgD2Bi4HvAPu0efYDThpDbZKkZhznIM6kOxn9M+CCVsMRwDuBf0hyObAlcORU1yZJusfslc+y5lXVwcDByzRfATx2DOVIknp4J7UkqZcBIUnqZUBIknoZEJKkXgaEJKmXASFJ6mVASJJ6GRCSpF4GhCSp11ABkWS7JE9vwxsm2WS0ZUmSxm2lAZHktXR9J32mNW0DnDjKoiRJ4zfMHsQb6Lrjvhmgqi4DHjDKoiRJ4zdMQPyxqu6YGGnPbKjRlSRJmg6GCYgzkryb7hnSzwCOA7422rIkSeM2TEAcBCyhe3bD64BTgfeMsihJ0vit8HkQSWYBn6+qlwP/PjUlSZKmgxXuQVTVXcB2SdabonokSdPEME+UuwL4YZKTgT9MNFbVYSOrSpI0dsMExC/bax3AG+QkaYZYaUBU1fumohBJ0vSy0oBIMhd4B/BwYIOJ9qp62gjrkiSN2TCXuR4D/BzYHngf8CvgrBHWJEmaBoYJiC2r6khgaVWdUVWvAdx7kKS13DAnqZe292uSPA/4LbDF6EqSJE0HwwTEB5LcH3gr8AlgU+DvR1qVJGnshrmK6ZQ2eBPw1NGWI0maLoa9ium1wPzB+du5CEnSWmqYQ0wnAd8HvgXcNdpyJEnTxTABsVFVvXPklUiSppVhLnM9JclzR16JJGlaWe4eRJJb6J4cF+DdSf5Id8lrgKqqTaemREnSOCw3IKrKjvkkaQZb6SGmJE9Mcr82/IokhyXZdjIbTbJZkuOT/DzJJUmekGSLJKcnuay9bz6ZbUiSJmeYcxCfBm5LsgvdzXK/BI6e5HY/DpxWVQ8DdgEuoXu06aKq2hFY1MYlSWMyTED
cWVUF7AV8sqo+xSSeC9Huyn4KcCRAVd1RVTe29S9ssy0E9l7dbUiSJm+YgLglybuAVwBfT7IOsO4ktrk9sAT4XJJzkvxHO4S1VVVd0+a5FthqEtuQJE3SMAGxL/BHYP+quhbYBvjIJLY5G9gN+HRVPYruMab3OpzU9liqb+EkByRZnGTxkiVLJlGGJGlFVhoQVXVtVR1WVd9v47+uqs9PYptXAVdV1Zlt/Hi6wPhdkq0B2vt1y6nniKpaUFUL5s6dO4kyJEkrMswexBrV9kJ+k2Sn1rQncDFwMrBfa9uProsPSdKYDNPVxii8CTgmyXrAFcCr6cLqK0n2B64EXjym2iRJDBkQSTYEtq2qS9fERqvqXGBBz6Q918T6JUmTN8yNci8AzgVOa+O7Jjl51IVJksZrmHMQhwCPBW6EP337336ENUmSpoFhAmJpVd20TFvvJaiSpLXHMOcgLkryMmBWkh2BA4EfjbYsSdK4DbMH8Sbg4XQ3y30JuBl4yyiLkiSN30r3IKrqNuD/tJckaYZYaUAk+Rp/fs7hJmAx8Jmq+p9RFCZJGq9hDjFdAdwK/Ht73QzcAjy0jUuS1kLDnKTevaoeMzD+tSRnVdVjklw0qsIkSeM1zB7ExoNPkGvDG7fRO0ZSlSRp7IbZg3gr8IMkvwRCd5Pc37VnOCxc4ZKSpPusYa5iOrXd//Cw1nTpwInpj42sMknSWA3bm+uOwE7ABsAuSZjkMyEkSdPcMJe5HgzsAewMnAo8B/gBYEBI0lpsmJPU+9B1w31tVb0a2AW4/0irkiSN3TABcXtV3Q3cmWRTukeBPni0ZUmSxm2YcxCLk2xGd1Pc2XQ3zf14pFVJksZumKuY/q4N/luS04BNq+r80ZYlSRq3YZ4ot2hiuKp+VVXnD7ZJktZOy92DSLIBsBEwJ8nmdDfJAWwKzJuC2iRJY7SiQ0yvo3vuw4Pozj1MBMTNwCdHXJckacyWGxBV9XHg40neVFWfmMKaJEnTwDAnqT+RZHdg/uD83kktSWu3Ye6kPhrYATgXuKs1F95JLUlrtWHug1gA7FxVyz5VTpK0FhvmTuoLgQeOuhBJ0vQyzB7EHODiJD8F/jjRWFUvHFlVkqSxGyYgDhl1EZKk6WeYq5jOSLIdsGNVfSvJRsCs0ZcmSRqnYbraeC1wPPCZ1jQPOHGURUmSxm+Yk9RvAJ5Idwc1VXUZ8IBRFiVJGr9hAuKPVXXHxEiS2XT3QUiS1mLDBMQZSd4NbJjkGcBxwNdGW5YkadyGCYiDgCXABXQd+J0KvGeyG04yK8k5SU5p49snOTPJ5UmOTbLeZLchSVp9wwTEhsBnq+pFVbUP8NnWNllvBi4ZGP8w8NGqeghwA7D/GtiGJGk1DRMQi7h3IGwIfGsyG02yDfA84D/aeICn0V0tBbAQ2Hsy25AkTc4wAbFBVd06MdKGN5rkdj8GvAO4u41vCdxYVXe28atYzkOJkhyQZHGSxUuWLJlkGZKk5RkmIP6QZLeJkSSPBm5f3Q0meT5wXVWdvTrLV9URVbWgqhbMnTt3dcuQJK3EMF1tvBk4Lslv6Z4q90Bg30ls84nAC5M8F9iA7hGmHwc2SzK77UVsA1w9iW1IkiZphQGRZB1gPeBhwE6t+dKqWrq6G6yqdwHvauvfA3hbVb08yXHAPsCXgf2Ak1Z3G5KkyVvhIaaquhv4VFUtraoL22u1w2El3gn8Q5LL6c5JHDmi7UiShjDMIaZFSf4GOGFNPzSoqr4LfLcNXwE8dk2uX5K0+oY5Sf06urun70hyc5Jbktw84rokSWM2THffm0xFIZKk6WWY7r6T5BVJ3tvGH5zEQ0GStJYb5hDTvwJPAF7Wxm8FPjWyiiRJ08IwJ6kfV1W7JTkHoKpusCM9SVr7DbMHsTTJLNozIJLM5Z4uMiRJa6lhAuJw4D+BByQ5FPgB8MGRViVJGrthrmI6JsnZwJ50XW3sXVWXrGQxSdJ93HIDIskGwOuBh9A9LOgzA72
tSpLWcis6xLQQWEAXDs8B/mVKKpIkTQsrOsS0c1U9EiDJkcBPp6YkSdJ0sKI9iD91yuehJUmaeVa0B7HLQJ9LATZs4wGqqjYdeXWSpLFZbkBU1aypLESSNL0Mcx+EJGkGMiAkSb0MCElSLwNCktTLgJAk9TIgJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9TIgJEm9pjwgkjw4yXeSXJzkoiRvbu1bJDk9yWXtffOprk2SdI9x7EHcCby1qnYGHg+8IcnOwEHAoqraEVjUxiVJYzLlAVFV11TVz9rwLcAlwDxgL2Bhm20hsPdU1yZJusdYz0EkmQ88CjgT2KqqrmmTrgW2Ws4yByRZnGTxkiVLpqROSZqJxhYQSTYGvgq8papuHpxWVQVU33JVdURVLaiqBXPnzp2CSiVpZhpLQCRZly4cjqmqE1rz75Js3aZvDVw3jtokSZ1xXMUU4Ejgkqo6bGDSycB+bXg/4KSprk2SdI/ZY9jmE4G/BS5Icm5rezfwIeArSfYHrgRePIbaJEnNlAdEVf0AyHIm7zmVtUiSls87qSVJvQwISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9TIgJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9TIgJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9TIgJEm9plVAJHl2kkuTXJ7koHHXI0kz2bQJiCSzgE8BzwF2Bl6aZOfxViVJM9e0CQjgscDlVXVFVd0BfBnYa8w1SdKMNXvcBQyYB/xmYPwq4HHLzpTkAOCANnprkkunoLZxmAP8ftxFSDPY9P4dTCaz9HbDzDSdAmIoVXUEcMS46xi1JIurasG465BmKn8Hp9chpquBBw+Mb9PaJEljMJ0C4ixgxyTbJ1kPeAlw8phrkqQZa9ocYqqqO5O8EfgGMAv4bFVdNOayxmmtP4wmTXMz/ncwVTXuGiRJ09B0OsQkSZpGDAhJUi8DYoxW1rVIkvWTHNumn5lk/tRXKa29knw2yXVJLlzO9CQ5vP0Onp9kt6mucZwMiDEZsmuR/YEbquohwEeBD09tldJa7yjg2SuY/hxgx/Y6APj0FNQ0bRgQ4zNM1yJ7AQvb8PHAnsnkbp+UdI+q+h7w3yuYZS/g89X5CbBZkq2nprrxMyDGp69rkXnLm6eq7gRuArackuokwXC/p2stA0KS1MuAGJ9huhb50zxJZgP3B66fkuokwQzvAsiAGJ9huhY5GdivDe8DfLu8s1GaSicDr2xXMz0euKmqrhl3UVNl2nS1MdMsr2uRJO8HFlfVycCRwNFJLqc7kfaS8VUsrX2SfAnYA5iT5CrgYGBdgKr6N+BU4LnA5cBtwKvHU+l42NWGJKmXh5gkSb0MCElSLwNCktTLgJAk9TIgJEm9DAiNVZIHJvlykl8mOTvJqUkeuoL5b23v8yd64EyyR5KbkpzTesf9XpLnDyzz+iSvHP2nWTOS7D3YcWOSo5L8V5Jzk5yXZM9x1qeZw/sgNDat48H/BBZW1Uta2y7AVsAvVnF136+q57d17AqcmOT2qlrUrme/L9kbOAW4eKDt7VV1fJKn0j0Kc8exVKYZxT0IjdNTgaWDf8Cr6jzgnCSLkvwsyQVJlu3ldoWq6lzg/cAbAZIckuRtbfjAJBe3vv2/3No2TvK5tq3zk/xNa39pa7swyZ+6Wk9ya5JD27f5nyTZqrUf1Z4d8KMkVyTZZ2CZtyc5q63/fQPtr2xt5yU5OsnuwAuBj7Q9hh2W+Xg/ZqCzuCSPTnJG2/v6xkRPo0le27Z3XpKvJtmotb+ofZ7zknyvtW0w8PnPaSFEklclOSHJaUkuS/LPq/LvoLVAVfnyNZYXcCDw0Z722cCmbXgO3V2
sEzd13tre5wMXtuE9gFOWWceuwCVt+BDgbW34t8D6bXiz9v5h4GMDy24OPAj4NTC31fNtYO82vYAXtOF/Bt7Tho8CjqP74rUzXXfuAM+k+9afNu0U4CnAw+n2lOa0+bYYWM8+A/X8aZxu7+KLbXhd4EfA3Da+L90d+QBbDiz/AeBNbfgCYN4yn/+tA8s9rH3uDYBXAVfQ9QG2AXAl8OBx/7/xNXUvDzFpOgrwwSRPAe6m+8a8FXDtKq6jz/nAMUlOBE5sbU9noBuTqrqhbfu7VbUEIMkxdH/UTwTuoPsjD3A28IyB9Z9YVXcDF0/sWdAFxDOBc9r4xnSHiHYBjquq37ftrui5BB9J8kG6zuKe0Np2Ah4BnN4eEzILmOgn6BFJPgBs1rb3jdb+Q+CoJF8BTmhtTwI+0Wr4eZIrgYnzQIuq6qb2M7gY2I57d3+ttZiHmDROFwGP7ml/Od0390dX1a7A7+i+wa6KRwGX9LQ/j+5JfrsBZ7VeclfV0qqa6KPmLu59Lu+PA8MZeP+nqtq1vR5SVUeu4jbfXlUPBd4JfHZgvRcNrPeRVfXMNu0o4I1V9UjgfbSfX1W9HngPXQ+lZydZ2fNFBj/Psp9VazkDQuP0bWD9JAdMNCT5X3TfUq+rqqXtePh2q7LSto730gXBYPs6dIdIvkP3h/b+dN+uTwfeMDDf5sBPgb9KMifd42FfCpyx6h8R6L69vybJxm3985I8gO7zv2jij3SSLdr8twCbLGddnwTWSfIs4FJgbpIntOXXTfLwNt8mwDVJ1qUL3InPtkNVnVlV/xdYQhcU35+Yp11Btm1bt2Y4A0Jj076F/zXw9HaZ60XAP9H1oLkgyQXAK4GfD7G6J09c5koXDAdW1aJl5pkFfKGt9xzg8Kq6ke4Y/eYTJ2+Bp1bXpfNBwHeA84Czq+qk1fyc3wS+CPy4bft4YJOqugg4FDijbfewtsiXgbe3z7PDMuuqVu87qntU7T7Ah9vy5wK7t1nfC5xJd0hp8Of3kYkT73TnL84D/pUudC4AjgVeVVWDew6aoezNVZLUyz0ISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9fr/S4Z2v+029i8AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "negcount = trainData.filter(\"CallDisconnectReason != 16\").count()\n", "poscount = trainData.filter(\"CallDisconnectReason == 16\").count()\n", "\n", "negfrac = 100*float(negcount)/float(negcount+poscount)\n", "posfrac = 100*float(poscount)/float(poscount+negcount)\n", "ind = [0.0,1.0]\n", "frac = [negfrac,posfrac]\n", "width = 0.35\n", "\n", "plt.title('Label Distribution')\n", "plt.bar(ind, frac, width, color='r')\n", "plt.xlabel(\"CallDisconnectReason\")\n", "plt.ylabel('Percentage share')\n", "plt.xticks(ind,['0.0','1.0'])\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEWCAYAAAB8LwAVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAGTZJREFUeJzt3Xm0XXV99/H3h4RRQIZExCCEImJRHxDjhENRnCfoKopTReURXVXR1gl99AGtWK1dqKi10qJEREWQAiIPilFxRoLMIIJUFASJlFGoBPg+f+zflUPcSU5yc+655L5fa5119v7t6Xtucu/n7Om3U1VIkrSsdcZdgCRpejIgJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwI3acl+W6S/z3Vy7bln5zk0tVdvmd9/y/Jfm34VUl+sAbX/fIk31xT69PMYEBoWkjyqyRPH3cdE5IckmRpklva6xdJPplk64l5qur7VbXTkOv6wsrmq6rnVNXCNVD7/CSVZPbAuo+pqmdOdt2aWQwIafmOrapNgC2AvwYeCJw9GBJrQjr+Lmra8T+lprUkmyc5JcmSJDe04W2WmW2HJD9NcnOSk5JsMbD845P8KMmNSc5Lsseq1lBVS6vqImBfYAnw1rbuPZJcNbCtdya5uu1xXJpkzyTPBt4N7Jvk1iTntXm/m+TQJD8EbgP+oueQV9pey01Jfp5kz4EJ99rjWmYv5Xvt/ca2zScse8gqye5JzmrrPivJ7gPTvpvkH5P8sH2WbyaZs6o/N933GRCa7tYBPgdsB2wL3A58cpl5Xgm8BtgauBM4HCDJPODrwAfo9gLeBnw1ydzVKaSq7gJOAp687LQkOwFvBB7T9jqeBfyqqk4DPki3N7JxVe0ysNjfAgcAmwBX9mzyccAvgTnAwcAJg+G3Ak9p75u1bf54mVq3oPu5HA5sCRwGfD3JlgOzvQx4NfAAYD26n51mGANC01pVXV9VX62q26rqFuBQ4K+Wme3oqrqwqv4AvBd4cZJZwCuAU6vq1Kq6u6pOBxYDz51ESb+lC5tl3QWsD+ycZN2q+lVV/XIl6zqqqi6qqjuramnP9OuAj7U9mGOBS4HnTaL2Cc8DLquqo9u2vwT8HHjBwDyfq6pfVNXtwFeAXdfAdnUfY0BoWkuyUZLPJLkyyc10h082awEw4TcDw1cC69J9694OeFE7vHRjkhuBJ9Htaayue
cB/L9tYVZcDbwEOAa5L8uUkD1rJun6zkulX171707wSWNk6h/Eg/nyP5Uq6zzbh2oHh24CN18B2dR9jQGi6eyuwE/C4qtqUew6fZGCeBw8MbwssBX5P9wf46KrabOB1v6r60OoU0k4kvwD4ft/0qvpiVT2JLpgK+PDEpOWscmVdKc9LMvg5t6XbgwH4A7DRwLQHrsJ6f9tqHLQtcPVKltMMY0BoOlk3yQYDr9l0x+dvpzvhugXdsfhlvSLJzkk2At4PHN/OF3wBeEGSZyWZ1da5R89J7hVKMjvJXwJfovtDfFjPPDsleVqS9YH/aTXf3Sb/Dpi/GlcqPQA4MMm6SV4E/CVwapt2LvCSNm0BsM/Ackvatv9iOes9FXhokpe1z7YvsDNwyirWp7WcAaHp5FS6P6wTr0OAjwEb0u0R/AQ4rWe5o4Gj6A6LbAAcCFBVvwH2oruKaAndHsXbGf7//b5JbgVuAk4GrgceXVW/7Zl3feBDrc5r6f64v6tNO669X5/kZ0NuG+BMYMe2zkOBfarq+jbtvcAOwA3A+4AvTixUVbe1+X/YDq09fnClbR3Pp9s7ux54B/D8qvr9KtSmGSA+MEiS1Mc9CElSLwNCktTLgJAk9TIgJEm9Zq98lulrzpw5NX/+/HGXIUn3KWefffbvq2qlXc7cpwNi/vz5LF68eNxlSNJ9SpK+vr/+jIeYJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwISVKvkQVEks8muS7JhQNtWyQ5Pcll7X3z1p4khye5PMn5SXYbVV2SpOGMcg/iKODZy7QdBCyqqh2BRW0c4Dl03RrvSPeM3k+PsC5J0hBGFhBV9T3+/NGMewEL2/BCYO+B9s9X5yd0j5SczGMhJUmTNNV3Um9VVde04WuBrdrwPO79fN6rWts1LCPJAXR7GWy77bajq1TS6N3riapaJVPwLJ+xnaRuD2Nf5U9YVUdU1YKqWjB37kq7EpEkraapDojfTRw6au/XtfarufeD57fBB6hL0lhNdUCcDOzXhvcDThpof2W7munxwE0Dh6IkSWMwsnMQSb4E7AHMSXIVcDDdQ92/kmR/4ErgxW32U4HnApcDtwGvHlVdkqThjCwgquqly5m0Z8+8BbxhVLVIkladd1JLknoZEJKkXgaEJKmXASFJ6mVASJJ6GRCSpF4GhCSplwEhSeplQEiSehkQkqReBoQkqZcBIUnqZUBIknoZEJKkXgaEJKmXASFJ6mVASJJ6GRCSpF4GhCSplwEhSeplQEiSehkQkqReBoQkqZcBIUnqZUBIknoZEJKkXgaEJKmXASFJ6mVASJJ6GRCSpF4GhCSplwEhSeo1loBI8vdJLkpyYZIvJdkgyfZJzkxyeZJjk6w3jtokSZ0pD4gk84ADgQVV9QhgFvAS4MPAR6vqIcANwP5TXZsk6R7jOsQ0G9gwyWxgI+Aa4GnA8W36QmDvMdUmSWIMAVFVVwP/AvyaLhhuAs4GbqyqO9tsVwHz+pZPckCSxUkWL1myZCpKlqQZaRyHmDYH9gK2Bx4E3A949rDLV9URVbWgqhbMnTt3RFVKksZxiOnpwH9V1ZKqWgqcADwR2KwdcgLYBrh6DLVJkppxBMSvgccn2ShJgD2Bi4HvAPu0efYDThpDbZKkZhznIM6kOxn9M+CCVsMRwDuBf0hyObAlcORU1yZJusfslc+y5lXVwcDByzRfATx2DOVIknp4J7UkqZcBIUnqZUBIknoZEJKkXgaEJKmXASFJ6mVASJJ6GRCSpF4GhCSp11ABkWS7JE9vwxsm2WS0ZUmSxm2lAZHktXR9J32mNW0DnDjKoiRJ4zfMHsQb6Lrjvhmgqi4DHjDKoiRJ4zdMQPyxqu6YGGnPbKjRlSRJmg6GCYgzkryb7hnSzwCOA7422rIkSeM2TEAcBCyhe3bD64BTgfeMsihJ0vit8HkQSWYBn6+qlwP/PjUlSZKmgxXuQVTVXcB2SdabonokSdPEME+UuwL4YZKTg
T9MNFbVYSOrSpI0dsMExC/bax3AG+QkaYZYaUBU1fumohBJ0vSy0oBIMhd4B/BwYIOJ9qp62gjrkiSN2TCXuR4D/BzYHngf8CvgrBHWJEmaBoYJiC2r6khgaVWdUVWvAdx7kKS13DAnqZe292uSPA/4LbDF6EqSJE0HwwTEB5LcH3gr8AlgU+DvR1qVJGnshrmK6ZQ2eBPw1NGWI0maLoa9ium1wPzB+du5CEnSWmqYQ0wnAd8HvgXcNdpyJEnTxTABsVFVvXPklUiSppVhLnM9JclzR16JJGlaWe4eRJJb6J4cF+DdSf5Id8lrgKqqTaemREnSOCw3IKrKjvkkaQZb6SGmJE9Mcr82/IokhyXZdjIbTbJZkuOT/DzJJUmekGSLJKcnuay9bz6ZbUiSJmeYcxCfBm5LsgvdzXK/BI6e5HY/DpxWVQ8DdgEuoXu06aKq2hFY1MYlSWMyTEDcWVUF7AV8sqo+xSSeC9Huyn4KcCRAVd1RVTe29S9ssy0E9l7dbUiSJm+YgLglybuAVwBfT7IOsO4ktrk9sAT4XJJzkvxHO4S1VVVd0+a5FthqEtuQJE3SMAGxL/BHYP+quhbYBvjIJLY5G9gN+HRVPYruMab3OpzU9liqb+EkByRZnGTxkiVLJlGGJGlFVhoQVXVtVR1WVd9v47+uqs9PYptXAVdV1Zlt/Hi6wPhdkq0B2vt1y6nniKpaUFUL5s6dO4kyJEkrMswexBrV9kJ+k2Sn1rQncDFwMrBfa9uProsPSdKYDNPVxii8CTgmyXrAFcCr6cLqK0n2B64EXjym2iRJDBkQSTYEtq2qS9fERqvqXGBBz6Q918T6JUmTN8yNci8AzgVOa+O7Jjl51IVJksZrmHMQhwCPBW6EP337336ENUmSpoFhAmJpVd20TFvvJaiSpLXHMOcgLkryMmBWkh2BA4EfjbYsSdK4DbMH8Sbg4XQ3y30JuBl4yyiLkiSN30r3IKrqNuD/tJckaYZYaUAk+Rp/fs7hJmAx8Jmq+p9RFCZJGq9hDjFdAdwK/Ht73QzcAjy0jUuS1kLDnKTevaoeMzD+tSRnVdVjklw0qsIkSeM1zB7ExoNPkGvDG7fRO0ZSlSRp7IbZg3gr8IMkvwRCd5Pc37VnOCxc4ZKSpPusYa5iOrXd//Cw1nTpwInpj42sMknSWA3bm+uOwE7ABsAuSZjkMyEkSdPcMJe5HgzsAewMnAo8B/gBYEBI0lpsmJPU+9B1w31tVb0a2AW4/0irkiSN3TABcXtV3Q3cmWRTukeBPni0ZUmSxm2YcxCLk2xGd1Pc2XQ3zf14pFVJksZumKuY/q4N/luS04BNq+r80ZYlSRq3YZ4ot2hiuKp+VVXnD7ZJktZOy92DSLIBsBEwJ8nmdDfJAWwKzJuC2iRJY7SiQ0yvo3vuw4Pozj1MBMTNwCdHXJckacyWGxBV9XHg40neVFWfmMKaJEnTwDAnqT+RZHdg/uD83kktSWu3Ye6kPhrYATgXuKs1F95JLUlrtWHug1gA7FxVyz5VTpK0FhvmTuoLgQeOuhBJ0vQyzB7EHODiJD8F/jjRWFUvHFlVkqSxGyYgDhl1EZKk6WeYq5jOSLIdsGNVfSvJRsCs0ZcmSRqnYbraeC1wPPCZ1jQPOHGURUmSxm+Yk9RvAJ5Idwc1VXUZ8IBRFiVJGr9hAuKPVXXHxEiS2XT3QUiS1mLDBMQZSd4NbJjkGcBxwNdGW5YkadyGCYiDgCXABXQd+J0KvGeyG04yK8k5SU5p49snOTPJ5UmOTbLeZLchSVp9wwTEhsBnq+pFVbUP8NnWNllvBi4ZGP8w8NGqeghwA7D/GtiGJGk1DRMQi7h3IGwIfGsyG02yDfA84D/aeICn0V0tBbAQ2Hsy25AkTc4wAbFBVd06MdKGN5rkdj8GvAO4u41vCdxYVXe28atYzkOJkhyQZHGSxUuWLJlkGZKk5RkmIP6QZLeJkSSPBm5f3Q0meT5wXVWdvTrLV9URV
bWgqhbMnTt3dcuQJK3EMF1tvBk4Lslv6Z4q90Bg30ls84nAC5M8F9iA7hGmHwc2SzK77UVsA1w9iW1IkiZphQGRZB1gPeBhwE6t+dKqWrq6G6yqdwHvauvfA3hbVb08yXHAPsCXgf2Ak1Z3G5KkyVvhIaaquhv4VFUtraoL22u1w2El3gn8Q5LL6c5JHDmi7UiShjDMIaZFSf4GOGFNPzSoqr4LfLcNXwE8dk2uX5K0+oY5Sf06urun70hyc5Jbktw84rokSWM2THffm0xFIZKk6WWY7r6T5BVJ3tvGH5zEQ0GStJYb5hDTvwJPAF7Wxm8FPjWyiiRJ08IwJ6kfV1W7JTkHoKpusCM9SVr7DbMHsTTJLNozIJLM5Z4uMiRJa6lhAuJw4D+BByQ5FPgB8MGRViVJGrthrmI6JsnZwJ50XW3sXVWXrGQxSdJ93HIDIskGwOuBh9A9LOgzA72tSpLWcis6xLQQWEAXDs8B/mVKKpIkTQsrOsS0c1U9EiDJkcBPp6YkSdJ0sKI9iD91yuehJUmaeVa0B7HLQJ9LATZs4wGqqjYdeXWSpLFZbkBU1aypLESSNL0Mcx+EJGkGMiAkSb0MCElSLwNCktTLgJAk9TIgJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9TIgJEm9pjwgkjw4yXeSXJzkoiRvbu1bJDk9yWXtffOprk2SdI9x7EHcCby1qnYGHg+8IcnOwEHAoqraEVjUxiVJYzLlAVFV11TVz9rwLcAlwDxgL2Bhm20hsPdU1yZJusdYz0EkmQ88CjgT2KqqrmmTrgW2Ws4yByRZnGTxkiVLpqROSZqJxhYQSTYGvgq8papuHpxWVQVU33JVdURVLaiqBXPnzp2CSiVpZhpLQCRZly4cjqmqE1rz75Js3aZvDVw3jtokSZ1xXMUU4Ejgkqo6bGDSycB+bXg/4KSprk2SdI/ZY9jmE4G/BS5Icm5rezfwIeArSfYHrgRePIbaJEnNlAdEVf0AyHIm7zmVtUiSls87qSVJvQwISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9TIgJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9TIgJEm9DAhJUi8DQpLUy4CQJPUyICRJvQwISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9TIgJEm9plVAJHl2kkuTXJ7koHHXI0kz2bQJiCSzgE8BzwF2Bl6aZOfxViVJM9e0CQjgscDlVXVFVd0BfBnYa8w1SdKMNXvcBQyYB/xmYPwq4HHLzpTkAOCANnprkkunoLZxmAP8ftxFSDPY9P4dTCaz9HbDzDSdAmIoVXUEcMS46xi1JIurasG465BmKn8Hp9chpquBBw+Mb9PaJEljMJ0C4ixgxyTbJ1kPeAlw8phrkqQZa9ocYqqqO5O8EfgGMAv4bFVdNOayxmmtP4wmTXMz/ncwVTXuGiRJ09B0OsQkSZpGDAhJUi8DYoxW1rVIkvWTHNumn5lk/tRXKa29knw2yXVJLlzO9CQ5vP0Onp9kt6mucZwMiDEZsmuR/YEbquohwEeBD09tldJa7yjg2SuY/hxgx/Y6APj0FNQ0bRgQ4zNM1yJ7AQvb8PHAnsnkbp+UdI+q+h7w3yuYZS/g89X5CbBZkq2nprrxMyDGp69rkXnLm6eq7gRuArackuokwXC/p2stA0KS1MuAGJ9huhb50zxJZgP3B66fkuokwQzvAsiAGJ9huhY5GdivDe8DfLu8s1GaSicDr2xXMz0euKmqrhl3UVNl2nS1MdMsr2uRJO8HFlfVycCRwNFJLqc7kfaS8VUsrX2SfAnYA5iT5CrgYGBdgKr6N+BU4LnA5cBtwKvHU+l42NWGJKmXh5gkSb0MCElSLwNCktTLgJAk9TIgJEm9DAiNV
ZIHJvlykl8mOTvJqUkeuoL5b23v8yd64EyyR5KbkpzTesf9XpLnDyzz+iSvHP2nWTOS7D3YcWOSo5L8V5Jzk5yXZM9x1qeZw/sgNDat48H/BBZW1Uta2y7AVsAvVnF136+q57d17AqcmOT2qlrUrme/L9kbOAW4eKDt7VV1fJKn0j0Kc8exVKYZxT0IjdNTgaWDf8Cr6jzgnCSLkvwsyQVJlu3ldoWq6lzg/cAbAZIckuRtbfjAJBe3vv2/3No2TvK5tq3zk/xNa39pa7swyZ+6Wk9ya5JD27f5nyTZqrUf1Z4d8KMkVyTZZ2CZtyc5q63/fQPtr2xt5yU5OsnuwAuBj7Q9hh2W+Xg/ZqCzuCSPTnJG2/v6xkRPo0le27Z3XpKvJtmotb+ofZ7zknyvtW0w8PnPaSFEklclOSHJaUkuS/LPq/LvoLVAVfnyNZYXcCDw0Z722cCmbXgO3V2sEzd13tre5wMXtuE9gFOWWceuwCVt+BDgbW34t8D6bXiz9v5h4GMDy24OPAj4NTC31fNtYO82vYAXtOF/Bt7Tho8CjqP74rUzXXfuAM+k+9afNu0U4CnAw+n2lOa0+bYYWM8+A/X8aZxu7+KLbXhd4EfA3Da+L90d+QBbDiz/AeBNbfgCYN4yn/+tA8s9rH3uDYBXAVfQ9QG2AXAl8OBx/7/xNXUvDzFpOgrwwSRPAe6m+8a8FXDtKq6jz/nAMUlOBE5sbU9noBuTqrqhbfu7VbUEIMkxdH/UTwTuoPsjD3A28IyB9Z9YVXcDF0/sWdAFxDOBc9r4xnSHiHYBjquq37ftrui5BB9J8kG6zuKe0Np2Ah4BnN4eEzILmOgn6BFJPgBs1rb3jdb+Q+CoJF8BTmhtTwI+0Wr4eZIrgYnzQIuq6qb2M7gY2I57d3+ttZiHmDROFwGP7ml/Od0390dX1a7A7+i+wa6KRwGX9LQ/j+5JfrsBZ7VeclfV0qqa6KPmLu59Lu+PA8MZeP+nqtq1vR5SVUeu4jbfXlUPBd4JfHZgvRcNrPeRVfXMNu0o4I1V9UjgfbSfX1W9HngPXQ+lZydZ2fNFBj/Psp9VazkDQuP0bWD9JAdMNCT5X3TfUq+rqqXtePh2q7LSto730gXBYPs6dIdIvkP3h/b+dN+uTwfeMDDf5sBPgb9KMifd42FfCpyx6h8R6L69vybJxm3985I8gO7zv2jij3SSLdr8twCbLGddnwTWSfIs4FJgbpIntOXXTfLwNt8mwDVJ1qUL3InPtkNVnVlV/xdYQhcU35+Yp11Btm1bt2Y4A0Jj076F/zXw9HaZ60XAP9H1oLkgyQXAK4GfD7G6J09c5koXDAdW1aJl5pkFfKGt9xzg8Kq6ke4Y/eYTJ2+Bp1bXpfNBwHeA84Czq+qk1fyc3wS+CPy4bft4YJOqugg4FDijbfewtsiXgbe3z7PDMuuqVu87qntU7T7Ah9vy5wK7t1nfC5xJd0hp8Of3kYkT73TnL84D/pUudC4AjgVeVVWDew6aoezNVZLUyz0ISVIvA0KS1MuAkCT1MiAkSb0MCElSLwNCktTLgJAk9fr/S4Z2v+029i8AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "negcount = testData.filter(\"CallDisconnectReason != 16\").count()\n", "poscount = testData.filter(\"CallDisconnectReason == 16\").count()\n", "\n", "negfrac = 100*float(negcount)/float(negcount+poscount)\n", "posfrac = 100*float(poscount)/float(poscount+negcount)\n", "ind = [0.0,1.0]\n", "frac = [negfrac,posfrac]\n", "width = 0.35\n", "\n", "plt.title('Label Distribution')\n", "plt.bar(ind, frac, width, color='r')\n", "plt.xlabel(\"CallDisconnectReason\")\n", "plt.ylabel('Percentage share')\n", "plt.xticks(ind,['0.0','1.0'])\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml.feature import VectorAssembler\n", "\n", "# Combine the three indexed columns into a single feature vector\n", "vecAssembler = VectorAssembler(inputCols=[\"AccountingIDIndex\", \"Calling_NumberIndex\", \"Called_NumberIndex\"], outputCol=\"features\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Spark ML Naive Bayes__: \n", " Naive Bayes is a simple multiclass classification algorithm with the assumption of independence between every pair of features. Naive Bayes can be trained very efficiently. 
In a single pass over the training data, it computes the conditional probability distribution of each feature given the label, and then it applies Bayes’ theorem to compute the conditional probability distribution of the label given an observation, which it uses for prediction.\n", "\n", "- _We use the Spark ML Naive Bayes algorithm and a Spark Pipeline to train the model on the dataset._" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml.classification import NaiveBayes\n", "from pyspark.ml import Pipeline\n", "\n", "# Define a multinomial NaiveBayes estimator with additive (Laplace) smoothing\n", "nb = NaiveBayes(smoothing=1.0, modelType=\"multinomial\")\n", "\n", "# Chain labelIndexer, vecAssembler and the NaiveBayes estimator in a Pipeline\n", "pipeline = Pipeline(stages=[labelIndexer, vecAssembler, nb])\n", "\n", "# Run the pipeline stages and fit the model on the training data\n", "model = pipeline.fit(trainData)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "root\n", " |-- Accounting_ID: string (nullable = true)\n", " |-- Calling_Number: string (nullable = true)\n", " |-- Called_Number: string (nullable = true)\n", " |-- CallDisconnectReason: string (nullable = true)\n", " |-- AccountingIDIndex: double (nullable = false)\n", " |-- Calling_NumberIndex: double (nullable = false)\n", " |-- Called_NumberIndex: double (nullable = false)\n", " |-- label: double (nullable = false)\n", " |-- features: vector (nullable = true)\n", " |-- rawPrediction: vector (nullable = true)\n", " |-- probability: vector (nullable = true)\n", " |-- prediction: double (nullable = false)\n", "\n", "+------------------+--------------+-------------+--------------------+-----------------+-------------------+------------------+-----+-------------------+--------------------+-----------+----------+\n", "| 
Accounting_ID|Calling_Number|Called_Number|CallDisconnectReason|AccountingIDIndex|Calling_NumberIndex|Called_NumberIndex|label| features| rawPrediction|probability|prediction|\n", "+------------------+--------------+-------------+--------------------+-----------------+-------------------+------------------+-----+-------------------+--------------------+-----------+----------+\n", "| 0x00016E0F1005CE4| 9645000075| 3512000075| 16| 2577.0| 38.0| 39.0| 0.0| [2577.0,38.0,39.0]|[-440.84691368645...| [1.0]| 0.0|\n", "|0x00016E0F100A017D| 9645000010| 3512000010| 16| 21710.0| 33.0| 35.0| 0.0|[21710.0,33.0,35.0]|[-560.2845426594596]| [1.0]| 0.0|\n", "|0x00016E0F100AF60A| 9645000077| 3512000077| 16| 6832.0| 34.0| 45.0| 0.0| [6832.0,34.0,45.0]|[-489.13697166313...| [1.0]| 0.0|\n", "|0x00016E0F104511E6| 9645000059| 3512000059| 16| 9768.0| 25.0| 21.0| 0.0| [9768.0,25.0,21.0]|[-335.75198901002...| [1.0]| 0.0|\n", "|0x00016E0F107E0142| 9645000038| 3512000038| 16| 13013.0| 12.0| 11.0| 0.0|[13013.0,12.0,11.0]| [-239.387649472332]| [1.0]| 0.0|\n", "|0x00016E0F107F6253| 9645000093| 3512000093| 16| 13936.0| 97.0| 98.0| 0.0|[13936.0,97.0,98.0]|[-1181.6164141280...| [1.0]| 0.0|\n", "|0x00016E0F109060DA| 9645000068| 3512000068| 16| 13255.0| 21.0| 13.0| 0.0|[13255.0,21.0,13.0]| [-301.258614162017]| [1.0]| 0.0|\n", "|0x00016E0F109962EA| 9645000014| 3512000014| 16| 12198.0| 57.0| 56.0| 0.0|[12198.0,57.0,56.0]|[-720.9963093850602]| [1.0]| 0.0|\n", "|0x00016E0F10AD5AA3| 9645000077| 3512000077| 16| 17980.0| 34.0| 45.0| 0.0|[17980.0,34.0,45.0]|[-587.2075683476078]| [1.0]| 0.0|\n", "|0x00016E0F10B7685D| 9645000080| 3512000080| 16| 19118.0| 51.0| 62.0| 0.0|[19118.0,51.0,62.0]|[-781.8683149993153]| [1.0]| 0.0|\n", "|0x00016E0F10BF7B70| 9645000092| 3512000092| 16| 13766.0| 18.0| 20.0| 0.0|[13766.0,18.0,20.0]|[-327.4738938976721]| [1.0]| 0.0|\n", "|0x00016E0F10C18111| 9645000037| 3512000037| 16| 4031.0| 83.0| 85.0| 0.0| [4031.0,83.0,85.0]|[-947.8468165201091]| [1.0]| 0.0|\n", "|0x00016E0F11135782| 
9645000025| 3512000025| 16| 4836.0| 35.0| 32.0| 0.0| [4836.0,35.0,32.0]|[-406.41238288465...| [1.0]| 0.0|\n", "|0x00016E0F113C9E41| 9645000074| 3512000074| 16| 15726.0| 85.0| 83.0| 0.0|[15726.0,85.0,83.0]|[-1050.7308703158...| [1.0]| 0.0|\n", "|0x00016E0F11525D52| 9645000023| 3512000023| 16| 15228.0| 62.0| 63.0| 0.0|[15228.0,62.0,63.0]|[-812.8214011632385]| [1.0]| 0.0|\n", "|0x00016E0F11533B86| 9645000023| 3512000023| 16| 899.0| 62.0| 63.0| 0.0| [899.0,62.0,63.0]|[-686.7670793214941]| [1.0]| 0.0|\n", "|0x00016E0F1156E519| 9645000047| 3512000047| 16| 14638.0| 42.0| 34.0| 0.0|[14638.0,42.0,34.0]|[-541.5216249696589]| [1.0]| 0.0|\n", "|0x00016E0F1159A234| 9645000005| 3512000005| 16| 11407.0| 16.0| 26.0| 0.0|[11407.0,16.0,26.0]|[-328.44207005065...| [1.0]| 0.0|\n", "|0x00016E0F11780902| 9645000099| 3512000099| 16| 19361.0| 27.0| 27.0| 0.0|[19361.0,27.0,27.0]|[-463.58856733144...| [1.0]| 0.0|\n", "|0x00016E0F117D2932| 9645000080| 3512000080| 16| 13710.0| 51.0| 62.0| 0.0|[13710.0,51.0,62.0]|[-734.2933430877965]| [1.0]| 0.0|\n", "+------------------+--------------+-------------+--------------------+-----------------+-------------------+------------------+-----+-------------------+--------------------+-----------+----------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ " # Run inference on the test data and show some results\n", "predictions = model.transform(testData)\n", "predictions.printSchema()\n", "predictions.show()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "predictiondf = predictions.select(\"label\", \"prediction\", \"probability\")" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "pddf_pred = predictions.toPandas()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Accounting_ID | Calling_Number | Called_Number | CallDisconnectReason | AccountingIDIndex | Calling_NumberIndex | Called_NumberIndex | label | features | rawPrediction | probability | prediction |
| 0 | 0x00016E0F1005CE4 | 9645000075 | 3512000075 | 16 | 2577.0 | 38.0 | 39.0 | 0.0 | [2577.0, 38.0, 39.0] | [-440.84691368645406] | [1.0] | 0.0 |
| 1 | 0x00016E0F100A017D | 9645000010 | 3512000010 | 16 | 21710.0 | 33.0 | 35.0 | 0.0 | [21710.0, 33.0, 35.0] | [-560.2845426594596] | [1.0] | 0.0 |
| 2 | 0x00016E0F100AF60A | 9645000077 | 3512000077 | 16 | 6832.0 | 34.0 | 45.0 | 0.0 | [6832.0, 34.0, 45.0] | [-489.13697166313796] | [1.0] | 0.0 |
| 3 | 0x00016E0F104511E6 | 9645000059 | 3512000059 | 16 | 9768.0 | 25.0 | 21.0 | 0.0 | [9768.0, 25.0, 21.0] | [-335.75198901002926] | [1.0] | 0.0 |
| 4 | 0x00016E0F107E0142 | 9645000038 | 3512000038 | 16 | 13013.0 | 12.0 | 11.0 | 0.0 | [13013.0, 12.0, 11.0] | [-239.387649472332] | [1.0] | 0.0 |
| 5 | 0x00016E0F107F6253 | 9645000093 | 3512000093 | 16 | 13936.0 | 97.0 | 98.0 | 0.0 | [13936.0, 97.0, 98.0] | [-1181.6164141280706] | [1.0] | 0.0 |
| 6 | 0x00016E0F109060DA | 9645000068 | 3512000068 | 16 | 13255.0 | 21.0 | 13.0 | 0.0 | [13255.0, 21.0, 13.0] | [-301.258614162017] | [1.0] | 0.0 |
| 7 | 0x00016E0F109962EA | 9645000014 | 3512000014 | 16 | 12198.0 | 57.0 | 56.0 | 0.0 | [12198.0, 57.0, 56.0] | [-720.9963093850602] | [1.0] | 0.0 |
| 8 | 0x00016E0F10AD5AA3 | 9645000077 | 3512000077 | 16 | 17980.0 | 34.0 | 45.0 | 0.0 | [17980.0, 34.0, 45.0] | [-587.2075683476078] | [1.0] | 0.0 |
| 9 | 0x00016E0F10B7685D | 9645000080 | 3512000080 | 16 | 19118.0 | 51.0 | 62.0 | 0.0 | [19118.0, 51.0, 62.0] | [-781.8683149993153] | [1.0] | 0.0 |
| 10 | 0x00016E0F10BF7B70 | 9645000092 | 3512000092 | 16 | 13766.0 | 18.0 | 20.0 | 0.0 | [13766.0, 18.0, 20.0] | [-327.4738938976721] | [1.0] | 0.0 |
| 11 | 0x00016E0F10C18111 | 9645000037 | 3512000037 | 16 | 4031.0 | 83.0 | 85.0 | 0.0 | [4031.0, 83.0, 85.0] | [-947.8468165201091] | [1.0] | 0.0 |
| 12 | 0x00016E0F11135782 | 9645000025 | 3512000025 | 16 | 4836.0 | 35.0 | 32.0 | 0.0 | [4836.0, 35.0, 32.0] | [-406.41238288465064] | [1.0] | 0.0 |
| 13 | 0x00016E0F113C9E41 | 9645000074 | 3512000074 | 16 | 15726.0 | 85.0 | 83.0 | 0.0 | [15726.0, 85.0, 83.0] | [-1050.7308703158105] | [1.0] | 0.0 |
| 14 | 0x00016E0F11525D52 | 9645000023 | 3512000023 | 16 | 15228.0 | 62.0 | 63.0 | 0.0 | [15228.0, 62.0, 63.0] | [-812.8214011632385] | [1.0] | 0.0 |
| 15 | 0x00016E0F11533B86 | 9645000023 | 3512000023 | 16 | 899.0 | 62.0 | 63.0 | 0.0 | [899.0, 62.0, 63.0] | [-686.7670793214941] | [1.0] | 0.0 |
| 16 | 0x00016E0F1156E519 | 9645000047 | 3512000047 | 16 | 14638.0 | 42.0 | 34.0 | 0.0 | [14638.0, 42.0, 34.0] | [-541.5216249696589] | [1.0] | 0.0 |
| 17 | 0x00016E0F1159A234 | 9645000005 | 3512000005 | 16 | 11407.0 | 16.0 | 26.0 | 0.0 | [11407.0, 16.0, 26.0] | [-328.44207005065533] | [1.0] | 0.0 |
| 18 | 0x00016E0F11780902 | 9645000099 | 3512000099 | 16 | 19361.0 | 27.0 | 27.0 | 0.0 | [19361.0, 27.0, 27.0] | [-463.58856733144796] | [1.0] | 0.0 |
| 19 | 0x00016E0F117D2932 | 9645000080 | 3512000080 | 16 | 13710.0 | 51.0 | 62.0 | 0.0 | [13710.0, 51.0, 62.0] | [-734.2933430877965] | [1.0] | 0.0 |
| 20 | 0x00016E0F1195D573 | 9645000011 | 3512000011 | 16 | 18602.0 | 71.0 | 75.0 | 0.0 | [18602.0, 71.0, 75.0] | [-956.5501906528943] | [1.0] | 0.0 |
| 21 | 0x00016E0F11A22AE0 | 9645000027 | 3512000027 | 16 | 16454.0 | 48.0 | 42.0 | 0.0 | [16454.0, 48.0, 42.0] | [-633.528720854468] | [1.0] | 0.0 |
| 22 | 0x00016E0F11A40325 | 9645000085 | 3512000085 | 16 | 20684.0 | 91.0 | 91.0 | 0.0 | [20684.0, 91.0, 91.0] | [-1170.3786026181476] | [1.0] | 0.0 |
| 23 | 0x00016E0F11A6148B | 9645000089 | 3512000089 | 16 | 14327.0 | 79.0 | 66.0 | 0.0 | [14327.0, 79.0, 66.0] | [-913.5175409333822] | [1.0] | 0.0 |
| 24 | 0x00016E0F11C25723 | 9645000082 | 3512000082 | 16 | 2666.0 | 10.0 | 5.0 | 0.0 | [2666.0, 10.0, 5.0] | [-104.9180221824804] | [1.0] | 0.0 |
| 25 | 0x00016E0F11CB77B7 | 9645000077 | 3512000077 | 16 | 3722.0 | 34.0 | 45.0 | 0.0 | [3722.0, 34.0, 45.0] | [-461.7778439551454] | [1.0] | 0.0 |
| 26 | 0x00016E0F121E40C5 | 9645000076 | 3512000076 | 16 | 18888.0 | 20.0 | 23.0 | 0.0 | [18888.0, 20.0, 23.0] | [-399.6868792524099] | [1.0] | 0.0 |
| 27 | 0x00016E0F12302D02 | 9645000021 | 3512000021 | 16 | 14365.0 | 37.0 | 36.0 | 0.0 | [14365.0, 37.0, 36.0] | [-522.8249118161635] | [1.0] | 0.0 |
| 28 | 0x00016E0F1240F35C | 9645000072 | 3512000072 | 16 | 20227.0 | 94.0 | 94.0 | 0.0 | [20227.0, 94.0, 94.0] | [-1198.9435286840064] | [1.0] | 0.0 |
| 29 | 0x00016E0F124BDC0A | 9645000061 | 3512000061 | 16 | 19564.0 | 17.0 | 22.0 | 0.0 | [19564.0, 17.0, 22.0] | [-383.90956038827] | [1.0] | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5439 | 0x00016E0F74CB70E | 9645000025 | 3512000025 | 16 | 11120.0 | 35.0 | 32.0 | 0.0 | [11120.0, 35.0, 32.0] | [-461.69365571970695] | [1.0] | 0.0 |
| 5440 | 0x00016E0F7540A871 | 9645000012 | 3512000012 | 16 | 16886.0 | 44.0 | 44.0 | 0.0 | [16886.0, 44.0, 44.0] | [-626.46522124715] | [1.0] | 0.0 |
| 5441 | 0x00016E0F75C64122 | 9645000041 | 3512000041 | 16 | 3061.0 | 78.0 | 68.0 | 0.0 | [3061.0, 78.0, 68.0] | [-819.8386880641135] | [1.0] | 0.0 |
| 5442 | 0x00016E0F75D07FED | 9645000091 | 3512000091 | 16 | 8471.0 | 32.0 | 47.0 | 0.0 | [8471.0, 32.0, 47.0] | [-503.55407827205806] | [1.0] | 0.0 |
| 5443 | 0x00016E0F7743C746 | 9645000032 | 3512000032 | 16 | 13560.0 | 95.0 | 92.0 | 0.0 | [13560.0, 95.0, 92.0] | [-1134.8631413000821] | [1.0] | 0.0 |
| 5444 | 0x00016E0F77A9767B | 9645000047 | 3512000047 | 16 | 19922.0 | 42.0 | 34.0 | 0.0 | [19922.0, 42.0, 34.0] | [-588.0057506317273] | [1.0] | 0.0 |
| 5445 | 0x00016E0F7832FE4B | 9645000092 | 3512000092 | 16 | 18021.0 | 18.0 | 20.0 | 0.0 | [18021.0, 18.0, 20.0] | [-364.9057551187359] | [1.0] | 0.0 |
| 5446 | 0x00016E0F785E5B43 | 9645000086 | 3512000086 | 16 | 4949.0 | 8.0 | 9.0 | 0.0 | [4949.0, 8.0, 9.0] | [-135.86152354163926] | [1.0] | 0.0 |
| 5447 | 0x00016E0F78C17B26 | 9645000032 | 3512000032 | 16 | 9308.0 | 95.0 | 92.0 | 0.0 | [9308.0, 95.0, 92.0] | [-1097.4576715205371] | [1.0] | 0.0 |
| 5448 | 0x00016E0F7ABFB0CC | 9645000042 | 3512000042 | 16 | 3627.0 | 61.0 | 51.0 | 0.0 | [3627.0, 61.0, 51.0] | [-640.1682801951773] | [1.0] | 0.0 |
| 5449 | 0x00016E0F7AD35522 | 9645000093 | 3512000093 | 16 | 2664.0 | 97.0 | 98.0 | 0.0 | [2664.0, 97.0, 98.0] | [-1082.4549711941504] | [1.0] | 0.0 |
| 5450 | 0x00016E0F7B33B8F2 | 9645000078 | 3512000078 | 16 | 20701.0 | 50.0 | 55.0 | 0.0 | [20701.0, 50.0, 55.0] | [-752.3493622870137] | [1.0] | 0.0 |
| 5451 | 0x00016E0F7C39611E | 9645000082 | 3512000082 | 16 | 10148.0 | 10.0 | 5.0 | 0.0 | [10148.0, 10.0, 5.0] | [-170.73827733077636] | [1.0] | 0.0 |
| 5452 | 0x00016E0F7CEE7930 | 9645000025 | 3512000025 | 16 | 6866.0 | 35.0 | 32.0 | 0.0 | [6866.0, 35.0, 32.0] | [-424.2705916458162] | [1.0] | 0.0 |
| 5453 | 0x00016E0F7F06CE4D | 9645000040 | 3512000040 | 16 | 8126.0 | 1.0 | 2.0 | 0.0 | [8126.0, 1.0, 2.0] | [-87.77787468775551] | [1.0] | 0.0 |
| 5454 | 0x00016E0F7FB6CB21 | 9645000067 | 3512000067 | 16 | 15652.0 | 30.0 | 43.0 | 0.0 | [15652.0, 30.0, 43.0] | [-534.141878601174] | [1.0] | 0.0 |
| 5455 | 0x00016E0F7FDE97BA | 9645000037 | 3512000037 | 16 | 13825.0 | 83.0 | 85.0 | 0.0 | [13825.0, 83.0, 85.0] | [-1034.0060759323533] | [1.0] | 0.0 |
| 5456 | 0x00016E0F81D6029 | 9645000093 | 3512000093 | 16 | 9733.0 | 97.0 | 98.0 | 0.0 | [9733.0, 97.0, 98.0] | [-1144.642004560002] | [1.0] | 0.0 |
| 5457 | 0x00016E0F8525DAD | 9645000007 | 3512000007 | 16 | 11009.0 | 58.0 | 53.0 | 0.0 | [11009.0, 58.0, 53.0] | [-699.6761782293465] | [1.0] | 0.0 |
| 5458 | 0x00016E0F8C69FA2 | 9645000037 | 3512000037 | 16 | 19237.0 | 83.0 | 85.0 | 0.0 | [19237.0, 83.0, 85.0] | [-1081.6162364325642] | [1.0] | 0.0 |
| 5459 | 0x00016E0F96643EE | 9645000053 | 3512000053 | 16 | 19223.0 | 11.0 | 4.0 | 0.0 | [19223.0, 11.0, 4.0] | [-250.57309672944564] | [1.0] | 0.0 |
54600x00016E0FA5340F496450000523512000052161419.066.069.00.0[1419.0, 66.0, 69.0][-745.6495909208345][1.0]0.0
54610x00016E0FA9AB2FC96450000463512000046164772.040.028.00.0[4772.0, 40.0, 28.0][-411.28342547001455][1.0]0.0
54620x00016E0FB04F0A496450000013512000001163785.077.080.00.0[3785.0, 77.0, 80.0][-885.9427896531429][1.0]0.0
54630x00016E0FD782A3996450000813512000081169729.088.090.00.0[9729.0, 88.0, 90.0][-1052.281664984985][1.0]0.0
54640x00016E0FE9352479645000009351200000916593.074.072.00.0[593.0, 74.0, 72.0][-798.1244936259648][1.0]0.0
54650x00016E0FEAE927C96450000303512000030168220.086.084.00.0[8220.0, 86.0, 84.0][-995.5612244100009][1.0]0.0
54660x00016E0FFB2D264964500007835120000781620823.050.055.00.0[20823.0, 50.0, 55.0][-753.4226142421182][1.0]0.0
54670x00016E0FFB333EE96450000443512000044163623.056.059.00.0[3623.0, 56.0, 59.0][-656.4210955437193][1.0]0.0
54680x00016E0FFDB7D14964500003635120000361615533.070.070.00.0[15533.0, 70.0, 70.0][-896.9679412626872][1.0]0.0
\n", "

5469 rows × 12 columns

\n", "
" ], "text/plain": [ " Accounting_ID Calling_Number Called_Number CallDisconnectReason \\\n", "0 0x00016E0F1005CE4 9645000075 3512000075 16 \n", "1 0x00016E0F100A017D 9645000010 3512000010 16 \n", "2 0x00016E0F100AF60A 9645000077 3512000077 16 \n", "3 0x00016E0F104511E6 9645000059 3512000059 16 \n", "4 0x00016E0F107E0142 9645000038 3512000038 16 \n", "5 0x00016E0F107F6253 9645000093 3512000093 16 \n", "6 0x00016E0F109060DA 9645000068 3512000068 16 \n", "7 0x00016E0F109962EA 9645000014 3512000014 16 \n", "8 0x00016E0F10AD5AA3 9645000077 3512000077 16 \n", "9 0x00016E0F10B7685D 9645000080 3512000080 16 \n", "10 0x00016E0F10BF7B70 9645000092 3512000092 16 \n", "11 0x00016E0F10C18111 9645000037 3512000037 16 \n", "12 0x00016E0F11135782 9645000025 3512000025 16 \n", "13 0x00016E0F113C9E41 9645000074 3512000074 16 \n", "14 0x00016E0F11525D52 9645000023 3512000023 16 \n", "15 0x00016E0F11533B86 9645000023 3512000023 16 \n", "16 0x00016E0F1156E519 9645000047 3512000047 16 \n", "17 0x00016E0F1159A234 9645000005 3512000005 16 \n", "18 0x00016E0F11780902 9645000099 3512000099 16 \n", "19 0x00016E0F117D2932 9645000080 3512000080 16 \n", "20 0x00016E0F1195D573 9645000011 3512000011 16 \n", "21 0x00016E0F11A22AE0 9645000027 3512000027 16 \n", "22 0x00016E0F11A40325 9645000085 3512000085 16 \n", "23 0x00016E0F11A6148B 9645000089 3512000089 16 \n", "24 0x00016E0F11C25723 9645000082 3512000082 16 \n", "25 0x00016E0F11CB77B7 9645000077 3512000077 16 \n", "26 0x00016E0F121E40C5 9645000076 3512000076 16 \n", "27 0x00016E0F12302D02 9645000021 3512000021 16 \n", "28 0x00016E0F1240F35C 9645000072 3512000072 16 \n", "29 0x00016E0F124BDC0A 9645000061 3512000061 16 \n", "... ... ... ... ... 
\n", "5439 0x00016E0F74CB70E 9645000025 3512000025 16 \n", "5440 0x00016E0F7540A871 9645000012 3512000012 16 \n", "5441 0x00016E0F75C64122 9645000041 3512000041 16 \n", "5442 0x00016E0F75D07FED 9645000091 3512000091 16 \n", "5443 0x00016E0F7743C746 9645000032 3512000032 16 \n", "5444 0x00016E0F77A9767B 9645000047 3512000047 16 \n", "5445 0x00016E0F7832FE4B 9645000092 3512000092 16 \n", "5446 0x00016E0F785E5B43 9645000086 3512000086 16 \n", "5447 0x00016E0F78C17B26 9645000032 3512000032 16 \n", "5448 0x00016E0F7ABFB0CC 9645000042 3512000042 16 \n", "5449 0x00016E0F7AD35522 9645000093 3512000093 16 \n", "5450 0x00016E0F7B33B8F2 9645000078 3512000078 16 \n", "5451 0x00016E0F7C39611E 9645000082 3512000082 16 \n", "5452 0x00016E0F7CEE7930 9645000025 3512000025 16 \n", "5453 0x00016E0F7F06CE4D 9645000040 3512000040 16 \n", "5454 0x00016E0F7FB6CB21 9645000067 3512000067 16 \n", "5455 0x00016E0F7FDE97BA 9645000037 3512000037 16 \n", "5456 0x00016E0F81D6029 9645000093 3512000093 16 \n", "5457 0x00016E0F8525DAD 9645000007 3512000007 16 \n", "5458 0x00016E0F8C69FA2 9645000037 3512000037 16 \n", "5459 0x00016E0F96643EE 9645000053 3512000053 16 \n", "5460 0x00016E0FA5340F4 9645000052 3512000052 16 \n", "5461 0x00016E0FA9AB2FC 9645000046 3512000046 16 \n", "5462 0x00016E0FB04F0A4 9645000001 3512000001 16 \n", "5463 0x00016E0FD782A39 9645000081 3512000081 16 \n", "5464 0x00016E0FE935247 9645000009 3512000009 16 \n", "5465 0x00016E0FEAE927C 9645000030 3512000030 16 \n", "5466 0x00016E0FFB2D264 9645000078 3512000078 16 \n", "5467 0x00016E0FFB333EE 9645000044 3512000044 16 \n", "5468 0x00016E0FFDB7D14 9645000036 3512000036 16 \n", "\n", " AccountingIDIndex Calling_NumberIndex Called_NumberIndex label \\\n", "0 2577.0 38.0 39.0 0.0 \n", "1 21710.0 33.0 35.0 0.0 \n", "2 6832.0 34.0 45.0 0.0 \n", "3 9768.0 25.0 21.0 0.0 \n", "4 13013.0 12.0 11.0 0.0 \n", "5 13936.0 97.0 98.0 0.0 \n", "6 13255.0 21.0 13.0 0.0 \n", "7 12198.0 57.0 56.0 0.0 \n", "8 17980.0 34.0 45.0 0.0 \n", "9 19118.0 
51.0 62.0 0.0 \n", "10 13766.0 18.0 20.0 0.0 \n", "11 4031.0 83.0 85.0 0.0 \n", "12 4836.0 35.0 32.0 0.0 \n", "13 15726.0 85.0 83.0 0.0 \n", "14 15228.0 62.0 63.0 0.0 \n", "15 899.0 62.0 63.0 0.0 \n", "16 14638.0 42.0 34.0 0.0 \n", "17 11407.0 16.0 26.0 0.0 \n", "18 19361.0 27.0 27.0 0.0 \n", "19 13710.0 51.0 62.0 0.0 \n", "20 18602.0 71.0 75.0 0.0 \n", "21 16454.0 48.0 42.0 0.0 \n", "22 20684.0 91.0 91.0 0.0 \n", "23 14327.0 79.0 66.0 0.0 \n", "24 2666.0 10.0 5.0 0.0 \n", "25 3722.0 34.0 45.0 0.0 \n", "26 18888.0 20.0 23.0 0.0 \n", "27 14365.0 37.0 36.0 0.0 \n", "28 20227.0 94.0 94.0 0.0 \n", "29 19564.0 17.0 22.0 0.0 \n", "... ... ... ... ... \n", "5439 11120.0 35.0 32.0 0.0 \n", "5440 16886.0 44.0 44.0 0.0 \n", "5441 3061.0 78.0 68.0 0.0 \n", "5442 8471.0 32.0 47.0 0.0 \n", "5443 13560.0 95.0 92.0 0.0 \n", "5444 19922.0 42.0 34.0 0.0 \n", "5445 18021.0 18.0 20.0 0.0 \n", "5446 4949.0 8.0 9.0 0.0 \n", "5447 9308.0 95.0 92.0 0.0 \n", "5448 3627.0 61.0 51.0 0.0 \n", "5449 2664.0 97.0 98.0 0.0 \n", "5450 20701.0 50.0 55.0 0.0 \n", "5451 10148.0 10.0 5.0 0.0 \n", "5452 6866.0 35.0 32.0 0.0 \n", "5453 8126.0 1.0 2.0 0.0 \n", "5454 15652.0 30.0 43.0 0.0 \n", "5455 13825.0 83.0 85.0 0.0 \n", "5456 9733.0 97.0 98.0 0.0 \n", "5457 11009.0 58.0 53.0 0.0 \n", "5458 19237.0 83.0 85.0 0.0 \n", "5459 19223.0 11.0 4.0 0.0 \n", "5460 1419.0 66.0 69.0 0.0 \n", "5461 4772.0 40.0 28.0 0.0 \n", "5462 3785.0 77.0 80.0 0.0 \n", "5463 9729.0 88.0 90.0 0.0 \n", "5464 593.0 74.0 72.0 0.0 \n", "5465 8220.0 86.0 84.0 0.0 \n", "5466 20823.0 50.0 55.0 0.0 \n", "5467 3623.0 56.0 59.0 0.0 \n", "5468 15533.0 70.0 70.0 0.0 \n", "\n", " features rawPrediction probability prediction \n", "0 [2577.0, 38.0, 39.0] [-440.84691368645406] [1.0] 0.0 \n", "1 [21710.0, 33.0, 35.0] [-560.2845426594596] [1.0] 0.0 \n", "2 [6832.0, 34.0, 45.0] [-489.13697166313796] [1.0] 0.0 \n", "3 [9768.0, 25.0, 21.0] [-335.75198901002926] [1.0] 0.0 \n", "4 [13013.0, 12.0, 11.0] [-239.387649472332] [1.0] 0.0 \n", "5 
[13936.0, 97.0, 98.0] [-1181.6164141280706] [1.0] 0.0 \n", "6 [13255.0, 21.0, 13.0] [-301.258614162017] [1.0] 0.0 \n", "7 [12198.0, 57.0, 56.0] [-720.9963093850602] [1.0] 0.0 \n", "8 [17980.0, 34.0, 45.0] [-587.2075683476078] [1.0] 0.0 \n", "9 [19118.0, 51.0, 62.0] [-781.8683149993153] [1.0] 0.0 \n", "10 [13766.0, 18.0, 20.0] [-327.4738938976721] [1.0] 0.0 \n", "11 [4031.0, 83.0, 85.0] [-947.8468165201091] [1.0] 0.0 \n", "12 [4836.0, 35.0, 32.0] [-406.41238288465064] [1.0] 0.0 \n", "13 [15726.0, 85.0, 83.0] [-1050.7308703158105] [1.0] 0.0 \n", "14 [15228.0, 62.0, 63.0] [-812.8214011632385] [1.0] 0.0 \n", "15 [899.0, 62.0, 63.0] [-686.7670793214941] [1.0] 0.0 \n", "16 [14638.0, 42.0, 34.0] [-541.5216249696589] [1.0] 0.0 \n", "17 [11407.0, 16.0, 26.0] [-328.44207005065533] [1.0] 0.0 \n", "18 [19361.0, 27.0, 27.0] [-463.58856733144796] [1.0] 0.0 \n", "19 [13710.0, 51.0, 62.0] [-734.2933430877965] [1.0] 0.0 \n", "20 [18602.0, 71.0, 75.0] [-956.5501906528943] [1.0] 0.0 \n", "21 [16454.0, 48.0, 42.0] [-633.528720854468] [1.0] 0.0 \n", "22 [20684.0, 91.0, 91.0] [-1170.3786026181476] [1.0] 0.0 \n", "23 [14327.0, 79.0, 66.0] [-913.5175409333822] [1.0] 0.0 \n", "24 [2666.0, 10.0, 5.0] [-104.9180221824804] [1.0] 0.0 \n", "25 [3722.0, 34.0, 45.0] [-461.7778439551454] [1.0] 0.0 \n", "26 [18888.0, 20.0, 23.0] [-399.6868792524099] [1.0] 0.0 \n", "27 [14365.0, 37.0, 36.0] [-522.8249118161635] [1.0] 0.0 \n", "28 [20227.0, 94.0, 94.0] [-1198.9435286840064] [1.0] 0.0 \n", "29 [19564.0, 17.0, 22.0] [-383.90956038827] [1.0] 0.0 \n", "... ... ... ... ... 
\n", "5439 [11120.0, 35.0, 32.0] [-461.69365571970695] [1.0] 0.0 \n", "5440 [16886.0, 44.0, 44.0] [-626.46522124715] [1.0] 0.0 \n", "5441 [3061.0, 78.0, 68.0] [-819.8386880641135] [1.0] 0.0 \n", "5442 [8471.0, 32.0, 47.0] [-503.55407827205806] [1.0] 0.0 \n", "5443 [13560.0, 95.0, 92.0] [-1134.8631413000821] [1.0] 0.0 \n", "5444 [19922.0, 42.0, 34.0] [-588.0057506317273] [1.0] 0.0 \n", "5445 [18021.0, 18.0, 20.0] [-364.9057551187359] [1.0] 0.0 \n", "5446 [4949.0, 8.0, 9.0] [-135.86152354163926] [1.0] 0.0 \n", "5447 [9308.0, 95.0, 92.0] [-1097.4576715205371] [1.0] 0.0 \n", "5448 [3627.0, 61.0, 51.0] [-640.1682801951773] [1.0] 0.0 \n", "5449 [2664.0, 97.0, 98.0] [-1082.4549711941504] [1.0] 0.0 \n", "5450 [20701.0, 50.0, 55.0] [-752.3493622870137] [1.0] 0.0 \n", "5451 [10148.0, 10.0, 5.0] [-170.73827733077636] [1.0] 0.0 \n", "5452 [6866.0, 35.0, 32.0] [-424.2705916458162] [1.0] 0.0 \n", "5453 [8126.0, 1.0, 2.0] [-87.77787468775551] [1.0] 0.0 \n", "5454 [15652.0, 30.0, 43.0] [-534.141878601174] [1.0] 0.0 \n", "5455 [13825.0, 83.0, 85.0] [-1034.0060759323533] [1.0] 0.0 \n", "5456 [9733.0, 97.0, 98.0] [-1144.642004560002] [1.0] 0.0 \n", "5457 [11009.0, 58.0, 53.0] [-699.6761782293465] [1.0] 0.0 \n", "5458 [19237.0, 83.0, 85.0] [-1081.6162364325642] [1.0] 0.0 \n", "5459 [19223.0, 11.0, 4.0] [-250.57309672944564] [1.0] 0.0 \n", "5460 [1419.0, 66.0, 69.0] [-745.6495909208345] [1.0] 0.0 \n", "5461 [4772.0, 40.0, 28.0] [-411.28342547001455] [1.0] 0.0 \n", "5462 [3785.0, 77.0, 80.0] [-885.9427896531429] [1.0] 0.0 \n", "5463 [9729.0, 88.0, 90.0] [-1052.281664984985] [1.0] 0.0 \n", "5464 [593.0, 74.0, 72.0] [-798.1244936259648] [1.0] 0.0 \n", "5465 [8220.0, 86.0, 84.0] [-995.5612244100009] [1.0] 0.0 \n", "5466 [20823.0, 50.0, 55.0] [-753.4226142421182] [1.0] 0.0 \n", "5467 [3623.0, 56.0, 59.0] [-656.4210955437193] [1.0] 0.0 \n", "5468 [15533.0, 70.0, 70.0] [-896.9679412626872] [1.0] 0.0 \n", "\n", "[5469 rows x 12 columns]" ] }, "execution_count": 20, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "pddf_pred" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- _We use Scatter plot for visualization and represent the dataset._" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAGrCAYAAADXZXD5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3XmcnfP5//H3lRAilohEvoQURdNQSRkhRVFGrQ1tx77TdJGKvYpGEGpv7Bpr1BZjXxo1v1QtEYmJELG0VKtCECW1h5jr98e5Z3Jm5pyZc869neX1fDzyyJz7vs/nvpyM8z6f5dy3ubsAAChUj7QLAABUFoIDAFAUggMAUBSCAwBQFIIDAFAUggMAUBSCAxXPzNzM1g9+vtHMJqRYywFm9kiu2spZpdSJ8kBwoGyY2f5m1mxmn5jZAjObamZbh2hvOzNrCdr7xMzmm9kdZrZ5EW0UFUTufou779RFW18GtXxgZk1mNqTQtoFyQXCgLJjZcZImSjpH0kBJgyVdKWlUyKbfdvcVJa0kaUtJr0h6wsx2CNluqc4P6hkk6S1J1yV5cjNbJsnzoToRHEidma0i6UxJR7n73e7+qbt/5e4PuPuJZjbCzGaY2aKgJ3K5mfUq5hyeMd/dx0m6VtJ5WecfEnz6/8DM/m5mewfbR0s6QNJJQS/hgWD7yWb2TzP72MxeMrO9sto61MyeLKCezyXdIWl4h9ficDN72cw+NLO/mNk3svZtlFXnu2Z2SrB9OTObaGZvB38mmtlywb7tgp7Wb8zsHUk3BNtPDF7Lt83s8GJeS4DgQDkYKWl5Sffk2f+1pGMl9Q+O3UHSr0Kc725Jm5pZHzPrI6lJ0q2SVpe0r6QrzWyou0+SdIuCXoK77xE8/5+StpG0iqQzJN1sZmsUU0Bw3v0kvZa1bZSkUyT9WNIASU9Iui3Yt5Kk/yfpYUlrSlpf0rTgqacq05saLmmYpBGSTss63f9J6ifpG5JGm9nOkk6QVC9pA0k7FlM7QHCgHKwm6X13X5Jrp7vPdven3X2Ju/9b0h8lbRvifG9LMkl9Je0u6d/ufkPQ/hxJd0lqyPdkd29097fdvcXdp0h6VZk360KcYGaLJH0saWtJB2Xt+4Wk37v7y8FrcY6k4UGvY3dJ77j7Re7+hbt/7O4zg+cdIOlMd3/P3RcqE2bZ7bZIOt3dFwc9nb0l3eDu89z9U0njC6wdkERwoDz8V1L/fOPvZrahmT1oZu+Y2UfKvKH2D3G+QZJc0iJlPoVvEQyDLQre1A9Q5lN6TmZ2sJk9l3X8xkXUc6G795W0jqTPJX0ra983JF2S1e4HygTcIElrK9PTyWVNSW9kPX4j2NZqobt/0eH4NzscDxSM4EA5mCFpsaQ98+y/SplJ7Q3cfWVlhnMsxPn2kvRs8Gn7TUmPuXvfrD8ruvsvg2PbXT46+PR/jaQxklYLQmBesfW4+38kjVUmKHoHm9+U9PMOtfR296eCfevlae5tZUKn1eBgW9vpOhy/QJkgyj4eKBjBgdS5+/8kjZN0hZntaWYrmNmyZraLmZ2vzIqojyR9Eixf/WVX7eViGYPM7HRJRyoTPpL0oKQNzeyg4JzLmtnmZvbtYP+7av+G3UeZN+KFQbuHKdPjKJq7NynzBj862HS1pN+a2UZB26uYWeuQ2YOS1jCzY4LJ8JXMbI
tg322STjOzAWbWX5nX8uYuTn2HpEPNbKiZrSDp9FLqR+0iOFAW3P0iSccpM6m7UJlP2GMk3avMRO7+yswLXCNpShFNr2lmn0j6RNIzkr4jaTt3fyQ478eSdlJmUvxtSe8os+JqueD510kaGgwf3evuL0m6SJle0rtBe9NL/M+WpAuUWbW1nLvfE5z79mBIbp6kXbLqrJe0R1Djq5K2D9qYIKlZ0lxJL0h6NtiWk7tPVWbp81+VmZz/a4j6UYOMGzkBAIpBjwMAUBSCAwBQFIIDAFAUggMAUJSyuOBZ//79fZ111km7DACoabNnz37f3Qd0d1xZBMc666yj5ubmtMsAgJpmZgVdRYChKgBAUQgOAEBRCA4AQFEIDgBAUQgOAEBRCA4AQFEIDgBAUQgOAEBRCA4AQFG6DQ4zu97M3jOzeVnb+plZk5m9Gvy9arDdzOxSM3vNzOaa2aZxFg8ASF4hPY4bJe3cYdvJkqa5+waSpgWPpczdyjYI/oxW5l7RAIAq0m1wuPvjkj7osHmUpMnBz5Ml7Zm1/SbPeFpSXzNbI6piAQDpK/UihwPdfUHw8zuSBgY/D1LmXtGt5gfbFqgDMxutTK9EgwcPLrEMAKhtP1xhb7V80f4W4Mv26ak/f3x7bOcMPTnumZuWF33jcnef5O517l43YEC3V/EFAHRwxSk3dAoNSfrq069115UPxXbeUoPj3dYhqODv94Ltb0laO+u4tYJtAICI3Xvun/Puu3rMjbGdt9TguF/SIcHPh0i6L2v7wcHqqi0l/S9rSAsAUKJZs2apvkdD259Zs2alVku3cxxmdpuk7ST1N7P5kk6XdK6kO8zsCElvSNo7OPzPknaV9JqkzyQdFkPNAFBT6ns0dNp26pYXpFBJRrfB4e775dm1Q45jXdJRYYsCAITTb81VYmubb44DQBkrdUhqyvxrI65kqbK45zgAILf/zHqv+4OyNLU0xlTJUvQ4AKCM/XTM7gUfm0RoSAQHAFSNXfvkm5KOFkNVAFCmDh5ylBb8o/Chqq8+XxJjNUsRHABQhnItwS0XDFUBQJm5atzk7g/KgTkOAKhRd094sOjnrDFkYPcHRYShKgCI2bk/n6hp10zvtL3UHkJTS6NefvlljR0+Tt/afD1d9uR5YUssCsEBADHLFRpSZh4jV3isseHq3U6Kf/vb39Yji5MZmuqIoSoAiFEpk9w3vXJF3n2bjtokTDmRIDgAIGXZV709YZdxknIPY/34tN113j2/S7q8TixzXcJ01dXVeXNzc9plAEDkSl1Wm9QKqWxmNtvd67o7jh4HAMQojQCIG8EBAHFbtvin8AVAAKhhTYsb1dTSqHU3Hayey1va5YRGcABAQiY1X6SvvyhsXvnoG0fHXE3pCA4ASEgxN2Xa4+D6GCsJh+AAgISctm3h9wlvXZ5bjggOAEjI6AsOKfo55RgeBAcAJKSYu/llK7fwIDgAIEFnP31i2iWExkUOASAGuXoJy628rB5cdKuaWhp15+UPatKJkzXhsRM1YsSIsutVdIUeBwBEbPe+++fcvvijrzRz5kxJmWGrRz5v1IgRI5IsLRIEBwBEbPFHX+Xdd9rIC3Nu7+rSJOV22RKGqgCgSB2HlaJ6Y29qaYyt7SgRHABQhFxzEfluyFSKcgyKjhiqAoAC7bbKfnn3FTq5PXriwVGVkxqCAwAK9OXHSwo6rqteQ8PRe0RVTmoYqgKAbpSyVLY1PA4berT6fWNVXTT1jKjLSg3BAQBdCPv9ihteujSiSsoHQ1UAEIGznjo+7RISQ3AAQAR+972LVN+jQS+88ELapcSO4ACACB037Eztv94v0i4jVgQHAERs4b//m3YJsSI4AKALlfCFvKQRHADQjaaWRjW1NGqDzddV/eHbqqmlUbZs189pvYPfc889l0yRCTL3wm6cHqe6ujpvbm5OuwwAKEqhS3UrpddiZrPdva674+hxAEDMDlz/l2mXECm+AAigpoW5Gm1TS6NG9TtYny
36vMvj3n39/ZJqK1f0OADUrHxXui3GfR/c1G3YrDxgxaLaLHcEB4Ca1FVARH0b17vevSHS9tJGcABABPL1OhrOHJVwJfFjjgNATYm6N5GtNTwuPOpS7fiz72v48OGxnStNBAeAmhFnaGQ74YqjEzlPWhiqAoAOKuV7F2mhxwEAWVpD4/YL79V1J93Stn2lASvo7ncnp1VWWSE4ACDQGhq7rLSPlnza0m7fxws/U32PBnojYqgKACS1H57qGBpoj+AAUDPy9RZ69C68jaQm2MsZQ1UAakpreNw0/g4N3m51bbfddpKk6dOna/w2E1OsrHJwdVwAUPVd6bYUXB0XAAq01+oHp11CRWGoCkDVm3jMJD10aVPb436DVtGUN69te/zJ+11f3VaSlunTQ1M/nhJLfZWG4ABQ1XINQX3w1v/aLa1dsX/vLsOjmoenSsFQFYCad897N6VdQkUhOADUrD1WPaDt5/FPHJPzGHobnTFUBaDqnHvopZp20xNFPWerrbZSU8tWMVVUXehxAKg6hYbGAx/e0v1B6ITgAFBV+GZ3/BiqAlD2OobB2U+fqBEjRpTc3hrfXF03vXpF2LJqVqgeh5kda2Yvmtk8M7vNzJY3s3XNbKaZvWZmU8ysV1TFAqg9uXoQp255gU7YbXxJ7TW1NBIaIZUcHGY2SNLRkurcfWNJPSXtK+k8SX9w9/UlfSjpiCgKBVB77rz8wbz7np/6Ys7trIKKX9g5jmUk9TazZSStIGmBpB9IujPYP1nSniHPASBGf779r5o3b17aZeT0x6NLu3HSAaf9JOd2QiUaJc9xuPtbZnahpP9I+lzSI5JmS1rk7kuCw+ZLGpTr+WY2WtJoSRo8eHCpZQAoUa4hoGWW76Gpn1X+ZTUOPXNfHXrmvmmXUbXCDFWtKmmUpHUlrSmpj6SdC32+u09y9zp3rxswYECpZQAoQX2v3CuPlnzRUla9j7OfPjHtEpBDmKGqHSX9y90XuvtXku6WtJWkvsHQlSStJemtkDUCiNqS/LuO3eSM5OroRlcrp7Y5aIsEK0G2MMtx/yNpSzNbQZmhqh0kNUt6VNJPJd0u6RBJ94UtEkB6Lj/5et13/tR225KcK2hqadROvRvki5duC7scF+GEmeOYaWZ3SnpWmc8vcyRNkvSQpNvNbEKw7booCgWQvL0HHaEPF3zUaXv2lWWT8MjnTGqXk1BfAHT30yWd3mHz65L4KACUsV59ltWXn36Vc98f5i79XzpXaLSaNWsWn/prFJccAWrQQx/fmnP7yv1X1MYbb1xQG6dueUGUJaGCcMkRoEZFMdTUuqSX70fUFnocAEK7/8ZH0i4BCSI4AOS1wcj1CjrussOvibkSlBOGqgDkdeX08zRr1qyKmc/o+G14htDiQY8DQJdGjBhREW/AuS6hwr054kFwAChIvzX75t13wbOnJlhJZ10FBOERPYIDQEGmzM89j7HcSr00fPjwnPv2HfwzTbmYi0dUG3P3tGtQXV2dNzc3p10GgIjk+pS/3mbf0B+fuTCx82WrhKG2cmBms929rrvj6HEAiFS+N/HXZ7+RcCWIC8EBIDFxzTd01aOgtxE9luMCSFTH8Dji/AO07wnhbxTa1NLIctyEEBwAUnXdSbfo6anNmjhtQui2CIpkMFQFIHUvPvr3tEtAEQgOAJEK+6m/vneDThkVvveB+LAcF0AiSpkY771qL93/31tiqAa5sBwXQFk54vwDin7O5x9+GUMlCIvgAJCIUldOccmQ8kNwAEhMU0ujlunD206l418QQKKmfjyFZbMVjuAAUNYImfJDcABoM7rueO3Rt/hJ7GLNnj27oOMOOXOfmCtBKfjmOADVL9MgtWQ9Diak0/q0Ty+jvNHjAGrcs88+2y40ssW1ommzzTaLpV0kg+AAatxv6n6fdgntWdoFoDsMVQGIRH3PBinrQhR9+vXWve/flPf4ppZG3TNpqq78xfVt28595mR6IxWA4AAQWq4hrU8/+Fz1PR
q6nK/Ya/Qu2mv0LnGWhhgwVAXUOCaiUSyCA4C+vf2GefdNufi+Lp/76+1PiboclDmCA4AunXa2mloa1dTSqCMvPLDdvmtPuLnL1VVrrD8g7vJQZggOAO1ce8LNObfnC49Trjk2znJQhggOAG2Orf9dpO316dc70vZQHlhVBaDNvGmvlPS8JJfW5ur5MMGfLO4ACKDNlIvvyztUJaX/Bt3VXEvatVUD7gAIoGj7HDcq7RJQAQgOAO3k++Sexif6+h4NBV8v67jtTou5GrRijgNAJ2kP+3QMi0LCY+j3hsRVDjqgxwGgrOy/zs9Let6R5xzY/UGIBMEBoKws/M8HaZeAbhAcACrKiQ//LOf2YuZDEA7BAaCi7LTTTmpqaVTdXsNy7ic84kdwACgroycenHffj07Yue3n5nuez3tcwxpHRFoT2iM4ACSi0KGkhqP3UM/eOW4D2FP69fmFBcKidz8qtjwUgeW4AGKVb2ltV0t+H/70jlhrQjj0OADEZqfe+XsYh288Nrbz/mTcHrG1DYIDQIx8cf59b770dqi2z2v+bd59vxiff54E4REcAErWOm/R+mfi2CsSO/emm27aabhr3U0Hp/6t91rAHAeAkuSa6H7osr/pf+9/otNv+U1idRAUyaPHAaBoe/U/JO++J29beouEbQ7aIu9xZz99YqQ1ITkEB1ADdltlP9X3aNBpPzknkvY++eCzgo4bN/mEnOMaK662gkaMGBFJLUgeQ1VAFTt621P08hOvtj2eec8c1fdoSHR4p+lLhpKqDT0OoIplh0a2sJfl2O+cvUI9H5WN4ACq1G6r7Bdb24efvH/efUxWVz+GqoCI1S/TILW035bGm+mXHy+Jtf2mlkadtNvpmjP1pXbbUP0IDiBC+YaAkp5XkKQt9vquZt4zJ9ZznP/QGbG2j/LEUBVQpSbcdUraJaBKERxAQtK4T0S+Xg5DSgiDoSogQfU9GtRzOdPDnyd39VdCAlGjxwEk7OvFnnYJQCgEBxChfoP6FnQctzdFJWOoCojQlDevkVR8MOQ6niEmlKtQPQ4z62tmd5rZK2b2spmNNLN+ZtZkZq8Gf68aVbFApSjmTb+rJbxAOQo7VHWJpIfdfYikYZJelnSypGnuvoGkacFjAFnoTaCSlRwcZraKpO9Luk6S3P1Ld18kaZSkycFhkyXtGbZIoBLlC4eLnx9XcBv0OlCOwsxxrCtpoaQbzGyYpNmSxkoa6O4LgmPekTQw15PNbLSk0ZI0ePDgEGUA5SuKnkVreNBLQbkIM1S1jKRNJV3l7t+V9Kk6DEu5u0vKufbQ3Se5e5271w0YMCBEGQCAJIUJjvmS5rv7zODxncoEybtmtoYkBX+/F65EoLqtV/eNgo5j2ArlouShKnd/x8zeNLNvufvfJe0g6aXgzyGSzg3+vi+SSoEKN336dI3fZmK7beOfOEZ/nHWhJIIBlSPsqqpfS7rFzOZKGi7pHGUCo97MXpW0Y/AYqHkdQ6N12/Tp0yUxh4HKEeoLgO7+nKS6HLt2CNMuUG266k2M32aimlq26rYNggXlgkuOAGUkXzic9dTx3T73tvPvUX2PBoa8EDsuOQKUmVJ6Fh3DgiW8iBM9DqDCddXDaJ0/AaJEcAAJ6OqTf5y9glwT8kBYBAeQkKaWRq3Yv3fb454rWLehMWPGjLZ5i9Y/lxx3TdylAl1ijgNI0D3v3VTU8eO2urjTtgcnPqIRDRtr5MiRUZUFFIUeB1Cmupq7aBcoy+Zvg8lxxIHgACLSOpR08a8vS/S8TYtzh8Ph5+6faB2oHQxVASHVL9sgfb308dQrHtfUKx5P9NM+PQskiR4HENbXuTeH/SKeLRfq6UBsCA4ghDi/pf3I5+ks4QW6w1AVUMaaWhr1881P0Ouz35CU6YV0FSjdqe/VIC3pfA6gGAQHEELP5Xvo6y9aYj3HH5+5MJJ28vWO6ns0EB4oCkNVQAgPfzYl7RIiMW/evLRLQAUhOICQ8n1aj+tTfO
uy38M2OTqyNo/d5IzI2kL1Y6gKiEASQz2j+h2kzxZ90fZ4/rwFkQ0z9ehlodtA7aDHAVSI7NDIFsXKrr98cUfoNlA7CA6gAkQRDsfe+svcO3gXQJEYqgJSkisM4hzy2nXfH2jXfX/Q7ryspkIpCA4gBcUujf1O/RC90PRKJOcmLBAWnVQgYWO3P7Xo51z8l7NiqAQoDcEBJOylx/5R0vOKXfbbumz31nPvLul8QD7m7mnXoLq6Om9ubk67DCAR3U10hx1Kytc+Q1TojpnNdve67o6jxwEkLM438IvHXJ13X5wXZERtITiAMjL+iWNCPX/qldMiqgTIj1VVQAnumHi/rjnuT+22/ezig7T3MT8q6PlNLY2aPn26xm8zsd02oBIwxwEU6emnn9bvvndRzn1pv/nHPX+C6sYcBxCTfKEhpT+P0FUwDFy3f4KVoJoRHEAF233l/duW3bY67Jz9Oh+4jHTzP69KsDJUM+Y4gAo0d+5cHT+8/ZcC63s0aPDwQbru2Yna/+Qfp1QZagE9DqACdQyNVv957q2EK0EtIjiAInU1j1AOk89Hbf2btEtAlSM4UPNOGTWhbZ6g9c/jjz/e5XOaWhr1zc3XaXv8nfpvl0VoSNI/nno97RJQ5ZjjQE27ecKdeuaB5zttP2u7y9TU8v0un3v1zAviKiuUi577XdoloMrR40BNmzxuSt59aS+tLdUmm2ySdgmocgQHYlW/bEOn5aIIr9gr5QJRYqgKsXjgpiZdeuikdtvqezRo/S3X1VVPnZ9SVdWFkEBa6HEgFh1Do9VrT/8r4UpC4P8OICf+10DiLv71ZWmX0KbLpbVL+EQP5EJwoGj1vRo6LV8txqN/mhlTZaVpamnU2t9es+3xD0dvzzAQ0AXmOFCUnXo3SEs6b6/v0VDwm+2Ex06MuKrwrn/xkrRLACoGPQ4UxRfn31doz2PYsGERVQMgDQQHYtHdctG5c+e2DXNNHHtFkqUBCImhKsQmX3h07Jk8dNnf9NBlf2NeAagQ9DgQmTOnH9ftMV1dgI8vCQKVgeBAUbrqFYwcObLb53MBPqDyMVSFojW1NGqn3g1tE+VnTj+uoNAoJ7l6NwyVAYUhOFCSRz6v3DfZfENixSwpBmoZQ1VIVFdvzLv9ervkCsnjN3vlvrMegKUIDoT24M3/r6iltbnuFzF4+CAdc8lRcZRXlGfvm5t2CUDZM3dPuwbV1dV5c3Nz2mWgBPmGfcp5yKe71VvlXDsQJzOb7e513R1HjwMla1jjiLz7KnVpLaEBdI/gQMkWvftR2iWUhHAAwmFVFWpSU0uj6nstvWBjIWHCEl4gg+BAzWr6svA3fZbwAksxVIWSdfWGOfamnydYSbp26lOZ8zlAqQgOhJIrILber067H7hjCtV0r9SbT3XFP4+sKaAiMFSFUHY/cMeyDYlsP+zToJYOb/D1PRp00LgGHTx+73SKAioUPQ7UhI6h0epPZ4afn2COA7WG4EDV+9UWJ4Vug3AAlmKoClXv1Wf+VfJzu5oLIUxQq0L3OMysp5nNMbMHg8frmtlMM3vNzKaYWa/wZQKlK/UNntAAcotiqGqspJezHp8n6Q/uvr6kDyXlvy4FAKDihAoOM1tL0m6Srg0em6QfSLozOGSypD3DnANodfuF97Ytpf3xwEOKem6+HkK+7WccdEHR9QG1Iuwcx0RJJ0laKXi8mqRF7h5cyEHzJQ3K9UQzGy1ptCQNHjw4ZBmodh2HjT5e+FnR39ou5tidx3xfT94yq+DjgVpSco/DzHaX9J67zy7l+e4+yd3r3L1uwIABpZaBGjBun3Pz7ovrKrxbbLFFLO0C1SDMUNVWkn5kZv+WdLsyQ1SXSOprZq09mbUkvRWqQtS8GY0lfTYJbf0t1s25/Ucn7JxwJUB5KXmoyt1/K+m3kmRm20k6wd0PMLNGST9VJkwOkXRfBHWiQlTTFWSvmnG+pPb/TZX63wJEKY7vcfxG0u1mNkHSHEnXxXAOlKFqvY
JsJdcOxCGSb467+9/cfffg59fdfYS7r+/uDe6+OIpzoLwdvvHY2Nru6o17o+2/Fdt5AeTGJUcQiTdfejuytnJdwfaMJ4/tdFy/Qato4rQJkZ0XQGG45AjKxs0T7tTkcVPabWsNj6aWRjW1fC+NsgB0QI8DkYhiHqBjaAAoTwQHYhdFqNT35i57QLkgOBCZzHDS0pBYcbUVoluRtDgzbPXss89G0x6AkjHHgcjFuXz1N3W/Z3kskDJ6HKg4cV1mBEBhCA4kpn65pcts77ryoU776UkAlYHgQCLqezRIXy19fPWYG/NenoQAAcobwYHYdTW0lKvn0Z2jJnFvMCBNBAdSdfWYG3Nu76rXseeRXJ0WSBOrqlC2mloadfX4m3TXmQ+0PS7G7NmzdfLmne/lwVAYEI65e9o1qK6uzpubm9MuAzHpbhVUXG/k+c67zAo9NfWT22M5J1DJzGy2u9d1dxxDVYjd4I1z3j1YUvKhIUlLPvs6lnMCtYLgQOyumztRA9fr32l7uQ4Z3Xb+PTmv0AsggzkOJOLm165Ku4SC5AqKSr8RFRA1ehyoSuc+c3LRz5k+fXreffQ8gKUIDlSlzTbbLO++fL2H8dtMjKscoKowVIXU5fsGeVitbbS2/+PTdtcvzzwkdLtArSM4kKp8Q0BRzisU3E4PSS2RnBKoagxVITUnjToz7RLaaVqSP2BW7N87wUqA8kaPA6mZ88ALsbSb3YvZcIv1dMWM8wp+blNLY6de0Ir9e+ue926KrD6g0hEcSE9PSRF/F6/jm/4/Zr5e9LAXS2+BrjFUhdQ0fRXtG3RXS2b36Ld/pOcCahnBgVStPGDFnNuj/tT/xaKvuj8IQEEYqkJB4loye9e7N7Rr///WH6A//ePK0O0CiA/BgW6V1ZJZAKljqApV4+Dxe+fdRzAB0SE4EMq1p05O5bzPP/98pyvYHjSuQTsduV2nYwkNIFoMVSGUrQ/aIvFz5ruC7T4T9tSJk47SiZOOSrwmoJbQ40AoQ4YMSbuENlNOuzftEoCaQHCgW5e9dFbO7T+7NPnvRuzR98DEzwmgPYID3RoyZIiaWhq1+jczd/Hrt1ZfNbU0au8xeyVeyxcfLe5yP3ftA+Jn7p52Daqrq/Pm5ua0y0AFeP7553XCdycUdCyT4kBxzGy2u9d1dxw9DlSUYcOGFXzs2O1PjbESoHYRHMhr6h2PdlryWg4K7Um89Ng/Yq4EqE0sx0VO+Za8TnxhvDbaaKMUKmqv4939ACSHHgc6efHFF/PuO+Y745MrJCTmOIB4EBzopBrCYdU1V064EqB2MFRVRuK6Am21a2pp1NjtT22b0+A1A+JFj6NMdHUF2qRtMWp44ucM65JHz1ZTSyOhASSA4CgDt51/T9oltDPhnvzLWFdZfaUEKwFQjhiqKgPXn3wsYRw6AAANnklEQVRr2iW02aPvgXm/nb3+iMG66umLEq4IQLkhOMrAOsPW1r+ffzPtMvIOizH8AyAbQ1Vl4Jo5F6ddQpdzKXxXAkA2gqNMDN12w5zb+bQPoNwwVJWwXJ/eh267oS559GxJ0s4r7q2vP3OtM2ztsuiJAEBHBEeCdl4x9z2xs6+p9PAndyRVDgCUhKGqBH39Wf5L2Kc9j9DVkBjDZQCyERwV7tFHo7uC7YVzTitoG4DaxlBVBct3Bds+/ZfXve/9qej2hg0bRu8CQLfocZSJw8+N7v7dn77/RWRtAUBHBEeCuvo0v99Jxd2/+5ebnxi2HAAoCUNVCWtqadT06dM1fpuJbY9z6e5KuQveeCeeAgGgG/Q4UrDVVlt1eSXXQq6UW8ocBgBEgeCoMHsNPDjtEgDUOIaqKswnCz9v+7mppTGSmz9lt7HxDkP0h6azSi8QQNUjOCpcmOWzM2bM0Lit2l/WZN60V1Tfo4FluQDyYqiqwkT5ht4xNLLNmDEjsvMAqC4ERxnKFw6rrd03sRq6ChUAtY3gKFNNLY2qP3zbdtv+++aitkuLHLrxr1OqDE
CtKzk4zGxtM3vUzF4ysxfNbGywvZ+ZNZnZq8Hfq0ZXbm056doxGjT0/3Lue+uldzRnzpzYzr3xDkNiaxtAZQvT41gi6Xh3HyppS0lHmdlQSSdLmubuG0iaFjxGid56Kf8X/U7a7JxQba+8ep+8+1hZBSCfkoPD3Re4+7PBzx9LelnSIEmjJE0ODpssac+wRVaz+t4NkV3dtlh3vXNjzp4FK6oAdCWS5bhmto6k70qaKWmguy8Idr0jaWCe54yWNFqSBg8eHEUZFSff1W2TfOOmZwGgWKEnx81sRUl3STrG3T/K3ufuLinn3YvcfZK717l73YABA8KWUXH2XvuIvPsK7Xn8dPyPoioHAAoWKjjMbFllQuMWd7872Pyuma0R7F9D0nvhSqxOH771UfcHqetho5+POyiqcgCgYCUPVZmZSbpO0svunr3o/35Jh0g6N/j7vlAVVoGwcxet4VHfo0HL9O6hqZ9OiaIsAChJmDmOrSQdJOkFM3su2HaKMoFxh5kdIekNSXuHK7GyRTnhzaQ1gHJQcnC4+5OSLM/uHUptF9IBp/0k7RIAIC8uclhmDjjtJzr0zH0jaatjb4ceC4AoEBwpWn2d1XTL61fH0nY5LPUFUJ24VlWKkgyNQvYBQCEIjpjlvT3sz7dOuBIAiAZDVQloamnUfdf/RZcfea022uFbmtg0IdV6orhrIIDaZZkvd6errq7Om5ub0y6japQ6HEV4ALXNzGa7e113xzFUVYUIAABxIjiqFOEBIC4ERxVramkkQABEjuCocvdd/5e0SwBQZQiOKnf5kdcWdNxpj46JuRIA1YLluBXm+F1O19y/vNRpe6lDUgeP31sHjeNLgQAKR4+jwuQKDSn/EtyNdvhWl+0RGgCKRXBUkFK+n9HVlw1H/GR4mHIA1CiGqqpMdrgMXK+/bn7tKjW1NHYKnTHXHqlRh/8w6fIAVAGCo4p0DId3X3+/7Yq4LMsFEBWGqipIqW/+zzzzTMSVAKhlBEel6VX8U07Z4vzo6wBQs2p6qKoSrxLb9EWmvuN3OV1/n/GqHlx0qyTuswEgOTUbHPneaCvlLnkXTT2j4GN/cfmh8RUCoOYwVJXDY489lnYJReu35ip59/3kV7slWAmAakdw5HDJPteHbmP0ZservkdD25+nnnoqgsrymzL/Wu158q6dtldC7wlAZanZGzl1NSdw2qNjtO2220be9hrfXF03vXpFye0CQJy4kVMIYUKjKwv++V4s7QJAkmo2OPIN4YS9SiyrmwBUu6pfVZXvjbz129SPPfaYfr/bFfrtQ0fF1tMAgGpS1T2OC0bnn084bOjRkjLDUg9/ckdkocFkNIBqV9XB8ci1f8u7b/4rC5IrpFXP5E8JAFGr6uCIyl6rH9xuae306dO7PD5Xr2Nkw2Zq+oreCIDKV/VzHGHlmiMZv81ErTPsLl0z5+K8z2PICkC1quoex4ZbrJd33/J9lw3V9r+ffzPU8wGgUlV1cFwx47y8+x744NZun8/SWgDorOqHqlqHjH67x1mSpN8/8Ltun0NgAEB+VR8crQoJDEl6/PHHY64EACpbVQ9VleKs7S4r6LgV+/eOuRIAKE8ER5ZCh6h2HbOj7nnvppirAYDyVDNDVV2ZMWOGxm2Vf2ltNpbZAqh19DikgkMDAEBwFIXeBgAwVFWQ7MDINQ9y+uNjtfXWWydZEgCkhh5HN9Zcf2Dbz/kmz8/4/iVJlQMAqSM4JB154YF5903+x+UFtXHxmKujKgcAyhrBIWmf40bpzOnHddpezJzG1Cun8Y1zADWBOY7AyJEjI5n8ru/RwCQ6gKpGjyMGO61AzwNA9SI4ilBoT8K/iLkQAEgRwVGkppbGggKk9W6BAFBtCI6YER4Aqg3BUaJTpv0q7RIAIBU1t6rqh8s3qOXLpY8veu532mSTTYpuZ/vtt9f2LdvTowBQc2qqx1Hfo31oSNLxw8/SGQfkv8Vsd1h6C6DW1ExwzJ07N+++J29rDtX2GU
8eG+r5AFBJqi44Zs2alXP78cPPiu2c3/ve9/Luo0cCoNpUzRxHrrmGJN+0CQgAtaIqehz5Jqizt1/03O+SKgcAqlpVBEchulo5tfV+dQlWAgCVrWqGqvJp7XW0fuN795X31+JPvmrbX+pyXACoVVUfHK1ar1r74Ee3pl0KAFS0mhmqkqQzD7kw7RIAoOJVfHA0Nzdrr1N2K+jYJ/40M+ZqAKD6xTJUZWY7S7pEUk9J17r7uVGf48jhx+qNufOjbhYA0I3Iexxm1lPSFZJ2kTRU0n5mNjTq85QSGqNO2iXqMgCg5sQxVDVC0mvu/rq7fynpdkmjojxBqRcWHHPu4VGWAQA1KY7gGCTpzazH84Nt7ZjZaDNrNrPmhQsXRlpAp29xG9/sBoCopLYc190nSZokSXV1dR51+wQFAMQjjh7HW5LWznq8VrAtMoQCAKQnjuB4RtIGZraumfWStK+k+6M+ybaHjcy5nVABgHiZe+SjRDKzXSVNVGY57vXufnZXx9fV1Xlzc7h7YgAAwjGz2e7e7cX7YpnjcPc/S/pzHG0DANJV8d8cBwAki+AAABSF4AAAFIXgAAAUheAAABSF4AAAFIXgAAAUheAAABSF4AAAFIXgAAAUheAAABSF4AAAFCWWq+MWXYTZQklvhGiiv6T3IyqnUvEaZPA68BpIvAZSaa/BN9x9QHcHlUVwhGVmzYVcCria8Rpk8DrwGki8BlK8rwFDVQCAohAcAICiVEtwTEq7gDLAa5DB68BrIPEaSDG+BlUxxwEASE619DgAAAkhOAAARan44DCznc3s72b2mpmdnHY9STCztc3sUTN7ycxeNLOxwfZ+ZtZkZq8Gf6+adq1xM7OeZjbHzB4MHq9rZjOD34cpZtYr7RrjZGZ9zexOM3vFzF42s5G19ntgZscG/x/MM7PbzGz5Wvg9MLPrzew9M5uXtS3nv71lXBq8HnPNbNMw567o4DCznpKukLSLpKGS9jOzoelWlYglko5396GStpR0VPDffbKkae6+gaRpweNqN1bSy1mPz5P0B3dfX9KHko5IparkXCLpYXcfImmYMq9FzfwemNkgSUdLqnP3jSX1lLSvauP34EZJO3fYlu/ffhdJGwR/Rku6KsyJKzo4JI2Q9Jq7v+7uX0q6XdKolGuKnbsvcPdng58/VubNYpAy/+2Tg8MmS9oznQqTYWZrSdpN0rXBY5P0A0l3BodU9WtgZqtI+r6k6yTJ3b9090Wqsd8DSctI6m1my0haQdIC1cDvgbs/LumDDpvz/duPknSTZzwtqa+ZrVHquSs9OAZJejPr8fxgW80ws3UkfVfSTEkD3X1BsOsdSQNTKispEyWdJKkleLyapEXuviR4XO2/D+tKWijphmC47loz66Ma+j1w97ckXSjpP8oExv8kzVZt/R5ky/dvH+l7ZaUHR00zsxUl3SXpGHf/KHufZ9ZZV+1aazPbXdJ77j477VpStIykTSVd5e7flfSpOgxL1cDvwarKfJpeV9Kakvqo8/BNTYrz377Sg+MtSWtnPV4r2Fb1zGxZZULjFne/O9j8bmv3M/j7vbTqS8BWkn5kZv9WZojyB8qM9/cNhiyk6v99mC9pvrvPDB7fqUyQ1NLvwY6S/uXuC939K0l3K/O7UUu/B9ny/dtH+l5Z6cHxjKQNghUUvZSZFLs/5ZpiF4zlXyfpZXe/OGvX/ZIOCX4+RNJ9SdeWFHf/rbuv5e7rKPPv/ld3P0DSo5J+GhxW7a/BO5LeNLNvBZt2kPSSauj3QJkhqi3NbIXg/4vW16Bmfg86yPdvf7+kg4PVVVtK+l/WkFbRKv6b42a2qzJj3T0lXe/uZ6dcUuzMbGtJT0h6QUvH909RZp7jDkmDlblM/d7u3nHyrOqY2XaSTnD33c1sPWV6IP0kzZF0oLsvTrO+OJnZcGUWB/SS9Lqkw5T5QFgzvwdmdoakfZRZbThH0pHKjN9X9e+Bmd0maTtlLp/+rqTTJd2rHP/2Qahers
ww3meSDnP35pLPXenBAQBIVqUPVQEAEkZwAACKQnAAAIpCcAAAikJwAACKQnAAAIpCcAAAivL/AeTQo118l57SAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Set the size of the plot\n", "plt.figure(figsize=(14, 7))\n", "\n", "# Plot each CDR by its indexed calling and called numbers, colored by predicted class\n", "plt.scatter(pddf_pred.Calling_NumberIndex, pddf_pred.Called_NumberIndex, c=pddf_pred.prediction, cmap='viridis')\n", "plt.xlabel('Calling_NumberIndex')\n", "plt.ylabel('Called_NumberIndex')\n", "plt.title('CallDetailRecord')\n", "plt.colorbar(label='prediction')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluation" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.0\n" ] } ], "source": [ "from pyspark.ml.evaluation import MulticlassClassificationEvaluator\n", "\n", "evaluator = MulticlassClassificationEvaluator(labelCol=\"label\", predictionCol=\"prediction\",\n", "                                              metricName=\"accuracy\")\n", "accuracy = evaluator.evaluate(predictiondf)\n", "print(accuracy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Confusion Matrix" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import confusion_matrix\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sn\n", "\n", "# Select the true label first and the prediction second\n", "outdataframe = predictiondf.select(\"label\", \"prediction\")\n", "pandadf = outdataframe.toPandas()\n", "npmat = pandadf.values\n", "labels = npmat[:, 0]\n", "predicted_label = npmat[:, 1]\n", "\n", "# confusion_matrix expects (y_true, y_pred)\n", "cnf_matrix = confusion_matrix(labels, predicted_label)\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "import itertools\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "def plot_confusion_matrix(cm,\n", "                          target_names,\n", "                          title='Confusion matrix',\n", "                          cmap=None,\n", "                          normalize=True):\n", "    # Overall accuracy and misclassification rate from the raw counts\n", "    accuracy = np.trace(cm) / float(np.sum(cm))\n", "    misclass = 1 - accuracy\n", 
"\n", "    if cmap is None:\n", "        cmap = plt.get_cmap('Blues')\n", "\n", "    # Normalize each row (true class) before drawing so cell colors and text agree\n", "    if normalize:\n", "        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n", "\n", "    plt.figure(figsize=(8, 6))\n", "    plt.imshow(cm, interpolation='nearest', cmap=cmap)\n", "    plt.title(title)\n", "    plt.colorbar()\n", "\n", "    if target_names is not None:\n", "        tick_marks = np.arange(len(target_names))\n", "        plt.xticks(tick_marks, target_names, rotation=45)\n", "        plt.yticks(tick_marks, target_names)\n", "\n", "    # Darker cells get white text for contrast\n", "    thresh = cm.max() / 1.5 if normalize else cm.max() / 2\n", "    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n", "        if normalize:\n", "            plt.text(j, i, \"{:0.4f}\".format(cm[i, j]),\n", "                     horizontalalignment=\"center\",\n", "                     color=\"white\" if cm[i, j] > thresh else \"black\")\n", "        else:\n", "            plt.text(j, i, \"{:,}\".format(cm[i, j]),\n", "                     horizontalalignment=\"center\",\n", "                     color=\"white\" if cm[i, j] > thresh else \"black\")\n", "\n", "    plt.tight_layout()\n", "    plt.ylabel('True label')\n", "    plt.xlabel('Predicted label\\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))\n", "    plt.show()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAfkAAAHCCAYAAADsC7CKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xm4XVV9//H3JwmTECCAIJMFFbBOUIuAiAoOARwKtghqrRG1VMWhta2ltj9R1NZCtUVbqVQRUBxbUapISFWc0QAigyIggxBAhjDITOD7+2PvSw+X3JtLcu89N3u/XzznyTnr7L3XOocn+Z7vWmuvlapCkiR1z6xhN0CSJE0Ng7wkSR1lkJckqaMM8pIkdZRBXpKkjjLIS5LUUQZ5aZolWSfJ/yS5NcmXVuE6f5zk9Mls2zAk+UaSBcNuh9RFBnlpDEleleSsJLcnubYNRntMwqUPADYDNq6ql6/sRarqpKqaPwnteYgkeyapJCePKt+xLT9jgtd5T5LPrOi4qtq3qk5YyeZKGodBXlqOJO8A/hX4B5qA/FjgY8B+k3D53wEurqplk3CtqXID8MwkGw+ULQAunqwK0vDfIGkK+RdMGiXJBsARwKFV9eWquqOq7quq/6mqv26PWSvJvya5pn38a5K12vf2THJ1kr9Mcn3bC3Bw+957gXcDB7U9BK8fnfEm2abNmOe0r1+b5LIkv01yeZI/Hij//sB5uydZ3A4DLE6y+8B7ZyR5X5IftNc5Pckm43wN9wJfAV7Rnj8bOAg4adR3dXSSq5LcluTsJM9uy/cB3jXwOX820I4PJPkBcCfwuLbsDe37xyT574Hr/1OSbybJhP8HSnqQQV56uGcCawMnj3PM3wG7ATsBOwK7AH8/8P5jgA2ALYHXA/+eZF5VHU7TO/CFqlqvqj45XkOSrAt8BNi3quYCuwPnLue4jYCvt8duDHwY+PqoTPxVwMHApsCawF+NVzdwIvCa9vnewAXANaOOWUzzHWwEfBb4UpK1q+q0UZ9zx4Fz/gQ4BJgLXDnqen8JPLX9AfNsmu9uQbn+trRSDPLSw20M3LiC7vQ/Bo6oquur6gbgvTTBa8R97fv3VdWpwO3ADivZngeApyRZp6quraoLl3PMi4FLqurTVbWsqj4HXAS8dOCYT1XVxVV1F/BFmuA8pqr6IbBRkh1ogv2JyznmM1V1U1vnh4C1WPHnPL6qLmzPuW/U9e6k+R4/DHwGeGtVXb2C60kag0FeeribgE1GusvHsAUPzUKvbMsevMaoHwl3Aus90oZU1R003eRvBK5N8vUkT5xAe0batOXA6+tWoj2fBt4C7MVyejaS/FWSX7RDBLfQ9F6MNwwAcNV4b1bVj4HLgND8GJG0kgzy0sP9CLgH2H+cY66hmUA34rE8vCt7ou4AHjXw+jGDb1bVwqp6IbA5TXb+nxNoz0iblqxkm0Z8GngzcGqbZT+o7U5/J3AgMK+qNgRupQnOAGN1sY/b9Z7kUJoegWva60taSQZ5aZSqupVmcty/J9k/yaOSrJFk3yRHtod9Dvj7JI9uJ7C9m6Z7eWWcCzwnyWPbSX9/O/JGks2S7NeOzd9D0+3/wHKucSqwfXvb35wkBwFPAr62km0CoKouB55LMwdhtLnAMpqZ+HOSvBtYf+D93wDbPJIZ9Em2B94PvJqm2/6dScYdVpA0NoO8tBzt+PI7aCbT3UDTxfwWmhnn0ASis4DzgPOBc9qylalrEfCF9lpn89DAPKttxzXAUpqA+6blXOMm4CU0E9duosmAX1JVN65Mm0Zd+/tVtbxeioXAaTS31V0J3M1Du+JHFvq5Kck5K6qnHR75DPBPVfWzqrqEZob+p0fuXJD0yMRJq5IkdZOZvCRJHWWQlySpowzykiR1lEFekqSOMshLktRR463opdY668+ruZtuueIDJQ3FYzdce9hN0CQ6++yzb6yqR091PbPX/52qZXet8nXqrhsWVtU
+k9CkSWeQn4C5m27JgUd9acUHShqKf3vZ7w67CZpESUYv0TwlatldrLXDgat8nbvP/fcVLeVMkg2BTwBPoVn18XXAL2nWyNgGuAI4sKpubnddPBp4Ec0S1K+tqnPa6yzg/zbDen9VnTBevXbXS5J6KpBZq/6YmKOB06rqiTQ7V/4COAz4ZlVtB3yzfQ2wL7Bd+zgEOAYe3G3ycGBXmp0vD08yb7xKDfKSpH4KkKz6Y0XVNMtVPwf4JEBV3VtVtwD7ASOZ+An8334Z+wEnVuNMYMMkm9Ns+byoqpZW1c3AImDcYQK76yVJ/TXxTHxVbEuzPPankuxIs3z124HNqura9pjrgM3a51vy0CWir27Lxiofk5m8JEmrZpMkZw08Dhn1/hzg6cAxVfV7NDtPHjZ4QDVrzE/6OvNm8pKk/ppAd/sE3FhVO4/z/tXA1VX14/b1f9EE+d8k2byqrm27469v318CbD1w/lZt2RJgz1HlZ4zXMDN5SVJPTc/Eu6q6DrgqyQ5t0fOBnwOnAAvasgXAV9vnpwCvSWM34Na2W38hMD/JvHbC3fy2bExm8pKk/pqcTH4i3gqclGRN4DLgYJpE+4tJXk+zXfPI/Xyn0tw+dynNLXQHA1TV0iTvAxa3xx1RVUvHq9QgL0nSFKuqc4Hldek/fznHFnDoGNc5DjhuovUa5CVJ/RSma3b90BjkJUk9NbH73Fdn3f4JI0lSj5nJS5L6y+56SZI6quPd9QZ5SVJPpfOZfLc/nSRJPWYmL0nqp5Fd6DrMIC9J6q+Od9cb5CVJPeWYvCRJWk2ZyUuS+muWY/KSJHWPa9dLktRhHZ9d3+2fMJIk9ZiZvCSpp7o/u94gL0nqL7vrJUnS6shMXpLUX3bXS5LUQUnnu+sN8pKk/up4Jt/tTydJUo+ZyUuS+svuekmSusj75CVJ6q6OZ/Ld/gkjSVKPmclLkvrJXegkSeoqx+QlSeoux+QlSdLqyExektRfdtdLktRRdtdLkqTVkZm8JKmf4ux6SZK6q+Pd9QZ5SVJvpeNBvtv9FJIk9ZiZvCSpl0L3M3mDvCSpn9I+OswgL0nqqXQ+k3dMXpKkjjKTlyT1VtczeYO8JKm3DPKSJHVU14O8Y/KSJHWUmbwkqZ+8hU6SpG5KD26hM8hLknqr60HeMXlJkjrKTF6S1Ftdz+QN8pKk3up6kLe7XpKkjjKTlyT1k7fQSZLUXV3vrjfIS5J6qQ/3yTsmL0nSFEtyRZLzk5yb5Ky2bKckZ46UJdmlLU+SjyS5NMl5SZ4+cJ0FSS5pHwtWVK+ZvCSpt6Y5k9+rqm4ceH0k8N6q+kaSF7Wv9wT2BbZrH7sCxwC7JtkIOBzYGSjg7CSnVNXNY1VoJi9J6q9MwmPlFbB++3wD4Jr2+X7AidU4E9gwyebA3sCiqlraBvZFwD7jVWAmL0nqp0xrJl/A6UkK+HhVHQv8ObAwyT/TJN27t8duCVw1cO7VbdlY5WMyyEuStGo2GRlnbx3bBvFBe1TVkiSbAouSXAQcAPxFVf13kgOBTwIvmMyGGeQlSb01SZn8jVW183gHVNWS9s/rk5wM7AIsAN7eHvIl4BPt8yXA1gOnb9WWLaEZsx8sP2O8eh2TlyT1VpJVfkygjnWTzB15DswHLqAZg39ue9jzgEva56cAr2ln2e8G3FpV1wILgflJ5iWZ115n4Xh1m8lLknppGu+T3ww4ua1rDvDZqjotye3A0UnmAHcDh7THnwq8CLgUuBM4GKCqliZ5H7C4Pe6Iqlo6XsUGeUmSplBVXQbsuJzy7wO/v5zyAg4d41rHAcdNtG6DvCSpv7q94J1BXpLUU9N7C91QOPFOkqSOMpOXJPVW1zN5g7wkqbcM8pIkdVW3Y7xj8pIkdZWZvCSpt+yulySpgya6LO3qzCAvSeqtrgd5x+QlSeooM3lJUm91PZM3yEuS+qvbMd4gL0nqr65n8o7JS5LUUWbykqR+6sEudAZ5SVIvBeh4jLe7XpKkrjK
TlyT1lCveSZLUWR2P8QZ5SVJ/dT2Td0xekqSOMpOXJPVT7K6XJKmTAsya1e0ob5CXJPVW1zN5x+QlSeooM3lJUm91fXa9QV6S1E9OvJMkqZuateu7HeUdk5ckqaPM5CVJPeXa9ZIkdVbHY7zd9ZIkdZWZvCSpt+yul2a4985/PPcse4AHCh6o4sgzrljucTttMZc37LoVR377cn59y90AbLH+Wrzy9x7D2nNmU+25yx4onr7lXPbeYRNmJVxw3W/56oU3TOMnkjQtvIVOWj0c/f1fc8e994/5/lpzZrHn4zfi8qV3PVg2K7Bg5y048axrWHLbPay75mzuf6BYd83Z7P+UzTjy25dz+7338ye/vznbP/pRXHzDndPxUSRNE2+hkzriJb/7aBZdfBPL7n/gwbInbrouS269hyW33QPAHffeTwEbP2oNbrj9Xm5vfzRcdP0d7LTF+sNotiStEjN5rfYKeMuzHktV8YMrbuEHV9zykPe32mBt5q0zhwt/czsv2G6jB8s3XW9NAA7dfWvWW2sOZ199K/97yVJuuONeNp27Jhs9ag1uues+dtx8LrM7vlOV1FcdT+SnP8gnuR84v637F8CCqnpE/aBJPgF8uKp+nuRdVfUPA+/9sKp2n9RGa0b7l+9eya13L2O9NWfzlj0ey3W/vYdf3dR0ywf4o6duyqfPufZh581OeNzG63DUGVdw7/0P8LY9Hsuvb7mbi2+4ky+cex2ve8aWFMVlN93FJuuuOc2fStJ06Hp3/TAy+buqaieAJCcBbwQ+/EguUFVvGHj5LuAfBt4zwPfMrXcvA+D2e+/nvGt+yzbz1nkwyK81Zxabr78Wb9/jsQCsv/Yc/my3rfj4mVdzy13L+NVNdz44ln/hdXew9YZrc/ENd3LBdbdzwXW3A/CsbTbkgRrCB5M05Toe44c+Jv894AkASd6R5IL28edt2bpJvp7kZ235QW35GUl2TvJBYJ0k57Y/GEhye/vn55O8eKSiJMcnOSDJ7CRHJVmc5LwkfzbdH1qTZ83ZYa05sx58/sRN1+Wa2+7hOY+bx3MeN4+7lz3AYadewuGn/4rDT/8VVyy9i4+feTW/vuVufn797Wyx/tqsMTvMCjxhk0dx3W33ArDemrMBWGeNWTx723n86MpbxmyDJM1UQxuTTzIH2Bc4LcnvAwcDu9L0sP44yXeAxwHXVNWL23M2GLxGVR2W5C0jPQOjfAE4EPh6kjWB5wNvAl4P3FpVz0iyFvCDJKdX1eVT80k1leauNYc/3W0roOl+P+uqW/nF9XfwlMesx2UDM+mX5677HuBbl97EO/fclqK48Lo7uPA3TfZ+wNM2Y8sN1gbgtItu5Prb753aDyJp+sXu+qmwTpJz2+ffAz5JE3xPrqo7AJJ8GXg2cBrwoST/BHytqr73COr5BnB0G8j3Ab5bVXclmQ88LckB7XEbANsBDwnySQ4BDgFY79Gbr8TH1HS46c77+OC3Hv77bON11+DL5//mYeVHf//XD3m9+KrbWHzVbQ877vizrpm8RkqakZpb6Ibdiqk11DH5EWP9kqqqi5M8HXgR8P4k36yqIyZSSVXdneQMYG/gIODzI9UBb62qhSs4/1jgWIBNn/AUR2RXM//xo6uH3QRJM173N6gZ9pj8iO8B+yd5VJJ1gZcB30uyBXBnVX0GOAp4+nLOvS/JGmNc9ws0wwAjvQIAC4E3jZyTZPu2TkmSOmVG3CdfVeckOR74SVv0iar6aZK9gaOSPADcR9OtP9qxwHlJzqmqPx713unAp4GvVtXIoOongG2Ac9L8hLsB2H9SP5AkabXQ8UR++oN8Va03RvmHGXUrXdul/rBu9arac+D53wB/s7zrV9V9wEajzn2A5ra7d63UB5AkdUbXu+tnRCYvSdK068EGNTNlTF6SJE0yM3lJUi/1YRc6g7wkqbe6HuTtrpckqaPM5CVJvdXxRN4gL0nqr6531xvkJUn95C10kiRpVSW5Isn57dboZw2
UvzXJRUkuTHLkQPnfJrk0yS/b1V9Hyvdpyy5NctiK6jWTlyT1UqZ/g5q9qurGB+tP9gL2A3asqnuSbNqWPwl4BfBkYAvgf5Ns357278ALgauBxUlOqaqfj1WhQV6S1FtD7q5/E/DBqroHoKqub8v3Az7fll+e5FJgl/a9S6vqMoAkn2+PHTPI210vSeqtWckqP4BNkpw18DhkOVUVcHqSswfe3x54dpIfJ/lOkme05VsCVw2ce3VbNlb5mMzkJUlaNTdW1c4rOGaPqlrSdskvSnIRTQzeCNgNeAbwxSSPm8yGGeQlSb01Xd31VbWk/fP6JCfTdL9fDXy5qgr4Sbut+ibAEmDrgdO3assYp3y57K6XJPVS0twnv6qPFdeTdZPMHXkOzAcuAL4C7NWWbw+sCdwInAK8IslaSbYFtgN+AiwGtkuybZI1aSbnnTJe3WbykqTemjU9mfxmwMntD4I5wGer6rQ2UB+X5ALgXmBBm9VfmOSLNBPqlgGHVtX9AEneAiwEZgPHVdWF41VskJckaQq1s+F3XE75vcCrxzjnA8AHllN+KnDqROs2yEuSestlbSVJ6qiOx3gn3kmS1FVm8pKkXgrN0rZdZpCXJPXWNM2uHxqDvCSpnyZ4n/vqzDF5SZI6ykxektRbHU/kDfKSpH4KjOwi11kGeUlSb3U8xjsmL0lSV5nJS5J6q+uz6w3ykqRearaaHXYrppZBXpLUW12feOeYvCRJHWUmL0nqrW7n8QZ5SVKPdX3ind31kiR1lJm8JKmXmhXvht2KqWWQlyT1Uw92oTPIS5J6q+Mx3jF5SZK6ykxektRbdtdLktRBTryTJKnDup7JOyYvSVJHmclLknqr23n8CoJ8kj8c7/2q+vLkNkeSpOmRdH8XuhVl8i8d570CDPKSpNVWx2P8+EG+qg6eroZIkqTJNaGJd0k2S/LJJN9oXz8pyeuntmmSJE2ttEvbrspjJpvo7PrjgYXAFu3ri4E/n4oGSZI0XZJVf8xkEw3ym1TVF4EHAKpqGXD/lLVKkiStsoneQndHko1pJtuRZDfg1ilrlSRJUyyk97PrR7wDOAV4fJIfAI8GDpiyVkmSNNVWg+72VTWhIF9V5yR5LrADzdoBv6yq+6a0ZZIkTbGZPnFuVU0oyCdZG3gzsAdNl/33kvxHVd09lY2TJEkrb6Ld9ScCvwU+2r5+FfBp4OVT0ShJkqZD1zdwmWiQf0pVPWng9beT/HwqGiRJ0nQI3e+un+iPmHPaGfUAJNkVOGtqmiRJ0vSYlVV/zGQr2qDmfJox+DWAHyb5dfv6d4CLpr55kiRpZa2ou/4l09IKSZKGYKZn4qtqRRvUXDn4OsmmwNpT2iJJkqZBsyxtt6P8RG+h+wPgQzRr119P013/C+DJU9c0SZKmVtcz+YlOvHsfsBtwcVVtCzwfOHPKWiVJklbZRIP8fVV1EzAryayq+jaw8xS2S5KkKdf1Xegmep/8LUnWA74LnJTkeuCOqWuWJElTK9D5DWommsnvB9wF/AVwGvAr4KVT1ShJkqbDrEl4zGQT3aBmMGs/YYraIkmSJtGKFsP5Le0e8qPfAqqq1p+SVkmSNA063lu/wvvk505XQyRJmk5JHJOXJEmrp4nOrpckqXM6nsgb5CVJ/eWKd5IkddDIffKr+phQXckVSc5Pcm6Ss0a995dJKskm7esk+UiSS5Ocl+TpA8cuSHJJ+1iwonrN5CVJmh57VdWNgwVJtgbmA78eKN4X2K597AocA+yaZCPgcJoVZws4O8kpVXXzWBWayUuSemsGLGv7L8A7eejt6vsBJ1bjTGDDJJsDewOLqmppG9gXAfuMd3GDvCSpn9KMya/qY4IKOD3J2UkOAUiyH7Ckqn426tgtgasGXl/dlo1VPia76yVJvRUmZebdJqPG2Y+tqmNHHbNHVS1JsimwKMlFwLtouuqnjEFekqRVc2NVjbsza1Utaf+8PsnJwHOBbYGfpenz3wo4J8kuwBJg64H
Tt2rLlgB7jio/Y7x67a6XJPVSM7t+6rvrk6ybZO7Ic5rsfXFVbVpV21TVNjRd70+vquuAU4DXtLPsdwNuraprgYXA/CTzksxrr7NwvLrN5CVJvTVN98lvBpzcZuxzgM9W1WnjHH8q8CLgUuBO4GCAqlqa5H3A4va4I6pq6XgVG+QlSb2VaVjyrqouA3ZcwTHbDDwv4NAxjjsOOG6iddtdL0lSR5nJS5J6aWRMvssM8pKkfpqcxWxmNLvrJUnqKDN5SVJvTXSDmdWVQV6S1EuOyUuS1GEdT+Qdk5ckqavM5CVJPRVmTc4GNTOWQV6S1Euh+931BnlJUj89sv3gV0uOyUuS1FFm8pKk3vI+eUmSOsgxeUmSOqzrmbxj8pIkdZSZvCSptzqeyBvkJUn9FLrfnd31zydJUm+ZyUuS+imQjvfXG+QlSb3V7RBvkJck9VSzn3y3w7xj8pIkdZSZvCSpt7qdxxvkJUk91vHeeoO8JKmv0vnZ9Y7JS5LUUWbykqRe6sOKdwZ5SVJvdb273iAvSeqtbof47vdUSJLUW2byE/DYDdfm3172u8NuhiRpMrl2vSRJ3dSHiXdd/3ySJPWWmbwkqbfsrpckqaO6HeIN8pKkHut4Iu+YvCRJXWUmL0nqpWZ2fbdTeYO8JKm3ut5db5CXJPVUSMczecfkJUnqKDN5SVJv2V0vSVIHOfFOkqSuSvczecfkJUnqKDN5SVJvdT2TN8hLknqr67fQGeQlSb0UYFa3Y7xj8pIkdZWZvCSpt+yulySpo7o+8c7uekmSOspMXpLUW3bXS5LUQc6ulySpszIp/02opuSKJOcnOTfJWW3ZUUkuSnJekpOTbDhw/N8muTTJL5PsPVC+T1t2aZLDVlSvQV6SpOmxV1XtVFU7t68XAU+pqqcBFwN/C5DkScArgCcD+wAfSzI7yWzg34F9gScBr2yPHZPd9ZKkfhryBjVVdfrAyzOBA9rn+wGfr6p7gMuTXArs0r53aVVdBpDk8+2xPx+rDjN5SVJvZRIeE1TA6UnOTnLIct5/HfCN9vmWwFUD713dlo1VPiYzeUlSLzUT7yYlld9kZJy9dWxVHTvqmD2qakmSTYFFSS6qqu8CJPk7YBlw0mQ0ZpBBXpKkVXPjwDj7clXVkvbP65OcTNP9/t0krwVeAjy/qqo9fAmw9cDpW7VljFO+XHbXS5J6azq665Osm2TuyHNgPnBBkn2AdwJ/UFV3DpxyCvCKJGsl2RbYDvgJsBjYLsm2SdakmZx3ynh1m8lLkvpreibebQacnGZoYA7w2ao6rZ1QtxZN9z3AmVX1xqq6MMkXaSbULQMOrar7AZK8BVgIzAaOq6oLx6vYIC9J6q3pWPGunQ2/43LKnzDOOR8APrCc8lOBUydat931kiR1lJm8JKm3ur4LnUFektRbHY/xdtdLktRVZvKSpP7qeCpvkJck9VJzn3u3o7xBXpLUT0PeoGY6OCYvSVJHmclLknqr44m8QV6S1GMdj/IGeUlST6XzE+8ck5ckqaPM5CVJvdX12fUGeUlSL010P/jVmUFektRfHY/yjslLktRRZvKSpN7q+ux6g7wkqbe6PvHO7npJkjrKTF6S1FsdT+QN8pKknurBPXQGeUlSb3V94p1j8pIkdZSZvCSpl0L3Z9cb5CVJvdXxGG+QlyT1WMejvGPykiR1lJm8JKm3uj673iAvSeotJ95JktRRHY/xjslLktRVZvKSpP7qeCpvkJck9VKzdH23o7zd9ZIkdZSZvCSpn+LsekmSOqvjMd4gL0nqsY5HecfkJUnqKDN5SVJPpfOz6w3ykqTecuKdJEkdFDo/JO+YvCRJXWUmL0nqr46n8gZ5SVJvOfFOkqSO6vrEO8fkJUnqKDN5SVJvdTyRN8hLknrKDWokSeqybkd5x+QlSeooM3lJUi8Fu+slSeqsjsd4u+slSeoqM3lJUm/ZXS9JUkd1fVlbu+slSf2VSXhMpJrkiiTnJzk3yVlt2UZJFiW
5pP1zXlueJB9JcmmS85I8feA6C9rjL0myYEX1GuQlSZoee1XVTlW1c/v6MOCbVbUd8M32NcC+wHbt4xDgGGh+FACHA7sCuwCHj/wwGItBXpLUW9OUyI9lP+CE9vkJwP4D5SdW40xgwySbA3sDi6pqaVXdDCwC9hmvAoO8JKmXksl5TFABpyc5O8khbdlmVXVt+/w6YLP2+ZbAVQPnXt2WjVU+JifeSZJ6a5Im3m0yMs7eOraqjh11zB5VtSTJpsCiJBcNvllVlaQmozGDDPKSJK2aGwfG2Zerqpa0f16f5GSaMfXfJNm8qq5tu+Ovbw9fAmw9cPpWbdkSYM9R5WeMV6/d9ZKk/pqGQfkk6yaZO/IcmA9cAJwCjMyQXwB8tX1+CvCadpb9bsCtbbf+QmB+knnthLv5bdmYzOQlSb01TXfJbwacnGYAfw7w2ao6Lcli4ItJXg9cCRzYHn8q8CLgUuBO4GCAqlqa5H3A4va4I6pq6XgVG+QlSb01HSveVdVlwI7LKb8JeP5yygs4dIxrHQccN9G67a6XJKmjzOQlST2Vzi9ra5CXJPVSH/aTt7tekqSOMshLktRRdtdLknqr6931BnlJUm91feKd3fWSJHWUmbwkqZ8e2S5yqyWDvCSplyZhP/gZzyAvSeqvjkd5x+QlSeooM3lJUm91fXa9QV6S1FtOvJMkqaM6HuMdk5ckqaumLMgnqSQfGnj9V0neMwX1vGvU6x9Odh2SpI7KJDxmsKnM5O8B/jDJJlNYB8BDgnxV7T7F9UmSOiKT8N9MNpVBfhlwLPAXo99I8ugk/51kcft41kD5oiQXJvlEkitHfiQk+UqSs9v3DmnLPgisk+TcJCe1Zbe3f34+yYsH6jw+yQFJZic5qq33vCR/NoXfgSRJQ5OqmpoLN8F2C+A8YEfgT4H1quo9ST4LfKyqvp/kscDCqvrdJP8GLKmqf0yyD/AN4NFVdWOSjapqaZJ1gMXAc6vqpiS3V9V6g/VW1XpJXgbsX1ULkqwJ/ArYHvgTYNOqen+StYAfAC+vqstHtf8Q4JD25Q7AL6fki9IwbALcOOxGSBrTDlU1d6orSXIazb8Hq+rGqtpnEq4z6aZ0dn1V3ZbkROCRH+LyAAARMklEQVRtwF0Db70AeFL+796F9ZOsB+wBvKw997QkNw+c87Y2cANsDWwH3DRO9d8Ajm4D+T7Ad6vqriTzgaclOaA9boP2Wg8J8lV1LE1PhDomyVlVtfOw2yFp+ZKcNR31zNTAPJmm4xa6fwXOAT41UDYL2K2q7h48MGPcsJhkT5ofBs+sqjuTnAGsPV6lVXV3e9zewEHA50cuB7y1qhY+0g8iSdLqZMpvoauqpcAXgdcPFJ8OvHXkRZKd2qc/AA5sy+YD89ryDYCb2wD/RGC3gWvdl2SNMar/AnAw8GzgtLZsIfCmkXOSbJ9k3ZX8eJIkzVjTdZ/8h3jouMfbgJ3biW8/B97Ylr8XmJ/kAuDlwHXAb2kC9JwkvwA+CJw5cK1jgfNGJt6NcjrwXOB/q+retuwTwM+Bc9p6Po6LAvWNwzDSzObf0UkyZRPvVkY7fn5/VS1L8kzgmKraaUXnSZKkh5tpGexjgS8mmQXcSzMjX5IkrYQZlclLkqTJ49r1kiR1lEFekqSOMshLo2SsBRskaTUz0ybeSUOVJNVOVEnyAmB94MfAdVV1/1AbJ+lBI39Xk2xOM7/smmG3aSYyk5cGDAT4t9Os27Ar8C1gl2G2S9JDtQF+f+BzwDFJ/inJVsNu10xjkJdGSbI9zQZIzwKuAH5Nk82PvG93vjRkSZ4KvAN4CfATYC/g1qE2agYyyEsDkmwMXEOziuLxwP7AvlX1QJIFSTYo7zuVZoL7ga/RrI76YuAVVfXbJE8ebrNmFoO81EqyK/C3NP94PAZ4AvD6dgXGVwN/CUz59peSxpbkSUleTrNg2rOBNwOvqarLkuwL/GeSxwy1kTOIi+Gol9ou91TVAwNl2wLfBN5A00V
/JHAzMBv4PeCPq+qCITRXUivJnwIHV9XuSf6cZr7Mt4A7gb8D/qaqvjbMNs4kBnn10qhZ9BsD91TV7Un+CNirqt6SZDuajH4zYHFVXTnEJku9NDCLfk5VLWvLTgLOrKqPJnkD8DvARsBXq+r0wb/ffectdOqVNoN/KvD/gJcn+X3gMOCKJMfRTLDbL8n2VXUxcMnwWiv1VzsBdseq+lL793SvJJdW1VeATwF7A1TVJ9rj16iq+9oyA3zLMXn1SjXOA96SZE/gXJqAfz3wZeBZwOOBf06y5tAaKmkWcH2SucDVwJrAoUk+CiwD9k3yJwPHLxtCG2c8g7x6I8k6Ay9vBA4GLgAur6qjgLcDGwP3AE8CHjXtjZQEQFVdBPwAuArYv6r+AfgDmjkyuwIbAguSrNceb/a+HI7JqxeSrE0zO/5UmlnzT62qd7dd9M8Edqqqe5LMAdYFNq6qy4bXYql/kjwKeGFVfbW92+VeIMBpwAeq6uh2K/LHAAcCl1TV14fX4pnPIK/OS7JJVd2Y5NnAd4BLaYL8Pe37n6KZPb9bVd09xKZKvdeuT7EzcDfwp1X10yRPB/4X+Puq+tio451kNw6769VZaWwNvL/t0vs58FVgc5p/RACoqoOBC4HvDqWhkgZXkvxHmpnyy6rqpwBVdQ7wAuDodsnpBxngx2cmr85Lsj7wFGDdqlqU5HnAV4BXVdXXkuxWVWcm2bSqrh9ua6X+GbhNbhawHjAPOA64r6r2GThuO2Cbqlo0pKaudszk1UmD68tX1W3AjsC7k+xTVd8CXg18KcmHgOOSbGWAl6bfQICfD/w9zfK0V1bV84E1k/xPkl2TfAe4qf2h7v4RE+R98uqcUQvdvAq4taqOSXIf8Nft+6ckeSHwXJqZu1cPs81SX7UBfh/gQ8BbgM8l2RH4f1X1vCSfo9kR8kNVtXTknOG1ePVid706K8mhNEvUHlhVl7RlrwJeB3ykDfRO2pGGpO2enwucQLNexWbAUcAS4BbgrVV1c5INq+oW/74+cmby6py2K+8JwGtodqe6LsnLgK2BzwBrAK9P8s2qumN4LZX6aSBYr11VtyZ5Pc1kuyNoJsWuA1wHXJXkiKq6BczgV4ZBXp0w+Au//fOSdgzv88AvgQ1oNpt5W1W9J8lXDfDS9BsYg98V+FiS11bV+Uk2pbkvfh7NolTfAr5cVXcNs72rO4O8VnujxuB3p/lH4lzgCzRrz3+rqn6V5BBgp/a0W4fSWKnnBsbgD6BZvW5hkr3bQP8T4CSanrg3V9XiYba1CxyT12pr9Phckr8CXgHcANwEfB84qap+23YHvgl4rdvFSsPTbul8Gs12sT9M8m7gtTRDa7+i6a5fVlU/GV4ru8NMXquzOcB9AEkeQ7Mr1bOr6q52y9hnA09OcgPNinYHG+ClobuJZrfHywCq6ogkTwAWAs+qqh8Os3Fd433yWi21t7+dmOSwtuvvJppFNJ4DUFX/DawF7FdVvwL+sqrOH1qDpZ4auac9yQZJNmjXrVgf+MOBw06i6YH76siGM5ocZvJa7bRB/Qjg08CmwCtpJtV9Ftglyc1tV9/ZwPZJZo+sUy9perVj8C8F3gHcnORM4DCa++G3Au6iCfgHA39Gs0HU7cNqb9eYyWu1kmQjmp3k3ldVHwWOBdammY17WnvYvyQ5luYfkhOq6v6hNFbqqcEV6ZLsBrwL+BPgJzSbzlwEHESzT/y6NLe7zgOeBTww7Q3uMCfeabWT5MXAkcAzq+q2JCcB36mqY5PMA7YFtgHOrqorh9hUqXeSPBrYH/hcVd2e5Dk0e7+vRZPNv6qqLk+yTVVd0Z6zO3AizeqTzpuZRHbXa7VTVV9P8gBwdpKFNAtnfKZ972aarvtzhthEqc+eBewKrNVuGzubZme5m4B925XrXgi8Mckb2/Irgef7o3zymclrtZXkBcDpwGOq6voka7sfvDQc7dyX+5PMpsnk9wR+3u4b8T7gZcDLgacB7wbeWVVfH1q
De8Igr9Vakn2Bfwb2chc5aTiS7ECzT8TpwHer6p727+a+NIH+P5K8B9icpuv+uKpa6Fr0U88gr9Vekv2Aw2kW0Sj/0ZCmV5LnAt+mWWHyi8DjaDaaeSGwJnANcHw7094et2lkkFcnJFmvqrztRhqSJHsAX6MZj/8jmtnyL6OZQf8E4D3AcQBV5Qz6aeLEO3WCAV4arqr6fpJXAv8F7N4uJ/014KnAIcDlBvfpZyYvSZo0SV4EfBR4RlUtbctGdp5zDH6amclLkiZNVZ3a3uJ6UZIdqurmUdtAaxqZyUuSJl27aNUdVXXGsNvSZwZ5SdKUsYt+uAzykiR1lBvUSJLUUQZ5SZI6yiAvSVJHGeSlIUlyf5Jzk1yQ5EtJHrUK19qzXXiEJH+Q5LBxjt0wyZtXoo73JPmrlW2jpOlnkJeG566q2qmqngLcC7xx8M00HvHf0ao6pao+OM4hGwKPOMhLWv0Y5KWZ4XvAE5Jsk+SXSU4ELgC2TjI/yY+SnNNm/OsBJNknyUVJzgH+cORCSV6b5N/a55slOTnJz9rH7sAHgce3vQhHtcf9dZLFSc5L8t6Ba/1dkouTfB/YYdq+DUmTwhXvpCFLModmS87T2qLtgAVVdWaSTYC/B15QVXck+RvgHUmOBP4TeB5wKfCFMS7/EeA7VfWydp/v9YDDgKdU1U5t/fPbOncBApyS5DnAHcArgJ1o/q04Bzh7cj+9pKlkkJeGZ50k57bPvwd8EtgCuLKqzmzLdwOeBPwgCTTbdv4IeCLNhh+XACT5DM0mIKM9D3gNQFXdD9yaZN6oY+a3j5+2r9ejCfpzgZOr6s62jlNW6dNKmnYGeWl47hrJpke0gfyOwSJgUVW9ctRxDzlvFQX4x6r6+Kg6/nwS65A0BI7JSzPbmcCzkjwBIMm6SbYHLgK2SfL49rhXjnH+N4E3tefOTrIB8FuaLH3EQuB1A2P9WybZFPgusH+SdZLMBV46yZ9N0hQzyEszWFXdALwW+FyS82i76qvqbpru+a+3E++uH+MSbwf2SnI+zXj6k6rqJpru/wuSHFVVpwOfBX7UHvdfwNyqOodmrP9nwDeAxVP2QSVNCdeulySpo8zkJUnqKIO8JEkdZZCXJkmS57QL1ixLcsA4x/1+kvOTXJrkI2mn1CfZKMmiJJe0f85ry9Med2m7WM3TB661oD3+kiQLJvGznJpkw0d4zoNL606H8b6XUcfN+O9bmioGeXVau9DMdPk1zSS5z67guGOAP6W5F307YJ+2/DDgm1W1Hc2s+JH15/cdOPaQ9nySbAQcDuxKs5DN4cu5B36lVNWLquqWybjWFFru97IcM/77lqaKQV5DkeQrSc5OcmGSQwbK92mz4Z8l+WZbtl6ST7XZ2HlJ/qgtv33gvAOSHN8+Pz7JfyT5MXBkkl3SLAv70yQ/TLJDe9zsJP/czjI/L8lbkzwvyVcGrvvCJCdP5DNV1RVVdR7wwDife3Ng/ao6s5pZrycC+7dv7wec0D4/YVT5idU4E9iwvc7eNPfQL62qm4FFtAEsySeS7Lyc+o9PckySM5Nc1mbfxyX5xcj31x53RZJN0tyy9/X2/8cFSQ5q339G+13+LMlP0txiN1jPWN/5k9vjz22/8+3GqmMCxvpepv37lmYqF8PRsLyuqpYmWQdYnOS/aX50/ifwnKq6vM2cAP4fcGtVPRVggtnTVsDuVXV/kvWBZ1fVsiQvAP4B+COaLG0bYKf2vY2Am4GPJXl0e/vawcBxbb1fYPnrt3+4qk6c4OfeErh64PXVbRnAZlV1bfv8OmCzgXOuWs45Y5VTVW8Ypw3zgGcCfwCcAjwLeAPN/4edqurcgWP3Aa6pqhcDJNkgyZo0t9YdVFWL2+/3rlF1XMTyv/M3AkdX1UntdWYDLxpdR/vnvwB7Laf9n2834Bnr8187UDYt37c0UxnkNSxvS/Ky9vnWNF2jjwa+W1WXA1TV0vb9F9CsoU5bfvMErv+ldhlXgA2AE5JsBxS
wxsB1/6Oqlg3Wl+TTwKuTfIomGI4sCzvRDHOVVVUlmar7W/+nvf75wG+q6nyAJBfS/OgZDPLnAx9K8k/A16rqe0meClxbVYvbtt7Wnj9Yx1jf+Y+Av0uyFfDlqrqkbcdD6miv+xdT8eGXZ4q/b2lo7K7XtEuyJ02AfWZV7UizZvraK3GpwX+UR58/uDTs+4Bvt1u6vnQCdX0KeDXNKnJfGvkRkOQLbTfz6MdrHkGbl9D0MozYqi0D+M1Id3P75/UD52y9nHPGKl+Re9o/Hxh4PvL6IT/8q+pi4Ok0wf79Sd49gevDGN95VX2WpgfhLuDUJM8bq44k/zLG9z0ydj6Rzz8Tvm9paAzyGoYNgJur6s4kT6TZhAWaJVyfk2RbeHCiEzRjn4eOnDzQXf+bJL+bZs/1kV6Bseob+cf4tQPli4A/Szs5b6S+qroGuIZm97dPjRxcVQe1+7+Pfky0q562e/i2JLulSX1fA3y1ffsUYGTG9oJR5a9JYzeaoYtraZajnZ9kXvudzG/LSHJikl0m2q6xJNkCuLOqPgMcRROMfwlsnuQZ7TFz8/AJjsv9zpM8Drisqj7Sfr6njVEHVfUXY3zfH1zB9/Kg6fq+pZnKIK9hOA2Yk+QXNHubnwkPLuF6CPDlJD/j/7ZPfT8wr52U9TP+b5z2MOBrwA956DjsaEcC/5jkpzw0U/0EzYz489rrvmrgvZOAq6rqFxP9UO1ktKuBlwMfb7u/R94b7AJ/c1v3pcCvaJaMhea7eGGSS2h6OkaC2anAZe3x/9mePzK88D6a5WYXA0cMDHE8jeaHyqp6KvCTtv2HA++vqnuBg4CPtt/bIh7eOzLWd34gcEF7vafQTIR7WB0TbNtyvxcYyvctzUguaystR5J/A35aVZ8cdlseqXYi3Cer6uXDbouk4TLIS6MkOZtmTP+FVXXPio6XpJnKIC9JUkc5Ji9JUkcZ5CVJ6iiDvCRJHWWQlySpowzykiR1lEFekqSO+v+bx+aGVQ+mwAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_confusion_matrix(cnf_matrix,\n", "                      normalize=False,\n", "                      target_names=['Positive', 'Negative'],\n", "                      title=\"Confusion Matrix\")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DenseMatrix([[5469.]])\n", "\n" ] } ], "source": [ "from pyspark.mllib.evaluation import MulticlassMetrics\n", "\n", "# MulticlassMetrics expects an RDD of (prediction, label) pairs of floats\n", "predictionAndLabel = (predictiondf.select(\"prediction\", \"label\").rdd\n", "                      .map(lambda row: (float(row.prediction), float(row.label))))\n", "\n", "# Generate the confusion matrix (rows: actual labels, columns: predicted labels)\n", "metrics = MulticlassMetrics(predictionAndLabel)\n", "print(metrics.confusionMatrix())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cross Validation\n", "\n", "Tune the Naive Bayes `smoothing` parameter with k-fold cross-validation on the training data, then evaluate the best model on the held-out test data." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml.evaluation import MulticlassClassificationEvaluator\n", "from pyspark.ml.tuning import ParamGridBuilder, CrossValidator\n", "\n", "# Grid of Naive Bayes smoothing values to search\n", "paramGrid = ParamGridBuilder().addGrid(nb.smoothing, [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]).build()\n", "\n", "# Select the best model by accuracy\n", "cvEvaluator = MulticlassClassificationEvaluator(metricName=\"accuracy\")" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# Run 3-fold cross-validation (numFolds=3 is also the CrossValidator default)\n", "cv = CrossValidator(estimator=pipeline, estimatorParamMaps=paramGrid,\n", "                    evaluator=cvEvaluator, numFolds=3)\n", "cvModel = cv.fit(trainData)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# Make predictions on testData. 
cvModel uses the bestModel.\n", "cvPredictions = cvModel.transform(testData)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+-----+----------+-----------+\n", "|label|prediction|probability|\n", "+-----+----------+-----------+\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "| 0.0| 0.0| [1.0]|\n", "+-----+----------+-----------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "cvPredictions.select(\"label\", \"prediction\", \"probability\").show()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Evaluate bestModel found from Cross Validation\n", "evaluator.evaluate(cvPredictions)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }