{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An Introduction to the Amazon Fraud Detector API \n", "#### Supervised fraud detection \n", "-------\n", "\n", "## Introduction\n", "-------\n", "\n", "Amazon Fraud Detector is a fully managed service that makes it easy to identify potentially fraudulent online activities such as online payment fraud and the creation of fake accounts. Fraud Detector capitalizes on the latest advances in machine learning (ML) and 20 years of fraud detection expertise from AWS and Amazon.com to automatically identify potentially fraudulent activity so you can catch more fraud faster.\n", "\n", "If you would like to know more, please check out [Fraud Detector's Documentation](https://docs.aws.amazon.com/frauddetector/). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.core.display import display, HTML\n", "from IPython.display import clear_output\n", "display(HTML(\"\"))\n", "# ------------------------------------------------------------------\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 500)\n", "pd.set_option('display.width', 1000)\n", "\n", "import os\n", "import sys\n", "import time\n", "import json\n", "import uuid \n", "from datetime import datetime\n", "\n", "# -- AWS stuff -- \n", "import boto3\n", "import sagemaker\n", "from sagemaker import get_execution_role\n", "\n", "# -- sklearn --\n", "from sklearn.metrics import roc_curve, roc_auc_score, auc, roc_auc_score\n", "%matplotlib inline " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- initialize the AFD client \n", "client = boto3.client('frauddetector')\n", "\n", "# -- suffix is appended to detector and model name for uniqueness \n", "sufx = datetime.now().strftime(\"%Y%m%d\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Setup \n", "-----\n", "\n", "***To get started *** \n", "\n", "1. Load the config.json that was created for you\n", "2. The properties EVENT_TYPE, ENTITY_TYPE, MODEL_NAME and DETECTOR_NAME will all be set for you based on your configuration. The names you named in your CloudFormation template will be set here.\n", "3. Load ARN of you local instance\n", "4. Let the source file from S3 be specified\n", "\n", "Then you can interactively exeucte the code cells in the notebook, no need to change anything unless you want to. \n", "\n", "\n", "
**Fraud Detector Components** \n", "\n", "- EVENT_TYPE is a business activity that you want evaluated for fraud risk.\n", "- ENTITY_TYPE represents the \"what or who\" that is performing the event you want to evaluate.\n", "- MODEL_NAME is the name of your supervised machine learning model that Fraud Detector trains on your behalf.\n", "- DETECTOR_NAME is the name of the detector that contains the detection logic (model and rules) that you apply to events that you want to evaluate for fraud.\n", "\n", "
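\n", "\n", "As a roadmap, each of these names maps onto one of the Fraud Detector API calls run later in this notebook (a comment-only sketch; the real, fully-parameterized calls appear in the sections below):\n", "\n", "```python\n", "# Roadmap: the API calls this notebook runs, in order (details in the sections below)\n", "# 1. client.put_entity_type(...)   -- register ENTITY_TYPE, the \"who\" behind each event\n", "# 2. client.put_event_type(...)    -- register EVENT_TYPE, the activity to evaluate\n", "# 3. client.create_model(...)      -- create MODEL_NAME and train it on your data\n", "# 4. client.put_detector(...)      -- create DETECTOR_NAME, combining the model with rules\n", "```\n", "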
\n", "\n", "-----\n", "\n", "### Bucket, File, and ARN Role\n", "\n", "Bucket, ARN and Model Name Identify the following assets. S3_BUCKET is the name of the bucket where your file lives. S3_FILE is the URL to your s3 file. ARN_ROLE is the data access role \"ARN\" for the training data source.\n", "\n", "
**Bucket, ARN and Model Name** \n", "\n", "Identify the following assets:\n", "\n", "- S3_BUCKET is the name of the bucket where your file lives.\n", "- S3_FILE is the key (path) of your file within that bucket.\n", "- ARN_ROLE is the data access role ARN that Fraud Detector uses to read the training data source.\n", "\n", "
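\n", "\n", "For reference, the config.json contents might look like the following sketch (the values here are illustrative only; your CloudFormation stack writes the real ones):\n", "\n", "```python\n", "# illustrative config.json contents -- example values, not your real resources\n", "example_config = {\n", "    \"ENTITY_TYPE\": \"customer\",\n", "    \"EVENT_TYPE\": \"registration\",\n", "    \"MODEL_NAME\": \"sample_fraud_model\",\n", "    \"DETECTOR_NAME\": \"sample_fraud_detector\",\n", "    \"S3_BUCKET\": \"your-training-bucket\",\n", "    \"S3_FILE\": \"training_data.csv\"\n", "}\n", "```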
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"config.json\") as json_file:\n", " config = json.load(json_file)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- This is all you need to fill out. Once complete simply interactively run each code cell. -- \n", "\n", "ENTITY_TYPE = config[\"ENTITY_TYPE\"]\n", "ENTITY_DESC = \"entity description: {0}\".format(sufx)\n", "\n", "EVENT_TYPE = config[\"EVENT_TYPE\"]\n", "EVENT_DESC = \"example event description: {0}\".format(sufx)\n", "\n", "MODEL_NAME = config[\"MODEL_NAME\"]\n", "MODEL_DESC = \"model trained on: {0}\".format(sufx)\n", "\n", "DETECTOR_NAME = config[\"DETECTOR_NAME\"] \n", "DETECTOR_DESC = \"detects synthetic fraud events created: {0}\".format(sufx)\n", "\n", "ARN_ROLE = get_execution_role()\n", "S3_BUCKET = config[\"S3_BUCKET\"]\n", "S3_FILE = config[\"S3_FILE\"]\n", "S3_FILE_LOC = \"s3://{0}/{1}\".format(S3_BUCKET, S3_FILE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Save the file to S3\n", "df = pd.read_csv(S3_FILE)\n", "df.to_csv(S3_FILE_LOC, index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Profile Your Dataset \n", "-----\n", "\n", " \n", "
**💡 Profiling** \n", "\n", "The function below will:\n", "\n", "1. profile your data, creating descriptive statistics,\n", "2. perform basic data quality checks (nulls, unique variables, etc.), and\n", "3. return summary statistics and the EVENT and MODEL schemas used to define your EVENT_TYPE and TRAIN your MODEL.\n", "\n", "\n", "
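\n", "\n", "For intuition, a minimal frame of the shape this profiler expects might look like the toy example below (the feature column names are illustrative; only EVENT_TIMESTAMP and EVENT_LABEL are required by name):\n", "\n", "```python\n", "# toy illustration of the expected training-data shape (values are made up)\n", "toy = pd.DataFrame({\n", "    \"ip_address\":      [\"1.2.3.4\", \"5.6.7.8\"],               # typed as IP_ADDRESS by name\n", "    \"email_address\":   [\"a@example.com\", \"b@example.com\"],   # typed as EMAIL_ADDRESS by name\n", "    \"order_amt\":       [25.0, 990.0],                        # float dtype -> NUMERIC\n", "    \"EVENT_TIMESTAMP\": [\"2020-01-01T00:00:00Z\", \"2020-01-02T00:00:00Z\"],\n", "    \"EVENT_LABEL\":     [\"legit\", \"fraud\"]                    # the rarer value is mapped to FRAUD\n", "})\n", "```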
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# --- no changes; just run this code block ---\n", "def summary_stats(df):\n", " \"\"\" Generate summary statistics for a panda's data frame \n", " Args:\n", " df (DataFrame): panda's dataframe to create summary statistics for.\n", " Returns:\n", " DataFrame of summary statistics, training data schema, event variables and event lables \n", " \"\"\"\n", " df = df.copy()\n", " rowcnt = len(df)\n", " df['EVENT_LABEL'] = df['EVENT_LABEL'].astype('str', errors='ignore')\n", " df_s1 = df.agg(['count', 'nunique']).transpose().reset_index().rename(columns={\"index\":\"feature_name\"})\n", " df_s1[\"null\"] = (rowcnt - df_s1[\"count\"]).astype('int64')\n", " df_s1[\"not_null\"] = rowcnt - df_s1[\"null\"]\n", " df_s1[\"null_pct\"] = df_s1[\"null\"] / rowcnt\n", " df_s1[\"nunique_pct\"] = df_s1['nunique']/ rowcnt\n", " dt = pd.DataFrame(df.dtypes).reset_index().rename(columns={\"index\":\"feature_name\", 0:\"dtype\"})\n", " df_stats = pd.merge(dt, df_s1, on='feature_name', how='inner').round(4)\n", " df_stats['nunique'] = df_stats['nunique'].astype('int64')\n", " df_stats['count'] = df_stats['count'].astype('int64')\n", " \n", " # -- variable type mapper -- \n", " df_stats['feature_type'] = \"UNKOWN\"\n", " df_stats.loc[df_stats[\"dtype\"] == object, 'feature_type'] = \"CATEGORY\"\n", " df_stats.loc[(df_stats[\"dtype\"] == \"int64\") | (df_stats[\"dtype\"] == \"float64\"), 'feature_type'] = \"NUMERIC\"\n", " df_stats.loc[df_stats[\"feature_name\"].str.contains(\"ipaddress|ip_address|ipaddr\"), 'feature_type'] = \"IP_ADDRESS\"\n", " df_stats.loc[df_stats[\"feature_name\"].str.contains(\"email|email_address|emailaddr\"), 'feature_type'] = \"EMAIL_ADDRESS\"\n", " df_stats.loc[df_stats[\"feature_name\"] == \"EVENT_LABEL\", 'feature_type'] = \"TARGET\"\n", " df_stats.loc[df_stats[\"feature_name\"] == \"EVENT_TIMESTAMP\", 'feature_type'] = \"EVENT_TIMESTAMP\"\n", " \n", " # -- variable warnings -- \n", " df_stats['feature_warning'] = \"NO WARNING\"\n", " df_stats.loc[(df_stats[\"nunique\"] != 2) & (df_stats[\"feature_name\"] == \"EVENT_LABEL\"),'feature_warning' ] = \"LABEL WARNING, NON-BINARY EVENT LABEL\"\n", " df_stats.loc[(df_stats[\"nunique_pct\"] > 0.9) & (df_stats['feature_type'] == \"CATEGORY\") ,'feature_warning' ] = \"EXCLUDE, GT 90% UNIQUE\"\n", " df_stats.loc[(df_stats[\"null_pct\"] > 0.2) & (df_stats[\"null_pct\"] <= 0.5), 'feature_warning' ] = \"NULL WARNING, GT 20% MISSING\"\n", " df_stats.loc[df_stats[\"null_pct\"] > 0.5,'feature_warning' ] = \"EXCLUDE, GT 50% MISSING\"\n", " df_stats.loc[((df_stats['dtype'] == \"int64\" ) | (df_stats['dtype'] == \"float64\" ) ) & (df_stats['nunique'] < 0.2), 'feature_warning' ] = \"LIKELY CATEGORICAL, NUMERIC w. 
LOW CARDINALITY\"\n", " \n", " # -- target check -- \n", " exclude_fields = df_stats.loc[(df_stats['feature_warning'] != 'NO WARNING')]['feature_name'].to_list()\n", " event_variables = df_stats.loc[(~df_stats['feature_name'].isin(['EVENT_LABEL', 'EVENT_TIMESTAMP']))]['feature_name'].to_list()\n", " event_labels = df[\"EVENT_LABEL\"].unique().tolist()\n", " \n", " trainingDataSchema = {\n", " 'modelVariables' : df_stats.loc[(df_stats['feature_type'].isin(['IP_ADDRESS', 'EMAIL_ADDRESS', 'CATEGORY', 'NUMERIC' ]))]['feature_name'].to_list(),\n", " 'labelSchema' : {\n", " 'labelMapper' : {\n", " 'FRAUD' : [df[\"EVENT_LABEL\"].value_counts().idxmin()],\n", " 'LEGIT' : [df[\"EVENT_LABEL\"].value_counts().idxmax()]\n", " }\n", " }\n", " }\n", " \n", " \n", " model_variables = df_stats.loc[(df_stats['feature_type'].isin(['IP_ADDRESS', 'EMAIL_ADDRESS', 'CATEGORY', 'NUMERIC' ]))]['feature_name'].to_list()\n", " \n", " \n", " # -- label schema -- \n", " label_map = {\n", " 'FRAUD' : [df[\"EVENT_LABEL\"].value_counts().idxmin()],\n", " 'LEGIT' : [df[\"EVENT_LABEL\"].value_counts().idxmax()]\n", " }\n", " \n", " \n", " print(\"--- summary stats ---\")\n", " print(df_stats)\n", " print(\"\\n\")\n", " print(\"--- event variables ---\")\n", " print(event_variables)\n", " print(\"\\n\")\n", " print(\"--- event labels ---\")\n", " print(event_labels)\n", " print(\"\\n\")\n", " print(\"--- training data schema ---\")\n", " print(trainingDataSchema)\n", " print(\"\\n\")\n", " \n", " return df_stats, trainingDataSchema, event_variables, event_labels\n", "\n", "# -- connect to S3, snag file, and convert to a panda's dataframe --\n", "s3 = boto3.resource('s3')\n", "obj = s3.Object(S3_BUCKET, S3_FILE)\n", "body = obj.get()['Body']\n", "df = pd.read_csv(body)\n", "\n", "# -- call profiling function -- \n", "df_stats, trainingDataSchema, eventVariables, eventLabels = summary_stats(df)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Create Variables\n", "-----\n", "\n", "
**💡 Create Variables** \n", "\n", "The following section will automatically create your model input variables and your fraud/legit labels for you. \n", "\n", "
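\n", "\n", "For orientation, each profiled feature ultimately becomes one variable registration; a single registration looks roughly like the sketch below (the helper in the next cell loops this over every feature; the feature name here is hypothetical):\n", "\n", "```python\n", "# sketch of one variable registration, as the helper below performs per feature\n", "try:\n", "    client.get_variables(name=\"order_amt\")  # hypothetical name; succeeds if it already exists\n", "except client.exceptions.ResourceNotFoundException:\n", "    client.create_variable(\n", "        name = \"order_amt\",\n", "        dataType = 'FLOAT',\n", "        dataSource = 'EVENT',\n", "        defaultValue = '0.0',\n", "        description = \"order_amt\",\n", "        variableType = 'NUMERIC')\n", "```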
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_stats.loc[(df_stats['feature_type'].isin(['IP_ADDRESS', 'EMAIL_ADDRESS']))]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# --- no changes just run this code block ---\n", "def create_label(df, FRAUD_LABEL):\n", " \"\"\"\n", " Returns a dictionary for the model labelSchema, by identifying the rare event as fraud / and common as not-fraud \n", " \n", " Arguments:\n", " df -- input dataframe \n", " FRAUD_LABEL -- the name of the field that contains fraud label \n", " \n", " Returns:\n", " labelSchema -- a dictionary containing labelKey & labelMapper \n", " \"\"\"\n", " label_summary = df[FRAUD_LABEL].value_counts()\n", " labelSchema = {'labelKey': FRAUD_LABEL,\n", " \"labelMapper\" : { \"FRAUD\": [str(label_summary.idxmin())], \n", " \"LEGIT\": [str(label_summary.idxmax())]}\n", " }\n", " client.put_label(\n", " name = str(label_summary.idxmin()),\n", " description = 'FRAUD')\n", " \n", " client.put_label(\n", " name = str(label_summary.idxmax()),\n", " description = 'LEGIT')\n", " return labelSchema\n", " \n", "# -- function to create all your variables --- \n", "def create_variables(df_stats, MODEL_NAME):\n", " \"\"\"\n", " Returns a variable list of model input variables, checks to see if variable exists,\n", " and, if not, then it adds the variable to Fraud Detector \n", " \n", " Arguments: \n", " enrichment_features -- dictionary of optional features, mapped to specific variable types enriched (CARD_BIN, USERAGENT)\n", " numeric_features -- optional list of numeric field names \n", " categorical_features -- optional list of categorical features \n", " \n", " Returns:\n", " variable_list -- a list of variable dictionaries \n", " \n", " \"\"\"\n", " enrichment_features = df_stats.loc[(df_stats['feature_type'].isin(['IP_ADDRESS', 'EMAIL_ADDRESS']))].to_dict(orient=\"record\")\n", " numeric_features = df_stats.loc[(df_stats['feature_type'].isin(['NUMERIC']))]['feature_name'].to_dict()\n", " categorical_features = df_stats.loc[(df_stats['feature_type'].isin(['CATEGORY']))]['feature_name'].to_dict()\n", " \n", " variable_list = []\n", " # -- first do the enrichment features\n", " for feature in enrichment_features: \n", " variable_list.append( {'name' : feature['feature_name']})\n", " try:\n", " resp = client.get_variables(name=feature['feature_name'])\n", " except:\n", " print(\"Creating variable: {0}\".format(feature['feature_name']))\n", " resp = client.create_variable(\n", " name = feature['feature_name'],\n", " dataType = 'STRING',\n", " dataSource ='EVENT',\n", " defaultValue = '', \n", " description = feature['feature_name'],\n", " variableType = feature['feature_type'] )\n", " \n", " \n", " # -- check and update the numeric features \n", " for feature in numeric_features: \n", " variable_list.append( {'name' : numeric_features[feature]})\n", " try:\n", " resp = client.get_variables(name=numeric_features[feature])\n", " except:\n", " print(\"Creating variable: {0}\".format(numeric_features[feature]))\n", " resp = client.create_variable(\n", " name = numeric_features[feature],\n", " dataType = 'FLOAT',\n", " dataSource ='EVENT',\n", " defaultValue = '0.0', \n", " description = numeric_features[feature],\n", " variableType = 'NUMERIC' )\n", " \n", " # -- check and update the categorical features \n", " for feature in categorical_features: \n", " variable_list.append( {'name' : categorical_features[feature]})\n", " try:\n", " resp = 
client.get_variables(name=categorical_features[feature])\n", " except:\n", " print(\"Creating variable: {0}\".format(categorical_features[feature]))\n", " resp = client.create_variable(\n", " name = categorical_features[feature],\n", " dataType = 'STRING',\n", " dataSource ='EVENT',\n", " defaultValue = '', \n", " description = categorical_features[feature],\n", " variableType = 'CATEGORICAL' )\n", " \n", " return variable_list\n", "\n", "\n", "model_variables = create_variables(df_stats, MODEL_NAME)\n", "print(\"\\n --- model variable dict --\")\n", "print(model_variables)\n", "\n", "\n", "model_label = create_label(df, \"EVENT_LABEL\")\n", "print(\"\\n --- model label schema dict --\")\n", "print(model_label)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Create Entity and Event Types\n", "-----\n", " \n", "
**💡 Entity and Event** \n", " \n", "The following code block will automatically create your entity and event types for you.\n", "\n", "
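\n", "\n", "Once the entity and event types have been created below, you can optionally confirm both registrations with the corresponding read calls (a quick sanity check, not required):\n", "\n", "```python\n", "# optional sanity check after the put_* calls below have run\n", "print(client.get_entity_types(name=ENTITY_TYPE)['entityTypes'])\n", "print(client.get_event_types(name=EVENT_TYPE)['eventTypes'])\n", "```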
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "eventLabels" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# --- no changes just run this code block ---\n", "response = client.put_entity_type(\n", " name = ENTITY_TYPE,\n", " description = ENTITY_DESC\n", ")\n", "print(\"-- create entity --\")\n", "print(response)\n", "\n", "\n", "response = client.put_event_type (\n", " name = EVENT_TYPE,\n", " eventVariables = eventVariables,\n", " labels = eventLabels,\n", " entityTypes = [ENTITY_TYPE])\n", "print(\"-- create event type --\")\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Create & Train your Model\n", "-----\n", " \n", "
**💡 Train Model** \n", "\n", "The following section will automatically train and activate your model for you. \n", "\n", "
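\n", "\n", "Model training can take a while (often tens of minutes), so the cell below polls until it finishes. If your notebook disconnects in the meantime, you can re-check the status at any point with a one-off call like this sketch:\n", "\n", "```python\n", "# one-off status check; safe to re-run after a kernel restart\n", "resp = client.get_model_version(\n", "    modelId = MODEL_NAME,\n", "    modelType = 'ONLINE_FRAUD_INSIGHTS',\n", "    modelVersionNumber = '1.0')\n", "print(resp['status'])\n", "```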
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# --- no changes; just run this code block. ---\n", "\n", "# -- create our model --\n", "response = client.create_model(\n", " description = MODEL_DESC,\n", " eventTypeName = EVENT_TYPE,\n", " modelId = MODEL_NAME,\n", " modelType = 'ONLINE_FRAUD_INSIGHTS')\n", "\n", "print(\"-- initalize model --\")\n", "print(response)\n", "# -- initializes the model, it's now ready to train -- \n", "response = client.create_model_version(\n", " modelId = MODEL_NAME,\n", " modelType = 'ONLINE_FRAUD_INSIGHTS',\n", " trainingDataSource = 'EXTERNAL_EVENTS',\n", " trainingDataSchema = trainingDataSchema,\n", " externalEventsDetail = {\n", " 'dataLocation' : S3_FILE_LOC,\n", " 'dataAccessRoleArn': ARN_ROLE\n", " }\n", ")\n", "print(\"-- model training --\")\n", "print(response)\n", "\n", "# -- model training takes time, we'll loop until it's complete -- \n", "print(\"-- wait for model training to complete --\")\n", "stime = time.time()\n", "while True:\n", " print(\"In progress\")\n", " clear_output(wait=True)\n", " response = client.get_model_version(modelId=MODEL_NAME, modelType = \"ONLINE_FRAUD_INSIGHTS\", modelVersionNumber = '1.0')\n", " if response['status'] == 'TRAINING_IN_PROGRESS':\n", " print(f\"current progress: {(time.time() - stime)/60:{3}.{3}} minutes\")\n", " time.sleep(60) # -- sleep for 60 seconds \n", " if response['status'] != 'TRAINING_IN_PROGRESS':\n", " print(\"Model status : \" + response['status'])\n", " break\n", " \n", "etime = time.time()\n", "\n", "# -- summarize -- \n", "print(\"\\n --- model training complete --\")\n", "print(\"Elapsed time : %s\" % (etime - stime) + \" seconds \\n\" )\n", "print(response)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = client.update_model_version_status (\n", " modelId = MODEL_NAME,\n", " modelType = 'ONLINE_FRAUD_INSIGHTS',\n", " modelVersionNumber = '1.0',\n", " status = 'ACTIVE'\n", ")\n", "print(\"-- activating model --\")\n", "print(response)\n", "\n", "#-- wait until model is active \n", "print(\"--- waiting until model status is active \")\n", "stime = time.time()\n", "while True:\n", " clear_output(wait=True)\n", " response = client.get_model_version(\n", " modelId=MODEL_NAME,\n", " modelType=\"ONLINE_FRAUD_INSIGHTS\",\n", " modelVersionNumber='1.0')\n", " if response['status'] != 'ACTIVE':\n", " print(f\"current progress: {(time.time() - stime)/60:{3}.{3}} minutes\")\n", " time.sleep(60) # sleep for 1 minute \n", " if response['status'] == 'ACTIVE':\n", " print(\"Model status : \" + response['status'])\n", " break\n", " \n", "etime = time.time()\n", "print(\"Elapsed time : %s\" % (etime - stime) + \" seconds \\n\" )\n", "print(response)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- model performance summary -- \n", "auc = client.describe_model_versions(\n", " modelId= MODEL_NAME,\n", " modelVersionNumber='1.0',\n", " modelType='ONLINE_FRAUD_INSIGHTS',\n", " maxResults=10\n", ")['modelVersionDetails'][0]['trainingResult']['trainingMetrics']['auc']\n", "\n", "\n", "df_model = pd.DataFrame(client.describe_model_versions(\n", " modelId= MODEL_NAME,\n", " modelVersionNumber='1.0',\n", " modelType='ONLINE_FRAUD_INSIGHTS',\n", " maxResults=10\n", ")['modelVersionDetails'][0]['trainingResult']['trainingMetrics']['metricDataPoints'])\n", "\n", "\n", "plt.figure(figsize=(10,10))\n", "plt.plot(df_model[\"fpr\"], df_model[\"tpr\"], 
color='darkorange',\n", " lw=2, label='ROC curve (area = %0.3f)' % auc)\n", "plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')\n", "plt.xlabel('False Positive Rate')\n", "plt.ylabel('True Positive Rate')\n", "plt.title( MODEL_NAME + ' ROC Chart')\n", "plt.legend(loc=\"lower right\",fontsize=12)\n", "plt.axvline(x = 0.02 ,linewidth=2, color='r')\n", "plt.axhline(y = 0.73 ,linewidth=2, color='r')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6. Create Detector, generate Rules and assemble your Detector\n", "\n", "-----\n", " \n", "
**💡 Generate Rules, Create and Publish a Detector** \n", " \n", "The following section will automatically generate a number of fraud, investigate, and approve rules based on the false positive rate and score thresholds of your model. These are just example rules; it is recommended that you fine-tune your rules to your specific business use case.\n", "
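\n", "\n", "Rules are plain-text expressions written in Fraud Detector's rule language (DETECTORPL) against the model's insight score; the cells below generate strings of the following form, where the threshold values shown here are only illustrative:\n", "\n", "```python\n", "# illustrative rule expressions -- the actual thresholds come from your model's metrics\n", "example_fraud_rule   = \"${0}_insightscore > 900\".format(MODEL_NAME)   # low-FPR region -> 'fraud'\n", "example_approve_rule = \"${0}_insightscore <= 855\".format(MODEL_NAME)  # low-score region -> 'approve'\n", "```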
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- initialize your detector -- \n", "response = client.put_detector(detectorId = DETECTOR_NAME, \n", " description = DETECTOR_DESC, \n", " eventTypeName = EVENT_TYPE )\n", "\n", "print(response)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- make rules -- \n", "model_stat = df_model.round(decimals=2) \n", "\n", "m = model_stat.loc[model_stat.groupby([\"fpr\"])[\"threshold\"].idxmax()] \n", "\n", "def make_rule(x):\n", " rule = \"\"\n", " if x['fpr'] <= 0.05: \n", " rule = \"${0}_insightscore > {1}\".format(MODEL_NAME,x['threshold'])\n", " if x['fpr'] == 0.06:\n", " rule = \"${0}_insightscore <= {1}\".format(MODEL_NAME,x['threshold_prev'])\n", " return rule\n", " \n", "m[\"threshold_prev\"] = m['threshold'].shift(1)\n", "m['rule'] = m.apply(lambda x: make_rule(x), axis=1)\n", "\n", "m['outcome'] = \"approve\"\n", "m.loc[m['fpr'] <= 0.03, \"outcome\"] = \"fraud\"\n", "m.loc[(m['fpr'] > 0.03) & (m['fpr'] <= 0.05), \"outcome\"] = \"investigate\"\n", "\n", "print (\" --- score thresholds 1% to 6% --- \")\n", "print(m[[\"fpr\", \"tpr\", \"threshold\", \"rule\", \"outcome\"]].loc[(m['fpr'] > 0.0 ) & (m['fpr'] <= 0.06)].reset_index(drop=True))\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- create outcomes -- \n", "def create_outcomes(outcomes):\n", " \"\"\" create Fraud Detector Outcomes \n", " \n", " \"\"\" \n", " for outcome in outcomes:\n", " print(\"creating outcome variable: {0} \".format(outcome))\n", " response = client.put_outcome(\n", " name=outcome,\n", " description=outcome)\n", "\n", "# -- get distinct outcomes \n", "outcomes = m[\"outcome\"].unique().tolist()\n", "\n", "create_outcomes(outcomes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rule_set = m[(m[\"fpr\"] > 0.0) & (m[\"fpr\"] <= 0.06)][[\"outcome\", \"rule\"]].to_dict('records')\n", "rule_list = []\n", "for i, rule in enumerate(rule_set):\n", " ruleId = \"rule{0}_{1}\".format(i, MODEL_NAME)\n", " rule_list.append({\"ruleId\": ruleId, \n", " \"ruleVersion\" : '1',\n", " \"detectorId\" : DETECTOR_NAME\n", " \n", " })\n", " print(\"creating rule: {0}: IF {1} THEN {2}\".format(ruleId, rule[\"rule\"], rule['outcome']))\n", " try:\n", " response = client.create_rule(\n", " ruleId = ruleId,\n", " detectorId = DETECTOR_NAME,\n", " expression = rule['rule'],\n", " language = 'DETECTORPL',\n", " outcomes = [rule['outcome']]\n", " )\n", " except:\n", " print(\"this rule already exists in this detector\")\n", "rule_list " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client.create_detector_version(\n", " detectorId = DETECTOR_NAME,\n", " rules = rule_list,\n", " modelVersions = [{\"modelId\": MODEL_NAME, \n", " \"modelType\" : \"ONLINE_FRAUD_INSIGHTS\",\n", " \"modelVersionNumber\" : \"1.0\"}],\n", " ruleExecutionMode = 'FIRST_MATCHED'\n", " )\n", "\n", "print(\"\\n -- detector created -- \")\n", "print(response) \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = client.update_detector_version_status(\n", " detectorId= DETECTOR_NAME,\n", " detectorVersionId='1',\n", " status='ACTIVE'\n", ")\n", "print(\"\\n -- detector activated -- \")\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7. Make Predictions \n", "-----\n", " \n", "
**💡 Make Predictions** \n", " \n", "The following section will apply your detector to the first 10 records in your training dataset. To apply it to more records, simply change record_count; alternatively, you can score the full training dataset with the following: \n", "\n", "
\n", "\n", "```python\n", "\n", "record_count = df.shape()[0]\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "record_count = 10 \n", "tmp = df[eventVariables].head(record_count).astype(str).to_dict(orient='records')\n", "tmp[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.groupby(\"EVENT_LABEL\").count()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- this will apply your detector to the first 10 records of your trainig dataset. -- \n", "predicted_dat = []\n", "pred_data = df[eventVariables].head(record_count).astype(str).to_dict(orient='records')\n", "timestampStr = datetime.now().strftime(\"%Y-%m-%dT%H:%M:%SZ\")\n", "for rec in pred_data:\n", " eventId = uuid.uuid1()\n", " pred = client.get_event_prediction(detectorId=DETECTOR_NAME, \n", " detectorVersionId='1',\n", " eventId = str(eventId),\n", " eventTypeName = EVENT_TYPE,\n", " eventTimestamp = timestampStr, \n", " entities = [{'entityType': ENTITY_TYPE, 'entityId':str(eventId.int)}],\n", " eventVariables=rec) \n", " \n", " rec[\"score\"] = pred['modelScores'][0]['scores'][\"{0}_insightscore\".format(MODEL_NAME)]\n", " rec[\"outcome\"] = pred['ruleResults'][0]['outcomes']\n", " predicted_dat.append(rec)\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- review your predictons -- \n", "predictions = pd.DataFrame(predicted_dat)\n", "predictions.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Optionally Write Predictions to File\n", "\n", "
**💡 Write Predictions** \n", "\n", "- You can write your prediction dataset to a CSV or Excel file to manually review predictions\n", "- Simply add a cell below and copy in the following code\n", "\n", "
\n", "\n", "\n", "\n", "```python\n", "\n", "# -- optionally write predictions to a CSV file -- \n", "predictions.to_csv(MODEL_NAME + \".csv\", index=False)\n", "# -- or to a XLS file \n", "predictions.to_excel(MODEL_NAME + \".xlsx\", index=False)\n", "\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }