{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part 2: Setting an Amazon Fraud Detector model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment and install s3fs, this is required to read CSV files from S3 directly into Pandas dataframe\n",
"# Once installed, please restart the Notebook Kernel (Kernel > Restart Kernel) before proceeding\n",
"\n",
"#%pip install s3fs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview \n",
"\n",
"* [Notebook 1: Data Preparation, Process, and Store Features](./1-data-analysis-prep.ipynb)\n",
"* **[Notebook 2: Amazon Fraud Detector Model Setup](./2-afd-model-setup.ipynb)**\n",
" * **[Introduction](#intro)**\n",
" * **[Setup Notebook](#setup)**\n",
" * **[Set AFD Entity type, event type, and Detector names](#entity)**\n",
" * **[Profile Your Dataset](#profile)**\n",
" * **[Create Labels, Variables, Entity and Event Types](#labels)**\n",
" * **[Conclusion](#conclusion)**\n",
"* [Notebook 3: Model training, deployment, real-time and batch inference](./3-afd-model-train-deploy.ipynb)\n",
"* [Notebook 4: Create an end-to-end pipeline](./4-afd-pipeline.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Introduction \n",
"___\n",
"overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Amazon Fraud Detector is a fully managed service that makes it easy to identify potentially fraudulent online activities such as online payment fraud and the creation of fake accounts. Fraud Detector capitalizes on the latest advances in machine learning (ML) and 20 years of fraud detection expertise from AWS and Amazon.com to automatically identify potentially fraudulent activity so you can catch more fraud faster.\n",
"\n",
"In this notebook, we'll use the Amazon Fraud Detector API to define an entity and event of interest and use CSV data stored in S3 to train a model. Next, we'll derive some rules and create a \"detector\" by combining our entity, event, model, and rules into a single endpoint. Finally, we'll apply the detector to a sample of our data to identify potentially fraudulent events.\n",
"\n",
"After running this notebook you should be able to:\n",
"\n",
"* Define an Entity and Event\n",
"* Create a Detector\n",
"* Train a Machine Learning (ML) Model\n",
"* Author Rules to identify potential fraud based on the model's score\n",
"* Apply the Detector's \"predict\" function, to generate a model score and rule outcomes on data\n",
"\n",
"If you would like to know more, please check out [Fraud Detector's Documentation](https://docs.aws.amazon.com/frauddetector/latest/ug/what-is-frauddetector.html).\n",
"\n",
"To create models within Amazon Fraud Detector, you must provide data for training. This data has input features (defined by variables) and output labels (defined by labels in the Amazon Fraud Detector service). Additionally, you define events based on the type of entities sending the data for predictions. The following diagram shows the sequence of component creation followed in this tutorial.\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### IAM Permissions\n",
"---\n",
"\n",
"To use Amazon Fraud Detector, you have to set up permissions that allow access to the Amazon Fraud Detector console and API operations. You also have to allow Amazon Fraud Detector to perform tasks on your behalf and to access resources that you own. \n",
"\n",
"The following policies provide the required permission to use Amazon Fraud Detector:\n",
"\n",
"* `AmazonFraudDetectorFullAccessPolicy`\n",
" Allows you to perform the following actions:\n",
" - Access all Amazon Fraud Detector resources \n",
" - List and describe all model endpoints in Amazon SageMaker \n",
" - List all IAM roles in the account \n",
" - List all Amazon S3 buckets \n",
" - Allow IAM Pass Role to pass a role to Amazon Fraud Detector \n",
"\n",
"\n",
"* `AmazonS3FullAccess`\n",
" Allows full access to Amazon S3. This is required to upload training files to S3.\n",
"\n",
"In this case we will assign `AmazonFraudDetectorFullAccessPolicy` and `AmazonS3FullAccess` policies to the SageMaker Execution Role."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plan\n",
"\n",
"#### Plan a Fraud Detector\n",
"---\n",
"\n",
"A Detector contains the event, model(s) and rule(s) detection logic for a particular type of fraud that you want to detect. We'll use the following 7 step process to plan a Fraud Detector:\n",
"\n",
"\n",
"\n",
"* Setup your notebook\n",
" - Name the major components `entity`, `entity type`, `model`, `detector`\n",
" - Get IAM role ARN\n",
" - S3 Bucket with your training data CSV File\n",
"* Read and Profile your Data\n",
" - This will give you an idea of what your dataset contains\n",
" - This will also identify the variables and labels that will need to be created to define your event\n",
"* Create event variables and labels\n",
" - This will create the variables and labels in fraud detector\n",
"* Define your Entity and Event Type\n",
" - What is the activity that you are detecting? That's likely your Event Type (e.g., account_registration)\n",
" - Who is performing this activity? That's likely your Entity (e.g., customer)\n",
"* Create and Train your Model\n",
" - Model training takes anywhere from 45-60 minutes\n",
" - Promote your model once training is complete\n",
"* Create Detector, generate Rules and assemble your Detector\n",
" - Create your detector\n",
" - Create rules based on your model scores\n",
" - Define outcomes (e.g., fraud, investigate and approve)\n",
" - Assemble your detector by adding your model and rules to it\n",
"* Test your Detector\n",
" - Interactively call predict on a handful of records\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Setup your Notebook \n",
"---\n",
"overview\n",
"\n",
"1. Name the major components of Fraud Detector\n",
"2. Get IAM role ARN \n",
"3. S3 Bucket with your training data CSV File\n",
"\n",
"Then you can interactively exeucte the code cells in the notebook, no need to change anything unless you want to. \n",
"\n",
"\n",
"
\n", " | feature_name | \n", "dtype | \n", "count | \n", "nunique | \n", "null | \n", "not_null | \n", "null_pct | \n", "nunique_pct | \n", "feature_type | \n", "feature_warning | \n", "
---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "EVENT_TIMESTAMP | \n", "object | \n", "100133 | \n", "90634 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.9051 | \n", "EVENT_TIMESTAMP | \n", "NO WARNING | \n", "
1 | \n", "EVENT_LABEL | \n", "object | \n", "100133 | \n", "2 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.0000 | \n", "TARGET | \n", "NO WARNING | \n", "
2 | \n", "ip_address | \n", "object | \n", "100133 | \n", "3801 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.0380 | \n", "IP_ADDRESS | \n", "NO WARNING | \n", "
3 | \n", "email_address | \n", "object | \n", "100133 | \n", "3296 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.0329 | \n", "EMAIL_ADDRESS | \n", "NO WARNING | \n", "
4 | \n", "user_agent | \n", "object | \n", "100133 | \n", "2867 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.0286 | \n", "CATEGORY | \n", "NO WARNING | \n", "
5 | \n", "customer_name | \n", "object | \n", "100133 | \n", "71178 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.7108 | \n", "CATEGORY | \n", "NO WARNING | \n", "
6 | \n", "phone_number | \n", "object | \n", "100133 | \n", "99371 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.9924 | \n", "CATEGORY | \n", "EXCLUDE, GT 90% UNIQUE | \n", "
7 | \n", "customer_city | \n", "object | \n", "100133 | \n", "3430 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.0343 | \n", "CATEGORY | \n", "NO WARNING | \n", "
8 | \n", "customer_postal | \n", "float64 | \n", "100133 | \n", "1993 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.0199 | \n", "NUMERIC | \n", "NO WARNING | \n", "
9 | \n", "customer_state | \n", "object | \n", "100133 | \n", "51 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.0005 | \n", "CATEGORY | \n", "NO WARNING | \n", "
10 | \n", "customer_address | \n", "object | \n", "100133 | \n", "99985 | \n", "0 | \n", "100133 | \n", "0.0 | \n", "0.9985 | \n", "CATEGORY | \n", "EXCLUDE, GT 90% UNIQUE | \n", "
These are the available features in the data set for the AFD model training
" ], "text/plain": [ "We have two types of events - Fraud events and legitimate events
" ], "text/plain": [ "Training data schema is required for creating and training the model. Refer to documentation
" ], "text/plain": [ "