Amazon Fraud Detector CSV Profile


Overview
s3://opensourcedatahao/synthetic_data.csv
Record count 116,715
Column count 23
Duplicate count 0
Memory size 20.48 MB
Record size 184.0 bytes
Date range 2020-11-01 to 2021-02-28
Day count 119 days
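The derived figures in this overview can be cross-checked with a few lines of Python. This is only a sketch: it assumes the day count is the difference between the first and last dates, and that the memory size uses 1 MB = 2**20 bytes. The report states neither, but both assumptions reproduce the numbers shown.

```python
from datetime import date

# Values copied from the overview above.
record_count = 116_715
record_size_bytes = 184.0
start, end = date(2020, 11, 1), date(2021, 2, 28)

# Assumption: "Day count" is the elapsed days between the two dates.
day_count = (end - start).days

# Assumption: "MB" means 2**20 bytes; the result is rounded to two decimals.
memory_mb = round(record_count * record_size_bytes / 2**20, 2)

print(day_count, memory_mb)
```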
Field Summary
  1. Inferred variable type: during event creation in Amazon Fraud Detector, you will need to map variables in your data to a list of predefined variable types. You should first try to select variable types to the best of your knowledge. If you cannot find the matching variable types or are unsure about the variable types of your variables, you can use the Inferred Variable Type column below as reference.
  2. Count: the number of non-missing values in this feature column (the total record count minus # Missing).
  3. # Distinct: the number of unique values in this feature column.
  4. % Distinct: # Distinct as a percentage of the total record count.
  5. # Missing: the number of missing values in this feature column.
  6. % Missing: # Missing as a percentage of the total record count.
Name               Data Type  Inferred Variable Type  Count   # Distinct  % Distinct  # Missing  % Missing
EVENT_LABEL        STRING     EVENT_LABEL             110331  3           0.00%       6384       5.47%
EVENT_TIMESTAMP    STRING     EVENT_TIMESTAMP         116715  115953      99.35%      0          0.00%
event_id           STRING     CATEGORY                116715  116715      100.00%     0          0.00%
entity_id          STRING     CATEGORY                116715  29995       25.70%      0          0.00%
card_bin           INTEGER    NUMERIC                 116715  23312       19.97%      0          0.00%
customer_name      STRING     CATEGORY                116715  689         0.59%       0          0.00%
address            STRING     TEXT                    116715  29995       25.70%      0          0.00%
billing_city       STRING     CATEGORY                116715  11777       10.09%      0          0.00%
billing_state      STRING     CATEGORY                116715  51          0.04%       0          0.00%
billing_zip        INTEGER    NUMERIC                 116715  19607       16.80%      0          0.00%
billing_latitude   FLOAT      NUMERIC                 116715  18302       15.68%      0          0.00%
billing_longitude  FLOAT      NUMERIC                 116715  18987       16.27%      0          0.00%
customer_job       STRING     CATEGORY                116715  639         0.55%       0          0.00%
ip_address         STRING     IP_ADDRESS              116715  29995       25.70%      0          0.00%
customer_email     STRING     EMAIL_ADDRESS           116715  29518       25.29%      0          0.00%
phone              STRING     CATEGORY                116715  29995       25.70%      0          0.00%
user_agent         STRING     TEXT                    116715  24178       20.72%      0          0.00%
product_category   STRING     CATEGORY                116715  14          0.01%       0          0.00%
order_price        FLOAT      NUMERIC                 116715  28614       24.52%      0          0.00%
payment_currency   STRING     CATEGORY                9       1           0.00%       116706     99.99%
merchant           STRING     CATEGORY                116715  693         0.59%       0          0.00%
time               STRING     DATETIME                116715  115953      99.35%      0          0.00%
_EMAIL_DOMAIN      STRING     CATEGORY                116715  7936        6.80%       0          0.00%
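The per-column statistics in the table above can be approximated with a short Python sketch. The function name `profile_column` and the rule that `None` and empty strings count as missing are assumptions for illustration, not the profiler's actual implementation. Percentages are taken against the total record count, which is what the payment_currency row (9 populated values, 0.00% distinct, 99.99% missing) implies.

```python
def profile_column(values):
    """Summarize one column. None and "" are treated as missing (an assumption)."""
    total = len(values)
    if total == 0:
        raise ValueError("column has no records")
    present = [v for v in values if v is not None and v != ""]
    n_distinct = len(set(present))
    n_missing = total - len(present)
    return {
        "count": len(present),                          # non-missing values
        "n_distinct": n_distinct,                       # unique non-missing values
        "pct_distinct": f"{n_distinct / total:.2%}",    # relative to all records
        "n_missing": n_missing,
        "pct_missing": f"{n_missing / total:.2%}",      # relative to all records
    }


# Toy column: three populated rows sharing one value, plus one missing row.
example = ["USD", "USD", "USD", None]
print(profile_column(example))
```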
Field Warnings
The following field may cause issues. Check the message and consider excluding it from model training.
Name               Data Type  Inferred Variable Type  Count   # Distinct  % Distinct  # Missing  % Missing  Message
payment_currency   STRING     CATEGORY                9       1           0.00%       116706     99.99%     ONLY 1 UNIQUE VALUE; >90% MISSING; EXCLUDE
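A rule of this kind can be sketched as follows. The function `field_warnings` is hypothetical, and the two checks (exactly one distinct value, more than 90% missing) are inferred from the message text above; the profiler's real rules are not shown in this report.

```python
def field_warnings(n_distinct, n_missing, total_records, max_missing=0.90):
    """Build a warning message for one field; returns "" when nothing is flagged."""
    messages = []
    if n_distinct == 1:
        messages.append("ONLY 1 UNIQUE VALUE")
    if n_missing / total_records > max_missing:
        messages.append(f">{max_missing:.0%} MISSING")
    if messages:
        # Any warning leads to an exclusion recommendation in this sketch.
        messages.append("EXCLUDE")
    return "; ".join(messages)


# payment_currency row from the table above, then a clean field for contrast.
print(field_warnings(1, 116706, 116715))
print(repr(field_warnings(14, 0, 116715)))
```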

Data & Label Maturity


Amazon Fraud Detector models require a minimum of 400 observations labeled as "fraud" and 400 observations labeled as "non-fraud". You can map multiple label values to "fraud" or "non-fraud". As part of the data gathering process, it is important to ensure that records have had sufficient time to "mature", i.e. that enough time has passed for "non-fraud" and "fraud" records to have been correctly identified.

Note: It can often take 30 to 45 days (or more) to correctly identify fraudulent events.
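One way to act on the maturity note is to keep only events old enough to have settled labels. The sketch below is illustrative: the helper names, the 45-day default, and the use of the last date in the profiled range as the extraction date are all assumptions, not part of Amazon Fraud Detector.

```python
from datetime import date, timedelta


def mature_cutoff(extract_date, maturity_days=45):
    """Latest event date considered mature, given an assumed maturity window."""
    return extract_date - timedelta(days=maturity_days)


def is_mature(event_date, extract_date, maturity_days=45):
    """True when the event is at least `maturity_days` older than the extract."""
    return event_date <= mature_cutoff(extract_date, maturity_days)


# Assumption: the data was extracted on the last date in the profiled range.
extract_date = date(2021, 2, 28)
print(mature_cutoff(extract_date))
print(is_mature(date(2020, 12, 1), extract_date))
print(is_mature(date(2021, 2, 1), extract_date))
```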
Label Warnings
Your EVENT_LABEL column contains 6384 missing values (5.47% of records). AFD requires that less than 1% of the values in the label column be missing. Consider assigning proper labels to those records or dropping them.
Label Summary
Label Value     Mapped Label Class  Count   Percentage
legit           NON-FRAUD           79299   67.94%
suspicious      FRAUD               22093   18.93%
fraud           FRAUD               8939    7.66%
Missing Labels  Undefined           6384    5.47%
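The label requirements described in this section can be checked against the summary counts with a small sketch. The counts and the label mapping come from the table above; the variable names and the structure of the check are illustrative assumptions.

```python
# Mapping and counts copied from the Label Summary table.
LABEL_MAP = {"legit": "NON-FRAUD", "suspicious": "FRAUD", "fraud": "FRAUD"}
label_counts = {"legit": 79299, "suspicious": 22093, "fraud": 8939}
missing_labels = 6384

# Aggregate raw label values into their mapped classes.
class_counts = {}
for label, n in label_counts.items():
    mapped = LABEL_MAP[label]
    class_counts[mapped] = class_counts.get(mapped, 0) + n

total_records = sum(label_counts.values()) + missing_labels
missing_rate = missing_labels / total_records

# Requirements stated in this section: at least 400 per class, under 1% missing.
enough_examples = all(class_counts.get(c, 0) >= 400 for c in ("FRAUD", "NON-FRAUD"))
labels_complete = missing_rate < 0.01

print(class_counts, f"{missing_rate:.2%}", enough_examples, labels_complete)
```

With these counts the class minimums are met, but the missing-label check fails, which matches the label warning above.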


Categorical Feature Analysis


Name               Type           Count   # Distinct  % Distinct  # Missing  % Missing
entity_id          CATEGORY       116715  29995       25.70%      0          0.00%
customer_name      CATEGORY       116715  689         0.59%       0          0.00%
address            TEXT           116715  29995       25.70%      0          0.00%
billing_city       CATEGORY       116715  11777       10.09%      0          0.00%
billing_state      CATEGORY       116715  51          0.04%       0          0.00%
customer_job       CATEGORY       116715  639         0.55%       0          0.00%
ip_address         IP_ADDRESS     116715  29995       25.70%      0          0.00%
customer_email     EMAIL_ADDRESS  116715  29518       25.29%      0          0.00%
phone              CATEGORY       116715  29995       25.70%      0          0.00%
user_agent         TEXT           116715  24178       20.72%      0          0.00%
product_category   CATEGORY       116715  14          0.01%       0          0.00%
payment_currency   CATEGORY       9       1           0.00%       116706     99.99%
merchant           CATEGORY       116715  693         0.59%       0          0.00%
_EMAIL_DOMAIN      CATEGORY       116715  7936        6.80%       0          0.00%

 


Numeric Feature Analysis



Name               Type      Count   # Distinct  % Distinct  # Missing  % Missing  Mean                                 Min                        Max
card_bin           NUMERIC   116715  23312       19.97%      0          0.00%      410215.81942338176                   180000.0                   676399.0
billing_zip        NUMERIC   116715  19607       16.80%      0          0.00%      49863.9313884248                     1002.0                     99929.0
billing_latitude   NUMERIC   116715  18302       15.68%      0          0.00%      38.96244405774751                    19.0668                    71.2346
billing_longitude  NUMERIC   116715  18987       16.27%      0          0.00%      -91.12338885575977                   -176.7874                  -67.0408
order_price        NUMERIC   116715  28614       24.52%      0          0.00%      114.5816346656385                    1.0                        18846.18
time               DATETIME  116715  115953      99.35%      0          0.00%      2020-12-27 16:01:40.446103552+00:00  2020-11-01 00:00:00+00:00  2021-02-28 00:00:00+00:00
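The Mean, Min, and Max columns can be reproduced for a numeric column with a minimal sketch like the one below. The function name `numeric_summary` and the choice to skip `None` values are assumptions for illustration; the report does not say how the profiler handles missing numerics.

```python
def numeric_summary(values):
    """Mean, min, and max over the non-missing values of one numeric column."""
    present = [float(v) for v in values if v is not None]
    if not present:
        raise ValueError("column has no numeric values")
    return {
        "mean": sum(present) / len(present),
        "min": min(present),
        "max": max(present),
    }


# Toy column with one missing value, which is skipped.
print(numeric_summary([1.0, 2.5, None, 4.0]))
```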

 

Feature and Label Correlations