Record count | 116,715 |
Column count | 23 |
Duplicate count | 0 |
Memory size | 20.48 MB |
Record size | 184.0 bytes |
Date range | 2020-11-01 to 2021-02-28 |
Day count | 119 days |
Name | Data Type | Inferred Variable Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|---|
EVENT_LABEL | STRING | EVENT_LABEL | 110331 | 3 | 0.00% | 6384 | 5.47% |
EVENT_TIMESTAMP | STRING | EVENT_TIMESTAMP | 116715 | 115953 | 99.35% | 0 | 0.00% |
event_id | STRING | CATEGORY | 116715 | 116715 | 100.00% | 0 | 0.00% |
entity_id | STRING | CATEGORY | 116715 | 29995 | 25.70% | 0 | 0.00% |
card_bin | INTEGER | NUMERIC | 116715 | 23312 | 19.97% | 0 | 0.00% |
customer_name | STRING | CATEGORY | 116715 | 689 | 0.59% | 0 | 0.00% |
address | STRING | TEXT | 116715 | 29995 | 25.70% | 0 | 0.00% |
billing_city | STRING | CATEGORY | 116715 | 11777 | 10.09% | 0 | 0.00% |
billing_state | STRING | CATEGORY | 116715 | 51 | 0.04% | 0 | 0.00% |
billing_zip | INTEGER | NUMERIC | 116715 | 19607 | 16.80% | 0 | 0.00% |
billing_latitude | FLOAT | NUMERIC | 116715 | 18302 | 15.68% | 0 | 0.00% |
billing_longitude | FLOAT | NUMERIC | 116715 | 18987 | 16.27% | 0 | 0.00% |
customer_job | STRING | CATEGORY | 116715 | 639 | 0.55% | 0 | 0.00% |
ip_address | STRING | IP_ADDRESS | 116715 | 29995 | 25.70% | 0 | 0.00% |
customer_email | STRING | EMAIL_ADDRESS | 116715 | 29518 | 25.29% | 0 | 0.00% |
phone | STRING | CATEGORY | 116715 | 29995 | 25.70% | 0 | 0.00% |
user_agent | STRING | TEXT | 116715 | 24178 | 20.72% | 0 | 0.00% |
product_category | STRING | CATEGORY | 116715 | 14 | 0.01% | 0 | 0.00% |
order_price | FLOAT | NUMERIC | 116715 | 28614 | 24.52% | 0 | 0.00% |
payment_currency | STRING | CATEGORY | 9 | 1 | 0.00% | 116706 | 99.99% |
merchant | STRING | CATEGORY | 116715 | 693 | 0.59% | 0 | 0.00% |
time | STRING | DATETIME | 116715 | 115953 | 99.35% | 0 | 0.00% |
_EMAIL_DOMAIN | STRING | CATEGORY | 116715 | 7936 | 6.80% | 0 | 0.00% |
Name | Data Type | Inferred Variable Type | Count | # Distinct | % Distinct | # Missing | % Missing | Message |
---|---|---|---|---|---|---|---|---|
payment_currency | STRING | CATEGORY | 9 | 1 | 0.00% | 116706 | 99.99% | ONLY 1 UNIQUE VALUE; >90% MISSING; EXCLUDE |
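
The warning message above can be reproduced from a column's profile statistics. The sketch below is a minimal illustration, assuming the rule is "flag a column with at most one distinct value or with more than 90% missing values"; that rule is inferred from the message text, and the function name `column_warnings` and its parameters are hypothetical, not part of the profiler.

```python
def column_warnings(count, n_distinct, n_missing, missing_threshold=0.90):
    """Build a warning string for one column from its profile statistics.

    count      -- number of non-missing values
    n_distinct -- number of distinct non-missing values
    n_missing  -- number of missing values
    The 0.90 threshold is an assumption read off the report's message.
    """
    total = count + n_missing
    messages = []
    if n_distinct <= 1:
        messages.append("ONLY 1 UNIQUE VALUE")
    if total > 0 and n_missing / total > missing_threshold:
        messages.append(f">{missing_threshold:.0%} MISSING")
    if messages:
        # Any warning leads to a recommendation to exclude the column.
        messages.append("EXCLUDE")
    return "; ".join(messages)


# payment_currency row from the table above
print(column_warnings(count=9, n_distinct=1, n_missing=116706))
# merchant row: no warnings, so the result is an empty string
print(repr(column_warnings(count=116715, n_distinct=693, n_missing=0)))
```

A column that returns an empty string here carries no warning under these assumed rules.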
Amazon Fraud Detector models require a minimum of 400 observations labeled as "fraud" and 400 observations labeled as "non-fraud". You can map multiple label values to "fraud" or "non-fraud". As part of the data gathering process, it is important to ensure that records have had sufficient time to "mature", i.e., that enough time has passed for "fraud" and "non-fraud" records to be correctly identified.
Note: It can often take 30 to 45 days (or more) to correctly identify fraudulent events.

Label Value | Mapped Label Class | Count | Percentage |
---|---|---|---|
legit | NON-FRAUD | 79299 | 67.94% |
suspicious | FRAUD | 22093 | 18.93% |
fraud | FRAUD | 8939 | 7.66% |
Missing Labels | Undefined | 6384 | 5.47% |
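
The label mapping and the 400-observation minimum can be checked with a few lines of code. The sketch below uses the counts from the table above; the names `LABEL_MAP`, `class_counts`, and `meets_minimum` are illustrative choices, not part of Amazon Fraud Detector.

```python
from collections import Counter

# Mapping of raw label values to label classes, as shown in the table.
LABEL_MAP = {"legit": "NON-FRAUD", "suspicious": "FRAUD", "fraud": "FRAUD"}
MIN_PER_CLASS = 400  # minimum observations per class stated in the text


def class_counts(label_counts, label_map=LABEL_MAP):
    """Sum raw label counts into mapped classes; unmapped labels go to 'Undefined'."""
    totals = Counter()
    for label, n in label_counts.items():
        totals[label_map.get(label, "Undefined")] += n
    return totals


def meets_minimum(totals, minimum=MIN_PER_CLASS):
    """True when both FRAUD and NON-FRAUD reach the minimum count."""
    return all(totals.get(cls, 0) >= minimum for cls in ("FRAUD", "NON-FRAUD"))


# Counts from the label table; None stands for missing labels.
raw = {"legit": 79299, "suspicious": 22093, "fraud": 8939, None: 6384}
totals = class_counts(raw)
print(dict(totals))
print(meets_minimum(totals))
```

Because "suspicious" and "fraud" both map to FRAUD, the two counts are added before the minimum is checked.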
Hints on the interactive graphs:
1. You can zoom in and zoom out by scrolling the mouse wheel over plots.
2. You can drag the plots leftwards and rightwards to change x-axis ranges.
3. You can toggle the legend to show or hide the corresponding bars or curves.

The plot shows the label distribution across categories for a categorical feature. Several sorting options are available; choose the one that best fits your needs.
In each sorting option, the top 100 categories are displayed by default, and you can drag the plot and scroll the mouse wheel to see up to 500 categories in total.
You can choose which data to plot from the data showing options button. You can also toggle the legend to show or hide the corresponding bars or curves.

Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
entity_id | CATEGORY | 116715 | 29995 | 25.70% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
customer_name | CATEGORY | 116715 | 689 | 0.59% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
address | TEXT | 116715 | 29995 | 25.70% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
billing_city | CATEGORY | 116715 | 11777 | 10.09% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
billing_state | CATEGORY | 116715 | 51 | 0.04% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
customer_job | CATEGORY | 116715 | 639 | 0.55% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
ip_address | IP_ADDRESS | 116715 | 29995 | 25.70% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
customer_email | EMAIL_ADDRESS | 116715 | 29518 | 25.29% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
phone | CATEGORY | 116715 | 29995 | 25.70% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
user_agent | TEXT | 116715 | 24178 | 20.72% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
product_category | CATEGORY | 116715 | 14 | 0.01% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
payment_currency | CATEGORY | 9 | 1 | 0.00% | 116706 | 99.99% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
merchant | CATEGORY | 116715 | 693 | 0.59% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing |
---|---|---|---|---|---|---|
_EMAIL_DOMAIN | CATEGORY | 116715 | 7936 | 6.80% | 0 | 0.00% |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing | Mean | Min | Max |
---|---|---|---|---|---|---|---|---|---|
card_bin | NUMERIC | 116715 | 23312 | 19.97% | 0 | 0.00% | 410215.81942338176 | 180000.0 | 676399.0 |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing | Mean | Min | Max |
---|---|---|---|---|---|---|---|---|---|
billing_zip | NUMERIC | 116715 | 19607 | 16.80% | 0 | 0.00% | 49863.9313884248 | 1002.0 | 99929.0 |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing | Mean | Min | Max |
---|---|---|---|---|---|---|---|---|---|
billing_latitude | NUMERIC | 116715 | 18302 | 15.68% | 0 | 0.00% | 38.96244405774751 | 19.0668 | 71.2346 |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing | Mean | Min | Max |
---|---|---|---|---|---|---|---|---|---|
billing_longitude | NUMERIC | 116715 | 18987 | 16.27% | 0 | 0.00% | -91.12338885575977 | -176.7874 | -67.0408 |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing | Mean | Min | Max |
---|---|---|---|---|---|---|---|---|---|
order_price | NUMERIC | 116715 | 28614 | 24.52% | 0 | 0.00% | 114.5816346656385 | 1.0 | 18846.18 |
 
Name | Type | Count | # Distinct | % Distinct | # Missing | % Missing | Mean | Min | Max |
---|---|---|---|---|---|---|---|---|---|
time | DATETIME | 116715 | 115953 | 99.35% | 0 | 0.00% | 2020-12-27 16:01:40.446103552+00:00 | 2020-11-01 00:00:00+00:00 | 2021-02-28 00:00:00+00:00 |
 
The plot shows the pairwise correlation among the features and the label. The label is treated as a categorical variable. For two numerical features, the correlation is the absolute value of the Pearson correlation. For a numerical feature and a categorical feature, the correlation ratio is used, which can capture curvilinear relationships. For two categorical features, Cramér's V is used, which is based on Pearson's chi-squared statistic.
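
The three measures named above can be sketched in plain Python. This is a minimal illustration of the textbook definitions, not the profiler's own implementation, which may differ in details such as bias correction or missing-value handling. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def abs_pearson(x, y):
    """Absolute Pearson correlation between two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return abs(cov / math.sqrt(vx * vy))


def correlation_ratio(categories, values):
    """Correlation ratio (eta): sqrt of between-group over total sum of squares."""
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return math.sqrt(ss_between / ss_total)


def cramers_v(a, b):
    """Cramér's V from Pearson's chi-squared statistic (no bias correction)."""
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    k = min(len(ca), len(cb)) - 1
    if k == 0:
        return 0.0  # one of the variables has a single category
    chi2 = 0.0
    for x in ca:
        for y in cb:
            expected = ca[x] * cb[y] / n
            chi2 += (joint.get((x, y), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))
```

All three functions return a value between 0 and 1, which is what allows numeric and categorical pairs to share one plot.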
After training an Amazon Fraud Detector (AFD) model, you will get the feature importance distribution. You can combine that with the feature correlation to identify potential label leakage. For example, if a feature has a correlation above 0.99 with the label and a significantly higher feature importance than the other features, there is a risk of label leakage on that feature.
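
The leakage check described above can be sketched as a simple filter. The text does not define "significantly higher", so the sketch assumes it means "more than five times the median importance of the other features"; that ratio, the function name `leakage_suspects`, and the feature names and numbers in the example are all hypothetical and do not come from this report.

```python
from statistics import median


def leakage_suspects(label_corr, importance, corr_threshold=0.99, importance_ratio=5.0):
    """Return features whose label correlation and importance both look suspicious.

    label_corr       -- {feature: correlation with the label, in [0, 1]}
    importance       -- {feature: model feature importance}
    importance_ratio -- assumed meaning of "significantly higher": the feature's
                        importance exceeds this multiple of the median of the others
    """
    suspects = []
    for name, corr in label_corr.items():
        others = [v for k, v in importance.items() if k != name]
        if name not in importance or not others:
            continue
        baseline = median(others)
        if corr > corr_threshold and importance[name] > importance_ratio * baseline:
            suspects.append(name)
    return suspects


# Made-up values for illustration only.
example_corr = {"feature_a": 0.995, "feature_b": 0.40, "feature_c": 0.10}
example_importance = {"feature_a": 0.80, "feature_b": 0.12, "feature_c": 0.08}
print(leakage_suspects(example_corr, example_importance))
```

A flagged feature is only a candidate for review; whether it actually leaks the label depends on how and when the feature was recorded.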