# Fraud Detector SDK User Guide ## Data Preparation #### Event Timestamp #### The Amazon Fraud Detector service requires that a column called `EVENT_TIMESTAMP` is included in the data-set. This is the timestamp when the event occurred. The timestamp must be in *ISO 8601 standard format in UTC* - for example `2021-12-22T03:20:11Z` (https://en.wikipedia.org/wiki/ISO_8601). #### Event Labels #### The Amazon Fraud Detector service requires that the outcome-label for training data is in a data-column labeled `EVENT_LABEL`. Data in this column must be of type *String*, not a `1`/`0` label - these should be represented as `fraud`/`legit`, or similar appropriate string label values. These label-strings should match the labels that are created in the Amazon Fraud Detector service's context (this can be achieved with the `FraudDetector().create_labels()` method) # Create Fraud Detector Resources # ## Instantiate a Fraud Detector instance **Class** `frauddetector.FraudDetector()` **Parameters** `entity_type` : name-label for the type of fraud - EG registration, credit_card, phone_call `event_type` : name-label for the type of event - EG user_registration, card_transaction, CDR `detector_name` : name-label for this fraud detector `model_name` : name-label for the model `model_version` : version-number for the model `model_type` : one of `ONLINE_FRAUD_INSIGHTS` or `TRANSACTION_FRAUD_INSIGHTS` ref: https://docs.aws.amazon.com/frauddetector/latest/ug/choosing-model-type.html `detector_version` : version-number for this detector (combining rules, model, outcomes model) ```python from frauddetector import frauddetector detector = frauddetector.FraudDetector( entity_type="registration", event_type="user-registration", detector_name="registration-detector", model_name="registration-model", model_version="1", model_type="ONLINE_FRAUD_INSIGHTS", detector_version="1" ) ``` ## Profiling data The data to be profiled should include the fraud-outcome Label identifier. This field needs to be named `EVENT_LABEL` as this naming convention is built into the product. Assuming the data you are going to load contains the columns `EVENT_TIMESTAMP` (the timestamps of your events) and `EVENT_LABEL` (your label "legit" or "fraud"), EG ![head_data](./images/Dataset-Head.png) then you can use the Profiler as follows: ```python import pandas as pd from frauddetector import profiler profiler = profiler.Profiler() df = pd.read_csv("example/training_data/registration_data_20K_minimum.csv.zip") data_schema, variables, labels = profiler.get_frauddetector_inputs(data=df) ``` View the variables structure generated by the Profiler: ```python [{'name': 'ip_address', 'variableType': 'IP_ADDRESS', 'dataType': 'STRING', 'defaultValue': 'unknown'}, {'name': 'email_address', 'variableType': 'EMAIL_ADDRESS', 'dataType': 'STRING', 'defaultValue': 'unknown'}] ``` View the labels structure generated by the Profiler: ```python [{'name': 'legit'}, {'name': 'fraud'}] ``` If the source data column names differ from the standard Fraud Detector timestamp and event-label names, for instance your timestamp is in a column called `dttm` and the event label in a column called `event` then run the profiler as follows: ```python data_schema, variables, labels = profiler.get_frauddetector_inputs( data=df, event_column='event', timestamp_column='dttm', filter_warnings=True) ``` Note: The `filter_warnings` flag filters data points out that produce a warning. By default it is set to `False`. The `Profiler` class currently filters for: * `CATEGORY` * `NUMERIC` * `IP_ADDRESS` * `EMAIL_ADDRESS` For other data types run manual pre-checks on the data. The Profiler will categorize these entries as *UNKNOWN*. The `get_summary_stats_table()` method summarizes the categories found in the source data: ```python summary_table = profiler.get_summary_stats_table(data=df) ``` ![summary_table](./images/Summary-Table.png) This method also has arguments `event_column` and `timestamp_columns`. ## Train a model First instantiate a Fraud Detector SDK instance (called `detector` in the example below) Next configure an AWS Role with appropriate privileges to run Amazon Fraud Detector and access the training data. For full access privileges, use a role that has the policy `AmazonFraudDetectorFullAccessPolicy` attached to it. ```python # https://docs.aws.amazon.com/frauddetector/latest/ug/security-iam.html role_arn="arn:aws:iam::9999999999:role/MyFraudDetectorRole" ``` Either manually define the `variables` and `labels` definitions or use the Profiler to extract these definitions from some sample data. Make sure the field containing the fraud-outcome to train the model against is named `EVENT_LABEL`. Then train a model using the `fit()` method: ```python detector.fit(data_schema=data_schema , data_location="s3:///training/registration_data_20K_minimum.csv" , role=role_arn , variables=variables , labels=labels) ``` ## Create a detector and activate it Provide a list of outcomes to create an active model associated with FraudDetector outcomes. Fraud Detector rules are associated with the outcomes. First, confirm the detector instance model has completed the training phase: ```python print(detector.model_status) ``` ``` TRAINING_COMPLETE ``` Define some outcomes: ```python outcomes = [("review_outcome", "Start a review process workflow"), ("verify_outcome", "Sideline event for review"), ("approve_outcome", "Approve the event")] ``` Activate the detector: ```python detector.activate(outcomes_list=outcomes) ``` Check the status of the detector: ```python print(detector.model_status) ``` ``` ACTIVE ``` ## Deploy a detector with rules Define some *rules* to map to the *outcomes* in an activated detector. The rule-boundry metrics can be determined by checking the model training metrics in the AWS console. See the following link for more information about defining rules: https://docs.aws.amazon.com/frauddetector/latest/ug/rule-language-reference.html ```python # this example is for applying rules to a model called registration_model rules = [{'ruleId': 'high_fraud_risk', 'expression': '$registration_model_insightscore > 900', 'outcomes': ['verify_outcome'] }, {'ruleId': 'low_fraud_risk', 'expression': '$registration_model_insightscore <= 900 and $registration_model_insightscore > 700', 'outcomes': ['review_outcome'] }, {'ruleId': 'no_fraud_risk', 'expression': '$registration_model_insightscore <= 700', 'outcomes': ['approve_outcome'] } ] # deploy the detector with rules detector.deploy(rules_list=rules) ``` ## Get predictions from a detector Use the `predict()` or `batch_predict()` methods to predict for a single event, passed in as a dictionary, or a batch of events passed in as a dataframe. ```python # define event variables to pass to the detector in a dictionary structure event_variables = { 'email_address' : 'johndoe@exampledomain.com', 'ip_address' : '1.2.3.4' } # pass the event to an active deployed detector with an event-timestamp in ISO 8601 format prediction = detector.predict('2021-11-13T12:18:21Z',event_variables) ``` The detector passes back the model score and the associated rule-outcome that this triggers, for example: ```python {'registration_model_insightscore': 861.0, 'ruleResults': [{'ruleId': 'low_fraud_risk', 'outcomes': ['review_outcome']}]} ``` # Delete Fraud Detector Resources # Before each delete step, instantiate the Fraud Detector SDK with the correct attributes to apply the resource deletion for. The order of operations is important because of the dependancies between resources. Instantiate SDK example: ```python from frauddetector import frauddetector detector = frauddetector.FraudDetector( entity_type="registration", event_type="user-registration", detector_name="registration-detector", model_name="registration-model", model_version="1.00", model_type="ONLINE_FRAUD_INSIGHTS", detector_version="1" ) ``` ## Delete a Model and Rules ## Before deleting rules for a detector, deactivate the model associated with it. ```python detector.set_model_version_inactive() ``` Wait until the model is inactive before deleting the rules: ```python if detector.model_status == 'INACTIVE' or detector.model_status == 'TRAINING_COMPLETE': print("** deleting detector rules **") detector.delete_rules(detector.rules) else: print("Wait until model is inactive") exit(0) ``` Delete the detector version: ```python detector.delete_detector_version() ``` Get all model versions: ```python models_json = detector.get_models() print(json.dumps(models_json, indent=2)) ``` Delete the model version: ```python detector.delete_model_version() ``` Delete the model: ```python detector.delete_model() ``` Delete the detector: ```python response = detector.delete_detector() print(response) ``` NOTE - sometimes it may be necessary to manually delete the detector in the console. ## Delete Variables and Event-Type ## ``` # get the variables to delete before deleting the event-type variables = detector.variables detector.delete_event_type() detector.delete_variables(variables) ``` ## Delete Labels and Outcomes ## ```python # Delete labels - not directly linked to detector-instance - they can be referenced and shared by multiple detectors label_names = [n['name'] for n in detector.labels['labels']] print(label_names) detector.delete_labels(label_names) # Delete Outcomes - not directly linked to detector-instance - they can be referenced and shared by multiple detectors outcome_names = [o[0] for o in detector.outcomes] print(outcome_names) detector.delete_outcomes(outcome_names) ```