{ "cells": [ { "cell_type": "markdown", "id": "3c946127", "metadata": {}, "source": [ "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n", "\n", "SPDX-License-Identifier: Apache-2.0" ] }, { "cell_type": "markdown", "id": "88a0136a", "metadata": {}, "source": [ "# Train time series anomaly detection model (NAB)\n", "\n", "\n", "## Table of Contents\n", "1. Specify data folder containing individual CSVs\n", "2. Specify location containing label JSON\n", "3. Train Context OSE Model from NAB library\n", "4. Perform inference on test set\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "be3cfa13", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "id": "d3f0264d", "metadata": {}, "outputs": [], "source": [ "from matplotlib import pyplot as plt\n", "import matplotlib\n", "font = {'family' : 'normal', 'size' : 18}\n", "matplotlib.rc('font', **font)\n", "\n", "import matplotlib.cm as cm\n", "plt.rcParams[\"figure.figsize\"] = (20,12)" ] }, { "cell_type": "code", "execution_count": null, "id": "65095098", "metadata": {}, "outputs": [], "source": [ "import sys\n", "\n", "sys.path.append('../../src/')\n", "\n", "from anomaly_detection_spatial_temporal_data.model.time_series import NABAnomalyDetector" ] }, { "cell_type": "markdown", "id": "cf903d6a", "metadata": {}, "source": [ "## Load in one example time series data\n", "\n", "The models within the NAB use a window of historical context to predict if a future time step is an anomaly. Therefore, one time series is sufficient to demonstrate its usage." ] }, { "cell_type": "code", "execution_count": null, "id": "25f226da", "metadata": {}, "outputs": [], "source": [ "#[\"'C1001065306'\" \"'es_health'\"] is a good example \n", "example_c = \"\"\"'C1001065306'\"\"\"\n", "example_m = \"\"\"'es_health'\"\"\"" ] }, { "cell_type": "code", "execution_count": null, "id": "59652e66", "metadata": {}, "outputs": [], "source": [ "example_ts_file_path = f\"\"\"../../data/02_intermediate/financial_fraud/ts_data/{example_c}_{example_m}_transaction_data.csv\"\"\"\n", "\n", "ts_example_data = pd.read_csv(example_ts_file_path)\n", "\n", "ts_example_data.head(10)" ] }, { "cell_type": "code", "execution_count": null, "id": "7d1634f5", "metadata": {}, "outputs": [], "source": [ "example_ts_label_file_path = f\"\"\"../../data/02_intermediate/financial_fraud/ts_label/{example_c}_{example_m}_transaction_label.csv\"\"\"\n", "\n", "ts_example_label = pd.read_csv(example_ts_label_file_path)\n", "\n", "ts_example_label.head(10)" ] }, { "cell_type": "code", "execution_count": null, "id": "68e6bf35", "metadata": {}, "outputs": [], "source": [ "ts_example_data.shape, ts_example_label.shape" ] }, { "cell_type": "markdown", "id": "fcc8cd8e", "metadata": {}, "source": [ "### Plot time series with anomaly" ] }, { "cell_type": "code", "execution_count": null, "id": "342412a1", "metadata": {}, "outputs": [], "source": [ "plt.plot(ts_example_data.timestamp, ts_example_data.value, label='amount')\n", "label_pos =np.where(ts_example_label.label==1)\n", "plt.scatter(ts_example_data.iloc[label_pos].timestamp, ts_example_data.iloc[label_pos].value, label='fraud', color='red')\n", "plt.xlabel('Timestep')\n", "plt.ylabel('Amount')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "11768cf6", "metadata": {}, "source": [ "## Run NAB model training and inference " ] }, { "cell_type": "code", "execution_count": null, "id": "f36cb70a", "metadata": {}, "outputs": [], "source": [ "model_name = \"contextOSE\"\n", "model_path = \"../../src/anomaly_detection_spatial_temporal_data/model/NAB\"\n", "input_dir = \"../../data/02_intermediate/financial_fraud/ts_data\"\n", "output_dir = \"../../data/07_model_output/financial_fraud/ts_result\"\n", "label_dict_path = \"../../data/02_intermediate/financial_fraud/ts_label/labels-combined.json\"" ] }, { "cell_type": "code", "execution_count": null, "id": "576ae5e9", "metadata": {}, "outputs": [], "source": [ "model_obj = NABAnomalyDetector(\n", " model_name, \n", " model_path,\n", " input_dir,\n", " label_dict_path,\n", " output_dir,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "52bb7e22", "metadata": {}, "outputs": [], "source": [ "model_obj.predict()" ] }, { "cell_type": "markdown", "id": "fd14782b", "metadata": {}, "source": [ "## Load inference result " ] }, { "cell_type": "code", "execution_count": null, "id": "65be38b2", "metadata": {}, "outputs": [], "source": [ "output_dir= f'../../data/07_model_output/financial_fraud/ts_result/{model_name}'" ] }, { "cell_type": "code", "execution_count": null, "id": "980c993c", "metadata": {}, "outputs": [], "source": [ "example_result_file_path = os.path.join(\n", " output_dir, \n", " f\"\"\"{model_name}_{example_c}_{example_m}_transaction_data.csv\"\"\")" ] }, { "cell_type": "code", "execution_count": null, "id": "10110066", "metadata": {}, "outputs": [], "source": [ "example_result_file_path" ] }, { "cell_type": "code", "execution_count": null, "id": "e1f7eb76", "metadata": {}, "outputs": [], "source": [ "ts_example_result = pd.read_csv(example_result_file_path)" ] }, { "cell_type": "code", "execution_count": null, "id": "bc934e63", "metadata": {}, "outputs": [], "source": [ "ts_example_result" ] }, { "cell_type": "code", "execution_count": null, "id": "8de1a0d4", "metadata": {}, "outputs": [], "source": [ "anomaly_score_threshold = 0.95" ] }, { "cell_type": "code", "execution_count": null, "id": "ad639256", "metadata": {}, "outputs": [], "source": [ "plt.plot(ts_example_data.timestamp, ts_example_data.value, label='amount')\n", "predict_pos =np.where(ts_example_result.anomaly_score>=anomaly_score_threshold)\n", "if predict_pos:\n", " plt.scatter(ts_example_data.iloc[predict_pos].timestamp, ts_example_data.iloc[predict_pos].value, label='predicted_fraud', color='red')\n", "plt.xlabel('Timestep')\n", "plt.ylabel('Amount')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "61145d2d", "metadata": {}, "source": [ "# References\n", "\n", "Edgar Alonso Lopez-Rojas and Stefan Axelsson. 2014. BANKSIM: A BANK PAYMENTS SIMULATOR FOR FRAUD DETECTION RESEARCH.\n", "\n", "Alexander Lavin and Subutai Ahmad. 2015. Evaluating Real-Time Anomaly Detection Algorithms – The Numenta Anomaly Benchmark." ] }, { "cell_type": "code", "execution_count": null, "id": "3295cd5c", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "kedro-nab-venv", "language": "python", "name": "kedro-nab-venv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }