{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fd46e112",
   "metadata": {},
   "source": [
    "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n",
    "\n",
    "SPDX-License-Identifier: Apache-2.0\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4738f3e2",
   "metadata": {},
   "source": [
    "# Prepare Financial Fraud dataset for Numenta Benchmark (NAB)\n",
    "\n",
    "The [Numenta Benchmark](https://github.com/numenta/NAB) consists of multiple anomaly detection algorithms for time-series. The algorithms identify anomalies contextually - based on previous values for that time series. Therefore the NAB repository expects each time series to be in its own CSV with timestamp and value as the columns.\n",
    "\n",
    "## This notebook consists of steps to \n",
    "1. Load raw data\n",
    "2. Process raw data into independent time series for NAB\n",
    "3. Save JSON specifying anomalies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cabcf7f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ff96ba1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.append('../../src/')\n",
    "\n",
    "from anomaly_detection_spatial_temporal_data.utils import ensure_directory"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6ef58a4",
   "metadata": {},
   "source": [
    "## Load raw data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e050f45",
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_data_path = '../../data/01_raw/financial_fraud/bs140513_032310.csv'\n",
    "\n",
    "raw_trans_data = pd.read_csv(raw_data_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05cb5a62",
   "metadata": {},
   "source": [
    "## Construct purchase time series for a single (customer, merchant) pair\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c567fef",
   "metadata": {},
   "outputs": [],
   "source": [
    "# values in the CSV have quotations \n",
    "example_c = \"\"\"'C1001065306'\"\"\"\n",
    "example_m = \"\"\"'es_health'\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5522fc8",
   "metadata": {},
   "outputs": [],
   "source": [
    "c_m_p_data_example = raw_trans_data.loc[\n",
    "    (raw_trans_data.customer==example_c)\n",
    "    &(raw_trans_data.category==example_m)\n",
    "][['step','amount','fraud']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "988f9adf",
   "metadata": {},
   "outputs": [],
   "source": [
    "c_m_p_data_example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ccd5a913",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_ts_file_path = f\"\"\"../../data/02_intermediate/financial_fraud/ts_data/{example_c}_{example_m}_transaction_data.csv\"\"\"\n",
    "print(example_ts_file_path)\n",
    "\n",
    "ensure_directory(example_ts_file_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d170cc6",
   "metadata": {},
   "outputs": [],
   "source": [
    "c_m_p_data_example.rename(columns={'step':'timestamp','amount':'value','fraud':'label'}, inplace=True)\n",
    "c_m_p_data_example[['timestamp','value']].to_csv(example_ts_file_path, index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ead29d7f",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_ts_label_file_path = f\"\"\"../../data/02_intermediate/financial_fraud/ts_label/{example_c}_{example_m}_transaction_label.csv\"\"\"\n",
    "print(example_ts_label_file_path)\n",
    "\n",
    "ensure_directory(example_ts_label_file_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95650a3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "c_m_p_data_example[['label']].to_csv(example_ts_label_file_path, index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4264831",
   "metadata": {},
   "source": [
    "## Generate a label dict needed for NAB model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df5ea147",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "import json\n",
    "\n",
    "def generate_dummy_labels(data_dir: str, label_dir:str) -> str:\n",
    "    \"\"\"Generate a dummy label JSON file and return its path\"\"\"\n",
    "    data_dir_path = Path(data_dir)\n",
    "    dummy_labels = dict()\n",
    "    for file_path in data_dir_path.rglob(\"*.csv\"):\n",
    "        file_path_relative = file_path.relative_to(data_dir_path)\n",
    "        dummy_labels[str(file_path_relative)] = []\n",
    "    dummy_label_path = Path(f\"{label_dir}/labels-combined.json\")\n",
    "    with dummy_label_path.open(\"w\") as file:\n",
    "        json.dump(dummy_labels, file, indent=4)\n",
    "    return str(dummy_label_path.resolve())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f7e7ada",
   "metadata": {},
   "outputs": [],
   "source": [
    "label_dict_filepath = generate_dummy_labels(\n",
    "    \"../../data/02_intermediate/financial_fraud/ts_data/\",\n",
    "    \"../../data/02_intermediate/financial_fraud/ts_label/\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df25941e",
   "metadata": {},
   "source": [
    "# References\n",
    "\n",
    "Edgar Alonso Lopez-Rojas and Stefan Axelsson. 2014. BANKSIM: A BANK PAYMENTS SIMULATOR FOR FRAUD DETECTION RESEARCH.\n",
    "\n",
    "Alexander Lavin and Subutai Ahmad. 2015. Evaluating Real-Time Anomaly Detection Algorithms – The Numenta Anomaly Benchmark."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca9de484",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "kedro-nab-venv",
   "language": "python",
   "name": "kedro-nab-venv"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}