{ "cells": [ { "cell_type": "markdown", "id": "7bc0ec07", "metadata": {}, "source": [ "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n", "\n", "SPDX-License-Identifier: Apache-2.0" ] }, { "cell_type": "markdown", "id": "9b6417fc", "metadata": {}, "source": [ "# Train GDN Model with BATADAL data\n", "\n", "## Table of Contents\n", "\n", "1. Load processed data\n", " * Load sensor columns\n", " * Load train and test CSVs\n", "2. Load model and environment config files\n", "3. Train GDN model \n", "4. Perform inference on test set" ] }, { "cell_type": "code", "execution_count": null, "id": "7cb2f178", "metadata": {}, "outputs": [], "source": [ "import sys \n", "sys.path.append('../../src/anomaly_detection_spatial_temporal_data/')" ] }, { "cell_type": "code", "execution_count": null, "id": "f4f1bc0e", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import yaml\n", "from model.GDN.GDNTrainer import GDNTrainer" ] }, { "cell_type": "markdown", "id": "496dee29", "metadata": {}, "source": [ "# Load data" ] }, { "cell_type": "markdown", "id": "5ee410bd", "metadata": {}, "source": [ "### Load sensor columns" ] }, { "cell_type": "code", "execution_count": null, "id": "ba5c1a65", "metadata": {}, "outputs": [], "source": [ "data_dir = \"../../data/03_primary/iot\"" ] }, { "cell_type": "code", "execution_count": null, "id": "fd6da636", "metadata": {}, "outputs": [], "source": [ "with open(f\"{data_dir}/iot_sensor_list_batadal.txt\", \"r\") as f:\n", " sensors = f.read().split(\"\\n\")" ] }, { "cell_type": "code", "execution_count": null, "id": "7bca7ef3", "metadata": {}, "outputs": [], "source": [ "print(f\"Number of sensors: {len(sensors)}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "358bde56", "metadata": {}, "outputs": [], "source": [ "print(sensors)" ] }, { "cell_type": "markdown", "id": "1c189fe3", "metadata": {}, "source": [ "### Load train and test CSVs" ] }, { "cell_type": "code", "execution_count": null, "id": "d50eab41", "metadata": {}, "outputs": [], "source": [ "train_df = pd.read_csv(f\"{data_dir}/iot_gdn_train.csv\")\n", "test_df = pd.read_csv(f\"{data_dir}/iot_gdn_test.csv\")\n", "\n", "print(train_df.shape)\n", "print(test_df.shape)\n" ] }, { "cell_type": "markdown", "id": "aec13f81", "metadata": {}, "source": [ "### Load training and environment configurations" ] }, { "cell_type": "code", "execution_count": null, "id": "1a64932f", "metadata": {}, "outputs": [], "source": [ "model_config_file = \"../../conf/base/parameters/gdn.yml\"" ] }, { "cell_type": "code", "execution_count": null, "id": "81870b07", "metadata": {}, "outputs": [], "source": [ "with open(model_config_file, \"r\") as stream:\n", " try:\n", " model_config = yaml.safe_load(stream)\n", " print(model_config)\n", " except yaml.YAMLError as exc:\n", " print(exc)" ] }, { "cell_type": "code", "execution_count": null, "id": "6289cd6d", "metadata": {}, "outputs": [], "source": [ "train_config = model_config[\"train_config\"]\n", "env_config = model_config[\"env_config_wifi\"]\n", "\n", "env_config[\"checkpoint_save_dir\"] = \"../../data/07_model_output/gdn-iot-notebook\"" ] }, { "cell_type": "markdown", "id": "d740e03c", "metadata": {}, "source": [ "# Train model" ] }, { "cell_type": "code", "execution_count": null, "id": "6b694b92", "metadata": {}, "outputs": [], "source": [ "# by default this runs for 3 epochs\n", "# we can change this by uncommenting the following line before creating the training object\n", "# train_config[\"epoch\"] = 5\n", "\n", "trainer = GDNTrainer(\n", " sensors, train_df, test_df, \n", " train_config, env_config\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "ca027b6c", "metadata": {}, "outputs": [], "source": [ "trainer.run()" ] }, { "cell_type": "markdown", "id": "7d55b8f6", "metadata": {}, "source": [ "# Run inference" ] }, { "cell_type": "code", "execution_count": null, "id": "a3efed25", "metadata": {}, "outputs": [], "source": [ "pred, labels = trainer.predict()" ] }, { "cell_type": "code", "execution_count": null, "id": "bfd70697", "metadata": {}, "outputs": [], "source": [ "# we lose 5 items due to the windowing process from the TimeDataset constructor\n", "# window is `slide_win`\n", "# pred.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "0dc0d18f", "metadata": {}, "outputs": [], "source": [ "# np.array(labels).shape" ] }, { "cell_type": "markdown", "id": "339225b2", "metadata": {}, "source": [ "# References\n", "Riccardo Taormina and Stefano Galelli and Nils Ole Tippenhauer and Elad Salomons and Avi Ostfeld and Demetrios G. Eliades and Mohsen Aghashahi and Raanju Sundararajan and Mohsen Pourahmadi and M. Katherine Banks and B. M. Brentan and Enrique Campbell and G. Lima and D. Manzi and D. Ayala-Cabrera and M. Herrera and I. Montalvo and J. Izquierdo and E. Luvizotto and Sarin E. Chandy and Amin Rasekh and Zachary A. Barker and Bruce Campbell and M. Ehsan Shafiee and Marcio Giacomoni and Nikolaos Gatsis and Ahmad Taha and Ahmed A. Abokifa and Kelsey Haddad and Cynthia S. Lo and Pratim Biswas and M. Fayzul K. Pasha and Bijay Kc and Saravanakumar Lakshmanan Somasundaram and Mashor Housh and Ziv Ohar; \"The Battle Of The Attack Detection Algorithms: Disclosing Cyber Attacks On Water Distribution Networks.\" Journal of Water Resources Planning and Management, 144 (8), August 2018\n", "\n", "Ailin Deng and Bryan Hooi. 2021. Graph Neural Network-Based Anomaly Detection in Multivariate Time Series. CoRR abs/2106.06947, (2021). Retrieved from https://arxiv.org/abs/2106.06947 " ] }, { "cell_type": "code", "execution_count": null, "id": "3db906ed", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "kedro-gdn-venv", "language": "python", "name": "kedro-gdn-venv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }