{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Set Up the Dataset\n",
    "\n",
    "The repo (https://github.com/gsoh/VED) was installed and mounted in `./VED/` by the Notebook setup.\n",
    "The dynamic data ships as two 7-Zip archives (Part 1 and Part 2).\n",
    "\n",
    "## This extraction probably only needs to be done once\n",
    "\n",
    "First, install the tools needed to unpack the archives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm\n",
    "!sudo yum-config-manager --enable epel\n",
    "!sudo yum install -y epel-release\n",
    "!sudo yum install -y p7zip\n",
    "\n",
    "# Now extract both archive parts and collect the CSVs in DynamicData\n",
    "!mkdir -p DynamicData\n",
    "\n",
    "!7za x VED/Data/VED_DynamicData_Part1.7z\n",
    "!7za x VED/Data/VED_DynamicData_Part2.7z\n",
    "\n",
    "!mv *.csv DynamicData/\n",
    "\n",
    "# The vehicle IDs are in xlsx files; convert them to CSV\n",
    "!pip install xlsx2csv\n",
    "\n",
    "!mkdir -p StaticData\n",
    "\n",
    "!xlsx2csv 'VED/Data/VED_Static_Data_ICE&HEV.xlsx' StaticData/ICEHEV.csv\n",
    "!xlsx2csv 'VED/Data/VED_Static_Data_PHEV&EV.xlsx' StaticData/PHEVEV.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Organization\n",
    "\n",
    "## If the data has been extracted already, you can start here\n",
    "\n",
    "The list of vehicle IDs and reference information is in `StaticData` in two CSV files: one for ICE and HEV cars and another for PHEV and EV cars. Note that the columns are slightly different.\n",
    "\n",
    "The `DynamicData` folder holds the telemetry: each of the 22 files is a week of telemetry data for the cars."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### First, combine the StaticData into a consistent dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "vehiclesICE = pd.read_csv(\"StaticData/ICEHEV.csv\")\n",
    "vehiclesEV = pd.read_csv(\"StaticData/PHEVEV.csv\")\n",
    "\n",
    "# rename the EngineType column to match the ICE dataframe\n",
    "vehiclesEV = vehiclesEV.rename(columns={\"EngineType\":\"Vehicle Type\"})\n",
    "\n",
    "# combine the two sets of vehicle data into one dataframe with a fresh, unique index\n",
    "vehicles = pd.concat([vehiclesICE, vehiclesEV], ignore_index=True)\n",
    "\n",
    "vehicles.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now, combine all the weeks of data into one dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from os import listdir\n",
    "from os.path import isfile, join\n",
    "\n",
    "dataDirectory = 'DynamicData'\n",
    "\n",
    "# DataFrame.append was removed in pandas 2.0 and copies the frame on every call,\n",
    "# so read each weekly file and concatenate them all in one step\n",
    "weekFiles = sorted(f for f in listdir(dataDirectory) if isfile(join(dataDirectory, f)))\n",
    "\n",
    "telemetry = pd.concat(\n",
    "    (pd.read_csv(join(dataDirectory, w)) for w in weekFiles),\n",
    "    ignore_index=True)\n",
    "\n",
    "telemetry.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(telemetry)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "telemetry.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trips = telemetry['Trip'].unique()\n",
    "len(trips)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "We now have a `telemetry` dataframe with about 22M rows. Not every column is populated in every row, because some signals are EV-specific and others are ICE-specific.\n",
    "\n",
    "There are about 4000 trips with lat/lon data that can be used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "soc = telemetry['HV Battery SOC[%]'].dropna()\n",
    "\n",
    "# soc is a Series, so no y column is needed; label the axis explicitly instead\n",
    "soc.plot(kind='hist')\n",
    "plt.xlabel('HV Battery SOC[%]')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The histogram shows a lot of 0s, even after dropping the NaNs (which took the count from 22M to 3M). These could come from ICE rows, but let's filter out the near-zero values first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# keep only readings above a 2% state of charge\n",
    "socMinThresh = soc[soc > 2]\n",
    "socMinThresh.plot(kind='hist')\n",
    "plt.xlabel('HV Battery SOC[%]')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A 2% cutoff seems to be the right level for filtering out the near-zero readings."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explore Trip Telemetry\n",
    "\n",
    "Pick a random trip and plot its telemetry values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "tripID = random.choice(trips)\n",
    "print(f\"looking at trip #{tripID}\")\n",
    "\n",
    "tripData = telemetry[telemetry['Trip'] == tripID]\n",
    "tripData"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot the telemetry for this trip"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select the signals of interest for this trip\n",
    "tripTel = tripData[['Timestamp(ms)','Vehicle Speed[km/h]','Engine RPM[RPM]','Long Term Fuel Trim Bank 1[%]']]\n",
    "\n",
    "# each signal over time, then speed against engine RPM\n",
    "tripTel.plot(kind=\"scatter\", x='Timestamp(ms)', y='Vehicle Speed[km/h]')\n",
    "tripTel.plot(kind=\"scatter\", x='Timestamp(ms)', y='Engine RPM[RPM]')\n",
    "tripTel.plot(kind=\"scatter\", x='Timestamp(ms)', y='Long Term Fuel Trim Bank 1[%]')\n",
    "tripTel.plot(kind='scatter', x='Engine RPM[RPM]', y='Vehicle Speed[km/h]')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Prep for replay\n",
    "\n",
    "The most helpful next step is to split this dataset by trip so that a simulated device can replay each trip."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p TripData"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
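    "from os.path import join\n",
    "\n",
    "tripDir = 'TripData'\n",
    "\n",
    "# write one CSV per trip, ordered by day and timestamp, so it can be replayed in sequence;\n",
    "# groupby splits the frame in a single pass instead of re-filtering it for every trip\n",
    "for tid, tripRows in telemetry.groupby('Trip'):\n",
    "    tripRows.sort_values(by=['DayNum','Timestamp(ms)']).to_csv(join(tripDir, str(tid) + \".csv\"))\n"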
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 cp TripData s3://connected-vehicle-datasource/ --recursive --acl public-read"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls TripData"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_pytorch_latest_p36",
   "language": "python",
   "name": "conda_pytorch_latest_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}