{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup the Dataset\n", "\n", "The repo (https://github.com/gsoh/VED) was installed and mounted in `./VED/` by the Notebook setup.\n", "Data is in 7zip files (2 parts)\n", "\n", "## this extraction probably only needs to be done once\n", "\n", "First, need to install tools" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm\n", "!sudo yum-config-manager --enable epel\n", "!sudo yum install epel-release\n", "!sudo yum install p7zip\n", "\n", "## Now extract and join the archives\n", "!mkdir -p DynamicData \n", "\n", "!7za x VED/Data/VED_DynamicData_Part1.7z\n", "!7za x VED/Data/VED_DynamicData_Part2.7z\n", "\n", "!mv *.csv DynamicData/\n", "\n", "### the vehicle IDs are in xlsx files. Convert them\n", "!pip install xlsx2csv\n", "\n", "!mkdir -p StaticData\n", "\n", "!xlsx2csv 'VED/Data/VED_Static_Data_ICE&HEV.xlsx' StaticData/ICEHEV.csv\n", "!xlsx2csv 'VED/Data/VED_Static_Data_PHEV&EV.xlsx' StaticData/PHEVEV.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Organization\n", "\n", "The list of vehicle IDs and reference information is in `StaticData` in two CSV files -- one for ICE-type cars and another for EV-type cars. Note that the columns are slightly different.\n", "\n", "## if the data has been expanded already, can start here\n", "\n", "Then in the `DynamicData` folder, each of the 22 files is a week of telemetry data for the cars. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### first, let's combine the StaticData into a consistent dataframe" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "vehiclesICE = pd.read_csv(\"StaticData/ICEHEV.csv\")\n", "vehiclesEV = pd.read_csv(\"StaticData/PHEVEV.csv\")\n", "\n", "# rename the EngineType column to match the ICE dataframe\n", "vehiclesEV = vehiclesEV.rename(columns={\"EngineType\":\"Vehicle Type\"})\n", "\n", "# combine the two sets of vehicle data into one dataframe\n", "vehicles = pd.concat([vehiclesICE, vehiclesEV])\n", "\n", "vehicles.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### now, combine all the weeks of data into one dataframe" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from functools import reduce\n", "from os import listdir\n", "from os.path import isfile, join\n", "\n", "dataDirectory = 'DynamicData'\n", "\n", "telemetry = reduce(\n", " lambda d, w: d.append(pd.read_csv(join(dataDirectory,w))),\n", " [f for f in listdir(dataDirectory) if isfile(join(dataDirectory, f))],\n", " pd.DataFrame())\n", " \n", "telemetry.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(telemetry)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "telemetry.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trips = telemetry['Trip'].unique()\n", "len(trips)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "We now have a `telemetry` dataframe with 22M rows -- but not all columns are complete as some are EV and others ICE specific\n", "\n", "There are 4000 trips with lat/lon that can be used" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "soc = telemetry['HV Battery SOC[%]'].dropna()\n", "\n", "soc.plot(kind='hist', y='HV Battery SOC[%]')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "lots of 0s... even though I dropped the nas, which took the count from 22M to 3M. Could be due to ICE rows... but let's strip out the 0s first" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "socMinThresh = soc.where(soc > 2).dropna()\n", "socMinThresh.plot(kind='hist', y=\"SOC\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2% seems to be the right level" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explore Trip Telemetry\n", "\n", "pick a random trip and plot out the telemetry values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "tripID = random.choice(trips)\n", "print(f\"looking at trip #{tripID}\")\n", "\n", "tripData = telemetry[telemetry['Trip'] == tripID]\n", "tripData" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### plot the telemetry for this trip" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# tripData.plot(kind='line', x='Timestamp(ms)', y='Vehicle Speed[km/h]') #, y2='Engine RPM[RPM]', y3='Long Term Fuel Trim Bank 1[%]')\n", "\n", "\n", "tripTel = tripData[['Timestamp(ms)','Vehicle Speed[km/h]','Engine RPM[RPM]','Long Term Fuel Trim Bank 1[%]']]\n", "tripTel.plot(kind=\"scatter\", x='Timestamp(ms)', y='Vehicle Speed[km/h]')\n", "tripTel.plot(kind=\"scatter\", x='Timestamp(ms)', y='Engine RPM[RPM]')\n", "tripTel.plot(kind=\"scatter\", x='Timestamp(ms)', y='Long Term Fuel Trim Bank 1[%]')\n", "tripTel.plot(kind='scatter', x='Engine RPM[RPM]', y='Vehicle Speed[km/h]')\n", "# plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Prep for replay\n", "\n", "What's going to be most helpful is strip this dataset apart by trip so that a simulated device can replay the trip." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!mkdir -p TripData" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tripDir = 'TripData'\n", "\n", "# [print(id) for id in trips]\n", "\n", "[ telemetry[telemetry['Trip'] == id].sort_values(by=['DayNum','Timestamp(ms)']).to_csv(join(tripDir, str(id) + \".csv\")) for id in trips ]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws s3 cp TripData s3://connected-vehicle-datasource/ --recursive --acl public-read" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls TripData" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_latest_p36", "language": "python", "name": "conda_pytorch_latest_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }