{ "cells": [ { "cell_type": "markdown", "id": "7d0f97f8", "metadata": {}, "source": [ "# Logistics Analysis using a Transportation Network\n", "\n", "For any company that deals with moving goods from one location to another, the logistical challenges and associated costs of moving people and goods from one location to another represent a significant amount of effort and resources. Companies often employ a variety of route optimization techniques to optimize both the costs and efficiency of their transportation logistics. When dealing with route logistics, there are a few main points of data that come into play including time, cost, distance, etc. Using graphs to represent the connections between data within these logistics systems provides a robust way to represent these attributes. Graphs allow us to represent attributes of connections as properties of those connections, which is a natural way to store data for a logistics system. Graphs also provide advantages in logistics systems as they enable certain types of graph algorithms, specifically around path finding, to drive insights that can feed into route optimizations.\n", "\n", "Route optimization can take many forms, be it minimizing distance, maximizing load, minimizing cost, maximizing speed, or balancing multiple factors to achieve an optimized solution. In this notebook, we will look at the types of inputs that graphs can help provide to these optimizations.\n", "\n", "\n", "## Creating a logistics graph\n", "\n", "In this section, we'll load a graph focused on airports, routes between airports, and the distance of those routes.\n", "\n", "### Load data\n", "The cell below loads the example graph into your Neptune cluster. When you run the cell below, a graph for an example transportation dataset will load, which will take about 1 minute." ] }, { "cell_type": "code", "execution_count": null, "id": "d0ce4226", "metadata": {}, "outputs": [], "source": [ "%seed --model Property_Graph --run --dataset airports" ] }, { "cell_type": "markdown", "id": "036f588e", "metadata": {}, "source": [ "### Install Required Libraries\n", "\n", "For this notebook, we will also need to install [iGraph](https://igraph.org/) which is an open-source network analysis library. You could also perform similar tasks using other common Python libraries such as [NetworkX](https://networkx.org/) if you desire. " ] }, { "cell_type": "code", "execution_count": null, "id": "234d9a68", "metadata": {}, "outputs": [], "source": [ "pip install igraph -q" ] }, { "cell_type": "markdown", "id": "8860c93c", "metadata": {}, "source": [ "### Set visualization and configuration options\n", "\n", "The cell below configures the visualization to use specific colors and icons for the different parts of the data model." ] }, { "cell_type": "code", "execution_count": null, "id": "10528702", "metadata": {}, "outputs": [], "source": [ "%%graph_notebook_vis_options\n", "\n", "{\n", " \"groups\": {\n", " \"airport\": {\n", " \"shape\": \"icon\",\n", " \"icon\": {\n", " \"face\": \"FontAwesome\",\n", " \"code\": \"\\uf072\",\n", " \"color\": \"blue\"\n", " }\n", " },\n", " \"US\": {\n", " \"shape\": \"icon\",\n", " \"icon\": {\n", " \"face\": \"FontAwesome\",\n", " \"code\": \"\\uf072\",\n", " \"color\": \"blue\"\n", " }\n", " },\n", " \"IS\": {\n", " \"shape\": \"icon\",\n", " \"icon\": {\n", " \"face\": \"FontAwesome\",\n", " \"code\": \"\\uf072\",\n", " \"color\": \"yellow\"\n", " }\n", " },\n", " \"RU\": {\n", " \"shape\": \"icon\",\n", " \"icon\": {\n", " \"face\": \"FontAwesome\",\n", " \"code\": \"\\uf072\",\n", " \"color\": \"red\"\n", " }\n", " }\n", "}}" ] }, { "cell_type": "markdown", "id": "3cc036d5", "metadata": {}, "source": [ "### Data model\n", "The transportation graph included in this example is relatively straightforward, consisting of airports, routes connecting those airports together, and a `distance` property specifying how long the flight between airports is in miles.\n", "\n", "The following query shows a single `airport` (Anchorage) and all the places you can fly to from that `airport`. After running the query, click the Graph tab to see a visualization of the results." ] }, { "cell_type": "code", "execution_count": null, "id": "436c1b9b", "metadata": { "scrolled": false }, "outputs": [], "source": [ "%%oc -d code\n", "\n", "MATCH p=(a:airport {code: 'ANC'})-[:route]->()\n", "RETURN p" ] }, { "cell_type": "markdown", "id": "ad0c60cd", "metadata": {}, "source": [ "# Examining our transportation graph\n", "\n", "Now that we have seen what our transportation graph looks like, let's start by looking at some overall statistics on the graph and see the shape of our transportation network.\n", "\n", "### How many airports and routes are in the graph?" ] }, { "cell_type": "code", "execution_count": null, "id": "9e675a06", "metadata": {}, "outputs": [], "source": [ "%%oc\n", "\n", "MATCH (a:airport)-[r:route]->()\n", "with count(distinct(a)) as num_airports, count(r) as num_routes \n", "RETURN num_airports, num_routes, toFloat(num_routes)/num_airports as ratio" ] }, { "cell_type": "markdown", "id": "c69044b7", "metadata": {}, "source": [ "As we can see we have around 3400 airports and 50k routes in this dataset meaning that each airport has on average ~14 routes. \n", "\n", "Let's take a look at the airports with the most flight routes." ] }, { "cell_type": "code", "execution_count": null, "id": "370d025e", "metadata": {}, "outputs": [], "source": [ "%%oc\n", "\n", "MATCH (a:airport)-[r:route]->()\n", "RETURN a.desc, count(r) as num_routes\n", "ORDER BY num_routes DESC LIMIT 5" ] }, { "cell_type": "markdown", "id": "f1524970", "metadata": {}, "source": [ "Well, no surprise there as all of these airports are well-known airport hubs.\n", "\n", "### What is the distribution of airports in the data?\n", "\n", "After looking at the overall distribution of items in the graph, another common use case is to group these by a property to look at the distribution. In this case, let's group by the `region` property to see how many airports are in each region/state." ] }, { "cell_type": "code", "execution_count": null, "id": "dd6c4a71", "metadata": {}, "outputs": [], "source": [ "%%oc\n", "\n", "MATCH (a:airport)\n", "RETURN a.region, count(a) as cnt \n", "ORDER BY cnt DESC " ] }, { "cell_type": "markdown", "id": "e6a7baa4", "metadata": {}, "source": [ "That is a bit unexpected. Alaska has 3 times more airports than any other region in this dataset. Let's take a moment to examine some of the details of the airports in Alaska.\n", "\n", "## Where can I fly to from Alaska?\n", "\n", "Running the query below will show you all the locations you can fly to, starting from an airport in Alaska." ] }, { "cell_type": "code", "execution_count": null, "id": "5994fb2b", "metadata": { "scrolled": false }, "outputs": [], "source": [ "%%oc -d code -g country\n", "\n", "MATCH p=(a:airport {region: 'US-AK'})-[]->()\n", "RETURN p" ] }, { "cell_type": "markdown", "id": "671e8879", "metadata": {}, "source": [ "# Examining Graph Logistics using Graph Analytics\n", "\n", "Now that we have taken a look at some of our characteristics of our transportation graph, let's start running some analysis on this graph to see how we can use it to help solve some logistics questions.\n", "\n", "\n", "## Load a Pandas DataFrame for analysis\n", "\n", "Up until now we have been working with Amazon Neptune directly. For this analysis we are going to leverage an integration between Neptune and Pandas DataFrames, supplied by [AWS SDK for pandas](https://aws-sdk-pandas.readthedocs.io/en/stable/), to read and write data from Neptune and the [iGraph](https://igraph.org/) library to perform network analysis/graph algorithms on top of this data.\n", "\n", "Running the cell below will retrieve the required data from Neptune and load it into a Pandas DataFrame. We then break this result up into two DataFrames, one containing the airport information and the other containing the routes connecting the airports that we will use for later analysis." ] }, { "cell_type": "code", "execution_count": null, "id": "f59d6b9e", "metadata": {}, "outputs": [], "source": [ "import awswrangler as wr\n", "import pandas as pd\n", "import igraph as ig\n", "import graph_notebook as gn\n", "from graph_notebook.configuration.generate_config import AuthModeEnum\n", "from IPython.display import HTML, display\n", "\n", "def print_path(g, paths, fields=None):\n", " result=[]\n", " for idx, n in enumerate(paths):\n", " path = []\n", " for a in g.vs[n]:\n", " if fields:\n", " values={}\n", " for f in fields:\n", " values[f]=a[f]\n", " path.append(values)\n", " else:\n", " path.append(a.attributes())\n", " result.append(path)\n", " display(pd.DataFrame(result))\n", "\n", "# Get the configuration information for the notebook\n", "config = gn.configuration.get_config.get_config()\n", "iam=True if config.auth_mode==AuthModeEnum.IAM else False\n", "\n", "# Retrieve Data from neptune\n", "client = wr.neptune.connect(config.host, config.port, iam_enabled=iam)\n", "query = \"\"\"MATCH ()-[r:route]->()\n", "RETURN startnode(r) as source, endnode(r) as target, \n", "id(startnode(r)) as source_id, id(endnode(r)) as target_id, r.dist\"\"\"\n", "\n", "df = wr.neptune.execute_opencypher(client, query)\n", "\n", "# Create the dataframe of airports and remove duplicates\n", "airports = wr.neptune.flatten_nested_df(pd.concat([df['source'].apply(pd.Series).apply(pd.Series), \n", " df['target'].apply(pd.Series).apply(pd.Series)]), seperator=\"_\")\n", "airports = airports.drop_duplicates(subset='~id', keep=\"first\").drop('index', axis=1)\n", "\n", "\n", "# remove the tildas from column names to make life easier\n", "airports.columns = airports.columns.str[1:]\n", "\n", "# Create the routes dataframe\n", "routes = pd.DataFrame(data=df,columns=['source_id', 'target_id', 'r.dist']).apply(pd.Series).apply(pd.Series)\n", "routes.columns = routes.columns.str.replace(\".\", \"_\", regex=False)\n", "display(airports.head(5))\n", "display(routes.head(5))\n", "\n", "g = ig.Graph.DataFrame(routes, directed=True, vertices=airports, use_vids=False)" ] }, { "cell_type": "markdown", "id": "395f536d", "metadata": {}, "source": [ "## Routing in a logistics graph\n", "\n", "One of the most common types of questions commonly asked when looking at logistics is how to move items from `Location A` to `Location B` most effectively. These types of questions are ones where logistics graphs and graph analytics excel through the use of a category of graph algorithms known as path finding algorithms. Path finding algorithms are a set of algorithms that traverse through a graph from a start to an end node to determine the \"most efficient\" path in terms of the number of connections or a weight, which can represent a relative attribute of the connection such as time, distance, cost, capacity, complexity, etc.\n", "\n", "For these examples, we will choose a city that presents some unique logistics problem as an example. We will use [Deadhorse, Alaska, USA](https://en.wikipedia.org/wiki/Deadhorse,_Alaska) as the target for logistics shipping. Deadhorse, AK is a small community in the far northern reaches of Alaska's Prudhoe Bay, approximately 495 miles (~800 kms) from the nearest city. It is also a hub of oil production in Alaska so the logistics of flying equipment and personnel to/from this location is a real challenge. Let's take a look at some of the ways we can utilize a graph to simplify these logistics.\n", "\n", "\n", "### What is the fewest flights from Deadhorse to Miami?\n", "\n", "Let's start by taking a look at one of the simplest path finding queries, finding the shortest path between two locations, in this case, flying from Deadhorse, AK to Miami, FL, USA." ] }, { "cell_type": "code", "execution_count": null, "id": "46f93951", "metadata": {}, "outputs": [], "source": [ "path = g.get_all_shortest_paths(g.vs.find(properties_code='SCC'), g.vs.find(properties_code='MIA'))\n", "print_path(g, path, ['properties_desc'])" ] }, { "cell_type": "markdown", "id": "1be3757c", "metadata": {}, "source": [ "As we see from the results, there are 12 different routes. However, some of these routes seem a bit out-of-the-way. For example, one route goes from `Deadhorse -> Anchorage -> Reykjavik, Iceland -> Miami` which definitely seems unnecessary. This is due to the fact that the shortest path algorithm we ran counted every connection as equal, so it minimized the number of connections. If we want to find the shortest flight distance, then we want to use a shortest path algorithm that takes into account the distance of each route which can be represented as a weight on the `route` edge.\n", "\n", "### Run shortest distance to fly from Deadhorse to Miami?\n", "\n", "Let's see what our route looks like if we take into account the distance of each flight." ] }, { "cell_type": "code", "execution_count": null, "id": "875ef710", "metadata": {}, "outputs": [], "source": [ "path = g.get_all_shortest_paths(g.vs.find(properties_code='SCC'), g.vs.find(properties_code='MIA'), \"r_dist\")\n", "print_path(g, path, ['properties_desc'])" ] }, { "cell_type": "markdown", "id": "50509654", "metadata": {}, "source": [ "Now this route makes a lot of sense and seems very direct. As we can see, in situations where not all connections are equal, adding weights to edges and then using those weights as part of the shortest path calculation provides us a very powerful tool to help solve logistical challenges.\n", "\n", "\n", "### Traveling to multiple locations\n", "\n", "So far we have only looked at what it takes to travel from one location to another, however it is also very common that you need a system that routes a piece of equipment/personnel across multiple locations. This sort of problem is known as a [Traveling Salesman Problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem) and is an [NP-hard](https://en.wikipedia.org/wiki/NP-hardness) problem. Luckily, there are some ways we can approximate this problem such as using a [Minimum Spanning Tree](https://en.wikipedia.org/wiki/Minimum_spanning_tree) to find the shortest path that traverses all the entities.\n", "\n", "Let's take a look at what is the minimum distance required to travel to all 150 airports in Alaska, USA." ] }, { "cell_type": "code", "execution_count": null, "id": "520d26ea", "metadata": {}, "outputs": [], "source": [ "# Retrieve Data from neptune\n", "client = wr.neptune.connect(config.host, config.port, iam_enabled=iam)\n", "query = \"\"\"MATCH p=(a:airport)-[r:route]->(b:airport)\n", "WHERE a.region=\"US-AK\"\n", "AND b.region=\"US-AK\"\n", "RETURN id(a) as source, id(b) as target, r.dist as dist\"\"\"\n", "mst = wr.neptune.execute_opencypher(client, query)\n", "\n", "\n", "# Run a minimum spanning tree\n", "mst_g = ig.Graph.TupleList(mst.itertuples(index=False), directed=True, weights=True)\n", "tree = mst_g.spanning_tree(return_tree=False)\n", "print(\"The minimum spanning tree for all airports in Alaska, USA is:\")\n", "for idx, n in enumerate(tree):\n", " try:\n", " print(g.vs.find(name=mst_g.vs[n]['name'])['properties_desc'])\n", " except:\n", " pass" ] }, { "cell_type": "markdown", "id": "1aa1b0a6", "metadata": {}, "source": [ "With this we can now see the most efficent path that we would need to take to travel to all the airports in Alaska.\n", "\n", "# Conclusion\n", "Graph analysis provides unique insights into transportation/logistics graphs. The analysis demonstrated in this notebook only scratch the surface of the types of powerful insights you can draw using graphs and graph analytics. Finding and understanding these optimized paths and connection patterns between data is a strength of graph, graph databases, and graph analysis that serve questions dealing with logistics efficiently and effectively." ] }, { "cell_type": "markdown", "id": "e058e8e5", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }