{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Use Case 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the people who have worked for a company at a specific location during a particular time period" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What questions would we have to ask of our data?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_\"Who worked for company X, and at which locations, between Y1-Y2?\"_\n", "\n", "This use case adds another entity, _location_, and an attribute or pair of attributes to represent a data range.\n", "\n", "A location has identity and multiple attributes (a name and an address at a minimum), and is therefore best represented as a vertex. This vertex will help qualify the relationship between a person and a company – a _Person_ _WORKED FOR_ a _Company_ at a particular _Location_.\n", "\n", "This, however, throws up a problem. We can't attribute an edge with a vertex. The structure as we described it above would require something like the following, which is not allowed in a property graph:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is common of many N-ary modelling scenarios in which we want to relate several entities in a single context. To address in a property graph we add an intermediate node:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Intermediate nodes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Intermediate nodes make visible another patr of the domain — a hidden or implicit concept with its own informational content and meaningful domain semantic.\n", "\n", "If you're struggling to come up with a graph structure that captures the complex interdependencies between several things in your domain, look for the nouns, and hence the domain concepts, hidden in the verb phrases you use to describe the structuring of your domain. \n", "\n", "Intermediate nodes are usually self-evident wherever an adverbial phrase qualifies a clause. \"Li worked at Example Corp, at the HQ, from 21-11-2013 to 23-03-2016, in the role of Analyst\" leads us to introduce an intermediate node that connects Li, Example Corp and HQ location. This node represents a _Job_, to which we can attach the date properties _from_ and _to_. This new vertex type also provides a suitable site for the _role_ property. We'll drop the _WORKED FOR_ relationships and connect vertices with edges whose labels simply indicate the kind of vertex to be found at the other end of the edge – _JOB_, _LOCATION_ and _COMPANY_." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sample dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll update our dataset with this new structure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating some revised sample data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext ipython_unittest\n", "%run '../util/neptune.py'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neptune.clear()\n", "g = neptune.graphTraversal()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datetime import *\n", "\n", "(g.\n", " addV('Person').property(id,'p-1').property('firstName','Martha').property('lastName','Rivera').\n", " addV('Person').property(id,'p-2').property('firstName','Richard').property('lastName','Roe').\n", " addV('Person').property(id,'p-3').property('firstName','Li').property('lastName','Juan').\n", " addV('Person').property(id,'p-4').property('firstName','John').property('lastName','Stiles').\n", " addV('Person').property(id,'p-5').property('firstName','Saanvi').property('lastName','Sarkar').\n", " addV('Company').property(id,'c-1').property('name','Example Corp').\n", " addV('Company').property(id,'c-2').property('name','AnyCompany').\n", " addV('Location').property(id,'l-1').property('name','HQ').property('address','100 Main St, Anytown').\n", " addV('Location').property(id,'l-2').property('name','Offices').property('address','Downtown, Anytown').\n", " addV('Location').property(id,'l-3').property('name','Exchange').property('address','50 High St, Anytown').\n", " addV('Job').property(id,'j-1').property('from',datetime(2010,10,20)).property('to',datetime(2017,11,1)).\n", " property('role','Principal Analyst').\n", " addV('Job').property(id,'j-2').property('from',datetime(2011,2,16)).property('to',datetime(2013,9,17)).\n", " property('role','Senior Analyst').\n", " addV('Job').property(id,'j-3').property('from',datetime(2013,11,21)).property('to',datetime(2016,3,23)).\n", " property('role','Analyst').\n", " addV('Job').property(id,'j-4').property('from',datetime(2015,2,2)).property('to',datetime(2018,2,8)).\n", " property('role','Analyst').\n", " addV('Job').property(id,'j-5').property('from',datetime(2011,7,15)).property('to',datetime(2017,10,14)).\n", " property('role','Manager').\n", " addV('Job').property(id,'j-6').property('from',datetime(2012,3,23)).property('to',datetime(2013,11,1)).\n", " property('role','Associate Analyst').\n", " V('c-1').addE('LOCATION').to(V('l-1')).\n", " V('c-1').addE('LOCATION').to(V('l-2')).\n", " V('c-2').addE('LOCATION').to(V('l-3')). \n", " V('p-1').addE('JOB').to(V('j-1')).\n", " V('j-1').addE('COMPANY').to(V('c-1')).\n", " V('j-1').addE('LOCATION').to(V('l-1')). \n", " V('p-2').addE('JOB').to(V('j-2')).\n", " V('j-2').addE('COMPANY').to(V('c-1')).\n", " V('j-2').addE('LOCATION').to(V('l-2')). \n", " V('p-3').addE('JOB').to(V('j-3')).\n", " V('j-3').addE('COMPANY').to(V('c-1')).\n", " V('j-3').addE('LOCATION').to(V('l-1')).\n", " V('p-4').addE('JOB').to(V('j-4')).\n", " V('j-4').addE('COMPANY').to(V('c-1')).\n", " V('j-4').addE('LOCATION').to(V('l-2')). \n", " V('p-5').addE('JOB').to(V('j-5')).\n", " V('j-5').addE('COMPANY').to(V('c-2')).\n", " V('j-5').addE('LOCATION').to(V('l-3')).\n", " V('p-3').addE('JOB').to(V('j-6')).\n", " V('j-6').addE('COMPANY').to(V('c-2')).\n", " V('j-6').addE('LOCATION').to(V('l-3')).\n", " toList())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Querying the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Query 2 – Who worked for Example Corp, and at which locations, between 2015-2017?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To answer this question, we'll have to perform the following steps:\n", "\n", " 1. Start at Company vertex\n", " 2. Traverse to Job vertices\n", " 3. Filter by date\n", " 4. Traverse to Person and Location vertices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%unittest\n", "\n", "results = (g.\n", " V('c-1').in_('COMPANY'). # traverse to Job from Company\n", " or_( \n", " (has('from', between(datetime(2015,1,1), datetime(2018,1,1)))), # filter by date\n", " (has('to', between(datetime(2015,1,1), datetime(2018,1,1))))\n", " ).\n", " order().by(id).\n", " project('name', 'location').\n", " by(in_('JOB').values('firstName', 'lastName').fold()). # traverse to Person from Job\n", " by(out('LOCATION').values('name', 'address').fold()). # traverse to Location from Job\n", " toList())\n", "\n", "assert results == [{'name': ['Martha', 'Rivera'], 'location': ['HQ', '100 Main St, Anytown']},\n", " {'name': ['Li', 'Juan'], 'location': ['HQ', '100 Main St, Anytown']},\n", " {'name': ['John', 'Stiles'], 'location': ['Offices', 'Downtown, Anytown']}]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Broken tests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In revising the model and moving _role_ from an edge to a vertex, we've broken the test for Query 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Query 1 [BROKEN] – Which companies has Li worked for, and in what roles?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%unittest\n", "\n", "results = (g.V('p-3').\n", " outE('WORKED_FOR').as_('e').\n", " otherV().\n", " project('company', 'role').\n", " by('name').\n", " by(select('e').values('role')).\n", " toList())\n", "\n", "assert results == [{'company': 'Example Corp', 'role': 'Analyst'}, \n", " {'company': 'AnyCompany', 'role': 'Associate Analyst'}]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Query 1 (revised) – Which companies has Li worked for, and in what roles?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%unittest\n", "\n", "results = (g.V('p-3').\n", " out('JOB').\n", " project('company', 'role').\n", " by(out('COMPANY').values('name')).\n", " by('role').\n", " toList())\n", "\n", "assert results == [{'company': 'Example Corp', 'role': 'Analyst'}, \n", " {'company': 'AnyCompany', 'role': 'Associate Analyst'}]" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }