Find the people who have worked for a company at a specific location during a particular time period
"Who worked for company X, and at which locations, between Y1-Y2?"
This use case adds another entity, location, and an attribute or pair of attributes to represent a date range.
A location has identity and multiple attributes (a name and an address at a minimum), and is therefore best represented as a vertex. This vertex will help qualify the relationship between a person and a company – a Person WORKED FOR a Company at a particular Location.
This, however, throws up a problem. We can't attribute an edge with a vertex. The structure as we described it above would require something like the following, which is not allowed in a property graph:
This is common to many n-ary modelling scenarios in which we want to relate several entities in a single context. To address this in a property graph, we add an intermediate node:
Intermediate nodes make visible another part of the domain: a hidden or implicit concept with its own informational content and meaningful domain semantics.
If you're struggling to come up with a graph structure that captures the complex interdependencies between several things in your domain, look for the nouns, and hence the domain concepts, hidden in the verb phrases you use to describe the structuring of your domain.
Intermediate nodes are usually self-evident wherever an adverbial phrase qualifies a clause. "Li worked at Example Corp, at the HQ, from 21-11-2013 to 23-03-2016, in the role of Analyst" leads us to introduce an intermediate node that connects Li, Example Corp and the HQ location. This node represents a Job, to which we can attach the date properties from and to. This new vertex type also provides a suitable site for the role property. We'll drop the WORKED FOR relationships and connect vertices with edges whose labels simply indicate the kind of vertex to be found at the other end of the edge – JOB, LOCATION and COMPANY.
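As a minimal, library-free sketch of this structure (plain Python, not the Gremlin code used in this notebook), the example sentence can be written as a dictionary of vertices and a list of labelled edge triples. The helper `out` and the variable names are hypothetical and exist only for illustration; the ids mirror those used in the dataset.

```python
from datetime import datetime

# Vertices keyed by id; each holds a label and its properties.
vertices = {
    'p-3': {'label': 'Person', 'firstName': 'Li', 'lastName': 'Juan'},
    'c-1': {'label': 'Company', 'name': 'Example Corp'},
    'l-1': {'label': 'Location', 'name': 'HQ'},
    # The intermediate Job vertex carries the qualifiers of the employment.
    'j-3': {'label': 'Job', 'role': 'Analyst',
            'from': datetime(2013, 11, 21), 'to': datetime(2016, 3, 23)},
}

# Edges as (source id, edge label, target id). Each label names the kind
# of vertex found at the other end of the edge.
edges = [
    ('p-3', 'JOB', 'j-3'),
    ('j-3', 'COMPANY', 'c-1'),
    ('j-3', 'LOCATION', 'l-1'),
]

def out(vertex_id, label):
    """Return the ids reached by following outgoing edges with the given label."""
    return [dst for src, lbl, dst in edges if src == vertex_id and lbl == label]

# Walk Person -> Job, then Job -> Company and Job -> Location.
job_id = out('p-3', 'JOB')[0]
company = vertices[out(job_id, 'COMPANY')[0]]['name']
location = vertices[out(job_id, 'LOCATION')[0]]['name']
summary = (company, location, vertices[job_id]['role'])
print(summary)
```

Every qualifier of the employment (role, dates, location) hangs off the single Job vertex, so no edge ever needs to point at another edge.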
We'll update our dataset with this new structure.
%load_ext ipython_unittest
%run '../util/neptune.py'
neptune.clear()
g = neptune.graphTraversal()
from datetime import *
(g.
addV('Person').property(id,'p-1').property('firstName','Martha').property('lastName','Rivera').
addV('Person').property(id,'p-2').property('firstName','Richard').property('lastName','Roe').
addV('Person').property(id,'p-3').property('firstName','Li').property('lastName','Juan').
addV('Person').property(id,'p-4').property('firstName','John').property('lastName','Stiles').
addV('Person').property(id,'p-5').property('firstName','Saanvi').property('lastName','Sarkar').
addV('Company').property(id,'c-1').property('name','Example Corp').
addV('Company').property(id,'c-2').property('name','AnyCompany').
addV('Location').property(id,'l-1').property('name','HQ').property('address','100 Main St, Anytown').
addV('Location').property(id,'l-2').property('name','Offices').property('address','Downtown, Anytown').
addV('Location').property(id,'l-3').property('name','Exchange').property('address','50 High St, Anytown').
addV('Job').property(id,'j-1').property('from',datetime(2010,10,20)).property('to',datetime(2017,11,1)).
property('role','Principal Analyst').
addV('Job').property(id,'j-2').property('from',datetime(2011,2,16)).property('to',datetime(2013,9,17)).
property('role','Senior Analyst').
addV('Job').property(id,'j-3').property('from',datetime(2013,11,21)).property('to',datetime(2016,3,23)).
property('role','Analyst').
addV('Job').property(id,'j-4').property('from',datetime(2015,2,2)).property('to',datetime(2018,2,8)).
property('role','Analyst').
addV('Job').property(id,'j-5').property('from',datetime(2011,7,15)).property('to',datetime(2017,10,14)).
property('role','Manager').
addV('Job').property(id,'j-6').property('from',datetime(2012,3,23)).property('to',datetime(2013,11,1)).
property('role','Associate Analyst').
V('c-1').addE('LOCATION').to(V('l-1')).
V('c-1').addE('LOCATION').to(V('l-2')).
V('c-2').addE('LOCATION').to(V('l-3')).
V('p-1').addE('JOB').to(V('j-1')).
V('j-1').addE('COMPANY').to(V('c-1')).
V('j-1').addE('LOCATION').to(V('l-1')).
V('p-2').addE('JOB').to(V('j-2')).
V('j-2').addE('COMPANY').to(V('c-1')).
V('j-2').addE('LOCATION').to(V('l-2')).
V('p-3').addE('JOB').to(V('j-3')).
V('j-3').addE('COMPANY').to(V('c-1')).
V('j-3').addE('LOCATION').to(V('l-1')).
V('p-4').addE('JOB').to(V('j-4')).
V('j-4').addE('COMPANY').to(V('c-1')).
V('j-4').addE('LOCATION').to(V('l-2')).
V('p-5').addE('JOB').to(V('j-5')).
V('j-5').addE('COMPANY').to(V('c-2')).
V('j-5').addE('LOCATION').to(V('l-3')).
V('p-3').addE('JOB').to(V('j-6')).
V('j-6').addE('COMPANY').to(V('c-2')).
V('j-6').addE('LOCATION').to(V('l-3')).
toList())
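To answer this question, the query has to perform the following steps: start at the Company vertex, traverse incoming COMPANY edges to its Job vertices, keep the jobs whose from or to date falls within the period, and then traverse from each remaining job to the Person and the Location.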
%%unittest
results = (g.
V('c-1').in_('COMPANY'). # traverse to Job from Company
or_(
(has('from', between(datetime(2015,1,1), datetime(2018,1,1)))), # filter by date
(has('to', between(datetime(2015,1,1), datetime(2018,1,1))))
).
order().by(id).
project('name', 'location').
by(in_('JOB').values('firstName', 'lastName').fold()). # traverse to Person from Job
by(out('LOCATION').values('name', 'address').fold()). # traverse to Location from Job
toList())
assert results == [{'name': ['Martha', 'Rivera'], 'location': ['HQ', '100 Main St, Anytown']},
{'name': ['Li', 'Juan'], 'location': ['HQ', '100 Main St, Anytown']},
{'name': ['John', 'Stiles'], 'location': ['Offices', 'Downtown, Anytown']}]
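The or_ filter in this query keeps a job when either its from date or its to date falls inside the period, which is sufficient for this dataset. A job that starts before the period and ends after it would match neither branch. As a hedged illustration in plain Python (not Gremlin, and not part of the original query), a more general interval-overlap test could look like the sketch below. The function names are hypothetical, and the boundary handling (inclusive start, exclusive end, following the behaviour of Gremlin's between) is an assumption about what the use case needs.

```python
from datetime import datetime

def endpoint_in_period(job_from, job_to, start, end):
    """Mirror the query's filter: either endpoint lies in [start, end)."""
    return (start <= job_from < end) or (start <= job_to < end)

def overlaps(job_from, job_to, start, end):
    """General overlap test: the job starts before the period ends
    and ends on or after the period starts."""
    return job_from < end and job_to >= start

start, end = datetime(2015, 1, 1), datetime(2018, 1, 1)

# j-1 from the dataset: it ends inside the period, so both tests agree.
j1 = (datetime(2010, 10, 20), datetime(2017, 11, 1))
# Hypothetical job (not in the dataset) that spans the whole period.
spanning = (datetime(2010, 1, 1), datetime(2020, 1, 1))

j1_result = (endpoint_in_period(*j1, start, end), overlaps(*j1, start, end))
spanning_result = (endpoint_in_period(*spanning, start, end),
                   overlaps(*spanning, start, end))
print(j1_result)
print(spanning_result)
```

If jobs that span the entire period should be returned, the same two comparisons could be expressed in the traversal in place of the or_ step; that change is a suggestion rather than something the dataset here requires.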
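In revising the model and moving role from an edge to a vertex, we've broken the test for Query 1. The original query, repeated first below, still traverses WORKED_FOR edges, which no longer exist, so its assertion fails. The revised query that follows reads role from the Job vertex and reaches the company through the COMPANY edge.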
%%unittest
results = (g.V('p-3').
outE('WORKED_FOR').as_('e').
otherV().
project('company', 'role').
by('name').
by(select('e').values('role')).
toList())
assert results == [{'company': 'Example Corp', 'role': 'Analyst'},
{'company': 'AnyCompany', 'role': 'Associate Analyst'}]
%%unittest
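results = (g.V('p-3').
out('JOB'). # traverse to Job from Person
order().by(id). # fix the result order so the assertion is deterministic
project('company', 'role').
by(out('COMPANY').values('name')). # traverse to Company from Job
by('role'). # role is now a property of the Job vertex
toList())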
assert results == [{'company': 'Example Corp', 'role': 'Analyst'},
{'company': 'AnyCompany', 'role': 'Associate Analyst'}]