Use Case 3

Find the people in more senior roles at the companies where X worked

What questions would we have to ask of our data?

"Who were in senior roles at the companies where X worked?"

Once again, we're looking at roles. But now we see there are some structural relations between roles – a role hierarchy. Structural relations are best modelled using edges. Therefore, we will now promote role to become a vertex:

With role now a vertex, our overall data model looks like this:

Sample dataset

Creating some revised sample data

In [1]:
%load_ext ipython_unittest
%run '../util/neptune.py'
In [2]:
neptune.clear()
g = neptune.graphTraversal()
clearing data...
clearing property graph data [edge_batch_size=200, edge_count=Unknown]...
clearing property graph data [vertex_batch_size=200, vertex_count=Unknown]...
clearing rdf data...
done
gremlin: ws://xxxxxxxxxxxxxxxx-xxxxxxxxxxxxxx.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com:8182/gremlin
In [3]:
from datetime import *

(g.
   addV('Person').property(id,'p-1').property('firstName','Martha').property('lastName','Rivera').
   addV('Person').property(id,'p-2').property('firstName','Richard').property('lastName','Roe').
   addV('Person').property(id,'p-3').property('firstName','Li').property('lastName','Juan').
   addV('Person').property(id,'p-4').property('firstName','John').property('lastName','Stiles').
   addV('Person').property(id,'p-5').property('firstName','Saanvi').property('lastName','Sarkar').
   addV('Role').property(id,'r-1').property('name','Analyst').
   addV('Role').property(id,'r-2').property('name','Senior Analyst').
   addV('Role').property(id,'r-3').property('name','Principal Analyst').
   addV('Role').property(id,'r-4').property('name','Associate Analyst').
   addV('Role').property(id,'r-5').property('name','Manager').
   addV('Company').property(id,'c-1').property('name','Example Corp').
   addV('Company').property(id,'c-2').property('name','AnyCompany').
   addV('Location').property(id,'l-1').property('name','HQ').property('address','100 Main St, Anytown').
   addV('Location').property(id,'l-2').property('name','Offices').property('address','Downtown, Anytown').
   addV('Location').property(id,'l-3').property('name','Exchange').property('address','50 High St, Anytown').
   addV('Job').property(id,'j-1').property('from',datetime(2010,10,20)).property('to',datetime(2017,11,1)).
   addV('Job').property(id,'j-2').property('from',datetime(2011,2,16)).property('to',datetime(2013,9,17)).
   addV('Job').property(id,'j-3').property('from',datetime(2013,11,21)).property('to',datetime(2016,3,23)).
   addV('Job').property(id,'j-4').property('from',datetime(2015,2,2)).property('to',datetime(2018,2,8)).
   addV('Job').property(id,'j-5').property('from',datetime(2011,7,15)).property('to',datetime(2017,10,14)).
   addV('Job').property(id,'j-6').property('from',datetime(2012,3,23)).property('to',datetime(2013,11,1)).
   V('r-1').addE('PARENT_ROLE').to(V('r-2')).
   V('r-2').addE('PARENT_ROLE').to(V('r-3')).
   V('r-4').addE('PARENT_ROLE').to(V('r-5')).
   V('c-1').addE('LOCATION').to(V('l-1')).
   V('c-1').addE('LOCATION').to(V('l-2')).
   V('c-2').addE('LOCATION').to(V('l-3')). 
   V('p-1').addE('JOB').to(V('j-1')).
   V('j-1').addE('ROLE').to(V('r-3')).
   V('j-1').addE('COMPANY').to(V('c-1')).
   V('j-1').addE('LOCATION').to(V('l-1')).                            
   V('p-2').addE('JOB').to(V('j-2')).
   V('j-2').addE('ROLE').to(V('r-2')).
   V('j-2').addE('COMPANY').to(V('c-1')).
   V('j-2').addE('LOCATION').to(V('l-2')).                            
   V('p-3').addE('JOB').to(V('j-3')).
   V('j-3').addE('ROLE').to(V('r-1')).
   V('j-3').addE('COMPANY').to(V('c-1')).
   V('j-3').addE('LOCATION').to(V('l-1')).
   V('p-4').addE('JOB').to(V('j-4')).
   V('j-4').addE('ROLE').to(V('r-1')).
   V('j-4').addE('COMPANY').to(V('c-1')).
   V('j-4').addE('LOCATION').to(V('l-2')).                              
   V('p-5').addE('JOB').to(V('j-5')).
   V('j-5').addE('ROLE').to(V('r-5')).
   V('j-5').addE('COMPANY').to(V('c-2')).
   V('j-5').addE('LOCATION').to(V('l-3')).
   V('p-3').addE('JOB').to(V('j-6')).
   V('j-6').addE('ROLE').to(V('r-4')).
   V('j-6').addE('COMPANY').to(V('c-2')).
   V('j-6').addE('LOCATION').to(V('l-3')).
   toList())
Out[3]:
[e[74b3e41d-6a27-c729-036e-41f262e2d822][j-6-LOCATION->l-3]]

Querying the data

Query 3 – Who were in senior roles at the companies where Li worked?

To answer this question, we'll have to perform the following steps:

  1. Start at the Person's vertex
  2. Follow JOB and ROLE edges to Roles
  3. Traverse up Role hierarchy
  4. For each parent Role:
    • Get associated Jobs
    • Filter Jobs by date
    • Get Role and Person details for each Job
In [4]:
%%unittest

results = (g.V('p-3').out('JOB').as_('j1'). # traverse from Person to each Job 
 out('ROLE'). # traverse to job's Role
 repeat(out('PARENT_ROLE')).until(outE().count().is_(0)). # climb the Role hierarchy...
 emit().in_('ROLE').as_('j2'). # for each Role encountered in the hierarchy, traverse to the Jobs associated with that role
   or_(        
     (where('j1', between('j2', 'j2')).by('from').by('from').by('to')), # filter based on the dates of the original Job (j1)
     (where('j1', between('j2', 'j2')).by('to').by('from').by('to')),
     (where('j1', lte('j2').and_(gt('j2'))).by('from').by('from').by('to').by('from'))
   ).
 order().by(id). 
 project('role', 'name'). # for each Job emitted from above
   by(out('ROLE').values('name')). # get Role
   by(in_('JOB').values('firstName', 'lastName').fold()). # get Person's name
 toList())

assert results == [{'role': 'Principal Analyst', 'name': ['Martha', 'Rivera']}, 
                   {'role': 'Manager', 'name': ['Saanvi', 'Sarkar']}]

Success
.
----------------------------------------------------------------------
Ran 1 test in 0.028s

OK
Out[4]:
<unittest.runner.TextTestResult run=1 errors=0 failures=0>

Query 2 – Who worked for Example Corp, and at which locations, between 2015-2017?

In [5]:
%%unittest

results = (g.
 V('c-1').in_('COMPANY').
   or_(        
       (has('from', between(datetime(2015,1,1), datetime(2018,1,1)))),
       (has('to', between(datetime(2015,1,1), datetime(2018,1,1))))
     ).
 order().by(id).
 project('name', 'location').
   by(in_('JOB').values('firstName', 'lastName').fold()).
   by(out('LOCATION').values('name', 'address').fold()).
 toList())

assert results == [{'name': ['Martha', 'Rivera'], 'location': ['HQ', '100 Main St, Anytown']},
 {'name': ['Li', 'Juan'], 'location': ['HQ', '100 Main St, Anytown']},
 {'name': ['John', 'Stiles'], 'location': ['Offices', 'Downtown, Anytown']}]

Success
.
----------------------------------------------------------------------
Ran 1 test in 0.023s

OK
Out[5]:
<unittest.runner.TextTestResult run=1 errors=0 failures=0>

Broken tests

Once again we've broken the test for Query 1. Not surprising given that role has been promoted to a vertex.

Query 1 [BROKEN] – Which companies has Li worked for, and in what roles?

In [6]:
%%unittest

results = (g.V('p-3').
             out('JOB').
             project('company', 'role').
             by(out('COMPANY').values('name')).
             by('role').
             toList())

assert results == [{'company': 'Example Corp', 'role': 'Analyst'}, 
                   {'company': 'AnyCompany', 'role': 'Associate Analyst'}]

Fail
E
======================================================================
ERROR: test_1 (__main__.JupyterTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Cell Tests", line 6, in test_1
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py", line 57, in toList
    return list(iter(self))
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py", line 47, in __next__
    self.traversal_strategies.apply_strategies(self)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py", line 373, in apply_strategies
    traversal_strategy.apply(traversal)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/remote_connection.py", line 143, in apply
    remote_traversal = self.remote_connection.submit(traversal.bytecode)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/driver_remote_connection.py", line 54, in submit
    results = result_set.all().result()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/resultset.py", line 81, in cb
    f.result()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/connection.py", line 78, in _receive
    status_code = self._protocol.data_received(data, self._results)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/protocol.py", line 98, in data_received
    "{0}: {1}".format(status_code, message["status"]["message"]))
gremlin_python.driver.protocol.GremlinServerError: 500: {"requestId":"6123e051-ad82-41da-8953-07bfe9dbab88","code":"InternalFailureException","detailedMessage":"The property does not exist as the key has no associated value for the provided element: v[j-3]:role"}

----------------------------------------------------------------------
Ran 1 test in 0.012s

FAILED (errors=1)
Out[6]:
<unittest.runner.TextTestResult run=1 errors=1 failures=0>

Query 1 (revised) – Which companies has Li worked for, and in what roles?

In [7]:
%%unittest

results = (g.V('p-3').
             out('JOB').
             project('company', 'role').
             by(out('COMPANY').values('name')).
             by(out('ROLE').values('name')).
             toList())

assert results == [{'company': 'Example Corp', 'role': 'Analyst'}, 
                   {'company': 'AnyCompany', 'role': 'Associate Analyst'}]

Success
.
----------------------------------------------------------------------
Ran 1 test in 0.014s

OK
Out[7]:
<unittest.runner.TextTestResult run=1 errors=0 failures=0>