
# Neptune Ontology Example
This notebook shows how to use a semantic ontology in Amazon Neptune. We use the W3C Organization Ontology (https://www.w3.org/TR/vocab-org/), which is defined in OWL.

For more context, read the AWS blog post https://aws.amazon.com/blogs/database/model-driven-graphs-using-owl-in-amazon-neptune/

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

Begin by setting up. Run the next cell so the notebook can read Neptune data from the S3 bucket provisioned for you.

In [None]:
import os
import subprocess

stream = os.popen("source ~/.bashrc ; echo $STAGE_BUCKET; echo $M2C_ANALYSIS_BUCKET")
lines=stream.read().split("\n")
STAGING_BUCKET=lines[0]
STAGING_BUCKET

## Loading the Ontology and Examples into Neptune

First, load the organizational ontology into Neptune. The ontology is written as a set of RDF triples in Turtle form. Load it using Neptune's loader; modify the -s argument if the S3 bucket name does not match yours. You will be prompted with a submit form. Click Submit to run the loader, and check it completes successfully.


In [None]:
%load -s s3://{STAGING_BUCKET}/data/org.ttl -f turtle --named-graph-uri=http://www.w3.org/ns/org

Next, load the sample data set, which depicts a fictional organization and its member structure. Use the same approach as above, and modify the S3 bucket name if necessary.

In [None]:
%load -s s3://{STAGING_BUCKET}/data/example_org.ttl -f turtle --named-graph-uri=http://amazonaws.com/db/neptune/examples/ontology/org

Finally, load a contrived ontology meant to test edge cases not covered by the org ontology. Modify the S3 bucket name if necessary.

In [None]:
%load -s s3://{STAGING_BUCKET}/data/tester_ontology.ttl -f turtle --named-graph-uri=http://amazonaws.com/db/neptune/examples/ontology/tester

## Querying Org Ontology

Let's query the organizational ontology to discover classes and properties. We start with a high-level picture of the classes. The first query finds OWL classes along with their keys, their equivalent classes, and the classes each one is a subclass of. Among the classes shown in the results are expected ones like http://www.w3.org/ns/org#Organization and http://www.w3.org/ns/org#Role. We also see peculiar classes that are blank nodes, whose identifiers begin with the letter b. We will make sense of these later in the notebook when we build a model.

In [None]:
%%sparql

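# You will notice some of the classes or related classes are blank nodes. 
# We need to drill down and see what they include.
# Not here, though.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>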

select ?class 
 (GROUP_CONCAT(distinct ?subOf;SEPARATOR=",") AS ?subsOf)
 (GROUP_CONCAT(distinct ?equiv;SEPARATOR=",") AS ?equivs)
 (GROUP_CONCAT(distinct ?key;SEPARATOR=",") AS ?keys) where { 
 ?class rdf:type owl:Class .
 OPTIONAL { ?class rdfs:subClassOf ?subOf . } .
 OPTIONAL { ?class owl:equivalentClass ?equiv . } .
 OPTIONAL { ?class owl:hasKey ?keylist . ?keylist rdf:rest*/rdf:first ?key . } .
} group by ?class 
order by ?class
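
The `rdf:rest*/rdf:first` path in the `owl:hasKey` clause above walks an RDF collection, which is stored as a linked list of blank nodes. The following is a minimal sketch of that traversal in plain Python, assuming the triples are held as tuples. The node labels `b1` and `b2` and the property names `ex:id` and `ex:name` are made up for illustration and do not come from the org ontology.

```python
RDF_FIRST = "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"
RDF_REST = "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"
RDF_NIL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"

def collection_members(triples, head):
    """Return the members of the RDF collection that starts at `head`."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    members = []
    seen = set()  # guards against malformed, cyclic lists
    node = head
    while node != RDF_NIL and node in first and node not in seen:
        seen.add(node)
        members.append(first[node])
        node = rest.get(node, RDF_NIL)
    return members

# Hypothetical two-element key list: (ex:id ex:name)
triples = [
    ("b1", RDF_FIRST, "ex:id"),
    ("b1", RDF_REST, "b2"),
    ("b2", RDF_FIRST, "ex:name"),
    ("b2", RDF_REST, RDF_NIL),
]
print(collection_members(triples, "b1"))
```

Each list cell contributes one `rdf:first` value, which is why the query can flatten a whole key list into a single `GROUP_CONCAT` column.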


Now let's connect properties to classes. We list properties whose domain is one of the classes from the results above. For each we also get the range and the property type. The results mostly make sense, but we continue to see blank nodes. For example, the class associated with the http://www.w3.org/ns/org#role property is blank. We make sense of this later in the notebook.

In [None]:
%%sparql 

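PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>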

select ?class ?prop ?range 
(GROUP_CONCAT(distinct ?propType;SEPARATOR=",") AS ?propTypes) where { 
 ?class rdf:type owl:Class .
 ?prop rdfs:domain ?class .
 ?prop rdf:type ?propType .
 OPTIONAL {?prop rdfs:range ?range } .
} 
group by ?class ?prop ?range
order by ?class ?prop 
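
Blank nodes can also be recognized without relying on the leading letter of the identifier. In the standard SPARQL 1.1 JSON results format, every binding carries a `type` field whose value is `uri`, `literal`, or `bnode`. The sketch below assumes a result row in that format; the sample row and the identifier `b0` are illustrative only, not actual query output.

```python
def is_bnode_binding(binding):
    """True if a SPARQL JSON binding refers to a blank node."""
    return binding.get("type") == "bnode"

def bnode_columns(row):
    """Names of the columns in one result row that are bound to blank nodes."""
    return sorted(name for name, b in row.items() if is_bnode_binding(b))

# Hypothetical row: a property whose domain class is a blank node
row = {
    "class": {"type": "bnode", "value": "b0"},
    "prop": {"type": "uri", "value": "http://www.w3.org/ns/org#role"},
}
print(bnode_columns(row))
```

Checking the `type` field avoids misclassifying a value that merely happens to start with the letter b.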

## Querying Example Data

Now let's query the example organization to discover orgs, suborgs, employees and roles. First, we list organizations, suborganizations, and organizational units, as well as the sites of the organizations. 

In [None]:
%%sparql

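PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX org: <http://www.w3.org/ns/org#>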

select ?orgName ?subName ?unitName ?siteName where {
 ?org rdf:type org:Organization .
 ?org rdfs:label ?orgName .
 OPTIONAL { ?org org:hasSubOrganization/rdfs:label ?subName } .
 OPTIONAL { ?org org:hasUnit/rdfs:label ?unitName . } .
 OPTIONAL { ?org org:hasSite/rdfs:label ?siteName . }
} order by ?orgName

Let's also check organizational history. Run the next query to see a change event.

In [None]:
%%sparql

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX org: <http://www.w3.org/ns/org#>

select ?event ?prop ?obj where {
 ?event rdf:type org:ChangeEvent .
 ?event ?prop ?obj .
} order by ?event ?prop

Now let's list some of the people in these organizations. Notice in the query results the org:memberOf and org:basedAt relationships, which tie the person to an organization and a site.

In [None]:
%%sparql

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

select ?person ?prop ?obj where {
 ?person rdf:type foaf:Person .
 ?person ?prop ?obj .
} order by ?person ?prop

Let's run a path query to see the hierarchical structure of MegaFinancial.

In [None]:
%%sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ex: 

select ?personName ?boss (GROUP_CONCAT(?superiorName;SEPARATOR=",") AS ?superiors) where {
 ?person org:memberOf ex:Org-MegaFinancial .
 ?person rdfs:label ?personName .
 OPTIONAL {
 ?person org:reportsTo/rdfs:label ?boss .
 ?person org:reportsTo+ ?superior .
 ?superior rdfs:label ?superiorName .
 } .
} group by ?personName ?boss
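
The `org:reportsTo+` path in the query above matches one or more `reportsTo` hops, so it returns every superior in the chain, not only the direct boss. A minimal Python sketch of the same idea follows. It assumes each person reports to at most one other person, which the ontology itself does not require, and the names in `reports_to` are hypothetical.

```python
def superiors(reports_to, person):
    """Follow reports_to links upward, like the org:reportsTo+ property path."""
    chain = []
    seen = {person}  # stops the walk if the data contains a reporting cycle
    current = reports_to.get(person)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = reports_to.get(current)
    return chain

# Hypothetical reporting lines
reports_to = {"analyst": "manager", "manager": "director", "director": "ceo"}
print(superiors(reports_to, "analyst"))
```

The first element of the chain corresponds to `?boss` in the query, and the full chain corresponds to the concatenated `?superiors` column.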


Finally, let's see roles and posts in the MegaSystems organization. Run the next two queries.

In [None]:
%%sparql

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ex: 

select ?post ?postHolder where {
 ?post rdf:type org:Post .
 ?post org:postIn ex:Org-MegaSystems . 
 OPTIONAL {
 ?postHolder org:holds ?post .
 }
}

In [None]:
%%sparql

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ex: 

select ?role ?roleHolder where {
 ?role rdf:type org:Role .
 ?membership rdf:type org:Membership .
 ?membership org:role ?role .
 ?membership org:organization ex:Org-MegaSystems .
 ?membership org:member ?roleHolder
}

## Enforcing the Ontology!

Now let's bring things together. We need to understand the purpose of those blank nodes above! We also need to check whether our sample data matches the structure expected by the ontology. Finally, let's make use of that structure to insert new members and orgs, guided by a boilerplate structure. 

### Build the Model
The first step is to gather a bit more information from the ontology. We need to "fill in the blanks!". Run the next cell to obtain a complete picture of the ontology. The code that follows runs several queries and brings them together into an opinionated interface, or model, of classes and expected properties.

In [None]:
from IPython.utils import io

# check if uri is bnode or not
def is_bnode(uri):
    return uri.startswith("b")

# check if list contains the val
def list_has_value(list, val):
    try:
        list.index(val)
        return True
    except ValueError:
        return False

# run sparql magic on the specified query. return the results
def run_query(q):
    with io.capture_output() as captured:
        ipython = get_ipython()
        mgc = ipython.run_cell_magic
        mgc(magic_name = "sparql", line = "--store-to query_res", cell=q)
    return query_res["results"]["bindings"]
 

# build our model
def build_model():
 
 # Out of scope OWL stuff for this example: 
 # AllDisjointClasses, disjointUnionOf
 # assertions - same/diff ind, obj/data prop assertion, neg obj/data prop assertion
 # annotations
 # top/bottom property
 # restriction onProperties;
 # but restriction onProperty IS supported 
 # cardinality
 # but will consider FunctionalProperty
 # Datatype and data ranges
 
 # Limitation: for datatype properties, consider only strings.
 
 CLASS_QUERY = """
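PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>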

select ?class 
 (GROUP_CONCAT(distinct ?subOf;SEPARATOR=",") AS ?subsOf)
 (GROUP_CONCAT(distinct ?equiv;SEPARATOR=",") AS ?equivs)
 (GROUP_CONCAT(distinct ?complement;SEPARATOR=",") AS ?complements) 
 (GROUP_CONCAT(distinct ?keyList;SEPARATOR=",") AS ?keys) 
 (GROUP_CONCAT(distinct ?kentry;SEPARATOR=",") AS ?keyEntries) 
 (GROUP_CONCAT(distinct ?uList;SEPARATOR=",") AS ?unions) 
 (GROUP_CONCAT(distinct ?iList;SEPARATOR=",") AS ?intersections) 
 (GROUP_CONCAT(distinct ?ientry;SEPARATOR=",") AS ?intersectionEntries) 
 (GROUP_CONCAT(distinct ?oneList;SEPARATOR=",") AS ?oneOfs) 
 (GROUP_CONCAT(distinct ?disj;SEPARATOR=",") AS ?disjoints) 
 where { 
 ?class rdf:type owl:Class .
 OPTIONAL { ?class rdfs:subClassOf+ ?subOf . } .
 OPTIONAL { ?class owl:equivalentClass+ ?equiv . } .
 OPTIONAL { ?class owl:complementOf ?complement . } .
 OPTIONAL { ?class owl:hasKey ?keyList . } .
 OPTIONAL { ?class owl:hasKey ?kl . ?kl rdf:rest*/rdf:first ?kentry . } .
 OPTIONAL { ?class owl:unionOf ?uList . } . 
 OPTIONAL { ?class owl:intersectionOf ?iList . } . 
 OPTIONAL { ?class owl:intersectionOf ?il . ?il rdf:rest*/rdf:first ?ientry . } .
 OPTIONAL { ?class owl:oneOf ?oneList . } .
 OPTIONAL { ?class owl:disjointWith ?disj . } . 
} group by ?class
 """

 PROP_QUERY = """
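PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>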

select ?prop 
 (GROUP_CONCAT(distinct ?subPropOf;SEPARATOR=",") AS ?subsOf) 
 (GROUP_CONCAT(distinct ?equiv;SEPARATOR=",") AS ?equivs) 
 (GROUP_CONCAT(distinct ?domain;SEPARATOR=",") AS ?domains) 
 (GROUP_CONCAT(distinct ?du;SEPARATOR=",") AS ?domainUs) 
 (GROUP_CONCAT(distinct ?range;SEPARATOR=",") AS ?ranges) 
 (GROUP_CONCAT(distinct ?ru;SEPARATOR=",") AS ?rangeUs) 
 (GROUP_CONCAT(distinct ?disj;SEPARATOR=",") AS ?disjoints) 
 (GROUP_CONCAT(distinct ?inv;SEPARATOR=",") AS ?inverses) 
 (GROUP_CONCAT(distinct ?type;SEPARATOR=",") AS ?types) 
 where {

 { ?prop rdf:type rdf:Property . }
 UNION
 { ?prop rdf:type owl:ObjectProperty . }
 UNION
 { ?prop rdf:type owl:DatatypeProperty . } .
 OPTIONAL { ?prop rdfs:subPropertyOf+ ?subPropOf . } .
 OPTIONAL { ?prop owl:equivalentProperty+ ?equiv . } .
 OPTIONAL { ?prop rdfs:domain ?domain } .
 OPTIONAL { ?prop rdfs:domain/owl:unionOf ?u . ?u rdf:rest*/rdf:first ?du . } .
 OPTIONAL { ?prop rdfs:range ?range } .
 OPTIONAL { ?prop rdfs:range/owl:unionOf ?u1 . ?u1 rdf:rest*/rdf:first ?ru . } .
 OPTIONAL { ?prop owl:propertyDisjointWith ?disj . } . 
 OPTIONAL { { ?prop owl:inverseOf ?inv } UNION { ?inv owl:inverseOf ?prop } } . 
 ?prop rdf:type ?type . # allows us to check functional, transitive, etc
} 
group by ?prop
 """

 RESTRICTION_QUERY ="""
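PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>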

select ?restriction ?prop 
 (GROUP_CONCAT(distinct ?allClass;SEPARATOR=",") AS ?allFromClasses)
 (GROUP_CONCAT(distinct ?someClass;SEPARATOR=",") AS ?someFromClasses)
 (GROUP_CONCAT(distinct ?lval;SEPARATOR=",") AS ?lvals) 
 (GROUP_CONCAT(distinct ?ival;SEPARATOR=",") AS ?ivals) 
 where { 
 ?restriction rdf:type owl:Restriction .
 ?restriction owl:onProperty ?prop .
 OPTIONAL { ?restriction owl:allValuesFrom ?allClass . } .
 OPTIONAL { ?restriction owl:someValuesFrom ?someClass . } .
 OPTIONAL { ?restriction owl:hasValue ?lval . FILTER(isLiteral(?lval)) . } .
 OPTIONAL { ?restriction owl:hasValue ?ival . FILTER(!isLiteral(?ival)) . } .
} group by ?restriction ?prop
 """

 LIST_QUERY = """
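PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>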

select ?list (GROUP_CONCAT(distinct ?entity;SEPARATOR=",") AS ?entities) where { 
 ?subject owl:unionOf|owl:intersectionOf|owl:oneOf|owl:onProperties|owl:members|owl:disjointUnionOf|owl:propertyChainAxiom|owl:hasKey ?list .
 OPTIONAL {?list rdf:rest*/rdf:first ?entity . } .
} group by ?list
 """

 # sub-function to run a sparql query and transform it
 # the transform works like this
 # sparql result: [ { "col1": { "value": "a" }, "col2": { "value": "b,c" }, "col3": { "value": "d" } } ]
 # transformed: { "a": { "col1": "a", "col2": ["b", "c"], "col3": "d" } } (plus bookkeeping fields)
 # Here "col1" is the key, so the "a" becomes the key
 # "b,c" is comma-sep value and is transformed to list ["b", "c"]
 # "col3" is a single, so its val is "d" rather than ["d"]
 def run_model_query(q, key, singles):
 res = run_query(q)
 result_dict = {}
 for rec in res:
 this_rec = {"visited": False, "visitedForProps": False, "discoveredProps": [], "restrictedProps": []}
 for rec_key in rec:
 val = str(rec[rec_key]["value"])
 if rec_key == key:
 this_rec[rec_key] = val
 result_dict[val] = this_rec
 elif list_has_value(singles, rec_key) :
 this_rec[rec_key] = val
 elif val == "":
 this_rec[rec_key] = []
 else:
 toks = val.split(",")
 this_rec[rec_key] = toks

 return result_dict 

 # run the queries
 class_res = run_model_query(CLASS_QUERY, "class", [])
 prop_res = run_model_query(PROP_QUERY, "prop", [])
 restriction_res = run_model_query(RESTRICTION_QUERY, "restriction", ["prop"])
 list_res = run_model_query(LIST_QUERY, "list", [])
 classes = list(class_res.keys())
 props = list(prop_res.keys())
 restrictions = list(restriction_res.keys())
 lists = list(list_res.keys())

 # 
 # Walk functions. If a class/prop refers to a bnode, let's drill down and see what that bnode is.
 # Walk the bnode too, and capture its structure in the parent class/prop.
 # Example, suppose a class has a subClassOf b, where b is a bnode. 
 # What is that bnode? It might be a class that is a restriction on a property. 
 # That's useful to know, so we capture that expanded view in the parent class.
 #

 def make_walked_node(b, v):
 return {"bnode": b, "obj": v}

 def expand_list(rec, keys):
 for list_type in keys:
 new_list = []
 for entry in rec[list_type]:
 if is_bnode(entry):
 new_list.append(make_walked_node(entry, walk(entry)))
 else:
 new_list.append(entry)
 rec[list_type+"_expand"] = new_list
 
 
 def walk(entry):
 if list_has_value(classes, entry):
 return walk_class(entry)
 elif list_has_value(restrictions, entry):
 return walk_restriction(entry)
 elif list_has_value(props, entry):
 return walk_prop(entry)
 elif list_has_value(lists, entry):
 return walk_list(entry)
 else:
 return entry

 def walk_list(l):
 #print("visit list " + l)
 if list_has_value(lists, l):
 rec = list_res[l]
 if rec["visited"]:
 return rec
 else:
 new_list = []
 expand_list(rec, ["entities"])
 rec["visited"] = True
 return rec
 else:
 return l
 
 
 def walk_class(clazz):
 #print("visit class " + clazz)
 if list_has_value(classes, clazz):
 rec = class_res[clazz]
 if rec["visited"]:
 return rec
 else:
 expand_list(rec, ["keys", "subsOf", "equivs", "complements", "disjoints", "unions", "intersections"])
 rec["visited"] = True
 return rec
 else:
 return clazz

 def walk_prop(prop):
 #print("visit prop " + prop)
 if list_has_value(props, prop):
 rec = prop_res[prop]
 if rec["visited"]:
 return rec
 else:
 expand_list(rec, ["subsOf", "equivs", "inverses", "disjoints", "domains", "ranges"])
 rec["functional"] = list_has_value(rec["types"], "http://www.w3.org/2002/07/owl#FunctionalProperty")
 if list_has_value(rec["types"], "http://www.w3.org/2002/07/owl#ObjectProperty"):
 rec["propType"] = "ObjectProperty"
 elif list_has_value(rec["types"], "http://www.w3.org/2002/07/owl#DatatypeProperty"):
 rec["propType"] = "DatatypeProperty"
 elif list_has_value(rec["types"], "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"):
 rec["propType"] = "Property" 
 rec["visited"] = True
 return rec 
 else:
 return prop

 def walk_restriction(restriction) :
 #print("visit restriction " + restriction)
 if list_has_value(restrictions, restriction):
 rec = restriction_res[restriction]
 if rec["visited"]:
 return rec
 else:
 if is_bnode(rec["prop"]):
 rec["prop"] = make_walked_node(rec["prop"], walk(rec["prop"]))
 expand_list(rec, ["allFromClasses", "someFromClasses"])
 rec["visited"] = True
 return rec
 else:
 return restriction

 # walk the properties and classes, bringing in dependencies like lists, restrictions, and related classes
 for entry in prop_res:
 walk_prop(entry)
 for entry in class_res:
 walk_class(entry)

 # for the given prop, if it belongs to expected_clazz, return the prop plus super-props
 def get_props(prop, expected_clazz):
 if list_has_value(props, prop):
 candidate = False
 if expected_clazz == None:
 candidate = True
 else:
 # class is domain
 for dom in prop_res[prop]["domains"]:
 if dom == expected_clazz:
 candidate = True
 break
 # domain is union and includes class
 for dom in prop_res[prop]["domainUs"]:
 if dom == expected_clazz:
 candidate = True
 break
 if candidate:
 # return this prop and props of which the prop is subsOf
 return list(set([prop] + prop_res[prop]["subsOf"]))
 else:
 return []
 else:
 return []

 # recursively walk the class, looking for properties.
 def walk_class_for_props(clazz):
 #print("visit " + clazz)
 # am i a class or a restriction?
 if list_has_value(restrictions, clazz):
 if not(restriction_res[clazz]["visitedForProps"]):
 #print(" restriction visit " + clazz)
 restriction_res[clazz]["visitedForProps"] = True
 prop_uri = restriction_res[clazz]["prop"]
 restriction_res[clazz]["restrictedProps"] = [{
 "prop": prop_uri,
 "restriction": clazz,
 "all" : restriction_res[clazz]["allFromClasses"], 
 "some": restriction_res[clazz]["someFromClasses"],
 "lvals": restriction_res[clazz]["lvals"],
 "ivals": restriction_res[clazz]["ivals"] }]
 return restriction_res[clazz]
 elif list_has_value(classes, clazz):
 if not(class_res[clazz]["visitedForProps"]):
 #print(" class visit " + clazz)
 
 # if i'm not a bnode, get all props that apply to me
 if not(is_bnode(clazz)):
 for prop in props:
 class_res[clazz]["discoveredProps"] = list(set(class_res[clazz]["discoveredProps"] + get_props(prop, clazz)))
 
 for list_type in ["subsOf", "intersectionEntries", "equivs"]:
 for entry in class_res[clazz][list_type]:
 can_use = list_has_value(classes, entry) or list_has_value(restrictions, entry)
 if list_type == 'equivs' and is_bnode(entry) == False:
 can_use = False
 if can_use:
 # recurse for subsOf, intersectionEntries, equivs (restrictions only)
 recurse_result = walk_class_for_props(entry)
 class_res[clazz]["discoveredProps"] = list(set( class_res[clazz]["discoveredProps"] + recurse_result["discoveredProps"]))
 class_res[clazz]["restrictedProps"] += recurse_result["restrictedProps"]
 class_res[clazz]["visitedForProps"] = True
 return class_res[clazz]
 
 else:
 print(" VERY BAD visit " + clazz)
 return None 
 
 # for each class determine the properties by walking
 for entry in class_res:
 if not(is_bnode(entry)):
 walk_class_for_props(entry)

 # return the model - the classes and properties discovered
 return {
 "classes": class_res,
 "props": prop_res
 }

# Print the model
def print_model_summary(model) :
 for clazz in model["classes"]:
 if is_bnode(clazz) == False:
 print("Class " + clazz) 
 print("\tkeys " + str(model["classes"][clazz]["keyEntries"]))
 print("\n")
 for r in model["classes"][clazz]["restrictedProps"]:
 print("\tRestriction on prop " + r["prop"])
 print("\t\tall " + str(r["all"]))
 print("\t\tsome " + str(r["some"]))
 print("\t\tliteral values " + str(r["lvals"]))
 print("\t\tobject values " + str(r["ivals"]))
 for prop in model["classes"][clazz]["discoveredProps"]:
 print("\tProp " + prop)
 if prop in model["props"]:
 prop_def = model["props"][prop]
 print("\t\ttype " + prop_def["propType"])
 print("\t\tfunctional " + str(prop_def["functional"]))
 print("\t\tinverses " + str(prop_def["inverses"]))
 print("\t\trange " + str(prop_def["ranges"]))
 print("\t\trangeUnionOf " + str(prop_def["rangeUs"]))
 
 
model = build_model()
print_model_summary(model)
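
The reshaping step that `run_model_query` performs on each query result can be seen in isolation. The sketch below is a simplified stand-in, not the notebook's function: it omits the bookkeeping fields (`visited`, `discoveredProps`, and so on), and the blank node id `b7` and the `http://example.org/Other` class are hypothetical sample data.

```python
def transform_bindings(bindings, key, singles):
    """Index SPARQL JSON bindings by `key`, splitting GROUP_CONCAT columns into lists."""
    result = {}
    for rec in bindings:
        out = {}
        for col, cell in rec.items():
            val = str(cell["value"])
            if col == key or col in singles:
                out[col] = val             # keep as a single string
            elif val == "":
                out[col] = []              # empty GROUP_CONCAT means no values
            else:
                out[col] = val.split(",")  # comma-separated values become a list
        result[out[key]] = out
    return result

# Hypothetical restriction row
bindings = [{
    "restriction": {"value": "b7"},
    "prop": {"value": "http://www.w3.org/ns/org#role"},
    "allFromClasses": {"value": "http://www.w3.org/ns/org#Role,http://example.org/Other"},
    "someFromClasses": {"value": ""},
}]
model_rows = transform_bindings(bindings, "restriction", ["prop"])
print(model_rows["b7"]["allFromClasses"])
```

Splitting on a comma works here because the concatenated values are IRIs and blank node ids; a value that itself contained a comma would be split incorrectly, so a different separator may be safer for literal-valued columns.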



### Generation
Finally, given the interface we determined above, let's generate sample Turtle. This acts as our boilerplate for new data.

In [None]:
counter = {"current": 0}

# Prefixes for generated Turtle
SAMPLE_HEADER = """
@base .
@prefix ex: .
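@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .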
"""

# generate samples instances for clazz based on model
def generate_sample(model, clazz):

 # start building Turtle
 gen_result = {"ttl": ""}
 
 # Generate sample URI
 def sample_uri(clazz):
 #clazz is an IRI. Get the last token, which follows either the last / or a #
 counter["current"]+= 1
 inst_num = counter["current"]
 clazz_name = clazz.split("/")[-1].split("#")[-1]
 return clazz_name + "-sample-" + str(inst_num)
 
 inst_name = sample_uri(clazz)
 class_def = model["classes"][clazz]
 props = model["props"]
 keys = class_def["keyEntries"]
 discovered_props = class_def["discoveredProps"]
 restricted_props = class_def["restrictedProps"]
 last_idx = 0
 
 # In Turtle, instance has rdf:type that is clazz
 gen_result["ttl"] += f"""
#
# Sample for class {clazz}
# 

# Instantiate
ex:{inst_name} rdf:type <{clazz}> .
 """
 
 #
 # finder helpers
 # 
 
 def find_restricted(prop):
 for entry in restricted_props:
 if entry["prop"] == prop:
 # could there be more than one entry with prop;
 # not sure how; take the first one
 return entry
 return None
 def find_discovered(prop):
 if list_has_value(discovered_props, prop):
 if prop in props: 
 return props[prop]
 else:
 return None
 else:
 return None

 # Based on the model, generate Turtle properties of instance
 def generate_props(prop, inst_name, comment):
 r = find_restricted(prop)
 d = find_discovered(prop)
 if r==None:
 if d==None:
 # Generic case. We have neither a restriction nor a property def.
 # Just assign it a string value
 gen_result["ttl"] += f"""
# {comment} 
ex:{inst_name} <{prop}> "some value" .
# Don't have property definition on hand. Using sample string value.
 """
 else:
 # It's not a restriction and we have a property def.
 # Turtle uses facts about the prop. If-then for object vs datatype
 just_one = d["functional"]
 sample_obj_type = None
 sample_obj_prefix = None
 all_ranges = []
 for r in d["ranges"] + d["rangeUs"]:
 if is_bnode(r) == False:
 if sample_obj_type == None:
 sample_obj_type = "<" + r + ">"
 sample_obj_prefix = r
 all_ranges.append(r)
 if sample_obj_type == None:
 # no range! use a default
 sample_obj_type = "owl:Thing"
 sample_obj_prefix = "not/sure/Anything"
 
 extra_comment = "This is functional: only one " if d["functional"] else "Multiple values allowed"
 if d["propType"] == "ObjectProperty":
 uri = sample_uri(sample_obj_prefix)
 gen_result["ttl"] += f"""
# {comment} - {extra_comment}
ex:{inst_name} <{prop}> ex:{uri} .
ex:{uri} rdf:type {sample_obj_type} .
# ... and fill in the details of ex:{uri} 
# all ranges {all_ranges}
 """
 else:
 # will keep it simple with non-objects: everything is just a string
 # so no other literal types, no value constraints, etc
 gen_result["ttl"] += f"""
# {comment} - {extra_comment}
ex:{inst_name} <{prop}> "sample value" .
# actual ranges {all_ranges}
 """
 
 else:
 # It's a restriction
 functional = False if d==None else d["functional"]
 if len(r["lvals"]) > 0:
 gen_result["ttl"] += f"""
# {comment} - restricted on value; value is literal
ex:{inst_name} <{r["prop"]}> "{r["lvals"][0]}" .
# allowed values: {r["lvals"]}
 """
 elif len(r["ivals"]) > 0:
 gen_result["ttl"] += f"""
# {comment} - restricted on value; value is IRI
ex:{inst_name} <{r["prop"]}> <{r["ivals"][0]}> .
# allowed values: {r["ivals"]}
 """
 elif len(r["some"]) > 0:
 uri = sample_uri(r["some"][0])
 gen_result["ttl"] += f"""
# {comment} - restricted: some values from
ex:{inst_name} <{r["prop"]}> <{uri}> .
<{uri}> rdf:type <{r["some"][0]}> .
# values: {r["some"]}
 """
 elif len(r["all"]) > 0:
 uri = sample_uri(r["all"][0])
 gen_result["ttl"] += f"""
# {comment} - restricted: all values from
ex:{inst_name} <{r["prop"]}> <{uri}> .
<{uri}> rdf:type <{r["all"][0]}> .
# values: {r["all"]}
 """
 
 # In Turtle, need one property for each key
 for key in keys:
 generate_props(key, inst_name, "Add key")
 
 # In Turtle, need restrictions. If key, don't do
 for r in restricted_props:
 if list_has_value(keys, r["prop"]) == False:
 generate_props(r["prop"], inst_name, "Add a restriction")
 
 # In Turtle, for all other props (non-keys, non-restrictions), add prop to instance.
 for d in discovered_props:
 if list_has_value(keys, d) == False and find_restricted(d) == None:
 generate_props(d, inst_name, "Add prop in domain")
 
 # Return the Turtle
 return gen_result["ttl"]

print(SAMPLE_HEADER)
for clazz in model["classes"]:
 if is_bnode(clazz) == False:
 runnable_ttl = generate_sample(model, clazz)
 print(runnable_ttl)
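
The sample instance names produced above come from the local part of the class IRI, which is the text after the last `/` or `#`. Here is a small standalone sketch of that naming rule. The helper names `local_name` and `sample_name` are illustrative and differ from the nested `sample_uri` function in the cell above.

```python
import itertools

_counter = itertools.count(1)  # shared counter so every sample name is unique

def local_name(iri):
    """Return the part of an IRI after its last '/' or '#'."""
    return iri.split("/")[-1].split("#")[-1]

def sample_name(iri):
    """Build a sample instance name such as 'Organization-sample-1'."""
    return f"{local_name(iri)}-sample-{next(_counter)}"

print(local_name("http://www.w3.org/ns/org#Organization"))
print(local_name("http://xmlns.com/foaf/0.1/Person"))
print(sample_name("http://www.w3.org/ns/org#Role"))
```

The same rule explains why the fallback prefix `not/sure/Anything` used for properties without a range yields sample names that start with `Anything`.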


### Validation
Now let's validate. We will compare the structure of our example org with the expected interface determined above. 

In [None]:
# validate instances against model
def validate_instances(model):

 # pull instances and their triples
 INSTANCE_QUERY = """
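PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>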

select * where {
 ?class rdf:type owl:Class .
 ?inst rdf:type ?class .
 ?inst ?prop ?obj .
 OPTIONAL { ?obj rdf:type ?objType . } .
 BIND (isLiteral(?obj) as ?lit)
} order by ?class ?inst 
"""

 # validation ignores the typical naming stuff
 IGNORES = [
 "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", 
 "http://www.w3.org/2000/01/rdf-schema#label",
 "http://www.w3.org/2000/01/rdf-schema#comment",
 "http://www.w3.org/2004/02/skos/core#prefLabel",
 "http://www.w3.org/2004/02/skos/core#altLabel"
 ]

 # run the instance query and transform into hierarchical result
 # hierarchy: class - instance - prop
 # easier to validate in that form
 def run_inst_query():
 res = run_query(INSTANCE_QUERY)
 hier_result = {}

 for rec in res:
 clazz = rec["class"]["value"]
 inst = rec["inst"]["value"]
 prop = rec["prop"]["value"]
 obj = rec["obj"]["value"] 
 obj_type = rec["objType"]["value"] if "objType" in rec else "" 
 lit = True if rec["lit"]["value"] == "true" else False 
 if not(clazz in hier_result):
 hier_result[clazz] = { "clazz": clazz, "instances": {} }
 if not(inst in hier_result[clazz]["instances"]):
 hier_result[clazz]["instances"][inst] = { "instance": inst, "props": [] }
 hier_result[clazz]["instances"][inst]["props"].append({
 "prop": prop,
 "object": obj,
 "objectType": obj_type,
 "literal": lit
 })
 return hier_result
 
 # print a finding for validation summary
 def print_finding(clazz, inst, prop_assignment, finding):
 print(f"""
Finding in class: {clazz} 
Instance: {inst}.
Prop assignment: {prop_assignment}
Finding: {finding}
 """)
 
 # pull the instances and validate! notice we navigate the hierarchy form
 # The logic is clear if you focus on each call to print_finding.
 inst_summary = run_inst_query()
 for clazz in inst_summary:
 if clazz in model["classes"]:
 class_spec = model["classes"][clazz]
 for inst in inst_summary[clazz]["instances"]:
 # track stuff instance wide. want to check it has keys, has at most one functional, has at least one restrictSome
 tracker = { 
 "keys": {},
 "functionals": {},
 "restrictSome": {}
 }
 for k in class_spec["keyEntries"]: 
 tracker["keys"][k] = 0
 dprops = class_spec["discoveredProps"]
 rprops = class_spec["restrictedProps"]
 for prop_assignment in inst_summary[clazz]["instances"][inst]["props"]:
 prop = prop_assignment["prop"]
 obj = prop_assignment["object"]
 obj_type = prop_assignment["objectType"]
 literal = prop_assignment["literal"]
 if list_has_value(IGNORES, prop) == False:
 # key usage
 if prop in tracker["keys"]:
 tracker["keys"][prop] += 1
 # check against restriction
 checked_as_restriction = False
 for r in rprops:
 lvals = r["lvals"]
 ivals = r["ivals"]
 alls = r["all"]
 somes = r["some"]
 if r["prop"] == prop:
 checked_as_restriction = True
 if len(lvals) > 0:
 if literal == False:
 print_finding(clazz, inst, prop_assignment, f"Restriction requires literal value {lvals} but obj is not literal")
 elif list_has_value(lvals, obj) == False:
 print_finding(clazz, inst, prop_assignment, f"Restriction requires literal value {lvals} but obj not among these")
 elif len(ivals) > 0:
 if literal:
 print_finding(clazz, inst, prop_assignment, f"Restriction requires object value {ivals} but obj is literal")
 elif list_has_value(ivals, obj) == False:
 print_finding(clazz, inst, prop_assignment, f"Restriction requires object value {ivals} but obj not among these")
 elif len(alls) > 0:
 if list_has_value(alls, obj_type) == False:
 print_finding(clazz, inst, prop_assignment, f"Restriction requires all values from {alls} but obj type is not among these")
 elif len(somes) > 0:
 # for the someValues, just keep a count; will deal with it below
 if not(prop in tracker["restrictSome"]):
 tracker["restrictSome"][prop] = {}
 for s in somes:
 tracker["restrictSome"][prop][s] = 0
 if list_has_value(somes, obj_type):
 tracker["restrictSome"][prop][obj_type] += 1
 # discovered prop match - check 
 if checked_as_restriction == False and list_has_value(dprops, prop):
 prop_def = model["props"][prop]
 prop_type= prop_def["propType"]
 all_ranges = []
 for rg in model["props"][prop]["ranges"] + model["props"][prop]["rangeUs"]:
 if is_bnode(rg) == False:
 all_ranges.append(rg)
 
 if literal and prop_type == "ObjectProperty":
 print_finding(clazz, inst, prop_assignment, f"Prop type is {prop_type} but object is literal")
 if literal==False and prop_type == "DatatypeProperty":
 print_finding(clazz, inst, prop_assignment, f"Prop type is {prop_type} but object is not a literal")
 if len(all_ranges) > 0 and list_has_value(all_ranges, obj_type) == False:
 print_finding(clazz, inst, prop_assignment, f"Prop ranges are {all_ranges} but object type is not among these")
 if prop_def["functional"]:
 # for functional, keep a count and deal with it below
 if not(prop in tracker["functionals"]):
 tracker["functionals"][prop] = 0
 tracker["functionals"][prop] += 1
 if checked_as_restriction == False and list_has_value(dprops, prop) ==False:
 print_finding(clazz, inst, prop_assignment, f"Unrecognized prop")
 
 # now check tracker
 for ko in tracker["keys"]:
 num_occ = tracker["keys"][ko]
 if num_occ != 1:
 print_finding(clazz, inst, None, f"Key property {ko} appears {num_occ} times. Should be once.")
 for f in tracker["functionals"]:
 num_occ = tracker["functionals"][f]
 if num_occ > 1:
 print_finding(clazz, inst, None, f"Functional property {f} appears {num_occ} times. Should be at most once.")
 for p in tracker["restrictSome"]:
 for s in tracker["restrictSome"][p]:
 num_occ = tracker["restrictSome"][p][s]
 if num_occ < 1:
 print_finding(clazz, inst, None, f"Restriction on property {p} having some values from {s} not met.")
 
 

validate_instances(model)
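
The per-instance tracker used above boils down to counting how often each property is asserted and comparing the counts with what the model allows. The sketch below isolates the functional-property check. It treats the ontology as a closed-world schema, as the validation cell does, and the property and instance names are hypothetical.

```python
from collections import Counter

def functional_violations(assignments, functional_props):
    """Return functional properties that an instance asserts more than once."""
    counts = Counter(prop for prop, _ in assignments if prop in functional_props)
    return {prop: n for prop, n in counts.items() if n > 1}

# Hypothetical (property, object) pairs for a single instance
assignments = [
    ("ex:headOf", "ex:OrgA"),
    ("ex:headOf", "ex:OrgB"),
    ("ex:memberOf", "ex:OrgA"),
]
print(functional_violations(assignments, {"ex:headOf"}))
```

The key and `someValuesFrom` checks follow the same pattern with different thresholds: exactly one occurrence for keys, and at least one matching object type for each `someValuesFrom` class.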

## Cleanup
If something went wrong and you need to reload the ontology or sample data, be careful: the data contains many blank nodes, and each load creates new ones, so reloading is not idempotent. It is better to start from a clean slate before reloading. The cell below offers several options: drop one of the three named graphs loaded above, or delete all triples. We recommend dropping each of the three named graphs.


In [None]:
%%sparql


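# Delete the org ontology
drop graph <http://www.w3.org/ns/org>

# Delete the examples
#drop graph <http://amazonaws.com/db/neptune/examples/ontology/org>

# Delete the tester ontology
#drop graph <http://amazonaws.com/db/neptune/examples/ontology/tester>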

# Delete all triples
#delete {?s ?p ?o} where {
# ?s ?p ?o
#}
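
To see why reloading is not idempotent, consider the following pure-Python simulation. It assumes the loader scopes blank node labels to each load, as RDF parsers generally do, so the same label in a second load denotes a new node. The `load` function and the triples are illustrative and are not Neptune's loader or the actual ontology data.

```python
import itertools

_fresh = itertools.count()  # source of store-wide unique blank node ids

def load(store, triples):
    """Add triples to the store, giving each blank node label a fresh identity per load."""
    renamed = {}

    def resolve(term):
        if term.startswith("_:"):
            if term not in renamed:
                renamed[term] = f"_:gen{next(_fresh)}"
            return renamed[term]
        return term

    for s, p, o in triples:
        store.add((resolve(s), p, resolve(o)))

# Hypothetical data: one class with a restriction expressed through a blank node
data = [
    ("ex:Post", "rdf:type", "owl:Class"),
    ("ex:Post", "rdfs:subClassOf", "_:r"),
    ("_:r", "owl:onProperty", "ex:postIn"),
]
store = set()
load(store, data)
after_first = len(store)
load(store, data)
print(after_first, len(store))
```

The triple made only of IRIs is deduplicated on the second load, while the two triples that mention the blank node are added again under a new identity. Dropping the named graph first avoids accumulating these duplicates.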

