{ "cells": [ { "cell_type": "markdown", "id": "d4746147", "metadata": {}, "source": [ "\n", "# Neptune Ontology Example\n", "This notebook shows the use of a semantic ontology in Neptune. We use the organizational ontology (https://www.w3.org/TR/vocab-org/) defined using OWL. \n", "\n", "For more context, read the AWS blog post https://aws.amazon.com/blogs/database/model-driven-graphs-using-owl-in-amazon-neptune/\n", "\n", "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n", "SPDX-License-Identifier: MIT-0\n", "\n", "Begin by setting up. Run the next cell to instruct the notebook to get Neptune data from S3 bucket provisioned for you." ] }, { "cell_type": "code", "execution_count": null, "id": "e21b908d", "metadata": {}, "outputs": [], "source": [ "import os\n", "import subprocess\n", "\n", "stream = os.popen(\"source ~/.bashrc ; echo $STAGE_BUCKET; echo $M2C_ANALYSIS_BUCKET\")\n", "lines=stream.read().split(\"\\n\")\n", "STAGING_BUCKET=lines[0]\n", "STAGING_BUCKET" ] }, { "cell_type": "markdown", "id": "377c19a3", "metadata": {}, "source": [ "## Loading the Ontology and Examples into Neptune\n", "\n", "First, load the organizational ontology into Neptune. The ontology is written as a set of RDF triples in Turtle form. Load it using Neptune's loader; modify the -s argument if the S3 bucket name does not match yours. You will be prompted with a submit form. Click Submit to run the loader, and check it completes successfully.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "56f12687", "metadata": { "scrolled": true }, "outputs": [], "source": [ "%load -s s3://{STAGING_BUCKET}/data/org.ttl -f turtle --named-graph-uri=http://www.w3.org/ns/org" ] }, { "cell_type": "markdown", "id": "ace562f2", "metadata": {}, "source": [ "Next load the sample data set, which depicts a fictional organization and member structure. Load using the same approach as above. Check the S3 bucket and modify if necessary." 
] }, { "cell_type": "code", "execution_count": null, "id": "03f249cc", "metadata": {}, "outputs": [], "source": [ "%load -s s3://{STAGING_BUCKET}/data/example_org.ttl -f turtle --named-graph-uri=http://amazonaws.com/db/neptune/examples/ontology/org" ] }, { "cell_type": "markdown", "id": "1d6d5dc6", "metadata": {}, "source": [ "Finally load a contrived ontology meant to test edge cases not covered by the org ontology. Modify S3 bucket if necessary." ] }, { "cell_type": "code", "execution_count": null, "id": "4a5531c3", "metadata": {}, "outputs": [], "source": [ "%load -s s3://{STAGING_BUCKET}/data/tester_ontology.ttl -f turtle --named-graph-uri=http://amazonaws.com/db/neptune/examples/ontology/tester" ] }, { "cell_type": "markdown", "id": "3c3dd2c2", "metadata": {}, "source": [ "## Querying Org Ontology\n", "\n", "Let's query the organizational ontology to discover classes and properties. Let's first get a high-level picture of the classes. The first query finds OWL classes as well as keys, equivalent classes and subclasses. Among the classes shown in the results are expected ones like http://www.w3.org/ns/org#Organization and http://www.w3.org/ns/org#Role. But we also see peculiar classes that are blank nodes, which begin with the letter b. We will make sense of these later in the notebook when we build a model." ] }, { "cell_type": "code", "execution_count": null, "id": "bae26e1c", "metadata": { "scrolled": false }, "outputs": [], "source": [ "%%sparql\n", "\n", "# You will notice some of the classes or related classes are blank nodes. 
\n", "# We need to drill down and see that they include.\n", "# Not here, though.\n", "\n", "PREFIX rdf: \n", "PREFIX rdfs: \n", "PREFIX owl: \n", "\n", "select ?class \n", " (GROUP_CONCAT(distinct ?subOf;SEPARATOR=\",\") AS ?subsOf)\n", " (GROUP_CONCAT(distinct ?equiv;SEPARATOR=\",\") AS ?equivs)\n", " (GROUP_CONCAT(distinct ?key;SEPARATOR=\",\") AS ?keys) where { \n", " ?class rdf:type owl:Class .\n", " OPTIONAL { ?class rdfs:subClassOf ?subOf . } .\n", " OPTIONAL { ?class owl:equivalentClass ?equiv . } .\n", " OPTIONAL { ?class owl:hasKey ?keylist . ?keylist rdf:rest*/rdf:first ?key . } .\n", "} group by ?class \n", "order by ?class\n" ] }, { "cell_type": "markdown", "id": "cf197638", "metadata": {}, "source": [ "Now let's connect properties to classes. We list properties whose domain is one of the classes from the results above. For each we also get the range and the property type. The results mostly make sense, but we continue to see blank nodes. For example, the class associated with the http://www.w3.org/ns/org#role property is blank. We make sense of this later in the notebook." ] }, { "cell_type": "code", "execution_count": null, "id": "21680897", "metadata": {}, "outputs": [], "source": [ "%%sparql \n", "\n", "PREFIX rdf: \n", "PREFIX rdfs: \n", "PREFIX owl: \n", "\n", "select ?class ?prop ?range \n", "(GROUP_CONCAT(distinct ?propType;SEPARATOR=\",\") AS ?propTypes) where { \n", " ?class rdf:type owl:Class .\n", " ?prop rdfs:domain ?class .\n", " ?prop rdf:type ?propType .\n", " OPTIONAL {?prop rdfs:range ?range } .\n", "} \n", "group by ?class ?prop ?range\n", "order by ?class ?prop " ] }, { "cell_type": "markdown", "id": "afb9828b", "metadata": {}, "source": [ "## Querying Example Data\n", "\n", "Now let's query the example organization to discover orgs, suborgs, employees and roles. First, we list organizations, suborganizations, and organizational units, as well as the sites of the organizations. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "f5cfb834", "metadata": {}, "outputs": [], "source": [ "%%sparql\n", "\n", "PREFIX rdf: \n", "PREFIX rdfs: \n", "PREFIX org: \n", "\n", "select ?orgName ?subName ?unitName ?siteName where {\n", " ?org rdf:type org:Organization .\n", " ?org rdfs:label ?orgName .\n", " OPTIONAL { ?org org:hasSubOrganization/rdfs:label ?subName } .\n", " OPTIONAL { ?org org:hasUnit/rdfs:label ?unitName . } .\n", " OPTIONAL { ?org org:hasSite/rdfs:label ?siteName . }\n", "} order by ?orgName" ] }, { "cell_type": "markdown", "id": "7a2b9460", "metadata": {}, "source": [ "Let's also check organizational history. Run the next query to see a change event." ] }, { "cell_type": "code", "execution_count": null, "id": "65f569bb", "metadata": {}, "outputs": [], "source": [ "%%sparql\n", "\n", "PREFIX org: \n", "\n", "select ?event ?prop ?obj where {\n", " ?event rdf:type org:ChangeEvent .\n", " ?event ?prop ?obj .\n", "} order by ?event ?prop" ] }, { "cell_type": "markdown", "id": "96a1eb40", "metadata": {}, "source": [ "Now let's list some of the people in these organizations. Notice in the query results the org:memberOf and org:basedAt relationships, which tie the person to an organization and a site." ] }, { "cell_type": "code", "execution_count": null, "id": "1356d63f", "metadata": {}, "outputs": [], "source": [ "%%sparql\n", "\n", "PREFIX foaf: \n", "\n", "select ?person ?prop ?obj where {\n", " ?person rdf:type foaf:Person .\n", " ?person ?prop ?obj .\n", "} order by ?person ?prop" ] }, { "cell_type": "markdown", "id": "4671d802", "metadata": {}, "source": [ "Let's run a path query to see the hierarchical structure of OrgFinancial." 
] }, { "cell_type": "code", "execution_count": null, "id": "ce36f4db", "metadata": {}, "outputs": [], "source": [ "%%sparql\n", "\n", "PREFIX org: \n", "PREFIX ex: \n", "\n", "select ?personName ?boss (GROUP_CONCAT(?superiorName;SEPARATOR=\",\") AS ?superiors) where {\n", " ?person org:memberOf ex:Org-MegaFinancial .\n", " ?person rdfs:label ?personName .\n", " OPTIONAL {\n", " ?person org:reportsTo/rdfs:label ?boss .\n", " ?person org:reportsTo+ ?superior .\n", " ?superior rdfs:label ?superiorName .\n", " } .\n", "} group by ?personName ?boss\n" ] }, { "cell_type": "markdown", "id": "1c3e08f6", "metadata": {}, "source": [ "Finally, let's see roles and posts in the MegaSystems organization. Run the next two queries." ] }, { "cell_type": "code", "execution_count": null, "id": "f679bf1c", "metadata": {}, "outputs": [], "source": [ "%%sparql\n", "\n", "PREFIX org: \n", "PREFIX ex: \n", "\n", "select ?post ?postHolder where {\n", " ?post rdf:type org:Post .\n", " ?post org:postIn ex:Org-MegaSystems . \n", " OPTIONAL {\n", " ?postHolder org:holds ?post .\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "78fa5813", "metadata": {}, "outputs": [], "source": [ "%%sparql\n", "\n", "PREFIX org: \n", "PREFIX ex: \n", "\n", "select ?role ?roleHolder where {\n", " ?role rdf:type org:Role .\n", " ?membership rdf:type org:Membership .\n", " ?membership org:role ?role .\n", " ?membership org:organization ex:Org-MegaSystems .\n", " ?membership org:member ?roleHolder\n", "}" ] }, { "cell_type": "markdown", "id": "58e6fdd3", "metadata": {}, "source": [ "## Enforcing the Ontology!\n", "\n", "Now let's bring things together. We need to understand the purpose of those blank nodes above! We also need to check whether our sample data matches the structure expected by the ontology. Finally, let's make use of that structure to insert new members and orgs, guided by a boilerplate structure. 
\n", "\n", "### Build the Model\n", "The first step is to gather a bit more information from the ontology. We need to \"fill in the blanks!\". Run the next cell to obtain a complete picture of the ontology. The code that follows runs several queries and brings them together into an opinionated interface, or model, of classes and expected properties." ] }, { "cell_type": "code", "execution_count": null, "id": "d67751a8", "metadata": {}, "outputs": [], "source": [ "from IPython.utils import io\n", "\n", "# check if uri is bnode or not\n", "def is_bnode(uri):\n", " return uri.startswith(\"b\")\n", "\n", "# check if list contains the val\n", "def list_has_value(list, val):\n", " try:\n", " list.index(val)\n", " return True\n", " except ValueError:\n", " return False\n", "\n", "# run sparql magic on the specified query. return the results\n", "def run_query(q):\n", " with io.capture_output() as captured: \n", " ipython = get_ipython()\n", " mgc = ipython.run_cell_magic\n", " mgc(magic_name = \"sparql\", line = \"--store-to query_res\", cell=q) \n", " return query_res[\"results\"][\"bindings\"]\n", " \n", "\n", "# build our model\n", "def build_model():\n", " \n", " # Out of scope OWL stuff for this example: \n", " # AllDisjointClases, disjointUnionOf\n", " # assertions - same/diff ind, obj/data prop assertion, neg obj/data prop assertion\n", " # annotations\n", " # top/bottom property\n", " # restriction onProperties;\n", " # but restriction onProperty IS supported \n", " # cardinality\n", " # but will consider FunctionalProperty\n", " # Datatype and data ranges\n", " \n", " # Limitation: for datatype properties, consider only strings.\n", " \n", " CLASS_QUERY = \"\"\"\n", "PREFIX rdf: \n", "PREFIX rdfs: \n", "PREFIX owl: \n", "\n", "select ?class \n", " (GROUP_CONCAT(distinct ?subOf;SEPARATOR=\",\") AS ?subsOf)\n", " (GROUP_CONCAT(distinct ?equiv;SEPARATOR=\",\") AS ?equivs)\n", " (GROUP_CONCAT(distinct ?complement;SEPARATOR=\",\") AS ?complements) \n", " 
(GROUP_CONCAT(distinct ?keyList;SEPARATOR=\",\") AS ?keys) \n", " (GROUP_CONCAT(distinct ?kentry;SEPARATOR=\",\") AS ?keyEntries) \n", " (GROUP_CONCAT(distinct ?uList;SEPARATOR=\",\") AS ?unions) \n", " (GROUP_CONCAT(distinct ?iList;SEPARATOR=\",\") AS ?intersections) \n", " (GROUP_CONCAT(distinct ?ientry;SEPARATOR=\",\") AS ?intersectionEntries) \n", " (GROUP_CONCAT(distinct ?oneList;SEPARATOR=\",\") AS ?oneOfs) \n", " (GROUP_CONCAT(distinct ?disj;SEPARATOR=\",\") AS ?disjoints) \n", " where { \n", " ?class rdf:type owl:Class .\n", " OPTIONAL { ?class rdfs:subClassOf+ ?subOf . } .\n", " OPTIONAL { ?class owl:equivalentClass+ ?equiv . } .\n", " OPTIONAL { ?class owl:complementOf ?complement . } .\n", " OPTIONAL { ?class owl:hasKey ?keyList . } .\n", " OPTIONAL { ?class owl:hasKey ?kl . ?kl rdf:rest*/rdf:first ?kentry . } .\n", " OPTIONAL { ?class owl:unionOf ?uList . } . \n", " OPTIONAL { ?class owl:intersectionOf ?iList . } . \n", " OPTIONAL { ?class owl:intersectionOf ?il . ?il rdf:rest*/rdf:first ?ientry . } .\n", " OPTIONAL { ?class owl:oneOf ?oneList . } .\n", " OPTIONAL { ?class owl:disjointWith ?disj . } . \n", "} group by ?class\n", " \"\"\"\n", "\n", " PROP_QUERY = \"\"\"\n", "PREFIX rdf: \n", "PREFIX rdfs: \n", "PREFIX owl: \n", "\n", "select ?prop \n", " (GROUP_CONCAT(distinct ?subPropOf;SEPARATOR=\",\") AS ?subsOf) \n", " (GROUP_CONCAT(distinct ?equiv;SEPARATOR=\",\") AS ?equivs) \n", " (GROUP_CONCAT(distinct ?domain;SEPARATOR=\",\") AS ?domains) \n", " (GROUP_CONCAT(distinct ?du;SEPARATOR=\",\") AS ?domainUs) \n", " (GROUP_CONCAT(distinct ?range;SEPARATOR=\",\") AS ?ranges) \n", " (GROUP_CONCAT(distinct ?ru;SEPARATOR=\",\") AS ?rangeUs) \n", " (GROUP_CONCAT(distinct ?disj;SEPARATOR=\",\") AS ?disjoints) \n", " (GROUP_CONCAT(distinct ?inv;SEPARATOR=\",\") AS ?inverses) \n", " (GROUP_CONCAT(distinct ?type;SEPARATOR=\",\") AS ?types) \n", " where {\n", "\n", " { ?prop rdf:type rdf:Property . }\n", " UNION\n", " { ?prop rdf:type owl:ObjectProperty . 
}\n", " UNION\n", " { ?prop rdf:type owl:DatatypeProperty . } .\n", " OPTIONAL { ?prop rdfs:subPropertyOf+ ?subPropOf . } .\n", " OPTIONAL { ?prop rdfs:equivalentProperty+ ?equiv . } .\n", " OPTIONAL { ?prop rdfs:domain ?domain } .\n", " OPTIONAL { ?prop rdfs:domain/owl:unionOf ?u . ?u rdf:rest*/rdf:first ?du . } .\n", " OPTIONAL { ?prop rdfs:range ?range } .\n", " OPTIONAL { ?prop rdfs:range/owl:unionOf ?u1 . ?u1 rdf:rest*/rdf:first ?ru . } .\n", " OPTIONAL { ?prop owl:propertyDisjointWith ?disj . } . \n", " OPTIONAL { { ?prop owl:inverseOf ?inv } UNION { ?inv owl:inverseOf ?prop } } . \n", " ?prop rdf:type ?type . # allows us to check functional, transitive, etc\n", "} \n", "group by ?prop\n", " \"\"\"\n", "\n", " RESTRICTION_QUERY =\"\"\"\n", "PREFIX rdf: \n", "PREFIX rdfs: \n", "PREFIX owl: \n", "\n", "select ?restriction ?prop \n", " (GROUP_CONCAT(distinct ?allClass;SEPARATOR=\",\") AS ?allFromClasses)\n", " (GROUP_CONCAT(distinct ?someClass;SEPARATOR=\",\") AS ?someFromClasses)\n", " (GROUP_CONCAT(distinct ?lval;SEPARATOR=\",\") AS ?lvals) \n", " (GROUP_CONCAT(distinct ?ival;SEPARATOR=\",\") AS ?ivals) \n", " where { \n", " ?restriction rdf:type owl:Restriction .\n", " ?restriction owl:onProperty ?prop .\n", " OPTIONAL { ?restriction owl:allValuesFrom ?allClass . } .\n", " OPTIONAL { ?restriction owl:someValuesFrom ?someClass . } .\n", " OPTIONAL { ?restriction owl:hasValue ?lval . FILTER(isLiteral(?lval)) . } .\n", " OPTIONAL { ?restriction owl:hasValue ?ival . FILTER(!isLiteral(?ival)) . } .\n", "} group by ?restriction ?prop\n", " \"\"\"\n", "\n", " LIST_QUERY = \"\"\"\n", "PREFIX rdf: \n", "PREFIX rdfs: \n", "PREFIX owl: \n", "\n", "select ?list (GROUP_CONCAT(distinct ?entity;SEPARATOR=\",\") AS ?entities) where { \n", " ?subject owl:unionOf|owl:intersectionOf|owl:oneOf|owl:onProperties|owl:members|owl:disjoinUnionOf|owl:propertyChainAxioms|owl:hasKey ?list .\n", " OPTIONAL {?list rdf:rest*/rdf:first ?entity . 
} .\n", "} group by ?list\n", " \"\"\"\n", "\n", " # sub-function to run a sparql query and transform it\n", " # the transform works like this\n", " # sparql result: [ { \"col1\": { value \"a\"}, \"col2\": { value: \"b,c\"}, \"col3 : { value: \"d\"}\"}]\n", " # transformed: [ \"a\": { \"col2\": [\"b\", \"c\"], \"col3\", \"d\"}]\n", " # Here \"col1\" is the key, so the \"a\" becomes the key\n", " # \"b,c\" is comma-sep value and is transformed to list [\"b\", \"c\"]\n", " # \"col3\" is a single, so it its val is \"d\" rather than [\"d\"]\n", " def run_model_query(q, key, singles):\n", " res = run_query(q)\n", " result_dict = {}\n", " for rec in res:\n", " this_rec = {\"visited\": False, \"visitedForProps\": False, \"discoveredProps\": [], \"restrictedProps\": []}\n", " for rec_key in rec:\n", " val = str(rec[rec_key][\"value\"])\n", " if rec_key == key:\n", " this_rec[rec_key] = val\n", " result_dict[val] = this_rec\n", " elif list_has_value(singles, rec_key) :\n", " this_rec[rec_key] = val\n", " elif val == \"\":\n", " this_rec[rec_key] = []\n", " else:\n", " toks = val.split(\",\")\n", " this_rec[rec_key] = toks\n", "\n", " return result_dict \n", "\n", " # run the queries\n", " class_res = run_model_query(CLASS_QUERY, \"class\", [])\n", " prop_res = run_model_query(PROP_QUERY, \"prop\", [])\n", " restriction_res = run_model_query(RESTRICTION_QUERY, \"restriction\", [\"prop\"])\n", " list_res = run_model_query(LIST_QUERY, \"list\", [])\n", " classes = list(class_res.keys())\n", " props = list(prop_res.keys())\n", " restrictions = list(restriction_res.keys())\n", " lists = list(list_res.keys())\n", "\n", " # \n", " # Walk functions. If a class/prop refers to a bnode, let's drill down and see what that bnode is.\n", " # Walk the bnode too, and capture its structure in the parent class/prop.\n", " # Example, suppose a class has a subClassOf b, where b is a bnode. \n", " # What is that bnode? It might be a class that is a restriction on a property. 
\n", " # That's useful to know, so we capture that expanded view in the parent class.\n", " #\n", "\n", " def make_walked_node(b, v):\n", " return {\"bnode\": b, \"obj\": v}\n", "\n", " def expand_list(rec, keys):\n", " for list_type in keys:\n", " new_list = []\n", " for entry in rec[list_type]:\n", " if is_bnode(entry):\n", " new_list.append(make_walked_node(entry, walk(entry)))\n", " else:\n", " new_list.append(entry)\n", " rec[list_type+\"_expand\"] = new_list\n", " \n", " \n", " def walk(entry):\n", " if list_has_value(classes, entry):\n", " return walk_class(entry)\n", " elif list_has_value(restrictions, entry):\n", " return walk_restriction(entry)\n", " elif list_has_value(props, entry):\n", " return walk_prop(entry)\n", " elif list_has_value(lists, entry):\n", " return walk_list(entry)\n", " else:\n", " return entry\n", "\n", " def walk_list(l):\n", " #print(\"visit list \" + l)\n", " if list_has_value(lists, l):\n", " rec = list_res[l]\n", " if rec[\"visited\"]:\n", " return rec\n", " else:\n", " new_list = []\n", " expand_list(rec, [\"entities\"])\n", " rec[\"visited\"] = True\n", " return rec\n", " else:\n", " return l\n", " \n", " \n", " def walk_class(clazz):\n", " #print(\"visit class \" + clazz)\n", " if list_has_value(classes, clazz):\n", " rec = class_res[clazz]\n", " if rec[\"visited\"]:\n", " return rec\n", " else:\n", " expand_list(rec, [\"keys\", \"subsOf\", \"equivs\", \"complements\", \"disjoints\", \"unions\", \"intersections\"])\n", " rec[\"visited\"] = True\n", " return rec\n", " else:\n", " return clazz\n", "\n", " def walk_prop(prop):\n", " #print(\"visit prop \" + prop)\n", " if list_has_value(props, prop):\n", " rec = prop_res[prop]\n", " if rec[\"visited\"]:\n", " return rec\n", " else:\n", " expand_list(rec, [\"subsOf\", \"equivs\", \"inverses\", \"disjoints\", \"domains\", \"ranges\"])\n", " rec[\"functional\"] = list_has_value(rec[\"types\"], \"http://www.w3.org/2002/07/owl#FunctionalProperty\")\n", " if 
list_has_value(rec[\"types\"], \"http://www.w3.org/2002/07/owl#ObjectProperty\"):\n", " rec[\"propType\"] = \"ObjectProperty\"\n", " elif list_has_value(rec[\"types\"], \"http://www.w3.org/2002/07/owl#DatatypeProperty\"):\n", " rec[\"propType\"] = \"DatatypeProperty\"\n", " elif list_has_value(rec[\"types\"], \"http://www.w3.org/1999/02/22-rdf-syntax-ns#Property\"):\n", " rec[\"propType\"] = \"Property\" \n", " rec[\"visited\"] = True\n", " return rec \n", " else:\n", " return clazz\n", "\n", " def walk_restriction(restriction) :\n", " #print(\"visit restriction \" + restriction)\n", " if list_has_value(restrictions, restriction):\n", " rec = restriction_res[restriction]\n", " if rec[\"visited\"]:\n", " return rec\n", " else:\n", " if is_bnode(rec[\"prop\"]):\n", " rec[\"prop\"] = make_walked_node(rec[\"prop\"], walk(rec[\"prop\"]))\n", " expand_list(rec, [\"allFromClasses\", \"someFromClasses\"])\n", " rec[\"visited\"] = True\n", " return rec\n", " else:\n", " return restriction\n", "\n", " # walk the properties and classes, bringing in dependencies like lists, restrictions, and related classes\n", " for entry in prop_res:\n", " walk_prop(entry)\n", " for entry in class_res:\n", " walk_class(entry)\n", "\n", " # for the given prop, if it belongs to expected_clazz, return the prop plus super-props\n", " def get_props(prop, expected_clazz):\n", " if list_has_value(props, prop):\n", " candidate = False\n", " if expected_clazz == None:\n", " candidate = True\n", " else:\n", " # class is domain\n", " for dom in prop_res[prop][\"domains\"]:\n", " if dom == expected_clazz:\n", " candidate = True\n", " break\n", " # domain is union and includes class\n", " for dom in prop_res[prop][\"domainUs\"]:\n", " if dom == expected_clazz:\n", " candidate = True\n", " break\n", " if candidate:\n", " # return this prop and props of which the prop is subsOf\n", " return list(set([prop] + prop_res[prop][\"subsOf\"]))\n", " else:\n", " return []\n", " else:\n", " return []\n", "\n", " # 
recursively walk the class, looking for properties.\n", " def walk_class_for_props(clazz):\n", " #print(\"visit \" + clazz)\n", " # am i a class or a restriction?\n", " if list_has_value(restrictions, clazz):\n", " if not(restriction_res[clazz][\"visitedForProps\"]):\n", " #print(\" restriction visit \" + clazz)\n", " restriction_res[clazz][\"visitedForProps\"] = True\n", " prop_uri = restriction_res[clazz][\"prop\"]\n", " restriction_res[clazz][\"restrictedProps\"] = [{\n", " \"prop\": prop_uri,\n", " \"restriction\": clazz,\n", " \"all\" : restriction_res[clazz][\"allFromClasses\"], \n", " \"some\": restriction_res[clazz][\"someFromClasses\"],\n", " \"lvals\": restriction_res[clazz][\"lvals\"],\n", " \"ivals\": restriction_res[clazz][\"ivals\"] }]\n", " return restriction_res[clazz]\n", " elif list_has_value(classes, clazz):\n", " if not(class_res[clazz][\"visitedForProps\"]):\n", " #print(\" class visit \" + clazz)\n", " \n", " # if i'm not a bnode, get all props that apply to me\n", " if not(is_bnode(clazz)):\n", " for prop in props:\n", " class_res[clazz][\"discoveredProps\"] = list(set(class_res[clazz][\"discoveredProps\"] + get_props(prop, clazz)))\n", " \n", " for list_type in [\"subsOf\", \"intersectionEntries\", \"equivs\"]:\n", " for entry in class_res[clazz][list_type]:\n", " can_use = list_has_value(classes, entry) or list_has_value(restrictions, entry)\n", " if list_type == 'equivs' and is_bnode(entry) == False:\n", " can_use = False\n", " if can_use:\n", " # recurse for subsOf, intersectionEntries, equivs (restrictions only)\n", " recurse_result = walk_class_for_props(entry)\n", " class_res[clazz][\"discoveredProps\"] = list(set( class_res[clazz][\"discoveredProps\"] + recurse_result[\"discoveredProps\"]))\n", " class_res[clazz][\"restrictedProps\"] += recurse_result[\"restrictedProps\"]\n", " class_res[clazz][\"visitedForProps\"] = True\n", " return class_res[clazz]\n", " \n", " else:\n", " print(\" VERY BAD visit \" + clazz)\n", " return None \n", 
" \n", " # for each class determine the properties by walking\n", " for entry in class_res:\n", " if not(is_bnode(entry)):\n", " walk_class_for_props(entry)\n", "\n", " # return the model - the classes and properties discovered\n", " return {\n", " \"classes\": class_res,\n", " \"props\": prop_res\n", " }\n", "\n", "# Print the model\n", "def print_model_summary(model) :\n", " for clazz in model[\"classes\"]:\n", " if is_bnode(clazz) == False:\n", " print(\"Class \" + clazz) \n", " print(\"\\tkeys \" + str(model[\"classes\"][clazz][\"keyEntries\"]))\n", " print(\"\\n\")\n", " for r in model[\"classes\"][clazz][\"restrictedProps\"]:\n", " print(\"\\tRestriction on prop \" + r[\"prop\"])\n", " print(\"\\t\\tall \" + str(r[\"all\"]))\n", " print(\"\\t\\tsome \" + str(r[\"some\"]))\n", " print(\"\\t\\tliteral values \" + str(r[\"lvals\"]))\n", " print(\"\\t\\tobject values \" + str(r[\"ivals\"]))\n", " for prop in model[\"classes\"][clazz][\"discoveredProps\"]:\n", " print(\"\\tProp \" + prop)\n", " if prop in model[\"props\"]:\n", " prop_def = model[\"props\"][prop]\n", " print(\"\\t\\ttype \" + prop_def[\"propType\"])\n", " print(\"\\t\\tfunctional \" + str(prop_def[\"functional\"]))\n", " print(\"\\t\\tinverses \" + str(prop_def[\"inverses\"]))\n", " print(\"\\t\\trange \" + str(prop_def[\"ranges\"]))\n", " print(\"\\t\\trangeUnionOf \" + str(prop_def[\"rangeUs\"]))\n", " \n", " \n", "model = build_model()\n", "print_model_summary(model)\n", "\n" ] }, { "cell_type": "markdown", "id": "f22a4812", "metadata": {}, "source": [ "### Generation\n", "Finally, given the interface we determined above, let's generate sample Turtle. This acts as our boilerplate for new data." 
] }, { "cell_type": "code", "execution_count": null, "id": "e86478cf", "metadata": {}, "outputs": [], "source": [ "counter = {\"current\": 0}\n", "\n", "# Prefixes for generated Turtle\n", "SAMPLE_HEADER = \"\"\"\n", "@base .\n", "@prefix ex: .\n", "@prefix owl: .\n", "@prefix rdf: .\n", "@prefix rdfs: .\n", "\"\"\"\n", "\n", "# generate samples instances for clazz based on model\n", "def generate_sample(model, clazz):\n", "\n", " # start building Turtle\n", " gen_result = {\"ttl\": \"\"}\n", " \n", " # Generate sample URI\n", " def sample_uri(clazz):\n", " #clazz is an IRI. Get the last token, which follows either the last / or a #\n", " counter[\"current\"]+= 1\n", " inst_num = counter[\"current\"]\n", " clazz_name = clazz.split(\"/\")[-1].split(\"#\")[-1]\n", " return clazz_name + \"-sample-\" + str(inst_num)\n", " \n", " inst_name = sample_uri(clazz)\n", " class_def = model[\"classes\"][clazz]\n", " props = model[\"props\"]\n", " keys = class_def[\"keyEntries\"]\n", " discovered_props = class_def[\"discoveredProps\"]\n", " restricted_props = class_def[\"restrictedProps\"]\n", " last_idx = 0\n", " \n", " # In Turtle, instance has rdf:type that is clazz\n", " gen_result[\"ttl\"] += f\"\"\"\n", "#\n", "# Sample for class {clazz}\n", "# \n", "\n", "# Instantiate\n", "ex:{inst_name} rdf:type <{clazz}> .\n", " \"\"\"\n", " \n", " #\n", " # finder helpers\n", " # \n", " \n", " def find_restricted(prop):\n", " for entry in restricted_props:\n", " if entry[\"prop\"] == prop:\n", " # could there be more than one entry with prop;\n", " # not sure how; take the first one\n", " return entry\n", " return None\n", " def find_discovered(prop):\n", " if list_has_value(discovered_props, prop):\n", " if prop in props: \n", " return props[prop]\n", " else:\n", " return None\n", " else:\n", " return None\n", "\n", " # Based on the model, generate Turtle properties of instance\n", " def generate_props(prop, inst_name, comment):\n", " r = find_restricted(prop)\n", " d = 
find_discovered(prop)\n", " if r==None:\n", " if d==None:\n", " # Generic case. We have neither a restriction nor a property def.\n", " # Just assign it a string value\n", " gen_result[\"ttl\"] += f\"\"\"\n", "# {comment} \n", "ex:{inst_name} <{prop}> \"some value\" .\n", "# Don't have property definition on hand. Using sample string value.\n", " \"\"\"\n", " else:\n", " # It's not a restriction and we have a property def.\n", " # Turtle uses facts about the prop. If-then for object vs datatype\n", " just_one = d[\"functional\"]\n", " sample_obj_type = None\n", " sample_obj_prefix = None\n", " all_ranges = []\n", " for r in d[\"ranges\"] + d[\"rangeUs\"]:\n", " if is_bnode(r) == False:\n", " if sample_obj_type == None:\n", " sample_obj_type = \"<\" + r + \">\"\n", " sample_obj_prefix = r\n", " all_ranges.append(r)\n", " if sample_obj_type == None:\n", " # no range! use a default\n", " sample_obj_type = \"owl:Thing\"\n", " sample_obj_prefix = \"not/sure/Anything\"\n", " \n", " extra_comment = \"This is functional: only one \" if d[\"functional\"] else \"Multiple values allowed\"\n", " if d[\"propType\"] == \"ObjectProperty\":\n", " uri = sample_uri(sample_obj_prefix)\n", " gen_result[\"ttl\"] += f\"\"\"\n", "# {comment} - {extra_comment}\n", "ex:{inst_name} <{prop}> ex:{uri} .\n", "ex:{uri} rdf:type {sample_obj_type} .\n", "# ... 
and fill in the details of ex:{uri} \n", "# all ranges {all_ranges}\n", " \"\"\"\n", " else:\n", " # will keep it simple with non-objects: everything is just a string\n", " # so no other literal types, no value constaints, etc\n", " gen_result[\"ttl\"] += f\"\"\"\n", "# {comment} - {extra_comment}\n", "ex:{inst_name} <{prop}> \"sample value\" .\n", "# actual ranges {all_ranges}\n", " \"\"\"\n", " \n", " else:\n", " # It's a restriction\n", " functional = False if d==None else d[\"functional\"]\n", " if len(r[\"lvals\"]) > 0:\n", " gen_result[\"ttl\"] += f\"\"\"\n", "# {comment} - restricted on value; value is literal\n", "ex:{inst_name} <{r[\"prop\"]}> \"{r[\"lvals\"][0]}\" .\n", "# allowed values: {r[\"lvals\"]}\n", " \"\"\"\n", " elif len(r[\"ivals\"]) > 0:\n", " gen_result[\"ttl\"] += f\"\"\"\n", "# {comment} - restricted on value; value is IRI\n", "ex:{inst_name} <{r[\"prop\"]}> <{r[\"ivals\"][0]}> .\n", "# allowed values: {r[\"ivals\"]}\n", " \"\"\"\n", " elif len(r[\"some\"]) > 0:\n", " uri = sample_uri(r[\"some\"][0])\n", " gen_result[\"ttl\"] += f\"\"\"\n", "# {comment} - restricted: some values from\n", "ex:{inst_name} <{r[\"prop\"]}> <{uri}> .\n", "<{uri}> rdf:type <{r[\"some\"][0]}> .\n", "# values: {r[\"some\"]}\n", " \"\"\"\n", " elif len(r[\"all\"]) > 0:\n", " uri = sample_uri(r[\"all\"][0])\n", " gen_result[\"ttl\"] += f\"\"\"\n", "# {comment} - restricted: all values from\n", "ex:{inst_name} <{r[\"prop\"]}> <{uri}> .\n", "<{uri}> rdf:type <{r[\"all\"][0]}> .\n", "# values: {r[\"all\"]}\n", " \"\"\"\n", " \n", " # In Turtle, need one property for each key\n", " for key in keys:\n", " generate_props(key, inst_name, \"Add key\")\n", " \n", " # In Turtle, need restrictions. 
If it covers a key, skip it.\n", " for r in restricted_props:\n", " if list_has_value(keys, r[\"prop\"]) == False:\n", " generate_props(r[\"prop\"], inst_name, \"Add a restriction\")\n", " \n", " # In Turtle, for all other props (non-keys, non-restrictions), add prop to instance.\n", " for d in discovered_props:\n", " if list_has_value(keys, d) == False and find_restricted(d) == None:\n", " generate_props(d, inst_name, \"Add prop in domain\")\n", " \n", " # Return the Turtle\n", " return gen_result[\"ttl\"]\n", "\n", "print(SAMPLE_HEADER)\n", "for clazz in model[\"classes\"]:\n", " if is_bnode(clazz) == False:\n", " runnable_ttl = generate_sample(model, clazz)\n", " print(runnable_ttl)\n" ] }, { "cell_type": "markdown", "id": "fa24755b", "metadata": {}, "source": [ "### Validation\n", "Now let's validate. We will compare the structure of our example org with the expected interface determined above. " ] }, { "cell_type": "code", "execution_count": null, "id": "eaf0f2b1", "metadata": {}, "outputs": [], "source": [ "# validate instances against model\n", "def validate_instances(model):\n", "\n", " # pull instances and their triples\n", " INSTANCE_QUERY = \"\"\"\n", "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n", "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n", "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n", "\n", "select * where {\n", " ?class rdf:type owl:Class .\n", " ?inst rdf:type ?class .\n", " ?inst ?prop ?obj .\n", " OPTIONAL { ?obj rdf:type ?objType . 
} .\n", " BIND (isLiteral(?obj) as ?lit)\n", "} order by ?class ?inst \n", "\"\"\"\n", "\n", " # validation ignores the typical naming sutff\n", " IGNORES = [\n", " \"http://www.w3.org/1999/02/22-rdf-syntax-ns#type\", \n", " \"http://www.w3.org/2000/01/rdf-schema#label\",\n", " \"http://www.w3.org/2000/01/rdf-schema#comment\",\n", " \"http://www.w3.org/2004/02/skos/core#prefLabel\",\n", " \"http://www.w3.org/2004/02/skos/core#altLabel\"\n", " ]\n", "\n", " # run the instance query and transform into hierarchical result\n", " # hierarchy: class - instance - prop\n", " # easier to validate in that form\n", " def run_inst_query():\n", " res = run_query(INSTANCE_QUERY)\n", " hier_result = {}\n", "\n", " for rec in res:\n", " clazz = rec[\"class\"][\"value\"]\n", " inst = rec[\"inst\"][\"value\"]\n", " prop = rec[\"prop\"][\"value\"]\n", " obj = rec[\"obj\"][\"value\"] \n", " obj_type = rec[\"objType\"][\"value\"] if \"objType\" in rec else \"\" \n", " lit = True if rec[\"lit\"][\"value\"] == \"true\" else False \n", " if not(clazz in hier_result):\n", " hier_result[clazz] = { \"clazz\": clazz, \"instances\": {} }\n", " if not(inst in hier_result[clazz][\"instances\"]):\n", " hier_result[clazz][\"instances\"][inst] = { \"instance\": inst, \"props\": [] }\n", " hier_result[clazz][\"instances\"][inst][\"props\"].append({\n", " \"prop\": prop,\n", " \"object\": obj,\n", " \"objectType\": obj_type,\n", " \"literal\": lit\n", " })\n", " return hier_result\n", " \n", " # print a finding for validation summary\n", " def print_finding(clazz, inst, prop_assignment, finding):\n", " print(f\"\"\"\n", "Finding in class: {clazz} \n", "Instance: {inst}.\n", "Prop assignment: {prop_assignment}\n", "Finding: {finding}\n", " \"\"\")\n", " \n", " # pull the instances and validate! 
notice we walk the hierarchical form built above\n", "    # The logic is clear if you focus on each call to print_finding.\n", "    inst_summary = run_inst_query()\n", "    for clazz in inst_summary:\n", "        if clazz in model[\"classes\"]:\n", "            class_spec = model[\"classes\"][clazz]\n", "            for inst in inst_summary[clazz][\"instances\"]:\n", "                # track state across the whole instance: each key must appear exactly once, each functional prop at most once, each someValuesFrom restriction at least once\n", "                tracker = { \n", "                    \"keys\": {},\n", "                    \"functionals\": {},\n", "                    \"restrictSome\": {}\n", "                }\n", "                for k in class_spec[\"keyEntries\"]: \n", "                    tracker[\"keys\"][k] = 0\n", "                dprops = class_spec[\"discoveredProps\"]\n", "                rprops = class_spec[\"restrictedProps\"]\n", "                for prop_assignment in inst_summary[clazz][\"instances\"][inst][\"props\"]:\n", "                    prop = prop_assignment[\"prop\"]\n", "                    obj = prop_assignment[\"object\"]\n", "                    obj_type = prop_assignment[\"objectType\"]\n", "                    literal = prop_assignment[\"literal\"]\n", "                    if list_has_value(IGNORES, prop) == False:\n", "                        # key usage\n", "                        if prop in tracker[\"keys\"]:\n", "                            tracker[\"keys\"][prop] += 1\n", "                        # check against restriction\n", "                        checked_as_restriction = False\n", "                        for r in rprops:\n", "                            lvals = r[\"lvals\"]\n", "                            ivals = r[\"ivals\"]\n", "                            alls = r[\"all\"]\n", "                            somes = r[\"some\"]\n", "                            if r[\"prop\"] == prop:\n", "                                checked_as_restriction = True\n", "                                if len(lvals) > 0:\n", "                                    if literal == False:\n", "                                        print_finding(clazz, inst, prop_assignment, f\"Restriction requires literal value {lvals} but obj is not literal\")\n", "                                    elif list_has_value(lvals, obj) == False:\n", "                                        print_finding(clazz, inst, prop_assignment, f\"Restriction requires literal value {lvals} but obj not among these\")\n", "                                elif len(ivals) > 0:\n", "                                    if literal:\n", "                                        print_finding(clazz, inst, prop_assignment, f\"Restriction requires object value {ivals} but obj is literal\")\n", "                                    elif list_has_value(ivals, obj) == False:\n", "                                        print_finding(clazz, inst, prop_assignment, f\"Restriction requires object 
value {ivals} but obj not among these\")\n", "                                elif len(alls) > 0:\n", "                                    if list_has_value(alls, obj_type) == False:\n", "                                        print_finding(clazz, inst, prop_assignment, f\"Restriction requires all values from {alls} but obj type is not among these\")\n", "                                elif len(somes) > 0:\n", "                                    # for the someValues, just keep a count; we check the tally below\n", "                                    if not(prop in tracker[\"restrictSome\"]):\n", "                                        tracker[\"restrictSome\"][prop] = {}\n", "                                        for s in somes:\n", "                                            tracker[\"restrictSome\"][prop][s] = 0\n", "                                    if list_has_value(somes, obj_type):\n", "                                        tracker[\"restrictSome\"][prop][obj_type] += 1\n", "                        # discovered prop match - check prop type, ranges, and functional use\n", "                        if checked_as_restriction == False and list_has_value(dprops, prop):\n", "                            prop_def = model[\"props\"][prop]\n", "                            prop_type = prop_def[\"propType\"]\n", "                            all_ranges = []\n", "                            for rg in model[\"props\"][prop][\"ranges\"] + model[\"props\"][prop][\"rangeUs\"]:\n", "                                if is_bnode(rg) == False:\n", "                                    all_ranges.append(rg)\n", "                            \n", "                            if literal and prop_type == \"ObjectProperty\":\n", "                                print_finding(clazz, inst, prop_assignment, f\"Prop type is {prop_type} but object is literal\")\n", "                            if literal == False and prop_type == \"DatatypeProperty\":\n", "                                print_finding(clazz, inst, prop_assignment, f\"Prop type is {prop_type} but object is not a literal\")\n", "                            if len(all_ranges) > 0 and list_has_value(all_ranges, obj_type) == False:\n", "                                print_finding(clazz, inst, prop_assignment, f\"Prop ranges are {all_ranges} but object type is not among these\")\n", "                            if prop_def[\"functional\"]:\n", "                                # for functional, keep a count; we check the tally below\n", "                                if not(prop in tracker[\"functionals\"]):\n", "                                    tracker[\"functionals\"][prop] = 0\n", "                                tracker[\"functionals\"][prop] += 1\n", "                        if checked_as_restriction == False and list_has_value(dprops, prop) == False:\n", "                            print_finding(clazz, inst, prop_assignment, f\"Unrecognized prop\")\n", "                \n", "                # now check tracker\n", "                for ko in tracker[\"keys\"]:\n", "                    num_occ = tracker[\"keys\"][ko]\n", "                    if num_occ != 1:\n", "                        print_finding(clazz, inst, None, f\"Key property {ko} appears {num_occ} times. Should appear exactly once.\")\n", "                for f in tracker[\"functionals\"]:\n", "                    num_occ = tracker[\"functionals\"][f]\n", "                    if num_occ > 1:\n", "                        print_finding(clazz, inst, None, f\"Functional property {f} appears {num_occ} times. Should appear at most once.\")\n", "                for p in tracker[\"restrictSome\"]:\n", "                    for s in tracker[\"restrictSome\"][p]:\n", "                        num_occ = tracker[\"restrictSome\"][p][s]\n", "                        if num_occ < 1:\n", "                            print_finding(clazz, inst, None, f\"Restriction on property {p} having some values from {s} not met.\")\n", "    \n", "    \n", "\n", "validate_instances(model)" ] }, { "cell_type": "markdown", "id": "3e56d922", "metadata": {}, "source": [ "## Cleanup\n", "If you messed up and need to reload the ontology or sample data, be careful: the data contains lots of blank nodes, so a reload is not idempotent. It is better to start from a clean slate before reloading. The script below offers several options: drop one of the three named graphs loaded above, or delete all triples. We recommend dropping each of the three named graphs in turn. \n" ] }, { "cell_type": "code", "execution_count": null, "id": "c7ad8cef", "metadata": {}, "outputs": [], "source": [ "%%sparql\n", "\n", "\n", "# Delete the org ontology\n", "drop graph <http://www.w3.org/ns/org>\n", "\n", "# Delete the examples\n", "#drop graph <http://amazonaws.com/db/neptune/examples/ontology/org>\n", "\n", "# Delete the tester ontology\n", "#drop graph <http://amazonaws.com/db/neptune/examples/ontology/tester>\n", "\n", "# Delete all triples\n", "#delete {?s ?p ?o} where {\n", "#    ?s ?p ?o\n", "#}\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }