{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "94a47c81",
   "metadata": {},
   "source": [
    "# Neptune MultiModel (Ask KG Data Product Questions)\n",
    "This notebook shows the movie example from my talk/blog post on using Amazon Neptune to help model a multimodel database solution.\n",
    "\n",
    "The overall flow is discussed in the blog post (TBD) and in the repo https://github.com/aws-samples/amazon-neptune-ontology-example-blog/blob/main/multimodel/README.md. Read that first to understand what we're trying to accomplish.\n",
    "\n",
    "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0\n",
    "\n",
    "Begin by setting up. Run the next cell to instruct the notebook to get Neptune data from S3 bucket provisioned for you."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc1e530b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import subprocess\n",
    "\n",
    "stream = os.popen(\"source ~/.bashrc ; echo $STAGE_BUCKET; echo $M2C_ANALYSIS_BUCKET\")\n",
    "lines=stream.read().split(\"\\n\")\n",
    "STAGING_BUCKET=lines[0]\n",
    "STAGING_BUCKET"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "808f1776",
   "metadata": {},
   "source": [
    "## Obtain UML files\n",
    "We use UML to draw our data products, their implementations, and their relationships. \n",
    "\n",
    "Run the next cell to get a copy of those UML models in the notebook instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb659729",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%bash -s \"$STAGING_BUCKET\"\n",
    "\n",
    "mkdir -p uml\n",
    "mkdir -p mmgen\n",
    "cd uml\n",
    "aws s3 cp s3://$1/uml . --recursive\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "339364c1",
   "metadata": {},
   "source": [
    "## Extract data products/impl from UML files\n",
    "UML is represented in XML Metadata Interchange (XMI) form. Let's extract the main details from those files into Python data structures. \n",
    "\n",
    "Run the next cell to extract the details we need from UML."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18ce45b9",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "import json\n",
    "import xml.etree.ElementTree as ET\n",
    "\n",
    "# Namespace stuff for XMI parsing\n",
    "NS={\n",
    "    \"uml\": \"http://www.omg.org/spec/UML/20131001\",\n",
    "    \"xmi\": \"http://www.omg.org/spec/XMI/20131001\",\n",
    "    \"MMProfile\": \"http://www.magicdraw.com/schemas/MMProfile.xmi\"\n",
    "}\n",
    "XMI_ID = \"{\" + NS['xmi'] + \"}id\"\n",
    "XMI_IDREF = \"{\" + NS['xmi'] + \"}idref\"\n",
    "\n",
    "def get_attrib(elem, name):\n",
    "    if name in elem.attrib:\n",
    "        return elem.attrib[name]\n",
    "    else:\n",
    "        return \"\"\n",
    "        \n",
    "def add_tagval(stereotype, tag, val):\n",
    "    if tag in stereotype['tags']:\n",
    "        stereotype['tags'][tag].append(val)\n",
    "    else:\n",
    "        stereotype['tags'][tag] = [val]\n",
    "\n",
    "def get_tags(elem, tags, stereotype):\n",
    "    for tag in tags:\n",
    "        aval = get_attrib(elem, tag)\n",
    "        if len(aval) > 0:\n",
    "            add_tagval(stereotype, tag, aval)\n",
    "        tagvs = elem.findall(tag)\n",
    "        for tagv in tagvs:\n",
    "            add_tagval(stereotype, tag, tagv.text)\n",
    "\n",
    "umlextract = {}\n",
    "\n",
    "def extract(filename):\n",
    "    path = f\"uml/{filename}\"\n",
    "    packages = {}\n",
    "    classes = {}\n",
    "    usages = {}\n",
    "    props = {}\n",
    "    imports = {}\n",
    "    datatypes = {}\n",
    "    enums = {}\n",
    "    \n",
    "    print(\"Parsing \" + path)\n",
    "    tree = ET.parse(path)\n",
    "\n",
    "    # packages\n",
    "    for elem in tree.findall(\"uml:Model//packagedElement[@xmi:type='uml:Package']\", NS):\n",
    "        id = get_attrib(elem, XMI_ID)\n",
    "        name = get_attrib(elem, 'name')\n",
    "        parent = tree.findall(f'.//packagedElement[@xmi:id=\"{id}\"]...', NS)\n",
    "        parent_id = get_attrib(parent[0], XMI_ID) if len(parent) == 1 else  \"\";\n",
    "        packages[id] = { 'id': id, 'name': name, 'parent': parent_id, 'stereotypes': []}\n",
    "\n",
    "    # package imports\n",
    "    for elem in tree.findall(\"uml:Model//packageImport[@xmi:type='uml:PackageImport']\", NS):\n",
    "        id = get_attrib(elem, XMI_ID)\n",
    "        name = get_attrib(elem, 'name')\n",
    "        parent = tree.findall(f'.//packagedElement[@xmi:id=\"{id}\"]...', NS)\n",
    "        parent_id = get_attrib(parent[0], XMI_ID) if len(parent) == 1 else  \"\";\n",
    "        ip = elem.find(\"importedPackage\", NS)\n",
    "        href = get_attrib(ip, \"href\")\n",
    "        imports[id] = { 'id': id, 'name': name, 'parent': parent_id, 'href': href, 'stereotypes': []}\n",
    "\n",
    "    # classes\n",
    "    for elem in tree.findall(\"uml:Model//packagedElement[@xmi:type='uml:Class']\", NS):\n",
    "        id = get_attrib(elem, XMI_ID)\n",
    "        name = get_attrib(elem, 'name')\n",
    "        parent = tree.findall(f'.//packagedElement[@xmi:id=\"{id}\"]...', NS)\n",
    "        parent_id = get_attrib(parent[0], XMI_ID) if len(parent) == 1 else  \"\";\n",
    "\n",
    "        attribs = elem.findall(\"ownedAttribute\", NS)\n",
    "        class_attribs = {}\n",
    "        for a in attribs:\n",
    "            aid = get_attrib(a, XMI_ID)\n",
    "            aname = get_attrib(a, 'name')\n",
    "            aggregation = get_attrib(a, 'aggregation')\n",
    "            atype = get_attrib(a, 'type')\n",
    "            assoc = get_attrib(a, 'association')\n",
    "            props[aid] = id #map property to clazz\n",
    "            class_attribs[aid] = { 'id': aid, 'name': aname, 'aggregation': aggregation, 'type': atype, 'association': assoc, 'stereotypes': []}\n",
    "\n",
    "        classes[id] = { 'id': id, 'name': name, 'parent': parent_id,  'usages': {}, 'stereotypes': [], 'properties': class_attribs}\n",
    "\n",
    "    # datatypes\n",
    "    for elem in tree.findall(\"uml:Model//packagedElement[@xmi:type='uml:DataType']\", NS):\n",
    "        id = get_attrib(elem, XMI_ID)\n",
    "        name = get_attrib(elem, 'name')\n",
    "        datatypes[id] = { 'id': id, 'name': name}\n",
    "        \n",
    "    # enums\n",
    "    for elem in tree.findall(\"uml:Model//packagedElement[@xmi:type='uml:Enumeration']\", NS):\n",
    "        id = get_attrib(elem, XMI_ID)\n",
    "        name = get_attrib(elem, 'name')\n",
    "        lits = []\n",
    "        for lit_node in elem.findall(\"./ownedLiteral\", NS ):\n",
    "            lits.append(get_attrib(lit_node, 'name'))\n",
    "            \n",
    "        enums[id] = { 'id': id, 'name': name, 'literals': lits}\n",
    "\n",
    "\n",
    "    # usages\n",
    "    for elem in tree.findall(\"uml:Model//packagedElement[@xmi:type='uml:Usage']\", NS):\n",
    "        id = get_attrib(elem, XMI_ID)\n",
    "        target = get_attrib(elem.find('supplier'), XMI_IDREF)\n",
    "        source = get_attrib(elem.find('client'), XMI_IDREF)\n",
    "        targetHref = get_attrib(elem.find('supplier'),'href')\n",
    "        usages[id] = source\n",
    "        if not(source in classes):\n",
    "            print(f\"Warn: usage broken ref {source}\")\n",
    "        else:\n",
    "            u = {'id': id, 'target': target, 'targetHref': targetHref, 'stereotypes': []}\n",
    "            classes[source]['usages'][id] = u\n",
    "\n",
    "    # data products - and link to classes\n",
    "    for elem in tree.findall(\"./MMProfile:DataProduct\", NS):\n",
    "        id = get_attrib(elem, XMI_ID)\n",
    "        clazz_id = get_attrib(elem, 'base_Class')\n",
    "        if not(clazz_id in classes):\n",
    "            print(f\"Warn: stereotype (id) broken ref {clazz_id}\")\n",
    "        else:\n",
    "            classes[clazz_id]['isProduct'] = True\n",
    "\n",
    "    # impls, and link to classes\n",
    "    for elem in tree.findall(\"./MMProfile:DataProductImpl\", NS):\n",
    "        id = get_attrib(elem, XMI_ID)\n",
    "        clazz_id = get_attrib(elem, 'base_Class')\n",
    "        if not(clazz_id in classes):\n",
    "            print(f\"Warn: stereotype (id) broken ref {clazz_id}\")\n",
    "        else:\n",
    "            classes[clazz_id]['isImpl'] = True\n",
    "\n",
    "    # usage rels\n",
    "    urels = {'joins': ['joinAttrib', 'myAttrib'], \n",
    "        'refersTo': ['refersAttrib', 'myAttrib'], \n",
    "        'hasImpl': [], 'caches': [], 'copies': [], 'locatedIn':[],\n",
    "        'similarTo': ['simReason', 'simAlgorithm'], \n",
    "        'config': ['configKV'],\n",
    "        'hasSource': ['integrationType', 'sourceDesc', 'sourceDataSet', 'sourceEventType'],\n",
    "        'federates': ['fedURI']\n",
    "        }\n",
    "    for u in urels:\n",
    "        for elem in tree.findall(f\"./MMProfile:{u}\", NS):\n",
    "            id = get_attrib(elem, XMI_ID)\n",
    "            usage = get_attrib(elem, 'base_Usage')\n",
    "            elem_id = get_attrib(elem, 'base_Element')\n",
    "            stereotype = {'name': u, 'tags': {}}\n",
    "            get_tags(elem, urels[u], stereotype)\n",
    "\n",
    "            if usage in usages:\n",
    "                source = usages[usage]\n",
    "                classes[source]['usages'][usage]['stereotypes'].append(stereotype)\n",
    "            elif elem_id in usages:\n",
    "                source = usages[elem_id]\n",
    "                classes[source]['usages'][elem_id]['stereotypes'].append(stereotype)\n",
    "\n",
    "    # stereotypes at class/package level\n",
    "    srels = {'awsService': ['service'], \n",
    "        'awsResource': ['resource'], \n",
    "        'usagePattern': ['pattern'], \n",
    "        'config': ['configKV'],\n",
    "        'hasSource': ['integrationType', 'sourceDesc', 'sourceDataSet', 'sourceEventType'],\n",
    "        'federates': ['fedURI']\n",
    "        }\n",
    "    for u in srels:\n",
    "        for elem in tree.findall(f\"./MMProfile:{u}\", NS):\n",
    "            id = get_attrib(elem, XMI_ID)\n",
    "            clazz = get_attrib(elem, 'base_Class')\n",
    "            pkg = get_attrib(elem, 'base_Package')\n",
    "            pkgi = get_attrib(elem, 'base_PackageImport')\n",
    "            elem_id = get_attrib(elem, 'base_Element')\n",
    "            stereotype = {'name': u, 'tags': {}}\n",
    "            get_tags(elem, srels[u], stereotype)\n",
    "            if clazz in classes:\n",
    "                classes[clazz]['stereotypes'].append(stereotype)\n",
    "            elif pkg in packages:\n",
    "                packages[pkg]['stereotypes'].append(stereotype)\n",
    "            elif pkgi in imports:\n",
    "                imports[pkgi]['stereotypes'].append(stereotype)\n",
    "            elif elem_id in classes:\n",
    "                classes[elem_id]['stereotypes'].append(stereotype)\n",
    "            elif elem_id in packages:\n",
    "                packages[elem_id]['stereotypes'].append(stereotype)\n",
    "\n",
    "    # property-level\n",
    "    prels = {'productKey': [], 'config': ['configKV'] }\n",
    "    for u in prels:\n",
    "        for elem in tree.findall(f\"./MMProfile:{u}\", NS):\n",
    "            id = get_attrib(elem, XMI_ID)\n",
    "            property = get_attrib(elem, 'base_Property')\n",
    "            elem_id = get_attrib(elem, 'base_Element')\n",
    "            stereotype = {'name': u, 'tags': {}}\n",
    "            get_tags(elem, prels[u], stereotype)\n",
    "            if property in props:\n",
    "                clazz = props[property]\n",
    "                classes[clazz]['properties'][property]['stereotypes'].append(stereotype)\n",
    "            elif elem_id in props:\n",
    "                clazz = props[elem_id]\n",
    "                classes[clazz]['properties'][property]['stereotypes'].append(stereotype)\n",
    "                \n",
    "    print(\"done\")\n",
    "    umlextract[filename] = {'packages': packages, 'classes': classes, \n",
    "        'usages': usages, 'props': props, 'imports': imports, 'datatypes': datatypes, 'enums': enums}\n",
    "\n",
    "\n",
    "UML_FILES = ['DataLake.xml', 'VideoAnalysis.xml', 'StoryAnalysis.xml', 'MovieDoc.xml', 'KnowledgeGraph.xml', 'Bookstore.xml']\n",
    "for uf in UML_FILES:\n",
    "    extract(uf)\n",
    "\n",
    "umlextract"
   ]
  },
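  {
   "cell_type": "markdown",
   "id": "b4e1a9f0",
   "metadata": {},
   "source": [
    "The extraction above leans on two ElementTree conventions: XPath queries with namespace prefixes (resolved through a prefix map) and attribute names written in `{namespace}local` form. Here is a minimal, self-contained sketch of both, run against a tiny made-up XMI fragment. `NS_MAP`, `SAMPLE_XMI`, and `list_classes` are hypothetical names used only for illustration; they are not part of the notebook's UML models, and the block is shown as markdown so it does not overwrite any notebook variables.\n",
    "\n",
    "```python\n",
    "import xml.etree.ElementTree as ET\n",
    "\n",
    "# Same namespace URIs as the extraction cell; the prefixes are arbitrary labels.\n",
    "NS_MAP = {\n",
    "    'uml': 'http://www.omg.org/spec/UML/20131001',\n",
    "    'xmi': 'http://www.omg.org/spec/XMI/20131001',\n",
    "}\n",
    "# ElementTree stores namespaced attribute names as '{uri}local'.\n",
    "XMI_ID = '{' + NS_MAP['xmi'] + '}id'\n",
    "\n",
    "# Hypothetical XMI fragment, for illustration only.\n",
    "SAMPLE_XMI = '''<xmi:XMI xmlns:uml='http://www.omg.org/spec/UML/20131001'\n",
    "                         xmlns:xmi='http://www.omg.org/spec/XMI/20131001'>\n",
    "  <uml:Model xmi:id='m1' name='Demo'>\n",
    "    <packagedElement xmi:type='uml:Package' xmi:id='p1' name='Pkg'>\n",
    "      <packagedElement xmi:type='uml:Class' xmi:id='c1' name='Movie'/>\n",
    "    </packagedElement>\n",
    "  </uml:Model>\n",
    "</xmi:XMI>'''\n",
    "\n",
    "def list_classes(xml_text):\n",
    "    root = ET.fromstring(xml_text)\n",
    "    found = {}\n",
    "    # '//' matches packagedElement at any depth below the model element.\n",
    "    for elem in root.findall(\"uml:Model//packagedElement[@xmi:type='uml:Class']\", NS_MAP):\n",
    "        found[elem.get(XMI_ID, '')] = elem.get('name', '')\n",
    "    return found\n",
    "\n",
    "print(list_classes(SAMPLE_XMI))\n",
    "```\n",
    "\n",
    "The nested class is found even though it sits inside a package, which is why the extraction cell can use one query per element kind instead of walking the package tree by hand."
   ]
  },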
  {
   "cell_type": "markdown",
   "id": "65d1dc97",
   "metadata": {},
   "source": [
    "## Combine UML output\n",
    "We collected lots of different details from UML. Let's bring them together into a clean list products and impls.\n",
    "\n",
    "Run the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ec751b5",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "file_prod_impl_map = None\n",
    "file_prod_impl_map = {}\n",
    "\n",
    "# Build properties for the given class at the given level\n",
    "def build_properties(filename, clazz, level, visited):\n",
    "    props = umlextract[filename]['classes'][clazz]['properties']\n",
    "    for p in props:\n",
    "        name = props[p]['name']\n",
    "        type_id = props[p]['type']\n",
    "        stereotypes = props[p]['stereotypes']\n",
    "        type_name = \"\"  \n",
    "        subtype = None\n",
    "        literals = None\n",
    "        \n",
    "        if type_id in umlextract[filename]['classes'] and \\\n",
    "            not('isProduct' in umlextract[filename]['classes'][type_id]) and \\\n",
    "            not('isImpl' in umlextract[filename]['classes'][type_id]):\n",
    "            \n",
    "            type_name = umlextract[filename]['classes'][type_id]['name']\n",
    "            if not(clazz in visited):\n",
    "                visited[clazz] = clazz\n",
    "                subtype = build_properties(filename, type_id, [], visited)\n",
    "            \n",
    "        if type_id in umlextract[filename]['datatypes']:\n",
    "            type_name = umlextract[filename]['datatypes'][type_id]['name']\n",
    "            \n",
    "        if type_id in umlextract[filename]['enums']:\n",
    "            type_name = umlextract[filename]['enums'][type_id]['name']\n",
    "            literals = umlextract[filename]['enums'][type_id]['literals']\n",
    "            \n",
    "        prop_entry = {\n",
    "            'name': name,\n",
    "            'type': type_name,\n",
    "            'subtype': subtype,\n",
    "            'literals': literals,\n",
    "            'stereotypes': stereotypes\n",
    "        }\n",
    "        level.append(prop_entry)\n",
    "        \n",
    "    return level\n",
    "    \n",
    "def find_targets(filename, target_id, target_href):\n",
    "    target_spec = {'target': None, 'targets': [], 'target_file': None}\n",
    "    if len(target_id) > 0:\n",
    "        if target_id in umlextract[filename]['classes']:\n",
    "            target_spec['target'] = umlextract[filename]['classes'][target_id]['name']\n",
    "            target_spec['targets'].append(target_spec['target'])\n",
    "            target_spec['target_file'] = filename\n",
    "        else:\n",
    "            print(f\"Warn: broken target in usage {u}\")\n",
    "    elif len(target_href) > 0:                \n",
    "        toks = target_href.split(\"#\")\n",
    "        if len(toks) == 2:\n",
    "            target_spec['target_file'] = toks[0]\n",
    "            target_elem = toks[1]\n",
    "            if target_spec['target_file'] in umlextract:\n",
    "                if target_elem in umlextract[target_spec['target_file']]['classes']:\n",
    "                    target_spec['targets'].append(umlextract[target_spec['target_file']]['classes'][target_elem]['name'])\n",
    "                elif target_elem in umlextract[target_spec['target_file']]['packages']:\n",
    "                    for c in file_prod_impl_map[target_spec['target_file']]['products']:\n",
    "                        if target_elem in c['packageAncestry']:\n",
    "                            target_spec['targets'].append(c['name'])\n",
    "\n",
    "            else:\n",
    "                print(f\"Warn: unknown target file in usage {u}\")\n",
    "        else:\n",
    "            print(f\"Warn: unexpected target ref in usage {u}\")\n",
    "    if len(target_spec['targets']) == 1:\n",
    "        target_spec['target'] = target_spec['targets'][0]\n",
    "    return target_spec\n",
    "\n",
    "# Bring together all properties, usages, and stereotypes of products an impls for the given UML filename\n",
    "def combine(filename):\n",
    "    print(\"Combining \" + filename)\n",
    "    products = []\n",
    "    impls = []\n",
    "    \n",
    "    # consider all the classes in the UML file\n",
    "    for c in umlextract[filename]['classes']:\n",
    "        uobj = umlextract[filename]['classes'][c]\n",
    "        obj = {\n",
    "            'name': uobj['name'],\n",
    "            'stereotypes': [],\n",
    "            'properties': build_properties(filename, c, [], {}),\n",
    "            'packageAncestry': []\n",
    "        }\n",
    "\n",
    "        # remember if it's a product or impl\n",
    "        if 'isProduct' in uobj:\n",
    "            products.append(obj)\n",
    "        if 'isImpl' in uobj:\n",
    "            impls.append(obj)\n",
    "            \n",
    "        # class stereotypes - incorporate\n",
    "        for st in uobj['stereotypes']:\n",
    "            obj['stereotypes'].append({\n",
    "                'name': st['name'],\n",
    "                'tags': st['tags']\n",
    "            })\n",
    "\n",
    "        # stereotype package imports\n",
    "        for imp in umlextract[filename]['imports']:   \n",
    "            target_href = umlextract[filename]['imports'][imp]['href']\n",
    "            target_spec = find_targets(filename, \"\", target_href)\n",
    "\n",
    "            for s in umlextract[filename]['imports'][imp]['stereotypes']:    \n",
    "                print(\"Add import \" + s['name'])\n",
    "                obj['stereotypes'].append({\n",
    "                    'name': s['name'],\n",
    "                    'tags': s['tags'],\n",
    "                    'targetClass': target_spec['target'],\n",
    "                    'targetClasses': target_spec['targets'],\n",
    "                    'targetFile': target_spec['target_file']\n",
    "                })\n",
    "\n",
    "        for u in uobj['usages']:\n",
    "            target_id = uobj['usages'][u]['target']\n",
    "            target_href = uobj['usages'][u]['targetHref']\n",
    "            target_spec = find_targets(filename, target_id, target_href)\n",
    "            \n",
    "            for s in uobj['usages'][u]['stereotypes']:\n",
    "                print(\"Add usage  \" + s['name'])\n",
    "                obj['stereotypes'].append({\n",
    "                    'name': s['name'],\n",
    "                    'tags': s['tags'],\n",
    "                    'targetClass': target_spec['target'],\n",
    "                    'targetClasses': target_spec['targets'],\n",
    "                    'targetFile': target_spec['target_file']\n",
    "                })\n",
    "            \n",
    "        # inherit stereotypes from ancestor packages\n",
    "        currobj=uobj\n",
    "        while len(currobj['parent']) > 0 and currobj['parent'] in umlextract[filename]['packages']:       \n",
    "            currobj = umlextract[filename]['packages'][uobj['parent']]\n",
    "            obj['packageAncestry'].append(currobj['id'])\n",
    "            for s in currobj['stereotypes']:\n",
    "                print(\"Add ancestry \" + s['name'])\n",
    "                obj['stereotypes'].append(s)\n",
    "                \n",
    "    file_prod_impl_map[filename] = {'products': products, 'impls': impls}\n",
    "\n",
    "for uf in UML_FILES:\n",
    "    combine(uf)\n",
    "\n",
    "file_prod_impl_map"
   ]
  },
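  {
   "cell_type": "markdown",
   "id": "c7d2e5a3",
   "metadata": {},
   "source": [
    "Two small mechanisms do much of the work in the cell above: a cross-file reference of the form `File.xml#elementId` is split into a file name and an element id, and each class inherits stereotypes from its chain of enclosing packages. The following is a minimal sketch of both ideas, assuming simplified dictionaries in place of `umlextract`. The names `split_href` and `package_ancestry` and the sample data are hypothetical and used only for illustration.\n",
    "\n",
    "```python\n",
    "def split_href(href):\n",
    "    # 'File.xml#id' -> ('File.xml', 'id'); any other shape is rejected.\n",
    "    toks = href.split('#')\n",
    "    if len(toks) != 2:\n",
    "        return None\n",
    "    return toks[0], toks[1]\n",
    "\n",
    "def package_ancestry(start, packages):\n",
    "    # Walk parent links upward, nearest package first.\n",
    "    # The 'seen' set guards against cyclic parent links.\n",
    "    chain = []\n",
    "    seen = set()\n",
    "    current = start\n",
    "    while current['parent'] in packages and current['parent'] not in seen:\n",
    "        seen.add(current['parent'])\n",
    "        current = packages[current['parent']]\n",
    "        chain.append(current['id'])\n",
    "    return chain\n",
    "\n",
    "# Hypothetical data: class 'c1' lives in package 'p2', which is nested in 'p1'.\n",
    "packages = {\n",
    "    'p1': {'id': 'p1', 'parent': 'm1'},\n",
    "    'p2': {'id': 'p2', 'parent': 'p1'},\n",
    "}\n",
    "clazz = {'id': 'c1', 'parent': 'p2'}\n",
    "\n",
    "print(split_href('MovieDoc.xml#c1'))\n",
    "print(package_ancestry(clazz, packages))\n",
    "```\n",
    "\n",
    "The walk advances from the object it reached last, not from the starting class, so nested packages contribute their stereotypes one level at a time and the loop ends at the first parent that is not a package."
   ]
  },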
  {
   "cell_type": "markdown",
   "id": "84009a8c",
   "metadata": {},
   "source": [
    "## Look at Movie DocStore products\n",
    "Before we continue, let's take a closer look at what we have so far. We'll look at data products in the movie docstore. \n",
    "\n",
    "Run the next cell and review the products. You'll be reading JSON. Notice that each product has stereotypes (source, join, impl and properties (a few levels deep in some cases).\n",
    "\n",
    "We'll be converting these to RDF shortly. We won't transform all properties, just those with stereotypes or involved in reference relationships.\n",
    "\n",
    "What's you're looking at here is product summaries with enough detail to publish to a mesh marketplace.\n",
    "\n",
    "Run the cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6de4fafc",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import json\n",
    "finder = ['MovieDocument', 'ContributorDocument', 'RoleDocument']\n",
    "for p in file_prod_impl_map['MovieDoc.xml']['products']:\n",
    "    if p['name'] in finder:\n",
    "        j = json.dumps(p, indent=2)\n",
    "        print(\"\\n\\n**** Looking at ***** \" + p['name'])\n",
    "        print(j)\n"
   ]
  },
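  {
   "cell_type": "markdown",
   "id": "d9a6f1b8",
   "metadata": {},
   "source": [
    "Because properties nest through `subtype`, the raw JSON can be hard to scan. One possible way to get an overview is to flatten a product's properties into dotted paths. This is a minimal sketch, assuming the property shape produced by `build_properties` (`name`, `type`, `subtype`). The helper `flatten_properties` and the sample product are hypothetical and not part of the generated output.\n",
    "\n",
    "```python\n",
    "def flatten_properties(props, prefix=''):\n",
    "    # Returns (dotted_path, type_name) pairs, descending into 'subtype' lists.\n",
    "    rows = []\n",
    "    for prop in props:\n",
    "        path = prefix + prop['name']\n",
    "        rows.append((path, prop['type']))\n",
    "        if prop.get('subtype'):\n",
    "            rows.extend(flatten_properties(prop['subtype'], path + '.'))\n",
    "    return rows\n",
    "\n",
    "# Hypothetical product entry, shaped like the JSON printed above.\n",
    "sample = {\n",
    "    'name': 'MovieDocument',\n",
    "    'properties': [\n",
    "        {'name': 'title', 'type': 'String', 'subtype': None},\n",
    "        {'name': 'rating', 'type': 'Rating', 'subtype': [\n",
    "            {'name': 'score', 'type': 'Float', 'subtype': None},\n",
    "        ]},\n",
    "    ],\n",
    "}\n",
    "\n",
    "for path, type_name in flatten_properties(sample['properties']):\n",
    "    print(path, type_name)\n",
    "```\n",
    "\n",
    "To try it on real output, pass `p['properties']` for any product `p` from `file_prod_impl_map`."
   ]
  },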
  {
   "cell_type": "markdown",
   "id": "e1064843",
   "metadata": {},
   "source": [
    "## Create RDF\n",
    "Now let's convert the details we collected from the UML models into RDF. We'll be generating an ontology in the file mmgen.ttl, which we'll save to the folder mmgen in the notebook instance. This ontology includes all data products and implementations from the UML models, including the stereotypes! \n",
    "\n",
    "The generated ontology builds on the existing core multimodel ontology mm.ttl. \n",
    "\n",
    "Generated products are implemented on several AWS services, including Amazon Neptune. Neptune plays two roles. First, it is the knowledge graph in which we load the generated ontology to keep track of all data products and ask questions about how they are related. Second, it PROVIDES its own data products as RDF resources. And we use an ontology to represent those resources. That ontology is movkg.ttl. \n",
    "\n",
    "That makes three ontologies: mmgen.ttl (which we are about to generate), mm.ttl (which is already written and mmgen.ttl builds on top of), movkg.ttl (which is aleady written and expands on the KG products described in mmgen.ttl). \n",
    "\n",
    "Run the next cell to generate mmgen.ttl."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31aee252",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!pip install rdflib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb78fb94",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import json\n",
    "from rdflib import Graph, Literal, RDF, RDFS, URIRef, XSD, OWL, BNode\n",
    "\n",
    "NS = \"http://amazon.com/aws/wwso/neptune/demo/multimodel\"\n",
    "DPURI = URIRef(f\"{NS}/DataProduct\")\n",
    "IMPLURI = URIRef(f\"{NS}/DataProductImpl\")\n",
    "MM_ONTOLOGY = URIRef(f\"{NS}/ontology\")\n",
    "NIL = URIRef(\"http://www.w3.org/1999/02/22-rdf-syntax-ns#nil\")\n",
    "\n",
    "\n",
    "PKG = {\n",
    "    \"DataLake.xml\": \"lake\", \n",
    "    \"KnowledgeGraph.xml\": \"kg\",\n",
    "    \"MovieDoc.xml\": \"moviedoc\",\n",
    "    \"StoryAnalysis.xml\": \"story\",\n",
    "    \"VideoAnalysis.xml\": \"video\",\n",
    "    \"Bookstore.xml\": \"bookstoredemo\"\n",
    "}\n",
    "\n",
    "LITERAL_STEREOS = {\n",
    "    \"federates\": {\"tag\": \"fedURI\", \"obj\": lambda val, umlfile, clazz, stereo: URIRef(val) } , \n",
    "    \"awsService\": {\"tag\": \"service\", \"obj\":  lambda val, umlfile, clazz, stereo: make_uri(\"aws\", val) } , \n",
    "    \"awsResource\": {\"tag\": \"resource\", \"obj\":  lambda val, umlfile, clazz, stereo: Literal(val) } , #keep is loose for now\n",
    "    \"usagePattern\": {\"tag\": \"pattern\", \"obj\":  lambda val, umlfile, clazz, stereo: Literal(val) } \n",
    "}\n",
    "\n",
    "USAGE_STEREOS = [\"copies\", \"caches\", \"locatedIn\", \"hasImpl\"]\n",
    "\n",
    "OREL_STEREOS = {\n",
    "    \"hasSource\": {\n",
    "        \"orel\": \"Source\",\n",
    "        \"tags\": {\n",
    "            \"integrationType\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file:make_top_uri(\"integrationType\"), \n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: URIRef(make_top_uri(val))\n",
    "            },\n",
    "            \"sourceDesc\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file:RDFS.comment, \n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: Literal(val)\n",
    "            },\n",
    "            \"sourceDataSet\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file:make_top_uri(\"sourceDataSet\"),\n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: Literal(val)\n",
    "            },\n",
    "            \"sourceEventType\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file:make_top_uri(\"sourceEventType\"),\n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: make_uri(\"aws\", val)\n",
    "            }\n",
    "        }\n",
    "    } , \n",
    "    \"similarTo\": {\n",
    "        \"orel\": \"Similarity\",\n",
    "        \"tags\": {\n",
    "            \"simReason\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file:make_top_uri(\"simReason\"),\n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: Literal(val)\n",
    "            },\n",
    "            \"simAlgorithm\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file:make_top_uri(\"simAlgorithm\"),\n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: Literal(val)   \n",
    "            }\n",
    "        }\n",
    "    },\n",
    "    \"joins\": {\n",
    "        \"orel\": \"Ref\",\n",
    "        \"tags\": {\n",
    "            \"joinAttrib\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file: make_top_uri(\"hasNeighborAttribute\"),\n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: make_uri(PKG[target_file], val)\n",
    "            }, \n",
    "            \"myAttrib\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file: make_top_uri(\"hasMyAttribute\"),\n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: make_uri(PKG[target_file], val)\n",
    "            }\n",
    "        }        \n",
    "    },\n",
    "    \"refersTo\": {\n",
    "        \"orel\": \"Ref\",\n",
    "        \"tags\": {\n",
    "            \"refersAttrib\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file :make_top_uri(\"hasNeighborAttribute\"),\n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: make_uri(PKG[target_file], val)\n",
    "            }, \n",
    "            \"myAttrib\": {\n",
    "                \"p\": lambda val, umlfile, clazz, stereo, target_file: make_top_uri(\"hasMyAttribute\"),\n",
    "                \"o\": lambda val, umlfile, clazz, stereo, target_file: make_uri(PKG[umlfile], val)  \n",
    "            }\n",
    "        }        \n",
    "    }\n",
    "}\n",
    "\n",
    "def make_top_uri(name):\n",
    "    return URIRef(f\"{NS}/{name}\")\n",
    "\n",
    "def make_uri(ns, name):\n",
    "    return URIRef(f\"{NS}/{ns}/{name}\")\n",
    "\n",
    "def create_prop_from_rel(g, puri, name):\n",
    "    g.add((puri, RDF.type, OWL.DatatypeProperty))\n",
    "    g.add((puri, RDFS.label, Literal(name)))\n",
    "    g.add((puri, RDFS.isDefinedBy, MM_ONTOLOGY))\n",
    "    return puri\n",
    "\n",
    "def create_data_prop(g, ns, name, domain_clazz):\n",
    "    puri = make_uri(ns, name)\n",
    "    g.add((puri, RDF.type, OWL.DatatypeProperty))\n",
    "    g.add((puri, RDFS.label, Literal(name)))\n",
    "    g.add((puri, RDFS.isDefinedBy, MM_ONTOLOGY))\n",
    "    g.add((puri, make_top_uri(\"domainIncludes\"), domain_clazz))\n",
    "    return puri\n",
    "\n",
    "def create_data_type_prop(g, ns, name, propuri):\n",
    "    puri = make_uri(ns, name)\n",
    "    g.add((puri, RDF.type, OWL.DatatypeProperty))\n",
    "    g.add((puri, RDFS.label, Literal(name)))\n",
    "    g.add((puri, RDFS.isDefinedBy, MM_ONTOLOGY))\n",
    "    g.add((propuri, RDFS.subPropertyOf, puri))\n",
    "    return puri\n",
    "\n",
    "\n",
    "def create_data_product(g, ns, name):\n",
    "    puri = make_uri(ns, name)\n",
    "    g.add((puri, RDF.type, OWL.Class))\n",
    "    g.add((puri, RDFS.subClassOf, DPURI))\n",
    "    g.add((puri, RDFS.label, Literal(name)))\n",
    "    g.add((puri, RDFS.isDefinedBy, MM_ONTOLOGY))\n",
    "    return puri\n",
    "\n",
    "def create_impl(g, ns, name):\n",
    "    puri = make_uri(ns, name)\n",
    "    g.add((puri, RDF.type, OWL.Class))\n",
    "    g.add((puri, RDFS.subClassOf, IMPLURI))\n",
    "    g.add((puri, RDFS.label, Literal(name)))\n",
    "    g.add((puri, RDFS.isDefinedBy, MM_ONTOLOGY))\n",
    "    return puri\n",
    "    \n",
    "def add_config(g, cfg):\n",
    "    g.add((cfg, RDFS.subClassOf, make_top_uri(\"config\")))\n",
    "    \n",
    "def add_po(g, s, p, o):\n",
    "    g.add((s, p, o))\n",
    "\n",
    "def create_orel(g, oreltype, s, p, po):\n",
    "    po_uri = BNode()\n",
    "    g.add((s, p, po_uri))\n",
    "    g.add((po_uri, RDF.type, OWL.NamedIndividual))\n",
    "    g.add((po_uri, RDF.type, make_top_uri(oreltype)))\n",
    "    g.add((po_uri, RDFS.isDefinedBy, MM_ONTOLOGY))\n",
    "    for one_po in po:\n",
    "        g.add((po_uri, one_po['p'], one_po['o']))\n",
    "\n",
    "def convert_class(umlfile, clazz, s):\n",
    "    # class-level stereotypes\n",
    "    for stereo in clazz['stereotypes']:\n",
    "        # the predicate\n",
    "        sname = stereo['name']\n",
    "        p = make_top_uri(sname)\n",
    "        if sname in LITERAL_STEREOS:\n",
    "            # TODO - some of the objects should be URIs, not literals\n",
    "            tag = LITERAL_STEREOS[sname]['tag']\n",
    "            obj = LITERAL_STEREOS[sname]['obj']\n",
    "            if tag in stereo['tags']:\n",
    "                for val in stereo['tags'][tag]:\n",
    "                    add_po(g, s, p, obj(val, umlfile, clazz, stereo))\n",
    "            else:\n",
    "                print(\"Warn: incorrect tag structure in \" + str(stereo))\n",
    "                \n",
    "        elif sname in USAGE_STEREOS:\n",
    "            if len(stereo['targetClass']) > 0:\n",
    "                o = make_uri(PKG[stereo['targetFile']], stereo['targetClass'])\n",
    "                add_po(g, s, p, o)\n",
    "            elif len(stereo['targetClasses']) > 0:\n",
    "                for cl in stereo['targetClasses']:\n",
    "                    o = make_uri(PKG[stereo['targetFile']], cl)\n",
    "                    add_po(g, s, p, o)\n",
    "\n",
    "        elif sname in OREL_STEREOS:\n",
    "            po = []\n",
    "            target_file = stereo['targetFile'] if 'targetFile' in stereo else None\n",
    "            for t in OREL_STEREOS[sname]['tags']:\n",
    "                if t in stereo['tags']:\n",
    "                    for val in stereo['tags'][t]:\n",
    "                        po_p = OREL_STEREOS[sname]['tags'][t][\"p\"](val, umlfile, clazz, stereo, target_file)\n",
    "                        po_o = OREL_STEREOS[sname]['tags'][t][\"o\"](val, umlfile, clazz, stereo, target_file)\n",
    "                        po.append({\"p\": po_p, \"o\": po_o})\n",
    "                        if po_p == make_top_uri(\"hasNeighborAttribute\") or po_o == make_top_uri(\"hasMyAttribute\"):\n",
    "                            create_prop_from_rel(g, po_o, val)\n",
    "            if 'targetClass' in stereo and not (stereo['targetClass'] is None):\n",
    "                po.append({\"p\": make_top_uri(\"hasNeighbor\"), \"o\": make_uri(PKG[stereo['targetFile']], stereo['targetClass'])})\n",
    "            elif 'targetClasses' in stereo and len(stereo['targetClasses']) > 0:\n",
    "                for t in stereo['targetClasses']:\n",
    "                    po.append({\"p\": make_top_uri(\"hasNeighbor\"), \"o\": make_uri(PKG[stereo['targetFile']], t)})\n",
    "            create_orel(g, OREL_STEREOS[sname]['orel'], s, p, po)\n",
    "\n",
    "        elif sname == 'config':\n",
    "            if 'configKV' in stereo['tags']:\n",
    "                for kv in stereo['tags']['configKV']:\n",
    "                    toks = kv.split(\",\")\n",
    "                    if len(toks) == 2:\n",
    "                        cp = make_uri(\"aws\", \"config-\" + toks[0])\n",
    "                        co = Literal(toks[1])\n",
    "                        add_po(g, s, cp, co)\n",
    "                        add_config(g, cp)\n",
    "                    else:\n",
    "                        print(\"Warn: illegal config in \" + str(stereo))\n",
    "        else:\n",
    "            print(\"Warn: unknown stereotype \" + str(stereo))\n",
    "            \n",
    "            \n",
    "    # Now onto properties\n",
    "    # \n",
    "    # We won't take all the properties. That's great input for data mesh, but we'll consider only a few properties\n",
    "    # Products keys\n",
    "    # Props with config (*** Not today, maybe later ***)\n",
    "    # Properties used in join/refers\n",
    "    #\n",
    "    # Will not consider\n",
    "    # Enums\n",
    "    # Properties not dealt with above\n",
    "    # Properties further down the tree.\n",
    "    # \n",
    "    # How to handle types\n",
    "    # Model as subClassOf instead of range\n",
    "    # The IMDBID type is a great illustration\n",
    "    # It is better to say \"MovieID is subclass of IMDBID\" than \n",
    "    # \"MovieID has range IMDBID\".\n",
    "    keys = []\n",
    "    for prop in clazz['properties']:\n",
    "        prop_name = prop['name']\n",
    "        for prop_stereo in prop['stereotypes']:\n",
    "            propuri = create_data_prop(g, PKG[umlfile], prop_name, s)\n",
    "            if prop_stereo['name'] == 'productKey':\n",
    "                keys.append(propuri)\n",
    "            if len(prop['type']) > 0:\n",
    "                create_data_type_prop(g, PKG[umlfile], prop['type'], propuri)\n",
    "                \n",
    "    if len(keys) > 0:\n",
    "        list_uri = BNode()\n",
    "        add_po(g, s, OWL.hasKey, list_uri)\n",
    "        idx = 0\n",
    "        for k in keys:\n",
    "            add_po(g, list_uri, RDF.first, k)\n",
    "            idx += 1\n",
    "            if idx == len(keys):\n",
    "                add_po(g, list_uri, RDF.rest, NIL)\n",
    "            else:\n",
    "                next_list_uri = BNode()\n",
    "                add_po(g, list_uri, RDF.rest, next_list_uri)  \n",
    "                list_uri = next_list_uri\n",
    "\n",
    "g = Graph()\n",
    "for umlfile in file_prod_impl_map:\n",
    "    for product in file_prod_impl_map[umlfile]['products']:\n",
    "        print(product['name'])\n",
    "        s =  create_data_product(g, PKG[umlfile], product['name'])\n",
    "        convert_class(umlfile, product, s)\n",
    "    for impl in file_prod_impl_map[umlfile]['impls']:\n",
    "        print(impl['name'])\n",
    "        s =  create_impl(g, PKG[umlfile], impl['name'])\n",
    "        convert_class(umlfile, impl, s)\n",
    "    \n",
    "g.serialize(destination = 'mmgen/mmgen.ttl', format='turtle')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee0e1c5a",
   "metadata": {},
   "source": [
    "## Copy generated RDF to S3\n",
    "mmgen.ttl is on the notebook instance, but we need it in S3 to load it into Neptune. \n",
    "\n",
    "Run the next cell to move it to S3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6d44ff1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%bash -s \"$STAGING_BUCKET\"\n",
    "\n",
    "cd mmgen\n",
    "aws s3 cp mmgen.ttl s3://$1/data/mmgen.ttl"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "580113a3",
   "metadata": {},
   "source": [
    "## Upload three ontologies\n",
    "Load from S3 all three aforementioned ontologies to Neptune.\n",
    "\n",
    "Run each of the next six cells in sequential order. There are three loads, and three load statuses. The loads might take a few seconds to complete. Wait for the spinner to stop with a status of LOAD_COMPLETED or LOAD_FAILED. The load statuses reveal any issues in each load. We don't expect any."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5c63671",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%load -s s3://{STAGING_BUCKET}/data/mm.ttl -f turtle --store-to loadres1 --run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4a3a840",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%load_status {loadres1['payload']['loadId']} --errors --details"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f8c7323",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%load -s s3://{STAGING_BUCKET}/data/mmgen.ttl -f turtle --store-to loadres2 --run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16eacade",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%load_status {loadres2['payload']['loadId']} --errors --details"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9317b788",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%load -s s3://{STAGING_BUCKET}/data/movkg.ttl -f turtle --store-to loadres3 --run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6120ebaa",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%load_status {loadres3['payload']['loadId']} --errors --details"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36409f6a",
   "metadata": {},
   "source": [
    "## Query the products \n",
    "Now some queries"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93889bde",
   "metadata": {},
   "source": [
    "### Get list of data products\n",
    "Run the cell and compare to the UML models. The only product not in the UML models is the LonelyProduct. More on this in a moment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "685ee74b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "\n",
    "select ?product where {\n",
    "    ?product rdfs:subClassOf :DataProduct .\n",
    "} \n",
    "ORDER BY ?product "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24e90b31",
   "metadata": {},
   "source": [
    "### Get list of products and their impls\n",
    "Let's now see for each product its implementation.\n",
    "\n",
    "First, let's fill in some gaps. An impl can have an impl. We know MovieDocument hasImpl MovieDocumentImpl, and that MovieDocumentImpl copies MovieSearchDocument. Let's complete the chain. The following insert takes impls and impls and ties them to the original product."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0f5f457",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "\n",
    "INSERT {\n",
    "    ?product :hasImpl ?impl \n",
    "}\n",
    "WHERE {\n",
    "    ?product rdfs:subClassOf :DataProduct .\n",
    "    ?impl rdfs:subClassOf :DataProductImpl .    \n",
    "    ?product (:hasImpl|:copies|:caches|:locatedIn)+ ?impl .\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f9bdc0a",
   "metadata": {},
   "source": [
    "And now query to bring back products and their impls (including those that are impls or impls)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1bee56f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "\n",
    "select ?product (GROUP_CONCAT(?impl;SEPARATOR=\",\") AS ?impls) where {\n",
    "    ?product rdfs:subClassOf :DataProduct .\n",
    "    OPTIONAL { \n",
    "        ?product :hasImpl ?impl .\n",
    "    } .\n",
    "} \n",
    "GROUP BY ?product\n",
    "ORDER BY ?product "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76e75b43",
   "metadata": {},
   "source": [
    "### Describe a product\n",
    "Try both the Table and Graph tabs!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bcc9c295",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "# describe mode https://docs.aws.amazon.com/neptune/latest/userguide/sparql-query-hints-for-describe.html#sparql-query-hints-describeMode\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>\n",
    "\n",
    "describe movkg:MovieResource\n",
    "{\n",
    "  hint:Query hint:describeMode \"CBD\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ed54473",
   "metadata": {},
   "source": [
    "### Which products use OpenSearch and Elasticache?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9ef787f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix aws:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/aws/> \n",
    "\n",
    "select * where {\n",
    "    ?product rdfs:subClassOf :DataProduct .\n",
    "    ?product :hasImpl/:awsService aws:OpenSearch .\n",
    "    ?product :hasImpl/:awsService aws:Elasticache .\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9c792b2",
   "metadata": {},
   "source": [
    "### Story and Movie Related?\n",
    "Use SPARQL ASK to check if StoryAnalysis product is somehow connected to MovieResource or MovieDocument products. It IS!!!!! The blog post discusses why."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ebc840f1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "prefix movvideo:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/video/> \n",
    "prefix movstory:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/story/> \n",
    "prefix movlake:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/lake/> \n",
    "prefix movdoc:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/moviedoc/> \n",
    "prefix aws:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/aws/> \n",
    "\n",
    "ask where {\n",
    "    BIND(movstory:StoryAnalysis as ?product) .\n",
    "    \n",
    "    ?product  ((:hasNeighbor|:hasNeighborAttribute|:joins|:refersTo|:similarTo|\n",
    "        :hasSource|:sourceDataSet|rdfs:subPropertyOf|owl:hasKey/rdf:first|rdfs:domain|rdfs:range|rdfs:subPropertyOf) |^ \n",
    "        (:hasNeighbor|:hasNeighborAttribute|:joins|:refersTo|:similarTo|\n",
    "        :hasSource|:sourceDataSet|rdfs:subPropertyOf|owl:hasKey/rdf:first|rdfs:domain|rdfs:range|rdfs:subPropertyOf))* ?rel .\n",
    "\n",
    "    FILTER(?rel = movkg:MovieResource || ?rel = movdoc:MovieDocument) . \n",
    "} \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8cf0b3c",
   "metadata": {},
   "source": [
    "### Story and IMDB?\n",
    "Are we able to connect StoryAnalysis to the IMDB? We are!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8efc3675",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "prefix movvideo:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/video/> \n",
    "prefix movstory:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/story/> \n",
    "prefix movlake:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/lake/> \n",
    "prefix movdoc:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/moviedoc/> \n",
    "prefix aws:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/aws/> \n",
    "\n",
    "ask where {\n",
    "    BIND(movstory:StoryAnalysis as ?product) .\n",
    "    \n",
    "    ?product  ((:hasNeighbor|:hasNeighborAttribute|:joins|:refersTo|:similarTo|\n",
    "        :hasSource|:sourceDataSet|rdfs:subPropertyOf|owl:hasKey/rdf:first|rdfs:domain|rdfs:range|rdfs:subPropertyOf) |^ \n",
    "        (:hasNeighbor|:hasNeighborAttribute|:joins|:refersTo|:similarTo|\n",
    "        :hasSource|:sourceDataSet|rdfs:subPropertyOf|owl:hasKey/rdf:first|rdfs:domain|rdfs:range|rdfs:subPropertyOf))* ?rel .\n",
    "\n",
    "    FILTER(?rel = movdoc:IMDBID) . \n",
    "} "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "676b4c2a",
   "metadata": {},
   "source": [
    "### Story and the Lonely product\n",
    "Can we connect the story product to the lonely product? Of course not. Lonely product is \"lonely\" in the graph sense - it has no neighbors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1706b605",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "prefix movvideo:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/video/> \n",
    "prefix movstory:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/story/> \n",
    "prefix movlake:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/lake/> \n",
    "prefix movdoc:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/moviedoc/> \n",
    "prefix aws:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/aws/> \n",
    "\n",
    "ask where {\n",
    "    BIND(movstory:StoryAnalysis as ?product) .\n",
    "    \n",
    "    ?product  ((:hasNeighbor|:hasNeighborAttribute|:joins|:refersTo|:hasSimilarity|\n",
    "        :hasSource|:sourceDataSet|rdfs:subPropertyOf|owl:hasKey/rdf:first|rdfs:domain|rdfs:range|rdfs:subPropertyOf) |^ \n",
    "        (:hasNeighbor|:hasNeighborAttribute|:joins|:refersTo|:hasSimilarity|\n",
    "        :hasSource|:sourceDataSet|rdfs:subPropertyOf|owl:hasKey/rdf:first|rdfs:domain|rdfs:range|rdfs:subPropertyOf))* ?rel .\n",
    "\n",
    "    FILTER(?rel = :LonelyProduct) . \n",
    "} "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ec4c1ec",
   "metadata": {},
   "source": [
    "## Movie Example"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "502896fd",
   "metadata": {},
   "source": [
    "### Populate sample data\n",
    "Insert a movie, a couple of its roles, stories that mention, video analysis, links to IMDB, DBPedia, Wikidata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab8c28ab",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "prefix movvideo:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/video/> \n",
    "prefix movstory:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/story/> \n",
    "prefix movlake:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/lake/> \n",
    "prefix movdoc:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/moviedoc/> \n",
    "\n",
    "INSERT DATA {\n",
    "    movkg:Shining a movkg:MovieResource .\n",
    "    movkg:Shining movdoc:MovieID \"tt0081505\" .\n",
    "    movkg:Shining movkg:hasDBPediaRef <http://dbpedia.org/resource/The_Shining_(film)> .\n",
    "    movkg:Shining movkg:hasWikidataRef <http://www.wikidata.org/entity/Q186341> .\n",
    "    \n",
    "    # cast - a couple contributors to give the idea\n",
    "    movkg:RoleShining_Jack a movkg:RoleResource .\n",
    "    movkg:RoleShining_Jack movkg:hasMovie movkg:Shining .\n",
    "    movkg:RoleShining_Jack movkg:hasContribClass movkg:Actor .\n",
    "    movkg:RoleShining_Jack movkg:hasContrib movkg:JackNicholson .\n",
    "\n",
    "    movkg::RoleShining_Kubrick_Dir a movkg:RoleResource .\n",
    "    movkg::RoleShining_Kubrick_Dir movkg:hasMovie movkg:Shining .\n",
    "    movkg::RoleShining_Kubrick_Dir movkg:hasContribClass movkg:Director .\n",
    "    movkg::RoleShining_Kubrick_Dir movkg:hasContrib movkg:StanleyKubrick .\n",
    "\n",
    "    movkg:RoleShining_Kubrick_Prod a movkg:RoleResource .\n",
    "    movkg:RoleShining_Kubrick_Prod movkg:hasMovie movkg:Shining .\n",
    "    movkg:RoleShining_Kubrick_Prod movkg:hasContribClass movkg:Producer .\n",
    "    movkg:RoleShining_Kubrick_Prob movkg:hasContrib movkg:StanleyKubrick .\n",
    "\n",
    "    movkg:JackNicholson a movkg:ContributorResource . \n",
    "    movkg:JackNihcolson movkg:ContribID \"nm0000197\" .\n",
    "    movkg:JackNicholson movkg:hasDBPediaRef <http://dbpedia.org/resource/Jack_Nicholson> .\n",
    "    movkg:JackNicholson movkg:hasWikidataRef <https://www.wikidata.org/entitiy/Q39792> .\n",
    "\n",
    "    movkg:StanleyKubrick a movkg:ContributorResource . \n",
    "    movkg:StanleyKubrick movkg:ContribID \"nm0000040\" .\n",
    "    movkg:StanleyKubrick movkg:hasDBPediaRef <http://dbpedia.org/resource/Stanley_Kubrick> .\n",
    "    movkg:StanleyKubrick movkg:hasWikidataRef <https://www.wikidata.org/entitiy/Q2001> .\n",
    "\n",
    "    # stories that mention\n",
    "    movkg:Story_Staycation_in_Hollywood a movkg:StorytResource .\n",
    "    movkg:Story_Staycation_in_Hollywood movstory:StoryTitle \"Staycation in Hollywood\" .\n",
    "    movkg:Story_Staycation_in_Hollywood movkg:mentions movkg:Shining .\n",
    "    movkg:Story_Starve_Cabin_Fever_Until_Spring a movkg:StorytResource .\n",
    "    movkg:Story_Starve_Cabin_Fever_Until_Spring movstory:StoryTitle \"Starve Cabin Fever Until Spring\" .\n",
    "    movkg:Story_Starve_Cabin_Fever_Until_Spring movkg:mentions movkg:Shining .\n",
    "    \n",
    "    # video analysis\n",
    "    movkg:Analysis_123456789 a movkg:VideoAnalysisResource .\n",
    "    movkg:Shining movkg:hasVideoAnalysis movkg:Analysis_123456789 .\n",
    "    movkg:Analysis_123456789 movvideo:VideoID \"123456789\" .\n",
    "    movkg:Analysis_123456789 movvideo:S3IngestLocation \"s3://va_abcderfg_123456789/ingest\" .\n",
    "    movkg:Analysis_123456789 movvideo:S3AnalysisLocation \"s3://va_abcderfg_123456789/analysis\" .\n",
    "    movkg:Analysis_123456789 movvideo:MP4FileName \"0081505_shining.mp4\" .\n",
    "    movkg:Analysis_123456789 movkg:hasRekognitionCeleb movkg:Analysis_123456789_celeb0 .\n",
    "    movkg:Analysis_123456789_celeb0 movkg:celebName \"Jeff Bezos\" .\n",
    "    movkg:Analysis_123456789_celeb0 movkg:hasWikidataRef <http://www.wikidata.org/entity/Q312556> .\n",
    "    movkg:Analysis_123456789_celeb0 movdoc:ContribID \"nm1757263\" . # this is an IMDB ID\n",
    "    # more detail on occurences of cebeb in video in S3AnalysisLocation given above\n",
    " \n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "506f4515",
   "metadata": {},
   "source": [
    "### With MovieID (IMDBID) as input, get basic details of the movie"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ae59b2d",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "prefix movvideo:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/video/> \n",
    "prefix movstory:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/story/> \n",
    "prefix movlake:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/lake/> \n",
    "prefix movdoc:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/moviedoc/> \n",
    "prefix aws:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/aws/> \n",
    "\n",
    "SELECT ?movie ?dbp ?wiki ?storyMention ?video ?mp4\n",
    "WHERE \n",
    "{\n",
    "    ?movie movdoc:MovieID \"tt0081505\" .\n",
    "    ?movie a movkg:MovieResource .\n",
    "    OPTIONAL {?movie movkg:hasDBPediaRef ?dbp . } .\n",
    "    OPTIONAL {?movie movkg:hasWikidataRef ?wiki . } .\n",
    "    \n",
    "    # bring in story mentions\n",
    "    OPTIONAL {?storyMention movkg:mentions ?movie . } .\n",
    "    \n",
    "    # bring in video analyis\n",
    "    OPTIONAL {?movie movkg:hasVideoAnalysis ?video . ?video movvideo:MP4FileName ?mp4 . } .\n",
    "} \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99ec3c62",
   "metadata": {},
   "source": [
    "### Knowing the movie URI, DESCRIBE it\n",
    "\n",
    "See https://docs.aws.amazon.com/neptune/latest/userguide/sparql-query-hints-for-describe.html for more on DESCRIBE in Neptune. Try the Graph view too."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ad10064",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>\n",
    "\n",
    "describe movkg:Shining\n",
    "{\n",
    "  hint:Query hint:describeMode \"CBD\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e17f2f8",
   "metadata": {},
   "source": [
    "### Get video analysis - celebs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7803a306",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "prefix movvideo:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/video/> \n",
    "prefix movstory:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/story/> \n",
    "prefix movlake:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/lake/> \n",
    "prefix movdoc:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/moviedoc/> \n",
    "prefix aws:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/aws/> \n",
    "\n",
    "SELECT ?movie ?mp4 ?celebName ?celebWikdata ?celebIMDB ?roleX\n",
    "WHERE \n",
    "{\n",
    "    ?movie movdoc:MovieID \"tt0081505\" .\n",
    "    ?movie a movkg:MovieResource .\n",
    "    \n",
    "    ?movie movkg:hasVideoAnalysis ?video . \n",
    "    ?video movvideo:MP4FileName ?mp4 .\n",
    "    OPTIONAL {\n",
    "        # bring in celebs in video analysis\n",
    "        ?video movkg:hasRekognitionCeleb ?celeb .\n",
    "        ?celeb movkg:celebName ?celebName .\n",
    "        ?celeb movkg:hasWikidataRef ?celebWikdata .\n",
    "        ?celeb movdoc:ContribID ?celebIMDB .\n",
    "        OPTIONAL {\n",
    "            # Is the celeb a contributor in the movie\n",
    "            ?roleX movkg:hasContributor ?contribX .\n",
    "            ?contribX a movkg:ContributorResource .\n",
    "            ?roleX movkg:hasMovie ?movie .\n",
    "            ?contribX movdoc:ContribID ?celebIMDB . \n",
    "        }\n",
    "    } \n",
    "} \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1831874b",
   "metadata": {},
   "source": [
    "### Pull in DBPedia"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fcd66e59",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "prefix movdoc:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/moviedoc/> \n",
    "prefix aws:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/aws/> \n",
    "\n",
    "SELECT ?p ?o \n",
    "WHERE \n",
    "{\n",
    "    ?movie movdoc:MovieID \"tt0081505\" .\n",
    "    ?movie a movkg:MovieResource .\n",
    "    ?movie movkg:hasDBPediaRef ?dbp .\n",
    "    SERVICE <https://dbpedia.org/sparql> {\n",
    "        ?dbp ?p ?o . \n",
    "    }\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad7bf44a",
   "metadata": {},
   "source": [
    "### Pull in Wikidata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f451cbab",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "PREFIX : <http://amazon.com/aws/wwso/neptune/demo/multimodel/> \n",
    "prefix movkg:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/kg/> \n",
    "prefix movdoc:           <http://amazon.com/aws/wwso/neptune/demo/multimodel/moviedoc/> \n",
    "\n",
    "SELECT ?p ?o \n",
    "WHERE \n",
    "{\n",
    "    ?movie movdoc:MovieID \"tt0081505\" .\n",
    "    ?movie a movkg:MovieResource .\n",
    "    ?movie movkg:hasWikidataRef ?wiki .\n",
    "    SERVICE <https://query.wikidata.org/sparql> {\n",
    "        ?wiki ?p ?o . \n",
    "    }    \n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f001571a",
   "metadata": {},
   "source": [
    "## Cleanup\n",
    "If you messed up... \n",
    "Either of the two approaches works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01386517",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sparql\n",
    "\n",
    "delete {?s ?p ?o} where {?s ?p ?o}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c18907a9",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%db_reset"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}