{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "fc5237e9", "metadata": {}, "source": [ "# GraphETL Demo" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b2cdd1da", "metadata": {}, "source": [ "This notebook accompanies the blog post for the GraphETL sample. It demonstrates basic queries using the graph traversal language Gremlin." ] }, { "attachments": {}, "cell_type": "markdown", "id": "069e28fd", "metadata": {}, "source": [ "### Prerequisites" ] }, { "attachments": {}, "cell_type": "markdown", "id": "03f72dcf", "metadata": {}, "source": [ "This notebook assumes you have successfully deployed the [GraphETL](https://github.com/aws-samples/etl-into-amazon-neptune-graph) sample." ] }, { "attachments": {}, "cell_type": "markdown", "id": "fb2498f1", "metadata": {}, "source": [ "### Visualize the graph" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d101d5a2", "metadata": {}, "source": [ "Make sure to click the `Graph` tab in the results of queries that end with the `path()` step." ] }, { "cell_type": "code", "execution_count": null, "id": "15f8a0d2", "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.V().bothE().otherV().path().by(elementMap())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7e3843a1", "metadata": {}, "source": [ "### Exploring the graph\n", "\n", "We can use the following queries to see which entities and relationships have been extracted from our media files. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "aae963e7", "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.V().label().groupCount().unfold().order()" ] }, { "cell_type": "code", "execution_count": null, "id": "55660bcc", "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.E().label().groupCount()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "25b4a287", "metadata": {}, "source": [ "### Exploring connections" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ff1875a7", "metadata": {}, "source": [ "Which node in our graph is connected to the most nodes?" ] }, { "cell_type": "code", "execution_count": null, "id": "bdd74eae", "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.V()\n", " .project('node','degree')\n", " .by(id())\n", " .by(both().count())\n", " .order().by('degree',desc)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "221516b8", "metadata": {}, "source": [ "Let's take a closer look at the node that's connected to the most other nodes (Animal). Replace `NODE_ID` in the query below with the `node` value at the top of the previous query's results. An example `NODE_ID` looks like `3f257e684a3beb0e303fe0572ab07e1de2950880f59821b6ff7449013ee3a063`." ] }, { "cell_type": "code", "execution_count": null, "id": "f974cc1b", "metadata": {}, "outputs": [], "source": [ "%%gremlin -le 40\n", "\n", "g.V('NODE_ID')\n", " .bothE().otherV().path().by(elementMap())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6b91c4c4", "metadata": {}, "source": [ "What entities were detected in the `video01.mp4` media file?" ] }, { "cell_type": "code", "execution_count": null, "id": "226c22f3", "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.V()\n", " .hasLabel('video01.mp4')\n", " .bothE('APPEARS_IN').otherV()\n", " .path().by(elementMap())\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "706b987e", "metadata": {}, "source": [ "Does it share any entities with the `video02.mp4` media file?" 
] }, { "cell_type": "code", "execution_count": null, "id": "cc69172b", "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.V()\n", " .hasLabel('video01.mp4')\n", " .repeat(bothE('APPEARS_IN').otherV()).times(2)\n", " .hasLabel('video02.mp4')\n", " .path().by(elementMap())\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ad5a56d5", "metadata": {}, "source": [ "What entities are present in all three of our media files?" ] }, { "cell_type": "code", "execution_count": null, "id": "044de17d", "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%gremlin -le 40\n", "\n", "g.V()\n", " .where(out('APPEARS_IN').hasLabel('video01.mp4'))\n", " .where(out('APPEARS_IN').hasLabel('video02.mp4'))\n", " .where(out('APPEARS_IN').hasLabel('video03.mp4'))\n", " .bothE('APPEARS_IN').otherV()\n", " .hasLabel(within('video01.mp4','video02.mp4','video03.mp4'))\n", " .path().by(elementMap())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c6466c30", "metadata": {}, "source": [ "Which objects with a confidence score greater than 80 appear with other objects that also have a confidence score greater than 80?" ] }, { "cell_type": "code", "execution_count": null, "id": "9746dd66", "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%gremlin -le 40\n", "\n", "g.V().has('TYPE','OBJECT')\n", " .has('CONFIDENCE',gt(80))\n", " .bothE('APPEARS_WITH').otherV()\n", " .has('CONFIDENCE',gt(80))\n", " .path().by(elementMap())" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 5 }