{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n", "SPDX-License-Identifier: Apache-2.0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "# Social Network Recommendations\n", "\n", "In this example we're going to build a powerful social network predictive capability with some simple Gremlin queries. The techniques intrdocued here can be used to build predictions in other domains outside of social.\n", "\n", "### People You May Know\n", "\n", "A common feature of many social network applications is the ability to recommend People-You-May-Know (or People-You-May-Want-To-Know) – sometimes abbreviated PYMK.\n", "\n", "Using Amazon Neptune, we can implement a PYMK capability using a well-understood phenomenon called *triadic closure*. Tridaic closure is the \n", "tendency for elements at a very local level in a graph to form stable triangles as the data changes over time. This behaviour can be observed in graphs in all kinds of different domains. It's the basis of many homophily-based recommendation systems – systems that exploit the fact that similarity breeds connections. In this example we're going to look at using triadic closure in the context of a social network.\n", "\n", "Let's imagine we have a social network whose members include Bill, Terry and Sarah. Terry is friends with both Bill and Sarah; that is, Terry and Sarah have a mutual friend in Bill. \n", "\n", "Because they have Bill in common, there's a good chance that Sarah and Bill either already know one other or may get to know one another in the near future. Just looking at the graph, we can see they have both the *means* and the *motive* to be friends. Hanging around with Bill provides the means for Sarah and Terry to meet. And because they trust Bill, they have the motive to trust people with whom Bill is friends, increasing the chance that if they do meet, they'll form a connection and close the triangle.\n", "\n", "In the context of a social network, we can use triadic closure to implement PYMK. When a particular user logs into the system, we can look up their vertex in the graph, and then traverse their friend-of-a-friend network, looking for opportunities to close triangles. The more paths that extend from our user, through their immediate friends, to someone to whom they are not currently connected, the greater the likelihood the user may either already know that person, or may benefit from getting to know them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup\n", "\n", "Before we begin, we'll clear any existing data from our Neptune cluster, using the cell magic `%%gremlin` and a subsequent drop query:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.V().drop()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How do we know which Neptune cluster to access? The cell magics exposed by Neptune Notebooks use a configuration located by default under `~/graph_notebook_config.json` At the time of initialization of the Sagemaker instance, this configuration is generated using environment variables derived from the cluster being connected to. \n", "\n", "You can check the contents of the configuration in two ways. You can print the file itself, or you can look for the configuration being used by the notebook which you have opened." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "cat ~/graph_notebook_config.json" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%graph_notebook_config" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a Social Network\n", "\n", "Next, we'll create a small social network. Note that the script below comprises a single statement. All the vertices and edges here will be created in the context of a single transaction." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.\n", "addV('User').property('name','Bill').property('birthdate', '1988-03-22').\n", "addV('User').property('name','Sarah').property('birthdate', '1992-05-03').\n", "addV('User').property('name','Ben').property('birthdate', '1989-10-21').\n", "addV('User').property('name','Lucy').property('birthdate', '1998-01-17').\n", "addV('User').property('name','Colin').property('birthdate', '2001-08-14').\n", "addV('User').property('name','Emily').property('birthdate', '1998-03-05').\n", "addV('User').property('name','Gordon').property('birthdate', '2002-12-04').\n", "addV('User').property('name','Kate').property('birthdate', '1995-02-12').\n", "addV('User').property('name','Peter').property('birthdate', '2001-02-27').\n", "addV('User').property('name','Terry').property('birthdate', '1989-10-02').\n", "addV('User').property('name','Alistair').property('birthdate', '1992-06-30').\n", "addV('User').property('name','Eve').property('birthdate', '2000-05-13').\n", "addV('User').property('name','Gary').property('birthdate', '1998-09-20').\n", "addV('User').property('name','Mary').property('birthdate', '1997-01-27').\n", "addV('User').property('name','Charlie').property('birthdate', '1989-11-02').\n", "addV('User').property('name','Sue').property('birthdate', '1994-03-08').\n", "addV('User').property('name','Arnold').property('birthdate', '2002-07-23').\n", "addV('User').property('name','Chloe').property('birthdate', '1988-11-04').\n", "addV('User').property('name','Henry').property('birthdate', '1996-03-15').\n", "addV('User').property('name','Josie').property('birthdate', '2003-08-21').\n", "V().hasLabel('User').has('name','Sarah').as('a').V().hasLabel('User').has('name','Bill').addE('FRIEND').to('a').property('strength',1).\n", "V().hasLabel('User').has('name','Colin').as('a').V().hasLabel('User').has('name','Bill').addE('FRIEND').to('a').property('strength',2).\n", "V().hasLabel('User').has('name','Terry').as('a').V().hasLabel('User').has('name','Bill').addE('FRIEND').to('a').property('strength',3).\n", "V().hasLabel('User').has('name','Peter').as('a').V().hasLabel('User').has('name','Colin').addE('FRIEND').to('a').property('strength',1).\n", "V().hasLabel('User').has('name','Kate').as('a').V().hasLabel('User').has('name','Ben').addE('FRIEND').to('a').property('strength',2).\n", "V().hasLabel('User').has('name','Kate').as('a').V().hasLabel('User').has('name','Lucy').addE('FRIEND').to('a').property('strength',3).\n", "V().hasLabel('User').has('name','Eve').as('a').V().hasLabel('User').has('name','Lucy').addE('FRIEND').to('a').property('strength',1).\n", "V().hasLabel('User').has('name','Alistair').as('a').V().hasLabel('User').has('name','Kate').addE('FRIEND').to('a').property('strength',2).\n", "V().hasLabel('User').has('name','Gary').as('a').V().hasLabel('User').has('name','Colin').addE('FRIEND').to('a').property('strength',3).\n", "V().hasLabel('User').has('name','Gordon').as('a').V().hasLabel('User').has('name','Emily').addE('FRIEND').to('a').property('strength',1).\n", "V().hasLabel('User').has('name','Alistair').as('a').V().hasLabel('User').has('name','Emily').addE('FRIEND').to('a').property('strength',3).\n", "V().hasLabel('User').has('name','Terry').as('a').V().hasLabel('User').has('name','Gordon').addE('FRIEND').to('a').property('strength',3).\n", "V().hasLabel('User').has('name','Alistair').as('a').V().hasLabel('User').has('name','Terry').addE('FRIEND').to('a').property('strength',1).\n", "V().hasLabel('User').has('name','Gary').as('a').V().hasLabel('User').has('name','Terry').addE('FRIEND').to('a').property('strength',2).\n", "V().hasLabel('User').has('name','Mary').as('a').V().hasLabel('User').has('name','Terry').addE('FRIEND').to('a').property('strength',3).\n", "V().hasLabel('User').has('name','Henry').as('a').V().hasLabel('User').has('name','Alistair').addE('FRIEND').to('a').property('strength',1).\n", "V().hasLabel('User').has('name','Sue').as('a').V().hasLabel('User').has('name','Eve').addE('FRIEND').to('a').property('strength',2).\n", "V().hasLabel('User').has('name','Sue').as('a').V().hasLabel('User').has('name','Charlie').addE('FRIEND').to('a').property('strength',3).\n", "V().hasLabel('User').has('name','Josie').as('a').V().hasLabel('User').has('name','Charlie').addE('FRIEND').to('a').property('strength',1).\n", "V().hasLabel('User').has('name','Henry').as('a').V().hasLabel('User').has('name','Charlie').addE('FRIEND').to('a').property('strength',2).\n", "V().hasLabel('User').has('name','Henry').as('a').V().hasLabel('User').has('name','Mary').addE('FRIEND').to('a').property('strength',3).\n", "V().hasLabel('User').has('name','Mary').as('a').V().hasLabel('User').has('name','Gary').addE('FRIEND').to('a').property('strength',1).\n", "V().hasLabel('User').has('name','Henry').as('a').V().hasLabel('User').has('name','Gary').addE('FRIEND').to('a').property('strength',2).\n", "V().hasLabel('User').has('name','Chloe').as('a').V().hasLabel('User').has('name','Gary').addE('FRIEND').to('a').property('strength',3).\n", "V().hasLabel('User').has('name','Henry').as('a').V().hasLabel('User').has('name','Arnold').addE('FRIEND').to('a').property('strength',1).\n", "next()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is what the network looks like:\n", " \n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a Recommendation\n", "\n", "Let's now create a PYMK recommendation for a specific user.\n", "\n", "In the query below, we're finding the vertex that represents our user. We're then traversing `FRIEND` relationships (we don't care about relationship direction, so we're using `both()`) to find that user's immediate friends. We're then traversing another hop into the graph, looking for friends of those friends who _are not currently connected to our user_ (i.e., we're looking for the unclosed triangles).\n", "\n", "We then count the paths to these candidate friends, and order the results based on the number of times we can reach a candidate via one of the user's immediate friends." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.V().hasLabel('User').has('name', 'Terry').as('user'). \n", " both('FRIEND').aggregate('friends'). \n", " both('FRIEND').\n", " where(P.neq('user')).where(P.without('friends')). \n", " groupCount().by('name'). \n", " order(Scope.local).by(values, Order.desc).\n", " next()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using Friendship Strength to Improve Recommendations\n", "\n", "What if we wanted to base our recommendations only on resonably strong friendship bonds?\n", "\n", "If you look at the Gremlin we used to create our graph, you'll see that each `FRIEND` edge has a `strength` property. In the following query, the traversal applies a predicate to this `strength` property. Note that we use `bothE()` rather than `both()` to position the traversal on an edge, where we then apply the predicate. We proceed only where `strength` is greater than one." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%gremlin\n", "\n", "g.V().hasLabel('User').has('name', 'Terry').as('user')\n", " .bothE('FRIEND') \n", " .has('strength', P.gt(1)).otherV()\n", " .aggregate('friends')\n", " .bothE('FRIEND')\n", " .has('strength', P.gt(1)).otherV()\n", " .where(P.neq('user')).where(P.without('friends'))\n", " .groupCount().by('name')\n", " .order(Scope.local).by(values, Order.desc)\n", " .next()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because we discount weak friendships even when traversing to immediate friends, this query can sometimes end up recommending people that have a weak direct tie to our user. But that makes sense in the context of our social domain: one of our close friends has a strong friendship with one of the people with whom we have a weak connection; therefore, we might predict that over time this weak bond will grow stronger." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What's next?\n", "\n", "Curious about the business problems can be solved with graph? Check out these sample application notebooks for some inspiration.\n", "\n", "[Introduction to Fraud Graphs](../03-Sample-Applications/01-Fraud-Graphs/01-Building-a-Fraud-Graph-Application.ipynb)\n", "\n", "[Introduction to Knowledge Graphs](../03-Sample-Applications/02-Knowledge-Graphs/01-Building-a-Knowledge-Graph-Application.ipynb)\n", "\n", "[Introduction to Identity Graphs](../03-Sample-Applications/03-Identity-Graphs/01-Building-an-Identity-Graph-Application.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }