{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%run '../util/neptune.py'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "g = neptune.graphTraversal(neptune_endpoint=os.environ['NEPTUNE_READER_ENDPOINT'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Find people with the most followers\n", "\n", "The following query calculates the vertex degree for every node in the graph, based on incoming 'follows' relationships. It then prints the top ten most followed people in the graph and their degree (number of followers).\n", "\n", "Because this is a full-graph query, this may take several seconds to complete." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mostFollowersIds = (g.V().\n", " project('v','degree').\n", " by(id).\n", " by(inE('follows').count()).\n", " order().by(select('degree'),decr).\n", " select('v','degree').limit(10).\n", " toList())\n", "\n", "\n", "\n", "print(json.dumps(mostFollowersIds, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Find people who follow the most people\n", "\n", "The following query calculates the vertex degree for every node in the graph, based on outgoing 'follows' relationships. It then prints the top ten people with the most outgoing 'follows' relationships.\n", "\n", "Because this is a full-graph query, this may take several seconds to complete." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mostFollowingIds = (g.V().\n", " project('v','degree').\n", " by(id).\n", " by(outE('follows').count()).\n", " order().by(select('degree'),decr).\n", " select('v','degree').limit(10).\n", " toList())\n", "\n", "print(json.dumps(mostFollowingIds, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Recommend new people to follow\n", "\n", "In the next query we'll take the person following the greatest number of people (from the results of the query above), and recommend new people they may wish to follow.\n", "\n", "The query looks for people followed by the people the user follows who are not currently followed by the user. For example, given user A, if A follows B, B follows C, but A does not follow C, then C will be of interest to us. \n", "\n", "Having found people at depth 2 who are not currently followed by the user, but who are indirectly connected to the user by way of the people the user follows at depth 1, the query then counts the number of paths that link each person at depth 2 to the user, and sorts the results based on the number of paths. For example, if there are 5 different paths to C (by way of 5 different people at depth 1), and 7 different paths to D (by way of 7 different people at depth 1), then D will come higher in the results than C." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "personId = mostFollowingIds[0]['v']\n", "\n", "recommendations = (g.V(personId).as_('user'). \n", " # depth 1\n", " choose(\n", " select('user').outE('follows').count().as_('user-count').is_(lt(100)), # if user follows <100 people\n", " out('follows'), # follow all edges\n", " choose(\n", " select('user-count').is_(lt(1000)), # else if user follows <1000 people\n", " out('follows').coin(0.5), # follow half the edges\n", " out('follows').coin(0.1) # otherwise, follow one tenth the edges\n", " )\n", " ).as_('f').aggregate('following').\n", " # depth 2\n", " choose(\n", " select('f').outE('follows').count().as_('f-count').is_(lt(1000)), # if total number people followed by people user follows < 1000\n", " out('follows'), # follow all edges\n", " choose(\n", " select('f-count').is_(lt(10000)), # if total number people followed by people user follows < 10,000\n", " out('follows').coin(0.5), # follow half the edges\n", " out('follows').coin(0.1) # otherwise, follow one tenth the edges\n", " )\n", " ).where(neq('user')).where(without('following')).\n", " group().by().by(count()).unfold().\n", " project('v', 'count').\n", " by(keys).\n", " by(values).\n", " order().by(select('count'), decr).\n", " limit(10).\n", " project('firstName', 'lastName', 'id', 'count').\n", " by(select('v').by('firstName')).\n", " by(select('v').by('lastName')).\n", " by(select('v').by('id')).\n", " by(select('count')).\n", " toList())\n", "\n", "print(json.dumps(recommendations, indent=2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }