{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 2\n", "\n", "In this notebook we trigger a one-off export of data from MySQL to Neptune via S3. We use an AWS Glue job to export product, product category and supplier data to CSV files, which the job copies to S3. We then use the Neptune bulk load API to load the data from S3 into Neptune.\n", "\n", "For details on converting from a relational data model to a graph data model, see [Converting a Relational Data Model to a Graph Model](https://github.com/aws-samples/aws-dbs-refarch-graph/tree/master/src/converting-to-graph#converting-a-relational-data-model-to-a-graph-model)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Export from MySQL to S3\n", "\n", "First, we'll export some 'static' dimension data (product, product_category, supplier) from MySQL to S3 using a Glue job, `export_from_mysql_to_s3`, the script for which can be found [here](https://github.com/aws-samples/amazon-neptune-samples/tree/master/gremlin/glue-neptune/glue-jobs/mysql-neptune/export-from-mysql-to-s3.py). The script uses the [neptune-python-utils](https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-python-utils) Python library.\n", "\n", "The export creates CSV files formatted according to the Amazon Neptune [bulk load format](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%run './glue_utils.py'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import nest_asyncio\n", "nest_asyncio.apply()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll clear the database before running the import to simulate this comprising a first step in a migration to a fresh Neptune database." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%db_reset -y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll run the [export job](https://github.com/aws-samples/amazon-neptune-samples/tree/master/gremlin/glue-neptune/glue-jobs/mysql-neptune/export-from-mysql-to-s3.py)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "job_name = glue_resource('export_from_mysql_to_s3')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run_job(job_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Import from S3\n", "\n", "Next, we'll trigger Neptune's bulk load API, and load the graph using the CSV data in S3.\n", "\n", "The import creates Product and Supplier vertices and connects them with SUPPLIER edges. (A supplier supplies many products.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from neptune_python_utils.bulkload import BulkLoad\n", "\n", "bulkload = BulkLoad(source=os.environ['S3_EXPORT_PATH'])\n", "bulkload.load()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Query the graph\n", "\n", "Let's query the graph to review the results of the import." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from neptune_python_utils.gremlin_utils import GremlinUtils\n", "\n", "GremlinUtils.init_statics(globals())\n", "\n", "gremlin_utils = GremlinUtils()\n", "\n", "conn = gremlin_utils.remote_connection()\n", "g = gremlin_utils.traversal_source(connection=conn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Count the number of Product and Supplier vertices in the graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('# Products : {}'.format(g.V().hasLabel('Product').count().next()))\n", "print('# Suppliers: {}'.format(g.V().hasLabel('Supplier').count().next()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Show the products supplied by a supplier\n", "\n", "Product vertices are connected to Supplier vertices using an outgoing SUPPLIER edge." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "results = (g.V().hasLabel('Supplier').limit(1).\n", " project('supplier', 'products').\n", " by('name').\n", " by(in_('SUPPLIER').values('name').fold()).next())\n", "\n", "print('Supplier: {}\\nProducts: {}'.format(results['supplier'], results['products']))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conn.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 2 }