{ "cells": [ { "cell_type": "markdown", "metadata": { }, "source": [ "# Step 1\n", "\n", "In this notebook we prepare the environment: we load sample data into MySQL, then crawl the database to populate the AWS Glue Data Catalog." ] }, { "cell_type": "markdown", "metadata": { }, "source": [ "## Load MySQL\n", "\n", "First, we'll run an AWS Glue job, `load_salesdb`, to load a sample dataset into MySQL. The script for this job is [load_mysql.py](https://github.com/aws-samples/amazon-neptune-samples/tree/master/gremlin/glue-neptune/glue-jobs/common/load_mysql.py)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [ ], "source": [ "%run './glue_utils.py'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [ ], "source": [ "job_name = glue_resource('load_salesdb')\n", "crawler_name = glue_resource('salesdb_crawler')\n", "database_name = glue_resource('salesdb')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [ ], "source": [ "run_job(job_name)" ] }, { "cell_type": "markdown", "metadata": { }, "source": [ "## Crawl MySQL database\n", "\n", "Next, we'll crawl the MySQL database using an AWS Glue Crawler. The crawler adds a table definition to the Glue Data Catalog for each table it finds. Our ETL scripts will later use these table definitions to read data from MySQL." ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [ ], "source": [ "run_crawler(crawler_name)\n", "\n", "# List the tables the crawler added to the catalog database\n", "tables = list_tables(database_name)\n", "for table in tables:\n", "    print(table)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 2 }