{ "cells": [ { "cell_type": "markdown", "id": "44cde1cc", "metadata": {}, "source": [ "# Connect RDBMS using jdbc connector from EMR Studio Notebook using Pyspark, Spark Scala, and SparkR\n", "\n", "#### Topics covered in this example\n", "\n", "* Configuring jdbc driver\n", "* Connecting to database using jdbc to read data\n", "* Connecting to database using jdbc to write data" ] }, { "cell_type": "markdown", "id": "e63a1147", "metadata": {}, "source": [ "## Table of Contents:\n", "\n", "1. [Prerequisites](#Prerequisites)\n", "2. [Introduction](#Introduction)\n", "3. [Upload the MySQL jdbc driver in S3 and declare the path](#Upload-the-MySQL-jdbc-driver-in-S3-and-declare-the-path)\n", "4. [Read and write data using Pyspark](#Read-and-write-data-using-Pyspark)\n", "5. [Read and write data using Scala](#Read-and-write-data-using-Scala)\n", "6. [Read and write data using SparkR](#Read-and-write-data-using-SparkR)\n" ] }, { "cell_type": "markdown", "id": "1138c207", "metadata": {}, "source": [ "## Prerequisites" ] }, { "cell_type": "markdown", "id": "7a9f2d7b", "metadata": {}, "source": [ "Download jdbc driver and upload it on S3 which is accessible from the Amazon EMR cluster attached to the Amazon EMR Studio. " ] }, { "cell_type": "markdown", "id": "34d0aa33", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "id": "f0706f71", "metadata": {}, "source": [ "This notebooks shows how to connect RDBS using jdbc connector from Amazon EMR Studio Notebook. " ] }, { "cell_type": "markdown", "id": "980ad752", "metadata": {}, "source": [ "## Upload the MySQL jdbc driver in S3 and declare the path" ] }, { "cell_type": "code", "execution_count": null, "id": "fd99a56a", "metadata": {}, "outputs": [], "source": [ "%%configure -f\n", "{\n", " \"conf\": {\n", " \"spark.jars\": \"s3:///jars/mysql-connector-java-8.0.19.jar\" \n", " }\n", "}" ] }, { "cell_type": "markdown", "id": "1c41e375", "metadata": {}, "source": [ "## Read and write data using Pyspark" ] }, { "cell_type": "code", "execution_count": null, "id": "1c954008", "metadata": {}, "outputs": [], "source": [ "%%pyspark\n", "\n", "#Declare the variables and replace the variables values as appropiate\n", "\n", "str_jdbc_url=\"jdbc:mysql://:3306/\"\n", "str_Query= \"\"\n", "val str_dbname=\"\"\n", "val str_username=\"\"\n", "val str_password=\"\"\n", "val str_tgt_table=\"\"\n", "\n", "# Read data from source table\n", "\n", "val jdbcDF = (spark.read.format(\"jdbc\")\n", " .option(\"url\", str_jdbc_url)\n", " .option(\"query\", str_Query)\n", " .option(\"user\", str_username)\n", " .option(\"password\", str_password)\n", " .load())\n", "\n", "jdbcDF.limit(5).show()\n", "\n", "# Write data to the target database\n", "\n", "val connectionProperties = new java.util.Properties\n", "connectionProperties.put(\"user\", str_username)\n", "connectionProperties.put(\"password\", str_password)\n", "\n", "jdbcDF.write.mode(\"append\").jdbc(str_jdbc_url, str_tgt_table, connectionProperties)\n" ] }, { "cell_type": "markdown", "id": "066b6a73", "metadata": {}, "source": [ "## Read and write data using SparkR" ] }, { "cell_type": "code", "execution_count": null, "id": "898d69a1", "metadata": {}, "outputs": [], "source": [ "%%rspark\n", "\n", "#Declare the variables and replace the variables values as appropiate\n", "\n", "str_jdbc_url=\"jdbc:mysql://:3306/\"\n", "str_dbname=\"\"\n", "str_username=\"\"\n", "str_password=\"\"\n", "str_tgt_table=\"\"\n", "\n", "# Read data from source database\n", "\n", "df <- read.jdbc(str_jdbc_url, \n", " \"(select employee_id, first_name, last_name, email, dept_name from notebook.employee e, notebook.dept d where e.department_id = d.department_id) AS tmp\", \n", " user = str_username, \n", " password = str_password)\n", "\n", "showDF(df)\n", "\n", "jdbcDF.limit(5).show()\n", "\n", "# Write data to the target database\n", "\n", "write.jdbc(df, \n", " str_jdbc_url, \n", " str_tgt_table,\n", " user = str_username,\n", " password = str_password,\n", " mode = \"append\")\n" ] } ], "metadata": { "kernelspec": { "display_name": "PySpark", "language": "", "name": "pysparkkernel" }, "language_info": { "codemirror_mode": { "name": "python", "version": 3 }, "mimetype": "text/x-python", "name": "pyspark", "pygments_lexer": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }