{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "44cde1cc",
   "metadata": {},
   "source": [
    "# Connect RDBMS using jdbc connector from EMR Studio Notebook using Pyspark, Spark Scala, and SparkR\n",
    "\n",
    "#### Topics covered in this example\n",
    "\n",
    "* Configuring jdbc driver\n",
    "* Connecting to database using jdbc to read data\n",
    "* Connecting to database using jdbc to write data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e63a1147",
   "metadata": {},
   "source": [
    "## Table of Contents:\n",
    "\n",
    "1. [Prerequisites](#Prerequisites)\n",
    "2. [Introduction](#Introduction)\n",
    "3. [Upload the MySQL jdbc driver in S3 and declare the path](#Upload-the-MySQL-jdbc-driver-in-S3-and-declare-the-path)\n",
    "4. [Read and write data using Pyspark](#Read-and-write-data-using-Pyspark)\n",
    "5. [Read and write data using Scala](#Read-and-write-data-using-Scala)\n",
    "6. [Read and write data using SparkR](#Read-and-write-data-using-SparkR)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1138c207",
   "metadata": {},
   "source": [
    "## Prerequisites"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a9f2d7b",
   "metadata": {},
   "source": [
    "Download jdbc driver and upload it on S3 which is accessible from the Amazon EMR cluster attached to the Amazon EMR Studio. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34d0aa33",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0706f71",
   "metadata": {},
   "source": [
    "This notebooks shows how to connect RDBS using jdbc connector from Amazon EMR Studio Notebook. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "980ad752",
   "metadata": {},
   "source": [
    "## Upload the MySQL jdbc driver in S3 and declare the path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd99a56a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%configure -f\n",
    "{\n",
    "    \"conf\": {\n",
    "        \"spark.jars\": \"s3://<S3 Bucket>/jars/mysql-connector-java-8.0.19.jar\"        \n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c41e375",
   "metadata": {},
   "source": [
    "## Read and write data using Pyspark"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1c954008",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%pyspark\n",
    "\n",
    "#Declare the variables and replace the variables values as appropiate\n",
    "\n",
    "str_jdbc_url=\"jdbc:mysql://<Database endpoint>:3306/<Database name>\"\n",
    "str_Query= \"<Select query>\"\n",
    "str_dbname=\"<Database name>\"\n",
    "str_username=\"<Username>\"\n",
    "str_password=\"<Password>\"\n",
    "str_tgt_table=\"<Target Table>\"\n",
    "\n",
    "# Read data from source table\n",
    "\n",
    "jdbcDF = spark.read \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .option(\"url\", str_jdbc_url) \\\n",
    "    .option(\"query\", str_Query) \\\n",
    "    .option(\"user\", str_username) \\\n",
    "    .option(\"password\", str_password) \\\n",
    "    .load()\n",
    "\n",
    "jdbcDF.limit(5).show()\n",
    "\n",
    "# Write data to the target database\n",
    "\n",
    "jdbcDF.write \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .option(\"url\", str_jdbc_url) \\\n",
    "    .option(\"dbtable\", str_tgt_table) \\\n",
    "    .option(\"user\", str_username) \\\n",
    "    .option(\"password\", str_password).mode(\"append\").save()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c26a471",
   "metadata": {},
   "source": [
    "## Read and write data using Scala"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2666bd59",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%scalaspark\n",
    "\n",
    "#Declare the variables and replace the variables values as appropiate\n",
    "\n",
    "val str_jdbc_url=\"jdbc:mysql://<Database endpoint>:3306/<Database name>\"\n",
    "val str_Query= \"<Select Query>\"\n",
    "val str_dbname=\"<Database name>\"\n",
    "val str_username=\"<Username>\"\n",
    "val str_password=\"<Password>\"\n",
    "val str_tgt_table=\"<Target table>\"\n",
    "\n",
    "# Read data from source table\n",
    "\n",
    "val jdbcDF = (spark.read.format(\"jdbc\")\n",
    "  .option(\"url\", str_jdbc_url)\n",
    "  .option(\"query\", str_Query)\n",
    "  .option(\"user\", str_username)\n",
    "  .option(\"password\", str_password)\n",
    "  .load())\n",
    "\n",
    "jdbcDF.limit(5).show()\n",
    "\n",
    "# Write data to the target database\n",
    "\n",
    "val connectionProperties = new java.util.Properties\n",
    "connectionProperties.put(\"user\", str_username)\n",
    "connectionProperties.put(\"password\", str_password)\n",
    "\n",
    "jdbcDF.write.mode(\"append\").jdbc(str_jdbc_url, str_tgt_table, connectionProperties)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "066b6a73",
   "metadata": {},
   "source": [
    "## Read and write data using SparkR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "898d69a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%rspark\n",
    "\n",
    "#Declare the variables and replace the variables values as appropiate\n",
    "\n",
    "str_jdbc_url=\"jdbc:mysql://<Database endpoint>:3306/<Database name>\"\n",
    "str_dbname=\"<Database name>\"\n",
    "str_username=\"<Username>\"\n",
    "str_password=\"<Password>\"\n",
    "str_tgt_table=\"<Target table>\"\n",
    "\n",
    "# Read data from source database\n",
    "\n",
    "df <- read.jdbc(str_jdbc_url, \n",
    "                \"(select employee_id, first_name, last_name, email, dept_name from notebook.employee e, notebook.dept d where e.department_id = d.department_id) AS tmp\", \n",
    "                user = str_username, \n",
    "                password = str_password)\n",
    "\n",
    "showDF(df)\n",
    "\n",
    "jdbcDF.limit(5).show()\n",
    "\n",
    "# Write data to the target database\n",
    "\n",
    "write.jdbc(df, \n",
    "           str_jdbc_url, \n",
    "           str_tgt_table,\n",
    "           user = str_username,\n",
    "           password = str_password,\n",
    "           mode = \"append\")\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "PySpark",
   "language": "",
   "name": "pysparkkernel"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "python",
    "version": 3
   },
   "mimetype": "text/x-python",
   "name": "pyspark",
   "pygments_lexer": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}