{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a0177726",
   "metadata": {},
   "source": [
    "# Multi-Language Support in Spark Kernels\n",
    "\n",
    "Topics covered in this example:\n",
    "\n",
    "* Using multiple languages (Python, Scala, R, and SQL) within Spark notebooks.\n",
    "* Sharing data across languages using temp tables/views.\n",
    "\n",
    "***\n",
    "\n",
    "## Prerequisites\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<b>NOTE:</b> To run this notebook as is, make sure the following prerequisites are met.</div>\n",
    "\n",
    "* The EMR cluster attached to this notebook should have the `Spark` application installed.\n",
    "* The EMR cluster attached to this notebook should be version 6.4.0 or later.\n",
    "* This notebook uses the `PySpark` kernel.\n",
    "\n",
    "***\n",
    "\n",
    "## Introduction\n",
    "\n",
    "This example shows how to use multiple languages within a single Spark notebook. You can mix and match Python, Scala, R, and SQL by starting a cell with the corresponding cell magic. The supported kernels are PySpark, Spark, and SparkR.\n",
    "\n",
    "***\n",
    "\n",
    "The `%%pyspark` cell magic lets you run PySpark code in any Spark kernel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "831c3a58",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%pyspark\n",
    "a = 1 "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca9be37c",
   "metadata": {},
   "source": [
    "The `%%sql` cell magic lets you run Spark SQL code. The query below lists the tables in the default database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "301ceb18",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SHOW TABLES "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d6b263c",
   "metadata": {},
   "source": [
    "The `%%rspark` cell magic lets you run SparkR code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ddf003b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%rspark\n",
    "a <- 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0cd9d9e1",
   "metadata": {},
   "source": [
    "The `%%scalaspark` cell magic lets you run Spark Scala code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "faf303a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%scalaspark\n",
    "val a = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f8e1dcf",
   "metadata": {},
   "source": [
    "### Sharing Data Using Temp Tables/Views\n",
    "\n",
    "You can share data between languages using temp tables (views). Let's create a temp view using Python. This example assumes that a table named `nyc_top_trips_report` already exists in the default database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a81e181",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%pyspark\n",
    "# Query the source table, then register the result as a temp view so other languages can read it.\n",
    "df = spark.sql(\"SELECT count(1) FROM nyc_top_trips_report LIMIT 20\")\n",
    "df.createOrReplaceTempView(\"nyc_top_trips_report_v\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "154b5757",
   "metadata": {},
   "source": [
    "Now let's read the temp view using Scala:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e26a9b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%scalaspark\n",
    "// Read the temp view that was registered from Python and print up to 5 rows.\n",
    "val df = spark.sql(\"SELECT * FROM nyc_top_trips_report_v\")\n",
    "df.show(5)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "PySpark",
   "language": "",
   "name": "pysparkkernel"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "python",
    "version": 3
   },
   "mimetype": "text/x-python",
   "name": "pyspark",
   "pygments_lexer": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}