{ "cells": [ { "cell_type": "markdown", "id": "a0177726", "metadata": {}, "source": [ "# Multi-Language Support in Spark Kernels\n", "\n", "Topics covered in this example:\n", "\n", "* Using multiple languages (Python, Scala, R, and SQL) from within Spark notebooks.\n", "* Sharing data across languages using temporary views.\n", "\n", "***\n", "\n", "## Prerequisites\n", "\n", "**NOTE:** To run this notebook as is, make sure the following prerequisites are met.\n", "\n", "* The EMR cluster attached to this notebook should have the `Spark` application installed.\n", "* The EMR cluster attached to this notebook should use EMR release 6.4.0 or later.\n", "* This notebook uses the `PySpark` kernel.\n", "* The `nyc_top_trips_report` table queried in the last section should exist in the default database.\n", "\n", "***\n", "\n", "## Introduction\n", "\n", "This example shows how to mix Python, Scala, R, and SQL within a single Spark notebook. The language cell magics used below are supported in the PySpark, Spark, and SparkR kernels.\n", "\n", "***\n", "\n", "The `%%pyspark` cell magic runs PySpark code from any of the supported Spark kernels." ] }, { "cell_type": "code", "execution_count": null, "id": "831c3a58", "metadata": {}, "outputs": [], "source": [ "%%pyspark\n", "a = 1" ] }, { "cell_type": "markdown", "id": "ca9be37c", "metadata": {}, "source": [ "The `%%sql` cell magic runs Spark SQL statements. The following query lists the tables in the default database." ] }, { "cell_type": "code", "execution_count": null, "id": "301ceb18", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SHOW TABLES" ] }, { "cell_type": "markdown", "id": "0d6b263c", "metadata": {}, "source": [ "The `%%rspark` cell magic runs SparkR code." ] }, { "cell_type": "code", "execution_count": null, "id": "ddf003b1", "metadata": {}, "outputs": [], "source": [ "%%rspark\n", "a <- 1" ] }, { "cell_type": "markdown", "id": "0cd9d9e1", "metadata": {}, "source": [ "The `%%scalaspark` cell magic runs Spark Scala code." ] }, { "cell_type": "code", "execution_count": null, "id": "faf303a4", "metadata": {}, "outputs": [], "source": [ "%%scalaspark\n", "val a = 1" ] }, { "cell_type": "markdown", "id": "2f8e1dcf", "metadata": {}, "source": [ "### Sharing Data Using Temporary Views\n", "\n", "You can share data between languages using temporary views. Let's create a temporary view from Python:" ] }, { "cell_type": "code", "execution_count": null, "id": "9a81e181", "metadata": {}, "outputs": [], "source": [ "%%pyspark\n", "# Query the source table and register the result as a temporary view\n", "# so that Scala, R, and SQL cells can read it.\n", "df = spark.sql(\"SELECT * FROM nyc_top_trips_report LIMIT 20\")\n", "df.createOrReplaceTempView(\"nyc_top_trips_report_v\")" ] }, { "cell_type": "markdown", "id": "154b5757", "metadata": {}, "source": [ "Now let's read the temporary view from Scala:" ] }, { "cell_type": "code", "execution_count": null, "id": "3e26a9b6", "metadata": {}, "outputs": [], "source": [ "%%scalaspark\n", "// Read the temporary view registered by the Python cell above.\n", "val df = spark.sql(\"SELECT * FROM nyc_top_trips_report_v\")\n", "df.show(5)" ] } ], "metadata": { "kernelspec": { "display_name": "PySpark", "language": "", "name": "pysparkkernel" }, "language_info": { "codemirror_mode": { "name": "python", "version": 3 }, "mimetype": "text/x-python", "name": "pyspark", "pygments_lexer": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }