## Connect to Amazon DocumentDB using the MongoDB Spark Connector from an EMR Studio Notebook with PySpark, Scala, and SparkR

#### Topics covered in this example

* Configuring the MongoDB Spark Connector
* Configuring the MongoDB input database URI
* Configuring the MongoDB output database URI
* Connecting to Amazon DocumentDB with the MongoDB Spark Connector to read data into a Spark DataFrame
* Connecting to Amazon DocumentDB with the MongoDB Spark Connector to write data from a Spark DataFrame to DocumentDB

## Table of Contents:

1. [Prerequisites](#Prerequisites)
2. [Introduction](#Introduction)
3. [Load the configuration in memory](#Load-the-configuration-in-memory)
4. [Read data using Pyspark](#Read-data-using-Pyspark)
5. [Write data using Pyspark](#Write-data-using-Pyspark)
6. [Read data using Scala](#Read-data-using-Scala)
7. [Write data using Scala](#Write-data-using-Scala)
8. [Read data using SparkR](#Read-data-using-SparkR)
9. [Write data using SparkR](#Write-data-using-SparkR)

## Prerequisites

 1. This notebook relies on the multi-language support in Spark kernels (the %%pyspark, %%scalaspark, and %%rspark cell magics)
 2. Mongo Spark Connector Version - mongo-spark-connector_2.12:3.0.1
 3. EMR Version - emr-6.4.0
 4. DocumentDB Engine Version - docdb 4.0.0
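
The `_2.12` suffix in the connector artifact name follows the usual Scala cross-build naming convention: it names the Scala binary version the JAR was built for, and it should match the Scala version of the Spark build on the cluster. As a minimal sketch, the coordinate from the list above can be split into its parts as follows. The `parse_coordinate` helper and the `ConnectorCoordinate` type are hypothetical names introduced for illustration; they are not part of any library.

```python
from typing import NamedTuple, Optional


class ConnectorCoordinate(NamedTuple):
    """Parts of a Maven coordinate (hypothetical helper type)."""
    group: str
    artifact: str
    scala_version: Optional[str]
    version: str


def parse_coordinate(coordinate: str) -> ConnectorCoordinate:
    """Split 'group:artifact_scalaVersion:version' into its parts."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected group:artifact:version, got {coordinate!r}")
    group, artifact_with_scala, version = parts
    # The Scala binary version, when present, follows the last underscore.
    artifact, sep, scala_version = artifact_with_scala.rpartition("_")
    if not sep:
        return ConnectorCoordinate(group, artifact_with_scala, None, version)
    return ConnectorCoordinate(group, artifact, scala_version, version)


coordinate = parse_coordinate("org.mongodb.spark:mongo-spark-connector_2.12:3.0.1")
print(coordinate.scala_version, coordinate.version)
```

If you move to a different EMR release, it is worth checking the Scala version of its Spark build before reusing the same coordinate.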

## Introduction

This notebook shows how to connect to Amazon DocumentDB with the MongoDB Spark Connector (mongo-spark-connector_2.12:3.0.1) from an Amazon EMR Studio notebook using PySpark, Scala, and SparkR.

## Load the configuration in memory

In [None]:
%%configure -f
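{
 "conf": {
 "spark.mongodb.input.uri": "mongodb://<username>:<password>@<host>:<port>/<database>.<collection>?readPreference=secondaryPreferred",
 "spark.mongodb.output.uri": "mongodb://<username>:<password>@<host>:<port>/<database>.<collection>",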
 "spark.jars.packages": "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1"
 }
}
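
The two URIs follow the standard MongoDB connection string layout: credentials, host and port, then a default `database.collection` namespace, with an optional `readPreference` query parameter on the input side. Credentials that contain characters such as `@` or `:` need to be percent-encoded before they go into the URI. The following is a minimal sketch of assembling the configuration in plain Python; the function names and all sample values (user, password, host, database, collection) are hypothetical and only illustrate the shape of the strings.

```python
import json
from urllib.parse import quote


def docdb_uri(user, password, host, port, database, collection, read_preference=None):
    """Build a MongoDB-style connection URI (hypothetical helper)."""
    # Percent-encode the credentials so reserved characters do not break the URI.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"mongodb://{credentials}@{host}:{port}/{database}.{collection}"
    if read_preference:
        uri += f"?readPreference={read_preference}"
    return uri


def configure_payload(input_uri, output_uri):
    """Return the JSON body for the %%configure cell shown above."""
    conf = {
        "conf": {
            "spark.mongodb.input.uri": input_uri,
            "spark.mongodb.output.uri": output_uri,
            "spark.jars.packages": "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1",
        }
    }
    return json.dumps(conf, indent=1)


# Sample values below are placeholders, not real endpoints or credentials.
read_uri = docdb_uri("app_user", "p@ss:word", "docdb.example.internal", 27017,
                     "testdb", "people", read_preference="secondaryPreferred")
write_uri = docdb_uri("app_user", "p@ss:word", "docdb.example.internal", 27017,
                      "testdb", "people")
print(configure_payload(read_uri, write_uri))
```

The printed JSON can be pasted below `%%configure -f`. The `-f` flag drops any running session and starts a new one with these settings, so run the configuration cell before any other cell.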

## Read data using Pyspark

In [None]:
%%pyspark
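# Replace <database> and <collection> with the names of the source database and collection.
df = spark.read.format("mongo").option("database", "<database>").option("collection", "<collection>").load()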
df.show()
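
The same `format("mongo")`, `database`, and `collection` chain is repeated for every read and write in this notebook. One possible way to keep it in one place, and to fail early when a name was left blank, is a small wrapper like the sketch below. `mongo_source` is a hypothetical helper name, not part of the connector; it only assumes that the object passed in exposes the `format` and `option` methods that Spark's `DataFrameReader` and `DataFrameWriter` provide.

```python
def mongo_source(reader_or_writer, database, collection):
    """Apply the connector format and namespace options (hypothetical helper)."""
    for label, value in (("database", database), ("collection", collection)):
        # Catch blank names before Spark submits the job.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{label} name must be a non-empty string")
    return (reader_or_writer.format("mongo")
            .option("database", database)
            .option("collection", collection))


# Intended use inside a %%pyspark cell (names are placeholders):
#   df = mongo_source(spark.read, "testdb", "people").load()
#   mongo_source(people.write.mode("append"), "testdb", "people").save()
```

Because the helper returns the reader or writer it was given, further calls such as `load()` or `save()` chain onto it exactly as in the cell above.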

## Write data using Pyspark

In [None]:
%%pyspark
# Build a small sample DataFrame; the age for Bombur is null.
people = spark.createDataFrame(
    [("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178), ("Kili", 77),
     ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", None)],
    ["name", "age"])
people.show()
# Append the rows to the target collection in DocumentDB.
people.write.format("mongo").mode("append").option("database", "<database>").option("collection", "<collection>").save()
# Read the collection back to confirm the write.
df_people = spark.read.format("mongo").option("database", "<database>").option("collection", "<collection>").load()
df_people.show()

## Read data using Scala

In [None]:
%%scalaspark
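// Replace <database> and <collection> with the names of the source database and collection.
val df = spark.read.format("mongo").option("database", "<database>").option("collection", "<collection>").load()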
df.show()

## Write data using Scala

In [None]:
%%scalaspark
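import com.mongodb.spark._
import com.mongodb.spark.config._
import org.bson.Document
// Start from the session defaults (spark.mongodb.output.uri) and override the collection and write concern.
val writeConfig = WriteConfig(Map("collection" -> "<collection>", "writeConcern.w" -> "majority"), Some(WriteConfig(sc)))
// Create ten documents of the form {spark: i}.
val sparkDocuments = sc.parallelize((1 to 10).map(i => Document.parse(s"{spark: $i}")))
MongoSpark.save(sparkDocuments, writeConfig)
// Read the collection back to confirm the write.
val numbers_df = spark.read.format("mongo").option("database", "<database>").option("collection", "<collection>").load()
numbers_df.show()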

## Read data using SparkR

In [None]:
%%rspark
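# The first argument (path) stays empty; replace <database> and <collection> with your own names.
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource", database = "<database>", collection = "<collection>")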
showDF(df)

## Write data using SparkR

In [None]:
%%rspark
# Build a local R data frame, then convert it to a Spark DataFrame.
charactersRdf <- data.frame(list(name=c("Bilbo Baggins", "Gandalf", "Thorin",
 "Balin", "Kili", "Dwalin", "Oin", "Gloin", "Fili", "Bombur"),
 age=c(50, 1000, 195, 178, 77, 169, 167, 158, 82, NA)))
charactersSparkdf <- createDataFrame(charactersRdf)
# mode = "overwrite" replaces any existing contents of the target collection.
write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource",
 mode = "overwrite", database = "<database>", collection = "<collection>")
# Read the collection back to confirm the write.
characters_df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource",
 database = "<database>", collection = "<collection>")
showDF(characters_df)