# Using Jupyter Magics on EMR Studio


#### Topics covered in this example:

* Built-in Magics
* EMR Magics
    * Mounting a Workspace Directory
    * Executing local Python files or packages
    * Downloading a file


***

## Prerequisites
<div class="alert alert-block alert-info">
<b>NOTE:</b> To run this notebook as is, make sure the following prerequisites are met.</div>

* The EMR cluster attached to this notebook should have the `Spark` application installed.
* This notebook uses the `PySpark` kernel.
***



## Built-in Magics

Jupyter magics are convenient commands that accomplish common tasks without you having to write the equivalent Python code. Some useful magics are built in, and others are unique to EMR Studio. They are documented here: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-magics.html

Two of the most commonly used magics in Spark kernels are:

1. `%%configure` - changes the properties of the Spark session in Spark kernels. The `-f` flag forces an existing session to be dropped and re-created with the new settings:

```
%%configure -f
{
    "conf": {
        "spark.submit.deployMode": "cluster"
    }
}
```

2. `%%display` - available only in Spark kernels; shows the rows of a Spark DataFrame in a table and also lets you visualize them as a chart.

```
%%display
df
```
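The body of a `%%configure` cell is parsed as JSON, so a single missing quote or a trailing comma makes the cell fail. A minimal sketch, assuming you prefer to build the settings in Python: generate the body with the standard `json` module and paste the printed text into a new cell. The variable names are illustrative.

```python
import json

# Spark properties go under the "conf" key of the %%configure body.
session_config = {"conf": {"spark.submit.deployMode": "cluster"}}

# json.dumps quotes every key and value, so the output is always valid JSON.
body = json.dumps(session_config, indent=2)

# Round-trip check: the generated text parses back to the same dictionary.
assert json.loads(body) == session_config

# Paste the printed text into a new cell to apply the settings.
print("%%configure -f\n" + body)
```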

Let's see the `%%display` magic in action:

In [None]:
data = [{
    "Category": "A",
    "ID": 1,
    "Value": 121.44,
    "Truth": True
}, {
    "Category": "B",
    "ID": 2,
    "Value": 300.01,
    "Truth": False
}, {
    "Category": "C",
    "ID": 3,
    "Value": 10.99,
    "Truth": None
}, {
    "Category": "E",
    "ID": 4,
    "Value": 33.87,
    "Truth": True
}]

df = spark.createDataFrame(data)

In [None]:
%%display
df
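When `spark.createDataFrame` receives a list of dictionaries, it infers the schema from the keys and values: Python `int` becomes `LongType`, `float` becomes `DoubleType`, and the `None` in the third record becomes a null in the `Truth` column. As a sketch that needs no Spark, the hypothetical helper `summarize_columns` below checks that every record has the same keys and reports the non-null value types and null count per column, which is a quick way to catch inconsistent records before creating a DataFrame.

```python
from collections import defaultdict

# Same records as the cell above.
data = [
    {"Category": "A", "ID": 1, "Value": 121.44, "Truth": True},
    {"Category": "B", "ID": 2, "Value": 300.01, "Truth": False},
    {"Category": "C", "ID": 3, "Value": 10.99, "Truth": None},
    {"Category": "E", "ID": 4, "Value": 33.87, "Truth": True},
]


def summarize_columns(records):
    """Return {column: (set of non-null type names, null count)}.

    Hypothetical helper for illustration; it does not call Spark.
    """
    # Every record must have exactly the same set of keys.
    key_sets = {frozenset(record) for record in records}
    if len(key_sets) > 1:
        raise ValueError("records do not all have the same keys")

    types = defaultdict(set)
    nulls = defaultdict(int)
    for record in records:
        for key, value in record.items():
            if value is None:
                nulls[key] += 1
            else:
                types[key].add(type(value).__name__)

    columns = set(types) | set(nulls)
    return {col: (types[col], nulls[col]) for col in sorted(columns)}


for column, (type_names, null_count) in summarize_columns(data).items():
    print(f"{column}: types={sorted(type_names)}, nulls={null_count}")
```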

## EMR Magics

The `emr-notebooks-magics` package provides additional magics that you can use in Python 3 kernels as well as Spark kernels on EMR Studio. The two magics we discuss in this notebook are:

* `mount_workspace_dir` - mounts an EMR Studio Workspace directory on an EMR cluster.
* `generate_s3_download_url` - generates a temporary, signed download URL for an S3 object.

Let's install the `emr-notebooks-magics` package on your EMR cluster:

```
%pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple emr-notebooks-magics
```
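To confirm that the package is visible to the Python environment that runs this notebook's code, you can query its installed version with the standard `importlib.metadata` module (Python 3.8+). This is a small sketch: the helper name `installed_version` is ours, and the distribution name is taken from the install command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("emr-notebooks-magics"))
```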

### Mounting a Workspace Directory

Let's mount a Workspace directory onto the EMR cluster:

In [None]:
%mount_workspace_dir <Workspace_Directory>

Your current directory changes to the mounted Workspace directory, so you can list its contents:

In [None]:
%%sh
pwd

In [None]:
%%sh
ls
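The same check can be done in Python with the standard `os` module, which reports the working directory of the process that runs the cell. A small sketch; the helper `describe_cwd` is hypothetical. Unlike `ls`, `os.listdir` also returns hidden entries whose names start with a dot.

```python
import os


def describe_cwd():
    """Return the current working directory and its sorted entries."""
    cwd = os.getcwd()
    return cwd, sorted(os.listdir(cwd))


cwd, entries = describe_cwd()
print(cwd)
for name in entries:
    print(name)
```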

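### Executing Local Python Files or Packages

You can now execute local Python files and packages from the mounted directory. The `-i` flag runs the file in the notebook's current namespace, so the script can use variables you have already defined.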

In [None]:
%run -i "<local_python_file.py>"
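The difference between plain `%run` and `%run -i` can be approximated with the standard `runpy` module. This is only an analogy, not how IPython implements `%run`: `runpy.run_path` returns the script's globals in a new dictionary instead of writing them back into the notebook. The script text and the `records` variable are hypothetical.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical script that relies on a name (`records`) it never defines.
SCRIPT = "total = sum(r['Value'] for r in records)\n"

# Stands in for variables already defined in the notebook.
namespace = {"records": [{"Value": 1.5}, {"Value": 2.5}]}

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "local_script.py"
    script_path.write_text(SCRIPT)

    # Like plain `%run`: a fresh namespace, so `records` is undefined.
    try:
        runpy.run_path(str(script_path))
    except NameError as err:
        print("fresh namespace:", err)

    # Like `%run -i`: the caller's names are visible to the script.
    result = runpy.run_path(str(script_path), init_globals=namespace)
    print("shared namespace: total =", result["total"])
```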

### Downloading a File

Sometimes you need to download a file to your local desktop, for example to analyze some data further in Excel. The `generate_s3_download_url` magic does just that. First, save the DataFrame as a Parquet file in an S3 bucket.

In [None]:
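# `df` is the Spark DataFrame created earlier; convert it to pandas to write a single Parquet file.
# Writing to an s3:// path with pandas assumes s3fs and pyarrow (or fastparquet) are installed.
pdf = df.toPandas()
s3_url = "s3://<bucket>/<prefix>/<filename>.parquet.gzip"
pdf.to_parquet(s3_url, compression="gzip")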

In [None]:
%%sh
aws s3 ls s3://<bucket>/<prefix>/<filename>.parquet.gzip

We can now generate a download URL for this file. The URL is temporary, and the magic provides options that control how long it remains valid.

In [None]:
%generate_s3_download_url s3://<bucket>/<prefix>/<filename>.parquet.gzip

Open the link shown in the cell output to download the file.
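For reference, a similar temporary link can be produced directly with boto3's `generate_presigned_url`. This is a sketch, not necessarily how the magic is implemented; the helpers `split_s3_url` and `presigned_download_url` are hypothetical. The link works only if the signing credentials can read the object, SigV4 presigned URLs can be valid for at most 7 days, and a URL signed with temporary credentials stops working when those credentials expire.

```python
from urllib.parse import urlparse


def split_s3_url(s3_url):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(s3_url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not an s3://bucket/key URL: {s3_url!r}")
    return parsed.netloc, key


def presigned_download_url(s3_url, expires_in=3600):
    """Return a temporary HTTPS download URL for an S3 object.

    Requires boto3 and AWS credentials that can read the object.
    """
    import boto3  # imported here so split_s3_url works without boto3 installed

    bucket, key = split_s3_url(s3_url)
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime in seconds
    )
```

For example, `presigned_download_url("s3://<bucket>/<prefix>/<filename>.parquet.gzip", expires_in=900)` would return a link that is valid for 15 minutes.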