# Amazon EMR Studio CDK Python project! This is an Amazon EMR Studio project for CDK development with Python. The `cdk.json` file tells the CDK Toolkit how to execute your app. This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the `.venv` directory. To create the virtualenv it assumes that there is a `python3` (or `python` for Windows) executable in your path with access to the `venv` package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually. To manually create a virtualenv on MacOS and Linux: ``` $ python3 -m venv .venv ``` After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv. ``` $ source .venv/bin/activate ``` If you are a Windows platform, you would activate the virtualenv like this: ``` % .venv\Scripts\activate.bat ``` Once the virtualenv is activated, you can install the required dependencies. ``` (.venv) $ pip install -r requirements.txt ``` At this point you can now synthesize the CloudFormation template for this code.
(.venv) $ cdk synth \
-c vpc_name="your-vpc-name" \
-c emr_studio_name="your-emr-studio-name"
To add additional dependencies, for example other CDK libraries, just add
them to your `setup.py` file and rerun the `pip install -r requirements.txt`
command.
Use cdk `deploy command` to create the stack shown above.
(.venv) $ cdk deploy --require-approval never \
-c vpc_name="your-vpc-name" \
-c emr_studio_name="your-emr-studio-name"
For example,
(.venv) $ cdk deploy --require-approval never \
-c vpc_name="default" \
-c emr_studio_name="datalake-demo"
EmrStudioStack: building assets...
[0%] start: Building eb5eeb490dccbcd549ae27e0359b16b08361800c8444cf3e4a1c969a0c9c84e2:819320734790-us-east-1
[100%] success: Built eb5eeb490dccbcd549ae27e0359b16b08361800c8444cf3e4a1c969a0c9c84e2:819320734790-us-east-1
EmrStudioStack: assets built
EmrStudioStack: deploying...
[0%] start: Publishing eb5eeb490dccbcd549ae27e0359b16b08361800c8444cf3e4a1c969a0c9c84e2:819320734790-us-east-1
[100%] success: Published eb5eeb490dccbcd549ae27e0359b16b08361800c8444cf3e4a1c969a0c9c84e2:819320734790-us-east-1
...
Outputs:
EmrStudioStack.EmrStudioDefaultS3Location = s3://datalake-demo-emr-studio-us-east-1-a4hzjvb
EmrStudioStack.EmrStudioId = es-KWX8LX799XYDYTL7SAWH75UV
EmrStudioStack.EmrStudioName = datalake-demo
EmrStudioStack.EmrStudioUrl = https://es-KWX8LX799XYDYTL7SAWH75UV.emrstudio-prod.us-east-1.amazonaws.com
## Quick Start
After an EMR Studio is successfully created, click EMR Studio Url (check out `EmrStudioUrl` in CloudFormation Outputs section, e.g., https://es-KWX8LX799XYDYTL7SAWH75UV.emrstudio-prod.us-east-1.amazonaws.com).
When you use an EMR Studio, you can create and configure different Workspaces to organize and run notebooks.
Do the following steps to run your notebook.
- **(Step 1)** [Create an EMR Studio Workspace.](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-configure-workspace.html#emr-studio-create-workspace)
- **(Step 2)** [Launch a Workspace.](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-configure-workspace.html#emr-studio-use-workspace)
- **(Step 3)** Attach an EMR Cluster to the Jupyter Notebook.

## Useful commands
* `cdk ls` list all stacks in the app
* `cdk synth` emits the synthesized CloudFormation template
* `cdk deploy` deploy this stack to your default AWS account/region
* `cdk diff` compare deployed stack with current state
* `cdk docs` open CDK documentation
## References
* [Use an Amazon EMR Studio](https://docs.aws.amazon.com/emr/latest/ManagementGuide/use-an-emr-studio.html)
Enjoy!