# EMR Serverless Java job

This example shows how to run a Java Spark job on EMR Serverless using pre-initialized capacity. We'll use a simple Hello World application from the [./hello-world](./hello-world/) directory.

_ℹ️ Throughout this demo, we use environment variables to allow for easy copy/paste._

## Setup

_You should have already completed the pre-requisites in this repo's [README](/README.md)._

- Define some environment variables to be used later

```shell
export S3_BUCKET=
export JOB_ROLE_ARN=arn:aws:iam:::role/emr-serverless-job-role
```

## Create EMR Serverless application

We create an EMR Serverless 6.9.0 application, which runs Spark 3.3.0. We'll also configure pre-initialized capacity of 1 driver and 3 executors.

```bash
aws emr-serverless create-application \
  --type SPARK \
  --name serverless-java-demo \
  --release-label "emr-6.9.0" \
  --initial-capacity '{
    "DRIVER": {
        "workerCount": 1,
        "workerConfiguration": {
            "cpu": "4vCPU",
            "memory": "16GB"
        }
    },
    "EXECUTOR": {
        "workerCount": 3,
        "workerConfiguration": {
            "cpu": "4vCPU",
            "memory": "16GB"
        }
    }
  }'
```

We'll set an `APPLICATION_ID` environment variable, using the application ID returned by the command above, to reuse later.

```bash
export APPLICATION_ID=00et0dhmhuokmr09
```

Then start our application.

```shell
aws emr-serverless start-application \
    --application-id $APPLICATION_ID
```

This starts the application and pre-provisions the capacity defined above. The capacity keeps running until the application has been idle for 15 minutes, the default timeout for the auto-stop configuration.

## Build and package job

First we package our application into a JAR from the [./hello-world](./hello-world/) directory and copy the resulting file to S3.

```bash
mvn package
aws s3 cp target/java-demo-1.0.jar s3://${S3_BUCKET}/code/java-spark/
```

## Run our job

Now we can run our Hello World app. We'll also configure Spark logs to be delivered to our S3 bucket.

```bash
aws emr-serverless start-job-run \
    --application-id $APPLICATION_ID \
    --execution-role-arn $JOB_ROLE_ARN \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://'${S3_BUCKET}'/code/java-spark/java-demo-1.0.jar",
            "sparkSubmitParameters": "--class HelloWorld"
        }
    }' \
    --configuration-overrides '{
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {
                "logUri": "s3://'${S3_BUCKET}'/logs/"
            }
        }
    }'
```

The command returns a `jobRunId`; export it as a `JOB_RUN_ID` environment variable if you want to follow along with the status and log sketches at the end of this README.

> **Note**: We don't specify any Spark CPU or memory configurations; the defaults [defined here](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-spark.html#spark-defaults) fit into the pre-initialized capacity we created above. We used a value of `16GB` for our workers because the Spark default of `14GB`, plus the 10% Spark memory overhead, comes to `15.4GB`.

## Clean up

When you're finished, call `stop-application` to release the pre-initialized capacity, and `delete-application` once you're all done with the demo.

```shell
aws emr-serverless stop-application \
    --application-id $APPLICATION_ID
```

```shell
aws emr-serverless delete-application \
    --application-id $APPLICATION_ID
```
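_ℹ️ While the job is running (i.e., before the clean up above), you can poll its status with `get-job-run`. This is a minimal sketch; it assumes you exported the `jobRunId` returned by `start-job-run` as `JOB_RUN_ID`._

```shell
aws emr-serverless get-job-run \
    --application-id $APPLICATION_ID \
    --job-run-id $JOB_RUN_ID
```

The `state` field in the response moves through values such as `SCHEDULED` and `RUNNING` before settling on `SUCCESS` (or `FAILED`).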
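Once the job reaches `SUCCESS`, our Hello World output lands in the driver's stdout under the `logUri` we configured. This sketch assumes the standard EMR Serverless S3 log layout of `applications/<application-id>/jobs/<job-run-id>/`:

```shell
aws s3 cp s3://${S3_BUCKET}/logs/applications/$APPLICATION_ID/jobs/$JOB_RUN_ID/SPARK_DRIVER/stdout.gz - | gunzip
```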
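For reference, here is a minimal sketch of what a class like the one in [./hello-world](./hello-world/) might look like. Only the `HelloWorld` class name is taken from the `--class` parameter above; the body is an illustrative assumption, not necessarily the repo's actual code.

```java
import org.apache.spark.sql.SparkSession;

public class HelloWorld {
    public static void main(String[] args) {
        // EMR Serverless supplies the Spark master and runtime configuration,
        // so the job only needs to obtain a session.
        SparkSession spark = SparkSession.builder()
                .appName("HelloWorld")
                .getOrCreate();

        // A trivial distributed computation to prove the executors ran.
        long count = spark.range(1000).count();
        System.out.println("Hello from EMR Serverless! Counted " + count + " rows.");

        spark.stop();
    }
}
```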