# EMR Serverless CloudWatch Dashboard

You can visualize EMR Serverless application and job metrics in a CloudWatch Dashboard. The CloudFormation template provided in this repo can be used to deploy the sample CloudWatch Dashboard for Spark applications on EMR Serverless.

## Overview

The CloudWatch Dashboard provides an overview of pre-initialized capacity vs. OnDemand usage, as well as drill-down metrics for CPU, memory, and disk usage for Spark Drivers and Executors. [Pre-initialized capacity is an optional feature of EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/application-capacity.html) that keeps drivers and workers pre-initialized and ready to respond in seconds, and this dashboard can help you understand whether pre-initialized capacity is being used effectively.

In addition, you can see job-level metrics for the state of your jobs on a per-application basis.

Follow along below to see how to get started, or see a full demo in the [walkthrough](#example-walkthrough) section.

## Getting started

**From the CloudFormation Console:**

1. Open the AWS CloudFormation console at https://console.aws.amazon.com/cloudformation.
2. Create a new stack by choosing `Create Stack`.
3. Choose the option `Upload a template file` and upload the `emr_serverless_cloudwatch_dashboard.yaml` template.
4. On the next page, add a Stack name for the stack.
5. In the Parameters section, enter the `Application ID` of the EMR Serverless application you want to monitor.
6. Choose `Create Stack` on the final page.

**From the CLI:**

Alternatively, you can also deploy this dashboard using the CLI:

```bash
aws cloudformation create-stack --region <region> \
    --stack-name emr-serverless-dashboard \
    --template-body file://emr_serverless_cloudwatch_dashboard.yaml \
    --parameters ParameterKey=ApplicationID,ParameterValue=<application-id>
```

Once the stack is created, you'll have a new CloudWatch Dashboard with the name `emr-serverless-dashboard-<application-id>` under Dashboards in the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
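If you deploy from the CLI, you can optionally confirm that the stack finished creating and that the dashboard exists before opening the console. A minimal check, assuming the stack name from the example above:

```bash
# Block until the CloudFormation stack reaches CREATE_COMPLETE (exits non-zero on failure)
aws cloudformation wait stack-create-complete --stack-name emr-serverless-dashboard

# List dashboards that use this template's naming prefix
aws cloudwatch list-dashboards --dashboard-name-prefix emr-serverless-dashboard
```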
---

## Dashboard Details

The sample CloudWatch Dashboard provides the following functionality:

- Capacity Utilization Snapshot view - Shows current Pre-Initialized vs. OnDemand usage
  - Point-in-time view of Pre-Initialized Capacity Worker Utilization %
  - Point-in-time view of Available Workers (Drivers + Executors)
  - Point-in-time view of Running Drivers
  - Point-in-time view of Running Executors

![Screenshot of snapshot metrics for available workers and running Drivers and Executors](images/snapshot_metrics.png "Capacity Utilization Snapshot view")

- Job Runs - Aggregate view of job states for your application per minute
  - Point-in-time counters for different states, including Pending, Running, and Failed jobs

![Screenshot of Job Runs](images/job-runs-overview.png "Job Runs")

- Application Metrics - Shows capacity used by your application
  - Timeline view of Running Workers
  - Timeline view of CPU Allocated
  - Timeline view of Memory Allocated
  - Timeline view of Disk Allocated

![Screenshot of application metrics](images/application_metrics.png "Application Metrics")

- Pre-Initialized Capacity Metrics - Shows how utilized the pre-initialized capacity is
  - Timeline view of Total Workers
  - Timeline view of Idle Workers
  - Timeline view of Pre-Initialized Capacity Worker Utilization % (Workers used / Total Workers)

![Screenshot of pre-initialized capacity metrics](images/pre_initialized_capacity_metrics.png "Pre-Initialized Capacity Metrics")

- Driver Metrics
  - Timeline view of Running Drivers
  - Timeline view of CPU Allocated
  - Timeline view of Memory Allocated
  - Timeline view of Disk Allocated

![Screenshot of Driver Metrics](images/driver_metrics.png "Driver Metrics")

- Executor Metrics
  - Timeline view of Running Executors
  - Timeline view of CPU Allocated
  - Timeline view of Memory Allocated
  - Timeline view of Disk Allocated

![Screenshot of Executor Metrics](images/executor_metrics.png "Executor Metrics")

- Job Metrics
  - Timeline view of Running Jobs
  - Timeline view of Success Jobs
  - Timeline view of Failed Jobs
  - Timeline view of Cancelled Jobs

![Screenshot of Job Metrics](images/job-metrics-breakdown.png "Job Metrics")

## Example Walkthrough

Let's take a quick look at how we can use the dashboard to optimize our EMR Serverless applications. In this example, we'll start an application with a limited amount of pre-initialized capacity, run jobs that both fit within and exceed that capacity, and see what happens.

> **Note**: EMR Serverless sends metrics to Amazon CloudWatch every 1 minute, so you may see different behavior depending on how quickly you run the commands.

### Pre-requisites

To walk through this demo, you'll need the following:

- Access to EMR Serverless and the ability to create new CloudFormation Stacks
- A job runtime role (for instructions, see https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/getting-started.html#gs-prerequisites)

### Walkthrough

1. Create and start an EMR Serverless application with an initial capacity of 1 driver and 4 executors.

```bash
aws emr-serverless create-application \
    --name cloudwatch-dashboard-demo \
    --type SPARK \
    --release-label emr-6.7.0 \
    --initial-capacity '{
        "DRIVER": {
            "workerCount": 1,
            "workerConfiguration": { "cpu": "2vCPU", "memory": "4GB" }
        },
        "EXECUTOR": {
            "workerCount": 4,
            "workerConfiguration": { "cpu": "4vCPU", "memory": "8GB" }
        }
    }'
```

You'll get back information about the created application - set your `APPLICATION_ID` environment variable appropriately.

```json
{
    "applicationId": "00fabcdef12525",
    "name": "cloudwatch-dashboard-demo",
    "arn": "arn:aws:emr-serverless:us-east-1:123456789012:/applications/00fabcdef12525"
}
```

```bash
APPLICATION_ID=00fabcdef12525
aws emr-serverless start-application --application-id $APPLICATION_ID
```

Wait for the application to start before continuing with the next step.

```bash
aws emr-serverless get-application --application-id $APPLICATION_ID
```

```json
{
    "application": {
        "applicationId": "00fabcdef12525",
        "name": "cloudwatch-dashboard-demo",
        "arn": "arn:aws:emr-serverless:us-east-1:123456789012:/applications/00fabcdef12525",
        "releaseLabel": "emr-6.7.0",
        "type": "Spark",
        "state": "STARTED",
        ...
    }
}
```
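If you're scripting the walkthrough, one way to wait is to poll the application state with `get-application` until it reports `STARTED`. A minimal sketch using the `APPLICATION_ID` variable set above:

```bash
# Poll every 10 seconds until the application reports the STARTED state
while [ "$(aws emr-serverless get-application \
    --application-id $APPLICATION_ID \
    --query 'application.state' --output text)" != "STARTED" ]; do
    echo "Waiting for application to start..."
    sleep 10
done
```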
2. Create a CloudWatch Dashboard

Using the `APPLICATION_ID` variable above, create the corresponding dashboard.

```bash
aws cloudformation create-stack \
    --stack-name emr-serverless-dashboard \
    --template-body file://emr_serverless_cloudwatch_dashboard.yaml \
    --parameters ParameterKey=ApplicationID,ParameterValue=$APPLICATION_ID
```

Go ahead and open your new dashboard:

```bash
open https://console.aws.amazon.com/cloudwatch/home#dashboards:name=emr-serverless-dashboard-$APPLICATION_ID
```

You should notice that your application is currently showing 0% "Pre-Initialized Capacity Worker Utilization" and that you have 5 Pre-Initialized workers in the "Available Workers" section. That's the combined count of your drivers and executors.

![](images/demo/initial_state.png)

3. Run a simple job

Let's go ahead and run a job that consumes a limited amount of resources. We'll use the [Pi example](https://github.com/apache/spark/blob/master/examples/src/main/python/pi.py) built into Spark. Because we configured our pre-initialized capacity a little lower than the [Spark defaults](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-spark.html#spark-defaults), we need to explicitly specify driver and executor settings when we submit the job.

```bash
# Set your job runtime role as a variable
JOB_ROLE_ARN=arn:aws:iam::123456789012:role/EMRServerlessS3RuntimeRole

aws emr-serverless start-job-run \
    --name pi-default \
    --application-id $APPLICATION_ID \
    --execution-role-arn $JOB_ROLE_ARN \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py",
            "sparkSubmitParameters": "--conf spark.driver.cores=2 --conf spark.driver.memory=3g --conf spark.executor.memory=7g"
        }
    }'
```

If we look at the "Running Drivers" and "Running Executors" charts, we'll see that 1 Pre-Initialized Driver and 3 Pre-Initialized Executors are being used, while there are 0 OnDemand workers.

![](images/demo/pi-default.png)

Let's see what happens when we use more executors than the pre-initialized capacity.

4. Run a job with more executors

We'll simply tell EMR Serverless to use more executor instances (the default is 3) and bump up the number of tasks in the `pi.py` job so the executors actually get used.

```bash
aws emr-serverless start-job-run \
    --name pi-more-executors \
    --application-id $APPLICATION_ID \
    --execution-role-arn $JOB_ROLE_ARN \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py",
            "entryPointArguments": ["2000"],
            "sparkSubmitParameters": "--conf spark.driver.cores=2 --conf spark.driver.memory=3g --conf spark.executor.memory=7g --conf spark.executor.instances=10"
        }
    }'
```

Now if we look at the "Running Executors" chart, we'll see that we're using additional OnDemand capacity because we have more executors than the 4 configured for pre-initialized capacity.

![](images/demo/pi-more-executors.png)

Note that the job started immediately because we had enough pre-initialized capacity for the driver; the additional executors were added after the job started.
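Before moving on, you can also cross-check what the "Job Runs" widgets are showing by listing the job runs for your application from the CLI. A quick sketch, trimming the output to just name and state:

```bash
# List job runs for the application, showing only their names and states
aws emr-serverless list-job-runs \
    --application-id $APPLICATION_ID \
    --query 'jobRuns[].{name: name, state: state}' \
    --output table
```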
5. Run a job with more drivers

If we submit two jobs at once, we'll see that the first job can make use of pre-initialized capacity while the second job will request OnDemand capacity.

```bash
aws emr-serverless start-job-run \
    --name pi-more-drivers-1 \
    --application-id $APPLICATION_ID \
    --execution-role-arn $JOB_ROLE_ARN \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py",
            "entryPointArguments": ["10000"],
            "sparkSubmitParameters": "--conf spark.driver.cores=2 --conf spark.driver.memory=3g --conf spark.executor.memory=7g"
        }
    }'

aws emr-serverless start-job-run \
    --name pi-more-drivers-2 \
    --application-id $APPLICATION_ID \
    --execution-role-arn $JOB_ROLE_ARN \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py",
            "entryPointArguments": ["10000"],
            "sparkSubmitParameters": "--conf spark.driver.cores=2 --conf spark.driver.memory=3g --conf spark.executor.memory=7g"
        }
    }'
```

In the EMR Studio console, we see the first job start running immediately while the second is "Scheduled", pending an OnDemand Driver.

![](images/demo/pi-drivers.png)

In the "Running Drivers" chart, we'll see 1 Pre-Initialized driver and 1 OnDemand driver as EMR Serverless responds to the request for more resources.

![](images/demo/pi-drivers-chart.png)

Feel free to try other configurations and see how the dashboard responds. When you're done, make sure to delete your EMR Serverless application if you no longer need it.

```bash
aws emr-serverless stop-application --application-id $APPLICATION_ID
aws emr-serverless delete-application --application-id $APPLICATION_ID
```
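If you also deployed the sample dashboard as part of this walkthrough, you can remove it by deleting its CloudFormation stack (this assumes the stack name used in step 2):

```bash
# Remove the dashboard by deleting the CloudFormation stack created in step 2
aws cloudformation delete-stack --stack-name emr-serverless-dashboard
```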