# **Spark Driver and Executor Logs**

The status of Spark jobs can be monitored via the [EMR on EKS describe-job-run API](https://docs.aws.amazon.com/cli/latest/reference/emr-containers/describe-job-run.html). To monitor job progress and troubleshoot failures, you must configure your jobs to send log information to Amazon S3, Amazon CloudWatch Logs, or both.

### Send Spark Logs to S3

#### **Update the IAM role with S3 write access**

Configure the IAM role passed in the StartJobRun input `executionRoleArn` with write access to the S3 bucket that will receive the logs.

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my_s3_log_location",
                "arn:aws:s3:::my_s3_log_location/*"
            ]
        }
    ]
}
```

#### **Configure the StartJobRun API with S3 buckets**

Configure `monitoringConfiguration` with `s3MonitoringConfiguration`, and set `logUri` to the S3 location where the logs will be delivered.

```json
{
    "name": "",
    "virtualClusterId": "",
    "executionRoleArn": "",
    "releaseLabel": "",
    "jobDriver": { },
    "configurationOverrides": {
        "monitoringConfiguration": {
            "persistentAppUI": "ENABLED",
            "s3MonitoringConfiguration": {
                "logUri": "s3://my_s3_log_location"
            }
        }
    }
}
```

#### **Log location of JobRunner, Driver, Executor in S3**

The JobRunner (the pod that runs spark-submit), Spark driver, and Spark executor logs can be found in the following S3 locations.

```text
JobRunner/Spark-Submit/Controller Logs - s3://my_s3_log_location/${virtual-cluster-id}/jobs/${job-id}/containers/${job-runner-pod-id}/(stderr.gz/stdout.gz)

Driver Logs - s3://my_s3_log_location/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${spark-job-id-driver-pod-name}/(stderr.gz/stdout.gz)

Executor Logs - s3://my_s3_log_location/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${spark-job-id-driver-executor-id}/(stderr.gz/stdout.gz)
```

### Send Spark Logs to CloudWatch

#### **Update the IAM role with CloudWatch access**

Configure the IAM role passed in the StartJobRun input `executionRoleArn` with access to the CloudWatch log group and log streams.

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:log-group:my_log_group_name:log-stream:my_log_stream_prefix/*"
            ]
        }
    ]
}
```

#### **Configure StartJobRun API with CloudWatch**

Configure `monitoringConfiguration` with `cloudWatchMonitoringConfiguration`, and set the CloudWatch `logGroupName` and `logStreamNamePrefix` where the logs should be pushed.

```json
{
    "name": "",
    "virtualClusterId": "",
    "executionRoleArn": "",
    "releaseLabel": "",
    "jobDriver": { },
    "configurationOverrides": {
        "monitoringConfiguration": {
            "persistentAppUI": "ENABLED",
            "cloudWatchMonitoringConfiguration": {
                "logGroupName": "my_log_group_name",
                "logStreamNamePrefix": "my_log_stream_prefix"
            }
        }
    }
}
```
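The same request can also be issued programmatically. Below is a minimal sketch using boto3's `emr-containers` client that combines the S3 and CloudWatch monitoring configurations shown above in a single StartJobRun call; the virtual cluster ID, role ARN, release label, entry point script, and region are hypothetical placeholders.

```python
import boto3

# Hypothetical region; use the region hosting your virtual cluster.
emr_containers = boto3.client("emr-containers", region_name="us-east-1")

response = emr_containers.start_job_run(
    name="spark-pi",
    virtualClusterId="my-virtual-cluster-id",  # hypothetical placeholder
    executionRoleArn="arn:aws:iam::111122223333:role/EMRContainersJobRole",  # hypothetical
    releaseLabel="emr-6.9.0-latest",  # hypothetical
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://my-bucket/scripts/pi.py",  # hypothetical
            "sparkSubmitParameters": "--conf spark.executor.instances=2",
        }
    },
    configurationOverrides={
        "monitoringConfiguration": {
            "persistentAppUI": "ENABLED",
            # Both sinks can be enabled at once.
            "s3MonitoringConfiguration": {"logUri": "s3://my_s3_log_location"},
            "cloudWatchMonitoringConfiguration": {
                "logGroupName": "my_log_group_name",
                "logStreamNamePrefix": "my_log_stream_prefix",
            },
        }
    },
)

# The returned job run ID can be passed to describe-job-run to poll status.
print(response["id"])
```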
#### **Log location of JobRunner, Driver, Executor in CloudWatch**

The JobRunner (the pod that runs spark-submit), Spark driver, and Spark executor logs can be found in the following CloudWatch locations.

```text
JobRunner/Spark-Submit/Controller Logs - ${my_log_group_name}/${my_log_stream_prefix}/${virtual-cluster-id}/jobs/${job-id}/containers/${job-runner-pod-id}/(stderr.gz/stdout.gz)

Driver Logs - ${my_log_group_name}/${my_log_stream_prefix}/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${spark-job-id-driver-pod-name}/(stderr.gz/stdout.gz)

Executor Logs - ${my_log_group_name}/${my_log_stream_prefix}/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${spark-job-id-driver-executor-id}/(stderr.gz/stdout.gz)
```
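Once a job has run, the layouts above can be used to retrieve the logs directly. Below is a minimal sketch, assuming the hypothetical bucket, log group, stream prefix, and IDs used earlier: it lists every log object the job produced under its S3 prefix, then pulls recent events from the job's CloudWatch log streams.

```python
import boto3

s3 = boto3.client("s3")
logs = boto3.client("logs")

virtual_cluster_id = "my-virtual-cluster-id"  # hypothetical placeholder
job_id = "my-job-id"                          # hypothetical placeholder

# S3: every container's stderr.gz/stdout.gz lives under this prefix.
prefix = f"{virtual_cluster_id}/jobs/{job_id}/containers/"
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my_s3_log_location", Prefix=prefix):
    for obj in page.get("Contents", []):
        print(obj["Key"])  # e.g. .../<driver-pod-name>/stderr.gz

# CloudWatch: stream names follow the same hierarchy under the prefix.
stream_prefix = f"my_log_stream_prefix/{virtual_cluster_id}/jobs/{job_id}/"
events = logs.filter_log_events(
    logGroupName="my_log_group_name",
    logStreamNamePrefix=stream_prefix,
)
for event in events["events"]:
    print(event["message"])
```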