AWSTemplateFormatVersion: '2010-09-09'
Description: This template creates a security configuration and launches a kerberized Amazon EMR cluster
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Cross-realm Trust Configuration
        Parameters:
          - DomainDNSName
          - DomainAdminUser
          - ADDomainJoinPassword
          - CrossRealmTrustPrincipalPassword
      - Label:
          default: Cluster Configuration
        Parameters:
          - KeyPairName
          - ClusterSubnetID
          - ClusterSecurityGroup
          - TypeOfInstance
          - MasterInstanceCount
          - CoreInstanceCount
          - CIDRAccessToPrivateSubnetResources
          - AppsEMR
      - Label:
          default: EMR Kerberos Configuration
        Parameters:
          - KerberosRealm
          - KerberosADdomain
      - Label:
          default: KNOX Configuration
        Parameters:
          - LDAPHostPrivateIP
          - LDAPBindUserName
          - LDAPBindPassword
          - LDAPSearchBase
          - LDAPUserSearchAttribute
          - LDAPUserObjectClass
          - LDAPGroupSearchBase
          - LDAPGroupObjectClass
          - LDAPMemberAttribute
    ParameterLabels:
      DomainDNSName:
        default: 'Active Directory domain: '
      DomainAdminUser:
        default: 'Domain admin user (joiner user): '
      ADDomainJoinPassword:
        default: 'Domain admin password: '
      CrossRealmTrustPrincipalPassword:
        default: 'Cross-realm trust password: '
      KeyPairName:
        default: 'EC2 key pair name: '
      ClusterSubnetID:
        default: 'Subnet ID: '
      ClusterSecurityGroup:
        default: 'Security group ID: '
      CIDRAccessToPrivateSubnetResources:
        default: 'Allowed IP address: '
      TypeOfInstance:
        default: 'Instance type: '
      MasterInstanceCount:
        default: 'Master Node Instance count: '
      CoreInstanceCount:
        default: 'Core Nodes Instance count: '
      KerberosRealm:
        default: 'EMR Kerberos realm: '
      KerberosADdomain:
        default: 'Trusted AD domain: '
      AppsEMR:
        default: 'EMR applications: '
      LDAPHostPrivateIP:
        default: Ldap Host Private IP address
      LDAPBindUserName:
        default: 'Ldap Bind user Name: '
      LDAPBindPassword:
        default: 'Ldap Bind user password: '
      LDAPSearchBase:
        default: 'Ldap search base: '
      LDAPUserSearchAttribute:
        default: 'Ldap user search attribute: '
      LDAPUserObjectClass:
        default: 'Ldap user object class: '
      LDAPGroupSearchBase:
        default: 'Ldap group search base: '
      LDAPGroupObjectClass:
        default: 'Ldap group object class: '
      LDAPMemberAttribute:
        default: 'Ldap member attribute: '
Parameters:
  VPC:
    Description: Select the Virtual Private Cloud (VPC) that was created
    Type: AWS::EC2::VPC::Id
  S3Bucket:
    Description: S3Bucket for the code [update this if you want to run this stack in a region other than US-EAST-1]
    Type: String
    Default: aws-bigdata-blog
  S3Key:
    Description: S3Key of the code [update this if you want to run this stack in a region other than US-EAST-1]
    Type: String
    Default: artifacts/aws-blog-emr-ranger
  S3ArtifactBucket:
    Description: S3Bucket where artifacts are stored
    Type: String
    Default: aws-bigdata-blog
    AllowedValues: ["aws-bigdata-blog"]
  S3ArtifactKey:
    Description: S3Key of the Lambda code
    Type: String
    Default: artifacts/aws-blog-emr-ranger
    AllowedValues: ["artifacts/aws-blog-emr-ranger"]
  ProjectVersion:
    Default: 3.0
    Description: Project version
    Type: String
    AllowedValues:
      - 3.0
      - beta
  EMRLogDir:
    Description: Log directory for EMR e.g., s3://xxx
    Type: String
    AllowedPattern: "^s3://.*"
  ClusterSubnetID:
    Description: ID of an existing subnet for the Amazon EMR cluster
    Type: AWS::EC2::Subnet::Id
  AttachAdditionalSourcePrefixToSG:
    Description: Attaches additional sources to EMR Master SG
    Default: false
    Type: String
    AllowedValues: [true, false]
  CIDRAccessToPrivateSubnetResources:
    Description: IP address range (in CIDR notation) of the client that will be allowed to connect to the cluster using SSH e.g., 203.0.113.5/32
    AllowedPattern: (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})
    Type: String
    MinLength: '9'
    MaxLength: '18'
    Default: 10.0.0.0/16
    ConstraintDescription: must be a valid CIDR range of the form x.x.x.x/x
  AdditionalSourcePrefixToSG:
    Description: Sources that are allowed to access the Ranger Instance.
      Should be a source prefix e.g., pl-xxx
    Type: String
  KeyPairName:
    Description: Name of an existing EC2 key pair to access the Amazon EMR cluster
    Type: AWS::EC2::KeyPair::KeyName
  DBHostName:
    Description: HostName of the database
    Type: String
  DBUserName:
    Description: 'The RDS MySQL database username'
    Type: String
    Default: root
    AllowedValues:
      - root
  DBRootPassword:
    Description: 'The RDS MySQL database password'
    MaxLength: '41'
    MinLength: '8'
    NoEcho: 'true'
    Type: String
  DomainDNSName:
    AllowedPattern: '[a-zA-Z0-9\-]+\..+'
    Default: awsemr.com
    Description: The Active Directory domain that you want to establish the cross-realm trust with e.g., awsemr.com
    MaxLength: '25'
    MinLength: '3'
    Type: String
  CrossRealmTrustPrincipalPassword:
    Description: Password of your cross-realm trust
    MaxLength: '32'
    MinLength: '5'
    NoEcho: 'true'
    Type: String
  ADDomainJoinPassword:
    Description: Password of the domain admin (joiner user) account
    NoEcho: 'true'
    Type: String
  KdcAdminPassword:
    Description: Password of your KDC admin
    MaxLength: '32'
    MinLength: '5'
    NoEcho: 'true'
    Type: String
  DomainAdminUser:
    AllowedPattern: '[a-zA-Z0-9]*'
    Default: awsadmin
    Description: User name of an AD account with computer join privileges
    MaxLength: '25'
    MinLength: '5'
    Type: String
  # KerberosRealm:
  #   AllowedPattern: '[a-zA-Z0-9\-]+\..+'
  #   Default: COMPUTE.INTERNAL
  #   Description: Cluster's Kerberos realm name. This is usually the VPC's domain name
  #     in uppercase letters e.g. COMPUTE.INTERNAL
  #   MaxLength: '25'
  #   MinLength: '3'
  #   Type: String
  #   AllowedValues: ['COMPUTE.INTERNAL']
  KerberosADdomain:
    AllowedPattern: '[a-zA-Z0-9\-]+\..+'
    Default: AWSEMR.COM
    Description: The AD domain that you want to trust.
      This is the same as the AD domain name, but in uppercase letters e.g., AWSEMR.COM
    MaxLength: '25'
    MinLength: '3'
    Type: String
  MasterInstanceCount:
    Default: 1
    Description: Number of instances (master nodes) for the cluster e.g., 2
    Type: Number
  CoreInstanceCount:
    Default: 3
    Description: Number of instances (core nodes) for the cluster e.g., 2
    Type: Number
  MasterInstanceType:
    Description: Instance type for the cluster master nodes
    Type: String
    Default: r5.2xlarge
    AllowedValues:
      - m4.large
      - m4.xlarge
      - m4.2xlarge
      - m4.4xlarge
      - m5.xlarge
      - m5.2xlarge
      - r5.2xlarge
  CoreInstanceType:
    Description: Instance type for the cluster core nodes
    Type: String
    Default: r5.2xlarge
    AllowedValues:
      - m4.large
      - m4.xlarge
      - m4.2xlarge
      - m4.4xlarge
      - m5.xlarge
      - m5.2xlarge
      - r5.2xlarge
  emrReleaseLabel:
    Type: String
    Default: emr-5.32.0
    AllowedValues:
      - emr-5.32.0
      - emr-5.36.0
      - emr-6.3.0
      - emr-6.4.0
      - emr-6.7.0
      - emr-6.8.0
    Description: Release label for the EMR cluster
  AppsEMR:
    Description: 'Comma separated list of applications to install on the cluster e.g., '
    Type: String
    Default: Hadoop, Spark, Hive, Livy, Hue
    AllowedValues:
      - "Hadoop, Spark, Hive, Livy, Hue"
      - "Hadoop, Spark, Hive, Livy, Hue, Trino"
      - "Hadoop, Spark, Hive, Livy"
  EnableKerberos:
    Description: Enable Kerberos on the Cluster.
      This is required for Ranger EMR support
    Default: true
    Type: String
    AllowedValues: [true]
  LDAPHostPrivateIP:
    Description: Ldap Host Private IP address
    Type: String
  LDAPBindUserName:
    Description: Ldap Bind user name
    Type: String
    Default: binduser
    AllowedValues:
      - binduser
  LDAPBindPassword:
    Description: Ldap Bind user password that was used in the previous CloudFormation template
    Type: String
    NoEcho: 'true'
  LDAPSearchBase:
    Description: Ldap search base
    Type: String
    Default: CN=Users,DC=awsemr,DC=com
    AllowedValues:
      - CN=Users,DC=awsemr,DC=com
  LDAPUserSearchAttribute:
    Description: Ldap user search attribute
    Type: String
    Default: sAMAccountName
    AllowedValues:
      - sAMAccountName
  LDAPUserObjectClass:
    Description: Ldap user object class
    Type: String
    Default: person
    AllowedValues:
      - person
  LDAPGroupSearchBase:
    Description: Ldap group search base
    Type: String
    Default: dc=awsemr,dc=com
    AllowedValues:
      - dc=awsemr,dc=com
  LDAPGroupObjectClass:
    Description: Ldap group object class
    Type: String
    Default: group
    AllowedValues:
      - group
  LDAPMemberAttribute:
    Description: Ldap member attribute
    Type: String
    Default: member
    AllowedValues:
      - member
  RangerHostname:
    Description: Ranger Host IP Address
    Type: String
  RangerVersion:
    Description: Ranger version
    Type: String
    Default: '2.0'
    AllowedValues:
      - '0.6'
      - '0.7'
      - '1.0'
      - '2.0'
  RangerHttpProtocol:
    Description: Ranger HTTP protocol
    Type: String
    Default: 'https'
    AllowedValues:
      - 'https'
  RangerCloudWatchLogGroupName:
    Description: Log Group Name for Ranger Logs
    Type: String
    Default: '/aws/emr/rangeraudit/'
    AllowedValues:
      - '/aws/emr/rangeraudit/'
  RangerAdminPassword:
    Description: Password of the Ranger Admin server
    NoEcho: 'true'
    Default: admin
    Type: String
  CreateNonEMRResources:
    Description: Create the non-EMR resources (such as the Ranger log group). Set to false to install only the EMR cluster when those resources were created by previous stacks
    Default: true
    Type: String
    AllowedValues: [true, false]
  RangerAgentKeySecretName:
    Description: Name of the Ranger Agent Cert Secrets Manager resource
    Type: String
    Default: emr/rangerGAagentkey
    AllowedValues: ["emr/rangerGAagentkey"]
  RangerServerCertSecretName:
    Description: Name of the Ranger Server Cert Secrets Manager resource
    Type: String
    Default: emr/rangerGAservercert
    AllowedValues: ["emr/rangerGAservercert"]
  EnableIcebergSupport:
    Description: Enable Iceberg configuration
    Default: false
    Type: String
    AllowedValues: [ true, false ]
  EnableGlueSupport:
    Description: (* Experimental feature * - Not recommended for production) Enable Glue support instead of the default RDS-based Hive metastore
    Default: false
    Type: String
    AllowedValues: [ true, false ]
  InstallRangerHDFSPlugin:
    Description: Flag to control if the Ranger HDFS plugin will be added
    Default: false
    Type: String
    AllowedValues: [ true, false ]
Conditions:
  USEastRegion: !Equals [ !Ref 'AWS::Region', "us-east-1" ]
  EnableKerberos: !Equals [true, !Ref EnableKerberos]
  CreateNonEMRResources: !Equals [true, !Ref CreateNonEMRResources]
  AttachAdditionalSourcePrefixToSG: !Equals [true, !Ref AttachAdditionalSourcePrefixToSG]
  EnableGlueSupportCondition: !Equals [true, !Ref EnableGlueSupport]
Resources:
  LogGroupRanger:
    Type: AWS::Logs::LogGroup
    Condition: CreateNonEMRResources
    Properties:
      LogGroupName: !Ref RangerCloudWatchLogGroupName
      RetentionInDays: 14
  EMRMasterSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: CloudFormationGroup
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: '-1'
          CidrIp: !Ref CIDRAccessToPrivateSubnetResources
      Tags:
        - Key: Name
          Value: AWSEMREMRMasterSecurityGroup
  EMRMasterSecurityGroupWithAdditions:
    Type: AWS::EC2::SecurityGroup
    Condition: AttachAdditionalSourcePrefixToSG
    Properties:
      GroupDescription: CloudFormationGroup
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: '-1'
          CidrIp: !Ref CIDRAccessToPrivateSubnetResources
        - IpProtocol: '-1'
          SourcePrefixListId: !Ref AdditionalSourcePrefixToSG
      Tags:
        - Key: Name
          Value: AWSEMREMRMasterSecurityGroup
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonElasticMapReduceFullAccess
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AWSLambdaRole
  EmrServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - elasticmapreduce.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole
  EmrEc2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        - arn:aws:iam::aws:policy/CloudWatchFullAccess
  AllowSecretsRetrievalPolicy:
    Type: 'AWS::IAM::Policy'
    DependsOn: EmrEc2Role
    Properties:
      PolicyName: AllowSecretsRetrievalPolicy
      PolicyDocument:
        # Quoted so YAML parsers treat the policy version as a string, not a date
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - secretsmanager:GetSecretValue
              - secretsmanager:ListSecrets
              - secretsmanager:DescribeSecret
            Resource:
              - !Join [ '', [ 'arn:aws:secretsmanager:', !Ref "AWS::Region", ':', !Ref "AWS::AccountId", ':secret:emr/ranger*' ] ]
      Roles:
        - !Ref EmrEc2Role
  DataAccessRoleARN:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS:
                - !GetAtt 'EmrEc2Role.Arn'
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonS3FullAccess
        - arn:aws:iam::aws:policy/CloudWatchFullAccess
        - !If [ EnableGlueSupportCondition, "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole", !Ref AWS::NoValue ]
        - !If [ EnableGlueSupportCondition, "arn:aws:iam::aws:policy/AWSGlueSchemaRegistryFullAccess", !Ref AWS::NoValue ]
  UserRoleARN:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS:
                - !GetAtt 'EmrEc2Role.Arn'
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/CloudWatchFullAccess
  AllowAssumeOfRolesAndTaggingPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: AllowAssumeOfRolesAndTagging
      PolicyDocument:
        # Quoted so YAML parsers treat the policy version as a string, not a date
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - sts:TagSession
              - sts:AssumeRole
            Resource:
              - !GetAtt 'UserRoleARN.Arn'
              - !GetAtt 'DataAccessRoleARN.Arn'
      Roles:
        - !Ref EmrEc2Role
  EMRInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - !Ref 'EmrEc2Role'
  SecurityConfiguration:
    Type: AWS::EMR::SecurityConfiguration
    Condition: EnableKerberos
    Properties:
      SecurityConfiguration:
        AuthenticationConfiguration:
          KerberosConfiguration:
            Provider: ClusterDedicatedKdc
            ClusterDedicatedKdcConfiguration:
              TicketLifetimeInHours: 24
              CrossRealmTrustConfiguration:
                Realm: !Ref 'KerberosADdomain'
                Domain: !Ref 'DomainDNSName'
                AdminServer: !Ref 'DomainDNSName'
                KdcServer: !Ref 'DomainDNSName'
        AuthorizationConfiguration:
          RangerConfiguration:
            AdminServerURL: !Join ['', [!Ref RangerHttpProtocol, '://', !Ref RangerHostname, ':6182']]
            RoleForRangerPluginsARN: !GetAtt 'DataAccessRoleARN.Arn'
            RoleForOtherAWSServicesARN: !GetAtt 'UserRoleARN.Arn'
            AdminServerSecretARN: !Join ['', ['arn:aws:secretsmanager:', !Ref "AWS::Region", ':', !Ref "AWS::AccountId", ':secret:', !Ref RangerServerCertSecretName]]
            RangerPluginConfigurations:
              - App: "Spark"
                ClientSecretARN: !Join ['', ['arn:aws:secretsmanager:', !Ref "AWS::Region", ':', !Ref "AWS::AccountId", ':secret:', !Ref RangerAgentKeySecretName]]
                PolicyRepositoryName: "amazonemrspark"
              - App: "Hive"
                ClientSecretARN: !Join ['', ['arn:aws:secretsmanager:', !Ref "AWS::Region", ':', !Ref "AWS::AccountId", ':secret:', !Ref RangerAgentKeySecretName]]
                PolicyRepositoryName: "hivedev"
              - App: "EMRFS-S3"
                ClientSecretARN: !Join ['', ['arn:aws:secretsmanager:', !Ref "AWS::Region", ':', !Ref "AWS::AccountId", ':secret:',
                  !Ref RangerAgentKeySecretName]]
                PolicyRepositoryName: "amazonemrs3"
              - App: "Trino"
                ClientSecretARN: !Join ['', ['arn:aws:secretsmanager:', !Ref "AWS::Region", ':', !Ref "AWS::AccountId", ':secret:', !Ref RangerAgentKeySecretName]]
                PolicyRepositoryName: "amazonemrtrino"
            AuditConfiguration:
              Destinations:
                AmazonCloudWatchLogs:
                  CloudWatchLogGroup: !Sub 'arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:${RangerCloudWatchLogGroupName}'
        EncryptionConfiguration:
          EnableInTransitEncryption: true
          EnableAtRestEncryption: false
          InTransitEncryptionConfiguration:
            TLSCertificateConfiguration:
              CertificateProviderType: PEM
              S3Object: !Join ['', ["s3://", !Ref S3Bucket, "/", !Ref S3Key, "/", !Ref ProjectVersion, "/emr-tls/", "emr-certs-certs.zip"]]
  LaunchEMRClusterFunction:
    Type: AWS::Lambda::Function
    DependsOn: LambdaExecutionRole
    Properties:
      Handler: cremr.handler
      Role: !GetAtt 'LambdaExecutionRole.Arn'
      Code:
        S3Bucket: !Ref S3Bucket
        S3Key: !Join ['', [!Ref S3Key, '/', !Ref ProjectVersion, '/launch-cluster.zip']]
      Runtime: python3.9
      Timeout: '300'
  LaunchKerberizedCluster:
    Type: AWS::CloudFormation::CustomResource
    DependsOn: LaunchEMRClusterFunction
    Version: '1.0'
    Properties:
      ServiceToken: !GetAtt 'LaunchEMRClusterFunction.Arn'
      SignalURL: !Ref 'emrCreateWaitHandle'
      loglevel: info
      subnetID: !Ref 'ClusterSubnetID'
      JobFlowRole: !Ref 'EMRInstanceProfile'
      ServiceRole: !Ref 'EmrServiceRole'
      masterSG: !If [ AttachAdditionalSourcePrefixToSG, !Ref EMRMasterSecurityGroupWithAdditions, !Ref EMRMasterSecurityGroup ]
      # slaveSG: !Ref 'ClusterSecurityGroup'
      # serviceSG: !Ref 'ServiceAccessSecurityGroup'
      S3Bucket: !Ref 'S3Bucket'
      S3Key: !Ref S3Key
      S3ArtifactBucket: !Ref 'S3ArtifactBucket'
      S3ArtifactKey: !Ref S3ArtifactKey
      ProjectVersion: !Ref ProjectVersion
      EMRSecurityConfig: !If
        - EnableKerberos
        - !Ref 'SecurityConfiguration'
        - "false"
      LogFolder: !Ref EMRLogDir
      KeyPairName: !Ref 'KeyPairName'
      StackName: !Ref 'AWS::StackName'
      StackRegion: !Ref 'AWS::Region'
      CrossRealmTrustPrincipalPassword:
        !Ref 'CrossRealmTrustPrincipalPassword'
      KdcAdminPassword: !Ref 'KdcAdminPassword'
      ADDomainJoinPassword: !Ref 'ADDomainJoinPassword'
      ADDomainUser: !Ref 'DomainAdminUser'
      KerberosRealm: !If [ USEastRegion, 'EC2.INTERNAL', 'COMPUTE.INTERNAL' ]
      MasterInstanceCount: !Ref 'MasterInstanceCount'
      CoreInstanceCount: !Ref 'CoreInstanceCount'
      MasterInstanceType: !Ref 'MasterInstanceType'
      CoreInstanceType: !Ref 'CoreInstanceType'
      AppsEMR: !Ref 'AppsEMR'
      emrReleaseLabel: !Ref emrReleaseLabel
      DomainDNSName: !Ref 'DomainDNSName'
      LDAPHostPrivateIP: !Ref 'LDAPHostPrivateIP'
      LDAPBindUserName: !Ref 'LDAPBindUserName'
      LDAPBindPassword: !Ref 'LDAPBindPassword'
      LDAPSearchBase: !Ref 'LDAPSearchBase'
      LDAPUserSearchAttribute: !Ref 'LDAPUserSearchAttribute'
      LDAPUserObjectClass: !Ref 'LDAPUserObjectClass'
      LDAPGroupSearchBase: !Ref 'LDAPGroupSearchBase'
      LDAPGroupObjectClass: !Ref 'LDAPGroupObjectClass'
      LDAPMemberAttribute: !Ref 'LDAPMemberAttribute'
      RangerHostname: !Ref RangerHostname
      RangerVersion: !Ref RangerVersion
      RangerHttpProtocol: !Ref RangerHttpProtocol
      DBHostName: !Ref DBHostName
      DBUserName: !Ref DBUserName
      DBRootPassword: !Ref DBRootPassword
      ClientSecretARN: !Join ['', ['arn:aws:secretsmanager:', !Ref "AWS::Region", ':', !Ref "AWS::AccountId", ':secret:', !Ref RangerAgentKeySecretName]]
      CertLocationPath: !Join ['', ["s3://", !Ref S3ArtifactBucket, "/", !Ref S3ArtifactKey, "/", !Ref ProjectVersion]]
      RangerAdminPassword: !Ref RangerAdminPassword
      DefaultDomain: !If [ USEastRegion, 'EC2.INTERNAL', 'COMPUTE.INTERNAL' ]
      EnableIcebergSupport: !Ref EnableIcebergSupport
      EnableGlueSupport: !Ref EnableGlueSupport
      InstallRangerHDFSPlugin: !Ref InstallRangerHDFSPlugin
  emrCreateWaitHandle:
    Type: AWS::CloudFormation::WaitConditionHandle
    Properties: {}
  emrWaitCondition:
    Type: AWS::CloudFormation::WaitCondition
    DependsOn: LaunchKerberizedCluster
    Properties:
      Handle: !Ref 'emrCreateWaitHandle'
      Timeout: '4500'
  AssignLakeFormationPermission:
    Condition: EnableGlueSupportCondition
    Type:
      AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Join [ '', [ 'https://s3.amazonaws.com/', !Ref 'S3ArtifactBucket', '/', !Ref 'S3ArtifactKey', '/', !Ref 'ProjectVersion','/cloudformation/', 'lakeformation-permissions.template' ] ]
      Parameters:
        DataAccessRoleARN: !GetAtt DataAccessRoleARN.Arn
Outputs:
  StackName:
    Value: !Ref 'AWS::StackName'
  EMRClusterID:
    Value: !GetAtt 'LaunchKerberizedCluster.ClusterID'
    Description: EMR cluster ID
  EMRClusterURL:
    Value: !GetAtt 'emrWaitCondition.Data'
    Description: EMR master node URL