AWSTemplateFormatVersion: '2010-09-09'
Description: "(SO0016) - RealTime-Analytics with Spark Streaming. Version %%VERSION%%"
Mappings:
  SourceCode:
    General:
      S3Bucket: '%%BUCKET_NAME%%'
      KeyPrefix: '%%SOLUTION_NAME%%/%%VERSION%%'
      S3TemplateBucket: '%%TEMPLATE_BUCKET_NAME%%'
      SolutionName: '%%SOLUTION_NAME%%'
  AnonymousData:
    SendAnonymousData:
      Data: 'Yes'
Parameters:
  AvailabilityZones:
    Description: 'List of Availability Zones to use for the subnets in the VPC. Note: The logical order is preserved.'
    Type: List<AWS::EC2::AvailabilityZone::Name>
  KeyPairName:
    Description: Public/private key pairs allow you to securely connect to your Bastion instance after it launches.
    Type: AWS::EC2::KeyPair::KeyName
    ConstraintDescription: Must be the name of an existing EC2 KeyPair.
  LatestAMIId:
    Description: Provide an SSM parameter query, e.g. /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 for 64-bit instances, or /aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended for GPU AMIs
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: '/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2'
  PrivateSubnet1ACIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/(1[6-9]|2[0-8]))$
    ConstraintDescription: CIDR block parameter must be in the form x.x.x.x/16-28
    Default: 10.0.0.0/19
    Description: CIDR block for private subnet 1A located in Availability Zone 1
    Type: String
  PrivateSubnet2ACIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/(1[6-9]|2[0-8]))$
    ConstraintDescription: CIDR block parameter must be in the form x.x.x.x/16-28
    Default: 10.0.32.0/19
    Description: CIDR block for private subnet 2A located in Availability Zone 2
    Type: String
  PublicSubnet1CIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/(1[6-9]|2[0-8]))$
    ConstraintDescription: CIDR block parameter must be in the form x.x.x.x/16-28
    Default: 10.0.128.0/20
    Description: CIDR block for the public DMZ subnet 1 located in Availability Zone 1
    Type: String
  PublicSubnet2CIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/(1[6-9]|2[0-8]))$
    ConstraintDescription: CIDR block parameter must be in the form x.x.x.x/16-28
    Default: 10.0.144.0/20
    Description: CIDR block for the public DMZ subnet 2 located in Availability Zone 2
    Type: String
  VPCCIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/(1[6-9]|2[0-8]))$
    ConstraintDescription: CIDR block parameter must be in the form x.x.x.x/16-28
    Default: 10.0.0.0/16
    Description: CIDR block for the VPC
    Type: String
  ArtifactBucket:
    Type: String
    Default: ''
    Description: Location of all config Artifacts
  KinesisStream:
    Type: String
    Default: default-data-stream
    Description: Enter the Kinesis Stream name where the events should be published.
  ShardCount:
    Type: String
    Default: '2'
    Description: Enter the Kinesis Stream Shard count.
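  # The four parameters below control how the Spark application reaches the
  # EMR cluster. With SubmitMode=AppJar, Type selects between a shell Script
  # and an inline spark-submit Command; with SubmitMode=Zeppelin both are
  # ignored. A hypothetical Command value (every name here is a placeholder):
  #   --deploy-mode,cluster,--class,com.example.DemoApp,--master,yarn,s3://my-app-bucket/demo-app.jar,demo-app,default-data-stream,s3://my-output-bucket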
  SubmitMode:
    Description: Format of Spark Submit Mode (AppJar/Zeppelin)
    Type: String
    Default: AppJar
    AllowedValues:
      - Zeppelin
      - AppJar
  Type:
    Description: Spark Submit type as a (Script/Command) for AppJar
    Type: String
    Default: None
    AllowedValues:
      - None
      - Script
      - Command
  Script:
    Type: String
    Default: ''
    Description: 's3://{bucket_location}/spark_submit.sh'
  Command:
    Type: CommaDelimitedList
    Default: ''
    Description: --deploy-mode,{cluster/client},--class,{className},--master,{yarn/local[?]},{s3://AppLocation/AppJar},{Appname},{StreamName},{OutputLoc}
  RemoteAccessCIDR:
    Description: The IP address range that can be used to SSH to the EC2 instances. The input should be of the form x.x.x.x/x (include the CIDR range)
    Type: String
    MinLength: '9'
    MaxLength: '18'
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: must be a valid IP CIDR range of the form x.x.x.x/x.
  Master:
    AllowedValues: [m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge, m4.16xlarge,
      m5.xlarge, m5.2xlarge, m5.4xlarge, m5.8xlarge, m5.12xlarge, m5.16xlarge, m5.24xlarge,
      m5a.xlarge, m5a.2xlarge, m5a.4xlarge, m5a.8xlarge, m5a.12xlarge, m5a.16xlarge, m5a.24xlarge,
      m5d.xlarge, m5d.2xlarge, m5d.4xlarge, m5d.8xlarge, m5d.12xlarge, m5d.16xlarge, m5d.24xlarge,
      c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge,
      c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge, c5.12xlarge, c5.18xlarge, c5.24xlarge,
      c5d.xlarge, c5d.2xlarge, c5d.4xlarge, c5d.9xlarge, c5d.18xlarge,
      c5n.xlarge, c5n.2xlarge, c5n.4xlarge, c5n.9xlarge, c5n.18xlarge,
      z1d.xlarge, z1d.2xlarge, z1d.3xlarge, z1d.6xlarge, z1d.12xlarge,
      r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge,
      r4.xlarge, r4.2xlarge, r4.4xlarge, r4.8xlarge, r4.16xlarge,
      r5.xlarge, r5.2xlarge, r5.4xlarge, r5.8xlarge, r5.12xlarge, r5.16xlarge,
      r5a.xlarge, r5a.2xlarge, r5a.4xlarge, r5a.8xlarge, r5a.12xlarge, r5a.16xlarge, r5a.24xlarge,
      r5d.xlarge, r5d.2xlarge, r5d.4xlarge, r5d.8xlarge, r5d.12xlarge, r5d.16xlarge, r5d.24xlarge,
      h1.2xlarge, h1.4xlarge, h1.8xlarge,
      i3.xlarge, i3.2xlarge, i3.4xlarge, i3.8xlarge, i3.16xlarge,
      i3en.xlarge, i3en.2xlarge, i3en.3xlarge, i3en.6xlarge, i3en.12xlarge, i3en.24xlarge,
      d2.xlarge, d2.2xlarge, d2.4xlarge, d2.8xlarge,
      g3.4xlarge, g3.8xlarge, g3.16xlarge, g3s.xlarge,
      p2.xlarge, p2.8xlarge, p2.16xlarge, p3.2xlarge, p3.8xlarge, p3.16xlarge]
    ConstraintDescription: must be a valid EC2 instance type.
    Default: r5.xlarge
    Description: EC2 instance type
    Type: String
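  # Master and Core accept the same whitelist of EMR-supported instance types;
  # r5.xlarge (4 vCPU, 32 GiB, memory optimized) is the default for both. Note
  # that the spark-defaults configuration further down assumes a fixed
  # executor fleet, so the Core size should be chosen to fit it.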
  Core:
    AllowedValues: [m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge, m4.16xlarge,
      m5.xlarge, m5.2xlarge, m5.4xlarge, m5.8xlarge, m5.12xlarge, m5.16xlarge, m5.24xlarge,
      m5a.xlarge, m5a.2xlarge, m5a.4xlarge, m5a.8xlarge, m5a.12xlarge, m5a.16xlarge, m5a.24xlarge,
      m5d.xlarge, m5d.2xlarge, m5d.4xlarge, m5d.8xlarge, m5d.12xlarge, m5d.16xlarge, m5d.24xlarge,
      c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge,
      c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge, c5.12xlarge, c5.18xlarge, c5.24xlarge,
      c5d.xlarge, c5d.2xlarge, c5d.4xlarge, c5d.9xlarge, c5d.18xlarge,
      c5n.xlarge, c5n.2xlarge, c5n.4xlarge, c5n.9xlarge, c5n.18xlarge,
      z1d.xlarge, z1d.2xlarge, z1d.3xlarge, z1d.6xlarge, z1d.12xlarge,
      r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge,
      r4.xlarge, r4.2xlarge, r4.4xlarge, r4.8xlarge, r4.16xlarge,
      r5.xlarge, r5.2xlarge, r5.4xlarge, r5.8xlarge, r5.12xlarge, r5.16xlarge,
      r5a.xlarge, r5a.2xlarge, r5a.4xlarge, r5a.8xlarge, r5a.12xlarge, r5a.16xlarge, r5a.24xlarge,
      r5d.xlarge, r5d.2xlarge, r5d.4xlarge, r5d.8xlarge, r5d.12xlarge, r5d.16xlarge, r5d.24xlarge,
      h1.2xlarge, h1.4xlarge, h1.8xlarge,
      i3.xlarge, i3.2xlarge, i3.4xlarge, i3.8xlarge, i3.16xlarge,
      i3en.xlarge, i3en.2xlarge, i3en.3xlarge, i3en.6xlarge, i3en.12xlarge, i3en.24xlarge,
      d2.xlarge, d2.2xlarge, d2.4xlarge, d2.8xlarge,
      g3.4xlarge, g3.8xlarge, g3.16xlarge, g3s.xlarge,
      p2.xlarge, p2.8xlarge, p2.16xlarge, p3.2xlarge, p3.8xlarge, p3.16xlarge]
    ConstraintDescription: must be a valid EC2 instance type.
    Default: r5.xlarge
    Description: EC2 instance type
    Type: String
Conditions:
  AnonymousDatatoAWS: !Equals [!FindInMap [AnonymousData, SendAnonymousData, Data], 'Yes']
  AppMode: !Equals
    - !Ref 'SubmitMode'
    - AppJar
  SubmitTypeScript: !And
    - !Equals
      - Script
      - !Ref 'Type'
    - !Condition 'AppMode'
  SubmitTypeCommand: !And
    - !Equals
      - Command
      - !Ref 'Type'
    - !Condition 'AppMode'
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Availability Zone Configuration
        Parameters:
          - AvailabilityZones
      - Label:
          default: Network Configuration
        Parameters:
          - VPCCIDR
          - PrivateSubnet1ACIDR
          - PrivateSubnet2ACIDR
          - PublicSubnet1CIDR
          - PublicSubnet2CIDR
      - Label:
          default: Amazon EC2 Configuration
        Parameters:
          - KeyPairName
          - RemoteAccessCIDR
      - Label:
          default: Kinesis
        Parameters:
          - KinesisStream
          - ShardCount
      - Label:
          default: EMR
        Parameters:
          - Master
          - Core
      - Label:
          default: Artifact Buckets
        Parameters:
          - ArtifactBucket
      - Label:
          default: Application
        Parameters:
          - SubmitMode
          - Type
          - Script
          - Command
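# The submit-type conditions gate the optional EMR steps declared under
# Resources: a step is created only when SubmitMode is AppJar AND Type
# matches, so SubmitMode=AppJar with Type=Script creates only AppJarScript,
# while the default Type=None creates no application step at all.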
Resources:
  AccessLogsBucket:
    Type: AWS::S3::Bucket
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W35
            reason: This S3 bucket is used as the destination for storing access logs
          - id: W51
            reason: >-
              The bucket is private, and hence no explicit bucket policy is set.
              When using the template, it is recommended to add a more restrictive
              policy that allows only auditors/administrators to view the logs
    Properties:
      AccessControl: LogDeliveryWrite
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  AppBucket:
    Type: AWS::S3::Bucket
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Properties:
      BucketName: !Ref 'ArtifactBucket'
      LoggingConfiguration:
        DestinationBucketName: !Ref 'AccessLogsBucket'
        LogFilePrefix: app-bucket-access-logs/
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  SparkOutputBucket:
    Type: AWS::S3::Bucket
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Properties:
      BucketName: !Sub '${ArtifactBucket}-output'
      LoggingConfiguration:
        DestinationBucketName: !Ref 'AccessLogsBucket'
        LogFilePrefix: spark-output-bucket-access-logs/
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  SparkAppLogBucket:
    Type: AWS::S3::Bucket
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Properties:
      BucketName: !Sub '${ArtifactBucket}-log'
      LoggingConfiguration:
        DestinationBucketName: !Ref 'AccessLogsBucket'
        LogFilePrefix: spark-app-bucket-access-logs/
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  SparkDemoAppResource:
    Type: Custom::SparkDemoAppResource
    Version: '1.0'
    Properties:
      ServiceToken: !GetAtt 'SparkDemoAppLambdaFunction.Arn'
  SparkDemoAppLambdaFunction:
    Type: AWS::Lambda::Function
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W89
            reason: This function does not need to be deployed in a VPC
          - id: W92
            reason: This function does not require reserved concurrency
    Properties:
      Description: This Lambda function executes only once to download the sample demo app into the Application S3 bucket
      Handler: demo_app_config/demo-app-config.lambda_handler
      Code:
        S3Bucket: !Join
          - '-'
          - - !FindInMap
              - SourceCode
              - General
              - S3Bucket
            - !Ref 'AWS::Region'
        S3Key: !Join
          - /
          - - !FindInMap
              - SourceCode
              - General
              - KeyPrefix
            - demo-app-config.zip
      Role: !GetAtt 'AppLambdaExecutionRole.Arn'
      Environment:
        Variables:
          S3_BUCKET: !Join
            - '-'
            - - !FindInMap
                - SourceCode
                - General
                - S3Bucket
              - !Ref 'AWS::Region'
          KEY_PREFIX: !FindInMap
            - SourceCode
            - General
            - KeyPrefix
          APP_BUCKET: !Ref 'AppBucket'
          DEMO_APP: 'FALSE'
      Runtime: python3.8
      Timeout: 300
  AppLambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: DemoAppLambda
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: s3:PutObject
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref 'AppBucket'
                  - !Join
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref 'AppBucket'
                      - /*
              - Effect: Allow
                Action: s3:GetObject
                Resource: !Join
                  - ''
                  - - 'arn:aws:s3:::'
                    - !Join
                      - '-'
                      - - !FindInMap
                          - SourceCode
                          - General
                          - S3Bucket
                        - !Ref 'AWS::Region'
                    - /
                    - !FindInMap
                      - SourceCode
                      - General
                      - KeyPrefix
                    - /*
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: !Join
                  - ''
                  - - 'arn:aws:logs:'
                    - !Ref 'AWS::Region'
                    - ':'
                    - !Ref 'AWS::AccountId'
                    - :log-group:/aws/lambda/*
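  # VPCStackQ nests the AWS Quick Start VPC template (aws-vpc.template) pulled
  # from the solution's template bucket: two Availability Zones, public
  # subnets hosting the bastion and NAT instance, and private subnets for the
  # EMR cluster.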
  VPCStackQ:
    Type: AWS::CloudFormation::Stack
    Properties:
      Parameters:
        AvailabilityZones: !Join
          - ','
          - !Ref 'AvailabilityZones'
        KeyPairName: !Ref 'KeyPairName'
        PrivateSubnet1ACIDR: !Ref 'PrivateSubnet1ACIDR'
        PrivateSubnet2ACIDR: !Ref 'PrivateSubnet2ACIDR'
        PublicSubnet1CIDR: !Ref 'PublicSubnet1CIDR'
        PublicSubnet2CIDR: !Ref 'PublicSubnet2CIDR'
        VPCCIDR: !Ref 'VPCCIDR'
        NATInstanceType: t2.medium
        CreateAdditionalPrivateSubnets: 'false'
      TemplateURL: !Join
        - /
        - - https://s3.amazonaws.com
          - !FindInMap
            - SourceCode
            - General
            - S3TemplateBucket
          - !FindInMap
            - SourceCode
            - General
            - KeyPrefix
          - aws-vpc.template
  BastionSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W29
            reason: Range provided for ephemeral ports for outbound access
    Properties:
      GroupDescription: Enable incoming SSH access
      VpcId: !GetAtt 'VPCStackQ.Outputs.VPCID'
      SecurityGroupIngress:
        - Description: Allow inbound ssh connection on port 22
          IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: !Ref 'RemoteAccessCIDR'
      Tags:
        - Key: Name
          Value: BastionSecurityGroup
  BastionRemoteCIDREgressRestrictions:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      Description: Allow outbound access on ephemeral ports to RemoteAccessCIDR for inbound ssh connections
      GroupId: !GetAtt 'BastionSecurityGroup.GroupId'
      IpProtocol: tcp
      FromPort: 1024
      ToPort: 65535
      CidrIp: !Ref 'RemoteAccessCIDR'
  BastionEMREgressRestrictions:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      Description: Allow outbound access on port 22 for ssh connections to the EMR master instance
      GroupId: !GetAtt 'BastionSecurityGroup.GroupId'
      IpProtocol: tcp
      FromPort: 22
      ToPort: 22
      DestinationSecurityGroupId: !GetAtt 'BastionEMRSecurityGroup.GroupId'
  BastionHost:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.medium
      KeyName: !Ref 'KeyPairName'
      ImageId: !Ref 'LatestAMIId'
      NetworkInterfaces:
        - GroupSet:
            - !Ref 'BastionSecurityGroup'
          AssociatePublicIpAddress: true
          DeviceIndex: '0'
          DeleteOnTermination: true
          SubnetId: !GetAtt 'VPCStackQ.Outputs.PublicSubnet1ID'
      Tags:
        - Key: Name
          Value: Bastion Host
  BastionEMRSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W29
            reason: Range provided for ephemeral ports for outbound access
    Properties:
      GroupDescription: Internal Security Group for Administration and PortForwarding
      VpcId: !GetAtt 'VPCStackQ.Outputs.VPCID'
      SecurityGroupIngress:
        - Description: Allow ssh from Bastion
          IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          SourceSecurityGroupId: !Ref 'BastionSecurityGroup'
      Tags:
        - Key: Name
          Value: BastionEMRSecurityGroup
  EMRBastionEgressRestrictions:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      Description: Allow outbound access on ephemeral ports for inbound SSH connection
      GroupId: !GetAtt 'BastionEMRSecurityGroup.GroupId'
      IpProtocol: tcp
      FromPort: 1024
      ToPort: 65535
      DestinationSecurityGroupId: !GetAtt 'BastionSecurityGroup.GroupId'
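  # The egress rules above are declared as standalone SecurityGroupEgress
  # resources rather than inline: BastionSecurityGroup needs a rule pointing
  # at BastionEMRSecurityGroup and vice versa, and embedding them inline would
  # create a circular reference between the two security groups.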
  BastionRecoveryAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Trigger a recovery when the instance status check fails for 15 consecutive minutes.
      Namespace: AWS/EC2
      MetricName: StatusCheckFailed_System
      Statistic: Minimum
      Period: 60
      EvaluationPeriods: 15
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0
      AlarmActions:
        - !Join
          - ''
          - - 'arn:aws:automate:'
            - !Ref 'AWS::Region'
            - :ec2:recover
      Dimensions:
        - Name: InstanceId
          Value: !Ref 'BastionHost'
  ApplicationBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: F16
            reason: Access is limited to connections from the VPC created as part of the solution
    Properties:
      Bucket: !Ref 'AppBucket'
      PolicyDocument:
        Version: '2012-10-17'
        Id: Policy3445674452340
        Statement:
          - Sid: Stmt2445373452640
            Effect: Allow
            Principal: '*'
            Action:
              - s3:GetObject
            Resource: !Join
              - ''
              - - 'arn:aws:s3:::'
                - !Ref 'AppBucket'
                - /*
            Condition:
              StringEquals:
                aws:sourceVpc: !GetAtt 'VPCStackQ.Outputs.VPCID'
  OutputBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: F16
            reason: Access is limited to connections from the VPC created as part of the solution
    Properties:
      Bucket: !Ref 'SparkOutputBucket'
      PolicyDocument:
        Version: '2012-10-17'
        Id: Policy3445674452340
        Statement:
          - Sid: WriteWithinBucket
            Effect: Allow
            Principal: '*'
            Action:
              - s3:PutObject
            Resource: !Join
              - ''
              - - 'arn:aws:s3:::'
                - !Ref 'SparkOutputBucket'
                - /*
            Condition:
              StringEquals:
                aws:sourceVpc: !GetAtt 'VPCStackQ.Outputs.VPCID'
  LogBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: F16
            reason: Access is limited to connections from the VPC created as part of the solution
    Properties:
      Bucket: !Ref 'SparkAppLogBucket'
      PolicyDocument:
        Version: '2012-10-17'
        Id: Policy3445674452340
        Statement:
          - Sid: Stmt2445373452640
            Effect: Allow
            Principal: '*'
            Action: s3:PutObject
            Resource: !Join
              - ''
              - - 'arn:aws:s3:::'
                - !Ref 'SparkAppLogBucket'
                - /*
            Condition:
              StringEquals:
                aws:sourceVpc: !GetAtt 'VPCStackQ.Outputs.VPCID'
  DataStream:
    Type: AWS::Kinesis::Stream
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W28
            reason: Name of the stream is referenced in portions of the code external to the CF template
    Properties:
      Name: !Ref 'KinesisStream'
      ShardCount: !Ref 'ShardCount'
      StreamEncryption:
        EncryptionType: 'KMS'
        KeyId: alias/aws/kinesis
  EMRInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref 'EMRJobFlowRole'
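  # EMR splits permissions across two roles: EMRJobFlowRole below is assumed
  # by the cluster's EC2 instances through EMRInstanceProfile and carries the
  # data-plane permissions (the solution's S3 buckets, the Kinesis stream, the
  # at-rest KMS key), while EMRServiceRole is assumed by the EMR control plane
  # to manage instances, security groups, and autoscaling.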
  EMRJobFlowRole:
    Type: AWS::IAM::Role
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W11
            reason: Refer to inline comments for suppression reasons
          - id: W76
            reason: All permissions listed are required for the EMR cluster
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
      Policies:
        - PolicyName: !Sub '${AWS::StackName}-emr-ec2-policy'
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  # EMR nodes need to have read-only access to EC2 and EMR APIs
                  - ec2:Describe*
                  - elasticmapreduce:Describe*
                  # Actions do not support resource level permissions
                  - elasticmapreduce:ListClusters
                  - s3:HeadBucket
                  - kinesis:ListShards
                Resource: '*'
              - Effect: Allow
                # Cluster id is not available when role is created
                Resource: !Sub 'arn:aws:elasticmapreduce:${AWS::Region}:${AWS::AccountId}:cluster/*'
                Action:
                  - elasticmapreduce:AddJobFlowSteps
                  - elasticmapreduce:ListBootstrapActions
                  - elasticmapreduce:ListInstanceGroups
                  - elasticmapreduce:ListInstances
                  - elasticmapreduce:ListSteps
              - Effect: Allow
                Action:
                  - s3:GetBucketLocation
                  - s3:GetBucketCORS
                  - s3:GetObjectVersionForReplication
                  - s3:GetObject
                  - s3:GetBucketTagging
                  - s3:GetObjectVersion
                  - s3:GetObjectTagging
                  - s3:ListMultipartUploadParts
                  - s3:ListBucket
                  - s3:ListBucketMultipartUploads
                  - s3:PutObject
                  - s3:PutObjectTagging
                  - s3:DeleteObject
                Resource:
                  # Limited to buckets created in this stack
                  - !GetAtt 'AppBucket.Arn'
                  - !Sub '${AppBucket.Arn}/*'
                  - !GetAtt 'SparkOutputBucket.Arn'
                  - !Sub '${SparkOutputBucket.Arn}/*'
                  - !GetAtt 'SparkAppLogBucket.Arn'
                  - !Sub '${SparkAppLogBucket.Arn}/*'
              - Effect: Allow
                Action:
                  - s3:GetObject
                Resource: !Sub 'arn:aws:s3:::${AWS::Region}.elasticmapreduce/libs/script-runner/script-runner.jar'
              - Effect: Allow
                Action:
                  - kinesis:DescribeStream
                  - kinesis:GetShardIterator
                  - kinesis:GetRecords
                Resource: !GetAtt 'DataStream.Arn'
              - Effect: Allow
                Action:
                  - kms:Encrypt
                  - kms:Decrypt
                  - kms:ReEncrypt*
                  - kms:GenerateDataKey*
                  - kms:DescribeKey
                # https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-enable.html#emr-encryption-create-keys
                Resource: !GetAtt 'EMRAtRestKey.Arn'
              - Effect: Allow
                Action:
                  - kms:CreateGrant
                  - kms:ListGrants
                  - kms:RevokeGrant
                # https://aws.amazon.com/blogs/big-data/secure-your-data-on-amazon-emr-using-native-ebs-and-per-bucket-s3-encryption-options
                Resource: !GetAtt 'EMRAtRestKey.Arn'
                Condition:
                  Bool:
                    kms:GrantIsForAWSResource: true
  EMRServiceRole:
    Type: AWS::IAM::Role
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W11
            reason: Refer to inline comments for suppression reasons
          - id: W76
            reason: All permissions listed are required for the EMR cluster
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - elasticmapreduce.amazonaws.com
      Policies:
        - PolicyName: !Sub '${AWS::StackName}-emr-service-policy'
          PolicyDocument:
            Statement:
              # Minimal permissions for EMR to work properly
              # (https://aws.amazon.com/blogs/big-data/best-practices-for-securing-amazon-emr/)
              - Effect: Allow
                Action:
                  - ec2:AuthorizeSecurityGroupEgress
                  - ec2:AuthorizeSecurityGroupIngress
                  - ec2:CancelSpotInstanceRequests
                  - ec2:CreateNetworkInterface
                  - ec2:CreateSecurityGroup
                  - ec2:CreateTags
                  - ec2:DeleteNetworkInterface
                  - ec2:DeleteTags
                  - ec2:DeleteSecurityGroup
                  - ec2:DescribeAvailabilityZones
                  - ec2:DescribeAccountAttributes
                  - ec2:DescribeDhcpOptions
                  - ec2:DescribeImages
                  - ec2:DescribeInstanceStatus
                  - ec2:DescribeInstances
                  - ec2:DescribeKeyPairs
                  - ec2:DescribeNetworkAcls
                  - ec2:DescribeNetworkInterfaces
                  - ec2:DescribePrefixLists
                  - ec2:DescribeRouteTables
                  - ec2:DescribeSecurityGroups
                  - ec2:DescribeSpotInstanceRequests
                  - ec2:DescribeSpotPriceHistory
                  - ec2:DescribeSubnets
                  - ec2:DescribeTags
                  - ec2:DescribeVpcAttribute
                  - ec2:DescribeVpcEndpoints
                  - ec2:DescribeVpcEndpointServices
                  - ec2:DescribeVpcs
                  - ec2:DetachNetworkInterface
                  - ec2:ModifyImageAttribute
                  - ec2:ModifyInstanceAttribute
                  - ec2:RequestSpotInstances
                  - ec2:RevokeSecurityGroupEgress
                  - ec2:RunInstances
                  - ec2:TerminateInstances
                  - ec2:DeleteVolume
                  - ec2:DescribeVolumeStatus
                  - ec2:DescribeVolumes
                  - ec2:DetachVolume
                  - iam:GetRole
                  - iam:GetRolePolicy
                  - iam:ListInstanceProfiles
                  - iam:ListRolePolicies
                  - s3:CreateBucket
                  - sdb:BatchPutAttributes
                  - sdb:Select
                  - cloudwatch:PutMetricAlarm
                  - cloudwatch:DescribeAlarms
                  - cloudwatch:DeleteAlarms
                  - application-autoscaling:RegisterScalableTarget
                  - application-autoscaling:DeregisterScalableTarget
                  - application-autoscaling:PutScalingPolicy
                  - application-autoscaling:DeleteScalingPolicy
                  - application-autoscaling:Describe*
                Resource: '*'
              - Effect: Allow
                Action:
                  - s3:GetBucketLocation
                  - s3:GetBucketCORS
                  - s3:GetObjectVersionForReplication
                  - s3:GetObject
                  - s3:GetBucketTagging
                  - s3:GetObjectVersion
                  - s3:GetObjectTagging
                  - s3:ListMultipartUploadParts
                  - s3:ListBucket
                  - s3:ListBucketMultipartUploads
                Resource:
                  # Limited to buckets created in this stack
                  - !GetAtt 'AppBucket.Arn'
                  - !Sub '${AppBucket.Arn}/*'
                  - !GetAtt 'SparkOutputBucket.Arn'
                  - !Sub '${SparkOutputBucket.Arn}/*'
                  - !GetAtt 'SparkAppLogBucket.Arn'
                  - !Sub '${SparkAppLogBucket.Arn}/*'
              - Effect: Allow
                Action:
                  - sqs:CreateQueue
                  - sqs:DeleteQueue
                  - sqs:DeleteMessage
                  - sqs:DeleteMessageBatch
                  - sqs:GetQueueAttributes
                  - sqs:GetQueueUrl
                  - sqs:PurgeQueue
                  - sqs:ReceiveMessage
                # Limited to queues whose names are prefixed with the literal string AWS-ElasticMapReduce-
                Resource: !Sub 'arn:aws:sqs:${AWS::Region}:${AWS::AccountId}:AWS-ElasticMapReduce-*'
              - Effect: Allow
                Action: iam:CreateServiceLinkedRole
                # EMR needs permissions to create this service-linked role for launching EC2 spot instances
                Resource: !Sub 'arn:aws:iam::${AWS::AccountId}:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot*'
                Condition:
                  StringLike:
                    iam:AWSServiceName: spot.amazonaws.com
              - Effect: Allow
                Action: iam:PassRole
                Resource:
                  - !GetAtt 'EMRJobFlowRole.Arn'
                  - !Sub 'arn:aws:iam::${AWS::AccountId}:role/EMR_AutoScaling_DefaultRole'
              - Effect: Allow
                Action:
                  - kms:Encrypt
                  - kms:Decrypt
                  - kms:ReEncrypt*
                  - kms:GenerateDataKey*
                  - kms:DescribeKey
                # https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-enable.html#emr-encryption-create-keys
                Resource: !GetAtt 'EMRAtRestKey.Arn'
              - Effect: Allow
                Action:
                  - kms:CreateGrant
                  - kms:ListGrants
                  - kms:RevokeGrant
                # https://aws.amazon.com/blogs/big-data/secure-your-data-on-amazon-emr-using-native-ebs-and-per-bucket-s3-encryption-options
                Resource: !GetAtt 'EMRAtRestKey.Arn'
                Condition:
                  Bool:
                    kms:GrantIsForAWSResource: true
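  # EMRAtRestKey backs local-disk (EBS) encryption via EMRSecurityConfig
  # below; S3 data uses SSE-S3 instead, so the key never protects bucket
  # contents. Both EMR roles receive kms:CreateGrant/ListGrants/RevokeGrant
  # because EMR attaches encrypted EBS volumes through KMS grants, and the
  # GrantIsForAWSResource condition limits grant creation to AWS services
  # acting on the role's behalf.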
  EMRAtRestKey:
    Type: AWS::KMS::Key
    Properties:
      EnableKeyRotation: true
      KeyPolicy:
        Version: '2012-10-17'
        Id: !Sub '${AWS::StackName}-emr-security-key'
        Statement:
          - Sid: Enable IAM User Permissions
            Effect: Allow
            Principal:
              AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
            Action: kms:*
            Resource: '*'
  EMRSecurityConfig:
    Type: AWS::EMR::SecurityConfiguration
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W61
            reason: Encryption in transit requires a PEM file, and instructions on how to set it up are in the deployment guide
    Properties:
      Name: !Sub '${AWS::StackName}-security-config'
      SecurityConfiguration:
        EncryptionConfiguration:
          EnableInTransitEncryption: false
          EnableAtRestEncryption: true
          AtRestEncryptionConfiguration:
            S3EncryptionConfiguration:
              EncryptionMode: SSE-S3
            LocalDiskEncryptionConfiguration:
              EnableEbsEncryption: true
              EncryptionKeyProviderType: AwsKms
              AwsKmsKey: !GetAtt 'EMRAtRestKey.Arn'
  EMRCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Applications:
        - Name: Hadoop
        - Name: Hive
        - Name: Spark
        - Name: Zeppelin
        - Name: Hue
      Instances:
        AdditionalMasterSecurityGroups:
          - !Ref 'BastionEMRSecurityGroup'
        CoreInstanceGroup:
          InstanceCount: 2
          InstanceType: !Ref 'Core'
          Market: ON_DEMAND
          Name: Core Instance
        Ec2KeyName: !Ref 'KeyPairName'
        Ec2SubnetId: !GetAtt 'VPCStackQ.Outputs.PrivateSubnet1AID'
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'Master'
          Market: ON_DEMAND
          Name: Master Instance
      SecurityConfiguration: !Ref 'EMRSecurityConfig'
      LogUri: !Join
        - ''
        - - s3://
          - !Ref 'SparkAppLogBucket'
      JobFlowRole: !Ref 'EMRInstanceProfile'
      Name: EMR Cloud Cluster
      ReleaseLabel: emr-5.29.0
      ServiceRole: !Ref 'EMRServiceRole'
      Tags:
        - Key: Name
          Value: EMR Cluster - Kinesis
      VisibleToAllUsers: true
      Configurations:
        - Classification: yarn-env
          Configurations:
            - Classification: export
              ConfigurationProperties:
                APP_NAMESPACE: app-spark-demo
                BATCH_DURATION_SECONDS: '10'
                SOURCE_STREAM_NAME: !Ref 'KinesisStream'
                RESULTS_BUCKET_NAME: !Ref 'SparkOutputBucket'
                NUM_EXECUTORS: '16'
        - Classification: spark-defaults
          ConfigurationProperties:
            spark.dynamicAllocation.enabled: 'false'
            spark.executor.cores: '2'
            spark.executor.memory: '3g'
            spark.executor.instances: '16'
        - Classification: zeppelin-env
          Configurations:
            - Classification: export
              Configurations: []
              ConfigurationProperties:
                ZEPPELIN_NOTEBOOK_STORAGE: org.apache.zeppelin.notebook.repo.S3NotebookRepo
                CLASSPATH: >-
                  :/var/lib/zeppelin/.ivy2/jars/*:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
                SPARK_SUBMIT_OPTIONS: '"$SPARK_SUBMIT_OPTIONS --packages com.google.protobuf:protobuf-java:2.6.1,org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.4"'
                ZEPPELIN_NOTEBOOK_S3_BUCKET: !Ref 'ArtifactBucket'
                ZEPPELIN_NOTEBOOK_S3_USER: hadoop
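  # Sizing note: spark-defaults pins a static fleet of 16 executors at 2 cores
  # and 3g each, with dynamic allocation disabled, the usual arrangement for
  # receiver-based Spark Streaming where executors host long-running Kinesis
  # receivers. The yarn-env export NUM_EXECUTORS mirrors
  # spark.executor.instances, presumably so the demo application can read the
  # same value at runtime.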
  ZeppelinConfigStep:
    Type: AWS::EMR::Step
    Properties:
      ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - !Sub 's3://${ArtifactBucket}/zeppelin_config.sh'
        Jar: !Sub 's3://${AWS::Region}.elasticmapreduce/libs/script-runner/script-runner.jar'
      Name: zeppelin_config
      JobFlowId: !Ref 'EMRCluster'
  AppJarScript:
    Type: AWS::EMR::Step
    Condition: SubmitTypeScript
    Properties:
      ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - !Ref 'Script'
        Jar: !Sub 's3://${AWS::Region}.elasticmapreduce/libs/script-runner/script-runner.jar'
      Name: spark_submit_script
      JobFlowId: !Ref 'EMRCluster'
  AppJarStep:
    Type: AWS::EMR::Step
    Condition: SubmitTypeCommand
    Properties:
      ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - spark-submit
          - !Ref 'Command'
        Jar: command-runner.jar
        MainClass: ''
      Name: SparkStep
      JobFlowId: !Ref 'EMRCluster'
  SolutionHelperRole:
    Type: AWS::IAM::Role
    Condition: AnonymousDatatoAWS
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: Custom_Solution_Helper_Permissions
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: !Join
                  - ''
                  - - 'arn:aws:logs:'
                    - !Ref 'AWS::Region'
                    - ':'
                    - !Ref 'AWS::AccountId'
                    - :log-group:/aws/lambda/*
  SolutionHelper:
    Type: AWS::Lambda::Function
    Condition: AnonymousDatatoAWS
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W89
            reason: This function does not need to be deployed in a VPC
          - id: W92
            reason: This function does not require reserved concurrency
    Properties:
      Handler: solution_helper.lambda_handler
      Role: !GetAtt 'SolutionHelperRole.Arn'
      Description: This function generates a UUID for each deployment and sends anonymous data to the AWS Solutions team
      Code:
        S3Bucket: !Join
          - '-'
          - - 'solutions'
            - !Ref 'AWS::Region'
        S3Key: solution-helper/v3.1.0/solution_helper.zip
      Runtime: python3.7
      MemorySize: 128
      Timeout: 300
  CreateUniqueID:
    Type: Custom::CreateUUID
    Condition: AnonymousDatatoAWS
    Properties:
      ServiceToken: !GetAtt 'SolutionHelper.Arn'
      CreateUniqueID: 'true'
  AnonymousData:
    Type: Custom::AnonymousData
    Condition: AnonymousDatatoAWS
    Properties:
      ServiceToken: !GetAtt 'SolutionHelper.Arn'
      SendAnonymousData: !Join
        - ''
        - - '{ ''Solution'' : '''
          - SO0016
          - ''', '
          - '''UUID'' : '''
          - !GetAtt 'CreateUniqueID.UUID'
          - ''', '
          - '''Data'': {'
          - '''Master'': ''1'','
          - '''MasterInstanceType'': '''
          - !Ref 'Master'
          - ''','
          - '''CoreInstance'': ''2'','
          - '''CoreInstanceType'': '''
          - !Ref 'Core'
          - ''','
          - '''Region'': '''
          - !Ref 'AWS::Region'
          - ''''
          - '}'
          - '}'
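  # The three resources above implement the solution's anonymous metrics:
  # SolutionHelper generates a deployment UUID (CreateUniqueID), then
  # AnonymousData hands it a small JSON payload with the solution ID, instance
  # counts and types, and region. Setting the AnonymousData mapping's Data
  # value to 'No' removes the whole chain, since every piece carries the
  # AnonymousDatatoAWS condition.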
Outputs:
  ArtifactLocation:
    Description: Name of the Artifact Bucket
    Value: !Ref 'AppBucket'
  SparkOutputLocation:
    Description: Spark Streaming historical data output location
    Value: !Ref 'SparkOutputBucket'
  KinesisStreamName:
    Description: Kinesis stream ARN (stream name defaults to default-data-stream)
    Value: !GetAtt 'DataStream.Arn'
  EMRClusterId:
    Description: EMR Cluster Id
    Value: !Ref 'EMRCluster'
  BastionHost:
    Description: Bastion Host DNS Name
    Value: !GetAtt 'BastionHost.PublicDnsName'
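# After deployment, these values appear on the stack's Outputs tab. The
# BastionHost DNS name plus the KeyPairName private key are enough to SSH to
# the bastion, and from there to the EMR master on port 22 through
# BastionEMRSecurityGroup.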