AWSTemplateFormatVersion: '2010-09-09'
Description: >-
  Service Catalog: This reference architecture sets up an EMR cluster, an
  Amazon SageMaker notebook instance and connects the two. (fdp-1oc5f3ulf)
  As part of configuration, it also downloads
  https://raw.githubusercontent.com/jupyter-incubator/sparkmagic/master/sparkmagic/example_config.json .
  Please see https://github.com/jupyter-incubator/sparkmagic/blob/master/LICENSE.md
  and other files from the GitHub repository for licensing of sparkmagic.
  (fdp-1qj64b3eo)
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: Network Configuration
        Parameters:
          - VPCID
          - SubnetId
          - KeyPair
          - RemoteAccessCIDR
      - Label:
          default: EMR Cluster Configuration
        Parameters:
          - ClusterName
          - MasterInstanceType
          - CoreInstanceType
          - CoreNodeCount
          - ReleaseLabel
          - BucketName
      - Label:
          default: SageMaker Notebook Instance Configuration
        Parameters:
          - NotebookInstanceName
          - NotebookInstanceType
          - VolumeSizeInGB
          - SageMakerRole
    ParameterLabels:
      VPCID:
        default: VPC
      KeyPair:
        default: KeyPair
      SubnetId:
        default: Subnet
      RemoteAccessCIDR:
        default: Remote Access CIDR Block
      ClusterName:
        default: EMR Cluster Name
      MasterInstanceType:
        default: Master Instance Type
      CoreInstanceType:
        default: Core Instance Type
      CoreNodeCount:
        default: Core Node Count
      ReleaseLabel:
        default: Release Configuration
      BucketName:
        default: S3 Log Bucket Name
      VolumeSizeInGB:
        default: Size of volume attached to SageMaker notebook instance
      SageMakerRole:
        default: SageMaker IAM Role ARN
Parameters:
  NotebookInstanceName:
    ConstraintDescription: >-
      Maximum of 63 alphanumeric characters. Can include hyphens (-), but not
      spaces. Must be unique within your account in an AWS Region.
    Description: SageMaker notebook instance name.
    MaxLength: 63
    MinLength: 1
    Type: String
  BucketName:
    Description: Specify a unique S3 bucket name.
    Type: String
  NotebookInstanceType:
    AllowedValues:
      - ml.t2.medium
    ConstraintDescription: Must select a valid notebook instance type.
    Default: ml.t2.medium
    Description: Select the instance type for the SageMaker notebook.
    Type: String
  VolumeSizeInGB:
    Description: Size of volume attached to the SageMaker notebook instance.
    Type: Number
    MinValue: 5
    MaxValue: 500
    Default: 10
  ClusterName:
    Description: Enter a name for your EMR cluster.
    Type: String
    Default: SC-EMR-SA
  MasterInstanceType:
    Description: Select the master node instance type.
    Type: String
    Default: m5.xlarge
    AllowedValues:
      - m5.xlarge
      - m5.2xlarge
      - m5.4xlarge
      - m5.12xlarge
      - c5.xlarge
      - c5.2xlarge
      - c5.4xlarge
  CoreInstanceType:
    Description: Select the core node instance type.
    Type: String
    Default: m5.xlarge
    AllowedValues:
      - m5.xlarge
      - m5.2xlarge
      - m5.4xlarge
      - m5.12xlarge
      - c5.xlarge
      - c5.2xlarge
      - c5.4xlarge
  CoreNodeCount:
    Description: Enter the number of core nodes. Minimum is 1.
    Type: Number
    Default: 2
    AllowedValues:
      - 1
      - 2
      - 3
      - 4
      - 5
      - 6
      - 7
      - 8
      - 9
  ReleaseLabel:
    Description: Select the EMR cluster release configuration.
    Type: String
    Default: emr-5.29.0
    AllowedValues:
      - emr-5.29.0
      - emr-5.23.0
      - emr-5.22.0
      - emr-5.21.0
      - emr-5.20.0
      - emr-5.19.0
      - emr-5.18.0
      - emr-5.17.0
      - emr-5.16.0
      - emr-5.15.0
      - emr-5.14.0
      - emr-5.13.0
      - emr-5.12.0
      - emr-5.11.1
      - emr-5.11.0
      - emr-5.10.0
      - emr-5.9.0
      - emr-5.8.1
      - emr-5.8.0
  VPCID:
    Description: Select target VPC for EMR cluster.
    Type: 'AWS::EC2::VPC::Id'
  SubnetId:
    Description: Select target subnet for EMR cluster.
    Type: 'AWS::EC2::Subnet::Id'
  RemoteAccessCIDR:
    Description: Remote access IP address range to access the EMR cluster.
    Type: String
    AllowedPattern: '(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})'
    ConstraintDescription: Must be a valid IP CIDR range of the form x.x.x.x/x.
  KeyPair:
    Description: Name of an existing EC2 KeyPair to enable SSH access.
    Type: 'AWS::EC2::KeyPair::KeyName'
    Default: 'foo'
  SageMakerRole:
    # Passed to RoleArn below, so this must be the role's ARN, not its name.
    Description: ARN of an Amazon SageMaker IAM role to be associated with the notebook instance.
    Type: String
Resources:
  SageMakerNotebookInstance:
    Type: 'AWS::SageMaker::NotebookInstance'
    DependsOn:
      - ScGroup
    Properties:
      NotebookInstanceName: !Ref NotebookInstanceName
      InstanceType: !Ref NotebookInstanceType
      SubnetId: !Ref SubnetId
      KmsKeyId: !Ref KMSKey
      RootAccess: Disabled
      VolumeSizeInGB: !Ref VolumeSizeInGB
      SecurityGroupIds:
        - !Ref ScGroup
      RoleArn: !Ref SageMakerRole
      LifecycleConfigName: !GetAtt BasicNotebookInstanceLifecycleConfig.NotebookInstanceLifecycleConfigName
  BasicNotebookInstanceLifecycleConfig:
    Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
    Properties:
      OnCreate:
        - Content:
            Fn::Base64: !Join
              - "\n"
              # Derive the master node IP from its DNS name: dashes become dots,
              # then fields 2-5 are kept (for example,
              # ip-10-0-1-23.ec2.internal -> 10.0.1.23).
              - - !Join
                  - ''
                  - - "EMR_MASTER_IP=$(echo '"
                    - !GetAtt Cluster.MasterPublicDNS
                    - "' | tr '-' '.' | awk -F \".\" '{print $2 \".\" $3 \".\" $4 \".\" $5}')"
                - 'cd /home/ec2-user/.sparkmagic'
                - 'echo "Fetching Sparkmagic example config from GitHub..."'
                - 'wget https://raw.githubusercontent.com/jupyter-incubator/sparkmagic/master/sparkmagic/example_config.json'
                - 'echo "Replacing EMR master node IP in Sparkmagic config..."'
                - 'sed -i -- "s/localhost/$EMR_MASTER_IP/g" example_config.json'
                - 'mv example_config.json config.json'
                - 'echo "Sending a sample request to Livy.."'
                - 'curl "$EMR_MASTER_IP:8998/sessions"'
  ScGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      VpcId: !Ref VPCID
      GroupDescription: SageMaker security group
  MasterScGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      VpcId: !Ref VPCID
      GroupDescription: EMR master security group with SM notebook access
      SecurityGroupIngress:
        # Livy listens on port 8998; only the notebook's security group may reach it.
        - IpProtocol: tcp
          FromPort: 8998
          ToPort: 8998
          SourceSecurityGroupId: !Ref ScGroup
  LogBucket:
    Type: 'AWS::S3::Bucket'
    DeletionPolicy: Delete
    Properties:
      BucketName: !Ref BucketName
      LoggingConfiguration:
        LogFilePrefix: !Sub 's3://${AWS::StackName}-sc-emr-logs/'
      AccessControl: LogDeliveryWrite
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: 'aws:kms'
              KMSMasterKeyID: !GetAtt KMSKey.Arn
  KMSKey:
    Type: 'AWS::KMS::Key'
    Properties:
      KeyPolicy:
        Version: '2012-10-17'
        Id: key-default-1
        Statement:
          - Sid: Enable IAM User Permissions
            Effect: Allow
            Principal:
              AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
            Action: 'kms:*'
            Resource: '*'
          # Renamed from a duplicate Sid so that each statement is uniquely identified.
          - Sid: Enable EMR EC2 Role Permissions
            Effect: Allow
            Principal:
              AWS: !GetAtt EMREC2Role.Arn
            Action: 'kms:*'
            Resource: '*'
  Cluster:
    Type: 'AWS::EMR::Cluster'
    Properties:
      Applications:
        - Name: Spark
        - Name: Livy
      Name: !Ref ClusterName
      Instances:
        Ec2KeyName: !Ref KeyPair
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref MasterInstanceType
          Market: ON_DEMAND
          Name: Master
        AdditionalMasterSecurityGroups:
          - !Ref RemoteAccessSecurityGroup
          - !Ref MasterScGroup
        CoreInstanceGroup:
          InstanceCount: !Ref CoreNodeCount
          InstanceType: !Ref CoreInstanceType
          Market: ON_DEMAND
          Name: Core
        AdditionalSlaveSecurityGroups:
          - !Ref RemoteAccessSecurityGroup
        TerminationProtected: false
        Ec2SubnetId: !Ref SubnetId
      JobFlowRole: !Ref EMREC2InstanceProfile
      ServiceRole: !Ref EMRRole
      ReleaseLabel: !Ref ReleaseLabel
      # Assumption: cluster logs are meant to go to the LogBucket resource above.
      # The original value, s3://${AWS::StackName}-sc-emr-logs, named a bucket
      # that this template does not create.
      LogUri: !Sub 's3://${LogBucket}/'
      VisibleToAllUsers: true
      Tags:
        - Key: Name
          Value: service-catalog-emr-reference-architecture
      SecurityConfiguration: !Ref ClusterSecurityConfiguration
  ClusterSecurityConfiguration:
    Type: 'AWS::EMR::SecurityConfiguration'
    Properties:
      SecurityConfiguration:
        EncryptionConfiguration:
          EnableInTransitEncryption: false
          EnableAtRestEncryption: true
          AtRestEncryptionConfiguration:
            S3EncryptionConfiguration:
              EncryptionMode: SSE-KMS
              AwsKmsKey: !GetAtt KMSKey.Arn
            LocalDiskEncryptionConfiguration:
              EncryptionKeyProviderType: AwsKms
              AwsKmsKey: !GetAtt KMSKey.Arn
  RemoteAccessSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: EMR Remote Access Security Group
      VpcId: !Ref VPCID
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: !Ref RemoteAccessCIDR
  EMRRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: elasticmapreduce.amazonaws.com
            Action: 'sts:AssumeRole'
      Path: /
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole'
  EMREC2Role:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: 'sts:AssumeRole'
      Path: /
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role'
  EMREC2InstanceProfile:
    Type: 'AWS::IAM::InstanceProfile'
    Properties:
      Path: /
      Roles:
        - !Ref EMREC2Role
Outputs:
  MasterNodeHadoopURL:
    Description: EMR Resource Manager
    Value: !Sub 'http://${Cluster.MasterPublicDNS}:8088'
  S3LogBucket:
    Description: S3 Log Bucket
    Value: !Ref LogBucket
  RemoteAccessSecurityGroup:
    Description: Remote Access Security Group
    Value: !Ref RemoteAccessSecurityGroup
  EMRAccessCIDR:
    Description: Remote Access CIDR
    Value: !Ref RemoteAccessCIDR
  VPCID:
    Description: VPC
    Value: !Ref VPCID
  KeyPair:
    Description: KeyPair
    Value: !Ref KeyPair
  SubnetId:
    Description: Subnet
    Value: !Ref SubnetId
  ClusterName:
    Description: EMR Cluster Name
    Value: !Ref ClusterName
  SageMakerNoteBookURL:
    Description: URL for the newly created SageMaker Notebook Instance
    Value: !Sub >-
      https://${AWS::Region}.console.aws.amazon.com/sagemaker/home?region=${AWS::Region}#/notebook-instances/openNotebook/${NotebookInstanceName}
  SageMakerNoteBookTerminalURL:
    Description: Terminal access URL for the newly created SageMaker Notebook Instance
    Value: !Sub >-
      https://${NotebookInstanceName}.notebook.${AWS::Region}.sagemaker.aws/terminals/1
  SageMakerNotebookInstanceARN:
    Description: ARN for the newly created SageMaker Notebook Instance
    Value: !Ref SageMakerNotebookInstance