AWSTemplateFormatVersion: '2010-09-09' Description: This Data Lake Foundation deployment is a Quick Start to create a VPC that provides a data lake architecture, Redshift cluster, and Elasticsearch domain. It also provides SageMaker in the same VPC through AWS CloudFormation templates that provision and configure the required resources, which includes related Quick Starts. This template provisions all of the resources that are specific to the Quick Start. Other Quick Starts (e.g., VPC and LinuxBastion) can be provisioned by using this data-lake-foundation-master template. **WARNING** You are billed for the AWS resources used if you create a stack from this template. (qs-1r0ghem11) Metadata: QuickStartDocumentation: EntrypointName: "Launch into a new VPC" AWS::CloudFormation::Interface: ParameterGroups: - Label: default: Network configuration Parameters: - AvailabilityZones - VPCDefinition - Label: default: S3 bucket configuration Parameters: - SubmissionsBucketName - CuratedDatasetsName - PublishedDataName - RegionalLambdaBucketName - AthenaQueryResultsBucketName - Label: default: Elasticsearch configuration Parameters: - RemoteAccessCIDR - ElasticsearchDomainName - ElasticsearchNodeType - ElasticsearchNodeCount - ElasticsearchEBSVolumeType - Label: default: Redshift configuration Parameters: - EnableRedshift - MakeRedshiftPubliclyAccessible - RedshiftUsername - RedshiftPassword - RedshiftNumberOfNodes - RedshiftNodeType - RedshiftDatabaseName - RedshiftDatabasePort - RedshiftEncrypted - RedshiftSnapshotRetention - Label: default: Kinesis configuration Parameters: - KinesisDataStreamName - KinesisDataStreamS3Prefix - KinesisErrorOutputPrefix - KinesisCompressionFormat - Label: default: SageMaker configuration Parameters: - NotebookInstanceName - NotebookInstanceType - Label: default: AWS Quick Start configuration Parameters: - QSS3BucketName - QSS3BucketRegion - QSS3KeyPrefix - KeyPairName ParameterLabels: AvailabilityZones: default: Availability Zones KeyPairName: default: Key pair name RemoteAccessCIDR: default: Remote access CIDR ElasticsearchNodeType: default: Elasticsearch node type ElasticsearchNodeCount: default: Elasticsearch node Count ElasticsearchEBSVolumeType: default: Elasticsearch EBS volume type ElasticsearchDomainName: default: ElasticsearchDomainName EnableRedshift: default: Enable Redshift MakeRedshiftPubliclyAccessible: default: Make Redshift publicly accessible RedshiftUsername: default: Redshift user name RedshiftPassword: default: Redshift password RedshiftNumberOfNodes: default: Redshift number of nodes RedshiftNodeType: default: Redshift node type RedshiftDatabaseName: default: Redshift database name RedshiftDatabasePort: default: Redshift database port RedshiftEncrypted: default: Redshift data encrypted RedshiftSnapshotRetention: default: Redshift snapshot retention in days QSS3BucketName: default: Quick Start S3 bucket name QSS3BucketRegion: default: Quick Start S3 bucket Region QSS3KeyPrefix: default: Quick Start S3 key prefix VPCDefinition: default: VPC definition KinesisDataStreamName: default: Kinesis Data Stream name KinesisDataStreamS3Prefix: default: Kinesis Data Stream S3 prefix KinesisCompressionFormat: default: Kinesis Data Stream S3 compression format KinesisErrorOutputPrefix: default: Kinesis S3 prefix for error objects NotebookInstanceName: default: Notebook instance name NotebookInstanceType: default: Notebook instance type SubmissionsBucketName: default: Submissions bucket name CuratedDatasetsName: default: Curated datasets bucket name PublishedDataName: default: Published data bucket name RegionalLambdaBucketName: default: Regional Lambda bucket name AthenaQueryResultsBucketName: default: Athena query results bucket name Mappings: VPCDefinitions: QuickstartDefault: VPCCIDR: 10.0.0.0/16 PublicSubnet1CIDR: 10.0.128.0/20 PrivateSubnet1CIDR: 10.0.0.0/19 PublicSubnet2CIDR: 10.0.144.0/20 PrivateSubnet2CIDR: 10.0.32.0/19 NumberOfAZs: '2' Conditions: CreateRedshift: !Equals - !Ref 'EnableRedshift' - 'yes' UsingDefaultBucket: !Equals [!Ref QSS3BucketName, 'aws-quickstart'] Parameters: AvailabilityZones: Description: The list of Availability Zones to use for the subnets in the VPC. You must specify two Availability Zones. Type: List ElasticsearchNodeCount: Default: '1' Description: The number of nodes in the Elasticsearch cluster. For guidance, see the Amazon ES documentation. Type: Number ElasticsearchNodeType: AllowedValues: - c5.large.elasticsearch - c5.xlarge.elasticsearch - c5.2xlarge.elasticsearch - c5.4xlarge.elasticsearch - c5.18xlarge.elasticsearch - i3.large.elasticsearch - i3.2xlarge.elasticsearch - i3.4xlarge.elasticsearch - i3.8xlarge.elasticsearch - i3.16xlarge.elasticsearch - m4.large.elasticsearch - m4.xlarge.elasticsearch - m4.2xlarge.elasticsearch - m4.4xlarge.elasticsearch - m4.10xlarge.elasticsearch - m5.large.elasticsearch - m5.xlarge.elasticsearch - m5.2xlarge.elasticsearch - m5.4xlarge.elasticsearch - m5.12xlarge.elasticsearch - r3.large.elasticsearch - r3.xlarge.elasticsearch - r3.2xlarge.elasticsearch - r3.4xlarge.elasticsearch - r3.8xlarge.elasticsearch - r4.large.elasticsearch - r4.xlarge.elasticsearch - r4.2xlarge.elasticsearch - r4.4xlarge.elasticsearch - r4.8xlarge.elasticsearch - r4.16xlarge.elasticsearch - r5.large.elasticsearch - r5.xlarge.elasticsearch - r5.2xlarge.elasticsearch - r5.4xlarge.elasticsearch - r5.12xlarge.elasticsearch - c4.large.elasticsearch - c4.xlarge.elasticsearch - c4.2xlarge.elasticsearch - c4.4xlarge.elasticsearch - c5.8xlarge.elasticsearch ConstraintDescription: Must be a valid Elasticsearch node type. Default: r5.large.elasticsearch Description: EC2 instance type for the Elasticsearch cluster. Type: String ElasticsearchEBSVolumeType: Description: EBS volume type. Type: String AllowedValues: - gp2 - io1 - st1 Default: gp2 ElasticsearchDomainName: Description: Elasticsearch domain name. This must start with a lowercase letter and contain only letters, numbers, and hyphens (-). Type: String Default: "datalake-quickstart" AllowedPattern: '[a-z][a-zA-Z0-9\-]+' MaxLength: '28' MinLength: '3' EnableRedshift: AllowedValues: - 'yes' - 'no' Default: 'yes' Description: "Specifies whether Amazon Redshift is provisioned.\ \ Set to no if you don\u2019t want to provision the Amazon Redshift cluster." Type: String MakeRedshiftPubliclyAccessible: AllowedValues: - 'yes' - 'no' Default: 'yes' Description: "Specifies whether Amazon Redshift is publicly accessible.\ \ This only applies if EnableRedshift is set to yes." Type: String KeyPairName: Description: Key pairs allow you to securely connect to your instance after it launches. Type: AWS::EC2::KeyPair::KeyName KinesisDataStreamName: Default: streaming-submissions Description: "Name of the Kinesis Data Stream." Type: String KinesisDataStreamS3Prefix: AllowedPattern: ^[0-9a-zA-Z-]+(/[0-9a-zA-Z-]+)*$ Default: streaming-submissions Description: "S3 key prefix for streaming data that is stored in the S3 submissions\ \ bucket. This prefix can include numbers, lowercase letters, uppercase letters,\ \ hyphens (-), and forward slashes (/), but it should not start with a forward slash\ \ because they are automatically appended. Use this parameter to specify the location of the\ \ streaming data you want to load." Type: String KinesisCompressionFormat: Description: Kinesis stream S3 compression format. Type: String Default: "UNCOMPRESSED" AllowedValues: - "UNCOMPRESSED" - "GZIP" - "Snappy" - "ZIP" KinesisErrorOutputPrefix: Description: (Optional) S3 prefix for error objects. Type: String Default: "" QSS3BucketName: AllowedPattern: ^[0-9a-zA-Z]+([0-9a-zA-Z-]*[0-9a-zA-Z])*$ ConstraintDescription: Quick Start bucket name can include numbers, lowercase letters, uppercase letters, and hyphens (-). It cannot start or end with a hyphen (-). Default: aws-quickstart Description: "S3 bucket where the Quick Start templates and scripts are installed.\ \ If you decide to customize or extend the Quick Start for your own purposes,\ \ use this parameter to specify the S3 bucket name you created for your\ \ Quick Start assets. The bucket name can include numbers, lowercase letters,\ \ uppercase letters, and hyphens (-), but should not start or end with a hyphen." Type: String QSS3BucketRegion: Default: 'us-east-1' Description: 'AWS Region where the Quick Start S3 bucket (QSS3BucketName) is hosted. If you use your own bucket, you must specify this value.' Type: String QSS3KeyPrefix: AllowedPattern: ^[0-9a-zA-Z-/]*$ ConstraintDescription: Quick Start key prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/). It cannot start or end with a forward slash (/) because they are automatically appended. Default: quickstart-datalake-foundation/ Description: S3 key prefix used to simulate a folder for your copy of Quick Start assets. Specify a value if you decide to customize or extend the Quick Start for your own purposes. This prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/). Type: String RedshiftDatabaseName: AllowedPattern: ([a-z]|[0-9])+ ConstraintDescription: Must contain a-z or 0-9 only. Default: quickstart Description: Name of the first database to create when provisioning the Amazon Redshift cluster. Type: String RedshiftDatabasePort: Default: '5439' Description: Port that Amazon Redshift listens on, which is allowed through the security group. Type: String RedshiftEncrypted: AllowedValues: - "true" - "false" Default: "false" Description: Determines whether the Redshift data is encrypted at rest. Type: String RedshiftNodeType: AllowedValues: - ra3.4xlarge - ra3.16xlarge - dc2.large - dc2.8xlarge ConstraintDescription: Must be a valid Redshift node type. Default: dc2.large Description: Instance type for the nodes in the Amazon Redshift cluster. Type: String RedshiftNumberOfNodes: Default: '1' Description: "Number of nodes in the Amazon Redshift cluster. If you specify\ \ a number that\u2019s greater than 1, the Quick Start launches a multinode\ \ cluster." Type: Number RedshiftPassword: Description: 'Password that is associated with the master user account for the Amazon Redshift cluster. The password must contain 8–64 printable ASCII characters, excluding: /, ", \'', \ and @. It must contain one uppercase letter, one lowercase letter, and one number.' MaxLength: '64' NoEcho: 'true' Type: String RedshiftSnapshotRetention: Description: Retention period for Redshift snapshots (in days). If it is set to 0, snapshots are disabled. Default: 1 Type: Number RedshiftUsername: AllowedPattern: ^[a-z][a-z0-9_]*$ ConstraintDescription: User name parameter must be lowercase, begin with a letter, contain only alphanumeric characters or underscores, and have fewer than 128 characters. Default: datalake Description: 'User name that is associated with the master user account for the Amazon Redshift cluster. The user name must contain fewer than 128 alphanumeric characters or underscores, and must be lowercase and begin with a letter.' MaxLength: '128' MinLength: '1' Type: String RemoteAccessCIDR: AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$ ConstraintDescription: CIDR block parameter must be in the form x.x.x.x/x. Description: CIDR IP range that is permitted to connect through SSH into the bastion host instance and access Amazon ES. We recommend that you set this value to a trusted IP range (e.g., you might grant software access to only your corporate network). You can use http://checkip.amazonaws.com/ to check your IP address. This parameter must be in the form x.x.x.x/x (e.g., 96.127.8.12/32, YOUR_IP/32). Type: String VPCDefinition: Default: QuickstartDefault Description: "VPC definition name from the Mappings section of the template. Each\ \ definition specifies a VPC configuration, including the number of Availability\ \ Zones to be used for the deployment and the CIDR blocks for the VPC, public\ \ subnets, and private subnets. You can support multiple VPC configurations\ \ by extending the map with additional definitions and choosing the appropriate\ \ name. If you don\u2019t want to change the VPC configuration, keep the default\ \ setting. For more information, see the Adding VPC Definitions section of the guide." Type: String NotebookInstanceName: AllowedPattern: ^[0-9a-zA-Z]+([0-9a-zA-Z-]*[0-9a-zA-Z])*$ ConstraintDescription: Must be unique within your account in an AWS Region. Type: String Default: NotebookInstanceName Description: Name of the Amazon SageMaker notebook instance. MaxLength: '30' MinLength: '1' NotebookInstanceType: Description: EC2 instance type for the data-lake Amazon SageMaker notebook instance. ConstraintDescription: Select a valid notebook instance type. Type: String AllowedValues: - ml.t3.medium - ml.t3.large - ml.t3.xlarge - ml.t3.2xlarge - ml.m5.xlarge - ml.m5.2xlarge - ml.c5.xlarge - ml.c5.2xlarge - ml.c5.4large - ml.p3.2xlarge Default: ml.t3.large SubmissionsBucketName: Description: 'Name of the S3 submission bucket. A random string is appended to ensure uniqueness.' Type: String Default: datalake-submissions CuratedDatasetsName: Description: 'Name of the S3 curated datasets bucket. A random string is appended to ensure uniqueness.' Type: String Default: datalake-curated-datasets PublishedDataName: Description: 'Name of the S3 published data bucket. A random string is appended to ensure uniqueness.' Type: String Default: datalake-published-data RegionalLambdaBucketName: Description: 'Name of the S3 Regional lambda bucket. A random string is appended to ensure uniqueness.' Type: String Default: regional-lambda-bucket AthenaQueryResultsBucketName: Description: 'Name of the S3 Athena query results bucket. A random string is appended to ensure uniqueness.' Type: String Default: datalake-athena-query-results Rules: ValidateEnableRedshift: RuleCondition: !Equals [!Ref EnableRedshift, "yes"] Assertions: - Assert: !Not [!Equals [!Ref RedshiftPassword, ""]] AssertDescription: 'Password must contain 8–64 printable ASCII characters, excluding: /, ", \'', \ and @. It must contain 1 uppercase letter, 1 lowercase letter, and 1 number.' Resources: BastionStack: Properties: Parameters: EnableTCPForwarding: 'true' KeyPairName: !Ref 'KeyPairName' PublicSubnet1ID: !GetAtt 'VPCStack.Outputs.PublicSubnet1ID' PublicSubnet2ID: !GetAtt 'VPCStack.Outputs.PublicSubnet2ID' QSS3BucketName: !Ref 'QSS3BucketName' QSS3BucketRegion: !Ref 'QSS3BucketRegion' QSS3KeyPrefix: !Sub - ${Prefix}submodules/quickstart-linux-bastion/ - Prefix: !Ref 'QSS3KeyPrefix' RemoteAccessCIDR: !Ref 'RemoteAccessCIDR' VPCID: !GetAtt 'VPCStack.Outputs.VPCID' TemplateURL: !Sub - 'https://${S3Bucket}.s3.${S3Region}.${AWS::URLSuffix}/${QSS3KeyPrefix}submodules/quickstart-linux-bastion/templates/linux-bastion-entrypoint-existing-vpc.template.yaml' - S3Region: !If [UsingDefaultBucket, !Ref 'AWS::Region', !Ref QSS3BucketRegion] S3Bucket: !If [UsingDefaultBucket, !Sub '${QSS3BucketName}-${AWS::Region}', !Ref QSS3BucketName] Type: AWS::CloudFormation::Stack DataLakeFoundationStack: Properties: Parameters: AvailabilityZones: !Join - ',' - !Ref 'AvailabilityZones' ElasticsearchNodeCount: !Ref 'ElasticsearchNodeCount' ElasticsearchNodeType: !Ref 'ElasticsearchNodeType' ElasticsearchEBSVolumeType: !Ref 'ElasticsearchEBSVolumeType' ElasticsearchDomainName: !Ref 'ElasticsearchDomainName' EnableRedshift: !Ref 'EnableRedshift' MakeRedshiftPubliclyAccessible: !Ref 'MakeRedshiftPubliclyAccessible' KeyPairName: !Ref 'KeyPairName' KinesisDataStreamName: !Ref 'KinesisDataStreamName' KinesisDataStreamS3Prefix: !Ref 'KinesisDataStreamS3Prefix' KinesisCompressionFormat: !Ref 'KinesisCompressionFormat' KinesisErrorOutputPrefix: !Ref 'KinesisErrorOutputPrefix' NAT1ElasticIP: !GetAtt 'VPCStack.Outputs.NAT1EIP' NAT2ElasticIP: !GetAtt 'VPCStack.Outputs.NAT2EIP' PrivateSubnet1ID: !GetAtt 'VPCStack.Outputs.PrivateSubnet1AID' PrivateSubnet2ID: !GetAtt 'VPCStack.Outputs.PrivateSubnet2AID' PublicSubnet1ID: !GetAtt 'VPCStack.Outputs.PublicSubnet1ID' PublicSubnet2ID: !GetAtt 'VPCStack.Outputs.PublicSubnet2ID' QSS3BucketName: !Ref 'QSS3BucketName' QSS3BucketRegion: !Ref 'QSS3BucketRegion' QSS3KeyPrefix: !Ref 'QSS3KeyPrefix' RedshiftDatabaseName: !Ref 'RedshiftDatabaseName' RedshiftDatabasePort: !Ref 'RedshiftDatabasePort' RedshiftEncrypted: !Ref 'RedshiftEncrypted' RedshiftNodeType: !Ref 'RedshiftNodeType' RedshiftNumberOfNodes: !Ref 'RedshiftNumberOfNodes' RedshiftPassword: !Ref 'RedshiftPassword' RedshiftUsername: !Ref 'RedshiftUsername' RedshiftSnapshotRetention: !Ref 'RedshiftSnapshotRetention' RemoteAccessCIDR: !Ref 'RemoteAccessCIDR' VPCCIDR: !GetAtt 'VPCStack.Outputs.VPCCIDR' VPCID: !GetAtt 'VPCStack.Outputs.VPCID' NotebookInstanceName: !Ref 'NotebookInstanceName' NotebookInstanceType: !Ref 'NotebookInstanceType' SubmissionsBucketName: !Ref 'SubmissionsBucketName' CuratedDatasetsName: !Ref 'CuratedDatasetsName' PublishedDataName: !Ref 'PublishedDataName' RegionalLambdaBucketName: !Ref 'RegionalLambdaBucketName' AthenaQueryResultsBucketName: !Ref 'AthenaQueryResultsBucketName' TemplateURL: !Sub - 'https://${S3Bucket}.s3.${S3Region}.${AWS::URLSuffix}/${QSS3KeyPrefix}templates/data-lake.template.yaml' - S3Region: !If [UsingDefaultBucket, !Ref 'AWS::Region', !Ref QSS3BucketRegion] S3Bucket: !If [UsingDefaultBucket, !Sub '${QSS3BucketName}-${AWS::Region}', !Ref QSS3BucketName] Type: AWS::CloudFormation::Stack VPCStack: Properties: Parameters: AvailabilityZones: !Join - ',' - !Ref 'AvailabilityZones' NumberOfAZs: !FindInMap - VPCDefinitions - !Ref 'VPCDefinition' - NumberOfAZs PrivateSubnet1ACIDR: !FindInMap - VPCDefinitions - !Ref 'VPCDefinition' - PrivateSubnet1CIDR PrivateSubnet2ACIDR: !FindInMap - VPCDefinitions - !Ref 'VPCDefinition' - PrivateSubnet2CIDR PublicSubnet1CIDR: !FindInMap - VPCDefinitions - !Ref 'VPCDefinition' - PublicSubnet1CIDR PublicSubnet2CIDR: !FindInMap - VPCDefinitions - !Ref 'VPCDefinition' - PublicSubnet2CIDR VPCCIDR: !FindInMap - VPCDefinitions - !Ref 'VPCDefinition' - VPCCIDR TemplateURL: !Sub - 'https://${S3Bucket}.s3.${S3Region}.${AWS::URLSuffix}/${QSS3KeyPrefix}submodules/quickstart-aws-vpc/templates/aws-vpc.template.yaml' - S3Region: !If [UsingDefaultBucket, !Ref 'AWS::Region', !Ref QSS3BucketRegion] S3Bucket: !If [UsingDefaultBucket, !Sub '${QSS3BucketName}-${AWS::Region}', !Ref QSS3BucketName] Type: AWS::CloudFormation::Stack Outputs: SubmissionsBucketName: Description: Bucket name for submissions. Value: !GetAtt 'DataLakeFoundationStack.Outputs.SubmissionsBucketName' CuratedBucketName: Description: Bucket name for curated datasets. Value: !GetAtt 'DataLakeFoundationStack.Outputs.CuratedBucketName' PublishedBucketName: Description: Bucket name for published data. Value: !GetAtt 'DataLakeFoundationStack.Outputs.PublishedBucketName' RedshiftJDBCEndpoint: Description: Redshift JDBC endpoint. Condition: CreateRedshift Value: !GetAtt 'DataLakeFoundationStack.Outputs.RedshiftJDBCEndpoint' ElasticsearchEndpoint: Description: Elasticsearch endpoint. Value: !GetAtt 'DataLakeFoundationStack.Outputs.ElasticsearchEndpoint' KinesisDataStreamName: Description: Kinesis Data Stream name. Value: !GetAtt 'DataLakeFoundationStack.Outputs.KinesisDataStreamName' SageMakerNotebookURL: Description: SageMaker notebook instance URL. Value: !GetAtt 'DataLakeFoundationStack.Outputs.SageMakerNotebookURL'