AWSTemplateFormatVersion: 2010-09-09
Description: >-
  This template creates Databricks workspace resources in your AWS account using the
  API account. The API account is required if you want to use either customer managed
  VPCs or customer managed keys for notebooks. For feature availability, contact your
  Databricks representative. (qs-1r0odiedc)
Metadata:
  cfn-lint:
    config:
      ignore_checks:
        - W3005
        - W9001
        - W9006 # temporary to get rid of warnings
  QuickStartDocumentation:
    EntrypointName: Parameters for deploying a workspace and using an existing cross-account IAM role
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Workspace configuration
        Parameters:
          - AccountId
          - Username
          - Password
          - HIPAAparm
      - Label:
          default: Required IAM role and S3 bucket configuration
        Parameters:
          - IAMArn
          - IAMArnLambda
          - BucketName
      - Label:
          default: (Optional) Recommended to provide a unique deployment name for your workspace.
        Parameters:
          - DeploymentName
      - Label:
          default: (Optional) Analyzing existing data with your workspace
        Parameters:
          - ExistingInstanceProfileArn
      - Label:
          default: (Optional) Customer managed VPC configuration (premium tier required)
        Parameters:
          - VPCID
          - SubnetIDs
          - SecurityGroupIDs
      - Label:
          default: (Optional) AWS PrivateLink configuration (premium tier required). In preview; requires activation by Databricks for your account.
        Parameters:
          - PrivateLinkMode
          - PrivateLinkSubnetIds
      - Label:
          default: (Optional) Customer managed key configuration for notebooks (enterprise tier required)
        Parameters:
          - KeyArn
          - KeyAlias
          - KeyUseCases
          - KeyReuseForClusterVolumes
      - Label:
          default: Quick Start configuration
        Parameters:
          - QSS3BucketName
          - QSS3KeyPrefix
    ParameterLabels:
      AccountId:
        default: Databricks account ID
      Username:
        default: Workspace account email
      Password:
        default: Workspace account password
      DeploymentName:
        default: Workspace deployment name
      HIPAAparm:
        default: HIPAA tier account
      IAMArn:
        default: ARN of the existing cross-account IAM role
      IAMArnLambda:
        default: ARN of the existing IAM role with Lambda- and S3-access permissions
      BucketName:
        default: Root S3 bucket name
      ExistingInstanceProfileArn:
        default: Existing instance profile ARN
      VPCID:
        default: VPC ID
      SubnetIDs:
        default: Private subnet IDs
      SecurityGroupIDs:
        default: Security group IDs
      PrivateLinkMode:
        default: AWS PrivateLink mode
      PrivateLinkSubnetIds:
        default: Subnet IDs for the VPC endpoints
      KeyArn:
        default: ARN for customer managed AWS KMS key
      KeyAlias:
        default: Alias for customer managed AWS KMS key
      KeyUseCases:
        default: Use case for the key
      KeyReuseForClusterVolumes:
        default: Encrypt cluster Amazon EBS volumes
      QSS3BucketName:
        default: Quick Start S3 bucket name
      QSS3KeyPrefix:
        default: Quick Start S3 key prefix
Outputs:
  S3BucketName:
    Description: Name of the S3 root bucket.
    Value: !Ref assetsS3Bucket
  CustomerManagedKeyId:
    Description: ID of the customer managed key object.
    Condition: IsKMSKeyProvided
    Value: !Ref createCustomerManagedKey
  CredentialsId:
    Description: Credential ID.
    Value: !Ref createCredentials
  ExternalId:
    Description: Databricks external ID.
    Value: !GetAtt createCredentials.ExternalId
  NetworkId:
    Description: Databricks network ID.
    Condition: CustomerManagedVPC
    Value: !Ref createNetworks
  StorageConfigId:
    Description: Storage configuration ID.
    Value: !Ref createStorageConfiguration
  WorkspaceURL:
    Description: URL of the workspace.
    Value: !Sub https://${createWorkspace.DeploymentName}.cloud.databricks.com
  WorkspaceId:
    Description: Workspace ID.
    Value: !Ref createWorkspace
  WorkspaceStatus:
    Description: Status of the requested workspace.
    Value: !GetAtt createWorkspace.WorkspaceStatus
  WorkspaceStatusMessage:
    Description: Detailed status description of the requested workspace.
    Value: !GetAtt createWorkspace.WorkspaceStatusMsg
  PricingTier:
    Description: "Pricing tier of the workspace. For more information, see https://databricks.com/product/aws-pricing."
    Value: !GetAtt createWorkspace.PricingTier
  ClusterPolicyID:
    Description: Unique identifier of the cluster policy.
    Value: !GetAtt createWorkspace.ClusterPolicyId
Parameters:
  AccountId:
    Description: Your Databricks account ID. The account must use the E2 version of the platform. For more information, see https://docs.databricks.com/getting-started/overview.html#e2-architecture.
    AllowedPattern: '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}$'
    MinLength: '36'
    Type: String
    Default: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
  Username:
    Description: Account email for authenticating the REST API. Note that this value is case sensitive.
    AllowedPattern: '^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$'
    ConstraintDescription: Must be a valid email format.
    MinLength: '8'
    Type: String
  Password:
    Description: Account password for authenticating the REST API. The minimum length is 8 alphanumeric characters.
    MinLength: '8'
    NoEcho: 'true'
    Type: String
  DeploymentName:
    Description: The deployment name defines part of the subdomain for the workspace. The workspace URL for the web application and REST APIs is <deployment-name>.cloud.databricks.com. Accounts can have a deployment name prefix; contact your Databricks representative to add one to your account. If your account has a non-empty deployment name prefix at workspace creation time, the workspace deployment name is updated so that it begins with the account prefix and a hyphen (for example, prefix "acme" and deployment name "ws1" yield acme-ws1.cloud.databricks.com). If your account has a non-empty deployment name prefix and you set deployment_name to the reserved keyword EMPTY, deployment_name is the account prefix only. For more information, see https://docs.databricks.com/administration-guide/account-api/new-workspace.html#step-6-create-the-workspace.
    Type: String
    Default: ''
  HIPAAparm:
    Description: 'Entering "Yes" creates a cluster policy template for creating clusters in the HIPAA tier account.'
    AllowedValues:
      - 'Yes'
      - 'No'
    Default: 'No'
    Type: String
  IAMArn:
    Description: Enter an existing IAM role ARN. For more information, see https://docs.databricks.com/administration-guide/multiworkspace/iam-role.html.
    AllowedPattern: 'arn:aws:iam::\d{12}:role/.*'
    ConstraintDescription: Must be an IAM role ARN.
    MinLength: 16
    Default: arn:aws:iam::111111111111:role/your-role-name
    Type: String
  IAMArnLambda:
    Description: Enter an existing IAM role ARN with the AWSLambdaBasicExecutionRole policy attached. For more information, see the deployment guide.
    AllowedPattern: 'arn:aws:iam::\d{12}:role/.*'
    ConstraintDescription: Must be an IAM role ARN.
    MinLength: 16
    Default: arn:aws:iam::111111111111:role/your-role-name
    Type: String
  BucketName:
    Description: Name of your S3 root bucket. Use only alphanumeric characters. For more information, see https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html.
    AllowedPattern: '(?=^.{3,63}$)(?!xn--)([a-z0-9](?:[a-z0-9-]*)[a-z0-9])$'
    MinLength: '3'
    MaxLength: '63'
    Type: String
    ConstraintDescription: The workspace root bucket name can include numbers, lowercase letters, and hyphens, but it cannot start or end with a hyphen (-).
  ExistingInstanceProfileArn:
    Description: The ARN of an existing instance profile for Databricks to attach to the Spark cluster nodes and the SQL warehouses.
    Type: String
    Default: ''
  VPCID:
    Description: "VPC ID for creating your workspace. Only enter a value if you use the customer managed VPC feature. The format is vpc-xxxxxxxxxxxxxxxx. For more information, see https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html. If unspecified, Databricks creates a new workspace in a new VPC."
    Type: String
    Default: ''
  SecurityGroupIDs:
    Description: Security group IDs in your VPC. Only enter a value if you set VPCID. The format is sg-xxxxxxxxxxxxxxxxx. Use commas to separate multiple IDs. Databricks must have access to at least one security group but no more than five. You can reuse existing security groups. For more information, see https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html.
    Type: String
    Default: ''
  SubnetIDs:
    Description: Enter at least two private subnet IDs. Only enter a value if you set VPCID. Subnets cannot be shared with other workspaces or non-Databricks resources. Each subnet must be private, have outbound access, and have a netmask between /17 and /25. The NAT gateway must have its own subnet that routes 0.0.0.0/0 traffic to an internet gateway. For more information, see https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html.
    Type: String
    Default: ''
  PrivateLinkMode:
    Description: Specify whether endpoints should be set up for the Databricks VPC endpoint services.
    Type: String
    AllowedValues:
      - 'Enabled'
      - 'Disabled'
    Default: 'Disabled'
  PrivateLinkSubnetIds:
    Description: Enter at least two subnet IDs. Only enter a value if you set VPCID. Each subnet must be private with a netmask between /17 and /25.
    Type: String
    Default: ''
  KeyArn:
    Description: AWS KMS key ARN to encrypt and decrypt workspace notebooks in the control plane. Only enter a value if you use the customer managed key feature for notebooks. For more information, see https://docs.databricks.com/security/keys/customer-managed-keys-notebook-aws.html.
    Type: String
    Default: ''
  KeyAlias:
    Description: (Optional) AWS KMS key alias.
    Type: String
    Default: ''
  KeyUseCases:
    Description: Configures customer managed encryption keys. Acceptable values are MANAGED_SERVICES, STORAGE, or BOTH. For more information, see https://docs.databricks.com/administration-guide/account-api/new-workspace.html#step-5-configure-customer-managed-keys-optional.
    Type: String
    Default: ''
  KeyReuseForClusterVolumes:
    Description: 'Only enter a value if the use case is STORAGE or BOTH. Acceptable values are "True" and "False".'
    Type: String
    Default: ''
  QSS3BucketName:
    Description: S3 bucket for Quick Start assets. Use this if you want to customize the Quick Start. The bucket name can include numbers, lowercase letters, uppercase letters, and hyphens, but it cannot start or end with a hyphen (-).
    AllowedPattern: '(?=^.{3,63}$)(?!xn--)([a-z0-9](?:[a-z0-9-]*)[a-z0-9])$'
    Default: aws-quickstart
    Type: String
    ConstraintDescription: The Quick Start bucket name can include numbers, lowercase letters, uppercase letters, and hyphens, but it cannot start or end with a hyphen (-).
  QSS3KeyPrefix:
    Description: S3 key prefix to simulate a directory for your Quick Start assets. Use this if you want to customize the Quick Start. The prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/). For more information, see https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html.
    ConstraintDescription: The Quick Start key prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/).
    AllowedPattern: '^[0-9a-zA-Z-/]*$'
    Default: quickstart-databricks-unified-data-analytics-platform/
    Type: String
Conditions:
  # Set when a VPC ID is provided by the user
  CustomerManagedVPC: !Not [!Equals [!Ref VPCID, '']]
  # Set when a VPC ID is NOT provided by the user
  CreateDBManagedVPC: !Equals [!Ref VPCID, '']
  # Set when an AWS KMS key ARN is provided by the user
  IsKMSKeyProvided: !Not [!Equals [!Ref KeyArn, '']]
  # Test for the MANAGED_SERVICES CMK use case
  IsKeyForManagedServicesUseCase: !And [!Not [!Equals [!Ref KeyArn, '']], !Or [!Equals ['MANAGED_SERVICES', !Ref KeyUseCases], !Equals ['BOTH', !Ref KeyUseCases]]]
  # Test for the STORAGE CMK use case
  IsKeyForStorageUseCase: !And [!Not [!Equals [!Ref KeyArn, '']], !Or [!Equals ['STORAGE', !Ref KeyUseCases], !Equals ['BOTH', !Ref KeyUseCases]]]
  # Test for reusing the storage key for cluster EBS volumes
  ClusterVolumeSet: !Equals [!Ref KeyReuseForClusterVolumes, 'True']
  # Test for the OPTIONAL deployment name
  IsDeploymentNameSet: !Not [!Equals [!Ref DeploymentName, '']]
  # Test for PrivateLink
  IsPrivateLinkEnabled: !Equals [!Ref PrivateLinkMode, 'Enabled']
  # Checks whether the region supports three availability zones
  IsThirdAvailabilityZoneSupported: !Not [!Or [!Equals [!Ref AWS::Region, 'us-west-1'], !Equals [!Ref AWS::Region, 'sa-east-1']]]
  CreateDBManagedVPCWithThreeAvailabilityZones: !And [!Equals [!Ref VPCID, ''], !Not [!Or [!Equals [!Ref AWS::Region, 'us-west-1'], !Equals [!Ref AWS::Region, 'sa-east-1']]]]
  # Checks whether an instance profile should be registered
  RegisterInstanceProfile: !Not [!Equals [!Ref ExistingInstanceProfileArn, '']]
  # Checks whether a HIPAA cluster policy should be created
  ShouldCreateHipaaClusterPolicy: !Equals [!Ref HIPAAparm, 'Yes']
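# Optional pre-flight check: the Rules below can assert only on raw parameter values;
# they cannot inspect the subnets themselves. A minimal sketch, assuming boto3
# credentials with ec2:DescribeSubnets and placeholder subnet IDs, of verifying the
# /17-/25 netmask requirement before passing SubnetIDs or PrivateLinkSubnetIds:
#
#   import boto3, ipaddress
#   ec2 = boto3.client('ec2')
#   subnet_ids = ['subnet-aaaa1111', 'subnet-bbbb2222']  # placeholders, not real IDs
#   for s in ec2.describe_subnets(SubnetIds=subnet_ids)['Subnets']:
#       prefix = ipaddress.ip_network(s['CidrBlock']).prefixlen
#       assert 17 <= prefix <= 25, f"{s['SubnetId']} uses /{prefix}, outside /17-/25"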
Rules:
  # 1. Check whether the current AWS region is supported
  SupportedRegion:
    Assertions:
      - Assert: !Contains
          - - ap-northeast-1
            - ap-northeast-2
            - ap-south-1
            - ap-southeast-1
            - ap-southeast-2
            - ca-central-1
            - eu-central-1
            - eu-west-1
            - eu-west-2
            - us-east-1
            - us-east-2
            - us-west-1
            - us-west-2
          - !Ref AWS::Region
        AssertDescription: The current AWS region is not supported for E2 deployment. Switch to one of those listed under https://docs.databricks.com/administration-guide/cloud-configurations/aws/regions.html
  # 2. Check for HIPAA-supported regions, if the HIPAA flag is set
  SupportedHipaaRegions:
    RuleCondition: !Equals [!Ref HIPAAparm, 'Yes']
    Assertions:
      - Assert: !Contains [['us-east-1', 'us-east-2', 'ca-central-1'], !Ref AWS::Region]
        AssertDescription: 'Creates a workspace for only HIPAA tier accounts in the us-east-1, us-east-2, and ca-central-1 Regions.'
  # 3. Optional section. Ensure that SubnetIDs and SecurityGroupIDs are provided if the user provides a customer managed VPC ID
  CustomerManagedVPC:
    RuleCondition: !Not [!Equals [!Ref VPCID, '']]
    Assertions:
      - Assert: !Not [!Equals ['', !Ref SubnetIDs]]
        AssertDescription: SubnetIDs is required when VPCID is provided.
      - Assert: !Not [!Equals ['', !Ref SecurityGroupIDs]]
        AssertDescription: SecurityGroupIDs is required when VPCID is provided.
  # 4. Optional section. Ensure that KeyArn is provided when a use case is specified
  KeyUseCases1:
    RuleCondition: !Not [!Equals [!Ref KeyArn, '']]
    Assertions:
      - Assert: !Contains [['MANAGED_SERVICES', 'STORAGE', 'BOTH'], !Ref KeyUseCases]
        AssertDescription: Acceptable values are MANAGED_SERVICES, STORAGE, or BOTH when you provide a key ARN.
  KeyUseCases2:
    RuleCondition: !Or
      - !Equals [!Ref KeyUseCases, 'STORAGE']
      - !Equals [!Ref KeyUseCases, 'BOTH']
    Assertions:
      - Assert: !Contains [['True', 'False'], !Ref KeyReuseForClusterVolumes]
        AssertDescription: 'Acceptable values are "True" and "False" when the use case is either STORAGE or BOTH.'
  KeyUseCases3:
    RuleCondition: !Equals [!Ref KeyUseCases, 'MANAGED_SERVICES']
    Assertions:
      - Assert: !Equals [!Ref KeyReuseForClusterVolumes, '']
        AssertDescription: KeyReuseForClusterVolumes must be empty if MANAGED_SERVICES is provided.
  # 5. Assertion rule to prevent changing the Quick Start bucket name and prefix parameters
  # ***********************************************************************************************************************
  # Comment out this rule if you intend to clone the Git repository and modify it before promoting the changes
  # ***********************************************************************************************************************
  AWSQuickStartGitParametersSettings:
    Assertions:
      - Assert: !Equals ['aws-quickstart', !Ref QSS3BucketName]
        AssertDescription: QSS3BucketName must be set to aws-quickstart.
      - Assert: !Equals ['quickstart-databricks-unified-data-analytics-platform/', !Ref QSS3KeyPrefix]
        AssertDescription: QSS3KeyPrefix must be set to quickstart-databricks-unified-data-analytics-platform/.
  # 6. Optional section. Ensure that the subnet IDs for the endpoints are provided if the user asked for PrivateLink and provides a VPC
  PrivateLinkForCustomerManagedVPC:
    RuleCondition: !And [!Not [!Equals [!Ref VPCID, '']], !Equals [!Ref PrivateLinkMode, 'Enabled']]
    Assertions:
      - Assert: !Not [!Equals ['', !Ref PrivateLinkSubnetIds]]
        AssertDescription: PrivateLinkSubnetIds is required when VPCID is provided and PrivateLink is enabled.
Mappings:
  DatabricksAddresses:
    us-east-1:
      "workspace": "com.amazonaws.vpce.us-east-1.vpce-svc-09143d1e626de2f04"
      "backend": "com.amazonaws.vpce.us-east-1.vpce-svc-00018a8c3ff62ffdf"
    us-east-2:
      "workspace": "com.amazonaws.vpce.us-east-2.vpce-svc-041dc2b4d7796b8d3"
      "backend": "com.amazonaws.vpce.us-east-2.vpce-svc-090a8fab0d73e39a6"
    us-west-1:
      "workspace": "UNSUPPORTED"
      "backend": "UNSUPPORTED"
    us-west-2:
      "workspace": "com.amazonaws.vpce.us-west-2.vpce-svc-0129f463fcfbc46c5"
      "backend": "com.amazonaws.vpce.us-west-2.vpce-svc-0158114c0c730c3bb"
    eu-west-1:
      "workspace": "com.amazonaws.vpce.eu-west-1.vpce-svc-0da6ebf1461278016"
      "backend": "com.amazonaws.vpce.eu-west-1.vpce-svc-09b4eb2bc775f4e8c"
    eu-west-2:
      "workspace": "com.amazonaws.vpce.eu-west-2.vpce-svc-01148c7cdc1d1326c"
      "backend": "com.amazonaws.vpce.eu-west-2.vpce-svc-05279412bf5353a45"
    eu-central-1:
      "workspace": "com.amazonaws.vpce.eu-central-1.vpce-svc-081f78503812597f7"
      "backend": "com.amazonaws.vpce.eu-central-1.vpce-svc-08e5dfca9572c85c4"
    ap-southeast-1:
      "workspace": "com.amazonaws.vpce.ap-southeast-1.vpce-svc-02535b257fc253ff4"
      "backend": "com.amazonaws.vpce.ap-southeast-1.vpce-svc-0557367c6fc1a0c5c"
    ap-southeast-2:
      "workspace": "com.amazonaws.vpce.ap-southeast-2.vpce-svc-0b87155ddd6954974"
      "backend": "com.amazonaws.vpce.ap-southeast-2.vpce-svc-0b4a72e8f825495f6"
    ap-northeast-1:
      "workspace": "com.amazonaws.vpce.ap-northeast-1.vpce-svc-02691fd610d24fd64"
      "backend": "com.amazonaws.vpce.ap-northeast-1.vpce-svc-02aa633bda3edbec0"
    ap-south-1:
      "workspace": "com.amazonaws.vpce.ap-south-1.vpce-svc-0dbfe5d9ee18d6411"
      "backend": "com.amazonaws.vpce.ap-south-1.vpce-svc-03fd4d9b61414f3de"
    ca-central-1:
      "workspace": "com.amazonaws.vpce.ca-central-1.vpce-svc-0205f197ec0e28d65"
      "backend": "com.amazonaws.vpce.ca-central-1.vpce-svc-0c4e25bdbcbfbb684"
Resources:
  # The VPC
  DBSVpc:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.52.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksVPC
  # Internet gateway
  DBSVpcIgw:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::InternetGateway
    DependsOn: DBSVpc
    Properties:
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksVpcIgw
  # ... attached to the VPC
  DBSVpcIgwAttachment:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      InternetGatewayId: !GetAtt DBSVpcIgw.InternetGatewayId
      VpcId: !Ref DBSVpc
  # The subnet for the NAT gateway
  DBSNatSubnet:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DBSVpc
      CidrBlock: 10.52.0.0/24
      AvailabilityZone: !Select [0, !GetAZs ""]
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksNatSubnet
  # The subnets for the VPC endpoints
  DBSEndpointSubnet1:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DBSVpc
      CidrBlock: 10.52.6.0/24
      AvailabilityZone: !Select [0, !GetAZs ""]
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksEndpointSubnet1
  DBSEndpointSubnet2:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DBSVpc
      CidrBlock: 10.52.7.0/24
      AvailabilityZone: !Select [1, !GetAZs ""]
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksEndpointSubnet2
  DBSEndpointSubnet3:
    Condition: CreateDBManagedVPCWithThreeAvailabilityZones
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DBSVpc
      CidrBlock: 10.52.8.0/24
      AvailabilityZone: !Select [2, !GetAZs ""]
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksEndpointSubnet3
  # The private subnets for the Databricks clusters
  DBSClusterSubnet1:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DBSVpc
      CidrBlock: 10.52.160.0/19
      AvailabilityZone: !Select [0, !GetAZs ""]
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksClusterSubnet1
  DBSClusterSubnet2:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DBSVpc
      CidrBlock: 10.52.192.0/19
      AvailabilityZone: !Select [1, !GetAZs ""]
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksClusterSubnet2
  DBSClusterSubnet3:
    Condition: CreateDBManagedVPCWithThreeAvailabilityZones
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DBSVpc
      CidrBlock: 10.52.224.0/19
      AvailabilityZone: !Select [2, !GetAZs ""]
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksClusterSubnet3
  # The Elastic IP for the NAT gateway
  ElasticIPForNat:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksNatElasticIP
  # The NAT gateway
  DBSNat:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt ElasticIPForNat.AllocationId
      ConnectivityType: public
      SubnetId: !Ref DBSNatSubnet
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksNat
  # The route table attached to the NAT subnet
  DBSNatRouteTable:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref DBSVpc
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksNatRouteTable
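  # Address plan for the Databricks-managed VPC (10.52.0.0/16), for orientation:
  #   10.52.0.0/24                     public NAT subnet
  #   10.52.6.0/24 - 10.52.8.0/24      VPC endpoint subnets (third only where 3 AZs exist)
  #   10.52.160.0/19 - 10.52.224.0/19  private cluster subnets, one per AZ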
  # Routes to the internet
  RouteToInternetInNatRouteTable:
    Condition: CreateDBManagedVPC
    DependsOn: DBSVpcIgwAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref DBSNatRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref DBSVpcIgw
  # Associate the route table with the subnet
  NatSubnetRouteTableAssociation:
    Condition: CreateDBManagedVPC
    DependsOn: RouteToInternetInNatRouteTable
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref DBSNatRouteTable
      SubnetId: !Ref DBSNatSubnet
  # The route table for the private subnets
  DBSPrivateRouteTable:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref DBSVpc
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DatabricksPrivateRouteTable
  RouteToInternetInPrivateRouteTable:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref DBSPrivateRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref DBSNat
  PrivateSubnet1RouteTableAssociation:
    Condition: CreateDBManagedVPC
    DependsOn: RouteToInternetInPrivateRouteTable
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref DBSPrivateRouteTable
      SubnetId: !Ref DBSClusterSubnet1
  PrivateSubnet2RouteTableAssociation:
    Condition: CreateDBManagedVPC
    DependsOn: RouteToInternetInPrivateRouteTable
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref DBSPrivateRouteTable
      SubnetId: !Ref DBSClusterSubnet2
  PrivateSubnet3RouteTableAssociation:
    Condition: CreateDBManagedVPCWithThreeAvailabilityZones
    DependsOn: RouteToInternetInPrivateRouteTable
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref DBSPrivateRouteTable
      SubnetId: !Ref DBSClusterSubnet3
  # The S3 gateway endpoint
  S3GatewayEndpoint:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::VPCEndpoint
    Properties:
      ServiceName: !Sub com.amazonaws.${AWS::Region}.s3
      VpcEndpointType: Gateway
      VpcId: !Ref DBSVpc
      RouteTableIds:
        - !Ref DBSPrivateRouteTable
  # The security group for the workspace
  DBSWorkspaceSecurityGroup:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Sub ${AWS::StackName}-DBSWorkspaceSG
      VpcId: !Ref DBSVpc
      GroupDescription: Allow access from within the same security group
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-DBSWorkspaceSG
  # Allow all access from the same security group
  DBSWorkspaceSecurityGroupDefaultTcpIngress:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      Description: Allow all TCP inbound access from the same security group
      SourceSecurityGroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      IpProtocol: tcp
      FromPort: 0
      ToPort: 65535
  DBSWorkspaceSecurityGroupDefaultUdpIngress:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      Description: Allow all UDP inbound access from the same security group
      SourceSecurityGroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      IpProtocol: udp
      FromPort: 0
      ToPort: 65535
  DBSWorkspaceSecurityGroupDefaultTcpEgress:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      Description: Allow all TCP outbound access to the same security group
      DestinationSecurityGroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      IpProtocol: tcp
      FromPort: 0
      ToPort: 65535
  DBSWorkspaceSecurityGroupDefaultUdpEgress:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      Description: Allow all UDP outbound access to the same security group
      DestinationSecurityGroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      IpProtocol: udp
      FromPort: 0
      ToPort: 65535
  DBSWorkspaceSecurityGroupEgressForHttps:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      Description: Allow accessing Databricks infrastructure, cloud data sources, and library repositories
      CidrIp: 0.0.0.0/0
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
  DBSWorkspaceSecurityGroupEgressForMetastore:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !GetAtt DBSWorkspaceSecurityGroup.GroupId
      Description: Allow accessing the Databricks metastore
      CidrIp: 0.0.0.0/0
      IpProtocol: tcp
      FromPort: 3306
      ToPort: 3306
  # The STS VPC endpoint
  STSInterfaceEndpoint:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::VPCEndpoint
    Metadata:
      cfn-lint:
        config:
          ignore_checks:
            - EIAMPolicyWildcardResource
            - EIAMAccountIDInPrincipal
          ignore_reasons:
            EIAMPolicyWildcardResource: "Need to manage databricks workspaces"
            EIAMAccountIDInPrincipal: "Hardcoded account ID needed for configuration: https://docs.databricks.com/administration-guide/account-settings/aws-accounts.html#step-2-create-a-cross-account-role-and-an-access-policy"
    Properties:
      ServiceName: !Sub com.amazonaws.${AWS::Region}.sts
      VpcEndpointType: Interface
      VpcId: !Ref DBSVpc
      PrivateDnsEnabled: true
      SecurityGroupIds:
        - !GetAtt DBSWorkspaceSecurityGroup.GroupId
      SubnetIds:
        - !Ref DBSEndpointSubnet1
        - !Ref DBSEndpointSubnet2
        - !If [IsThirdAvailabilityZoneSupported, !Ref DBSEndpointSubnet3, !Ref AWS::NoValue]
      PolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              "AWS": !Ref AWS::AccountId
            Action:
              - sts:AssumeRole
              - sts:GetAccessKeyInfo
              - sts:GetSessionToken
              - sts:DecodeAuthorizationMessage
              - sts:TagSession
            Resource: "*"
          - Effect: Allow
            Principal:
              "AWS":
                - arn:aws:iam::414351767826:user/databricks-datasets-readonly-user
                - "414351767826"
            Action:
              - sts:AssumeRole
              - sts:GetSessionToken
              - sts:TagSession
            Resource: "*"
  # The Kinesis endpoint
  KinesisInterfaceEndpoint:
    Condition: CreateDBManagedVPC
    Type: AWS::EC2::VPCEndpoint
    Metadata:
      cfn-lint:
        config:
          ignore_checks:
            - EIAMAccountIDInPrincipal
          ignore_reasons:
            EIAMAccountIDInPrincipal: "Hardcoded account ID needed for configuration: https://docs.databricks.com/administration-guide/account-settings/aws-accounts.html#step-2-create-a-cross-account-role-and-an-access-policy"
    Properties:
      ServiceName: !Sub com.amazonaws.${AWS::Region}.kinesis-streams
      VpcEndpointType: Interface
      VpcId: !Ref DBSVpc
      PrivateDnsEnabled: true
      SecurityGroupIds:
        - !GetAtt DBSWorkspaceSecurityGroup.GroupId
      SubnetIds:
        - !Ref DBSEndpointSubnet1
        - !Ref DBSEndpointSubnet2
        - !If [IsThirdAvailabilityZoneSupported, !Ref DBSEndpointSubnet3, !Ref AWS::NoValue]
      PolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              "AWS":
                - "414351767826"
                - !Ref AWS::AccountId
            Action:
              - kinesis:PutRecord
              - kinesis:PutRecords
              - kinesis:DescribeStream
            Resource: !Sub arn:${AWS::Partition}:kinesis:${AWS::Region}:414351767826:stream/*
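  # Endpoint summary for the managed VPC: cluster traffic to S3 rides the gateway
  # endpoint above, while STS and Kinesis use interface endpoints. The two Databricks
  # PrivateLink endpoints below are created only when PrivateLinkMode is Enabled,
  # for either a managed or a customer managed VPC.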
  # The Databricks REST API endpoint
  DBSRestApiInterfaceEndpoint:
    Condition: IsPrivateLinkEnabled
    Type: AWS::EC2::VPCEndpoint
    Properties:
      ServiceName: !FindInMap [DatabricksAddresses, !Ref AWS::Region, workspace]
      VpcEndpointType: Interface
      VpcId: !If [CreateDBManagedVPC, !Ref DBSVpc, !Ref VPCID]
      PrivateDnsEnabled: true
      SecurityGroupIds: !If
        - CreateDBManagedVPC
        - - !Ref DBSWorkspaceSecurityGroup
        - !GetAtt SecurityGroupList.List
      SubnetIds: !If
        - CreateDBManagedVPC
        - - !Ref DBSEndpointSubnet1
          - !Ref DBSEndpointSubnet2
          - !If [IsThirdAvailabilityZoneSupported, !Ref DBSEndpointSubnet3, !Ref AWS::NoValue]
        - !GetAtt SubnetIdList.List
  # The Databricks secure cluster connectivity (SCC) relay endpoint
  DBSRelayInterfaceEndpoint:
    Condition: IsPrivateLinkEnabled
    Type: AWS::EC2::VPCEndpoint
    Properties:
      ServiceName: !FindInMap [DatabricksAddresses, !Ref AWS::Region, backend]
      VpcEndpointType: Interface
      VpcId: !If [CreateDBManagedVPC, !Ref DBSVpc, !Ref VPCID]
      PrivateDnsEnabled: true
      SecurityGroupIds: !If
        - CreateDBManagedVPC
        - - !Ref DBSWorkspaceSecurityGroup
        - !GetAtt SecurityGroupList.List
      SubnetIds: !If
        - CreateDBManagedVPC
        - - !Ref DBSEndpointSubnet1
          - !Ref DBSEndpointSubnet2
          - !If [IsThirdAvailabilityZoneSupported, !Ref DBSEndpointSubnet3, !Ref AWS::NoValue]
        - !GetAtt SubnetIdList.List
  # Wait handle whose metadata forces the managed-VPC route table associations to exist
  # before createNetworks (which depends on this handle) runs
  WaitForVpc:
    Type: AWS::CloudFormation::WaitConditionHandle
    Metadata:
      VpcReady: !If
        - CreateDBManagedVPC
        - - !Ref NatSubnetRouteTableAssociation
          - !Ref PrivateSubnet1RouteTableAssociation
          - !Ref PrivateSubnet2RouteTableAssociation
          - !If [IsThirdAvailabilityZoneSupported, !Ref PrivateSubnet3RouteTableAssociation, !Ref AWS::NoValue]
        - !Ref AWS::NoValue
  SubnetIdList:
    Condition: CustomerManagedVPC
    Type: Custom::SubnetIdList
    Properties:
      ServiceToken: !GetAtt StringToListFunction.Arn
      String: !Ref PrivateLinkSubnetIds
  SecurityGroupList:
    Condition: CustomerManagedVPC
    Type: Custom::SecurityGroupList
    Properties:
      ServiceToken: !GetAtt StringToListFunction.Arn
      String: !Ref SecurityGroupIDs
  # The S3 root bucket
  assetsS3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref BucketName
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  bucketPolicy:
    Type: 'AWS::S3::BucketPolicy'
    Metadata:
      cfn-lint:
        config:
          ignore_checks:
            - EIAMAccountIDInPrincipal
          ignore_reasons:
            EIAMAccountIDInPrincipal: "Hardcoded account ID needed for configuration: https://docs.databricks.com/administration-guide/account-settings/aws-accounts.html#step-2-create-a-cross-account-role-and-an-access-policy"
    Properties:
      PolicyDocument:
        Id: MyPolicy
        Version: 2012-10-17
        Statement:
          - Sid: Grant Databricks Access
            Effect: Allow
            Principal:
              AWS: arn:aws:iam::414351767826:root
            Action:
              - 's3:GetObject'
              - 's3:GetObjectVersion'
              - 's3:PutObject'
              - 's3:DeleteObject'
              - 's3:ListBucket'
              - 's3:GetBucketLocation'
            Resource:
              - !Sub 'arn:${AWS::Partition}:s3:::${assetsS3Bucket}/*'
              - !Sub 'arn:${AWS::Partition}:s3:::${assetsS3Bucket}'
      Bucket: !Ref assetsS3Bucket
  # Databricks API for configuring notebook encryption with a customer managed KMS key, if provided
  createCustomerManagedKey:
    Condition: IsKMSKeyProvided
    DependsOn: updateCustomManagedKeys
    Type: Custom::CreateCustomerManagedKey
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_CUSTOMER_MANAGED_KEY
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      key_arn: !Ref KeyArn
      key_alias: !Ref KeyAlias
      use_cases: !Ref KeyUseCases
      reuse_key_for_cluster_volumes: !Ref KeyReuseForClusterVolumes
      user_agent: 'databricks-CloudFormation-provider'
  # Databricks API for workspace credentials
  createCredentials:
    Type: Custom::CreateCredentials
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_CREDENTIALS
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      credentials_name: !If [IsDeploymentNameSet, !Sub '${DeploymentName}-credentials', !Sub '${AWS::StackName}-credentials']
      role_arn: !Ref IAMArn
      user_agent: 'databricks-CloudFormation-provider'
  # Databricks API for workspace storage configuration
  createStorageConfiguration:
    Type: Custom::CreateStorageConfigurations
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_STORAGE_CONFIGURATIONS
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      storage_config_name: !If [IsDeploymentNameSet, !Sub '${DeploymentName}-storage', !Sub '${AWS::StackName}-storage']
      s3bucket_name: !Ref assetsS3Bucket
      user_agent: 'databricks-CloudFormation-provider'
  # Registers the PrivateLink endpoint for the REST API with the Databricks account
  WorkspaceVpcEnpoint:
    Condition: IsPrivateLinkEnabled
    Type: Custom::WorkspaceVpcEnpoint
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_VPC_ENDPOINT
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      aws_region: !Ref AWS::Region
      endpoint_name: !Sub ${AWS::StackName}_workspaceVpcEndpoint
      vpc_endpoint_id: !Ref DBSRestApiInterfaceEndpoint
      user_agent: databricks-CloudFormation-provider
  # Registers the PrivateLink endpoint for the SCC relay with the Databricks account
  BackendVpcEnpoint:
    Condition: IsPrivateLinkEnabled
    Type: Custom::BackendVpcEnpoint
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_VPC_ENDPOINT
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      aws_region: !Ref AWS::Region
      endpoint_name: !Sub ${AWS::StackName}_backendVpcEndpoint
      vpc_endpoint_id: !Ref DBSRelayInterfaceEndpoint
      user_agent: databricks-CloudFormation-provider
  # Databricks API for network configuration
  createNetworks:
    Type: Custom::createNetworks
    DependsOn: WaitForVpc
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_NETWORKS
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      network_name: !If [IsDeploymentNameSet, !Sub '${DeploymentName}-network', !Sub '${AWS::StackName}-network']
      vpc_id: !If [CreateDBManagedVPC, !Ref DBSVpc, !Ref VPCID]
      subnet_ids: !If
        - CreateDBManagedVPC
        - !If
          - IsThirdAvailabilityZoneSupported
          - !Sub '${DBSClusterSubnet1}, ${DBSClusterSubnet2}, ${DBSClusterSubnet3}'
          - !Sub '${DBSClusterSubnet1}, ${DBSClusterSubnet2}'
        - !Ref SubnetIDs
      security_group_ids: !If [CreateDBManagedVPC, !Ref DBSWorkspaceSecurityGroup, !Ref SecurityGroupIDs]
      relay_access_endpoint_id: !If [IsPrivateLinkEnabled, !Ref BackendVpcEnpoint, !Ref AWS::NoValue]
      rest_access_endpoint_id: !If [IsPrivateLinkEnabled, !Ref WorkspaceVpcEnpoint, !Ref AWS::NoValue]
      user_agent: 'databricks-CloudFormation-provider'
  # Private access settings
  PrivateAccessConfiguration:
    Condition: IsPrivateLinkEnabled
    Type: Custom::PrivateAccessConfiguration
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_PRIVATE_ACCESS_CONFIGURATION
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      aws_region: !Ref AWS::Region
      private_access_settings_name: !Sub ${AWS::StackName}_privateAccessSettings
      public_access_enabled: true
      allowed_vpc_endpoint_ids: !Ref WorkspaceVpcEnpoint
  # Databricks API for workspace creation
  createWorkspace:
    Type: Custom::CreateWorkspace
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_WORKSPACES
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      workspace_name: !If [IsDeploymentNameSet, !Sub '${DeploymentName}-workspace', !Sub '${AWS::StackName}-workspace']
      deployment_name: !Ref DeploymentName
      aws_region: !Ref AWS::Region
      credentials_id: !Ref createCredentials
      storage_config_id: !Ref createStorageConfiguration
      network_id: !Ref createNetworks
      private_access_settings_id: !If [IsPrivateLinkEnabled, !Ref PrivateAccessConfiguration, !Ref AWS::NoValue]
      managed_services_customer_managed_key_id: !If [IsKeyForManagedServicesUseCase, !Ref createCustomerManagedKey, !Ref AWS::NoValue]
      storage_customer_managed_key_id: !If [IsKeyForStorageUseCase, !Ref createCustomerManagedKey, !Ref AWS::NoValue]
      user_agent: databricks-CloudFormation-provider
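  # The Custom::* resources above and below are all served by databricksApiFunction
  # (rest_client.handler inside lambda.zip, staged by CopyZips below); that source is
  # not embedded in this template. A minimal sketch of the contract such a handler
  # must satisfy (call_databricks_account_api is a hypothetical helper, not part of
  # the shipped lambda.zip):
  #
  #   import cfnresponse
  #
  #   def handler(event, context):
  #       props = event['ResourceProperties']  # action, accountId, username, ...
  #       try:
  #           result = call_databricks_account_api(event['RequestType'], props)
  #           cfnresponse.send(event, context, cfnresponse.SUCCESS,
  #                            result, result.get('PhysicalResourceId'))
  #       except Exception as e:
  #           cfnresponse.send(event, context, cfnresponse.FAILED, {'Error': str(e)}, None)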
  # Registers the instance profile
  registerInstanceProfile:
    Condition: RegisterInstanceProfile
    Type: Custom::RegisterInstanceProfile
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: REGISTER_INSTANCE_PROFILE
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      workspace_id: !Ref createWorkspace
      instance_profile_arn: !Ref ExistingInstanceProfileArn
      user_agent: databricks-CloudFormation-provider
  # Creates a HIPAA cluster policy
  createHipaaClusterPolicy:
    Condition: ShouldCreateHipaaClusterPolicy
    Type: Custom::CreateHipaaClusterPolicy
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_HIPAA_CLUSTER_POLICY
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      workspace_deployment_name: !GetAtt createWorkspace.DeploymentName
      user_agent: databricks-CloudFormation-provider
  # Creates a starter cluster
  createStarterCluster:
    Type: Custom::createStarterCluster
    Properties:
      ServiceToken: !GetAtt databricksApiFunction.Arn
      action: CREATE_STARTER_CLUSTER
      accountId: !Ref AccountId
      username: !Ref Username
      password: !Ref Password
      workspace_deployment_name: !GetAtt createWorkspace.DeploymentName
      instance_profile_arn: !If [RegisterInstanceProfile, !Ref ExistingInstanceProfileArn, !Ref AWS::NoValue]
      policy_id: !If [ShouldCreateHipaaClusterPolicy, !Ref createHipaaClusterPolicy, !Ref AWS::NoValue]
      user_agent: databricks-CloudFormation-provider
  # Customer managed keys - update the key policy for S3 and EBS volumes
  updateCustomManagedKeys:
    Condition: IsKMSKeyProvided
    DependsOn: updateKMSkeysFunction
    Type: Custom::updateCustomManagedKeys
    Properties:
      ServiceToken: !GetAtt updateKMSkeysFunction.Arn
      key_id: !Ref KeyArn
      arn_credentials: !If [ClusterVolumeSet, !Ref 'IAMArn', '']
      use_cases: !Ref KeyUseCases
      reuse_key_for_cluster_volumes: !Ref KeyReuseForClusterVolumes
  # Databricks main Lambda for all E2 objects and workspace creation
  databricksApiFunction:
    DependsOn: CopyZips
    Type: AWS::Lambda::Function
    Properties:
      Description: Databricks account API.
      Handler: rest_client.handler
      Runtime: python3.8
      Role: !Ref IAMArnLambda
      Timeout: 900
      Code:
        S3Bucket: !Ref LambdaZipsBucket
        S3Key: !Sub ${QSS3KeyPrefix}functions/packages/lambda.zip
  # Databricks CMK Lambda
  updateKMSkeysFunction:
    Condition: IsKMSKeyProvided
    DependsOn: CopyZips
    Type: AWS::Lambda::Function
    Properties:
      Description: Updates the CMK policy document for storage.
      Handler: update_custommanaged_cmk_policy.handler
      Runtime: python3.8
      Role: !Ref IAMArnLambda
      Timeout: 60
      Code:
        S3Bucket: !Ref LambdaZipsBucket
        S3Key: !Sub ${QSS3KeyPrefix}functions/packages/lambda.zip
  # Resources to stage the lambda.zip file
  LambdaZipsBucket:
    Type: AWS::S3::Bucket
  CopyZips:
    Type: Custom::CopyZips
    Properties:
      ServiceToken: !GetAtt CopyZipsFunction.Arn
      DestBucket: !Ref LambdaZipsBucket
      SourceBucket: !Ref QSS3BucketName
      Prefix: !Ref QSS3KeyPrefix
      Objects:
        - functions/packages/lambda.zip
  StringToListFunction:
    Condition: CustomerManagedVPC
    Type: AWS::Lambda::Function
    Properties:
      Description: Converts a string of comma-separated values to a list of strings.
      Handler: index.handler
      Runtime: python3.8
      Role: !Ref IAMArnLambda
      Timeout: 10
      Code:
        ZipFile: |
          import cfnresponse

          def handler(event, context):
              # Split the comma-separated string and strip whitespace around each item.
              result = [i.strip() for i in event['ResourceProperties']['String'].split(',')]
              # Return the list to CloudFormation; consumers read it via !GetAtt <resource>.List.
              cfnresponse.send(event, context, cfnresponse.SUCCESS, {'List': result}, None)
  CopyZipsFunction:
    Type: AWS::Lambda::Function
    Properties:
      Description: Copies objects from an S3 bucket to another destination.
      Handler: index.handler
      Runtime: python3.7
      Role: !Ref IAMArnLambda
      Timeout: 240
      Code:
        ZipFile: |
          import json
          import logging
          import threading
          import boto3
          import cfnresponse

          def copy_objects(source_bucket, dest_bucket, prefix, objects):
              s3 = boto3.client('s3')
              for o in objects:
                  key = prefix + o
                  copy_source = {'Bucket': source_bucket, 'Key': key}
                  print('copy_source: %s' % copy_source)
                  print('dest_bucket = %s' % dest_bucket)
                  print('key = %s' % key)
                  s3.copy_object(CopySource=copy_source, Bucket=dest_bucket, Key=key)

          def delete_objects(bucket, prefix, objects):
              s3 = boto3.client('s3')
              objects = {'Objects': [{'Key': prefix + o} for o in objects]}
              s3.delete_objects(Bucket=bucket, Delete=objects)

          def timeout(event, context):
              logging.error('Execution is about to time out, sending failure response to CloudFormation')
              cfnresponse.send(event, context, cfnresponse.FAILED, {}, None)

          def handler(event, context):
              # Make sure we send a failure to CloudFormation if the function
              # is going to time out.
              timer = threading.Timer((context.get_remaining_time_in_millis() / 1000.00) - 0.5, timeout, args=[event, context])
              timer.start()
              print('Received event: %s' % json.dumps(event))
              status = cfnresponse.SUCCESS
              try:
                  source_bucket = event['ResourceProperties']['SourceBucket']
                  dest_bucket = event['ResourceProperties']['DestBucket']
                  prefix = event['ResourceProperties']['Prefix']
                  objects = event['ResourceProperties']['Objects']
                  if event['RequestType'] == 'Delete':
                      delete_objects(dest_bucket, prefix, objects)
                  else:
                      copy_objects(source_bucket, dest_bucket, prefix, objects)
              except Exception as e:
                  logging.error('Exception: %s' % e, exc_info=True)
                  status = cfnresponse.FAILED
              finally:
                  timer.cancel()
                  cfnresponse.send(event, context, status, {}, None)
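# For reference, the StringToListFunction contract: a SecurityGroupIDs value of
# 'sg-1111, sg-2222' (placeholder IDs) comes back as {'List': ['sg-1111', 'sg-2222']},
# which the PrivateLink endpoint resources consume via !GetAtt SecurityGroupList.List.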