AWSTemplateFormatVersion: '2010-09-09' Description: 'Healthcare data lake' Parameters: ArtifactBucket: Description: Bucket with artifacts such as Lambda functions and external libraries Type: AWS::SSM::Parameter::Value Default: /healthcare-data-lake/artifact-bucket Hl7ParsingLibKey: Description: Key for our HL7 parsing library Type: AWS::SSM::Parameter::Value Default: /healthcare-data-lake/hl7v2-parsing-Lambda-Layer Hl7ParsingFuncKey: Description: Key for our HL7 parsing Lambda function Type: AWS::SSM::Parameter::Value Default: /healthcare-data-lake/hl7v2-parsing-Lambda Hl7IngestToRawFuncKey: Description: Key for our HL7 raw Lambda function Type: AWS::SSM::Parameter::Value Default: /healthcare-data-lake/hl7v2-raw-Lambda Hl7CleanEr7FuncKey: Description: Key for our HL7 cleaning Lambda function Type: AWS::SSM::Parameter::Value Default: /healthcare-data-lake/hl7v2-cleaning-Lambda Hl7RetrieveMsgFuncKey: Description: Key for our HL7 retrieve message Lambda function Type: AWS::SSM::Parameter::Value Default: /healthcare-data-lake/hl7v2-retrieving-Lambda DataLakeBucketName: Description: Bucket for ingesting and processing healthcare data Type: String LogRetention: Description: Days to to keep logs Type: Number Default: 1 Resources: #-------------------------------------------------------------- Data Lake bucket DataLakeBucket: Type: AWS::S3::Bucket Properties: BucketName: !Ref DataLakeBucketName BucketEncryption: # Encryption at rest ServerSideEncryptionConfiguration: - ServerSideEncryptionByDefault: SSEAlgorithm: AES256 PublicAccessBlockConfiguration: # Block all public access BlockPublicAcls: true IgnorePublicAcls: true BlockPublicPolicy: true RestrictPublicBuckets: true #-------------------------------------------------------------- Cognito UserPool: Type: AWS::Cognito::UserPool Properties: AdminCreateUserConfig: AllowAdminCreateUserOnly: true UserPoolName: !Sub ${AWS::StackName}-UserPool UsernameAttributes: [email] UserPoolClient: Type: AWS::Cognito::UserPoolClient Properties: ClientName: !Sub ${AWS::StackName}-UserPoolClient GenerateSecret: false UserPoolId: !Ref UserPool ExplicitAuthFlows: - ALLOW_USER_PASSWORD_AUTH - ALLOW_ADMIN_USER_PASSWORD_AUTH - ALLOW_REFRESH_TOKEN_AUTH PreventUserExistenceErrors: ENABLED #-------------------------------------------------------------- Lambda functions Hl7apyLayer: Type: AWS::Lambda::LayerVersion Properties: CompatibleRuntimes: [python2.7, python3.6, python3.7, python3.8] Content: S3Bucket: !Ref ArtifactBucket S3Key: !Ref Hl7ParsingLibKey LayerName: hl7apy Description: "HL7apy parser library" LicenseInfo: MIT # Role assumed by Lambda functions which access the data lake bucket DataLakeLambdaRole: Type: AWS::IAM::Role Properties: AssumeRolePolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Principal: Service: [lambda.amazonaws.com] Action: ['sts:AssumeRole'] ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole # Provides access to CloudWatch for logging Policies: - PolicyName: data-lake-bucket-access PolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Action: ['s3:GetObject', 's3:PutObject', 's3:PutObjectTagging'] Resource: - !Join ['', ['arn:aws:s3:::', !Ref DataLakeBucket, /*]] # Put into S3 raw zone and store upload parameters as object metadata Hl7IngestToRawLambda: Type: AWS::Lambda::Function Properties: FunctionName: !Sub ${AWS::StackName}_hl7_ingest_raw Code: S3Bucket: !Ref ArtifactBucket S3Key: !Ref Hl7IngestToRawFuncKey Description: Decodes from Base64 and puts into raw zone Handler: hl7_ingest_to_raw.lambda_handler Role: !GetAtt DataLakeLambdaRole.Arn Environment: Variables: data_lake_bucket: !Ref DataLakeBucketName Runtime: python3.8 Hl7IngestToRawLogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Join ['/', ['/aws/lambda', !Ref Hl7IngestToRawLambda]] RetentionInDays: !Ref LogRetention # Clean the ER7 message Hl7CleanEr7Lambda: Type: AWS::Lambda::Function Properties: FunctionName: !Sub ${AWS::StackName}_hl7_clean_er7 Code: S3Bucket: !Ref ArtifactBucket S3Key: !Ref Hl7CleanEr7FuncKey Description: Cleans ER7 messages using provided input Handler: hl7_clean_er7.lambda_handler Role: !GetAtt DataLakeLambdaRole.Arn Environment: Variables: data_lake_bucket: !Ref DataLakeBucketName Runtime: python3.8 Hl7CleanEr7LogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Join ['/', ['/aws/lambda', !Ref Hl7CleanEr7Lambda]] RetentionInDays: !Ref LogRetention # Convert from ER7 to JSON Hl7ParseEr7ToJsonLambda: Type: AWS::Lambda::Function Properties: FunctionName: !Sub ${AWS::StackName}_hl7_er7_to_json Code: S3Bucket: !Ref ArtifactBucket S3Key: !Ref Hl7ParsingFuncKey Description: HL7 converter from ER7 to JSON Handler: hl7_parse_er7_to_json.lambda_handler Layers: [!Ref Hl7apyLayer] Role: !GetAtt DataLakeLambdaRole.Arn Environment: Variables: data_lake_bucket: !Ref DataLakeBucketName data_lake_prefix: 'staging/hl7v2/' Runtime: python3.8 Timeout: 30 Hl7ParseEr7ToJsonLogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Join ['/', ['/aws/lambda', !Ref Hl7ParseEr7ToJsonLambda]] RetentionInDays: !Ref LogRetention # Convert from ER7 to JSON Hl7RetrieveMsgLambda: Type: AWS::Lambda::Function Properties: FunctionName: !Sub ${AWS::StackName}_hl7_retrieve_msg Code: S3Bucket: !Ref ArtifactBucket S3Key: !Ref Hl7RetrieveMsgFuncKey Description: HL7 message retriever Handler: hl7_retrieve_msg.lambda_handler Role: !GetAtt DataLakeLambdaRole.Arn Environment: Variables: data_lake_bucket: !Ref DataLakeBucketName Runtime: python3.8 Hl7ParseEr7ToJsonLogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Join ['/', ['/aws/lambda', !Ref Hl7RetrieveMsgFuncKey]] RetentionInDays: !Ref LogRetention #-------------------------------------------------------------- SNS topics for other services to attach to Hl7SuccessfulParseTopic: Type: AWS::SNS::Topic Properties: TopicName: !Sub ${AWS::StackName}-Hl7SuccessfulParse Hl7FailedParseTopic: Type: AWS::SNS::Topic Properties: TopicName: !Sub ${AWS::StackName}-Hl7FailedParse #-------------------------------------------------------------- Step Function StatesExecutionRole: Type: "AWS::IAM::Role" Properties: AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Principal: Service: - !Sub states.${AWS::Region}.amazonaws.com Action: "sts:AssumeRole" Path: "/" Policies: - PolicyName: StatesExecutionPolicy PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: ["lambda:InvokeFunction"] Resource: "*" - PolicyName: SNSPolicy PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: ["sns:*"] Resource: "*" - PolicyName: LoggingPolicy PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: - "logs:CreateLogDelivery" - "logs:GetLogDelivery" - "logs:UpdateLogDelivery" - "logs:DeleteLogDelivery" - "logs:ListLogDeliveries" - "logs:PutResourcePolicy" - "logs:DescribeResourcePolicies" - "logs:DescribeLogGroups" Resource: "*" RawToStagingStateMachine: Type: "AWS::StepFunctions::StateMachine" Properties: DefinitionString: Fn::Sub: - |- { "Comment": "Moves objects from raw to staging in our healthcare data lake", "StartAt": "DetermineProtocol", "States": { "DetermineProtocol": { "Type": "Choice", "Choices": [ { "And": [ { "Variable": "$.key", "StringGreaterThan": "raw/hl7v2/" }, { "Variable": "$.key", "StringLessThan": "raw/hl7v2/~" } ], "Next": "CleanEr7" } ], "Default": "UnknownProtocol" }, "CleanEr7": { "Type": "Task", "Resource": "${cleanEr7LambdaArn}", "Next": "ParseEr7ToJson" }, "ParseEr7ToJson": { "Type": "Task", "Resource": "${parseEr7LambdaArn}", "Catch": [ { "ErrorEquals": [ "KeyError"], "Next": "Unparseable" } ], "Next": "Hl7Parsed" }, "Hl7Parsed": { "Type": "Task", "Resource": "arn:aws:states:::sns:publish", "Parameters": { "TopicArn": "${hl7SucceedParse}", "Message.$": "$" }, "End": true }, "Unparseable": { "Type": "Task", "Resource": "arn:aws:states:::sns:publish", "Parameters": { "TopicArn": "${hl7FailedParse}", "Message.$": "$" }, "End": true }, "UnknownProtocol": { "Type": "Pass", "End": true } } } - cleanEr7LambdaArn: !GetAtt [ Hl7CleanEr7Lambda, Arn ] parseEr7LambdaArn: !GetAtt [ Hl7ParseEr7ToJsonLambda, Arn ] hl7FailedParse: !Ref Hl7FailedParseTopic hl7SucceedParse: !Ref Hl7SuccessfulParseTopic StateMachineType: EXPRESS LoggingConfiguration: Destinations: - CloudWatchLogsLogGroup: LogGroupArn: !GetAtt [RawToStagingStateMachineLogGroup, Arn] IncludeExecutionData: true Level: ALL RoleArn: !GetAtt [ StatesExecutionRole, Arn ] RawToStagingStateMachineLogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Sub '/aws/states/${AWS::StackName}-RawToStagingStateMachine' RetentionInDays: !Ref LogRetention #-------------------------------------------------------------- Trigger our Step Function # Bucket to hold data lake CloudTrail logs DataLakeCloudTrailBucket: Type: AWS::S3::Bucket Properties: BucketName: !Sub ${DataLakeBucket}-ct-logs BucketEncryption: # Encryption at rest ServerSideEncryptionConfiguration: - ServerSideEncryptionByDefault: SSEAlgorithm: AES256 PublicAccessBlockConfiguration: # Block all public access BlockPublicAcls: true IgnorePublicAcls: true BlockPublicPolicy: true RestrictPublicBuckets: true # Policy to allow CloudTrail service to access our CloudTrail logging bucket DataLakeCloudTrailBucketPolicy: Type: AWS::S3::BucketPolicy Properties: Bucket: Ref: DataLakeCloudTrailBucket PolicyDocument: Version: "2012-10-17" Statement: - Sid: "AWSCloudTrailAclCheck" Effect: "Allow" Principal: Service: "cloudtrail.amazonaws.com" Action: "s3:GetBucketAcl" Resource: !Sub arn:aws:s3:::${DataLakeCloudTrailBucket} - Sid: "AWSCloudTrailWrite" Effect: "Allow" Principal: Service: "cloudtrail.amazonaws.com" Action: "s3:PutObject" Resource: !Sub arn:aws:s3:::${DataLakeCloudTrailBucket}/AWSLogs/${AWS::AccountId}/* Condition: StringEquals: s3:x-amz-acl: "bucket-owner-full-control" # Bucket trail is needed for API events in Amazon S3 to match the CloudWatch Events rule DataLakeRawTrail: DependsOn: # Put the trail as late as possible so it won't collect data during stack creation which would impede rollback - DataLakeCloudTrailBucketPolicy - DataLakeRawEvent Type: AWS::CloudTrail::Trail Properties: EventSelectors: - DataResources: - Type: 'AWS::S3::Object' Values: - !Sub arn:aws:s3:::${DataLakeBucket}/raw/ # We will use the same trail for all data types and choose in the step function ReadWriteType: WriteOnly IsLogging: true S3BucketName: !Ref DataLakeCloudTrailBucket TrailName: !Sub ${AWS::StackName}-DataLakeBucket DataLakeRawEventRole: Type: "AWS::IAM::Role" Properties: AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Principal: Service: - !Sub states.${AWS::Region}.amazonaws.com - events.amazonaws.com Action: "sts:AssumeRole" Path: /service-role/ Policies: - PolicyName: TriggerStepFunction PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: ["states:StartExecution"] Resource: !Ref RawToStagingStateMachine # CloudWatch rule to trigger DataLakeRawEvent: Type: AWS::Events::Rule Properties: Description: Monitor our healthcare data lake bucket trail for raw PutObject events and trigger step function EventPattern: source: [aws.s3] detail-type: [AWS API Call via CloudTrail] detail: eventSource: [s3.amazonaws.com] eventName: [PutObject] requestParameters: bucketName: [!Ref DataLakeBucket] Targets: - Arn: !Ref RawToStagingStateMachine Id: !Sub ${DataLakeBucket}-cw-event RoleArn: !GetAtt DataLakeRawEventRole.Arn InputPath: $.detail.requestParameters # Just need the bucket and key #-------------------------------------------------------------- API Gateway HttpAPI: Type: AWS::ApiGatewayV2::Api Properties: Name: !Sub ${AWS::StackName}-APIGateway ProtocolType: HTTP DefaultStage: Type: AWS::ApiGatewayV2::Stage Properties: ApiId: !Ref HttpAPI AutoDeploy: true Description: Default stage StageName: $default MyAuthorizor: Type: AWS::ApiGatewayV2::Authorizer Properties: Name: !Sub ${AWS::StackName}-Authorizer ApiId: !Ref HttpAPI AuthorizerType: JWT IdentitySource: [$request.header.Authorization] JwtConfiguration: Audience: [!Ref UserPoolClient] Issuer: !Sub https://cognito-idp.${AWS::Region}.amazonaws.com/${UserPool} Hl7IngestToRawIntegration: Type: AWS::ApiGatewayV2::Integration DependsOn: [Hl7IngestToRawLambda] Properties: ApiId: !Ref HttpAPI IntegrationType: AWS_PROXY IntegrationUri: !Sub arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${Hl7IngestToRawLambda.Arn}/invocations PayloadFormatVersion: 2.0 Hl7Er7PostRoute: Type: AWS::ApiGatewayV2::Route Properties: ApiId: !Ref HttpAPI RouteKey: "POST /hl7v2/er7" AuthorizationType: JWT AuthorizerId: !Ref MyAuthorizor Target: !Join [/, [integrations, !Ref Hl7IngestToRawIntegration]] # Permission for the HTTP gateway to invoke our function HttpHl7ParseEr7ToJsonLambdaPermission: Type: AWS::Lambda::Permission DependsOn: [HttpAPI] Properties: Action: lambda:InvokeFunction FunctionName: !Ref Hl7IngestToRawLambda Principal: apigateway.amazonaws.com SourceArn: !Sub arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${HttpAPI}/* Hl7RetrieveMsgIntegration: Type: AWS::ApiGatewayV2::Integration DependsOn: [Hl7RetrieveMsgLambda] Properties: ApiId: !Ref HttpAPI IntegrationType: AWS_PROXY IntegrationUri: !Sub arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${Hl7RetrieveMsgLambda.Arn}/invocations PayloadFormatVersion: 2.0 Hl7RetrieveMsgRoute: Type: AWS::ApiGatewayV2::Route Properties: ApiId: !Ref HttpAPI RouteKey: "GET /hl7v2/format/{format}/msg_uuid/{msg_uuid}" AuthorizationType: JWT AuthorizerId: !Ref MyAuthorizor Target: !Join [/, [integrations, !Ref Hl7RetrieveMsgIntegration]] # Permission for the HTTP gateway to invoke our function HttpHl7RetrieveMsgLambdaPermission: Type: AWS::Lambda::Permission DependsOn: [HttpAPI] Properties: Action: lambda:InvokeFunction FunctionName: !Ref Hl7RetrieveMsgLambda Principal: apigateway.amazonaws.com SourceArn: !Sub arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${HttpAPI}/*