# MIT License # # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining # a copy of this software and associated documentation files (the # "Software"), to deal in the Software without restriction, including # without limitation the rights to use, copy, modify, merge, publish, # distribute, sublicense, and/or sell copies of the Software, and to # permit persons to whom the Software is furnished to do so, subject # to the following conditions: # # The above copyright notice and this permission notice shall be # included in all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS # BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. AWSTemplateFormatVersion: "2010-09-09" Transform: AWS::Serverless-2016-10-31 Description: > This CloudFormation Template deploys the code for Amazon Textract, Amazon Comprehend and Amazon A2I Integration for end-to-end document analysis use-case. Parameters: S3BucketName: Type: String Description: Enter the name for an input S3 Bucket to be created for you. S3OutBucketName: Type: String Description: Enter the name for an output S3 Bucket to be created for you. FlowDefinitionARN: Type: String Description: Enter the Human Review Workflow ARN that you have defined. CustomEntityRecognizerARN: Type: String Description: Enter the Custom Entity Model ARN that is currently in use. CustomEntityTrainingListS3URI: Type: String Description: > Enter the S3 URI for the file that contains entities for the Amazon Comprehend custom entity recognizer training. CustomEntityTrainingDatasetS3URI: Type: String Description: > Enter the S3 URI for the file that contains training dataset for the Amazon Comprehend custom entity recognizer training. Resources: ################################ # Textract Comprehend Lambda ################################ # Create a role for the Comprehend Service to interact with the S3 Bucket ComprehendExecutionRole: Type: "AWS::IAM::Role" Properties: Policies: - PolicyName: "AllowInvoke" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: "s3:*" Resource: !Sub 'arn:aws:s3:::*' AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Principal: Service: - comprehend.amazonaws.com Action: "sts:AssumeRole" Path: "/" ManagedPolicyArns: - arn:aws:iam::aws:policy/AmazonS3FullAccess # Textract Comprehend Lambda Role TextractComprehendLambdaRole: Type: "AWS::IAM::Role" Properties: AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Principal: Service: - lambda.amazonaws.com Action: "sts:AssumeRole" Path: "/" Policies: - PolicyName: "AllowInvoke" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: "lambda:InvokeFunction" Resource: "*" - PolicyName: "ReadWriteToS3" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: - "S3:GetObject" - "S3:PutObject" Resource: !Sub 'arn:aws:s3:::*' - Effect: "Allow" Action: - "S3:ListBucket" Resource: !Sub 'arn:aws:s3:::*' - PolicyName: "ComprehendAccessPolicy" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: - "comprehend:StartEntitiesDetectionJob" Resource: "*" - PolicyName: "LoggingCapability" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: - "logs:CreateLogGroup" - "logs:CreateLogStream" - "logs:PutLogEvents" Resource: "arn:aws:logs:*:*:*" - PolicyName: "SSMParameterRead" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "ssm:GetParameters" - "ssm:GetParameter" Resource: "*" - PolicyName: "IamPassRoleForComprehend" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "iam:PassRole" - "iam:GetRole" Resource: Fn::GetAtt: - ComprehendExecutionRole - Arn ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole - arn:aws:iam::aws:policy/AmazonTextractFullAccess DataWranglerLayer: Type: AWS::Serverless::LayerVersion Properties: LayerName: aws-datawrangler-pandas Description: Dependencies for using pandas DataFrames in code ContentUri: s3://prem-profiler-test/code/lambda_handlers/awswrangler-layer-2.7.0-py3.8.zip CompatibleRuntimes: - python3.8 TextractComprehendLambda: Type: AWS::Serverless::Function DependsOn: "TextractComprehendLambdaRole" Properties: Handler: TextractComprehend.lambda_handler Description: "Lambda function to start document analysis and send results of Comprehend for A2I Review." Runtime: python3.8 Layers: - !Ref DataWranglerLayer Role: Fn::GetAtt: - TextractComprehendLambdaRole - Arn MemorySize: 512 Timeout: 180 CodeUri: s3://prem-profiler-test/code/lambda_handlers/TextractComprehend.py.zip ################################ # Textract Comprehend Lambda ################################ CALambdaInvokePermission: Type: 'AWS::Lambda::Permission' Properties: FunctionName: Fn::GetAtt: - ComprehendA2ILambda - Arn Action: 'lambda:InvokeFunction' Principal: s3.amazonaws.com SourceAccount: !Ref 'AWS::AccountId' SourceArn: !Sub 'arn:aws:s3:::${S3OutBucketName}' # Grant Invoke Access for the TextractComprehendLambda to the Customer Trigger Lambda TCLambdaInvokePermission: Type: 'AWS::Lambda::Permission' Properties: FunctionName: Fn::GetAtt: - TextractComprehendLambda - Arn Action: 'lambda:InvokeFunction' Principal: s3.amazonaws.com SourceAccount: !Ref 'AWS::AccountId' SourceArn: !Sub 'arn:aws:s3:::${S3BucketName}' # Custom Lambda to capture S3 Bucket Events and invoke the Textract Comprehend Lambda when # appropriate filters are met NotificationS3BucketTC: Type: AWS::S3::Bucket DependsOn: - TCLambdaInvokePermission Properties: BucketName: !Ref S3BucketName PublicAccessBlockConfiguration: BlockPublicAcls: true BlockPublicPolicy: true IgnorePublicAcls: true RestrictPublicBuckets: true NotificationConfiguration: LambdaConfigurations: - Event: s3:ObjectCreated:* Function: Fn::GetAtt: - TextractComprehendLambda - Arn Filter: S3Key: Rules: - Name: "prefix" Value: "intelligent-doc-demo/input/" - Name: "suffix" Value: "png" # IAM Role for the Custom Trigger Lambda LambdaIAMRole: Type: 'AWS::IAM::Role' Properties: AssumeRolePolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Principal: Service: - lambda.amazonaws.com Action: - 'sts:AssumeRole' Path: / Policies: - PolicyName: root PolicyDocument: Version: 2012-10-17 Statement: - Effect: Allow Action: - 's3:GetBucketNotification' - 's3:PutBucketNotification' Resource: !Sub 'arn:aws:s3:::*' - Effect: Allow Action: - 'logs:CreateLogGroup' - 'logs:CreateLogStream' - 'logs:PutLogEvents' Resource: 'arn:aws:logs:*:*:*' ################################ # Comprehend A2I ################################ ComprehendA2ILambdaExecutionRole: Type: AWS::IAM::Role Properties: AssumeRolePolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Principal: Service: - lambda.amazonaws.com Action: - sts:AssumeRole Policies: - PolicyName: allowLogging PolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Action: - logs:* Resource: arn:aws:logs:*:*:* - PolicyName: getAndDeleteObjects PolicyDocument: Version: '2012-10-17' Statement: - Effect: "Allow" Action: - "s3:GetObject" - "s3:putObject" - "s3:DeleteObject" - "s3:List*" Resource: - !Sub 'arn:aws:s3:::*' - PolicyName: "A2IAccess" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: - "sagemaker:StartHumanLoop" Resource: "*" - PolicyName: "SSMParameterRead" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "ssm:GetParameters" - "ssm:GetParameter" "Resource": "*" ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole ComprehendA2ILambda: Type: AWS::Serverless::Function DependsOn: "ComprehendA2ILambdaExecutionRole" Properties: Handler: ComprehendA2I.lambda_handler Description: > This lambda function is triggered once Amazon Comprehend Custom Entity Recognition results are generated. It then collates the results and creates a Human Loop in A2I. Runtime: python3.8 Role: Fn::GetAtt: - ComprehendA2ILambdaExecutionRole - Arn MemorySize: 512 Timeout: 180 CodeUri: s3://prem-profiler-test/code/lambda_handlers/ComprehendA2I.py.zip NotificationS3BucketCA: Type: AWS::S3::Bucket DependsOn: - CALambdaInvokePermission Properties: BucketName: !Ref S3OutBucketName PublicAccessBlockConfiguration: BlockPublicAcls: true BlockPublicPolicy: true IgnorePublicAcls: true RestrictPublicBuckets: true NotificationConfiguration: LambdaConfigurations: - Event: s3:ObjectCreated:* Function: Fn::GetAtt: - ComprehendA2ILambda - Arn Filter: S3Key: Rules: - Name: "prefix" Value: "intelligent-doc-demo/cer-output/" - Name: "suffix" Value: ".gz" ################################ # HRW Completion Lambda ################################ HumanReviewWorkflowCompletedRole: Type: "AWS::IAM::Role" Properties: AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Principal: Service: - lambda.amazonaws.com Action: "sts:AssumeRole" Path: "/" Policies: - PolicyName: "AllowInvoke" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: "lambda:InvokeFunction" Resource: "*" - PolicyName: "ReadWriteToS3" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: - "S3:GetObject" - "S3:PutObject" Resource: !Sub 'arn:aws:s3:::*' - Effect: "Allow" Action: - "S3:ListBucket" Resource: !Sub 'arn:aws:s3:::*' - PolicyName: "SSMParameterRead" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "ssm:GetParameters" - "ssm:GetParameter" "Resource": "*" ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole HumanReviewWorkflowCompletedLambda: Type: AWS::Serverless::Function DependsOn: "HumanReviewWorkflowCompletedRole" Properties: Handler: HumanReviewCompleted.lambda_handler Description: "Lambda function to handle completion of human workflow." Runtime: python3.8 Role: Fn::GetAtt: - HumanReviewWorkflowCompletedRole - Arn MemorySize: 512 Timeout: 180 CodeUri: s3://prem-profiler-test/code/lambda_handlers/HumanReviewCompleted.py.zip HumanLoopStatusChangeCloudwatchEventRule: Type: AWS::Events::Rule Properties: Description: "Event Rule to tie up human workflow completion to Lambda function" EventPattern: source: - "aws.sagemaker" detail-type: - "SageMaker A2I HumanLoop Status Change" State: ENABLED Targets: - Arn: Fn::GetAtt: - HumanReviewWorkflowCompletedLambda - Arn Id: "TargetFunctionV1" HumanReviewWorkflowCompletedPermission: Type: AWS::Lambda::Permission Properties: Action: lambda:InvokeFunction FunctionName: Fn::GetAtt: - HumanReviewWorkflowCompletedLambda - Arn Principal: events.amazonaws.com SourceArn: Fn::GetAtt: - HumanLoopStatusChangeCloudwatchEventRule - Arn ################################ # Time based New-Entity-Check Lambda ################################ NewEntityCheckLambdaRole: Type: "AWS::IAM::Role" Properties: AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Principal: Service: - lambda.amazonaws.com Action: "sts:AssumeRole" Path: "/" Policies: - PolicyName: "AllowInvoke" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: "lambda:InvokeFunction" Resource: "*" - PolicyName: "ReadWriteToS3" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: - "S3:GetObject" - "S3:PutObject" Resource: !Sub 'arn:aws:s3:::*' - Effect: "Allow" Action: - "S3:ListBucket" Resource: !Sub 'arn:aws:s3:::*' - PolicyName: "SSMParameterRead" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "ssm:GetParameters" - "ssm:GetParameter" "Resource": "*" - PolicyName: "DeleteAndPutTrainingComprehendCERParameter" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "ssm:PutParameter" - "ssm:DeleteParameter" "Resource": "*" - PolicyName: "IamPassRoleForComprehend" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "iam:PassRole" - "iam:GetRole" Resource: Fn::GetAtt: - ComprehendExecutionRole - Arn - PolicyName: "ComprehendCreateEntityRecognizerPermission" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: "comprehend:CreateEntityRecognizer" "Resource": "*" - PolicyName: "EnableDisableCWEventForTrainingCER" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "events:EnableRule" - "events:DisableRule" Resource: "*" - PolicyName: "ListCWEventsRules" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: "events:ListRules" Resource: "*" ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole NewEntityCheckLambda: Type: AWS::Serverless::Function DependsOn: "NewEntityCheckLambdaRole" Properties: Handler: NewEntityCheck.lambda_handler Description: "Lambda function to handle completion of human workflow." Runtime: python3.8 Role: Fn::GetAtt: - NewEntityCheckLambdaRole - Arn MemorySize: 512 Timeout: 180 CodeUri: s3://prem-profiler-test/code/lambda_handlers/NewEntityCheck.py.zip ScheduledNewEntityCheckCWEventRule: Type: AWS::Events::Rule Properties: Description: "Event Rule to tie up human workflow completion to Lambda function" ScheduleExpression: "rate(1 day)" State: ENABLED Targets: - Arn: Fn::GetAtt: - NewEntityCheckLambda - Arn Id: "NewEntityCheckFunction" NewEntityCheckPermission: Type: AWS::Lambda::Permission Properties: Action: lambda:InvokeFunction FunctionName: Fn::GetAtt: - NewEntityCheckLambda - Arn Principal: events.amazonaws.com SourceArn: Fn::GetAtt: - ScheduledNewEntityCheckCWEventRule - Arn ################################ # Training CER Completion Check Lambda ################################ TrainingCERCompletionCheckLambdaRole: Type: "AWS::IAM::Role" Properties: AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Principal: Service: - lambda.amazonaws.com Action: "sts:AssumeRole" Path: "/" Policies: - PolicyName: "AllowInvoke" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: "lambda:InvokeFunction" Resource: "*" - PolicyName: "ReadWriteToS3" PolicyDocument: Version: "2012-10-17" Statement: - Effect: "Allow" Action: - "S3:GetObject" - "S3:PutObject" Resource: !Sub 'arn:aws:s3:::*' - Effect: "Allow" Action: - "S3:ListBucket" Resource: !Sub 'arn:aws:s3:::*' - PolicyName: "SSMParameterRead" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "ssm:GetParameters" - "ssm:GetParameter" "Resource": "*" - PolicyName: "DeleteAndPutTrainingComprehendCERParameter" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "ssm:PutParameter" - "ssm:DeleteParameter" "Resource": "*" - PolicyName: "IamPassRoleForComprehend" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "iam:PassRole" - "iam:GetRole" Resource: Fn::GetAtt: - ComprehendExecutionRole - Arn - PolicyName: "ComprehendCreateEntityRecognizerPermission" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "comprehend:CreateEntityRecognizer" - "comprehend:ListEntityRecognizers" - "comprehend:DescribeEntityRecognizer" - "comprehend:DeleteEntityRecognizer" "Resource": "*" - PolicyName: "EnableDisableCWEventForTrainingCER" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: - "events:EnableRule" - "events:DisableRule" Resource: "*" - PolicyName: "ListCWEventsRules" PolicyDocument: Version: "2012-10-17" Statement: Effect: "Allow" Action: "events:ListRules" Resource: "*" ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole TrainingCERCompletionLambda: Type: AWS::Serverless::Function DependsOn: "TrainingCERCompletionCheckLambdaRole" Properties: Handler: CERTrainingCompleteCheck.lambda_handler Description: "Lambda function to handle completion of training job for Comprehend CER." Runtime: python3.8 Role: Fn::GetAtt: - TrainingCERCompletionCheckLambdaRole - Arn MemorySize: 512 Timeout: 180 CodeUri: s3://prem-profiler-test/code/lambda_handlers/CERTrainingCompleteCheck.py.zip ScheduledTrainingCERCompletionCheckCWEventRule: Type: AWS::Events::Rule Properties: Description: > Event Rule to periodically check for completion of Training Job for new CER. ScheduleExpression: "rate(10 minutes)" State: DISABLED Targets: - Arn: Fn::GetAtt: - TrainingCERCompletionLambda - Arn Id: "ComprehendCERTrainingCompletionCheckFunction" TrainingCERCompletionCheckPermission: Type: AWS::Lambda::Permission Properties: Action: lambda:InvokeFunction FunctionName: Fn::GetAtt: - TrainingCERCompletionLambda - Arn Principal: events.amazonaws.com SourceArn: Fn::GetAtt: - ScheduledTrainingCERCompletionCheckCWEventRule - Arn ################################ # SSM Parameters ################################ S3BucketNameSSM: Type: 'AWS::SSM::Parameter' Properties: Type: 'String' DataType: 'text' Description: > S3BucketName that contains all data required for the Textract Comprehend A2I workflow. Name: "S3BucketName-TCA2I" Value: !Ref S3BucketName S3OutBucketNameSSM: Type: 'AWS::SSM::Parameter' Properties: Type: 'String' DataType: 'text' Description: > S3BucketName for the output portion of the Textract Comprehend A2I workflow. Name: "S3OutBucketName-TCA2I" Value: !Ref S3OutBucketName FlowDefARNSSM: Type: 'AWS::SSM::Parameter' Properties: Type: 'String' DataType: 'text' Description: > Human Review Workflow ARN that will be used to review Comprehend's Custom Entities. Name: "FlowDefARN-TCA2I" Value: !Ref FlowDefinitionARN CustomEntityRecognizerARNSSM: Type: 'AWS::SSM::Parameter' Properties: Type: 'String' DataType: 'text' Description: > The ARN of the current Custom Entity Recognizer. Name: "CustomEntityRecognizerARN-TCA2I" Value: !Ref CustomEntityRecognizerARN TrainingCustomEntityRecognizerARNSSM: Type: 'AWS::SSM::Parameter' Properties: Type: 'String' DataType: 'text' Description: > The ARN of the under-training Custom Entity Recognizer. Name: "TrainingCustomEntityRecognizerARN-TCA2I" Value: !Ref CustomEntityRecognizerARN CERTrainingCompletionCheckRuleARNSSM: Type: 'AWS::SSM::Parameter' Properties: Type: 'String' DataType: 'text' Description: > The S3 URI for the file that contains entities for the Amazon Comprehend custom entity recognizer training. Name: "CERTrainingCompletionCheckRuleARN-TCA2I" Value: Fn::GetAtt: - ScheduledTrainingCERCompletionCheckCWEventRule - Arn CustomEntityRecognizerAccessRoleARNSSM: Type: 'AWS::SSM::Parameter' Properties: Type: 'String' DataType: 'text' Description: > The ARN of the current Comprehend Execution Role to allow access to S3 Buckets. Name: "ComprehendExecutionRole-TCA2I" Value: Fn::GetAtt: - ComprehendExecutionRole - Arn CustomEntityTrainingDatasetS3URISSM: Type: 'AWS::SSM::Parameter' Properties: Type: 'String' DataType: 'text' Description: > The S3 URI for the file that contains entities for the Amazon Comprehend custom entity recognizer training. Name: "CustomEntityTrainingDatasetS3URI-TCA2I" Value: !Ref CustomEntityTrainingDatasetS3URI CustomEntityTrainingListS3URISSM: Type: 'AWS::SSM::Parameter' Properties: Type: 'String' DataType: 'text' Description: > The S3 URI for the file that contains entities for the Amazon Comprehend custom entity recognizer training. Name: "CustomEntityTrainingListS3URI-TCA2I" Value: !Ref CustomEntityTrainingListS3URI