AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  Applying voice classification in Amazon Connect contact flow:
  - Create the basic foundation for streaming customer audio from Amazon Connect by deploying:
    - S3 Bucket for audio recordings
    - DynamoDB tables: liveaudioStreamingMLinference and liveaudioStreamingcontactDetails
    - A Lambda triggered on inbound contacts to store the initial contact details
    - A Lambda to trigger and pass the stream details to the Java Lambda
    - A Java Lambda to consume KVS and stream it to Amazon Transcribe, store the segments in DDB and upload the raw audio to S3

Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: Existing configuration
        Parameters:
          - existingS3BucketName
      - Label:
          default: Amazon S3 Configuration
        Parameters:
          - s3BucketName
          - audioFilePrefix
      - Label:
          default: Amazon DynamoDB Configuration
        Parameters:
          - MLinferenceTable
          - contactDetailsTable
    ParameterLabels:
      s3BucketName:
        default: Call Audio Bucket Name
      audioFilePrefix:
        default: Audio File Prefix
      MLinferenceTable:
        default: Transcript Table Name for audio from the customer
      contactDetailsTable:
        default: Contacts Table Name
      existingS3BucketName:
        default: Existing S3 Bucket Name

Parameters:
  s3BucketName:
    Type: String
    Default: liveaudiostreaming-demo
    Description: Enter the (globally unique) name you would like to use for the Amazon S3 bucket where we will store the audio files. This template will fail to deploy if the bucket name you chose is currently in use.
  audioFilePrefix:
    Type: String
    Default: recordings/
    Description: The Amazon S3 prefix where the audio files will be saved (must end in "/")
  MLinferenceTable:
    Type: String
    Default: liveaudioStreamingMLinference
    Description: The name of the DynamoDB Table where the machine learning prediction results are saved.
  contactDetailsTable:
    Type: String
    Default: liveaudioStreamingcontactDetails
    Description: The name of the DynamoDB Table where contact details will be written (Ensure you do not have a table with this name already).
  existingS3BucketName:
    Type: String
    Default: connect-voice-classification
    Description: The name of the S3 bucket that contains the zipped lambda files and connect contact flow
  SageMakerInferenceEndpointName:
    Type: String
    Description: The SageMaker Inference Endpoint Name to make machine learning prediction

Resources:
  allowConnectToKvsConsumerTriggerLambda:
    Type: 'AWS::Lambda::Permission'
    Properties:
      FunctionName: !Ref kvsConsumerTrigger
      Action: 'lambda:InvokeFunction'
      Principal:
      SourceAccount: !Ref 'AWS::AccountId'

  allowConnectToInitContactDetailsLambda:
    Type: 'AWS::Lambda::Permission'
    Properties:
      FunctionName: !Ref initContactDetails
      Action: 'lambda:InvokeFunction'
      Principal:
      SourceAccount: !Ref 'AWS::AccountId'

  createS3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: !Ref s3BucketName
      VersioningConfiguration:
        Status: Enabled
      PublicAccessBlockConfiguration:
        BlockPublicAcls: True
        BlockPublicPolicy: True
        IgnorePublicAcls: True
        RestrictPublicBuckets: True
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      CorsConfiguration:
        CorsRules:
          - AllowedOrigins:
              - '*'
            AllowedHeaders:
              - '*'
            AllowedMethods:
              - PUT
              - HEAD
            MaxAge: '3000'

  InferenceDDBTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: !Ref MLinferenceTable
      AttributeDefinitions:
        - AttributeName: "ContactId"
          AttributeType: "S"
        - AttributeName: "StartTime"
          AttributeType: "N"
      KeySchema:
        - AttributeName: "ContactId"
          KeyType: "HASH"
        - AttributeName: "StartTime"
          KeyType: "RANGE"
      # assuming 5 concurrent calls
      ProvisionedThroughput:
        ReadCapacityUnits: 5
        WriteCapacityUnits: 5
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: True
      SSESpecification:
        SSEEnabled: True
      TimeToLiveSpecification:
        AttributeName: "ExpiresAfter"
        Enabled: True
      StreamSpecification:
        StreamViewType: NEW_AND_OLD_IMAGES

  contactDetailsDDBTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: !Ref contactDetailsTable
      AttributeDefinitions:
        - AttributeName: "contactId"
          AttributeType: "S"
      KeySchema:
        - AttributeName: "contactId"
          KeyType: "HASH"
      # assuming 5 concurrent calls
      ProvisionedThroughput:
        ReadCapacityUnits: 5
        WriteCapacityUnits: 5
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: True
      SSESpecification:
        SSEEnabled: True

  ConnectUserStsRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              AWS: !Join
                - ''
                - - 'arn:'
                  - !Ref 'AWS::Partition'
                  - ':iam::'
                  - !Ref 'AWS::AccountId'
                  - ':'
                  - 'root'
            Action:
              - "sts:AssumeRole"
      Path: "/"

  ConnectUserStsPolicy:
    Type: 'AWS::IAM::Policy'
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: W12
            reason: comprehend, translate, and connect do not support resource-level permissions
    Properties:
      PolicyName: !Sub ${AWS::StackName}-UserStsPolicy
      Roles:
        - !Ref ConnectUserStsRole
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Action:
              - "comprehend:ListEntityRecognizers"
              - "comprehend:DetectSentiment"
              - "comprehend:DetectEntities"
              - "comprehend:ListDocumentClassifiers"
              - "comprehend:DetectSyntax"
              - "comprehend:DetectKeyPhrases"
            Resource: "*"
          - Effect: "Allow"
            Action:
              - "translate:TranslateText"
            Resource: "*"
          - Effect: "Allow"
            Action:
              - "s3:PutObject"
            Resource:
              - !Sub ${createS3Bucket.Arn}/*
          - Effect: "Allow"
            Action:
              - "connect:UpdateContactAttributes"
            Resource: "*"

  kvsMLInferenceFunction:
    Type: "AWS::Serverless::Function"
    Properties:
      Description: >
        Process audio from Kinesis Video Stream and use Amazon Transcribe to get text for the caller audio.
        Will be invoked by the kvsConsumerTrigger Lambda, writes results to the transcript DynamoDB tables,
        and uploads the audio file to S3.
      CodeUri: kvsMLInferenceFunction
      Handler: "com.amazonaws.kvsmlinference.KVSMLInferenceLambda::handleRequest"
      Runtime: java8
      MemorySize: 512
      # maximum timeout is 15 minutes today
      Timeout: 900
      Role: !GetAtt kvsMLInferenceFunctionRole.Arn
      Environment:
        Variables:
          # JAVA_TOOL_OPTIONS: ""
          APP_REGION: !Ref "AWS::Region"
          TRANSCRIBE_REGION: !Ref "AWS::Region"
          RECORDINGS_BUCKET_NAME: !Ref s3BucketName
          RECORDINGS_KEY_PREFIX: !Ref audioFilePrefix
          TABLE_ML_INFERENCE: !Ref MLinferenceTable
          RECORDINGS_PUBLIC_READ_ACL: "FALSE"
          CONSOLE_LOG_TRANSCRIPT_FLAG: "TRUE"
          LOGGING_LEVEL: "FINE"
          SAVE_PARTIAL_TRANSCRIPTS: "TRUE"
          START_SELECTOR_TYPE: "NOW"
          SM_ENDPOINT_NAME: !Ref SageMakerInferenceEndpointName

  kvsMLInferenceFunctionRole:
    Type: "AWS::IAM::Role"
    Metadata:
      cfn_nag:
        rules_to_suppress:
          - id: F3
            reason: "transcribe:* does not support resource-level permissions, and kinesisvideo streams are dynamically created and therefore cannot be specified directly"
          - id: W11
            reason: "transcribe:* does not support resource-level permissions, and kinesisvideo streams are dynamically created and therefore cannot be specified directly"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - ""
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: kvs-streaming-transcribe-policy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource:
                  - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*"
              - Effect: "Allow"
                Action:
                  - "dynamodb:Query"
                  - "dynamodb:Scan"
                  - "dynamodb:PutItem"
                  - "dynamodb:UpdateItem"
                  - "dynamodb:GetRecords"
                  - "dynamodb:GetShardIterator"
                  - "dynamodb:DescribeStream"
                  - "dynamodb:ListStreams"
                Resource:
                  - !Sub ${InferenceDDBTable.Arn}
              - Effect: "Allow"
                Action:
                  - "s3:PutObject"
                  - "s3:GetObject"
                  - "s3:PutObjectAcl"
                Resource:
                  - !Sub ${createS3Bucket.Arn}/*
              - Effect: "Allow"
                Action:
                  - "sagemaker:InvokeEndpoint"
                Resource: "*"
              - Effect: "Allow"
                Action:
                  - "kinesisvideo:Describe*"
                  - "kinesisvideo:Get*"
                  - "kinesisvideo:List*"
                Resource: "*"

  kvsConsumerTrigger:
    Type: "AWS::Serverless::Function"
    Properties:
      Description: >
        AWS Lambda Function to start (asynchronous) streaming transcription; it is expected to be called by the
        Amazon Connect Contact Flow.
      CodeUri: kvsConsumerTriggerFunction
      Handler: "kvs_trigger.handler"
      Runtime: "nodejs12.x"
      MemorySize: 128
      Timeout: 30
      Role: !GetAtt kvsConsumerTriggerRole.Arn
      Environment:
        Variables:
          transcriptionFunction: !Ref kvsMLInferenceFunction

  kvsConsumerTriggerRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - ""
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: kvs-streaming-trigger-policy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource:
                  - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*"
              - Effect: "Allow"
                Action:
                  - "lambda:InvokeFunction"
                  - "lambda:InvokeAsync"
                Resource:
                  - !GetAtt kvsMLInferenceFunction.Arn

  initContactDetails:
    Type: "AWS::Serverless::Function"
    Properties:
      Description: >
        AWS Lambda Function that will be triggered when the call starts so that we have the initial contact details,
        which we can later add to when we have the transcript and audio file location.
      CodeUri: initContactDetailsFunction
      Handler: "contact_init.handler"
      Runtime: "nodejs12.x"
      MemorySize: 128
      Timeout: 30
      Role: !GetAtt initContactDetailsRole.Arn
      Environment:
        Variables:
          table_name: !Ref contactDetailsTable
          assume_role: !GetAtt ConnectUserStsRole.Arn

  initContactDetailsRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - ""
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: connect-aipsas-ststoken
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource:
                  - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*"
              - Effect: "Allow"
                Action:
                  - "dynamodb:UpdateItem"
                Resource:
                  - !Sub ${contactDetailsDDBTable.Arn}
              - Effect: "Allow"
                Action:
                  - 'sts:AssumeRole'
                Resource:
                  - !GetAtt ConnectUserStsRole.Arn

Outputs:
  InferenceDDBTable:
    Description: The ARN of the DynamoDB table created to store segments of call transcripts (customer audio)
    Value: !GetAtt InferenceDDBTable.Arn
  contactsDDBTable:
    Description: The ARN of the DynamoDB table created to store contact details used in this solution
    Value: !GetAtt contactDetailsDDBTable.Arn
  createS3BucketOP:
    Description: Bucket contains all the call recordings and sample contactflow
    Value: !GetAtt [createS3Bucket, WebsiteURL]
  ContactFlowlambdaInitArn:
    Description: AWS Lambda Function that will be triggered when the call starts so that we have the initial contact details, which we can later add to when we have the transcript and audio file location.
    Value: !GetAtt initContactDetails.Arn
  ContactFlowlambdaTriggerArn:
    Description: AWS Lambda Function to start (asynchronous) streaming transcription; it is expected to be called by the Amazon Connect Contact Flow.
    Value: !GetAtt kvsConsumerTrigger.Arn
  MachineLearningInferenceFunction:
    Description: AWS Lambda Function to get audio from Kinesis Video Streams and use Amazon Transcribe to get text for the caller audio. Should be invoked by TranscriptionTrigger and write results to the transcriptSegments table.
    Value: !Ref kvsMLInferenceFunction
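
# ---------------------------------------------------------------------------
# Illustrative sketch only; not part of the deployed template.
#
# The two AWS::Lambda::Permission resources in this template leave `Principal`
# empty, and the three Lambda execution roles trust an empty `Service` string.
# Assuming the intent is for Amazon Connect to invoke the functions and for
# AWS Lambda to assume the roles, the completed fragments might look like the
# commented YAML below. Both service principals are assumptions; this template
# does not state them.
#
#   allowConnectToKvsConsumerTriggerLambda:
#     Type: 'AWS::Lambda::Permission'
#     Properties:
#       FunctionName: !Ref kvsConsumerTrigger
#       Action: 'lambda:InvokeFunction'
#       Principal: connect.amazonaws.com        # assumed service principal
#       SourceAccount: !Ref 'AWS::AccountId'
#
#   kvsConsumerTriggerRole:
#     Type: "AWS::IAM::Role"
#     Properties:
#       AssumeRolePolicyDocument:
#         Version: "2012-10-17"
#         Statement:
#           - Effect: "Allow"
#             Principal:
#               Service:
#                 - lambda.amazonaws.com        # assumed service principal
#             Action:
#               - "sts:AssumeRole"
#
# The same pattern would apply to allowConnectToInitContactDetailsLambda,
# kvsMLInferenceFunctionRole, and initContactDetailsRole.
# ---------------------------------------------------------------------------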