Description: >
  MediaSearch Solution - Indexer stack (v0.3.2)

Resources:
  # This MediaBucket only holds the downloaded YouTube videos
  YTMediaBucket:
    Type: AWS::S3::Bucket
    Description: Create a bucket to hold downloaded YouTube videos

  # DynamoDB table to hold the indexed YouTube videos along with any metadata
  YTMediaDDBQueueTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: "ytkey"
          AttributeType: "S"
      KeySchema:
        - AttributeName: "ytkey"
          KeyType: "HASH"

  YTIndexerLambdaIAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - 's3:*'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref YTMediaBucket
                      - /*
                  - !Join
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref YTMediaBucket
              - Effect: Allow
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource: 'arn:aws:logs:*:*:*'
              - Effect: Allow
                Action:
                  - 'dynamodb:*'
                Resource: !GetAtt YTMediaDDBQueueTable.Arn

  PytubeLambdalayer:
    Type: AWS::Lambda::LayerVersion
    Properties:
      Content:
        S3Bucket: ''
        S3Key: !Join
          - ''
          - - ''
            - ''
      Description: Layer with pytube package required to download YT Videos
      LayerName: pytube-llayer
      LicenseInfo: MIT

  YouTubeVideoIndexer:
    Type: 'AWS::Lambda::Function'
    Properties:
      Layers:
        - !Ref PytubeLambdalayer
      Code:
        S3Bucket: ''
        S3Key: !Join
          - ''
          - - ''
            - ''
      Handler: index.lambda_handler
      Role: !GetAtt YTIndexerLambdaIAMRole.Arn
      Runtime: python3.9
      Timeout: 600
      MemorySize: 1024
      Environment:
        Variables:
          playListURL: !Ref PlayListURL
          numberOfYTVideos: !If [IndexYTVideosYN, 0, !Ref NumberOfYTVideos]
          LOG_LEVEL: INFO
          ddbTableName: !Ref YTMediaDDBQueueTable
          mediaBucket: !Ref YTMediaBucket
          mediaFolderPrefix: !Ref MediaFolderPrefix
          metaDataFolderPrefix: !Ref MetadataFolderPrefix
          RETRY: 10

  ## Create the Role needed to create a Kendra Index
  KendraIndexRole:
    Type: 'AWS::IAM::Role'
    Condition: CreateIndex
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: kendra.amazonaws.com
            Action: 'sts:AssumeRole'
      Policies:
        - PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Resource: '*'
                Condition:
                  StringEquals:
                    'cloudwatch:namespace': 'Kendra'
                Action:
                  - 'cloudwatch:PutMetricData'
              - Effect: Allow
                Resource: '*'
                Action: 'logs:DescribeLogGroups'
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:logs:${region}:${account}:log-group:/aws/kendra/*'
                  - region: !Ref 'AWS::Region'
                    account: !Ref 'AWS::AccountId'
                Action: 'logs:CreateLogGroup'
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:logs:${region}:${account}:log-group:/aws/kendra/*:log-stream:*'
                  - region: !Ref 'AWS::Region'
                    account: !Ref 'AWS::AccountId'
                Action:
                  - 'logs:DescribeLogStreams'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
          PolicyName: KendraMediaIndexPolicy

  ## Create the Kendra Index
  MediaKendraIndex:
    Type: 'AWS::Kendra::Index'
    DependsOn:
      - YouTubeVideoIndexer
      - CallYouTubeVideoIndexer
    Condition: CreateIndex
    Properties:
      Edition: 'DEVELOPER_EDITION'
      Name: !Join
        - ''
        - - !Ref 'AWS::StackName'
          - '-Index'
      RoleArn: !GetAtt KendraIndexRole.Arn

  ## Attach Custom Data Source when using existing Index
  KendraMediaDSOwn:
    Type: 'AWS::Kendra::DataSource'
    Condition: OwnIndex
    Properties:
      IndexId: !Ref ExistingIndexId
      Name: !Join
        - ''
        - - !Ref 'AWS::StackName'
          - '-DS'
      Type: 'CUSTOM'

  ## Attach Custom Data Source when using default Index
  KendraMediaDS:
    Type: 'AWS::Kendra::DataSource'
    Condition: CreateIndex
    Properties:
      IndexId: !GetAtt MediaKendraIndex.Id
      Name: !Join
        - ''
        - - !Ref 'AWS::StackName'
          - '-DS'
      Type: 'CUSTOM'

  MediaDynamoTable:
    Type: 'AWS::DynamoDB::Table'
    Properties:
      AttributeDefinitions:
        - AttributeName: "id"
          AttributeType: "S"
      KeySchema:
        - AttributeName: "id"
          KeyType: "HASH"
      BillingMode: "PAY_PER_REQUEST"

  TranscribeDataAccessRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: transcribe.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyDocument:
            Version: 2012-10-17
            Statement:
              # Grant access to the user-supplied media bucket only when one is given
              - !If
                - NonEmptyBucket
                - Effect: Allow
                  Resource:
                    - !Sub 'arn:aws:s3:::${MediaBucket}/*'
                    - !Sub 'arn:aws:s3:::${MediaBucket}'
                  Action:
                    - 's3:GetObject'
                    - 's3:ListBucket'
                - !Ref "AWS::NoValue"
              - Effect: Allow
                Action:
                  - 's3:GetObject'
                  - 's3:ListBucket'
                Resource: !Sub
                  - 'arn:aws:s3:::${bucket}*'
                  - bucket: !Ref YTMediaBucket
          PolicyName: TranscribeDataAccessPolicy

  CrawlerLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:kendra:${region}:${account}:index/${index}*'
                  - region: !Ref 'AWS::Region'
                    account: !Ref 'AWS::AccountId'
                    index: !If [CreateIndex, !GetAtt MediaKendraIndex.Id, !Ref ExistingIndexId]
                Action:
                  - 'kendra:*'
              - !If
                - NonEmptyBucket
                - Effect: Allow
                  Resource:
                    - !Sub 'arn:aws:s3:::${MediaBucket}/*'
                    - !Sub 'arn:aws:s3:::${MediaBucket}'
                  Action:
                    - 's3:GetObject'
                    - 's3:ListBucket'
                    - 's3:GetBucketLocation'
                - !Ref "AWS::NoValue"
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:s3:::${bucket}*'
                  - bucket: !Ref YTMediaBucket
                Action:
                  - 's3:GetObject'
                  - 's3:ListBucket'
                  - 's3:GetBucketLocation'
              - Effect: Allow
                Resource: !GetAtt MediaDynamoTable.Arn
                Action:
                  - 'dynamodb:*'
              - Effect: Allow
                Resource: '*'
                Action:
                  - 'transcribe:*'
              - Effect: Allow
                Resource: !GetAtt 'TranscribeDataAccessRole.Arn'
                Action:
                  - 'iam:PassRole'
              - Effect: Allow
                Resource: !GetAtt 'S3JobCompletionLambdaFunction.Arn'
                Action:
                  - 'lambda:InvokeFunction'
          PolicyName: CrawlerLambdaPolicy

  S3CrawlLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: crawler.lambda_handler
      Runtime: python3.8
      Role: !GetAtt 'CrawlerLambdaRole.Arn'
      Timeout: 900
      MemorySize: 1024
      Code:
        S3Bucket: ''
        S3Key: !Join
          - ''
          - - ''
            - ''
      Environment:
        Variables:
          MEDIA_BUCKET: !Ref MediaBucket
          YTMEDIA_BUCKET: !Ref YTMediaBucket
          MEDIA_FOLDER_PREFIX: !Ref MediaFolderPrefix
          METADATA_FOLDER_PREFIX: !Ref MetadataFolderPrefix
          MAKE_CATEGORY_FACETABLE: !Ref MakeCategoryFacetable
          INDEX_YOUTUBE_VIDEOS: !If [IndexYTVideosYN, 'false', 'true']
          TRANSCRIBEOPTS_FOLDER_PREFIX: !Ref OptionsFolderPrefix
          MEDIA_FILE_TABLE: !Ref MediaDynamoTable
          INDEX_ID: !If [CreateIndex, !GetAtt MediaKendraIndex.Id, !Ref ExistingIndexId]
          DS_ID: !If [CreateIndex, !GetAtt KendraMediaDS.Id, !GetAtt KendraMediaDSOwn.Id]
          STACK_NAME: !Ref AWS::StackName
          TRANSCRIBE_ROLE: !GetAtt 'TranscribeDataAccessRole.Arn'
          JOBCOMPLETE_FUNCTION: !Ref S3JobCompletionLambdaFunction

  JobCompleteLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:kendra:${region}:${account}:index/${index}*'
                  - region: !Ref 'AWS::Region'
                    account: !Ref 'AWS::AccountId'
                    index: !If [CreateIndex, !GetAtt MediaKendraIndex.Id, !Ref ExistingIndexId]
                Action:
                  - 'kendra:*'
              - !If
                - NonEmptyBucket
                - Effect: Allow
                  Resource:
                    - !Sub 'arn:aws:s3:::${MediaBucket}/*'
                    - !Sub 'arn:aws:s3:::${MediaBucket}'
                  Action:
                    - 's3:GetObject'
                    - 's3:ListBucket'
                    - 's3:GetBucketLocation'
                - !Ref "AWS::NoValue"
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:s3:::${bucket}*'
                  - bucket: !Ref YTMediaBucket
                Action:
                  - 's3:GetObject'
                  - 's3:ListBucket'
                  - 's3:GetBucketLocation'
              - Effect: Allow
                Resource: !GetAtt MediaDynamoTable.Arn
                Action:
                  - 'dynamodb:*'
              - Effect: Allow
                Resource: '*'
                Action:
                  - 'transcribe:*'
          PolicyName: JobCompleteLambdaPolicy

  S3JobCompletionLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: jobcomplete.lambda_handler
      Runtime: python3.8
      Role: !GetAtt 'JobCompleteLambdaRole.Arn'
      Timeout: 300
      MemorySize: 1024
      Code:
        S3Bucket: ''
        S3Key: !Join
          - ''
          - - ''
            - ''
      Environment:
        Variables:
          INDEX_ID: !If [CreateIndex, !GetAtt MediaKendraIndex.Id, !Ref ExistingIndexId]
          DS_ID: !If [CreateIndex, !GetAtt KendraMediaDS.Id, !GetAtt KendraMediaDSOwn.Id]
          MEDIA_FILE_TABLE: !Ref MediaDynamoTable
          STACK_NAME: !Ref AWS::StackName

  # Invoke the job-completion Lambda when a Transcribe job completes or fails
  TrancriptionJobCompleteEvent:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.transcribe
        detail-type:
          - "Transcribe Job State Change"
        detail:
          TranscriptionJobStatus:
            - "COMPLETED"
            - "FAILED"
      State: ENABLED
      Targets:
        - Arn: !GetAtt S3JobCompletionLambdaFunction.Arn
          Id: !Ref S3JobCompletionLambdaFunction

  JobCompleteLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt S3JobCompletionLambdaFunction.Arn
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt TrancriptionJobCompleteEvent.Arn

  # Invoke the crawler Lambda on the SyncSchedule interval
  DSSyncStartSchedule:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: !Join
        - ''
        - - 'rate('
          - !Ref SyncSchedule
          - ')'
      State: ENABLED
      Targets:
        - Arn: !GetAtt S3CrawlLambdaFunction.Arn
          Id: !Ref S3CrawlLambdaFunction

  SyncScheduleLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt S3CrawlLambdaFunction.Arn
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DSSyncStartSchedule.Arn

  StartCrawler:
    Type: Custom::CustomResource
    DependsOn:
      - YouTubeVideoIndexer
      - CallYouTubeVideoIndexer
      - S3CrawlLambdaFunction
      - S3JobCompletionLambdaFunction
      - TrancriptionJobCompleteEvent
    Properties:
      ServiceToken: !GetAtt S3CrawlLambdaFunction.Arn
      TriggerDependencies:
        - !Ref ExistingIndexId
        - !Ref MediaBucket
        - !Ref MediaFolderPrefix
        - !Ref MetadataFolderPrefix
        - !Ref OptionsFolderPrefix
        - ''
        - !Ref PlayListURL
        - !Ref NumberOfYTVideos
        #- !Ref IndexYTVideos
        - ''

  CallYouTubeVideoIndexer:
    Type: AWS::CloudFormation::CustomResource
    Properties:
      ServiceToken: !GetAtt YouTubeVideoIndexer.Arn
      TriggerDependencies:
        - !Ref ExistingIndexId
        - !Ref MediaBucket
        - !Ref PlayListURL
        - !Ref NumberOfYTVideos
        #- !Ref IndexYTVideos
        - ''

Parameters:
  MediaBucket:
    Type: String
    Default: ''
    Description: 'S3 bucket name containing media files in the region where you are deploying ()'
  MediaFolderPrefix:
    Type: String
    Default: ''
    Description: 'Prefix for media folder in the media bucket ( e.g. path/to/files/ )'
  MetadataFolderPrefix:
    Type: String
    Default: ''
    Description: '(Optional) Metadata files prefix folder location ( e.g. metadata/ ). If a media file is stored at s3://bucket/path/to/files/file2.mp3, and the metadata prefix folder location is metadata/, the metadata file location is s3://bucket/metadata/path/to/files/file2.mp3.metadata.json. By default, there is no metadata file prefix folder, and metadata files are stored in the same folder as the media files. See https://github.com/aws-samples/aws-kendra-transcribe-media-search/blob/main/README.md#add-kendra-metadata'
  OptionsFolderPrefix:
    Type: String
    Default: ''
    Description: '(Optional) Transcribe options files prefix folder location ( e.g. transcribeopts/ ). If a media file is stored at s3://bucket/path/to/files/file2.mp3, and the options prefix folder location is transcribeopts/, the options file location is s3://bucket/transcribeopts/path/to/files/file2.mp3.transcribeopts.json. By default, there is no options file prefix folder, and Transcribe options files are stored in the same folder as the media files. See https://github.com/aws-samples/aws-kendra-transcribe-media-search/blob/main/README.md#add-transcribe-options'
  MakeCategoryFacetable:
    Type: String
    Default: 'true'
    AllowedValues: ['true', 'false']
    Description: 'Set true to make the Kendra index attribute "_category" facetable, displayable and searchable'
  SyncSchedule:
    Type: String
    Default: '24 hours'
    AllowedValues:
      - '2 hours'
      - '6 hours'
      - '12 hours'
      - '24 hours'
      - '48 hours'
      - '72 hours'
    Description: 'Frequency to synchronize the S3 bucket with the Kendra index. The default is 24 hours'
  ExistingIndexId:
    Default: ''
    Type: String
    Description: "Leave this empty to create a new index or provide the index *id* (not name) of the existing Kendra index to be used"
  #IndexYTVideos:
  #  Description: Do you want to index YouTube videos?
  #  Type: String
  #  Default: 'true'
  #  AllowedValues:
  #    - 'true'
  #    - 'false'
  PlayListURL:
    Type: String
    Description: 'Enter the YouTube playlist URL. Defaults to the "This is My Architecture" playlist on YouTube'
    Default: 'https://www.youtube.com/playlist?list=PLhr1KZpdzukdeX8mQ2qO73bg6UKQHYsHb'
  NumberOfYTVideos:
    Type: Number
    Default: 5
    Description: 'Enter the number of YouTube videos to download. Defaults to 5'

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: MediaSearch Indexer parameters
        Parameters:
          - ExistingIndexId
          - MediaBucket
          - MediaFolderPrefix
          - SyncSchedule
      - Label:
          default: Kendra Metadata and Transcribe options parameters
        Parameters:
          - MetadataFolderPrefix
          - OptionsFolderPrefix
      - Label:
          default: Index YouTube Videos
        Parameters:
          #- IndexYTVideos
          - PlayListURL
          - NumberOfYTVideos

Conditions:
  BlankPlayList: !Equals
    - !Ref PlayListURL
    - ''
  ZeroYTDownload: !Equals
    - !Ref NumberOfYTVideos
    - 0
  # Note: IndexYTVideosYN is true when YouTube indexing is to be skipped
  # (blank playlist URL or zero videos requested)
  IndexYTVideosYN: !Or
    - !Condition BlankPlayList
    - !Condition ZeroYTDownload
  CreateIndex: !Equals
    - !Ref ExistingIndexId
    - ''
  # IndexYTVideosYN: !Equals
  #   - !Ref IndexYTVideos
  #   - 'true'
  OwnIndex: !Not
    - !Equals
      - !Ref ExistingIndexId
      - ''
  NonEmptyBucket: !Not
    - !Equals
      - !Ref MediaBucket
      - ''

Outputs:
  KendraIndexId:
    Value: !If [CreateIndex, !GetAtt MediaKendraIndex.Id, !Ref ExistingIndexId]
  MediaBucketsUsed:
    Value: !If [NonEmptyBucket, !Join [",", [!Ref MediaBucket, !Ref YTMediaBucket]], !Ref YTMediaBucket]
  YouTubeMediaBucketUsed:
    Value: !Ref YTMediaBucket
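
# Illustrative example of how the folder prefix parameters map to S3 paths,
# following the MetadataFolderPrefix and OptionsFolderPrefix descriptions.
# The bucket name "my-media-bucket" is a hypothetical placeholder, and the
# values below are a sketch for orientation only; they are not used by the stack.
#
#   MediaBucket:          my-media-bucket
#   MediaFolderPrefix:    path/to/files/
#   MetadataFolderPrefix: metadata/
#   OptionsFolderPrefix:  transcribeopts/
#
#   media file:    s3://my-media-bucket/path/to/files/file2.mp3
#   metadata file: s3://my-media-bucket/metadata/path/to/files/file2.mp3.metadata.json
#   options file:  s3://my-media-bucket/transcribeopts/path/to/files/file2.mp3.transcribeopts.json
#
# With both prefixes left empty, the metadata and options files are expected
# next to the media file in the same folder.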