##This CloudFormation template creates an Amazon Kendra index. It adds a web crawler data source
##to the index and crawls the online AWS documentation for Amazon Kendra, Amazon Lex and Amazon SageMaker.
##It also creates an S3 bucket and attaches it to the index as a second data source.
##After the web crawler data source is configured, a custom resource triggers a data source sync,
##i.e. the process that crawls the sitemaps and indexes the crawled documents.
##The outputs of the CloudFormation template are the Kendra index ID, the AWS region it was created in,
##the S3 bucket name and the S3 data source ID.
##It takes about 30 minutes to create an Amazon Kendra index and about 15 minutes more to crawl and index
##the content of these web pages. Hence you might need to wait for about 45 minutes after
##launching the CloudFormation stack.
Resources:
  ##Create the Role needed to create a Kendra Index
  KendraIndexRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: kendra.amazonaws.com
            Action: 'sts:AssumeRole'
      Policies:
        - PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Resource: '*'
                Condition:
                  StringEquals:
                    'cloudwatch:namespace': 'Kendra'
                Action:
                  - 'cloudwatch:PutMetricData'
              - Effect: Allow
                Resource: '*'
                Action: 'logs:DescribeLogGroups'
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:logs:${region}:${account}:log-group:/aws/kendra/*'
                  - region: !Ref 'AWS::Region'
                    account: !Ref 'AWS::AccountId'
                Action: 'logs:CreateLogGroup'
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:logs:${region}:${account}:log-group:/aws/kendra/*:log-stream:*'
                  - region: !Ref 'AWS::Region'
                    account: !Ref 'AWS::AccountId'
                Action:
                  - 'logs:DescribeLogStreams'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
          PolicyName: !Join
            - ''
            - - !Ref 'AWS::StackName'
              - '-DocsKendraIndexPolicy'
      RoleName: !Join
        - ''
        - - !Ref 'AWS::StackName'
          - '-DocsKendraIndexRole'

  ##Create the Kendra Index
  DocsKendraIndex:
    Type: 'AWS::Kendra::Index'
    Properties:
      Name: !Join
        - ''
        - - !Ref 'AWS::StackName'
          - '-Index'
      Edition: 'DEVELOPER_EDITION'
      RoleArn: !GetAtt KendraIndexRole.Arn

  ##Create the Role needed to attach the Webcrawler Data Source
  KendraDSRole:
    Type: 'AWS::IAM::Role'
    DependsOn:
      - KendraDocsS3Bucket
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: kendra.amazonaws.com
            Action: 'sts:AssumeRole'
      Policies:
        - PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:kendra:${region}:${account}:index/${index}'
                  - region: !Ref 'AWS::Region'
                    account: !Ref 'AWS::AccountId'
                    index: !GetAtt DocsKendraIndex.Id
                Action:
                  - 'kendra:BatchPutDocument'
                  - 'kendra:BatchDeleteDocument'
              - Effect: Allow
                Resource: !GetAtt 'KendraDocsS3Bucket.Arn'
                Action: 's3:ListBucket'
              - Effect: Allow
                Resource: !Join
                  - '/'
                  - - !GetAtt 'KendraDocsS3Bucket.Arn'
                    - '*'
                Action: 's3:GetObject'
          PolicyName: !Join
            - ''
            - - !Ref 'AWS::StackName'
              - '-DocsDSPolicy'
      RoleName: !Join
        - ''
        - - !Ref 'AWS::StackName'
          - '-DocsDSRole'

  ##Create the S3 bucket that holds documents for the S3 Data Source
  KendraDocsS3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: !Sub
        - 'sagemaker-llm-kendra-rag-${region}-${account}'
        - region: !Ref 'AWS::Region'
          account: !Ref 'AWS::AccountId'

  #S3 Data Source
  KendraDocsS3BucketDS:
    Type: 'AWS::Kendra::DataSource'
    DependsOn:
      - KendraDocsS3Bucket
    Properties:
      DataSourceConfiguration:
        S3Configuration:
          BucketName: !Ref 'KendraDocsS3Bucket'
      IndexId: !GetAtt DocsKendraIndex.Id
      Name: 'KendraDocsS3BucketDS'
      RoleArn: !GetAtt KendraDSRole.Arn
      Type: 'S3'

  #Docs Data Source
  KendraDocsDS:
    Type: 'AWS::Kendra::DataSource'
    Properties:
      DataSourceConfiguration:
        WebCrawlerConfiguration:
          UrlInclusionPatterns:
            - '.*https://docs.aws.amazon.com/lex/.*'
            - '.*https://docs.aws.amazon.com/kendra/.*'
            - '.*https://docs.aws.amazon.com/sagemaker/.*'
          Urls:
            SiteMapsConfiguration:
              SiteMaps:
                - 'https://docs.aws.amazon.com/lex/latest/dg/sitemap.xml'
                - 'https://docs.aws.amazon.com/kendra/latest/dg/sitemap.xml'
                - 'https://docs.aws.amazon.com/sagemaker/latest/dg/sitemap.xml'
      IndexId: !GetAtt DocsKendraIndex.Id
      Name: 'KendraDocsDS'
      RoleArn: !GetAtt KendraDSRole.Arn
      Type: 'WEBCRAWLER'

  ##Create the Role needed by the Lambda function that starts the Data Source sync
  DataSourceSyncLambdaRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
      Policies:
        - PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Resource: !Sub
                  - 'arn:aws:kendra:${region}:${account}:index/${index}*'
                  - region: !Ref 'AWS::Region'
                    account: !Ref 'AWS::AccountId'
                    index: !GetAtt DocsKendraIndex.Id
                Action:
                  - 'kendra:*'
          PolicyName: DataSourceSyncLambdaPolicy

  ##Lambda function that backs the custom resource and starts the Data Source sync
  DataSourceSyncLambda:
    Type: 'AWS::Lambda::Function'
    Properties:
      Handler: index.lambda_handler
      Runtime: python3.12
      Role: !GetAtt 'DataSourceSyncLambdaRole.Arn'
      Timeout: 900
      MemorySize: 1024
      Code:
        ZipFile: |
          # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
          # SPDX-License-Identifier: MIT-0

          import json
          import logging
          import os

          import boto3
          import cfnresponse

          logger = logging.getLogger()
          logger.setLevel(logging.INFO)

          INDEX_ID = os.environ['INDEX_ID']
          DS_ID = os.environ['DS_ID']
          KENDRA = boto3.client('kendra')

          def start_data_source_sync(dsId, indexId):
              logger.info(f"start_data_source_sync(dsId={dsId}, indexId={indexId})")
              resp = KENDRA.start_data_source_sync_job(Id=dsId, IndexId=indexId)
              logger.info("response: " + json.dumps(resp, default=str))

          def lambda_handler(event, context):
              logger.info("Received event: %s", json.dumps(event))
              status = cfnresponse.SUCCESS
              try:
                  # Start a sync on Create and Update only; on Delete the
                  # data source is being torn down, so there is nothing to sync.
                  if event.get('RequestType') != 'Delete':
                      start_data_source_sync(DS_ID, INDEX_ID)
              except Exception:
                  logger.exception("Failed to start the data source sync")
                  status = cfnresponse.FAILED
              # Always respond, otherwise CloudFormation waits until the request times out.
              cfnresponse.send(event, context, status, {}, None)
              return status
      Environment:
        Variables:
          INDEX_ID: !GetAtt DocsKendraIndex.Id
          DS_ID: !GetAtt KendraDocsDS.Id

  ##Custom resource that invokes the Lambda function once the index and data source exist
  DataSourceSync:
    Type: 'Custom::DataSourceSync'
    DependsOn:
      - DocsKendraIndex
      - KendraDocsDS
    Properties:
      ServiceToken: !GetAtt DataSourceSyncLambda.Arn

Outputs:
  KendraIndexID:
    Value: !GetAtt DocsKendraIndex.Id
  AWSRegion:
    Value: !Ref 'AWS::Region'
  KendraDocsS3BucketName:
    Value: !Ref 'KendraDocsS3Bucket'
  KendraDocsS3DSID:
    Value: !GetAtt KendraDocsS3BucketDS.Id