# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Parameters:
  # Schedule expression that controls how often the crawler runs
  CrawlSchedule:
    Type: String
    Default: rate(1 day)
  URLs:
    Type: String
    Default: https://aws.amazon.com/kendra/faqs/,https://aws.amazon.com/lex/faqs/,https://aws.amazon.com/comprehend/faqs/
    Description: A comma-separated list of URLs to crawl. The pages can be either HTML or PDF.

Resources:
  # Lambda layer that bundles Puppeteer and headless Chromium
  PuppeteerHeadlessChromium:
    Type: AWS::Serverless::LayerVersion
    Properties:
      CompatibleRuntimes:
        - nodejs12.x
        - nodejs14.x
      ContentUri: ./src/layer
      Description: Lambda layer to execute Puppeteer on headless chromium
      LayerName: puppeteer-headless-chromium-node
      LicenseInfo: MIT-0
      RetentionPolicy: Delete
    Metadata:
      BuildMethod: nodejs12.x

  # Function that crawls the configured URLs on the schedule above
  CrawlPage:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs12.x
      FunctionName: CrawlPage
      Timeout: 900
      MemorySize: 1024
      CodeUri: ./src/lambda
      Layers:
        - !Ref PuppeteerHeadlessChromium
      Environment:
        Variables:
          URLS: !Ref URLs
          NODE_PATH: /opt/nodejs/node12/node_modules:/opt/nodejs/node14/node_modules:/opt/nodejs/node_modules:/var/runtime/node_modules:/var/runtime:/var/task:/opt/nodejs/lib/
      Events:
        CrawlPageEvent:
          Type: Schedule
          Properties:
            Schedule: !Ref CrawlSchedule
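
# Optional sketch, not part of the original template: an Outputs section
# that exposes the crawler function's ARN after deployment, which may be
# handy when wiring the function into other stacks or invoking it manually.
# The output name CrawlPageFunctionArn is an assumption chosen for
# illustration; remove this section if the ARN is not needed.
Outputs:
  CrawlPageFunctionArn:
    Description: ARN of the CrawlPage Lambda function
    Value: !GetAtt CrawlPage.Arn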