# Web Crawler Component

This component contains code examples showing how images can be crawled and passed to an Amazon Rekognition model to determine their content. The code was written around crawling images from [AWS Blogs](https://aws.amazon.com/blogs/), so its structure follows the results returned by that site.

## Built With

- JavaScript

## Prerequisites

The scripts run on your local machine and require the following:

- [Create an AWS Account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one, and log in. The IAM user that you use must have sufficient permissions to make the necessary AWS service calls and manage AWS resources.
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
- [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed
- [Node.js](https://nodejs.org/en/download/) installed
- An Amazon Rekognition Custom Labels model, fronted by an API Gateway endpoint.
- An API that serves the contents of the AWS blog posts to be crawled.
- An API that retrieves the official documentation for each service identified.
- An [Amazon DynamoDB table](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/getting-started-step-1.html) whose items follow the structure below:

```
{
  "Item": {
    "OriginURL": { "S": "" },
    "PublishDate": { "S": "" },
    "ArchitectureURL": { "S": "" },
    "Metadata": {
      "M": {
        "crawler": { "S": "" },
        "Rekognition": {
          "M": {
            "labels": { "S": "" },
            "textServices": { "S": "" },
            "textMetadata": { "S": "" }
          }
        }
      }
    },
    "Reference": { "L": [] },
    "Title": { "S": "" }
  }
}
```

## How to run the script

1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
    ```
    git clone
    ```
2. Change directory to the web-crawler-component directory:
    ```
    cd web-crawler-component
    ```
3. Install dependencies:
    ```
    npm install
    ```
4. Run the script:
    ```
    npm run start
    ```
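
## Example: building a table item

The item structure listed under Prerequisites can be assembled in plain JavaScript before it is written to DynamoDB. The sketch below is only an illustration of that shape: the function name `buildCrawlerItem`, its input field names, the sample values, and the choice to store each `Reference` entry as a string are all assumptions, not code from this repository.

```javascript
// Hypothetical helper (not from the repository): maps plain values onto the
// DynamoDB attribute-value structure described in the Prerequisites section.
function buildCrawlerItem({
  originUrl,
  publishDate,
  architectureUrl,
  crawler = "",
  labels = "",
  textServices = "",
  textMetadata = "",
  references = [],
  title,
}) {
  return {
    Item: {
      OriginURL: { S: originUrl },
      PublishDate: { S: publishDate },
      ArchitectureURL: { S: architectureUrl },
      Metadata: {
        M: {
          crawler: { S: crawler },
          Rekognition: {
            M: {
              labels: { S: labels },
              textServices: { S: textServices },
              textMetadata: { S: textMetadata },
            },
          },
        },
      },
      // A DynamoDB list ("L") holds an array of typed values.
      // Assumption: each reference is stored as a string ("S") URL.
      Reference: { L: references.map((url) => ({ S: url })) },
      Title: { S: title },
    },
  };
}

// Usage with made-up sample values.
const example = buildCrawlerItem({
  originUrl: "https://aws.amazon.com/blogs/",
  publishDate: "2024-01-01",
  architectureUrl: "https://example.com/architecture.png",
  labels: "Amazon S3,AWS Lambda",
  references: ["https://example.com/docs/service"],
  title: "Example post",
});
console.log(JSON.stringify(example, null, 2));
```

Keeping the mapping in one function means the crawler and the Rekognition step can pass plain values around and only convert to the DynamoDB format at the point of writing.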