# Data Lake Solution

Many Amazon Web Services (AWS) customers require a data storage and analytics solution that offers more agility and flexibility than traditional data management systems. A data lake is an increasingly popular way to store and analyze data because it allows businesses to store all of their data, structured and unstructured, in a centralized repository. The AWS Cloud provides many of the building blocks required to help businesses implement a secure, flexible, and cost-effective data lake.

The data lake solution is an automated reference implementation that deploys a highly available, cost-effective data lake architecture on the AWS Cloud. The solution is intended to address common customer pain points around conceptualizing data lake architectures, and automatically configures the core AWS services necessary to easily tag, search, share, and govern specific subsets of data across a business or with other external businesses. This solution allows users to catalog new datasets, create data profiles for existing datasets in Amazon Simple Storage Service (Amazon S3), and integrate with solutions like AWS Glue and Amazon Athena with minimal effort.

For the full solution overview visit [Data Lake on AWS](https://aws.amazon.com/answers/big-data/data-lake-solution).

For help when using the data lake solution, visit the [online help guide](http://docs.awssolutionsbuilder.com/data-lake/).

## File Structure

The data lake project consists of microservices that facilitate the functional areas of the solution. These microservices are deployed to a serverless environment in AWS Lambda.

|-deployment/ [folder containing templates and build scripts]
|-source/
  |-api/
    |-authorizer/ [custom authorizer for api gateway]
    |-services/
      |-admin/ [microservice for data lake administrative functionality]
      |-cart/ [microservice for data lake cart functionality]
      |-logging/ [microservice for data lake audit logging]
      |-manifest/ [microservice for data lake manifest processing]
      |-package/ [microservice for data lake package functionality]
      |-profile/ [microservice for data lake user profile functionality]
      |-search/ [microservice for data lake search functionality]
  |-cli/ [data lake command line interface]
  |-console/ [data lake angularjs management console]
  |-resource/
    |-access-validator/ [auxiliary module used to validate granular permissions]
    |-helper/ [custom helper for CloudFormation deployment template]
Each microservice follows the structure of:
|-service-name/
  |-lib/
    |-[service module libraries and unit tests]
  |-index.js [entry point for microservice]
  |-package.json
## Getting Started
#### 01. Prerequisites
The following procedures assume that all OS-level configuration has been completed. The prerequisites are:
* [AWS Command Line Interface](https://aws.amazon.com/cli/)
* Node.js 12.x
The data lake solution is developed with Node.js for the microservices that run in AWS Lambda and Angular 1.x for the console user interface. The latest version of the data lake solution has been tested with Node.js v12.x.
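
Before building, it can help to confirm that both prerequisites are on the `PATH` and that the installed Node.js matches the tested major version. The following is a minimal sketch, not part of the solution itself: the helper names `node_major` and `check_prereqs` are hypothetical, and the check only warns rather than aborting.

```shell
#!/bin/sh
# Hypothetical helper: print the major version of a string such as "v12.22.1".
node_major() {
  printf '%s\n' "$1" | sed -e 's/^v//' -e 's/\..*$//'
}

# Hypothetical helper: report any missing or mismatched prerequisite.
check_prereqs() {
  status=0
  if ! command -v aws >/dev/null 2>&1; then
    echo "AWS CLI not found on PATH"
    status=1
  fi
  if ! command -v node >/dev/null 2>&1; then
    echo "Node.js not found on PATH"
    status=1
  elif [ "$(node_major "$(node --version)")" != "12" ]; then
    echo "Found Node.js $(node --version); the solution was tested with v12.x"
    status=1
  fi
  return "$status"
}

# Warn without exiting so the script can be sourced safely.
check_prereqs || echo "One or more prerequisites need attention."
```

Other Node.js versions may still work; the check flags them only because v12.x is the version the solution has been tested with.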
#### 02. Build the data lake solution
Clone the aws-data-lake-solution GitHub repository:
```
git clone https://github.com/awslabs/aws-data-lake-solution.git
```
#### 03. Declare environment variables:
```
export AWS_REGION=