# Data Ingestion From On-Premise NFS using Amazon DataSync

## Overview

[AWS DataSync](https://aws.amazon.com/datasync/) is a fully managed data transfer service that simplifies, automates, and accelerates moving and replicating data between on-premises storage systems and AWS storage services over the internet or AWS Direct Connect. In a datalake environment, AWS DataSync can be used to sync files securely from on premise storage servers like NFS to S3 based datalake automatically.

In this architecture, we = walk you through how to use AWS DataSync and DataSync Agent to migrate data to a datalake in Amazon S3.

![Data Ingestion Amazon Glue](aws-datasync-from-nfs-on-prem.png)

## Architecture Component Walkthrough

1. You create a network attached file storage server (NFS) inside your data center.
2. You [install an AWS Datasync Agent](https://docs.aws.amazon.com/datasync/latest/userguide/create-agent-cli.html) as a VMware ESXi [hypervisor](https://en.wikipedia.org/wiki/Hypervisor) based environment. This Agent will have read access on the NFS server.
3. You configure AWS DataSync with the [locations](https://docs.aws.amazon.com/datasync/latest/userguide/create-locations-cli.html) required to perform syncronisation
3. You [create](https://docs.aws.amazon.com/datasync/latest/userguide/create-task-cli.html) and then [start](https://docs.aws.amazon.com/datasync/latest/userguide/start-task-execution.html) an AWS DataSync task to synchronization files from NFS to S3.
4. Use an [AWS Glue Crawler](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html) to catalog the S3 location that receives files via AWS DataSync.

## References

* [Getting started with AWS DataSync](https://docs.aws.amazon.com/datasync/latest/userguide/getting-started.html)
* [How AWS DataSync works](https://docs.aws.amazon.com/datasync/latest/userguide/how-datasync-works.html)