MODULE 3: AWS DATASYNC
======================

Copyright Amazon Web Services, Inc. and its affiliates. All rights reserved. This sample code is made available under the MIT-0 license. See the LICENSE file.

Errors or corrections? Contact akbariw@amazon.com.

-------------------------------------------------------------------------------------

**INTRODUCTION**
-------------------

AWS DataSync is a data transfer service that makes it easy for you to simplify, automate, and accelerate data migration between on-premises storage and Amazon S3, Amazon Elastic File System, or Amazon FSx for Windows File Server. AWS DataSync automatically handles many of the heavy-lifting tasks that can slow down and complicate migrations, such as building and managing complex scripts to handle metadata preservation, data integrity validation, parallel transfers, and network optimization.

**OVERVIEW**
-------------------

In this module you will transfer approximately 10,000 small files from an NFS share (**/nfs_source**) to an Amazon S3 bucket using two methods:

- **Method 1 –** Use a script that runs the AWS CLI `s3 cp` command to pull the data from the NFS share and push it to an Amazon S3 bucket. This gives us a baseline for transfer performance and shows what metadata is copied across with this method.
- **Method 2 –** Deploy a single AWS DataSync agent and configure a task to accelerate the data transfer, reading the data directly from the NFS server and writing it to an Amazon S3 bucket. We will then compare the performance & metadata of both methods.

**CREATE S3 BUCKET – AWS DATASYNC**
-----------------------------------

**Note:** This bucket will be used in module 3 as the target for the AWS DataSync transfer.

1. From the AWS console, click **Services** at the top of the screen and type & select **S3**
2. From the AWS S3 console select **+Create bucket**
3. Provide a unique bucket name for your **Target-S3-bucket**. Use the naming convention “stg316-target-**xyz**” where **xyz** is a combination of your surname and first name (e.g. “**stg316-target-citizenj**”)
   - Take note of your **Target-S3-bucket** name in your workshop.txt file
4. Next select **US West (Oregon)** as the region
5. Click **Next**
6. Click **Next**
7. Ensure the “**Block all public access**” check box is enabled, and select **Next**
8. On the final screen, select **Create bucket**
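**Optional:** If you prefer the command line, the same bucket could also be created with the AWS CLI. This is only a sketch, assuming the CLI is configured with sufficient permissions; replace `stg316-target-xyz` with your own **Target-S3-bucket** name.

```bash
# Create the target bucket in us-west-2 (Oregon)
aws s3api create-bucket \
  --bucket stg316-target-xyz \
  --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2

# Block all public access on the new bucket
aws s3api put-public-access-block \
  --bucket stg316-target-xyz \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```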

Now let’s configure the IAM role assigned to your Linux EC2 instance to have full access to the **Target-S3-Bucket** you created.

1. From the AWS console, click **Services** at the top of the screen and type & select **IAM**
2. From the left hand menu select **Roles**
3. In the search field enter **S3ROrole**
4. Click on the returned value
5. On the next screen click on the **Permissions** tab and expand the **s3ROAccessPolicy** located under the Policy name
6. Click on **Edit Policy**
7. Click on the **JSON** tab
8. Replace the contents with the below via copy and paste, and **replace** the **Target-S3-Bucket** value with your value

       {
           "Version": "2012-10-17",
           "Statement": [
               {
                   "Effect": "Allow",
                   "Action": "s3:*",
                   "Resource": ["arn:aws:s3:::Target-S3-Bucket/*"]
               }
           ]
       }

9. Click on **Review policy** at the bottom of the screen
10. Click on **Save changes**
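**Optional:** To confirm the updated policy is in place, you can run a quick test from the Linux EC2 instance. This is only a sketch (the test file name is hypothetical); replace the bucket name with your **Target-S3-Bucket**.

```bash
# Copy a small test object into the bucket, then remove it again
echo "access test" > /tmp/access-test.txt
aws s3 cp /tmp/access-test.txt s3://stg316-target-citizenj/access-test.txt
aws s3 rm s3://stg316-target-citizenj/access-test.txt
```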

**METHOD 1 - 10K SMALL FILE MIGRATION USING A SCRIPT**
------------------------------------------------------

Now let’s migrate 10,000 small files (hosted on the NFS server) using a copy script located on the Linux host EC2 instance. The copy script reads from the NFS mount point and writes to Amazon S3.
1. Navigate back to the SSH session you have running to your Linux EC2 instance (NFS client)
2. In the SSH session run the below commands to update the copy script with your values as per the instructions shown below

       cd /scripts/ds-demo
       vi copy_script.sh

   - Press “**i**” to go into edit mode
   - Update the bucket placeholder in the script with the **Target-S3-Bucket** name you created in the previous steps
   - Press the “**ESC**” button when you are done editing
   - Type “**:wq**” and hit **Enter** to save changes and exit
3. Now let’s open a second SSH session using the below steps (which we will use to monitor the network performance), and leave this session open
   - In your remote desktop session, click on the Windows icon located at the bottom left of the screen
   - Type **CMD** and hit Enter to open a new command prompt
   - You should have stored your ***.pem** key file on the desktop as per the previous instructions
   - Enter the below command in the command prompt: `cd c:\users\administrator\desktop`
   - Next enter the below command to SSH into the Linux server, remembering to replace the two values shown in **< >** with your values (*i.e. ssh -i stg316-key.pem ec2-user@192.168.10.102*): `ssh -i <key-name>.pem ec2-user@<Linux-EC2-private-IP>`
4. Switch back to your **first** SSH session and run the below commands to start the script that copies data from **/nfs_source** to your **Target-S3-Bucket**

       cd /scripts/ds-demo
       ./start-transfer.sh

5. Switch to your **second** SSH session which you just opened and run the following commands to observe the copy transfer performance

       sudo su
       cd /scripts/ds-demo
       ./show_performance.sh

**Note** the throughput values shown (i.e. x MB/s transfer rate) for the first 30 seconds, then switch back to your first SSH session window.

- Navigate back to the first SSH session and wait until the output states “**Data transfer to Amazon S3 bucket complete**”
- From within your first SSH session run the following command: `cat /scripts/ds-demo/time.log`
- **Take note** of the time the script took to run (it is the time value shown next to **real**)
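For context, the copy script essentially wraps a timed, recursive `aws s3 cp` call. The sketch below is only an approximation, assuming the real script in /scripts/ds-demo may differ in its details; replace the bucket name with your **Target-S3-Bucket**.

```bash
#!/bin/bash
# Approximate sketch of a copy script like copy_script.sh (the lab's actual script may differ).
TARGET_BUCKET="stg316-target-citizenj"   # replace with your Target-S3-Bucket

# Recursively copy everything under the NFS mount to the s3-cli-copy prefix,
# capturing the elapsed ("real") time so it can be compared with DataSync later.
{ time aws s3 cp /nfs_source "s3://${TARGET_BUCKET}/s3-cli-copy/" --recursive ; } 2> /scripts/ds-demo/time.log

echo "Data transfer to Amazon S3 bucket complete"
```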

**METHOD 2 - 10K SMALL FILE MIGRATION USING AWS DATASYNC**
----------------------------------------------------------

Now let’s migrate the same 10,000 small files (hosted on the NFS server), this time using AWS DataSync.


**DEPLOY AWS DATASYNC AGENT**
-----------------------------

We are going to deploy the AWS DataSync agent within AWS as an EC2 instance, in the absence of an on-premises environment (where you could deploy it as a VMware appliance). The AWS DataSync agent will read directly from the NFS server (not the NFS client) and transfer the data to your **Target-S3-Bucket**.

1. Using the Chrome icon on the Windows EC2 instance desktop, log into your AWS account using Chrome
2. From the Chrome session, in the AWS console, at the top of the screen, click **Services** and type & select **DataSync**
   - Select **Get Started**
   - In the Create agent page, under the **Amazon EC2** section click on the **Learn more** icon
   - Scroll down to the table that has a list of **AMI Names**, and click on the **Launch Instance** link corresponding to the **us-west-2** (Oregon) row
   - In the next page, select the box next to **m5.xlarge**
   - Select **Next: Configure Instance Details**
   - In the **Network** drop down select the VPC which has “**STG316**” in its name
   - In the **Subnet** drop down, select the one which has “**STG316**” in its name
   - Leave all other settings as default on the page
   - Click **Next: Add Storage**
   - Click **Next: Add Tags**
   - Select **Add Tag**
   - Enter the following values (case sensitive)
     - Key = **Name**
     - Value = **STG316-DataSync**
   - Click **Next: Configure Security Group**
   - Click on the “**Select an existing security group**” check box
   - Select the security group with the name **STG316-FileGatewaySG**
   - Click **Review and Launch**
   - Click **Launch**
   - Select your **key pair**, accept the check box and click **Launch Instances**
3. From the AWS console, click **Services** and type & select **EC2**
   - From the left hand menu, select **Instances**
   - In the right hand pane, select the box next to “**STG316-DataSync**”
   - From the bottom window pane, select the **Description** tab, and note the **private IP** address in your workshop.txt file as **DataSync-Instance-Private-IP**
   - Ensure the “**Status Check**” column for this EC2 instance shows “**2/2 checks passed**” before proceeding to the next step
4. From the AWS console, at the top of the screen, click **Services** and type & select **DataSync**
   - Select **Get Started**
   - Enter the following values on the page
     - **Service endpoint:** Select Public service endpoints in US West (Oregon)
     - **Activation key:** Enter the private IP address you noted down in the previous step for **DataSync-Instance-Private-IP**
   - Select **Get Key**
   - You will see a success message once your agent has activated
   - Select **Create Agent** to continue
   - When the create agent process is complete, click on the **blue DataSync** link at the top left of the screen to continue with the next step of creating a task
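**Note:** If you ever need to script this step, an agent can also be activated with the AWS CLI once you have an activation key. The following is only a sketch; `<activation-key>` is a placeholder, and in this lab the console retrieves the key and activates the agent for you.

```bash
# Activate the DataSync agent from the CLI (sketch; placeholder values in <...>)
aws datasync create-agent \
  --agent-name STG316-DataSync \
  --activation-key <activation-key> \
  --region us-west-2
```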

**TRANSFER DATA USING DATASYNC**
---------------------------------

1. Click on **Create task** from the top right hand side of the window
   - Select **Create a new location** from the source location options
     - **Location type:** Network File System (NFS)
     - **Agents:** select the agent you have just deployed
     - **NFS Server:** enter the value of your **File-Gateway-Instance-Private-IP**
     - **Mount path:** enter your **Source-S3-Bucket** name value
   - Click **Next** to continue
2. Select **Create a new location** from the destination location options
   - **Location type:** Amazon S3 bucket
   - **S3 bucket:** **Target-S3-Bucket**
   - **S3 storage class:** select Standard
   - **Folder:** type “**datasync-copy**”
   - **IAM role:** click on the **Autogenerate** button
   - Click **Next** to continue
3. Provide a task name (*i.e. NFS-to-S3-transfer-10K-small-files*)
   - **Verify data:** Check integrity during transfer
   - **Copy file metadata:** Ensure the following items are all checked
     - Copy ownership
     - Copy permissions
     - Copy timestamps
   - Leave all other options at their defaults & select **Next**
   - Click **Create task**
4. On the next screen wait until the **Task status** value is **Available** (refresh the screen to get an update)
5. Click on the **Start** button
6. Leave all options as they are (don’t override any) and click on **Start**
7. At the top of the screen click on the **See execution details** button to view the progress of the transfer
8. The task will go through a few phases: it first compares the files in the source location with what’s stored on the target before sending the new or updated files. In this lab there are approx. 10,000 files to be transferred, so the launching phase may take a moment or two before switching to the transferring state.
   - While it is going through these states, navigate through the performance, locations, options, filters and task logging tabs in the middle of the screen to verify the parameters you have configured and view the outputs available
9. When the **Execution status** shows a status of **Success**, your data transfer has completed.
   - Take note of the **Duration** of the data transfer, and also of the **Data throughput** values. How do they compare with the values you achieved using the S3 copy script in the previous section?
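**Note:** For reference, the same source location, destination location and task could be created with the AWS CLI. This is only a sketch; the values in `<...>` are placeholders you would substitute with your own ARNs, IPs and bucket name.

```bash
# Source: the NFS export served by the file gateway instance
aws datasync create-location-nfs \
  --server-hostname <File-Gateway-Instance-Private-IP> \
  --subdirectory /<Source-S3-Bucket> \
  --on-prem-config AgentArns=<datasync-agent-arn>

# Destination: the datasync-copy prefix of the target bucket
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::<Target-S3-Bucket> \
  --subdirectory /datasync-copy \
  --s3-config BucketAccessRoleArn=<datasync-s3-access-role-arn>

# Create and start the task
aws datasync create-task \
  --source-location-arn <nfs-location-arn> \
  --destination-location-arn <s3-location-arn> \
  --name NFS-to-S3-transfer-10K-small-files

aws datasync start-task-execution --task-arn <task-arn>
```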

**VERIFY DATA TRANSFERRED USING BOTH METHODS**
----------------------------------------------

Let’s view the data copied across from the local NFS share to our target Amazon S3 bucket.

1. From the AWS console, click **Services** at the top of the screen and type & select **S3**
   - Click your **Target-S3-bucket** name
   - Check the box next to the folder labelled **s3-cli-copy**
   - Click on **Actions** → **Get total size**
   - Note the total objects copied to your S3 bucket via the S3 copy script
   - Click **Cancel** when done viewing
   - Check the box next to the folder labelled **datasync-copy**
   - Click on **Actions** → **Get total size**
   - Note the total objects copied to your S3 bucket via DataSync
   - Click **Cancel** when done viewing
   - Click on the folder name **datasync-copy** to go into the directory/prefix
   - Click on **Appdata**
   - Click on the box to the left of **saturn.gif**
   - From the right hand pop-up window, under the properties section click on **Metadata**
   - This shows the metadata that DataSync added to the object it copied across to S3. In the next section we will see the value of this metadata in helping to avoid re-factoring applications that access files based on user/group permissions.
   - Click **Cancel** to continue
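**Optional:** You can also get the object counts from the command line. This sketch assumes you run it from a session with list permissions on the bucket (the instance policy configured earlier only grants object-level access); replace the bucket name with your **Target-S3-Bucket**.

```bash
# Summarise object count and total size under each prefix
aws s3 ls s3://stg316-target-citizenj/s3-cli-copy/ --recursive --summarize | tail -2
aws s3 ls s3://stg316-target-citizenj/datasync-copy/ --recursive --summarize | tail -2
```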

**COMPARE METADATA: SCRIPT VS DATASYNC**
------------------------------------------------
Now let’s create a File Gateway NFS share and point it to the target Amazon S3 bucket we transferred data to, so that we can easily visualise the data from the view of an NFS file share. Then we will compare the file metadata that was copied across using the two different methods.

**Create NFS Share**

1. From the AWS console, at the top of the screen, click **Services** and type & select **Storage Gateway**
2. On the left hand pane of the AWS Storage Gateway console, select **File shares**
3. Select **Create file share** from the top menu
4. Enter the name of your **Target-S3-bucket** in the **Amazon S3 bucket name** field
5. Select **Network File System (NFS)**
6. Select the **File Gateway** you deployed earlier (STG316-filegateway)
7. Click **Next**
8. Leave all defaults and select **Next**
9. On the next page, click the **Edit** value next to **Allowed clients**
   - Remove the existing **0.0.0.0/0** value and replace it with **192.168.0.0/16**
   - Then click the **Close** button on the right of the screen for Allowed clients
10. Click the **Edit** value next to **Mount options**
    - Select “**No root squash**” for Squash level
    - Leave export as **read-write**
    - Then click the **Close** button on the right of the screen for Mount options
11. Scroll to the bottom of the page and click **Create file share**
12. You will be taken to the **File share** page. Click on the **refresh icon** in the top right hand corner until the status of the file share changes from **Creating** to **Available**, before proceeding to the next steps
13. On the same File share page, check the box next to the name of your **File share ID**
14. In the details pane below, copy the command for mounting **On Linux** into your **workshop.txt** as the value of **Second-NFS-FileShare-mount-command**
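**Note:** The same NFS file share could also be created with the AWS CLI. This is only a sketch; the gateway and role ARNs in `<...>` are placeholders you would substitute with your own values.

```bash
# Sketch only: create an NFS file share on the existing file gateway
aws storagegateway create-nfs-file-share \
  --client-token "$(uuidgen)" \
  --gateway-arn <STG316-filegateway-arn> \
  --role <storage-gateway-s3-access-role-arn> \
  --location-arn arn:aws:s3:::<Target-S3-Bucket> \
  --client-list 192.168.0.0/16 \
  --squash NoSquash \
  --region us-west-2
```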

**Mount NFS Share**

1. Navigate back to your SSH session and run the following command: `sudo su`
2. Next, copy the NFS mount command you noted down in your workshop.txt for **Second-NFS-FileShare-mount-command**, replace the **[MountPath]** value at the end with **/nfs_target**, enter the entire command into the SSH session, and hit Enter
   - *i.e. mount -t nfs -o nolock,hard 192.168.10.12:/stg316-target-citizenj /nfs_target*
3. Run the below command to verify you have the NFS mount point **/nfs_target** showing in the list: `df -h`

**COMPARE TRANSFER METHODS: SCRIPT VS DATASYNC**
------------------------------------------------

Now let’s view the files that the S3 copy script & DataSync agent copied across, from a file share point of view, to see how the metadata translates to the stored files and which attributes were preserved.

1. From the AWS console, at the top of the screen, click **Services** and type & select **Storage Gateway**
   - From the left hand pane select **File shares**
   - Check the box next to the file share ID which shows your **Target-S3-Bucket** name in the S3 Bucket column
   - From the top menu select **Action** → **Refresh Cache**, then select **Start**
2. **View original data:** Run the below command in your SSH session to view the original time stamp & permissions for the file **saturn.gif** located on /nfs_source: `ls -ltr /nfs_source/appdata`
3. **View S3 copy script data that was transferred:** Run the below command to view the time-stamp, user/group & permission attributes for the file **saturn.gif** copied via the S3 CLI copy script: `ls -ltr /nfs_target/s3-cli-copy/appdata`
   - Do the time-stamps, user/group & permission values differ from the original data?
4. **View DataSync copied data:** Run the below command to view the time-stamp, user/group & permission values for the file **saturn.gif** copied via DataSync: `ls -ltr /nfs_target/datasync-copy/appdata`
   - Do the time-stamps, user/group & permission values differ from the original data? Were they preserved?

**Note:** Notice how the saturn.gif file that was copied across using DataSync retained the same timestamp & permissions (r-r-r & user9:appadmin) as the original source file, unlike the data copied across via the script.
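**Optional:** If you want to compare all three copies in one pass, a simple loop over `stat` works too. This is just a sketch; run it in the root SSH session used above.

```bash
# Print ownership, permissions and timestamps for each copy of saturn.gif
for f in /nfs_source/appdata/saturn.gif \
         /nfs_target/s3-cli-copy/appdata/saturn.gif \
         /nfs_target/datasync-copy/appdata/saturn.gif; do
  echo "== $f =="
  stat "$f"
done
```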

**SUMMARY**
-----------

In module 3, you gained hands-on experience deploying and configuring AWS DataSync to simplify, automate and accelerate the transfer of data (in this case 10,000 very small files) to Amazon S3, compared with scripting it. AWS DataSync also copied across the file metadata, so you could re-access the same objects again via File Gateway as files with their permissions & timestamps intact. AWS DataSync included data transfer verification, and didn’t require any scripting knowledge or performance tuning to achieve faster data transfers.

**END OF MODULE 3**
-------------------

Click here to go to [module 4](/module4/README.md)