AWS DataSync Architecture
Difficulty: Intermediate
Duration: 14m
Description

This course explores the AWS DataSync service, which focuses on transferring data from on-premises storage into AWS storage services, in addition to transferring data between AWS storage services.

Learning objectives

  • Define what AWS DataSync is and what it's used for
  • Understand AWS DataSync use cases
  • Define the architecture and process of using AWS DataSync to transfer data from on-premises to AWS
  • Define the architecture and process of using AWS DataSync to transfer data between different AWS storage services

Intended Audience

This course has been designed to assist those who are responsible for managing and maintaining data and storage solutions. It would also be advantageous to anyone who is looking to take an AWS associate level exam.

Prerequisites

To get the most out of this course, it would be beneficial to have a basic awareness of the different available AWS Storage Services, including Amazon S3, EFS, and FSx for Windows Server.

Transcript

To understand how AWS DataSync works at an operational level, we first need to understand how the architecture of the service is put together and the different components involved in carrying out a DataSync task.

This table shows the current source and destination locations that DataSync can transfer files from and to.

And so based on this I want to look at the architecture of DataSync from two different perspectives, these being:

  • When transferring data from your own managed storage environment to AWS
  • And secondly when transferring data between 2 different AWS storage services, such as Amazon S3 to Amazon EFS

So, let’s start with the first scenario, where we have our own self-managed storage solution on-premises and we need to use AWS DataSync to move this data into Amazon S3. So what’s involved?

When transferring data from on-premises, we need to configure an Agent, a Location and a Task.

The Agent is used on the customer side, so it sits outside of AWS. It is simply a virtual machine that runs on VMware ESXi, KVM or Microsoft Hyper-V hypervisors, so it should be compatible with your existing infrastructure.  The agent is used to both read data from and write data to your own storage solution, and it can be generated, configured and downloaded from within the AWS Management Console.

Next we have a location, and the location identifies the endpoint of a DataSync task.  As a result, every time you create a DataSync task you will need to specify the source location and the destination location, dictating where you want to move data from and to.  You can create locations for:

  • Network File Systems (NFS)
  • Server Message Block (SMB)
  • Self-managed object storage
  • Amazon EFS
  • Amazon FSx for Windows File Server
  • Amazon S3

Again, these locations can be configured from within the AWS Management Console.
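
Locations can also be created programmatically. As a minimal sketch, assuming the boto3 SDK and an agent that has already been activated, the request parameters for an NFS source location and an Amazon S3 destination location could look like the following. Every ARN, hostname and path here is a made-up placeholder, and the helper function names are hypothetical.

```python
# Sketch only: every ARN, hostname and path below is a made-up placeholder.
AGENT_ARN = "arn:aws:datasync:us-east-1:111122223333:agent/agent-0example"


def nfs_source_request(hostname, export_path, agent_arn):
    """Build keyword arguments for the DataSync create_location_nfs call."""
    return {
        "ServerHostname": hostname,
        "Subdirectory": export_path,
        # On-premises locations are reached through the DataSync agent.
        "OnPremConfig": {"AgentArns": [agent_arn]},
    }


def s3_destination_request(bucket_arn, prefix, access_role_arn):
    """Build keyword arguments for the DataSync create_location_s3 call."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        # IAM role that DataSync assumes to access the bucket.
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


source = nfs_source_request("nas01.example.internal", "/exports/projects", AGENT_ARN)
destination = s3_destination_request(
    "arn:aws:s3:::example-datasync-bucket",
    "/projects",
    "arn:aws:iam::111122223333:role/ExampleDataSyncS3Access",
)

# With AWS credentials configured, these would be sent roughly as:
#   client = boto3.client("datasync")
#   source_arn = client.create_location_nfs(**source)["LocationArn"]
#   destination_arn = client.create_location_s3(**destination)["LocationArn"]
print(source)
print(destination)
```

The sketch only builds the request dictionaries, so it can be run without an AWS account; the commented lines show where the real calls would go.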

The task contains the details of the operation that you are trying to perform with DataSync, so it will contain the locations that were created and specified for both the source and destination, in addition to the configuration and conditions of how the data transfer will take place. 

For example, you can configure the type of integrity and data verification checks to take place, and whether you want to transfer all data in the source location or just data that has changed since the last task was performed.  You can also specify if you want to overwrite or delete files.

If you only want to transfer specific files from the source, then you can apply pattern filters enabling you to restrict which files to include or exclude from the transfer in the source location.  
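
To make these task settings more concrete, here is a rough sketch of how they might map onto a boto3 create_task request. It assumes the two location ARNs already exist; the function name, task name and patterns are hypothetical, and the option values shown are just one possible choice.

```python
def build_task_request(source_arn, destination_arn, exclude_patterns=(), changed_only=True):
    """Build keyword arguments for the DataSync create_task call (sketch only)."""
    request = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": "example-onprem-to-s3",  # hypothetical task name
        "Options": {
            # Verify only the files that were transferred in this run.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
            # Copy only changed data, or everything in the source.
            "TransferMode": "CHANGED" if changed_only else "ALL",
            # Overwrite destination files that differ from the source.
            "OverwriteMode": "ALWAYS",
            # Keep destination files that no longer exist in the source.
            "PreserveDeletedFiles": "PRESERVE",
        },
    }
    if exclude_patterns:
        # DataSync takes one filter string with patterns separated by "|".
        request["Excludes"] = [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(exclude_patterns)}
        ]
    return request


task_request = build_task_request(
    "placeholder-source-location-arn",
    "placeholder-destination-location-arn",
    exclude_patterns=["*.tmp", "/scratch"],
)
print(task_request["Options"])
print(task_request["Excludes"])
```

Include filters would follow the same shape under an "Includes" key. The dictionary could then be passed to the client as create_task(**task_request).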

Finally, you can also specify logging details which integrate with Amazon CloudWatch Logs to help you identify any failures or errors.  For more information on Amazon CloudWatch, please see our existing courses here: 

An overview of Amazon CloudWatch: https://cloudacademy.com/course/an-overview-of-amazon-cloudwatch-1222/?context_resource=lp&context_id=40

So the task essentially outlines exactly what will happen during the data transfer process.

From an architecture perspective, the process looks as shown here.

We have the on-premises server holding our storage data, with the AWS DataSync agent installed as a virtual machine.  Two locations would have been created, with the source pointing to the on-premises server and the destination to an S3 bucket.  The task would then be configured to transfer the data according to the settings configured within the task, and when it runs, AWS DataSync will transfer the data to the destination Amazon S3 bucket, encrypted in transit using TLS.  All logging information would then be stored in Amazon CloudWatch if configured to do so.
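
The order of operations in this flow can be sketched in code. The example below assumes the boto3 DataSync client interface, but it uses a small hypothetical RecordingClient stand-in so that it runs without contacting AWS. All ARNs, hostnames and returned values are placeholders.

```python
class RecordingClient:
    """Hypothetical stand-in for boto3.client("datasync"); records calls only."""

    def __init__(self):
        self.calls = []

    def _record(self, operation, response, kwargs):
        self.calls.append((operation, kwargs))
        return response

    def create_location_nfs(self, **kwargs):
        return self._record("create_location_nfs", {"LocationArn": "placeholder-source-arn"}, kwargs)

    def create_location_s3(self, **kwargs):
        return self._record("create_location_s3", {"LocationArn": "placeholder-destination-arn"}, kwargs)

    def create_task(self, **kwargs):
        return self._record("create_task", {"TaskArn": "placeholder-task-arn"}, kwargs)

    def start_task_execution(self, **kwargs):
        return self._record("start_task_execution", {"TaskExecutionArn": "placeholder-execution-arn"}, kwargs)


def transfer_on_prem_to_s3(client, agent_arn, bucket_arn, role_arn, log_group_arn):
    """Create both locations and a task, then start the task."""
    source = client.create_location_nfs(
        ServerHostname="nas01.example.internal",  # placeholder on-premises server
        Subdirectory="/exports/projects",
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    destination = client.create_location_s3(
        S3BucketArn=bucket_arn,
        Subdirectory="/projects",
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        CloudWatchLogGroupArn=log_group_arn,  # where task logs are sent
        Options={"LogLevel": "BASIC"},
    )
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]


client = RecordingClient()
execution_arn = transfer_on_prem_to_s3(
    client,
    agent_arn="placeholder-agent-arn",
    bucket_arn="arn:aws:s3:::example-datasync-bucket",
    role_arn="placeholder-role-arn",
    log_group_arn="placeholder-log-group-arn",
)
for operation, _ in client.calls:
    print(operation)
```

In a real environment, the same function could be given boto3.client("datasync") instead of the stand-in, provided the agent has been activated and the IAM role and log group exist.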

Before I move on, I just want to highlight that DataSync tasks will only copy your storage data; they do not include any file system permissions or settings.

Let me now quickly explain how the process works if you were to transfer data from one AWS storage service to another, for example, from Amazon S3 to Amazon EFS.

This time, let’s look at the infrastructure to begin with.

As you can see, in this process we do not use the Agent. However, we still create 2 locations: a source location for Amazon S3 and a destination location for EFS.  In addition to these locations, we also have to create a Task.  So the process remains very similar to transferring data from on-premises, except that we don’t need the DataSync agent. 
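
As a hedged illustration of the difference, the location requests for an S3 to EFS transfer might be built as follows, assuming the boto3 create_location_s3 and create_location_efs parameters. The helper names and all ARNs are placeholders. Note that neither request refers to an agent.

```python
def s3_source_request(bucket_arn, access_role_arn, prefix="/"):
    """Build keyword arguments for create_location_s3 (no agent involved)."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


def efs_destination_request(filesystem_arn, subnet_arn, security_group_arns, path="/"):
    """Build keyword arguments for create_location_efs (no agent involved)."""
    return {
        "EfsFilesystemArn": filesystem_arn,
        "Subdirectory": path,
        # Subnet and security groups DataSync uses to reach the file system.
        "Ec2Config": {
            "SubnetArn": subnet_arn,
            "SecurityGroupArns": list(security_group_arns),
        },
    }


# Placeholder ARNs for illustration only.
s3_source = s3_source_request(
    "arn:aws:s3:::example-source-bucket",
    "arn:aws:iam::111122223333:role/ExampleDataSyncS3Access",
)
efs_destination = efs_destination_request(
    "arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-0example",
    "arn:aws:ec2:us-east-1:111122223333:subnet/subnet-0example",
    ["arn:aws:ec2:us-east-1:111122223333:security-group/sg-0example"],
)
print(s3_source)
print(efs_destination)
```

The task itself would then be created from the two resulting location ARNs in the same way as for an on-premises transfer.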

Again, it’s important to note that DataSync will not copy any configuration relating to the source storage option. For example, if you were to copy from one S3 bucket to another S3 bucket, it would only move the data; it would not copy any bucket-level settings or permissions.


About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.