This course covers the core learning objectives required to meet the 'Designing Network & Data Transfer solutions in AWS - Level 2' skill
Learning Objectives:
- Understand the most appropriate AWS connectivity options to meet performance demands
- Understand the appropriate features and services to enhance and optimize connectivity to AWS public services such as Amazon S3 or Amazon DynamoDB
- Understand the appropriate AWS data transfer service for migration and/or ingestion
- Apply an edge caching strategy to provide performance benefits for AWS solutions
To understand how AWS DataSync works at an operational level, we first need to understand how the architecture of the service is put together and the different components involved in carrying out a DataSync task.
This table shows the current source and destination locations that DataSync can transfer files from and to.
And so based on this I want to look at the architecture of DataSync from two different perspectives, these being:
- When transferring data from your own managed storage environment to AWS
- And secondly, when transferring data between two different AWS storage services, such as Amazon S3 to Amazon EFS
So, let’s begin with the first scenario, whereby we have our own self-managed storage solution on-premises and we need to use AWS DataSync to move this data into Amazon S3. So what’s involved?
When performing a data transfer from on-premises, we need to configure an Agent, a Location, and a Task.
The Agent is used on the customer side, so it sits outside of AWS, and it's simply a virtual machine supported by the VMware ESXi, KVM, and Microsoft Hyper-V hypervisors, so it should be compatible with your existing infrastructure. The agent itself is used to both read and write data to your own storage solution, and it can be generated and configured from within the AWS Management Console and then downloaded.
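As a hedged illustration only (none of the values below come from the course), registering a deployed agent with the DataSync service using the boto3 SDK might look something like this, assuming you've already retrieved the activation key from the agent VM:

```python
import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

# Register the on-premises agent VM with the DataSync service.
# The activation key below is a placeholder; in practice it comes
# from the deployed agent appliance itself.
response = datasync.create_agent(
    ActivationKey="EXAMP-LEKEY-12345-67890-ABCDE",
    AgentName="on-prem-storage-agent",
)
agent_arn = response["AgentArn"]
print(f"Agent registered: {agent_arn}")
```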
Next we have a location, and the location identifies the endpoint of a DataSync task. As a result, every time you create a DataSync task you will need to specify the source location and the destination location, dictating where you want to move data from and to. You can create locations for:
- Network File System (NFS)
- Server Message Block (SMB)
- Self-managed object storage
- Amazon EFS
- Amazon FSx for Windows File Server
- Amazon S3
Again, these locations can be configured from within the AWS Management Console.
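To sketch what that looks like programmatically, here is a minimal boto3 example creating an NFS source location and an S3 destination location; the hostname, ARNs, and IAM role are all placeholder values, not ones from the course:

```python
import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

# Placeholder ARN of the agent registered earlier.
agent_arn = "arn:aws:datasync:us-east-1:111122223333:agent/agent-0example"

# Source location: an on-premises NFS share, read via the agent.
nfs_location = datasync.create_location_nfs(
    ServerHostname="nfs.example.internal",
    Subdirectory="/exports/data",
    OnPremConfig={"AgentArns": [agent_arn]},
)

# Destination location: an S3 bucket, accessed through an IAM role
# that grants DataSync permission to write objects to the bucket.
s3_location = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::example-destination-bucket",
    Subdirectory="/incoming",
    S3Config={"BucketAccessRoleArn": "arn:aws:iam::111122223333:role/DataSyncS3AccessRole"},
)
```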
The task contains the details of the operation that you are trying to carry out with DataSync, so it will contain the locations that were created and specified for both the source and destination, in addition to the configuration and conditions of how the data transfer will take place.
For example, you can configure the type of data integrity and verification checks to take place, and whether you want to transfer all data in the source location or just data that has changed since the last task was performed. You can also specify if you want to overwrite or delete files.
If you only want to transfer specific files from the source, then you can apply pattern filters enabling you to restrict which files to include or exclude from the transfer in the source location.
Finally, you can also specify logging details which integrate with Amazon CloudWatch Logs to help you identify any failures or errors. For more information on Amazon CloudWatch, please see our existing courses here:
An overview of Amazon CloudWatch: https://cloudacademy.com/course/an-overview-of-amazon-cloudwatch-1222/?context_resource=lp&context_id=40
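Bringing those task settings together, a hedged boto3 sketch of creating a task with verification, incremental transfer, an exclude filter, and CloudWatch logging might look as follows (every ARN here is a placeholder):

```python
import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

task = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:us-east-1:111122223333:location/loc-0source",
    DestinationLocationArn="arn:aws:datasync:us-east-1:111122223333:location/loc-0dest",
    Name="onprem-to-s3-transfer",
    Options={
        "VerifyMode": "ONLY_FILES_TRANSFERRED",  # integrity check on transferred files
        "TransferMode": "CHANGED",               # only data changed since the last run
        "OverwriteMode": "ALWAYS",               # overwrite existing destination files
        "PreserveDeletedFiles": "PRESERVE",      # keep destination files deleted at source
    },
    # Exclude filter: skip temp files and a scratch directory.
    Excludes=[{"FilterType": "SIMPLE_PATTERN", "Value": "*.tmp|/scratch"}],
    # Send task logs to an existing CloudWatch log group (placeholder ARN).
    CloudWatchLogGroupArn="arn:aws:logs:us-east-1:111122223333:log-group:/datasync/tasks",
)
print(f"Task created: {task['TaskArn']}")
```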
So the task essentially outlines exactly what will happen during the data transfer process.
From an architecture perspective, the process looks as shown here.
We have the on-premises server holding our storage data, with the AWS DataSync agent installed as a virtual machine. Two locations would have been created, with the source pointing to the on-premises server and the destination to an S3 bucket. The task would then be configured to transfer the data conforming to the settings configured within the task, and when it runs, AWS DataSync will transfer the data using encryption in transit over TLS to the Amazon S3 destination bucket. All logging information is then stored in Amazon CloudWatch, if configured to do so.
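To make that flow concrete, a minimal sketch of running the task and polling its status with boto3 could look like this (the task ARN is a placeholder):

```python
import time

import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

# Start an execution of the previously created task; DataSync
# encrypts the data in transit over TLS while it runs.
execution = datasync.start_task_execution(
    TaskArn="arn:aws:datasync:us-east-1:111122223333:task/task-0example"
)
execution_arn = execution["TaskExecutionArn"]

# Poll until the transfer completes or fails.
while True:
    status = datasync.describe_task_execution(
        TaskExecutionArn=execution_arn
    )["Status"]
    print(f"Execution status: {status}")
    if status in ("SUCCESS", "ERROR"):
        break
    time.sleep(30)
```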
Before I move on, I just want to highlight that a DataSync task will only copy your storage data; it doesn’t include any file system permissions or settings.
Let me now quickly explain how the process works if you were to transfer data from one AWS storage service to another, for example, from Amazon S3 to Amazon EFS.
This time, let’s look at the infrastructure to begin with.
As you can see, in this process we do not use the Agent; however, we still need to create two locations, a source location for Amazon S3 and a destination location for Amazon EFS. In addition to these locations, we will also have to create a Task. So the process remains very similar to transferring data from on-premises, except that we don’t need to use the DataSync agent.
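As a sketch of this AWS-to-AWS case, the only new piece is the Amazon EFS location, which is defined by a subnet and security group that can reach the file system's mount targets; all ARNs below are placeholders:

```python
import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

# Destination location: an EFS file system, reached via a subnet
# and security group with access to its mount targets.
efs_location = datasync.create_location_efs(
    EfsFilesystemArn="arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-0example",
    Ec2Config={
        "SubnetArn": "arn:aws:ec2:us-east-1:111122223333:subnet/subnet-0example",
        "SecurityGroupArns": [
            "arn:aws:ec2:us-east-1:111122223333:security-group/sg-0example"
        ],
    },
)
# The S3 source location and the task itself are created exactly
# as in the on-premises example, with no agent required.
```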
Again, it’s important to note that DataSync will not copy any configuration relating to the source storage option. For example, if you were to copy from one S3 bucket to another S3 bucket, it would only move the data; it would not copy any bucket-level settings or permissions.
Lectures
- Introduction
- What is AWS DataSync?
- AWS DataSync Use Cases
- Summary
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to cloud computing, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.