In this course, we explain how to set up a hybrid storage solution using AWS Storage Gateway.
Learning Objectives
- AWS Storage Gateway and its architecture
- Which type of storage gateway to use for your use case
- How to create a storage gateway using the EC2 platform
Intended Audience
- Those who are looking to implement a hybrid storage solution
- Those who have an interest in AWS Storage Gateway and may need a bit more information on how it works
Prerequisites
- You should have a strong understanding of storage in AWS, including knowledge of Amazon S3, Amazon FSx, Amazon EBS, and Amazon Glacier
- Familiarity with Amazon EC2 will help as well
- For more information on these services, check out existing content here:
When you create a storage gateway in the console, you choose between four different types: an Amazon S3 File Gateway, an Amazon FSx File Gateway, a Tape Gateway, and a Volume Gateway.
The gateway you choose depends on your use case. In this lecture, I’ll talk about a few common Storage Gateway scenarios and the type of gateways that are suitable for each scenario.
The first scenario is a backup and archiving use case. A company wants to eventually close its data centers but must keep a few critical systems, such as its Hadoop clusters and SQL databases, running on-premises due to latency concerns. Backing up these systems using traditional infrastructure is expensive, so they’ve decided to consider using Storage Gateway to back up their data to the cloud instead.
So which gateway type would they use? Well, all gateway types are suitable for backup and archiving use cases, so we’ll have to look at the types of data they’re backing up here. In this scenario, they’re not talking about physical or virtual tapes, so tape gateway is not the best option. Instead, they’re talking about Hadoop and SQL database files and logs, so the best option is either the S3 File Gateway or a Volume Gateway.
If the company uses S3 File Gateway, they can back up their data to Amazon S3. Here’s how it works. Once they deploy and activate the gateway, they create an NFS or SMB file share and specify which S3 bucket to associate with it. Once the file share is created, they can mount it on an application or database server and read and write to the share. Each file written to the share maps 1:1 to an object in the S3 bucket.
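As a rough sketch of the file share step, the snippet below builds the parameters for Storage Gateway's CreateNFSFileShare operation. The ARNs, bucket name, and client CIDR are hypothetical placeholders, and the helper function is an illustration rather than part of any AWS library. The actual API call is left as a comment because it needs boto3, credentials, and an activated gateway.

```python
import uuid


def build_nfs_file_share_request(gateway_arn, role_arn, bucket_name, client_cidrs):
    """Build keyword arguments for the CreateNFSFileShare operation."""
    return {
        "ClientToken": str(uuid.uuid4()),  # idempotency token for the request
        "GatewayARN": gateway_arn,
        "Role": role_arn,  # IAM role the gateway assumes to reach the bucket
        "LocationARN": f"arn:aws:s3:::{bucket_name}",  # bucket backing the share
        "ClientList": list(client_cidrs),  # NFS clients allowed to mount the share
        "DefaultStorageClass": "S3_STANDARD",
    }


# All values below are placeholders, not real resources.
params = build_nfs_file_share_request(
    gateway_arn="arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-12A3456B",
    role_arn="arn:aws:iam::111122223333:role/ExampleFileGatewayRole",
    bucket_name="example-backup-bucket",
    client_cidrs=["10.0.0.0/24"],
)

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("storagegateway").create_nfs_file_share(**params)
```

An SMB share follows the same pattern through the CreateSMBFileShare operation, with authentication settings in place of the NFS client list.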
This means they can access their data directly in S3 and make use of native S3 controls, such as S3 Object Lock, S3 Versioning, Cross-Region Replication, and S3 Lifecycle configurations that move data into lower-cost storage classes such as S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive. By using lifecycle configurations, the company can drastically reduce S3 storage costs overall.
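To make the lifecycle idea concrete, here is a minimal sketch of a lifecycle configuration in the shape that S3's PutBucketLifecycleConfiguration operation accepts. The rule ID and the 30-day and 180-day thresholds are assumptions chosen for illustration, not values from the lecture. In the S3 API, the storage class `GLACIER` corresponds to S3 Glacier Flexible Retrieval and `DEEP_ARCHIVE` to S3 Glacier Deep Archive.

```python
def build_lifecycle_configuration(glacier_after_days=30, deep_archive_after_days=180):
    """Return a lifecycle configuration that tiers backups into Glacier classes."""
    if deep_archive_after_days <= glacier_after_days:
        raise ValueError("Deep Archive transition must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "archive-gateway-backups",  # hypothetical rule name
                "Filter": {"Prefix": ""},  # empty prefix applies to every object
                "Status": "Enabled",
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


lifecycle = build_lifecycle_configuration()

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-backup-bucket",  # placeholder bucket name
#       LifecycleConfiguration=lifecycle,
#   )
```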
The S3 File Gateway can also reduce latency, which is a concern for the company in this scenario. It uses a local cache to provide low-latency access to recently used data. You can size this cache to roughly match your active working set. So if 1 TB of data gets read from a network-attached storage (NAS) drive on a given day, you can size your cache to at least 1 TB to satisfy these reads. Keep in mind, for an S3 File Gateway, your cache can be up to 64 TB in size.
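The sizing rule above is simple arithmetic, sketched below. The 64 TB ceiling comes from the lecture; the optional headroom factor is an assumption added for illustration and is not an AWS recommendation.

```python
MAX_S3_FILE_GATEWAY_CACHE_TB = 64  # upper limit quoted in the lecture


def recommend_cache_size_tb(daily_working_set_tb, headroom=0.0):
    """Size the cache to the active working set, capped at the gateway maximum.

    headroom is a hypothetical safety margin (0.2 means 20% extra).
    """
    if daily_working_set_tb <= 0:
        raise ValueError("working set must be positive")
    wanted = daily_working_set_tb * (1 + headroom)
    return min(wanted, MAX_S3_FILE_GATEWAY_CACHE_TB)


# The lecture's example: 1 TB read per day needs at least a 1 TB cache.
print(recommend_cache_size_tb(1))
# A working set larger than the limit is capped at 64 TB.
print(recommend_cache_size_tb(100))
```

If the working set exceeds the cap, reads that miss the cache are served from S3 instead, so those requests see higher latency.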
Overall, the S3 File Gateway allows this company to back up their SQL database and Hadoop cluster data into S3 and gain access to S3 native controls.
Now theoretically, the company could also use FSx File Gateway here. FSx File Gateway is often used when companies need to replace on-premises file share infrastructure but find it challenging because of high latency and limited bandwidth. So they turn to FSx as a cost-effective alternative.
The way it works is you deploy the gateway. You then configure the gateway to join your Microsoft Active Directory domain. After that, you attach an Amazon FSx for Windows File Server file system to the gateway, and you can then mount its SMB file shares on a client to read and write to.
Communication between your on-premises network and AWS requires an AWS Direct Connect or VPN connection.
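The FSx File Gateway steps above can be sketched as an ordered list of Storage Gateway operations: JoinDomain, then AssociateFileSystem. The domain name, user names, and ARNs are hypothetical, and the helper is only an illustration. Credentials are passed in as arguments so nothing sensitive is hard-coded.

```python
import uuid


def plan_fsx_gateway_setup(gateway_arn, domain_name, domain_user, domain_password,
                           fsx_file_system_arn, fsx_user, fsx_password):
    """Return (operation name, parameters) pairs in the order they would run."""
    join_domain = {
        "GatewayARN": gateway_arn,
        "DomainName": domain_name,
        "UserName": domain_user,
        "Password": domain_password,
    }
    associate_file_system = {
        "UserName": fsx_user,
        "Password": fsx_password,
        "ClientToken": str(uuid.uuid4()),  # idempotency token for the request
        "GatewayARN": gateway_arn,
        "LocationARN": fsx_file_system_arn,  # the FSx file system to attach
    }
    return [("join_domain", join_domain),
            ("associate_file_system", associate_file_system)]


# Placeholder values only; real secrets should come from a secrets manager.
steps = plan_fsx_gateway_setup(
    gateway_arn="arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-12A3456B",
    domain_name="corp.example.com",
    domain_user="gateway-admin",
    domain_password="<from-secrets-manager>",
    fsx_file_system_arn="arn:aws:fsx:us-east-1:111122223333:file-system/fs-0123456789abcdef0",
    fsx_user="fsx-service-user",
    fsx_password="<from-secrets-manager>",
)

# With boto3 installed and credentials configured, each step would run as:
#   import boto3
#   client = boto3.client("storagegateway")
#   for operation, kwargs in steps:
#       getattr(client, operation)(**kwargs)
```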
Now the company’s decision to use this gateway really comes down to the controls they need. If they want native access to S3 object controls like versioning or lifecycle configurations, S3 File Gateway is the better option. If they need the native Microsoft Windows file system experience, backing up through FSx File Gateway would be the clear winner, as it has parity with common Windows file system features, such as shadow copies, application-consistent backup, data deduplication, and more.
There are other differences between FSx and S3 File Gateways as well. For example, they both provide caching for low latency, but these caches are optimized for different kinds of data. The S3 File Gateway cache is optimized for large files, images, or large database backups. The FSx File Gateway cache is optimized for small and mixed file workloads and office documents. The true benefit of FSx File Gateway is shared file systems, as you can have unlimited file shares with 500 active users per gateway. S3 File Gateway only supports 10 file shares with 100 active users per gateway.
So knowing that, if this use case were to mention data from multi-user interactive file shares or backing up a Microsoft Exchange server, then FSx File Gateway would be a closer fit for it.
This scenario would also work with a Volume Gateway, especially if the company wants to use their AWS backups for disaster recovery purposes. So let’s change the scenario a bit: Let’s say the company now wants to eventually migrate their SQL database to the cloud. To start this journey, the first thing they’ll do is store their database backups as EBS snapshots that they can later convert to EBS volumes to use with EC2.
In this use case, a Volume Gateway would be the perfect solution, as it provides iSCSI block-level storage for your on-premises applications. Here’s how it works.
After you create a Volume Gateway, you create individual storage volumes that your application can read and write data to. When it comes time to back up these volumes, you take snapshots of them, using the native Storage Gateway snapshot scheduler or AWS Backup.
Although these snapshots are technically stored in S3, they’re stored in a bucket that you do not control, so they aren’t treated like native S3 objects. That means you don’t get to control versioning, lifecycle configurations, or the other controls that S3 File Gateway would provide you. These snapshots can be restored as Storage Gateway volumes that your application can read and write to, or they can be restored as EBS volumes that you attach to EC2 instances.
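A minimal sketch of the snapshot-then-restore path follows. It builds parameters for Storage Gateway's CreateSnapshot operation and for EC2's CreateVolume operation. The volume ARN, snapshot ID, Availability Zone, and the `gp3` volume type are placeholders chosen for illustration.

```python
def build_snapshot_request(volume_arn, description):
    """Parameters for taking an on-demand snapshot of a gateway volume."""
    return {"VolumeARN": volume_arn, "SnapshotDescription": description}


def build_ebs_restore_request(snapshot_id, availability_zone, volume_type="gp3"):
    """Parameters for creating an EBS volume from that snapshot."""
    return {
        "SnapshotId": snapshot_id,
        "AvailabilityZone": availability_zone,  # must match the target EC2 instance
        "VolumeType": volume_type,
    }


snapshot_request = build_snapshot_request(
    volume_arn=("arn:aws:storagegateway:us-east-1:111122223333:"
                "gateway/sgw-12A3456B/volume/vol-1122AABB"),  # placeholder
    description="nightly SQL database backup",
)
restore_request = build_ebs_restore_request("snap-0123456789abcdef0", "us-east-1a")

# With boto3 installed and credentials configured, the calls would look like:
#   import boto3
#   snap = boto3.client("storagegateway").create_snapshot(**snapshot_request)
#   boto3.client("ec2").create_volume(
#       **build_ebs_restore_request(snap["SnapshotId"], "us-east-1a"))
```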
Volume gateway has two modes: cached volumes and stored volumes. Cached volumes enable you to store your data in S3 and then keep a copy of frequently accessed data locally. This is good for cost savings and minimizes the need to scale storage on-premises.
Stored volumes keep all your data locally and back it up to S3 asynchronously. So for this scenario, a stored volume gateway is probably the best option for the company, as they keep all their data on-premises and also have a full backup in the cloud that they can restore in a disaster recovery scenario.
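The difference between the two modes shows up in how a volume is created, as the sketch below suggests. A stored volume is mapped onto a local disk of the gateway (CreateStorediSCSIVolume), while a cached volume is declared by size because its primary copy lives in S3 (CreateCachediSCSIVolume). The disk ID, target names, and IP address are hypothetical placeholders.

```python
import uuid

GIB = 1024 ** 3


def build_stored_volume_request(gateway_arn, disk_id, target_name, gateway_ip,
                                preserve_existing_data=True):
    """Stored mode: all data stays on a local disk and is backed up to AWS."""
    return {
        "GatewayARN": gateway_arn,
        "DiskId": disk_id,  # local disk that holds the full data set
        "PreserveExistingData": preserve_existing_data,
        "TargetName": target_name,  # becomes part of the iSCSI target name
        "NetworkInterfaceId": gateway_ip,  # gateway IPv4 address exposing the target
    }


def build_cached_volume_request(gateway_arn, size_gib, target_name, gateway_ip):
    """Cached mode: primary data in S3, frequently accessed data cached locally."""
    return {
        "GatewayARN": gateway_arn,
        "VolumeSizeInBytes": size_gib * GIB,
        "TargetName": target_name,
        "NetworkInterfaceId": gateway_ip,
        "ClientToken": str(uuid.uuid4()),  # idempotency token for the request
    }


GATEWAY = "arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-12A3456B"
stored = build_stored_volume_request(
    GATEWAY, "pci-0000:03:00.0-scsi-0:0:0:0", "sqlbackups", "10.0.0.25")
cached = build_cached_volume_request(GATEWAY, 500, "sqlcache", "10.0.0.25")

# With boto3 installed and credentials configured, the calls would look like:
#   import boto3
#   client = boto3.client("storagegateway")
#   client.create_stored_iscsi_volume(**stored)
#   client.create_cached_iscsi_volume(**cached)
```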
Like the S3 File Gateway, a Volume Gateway in cached mode uses a local cache for frequently accessed data. Additionally, Volume Gateway compresses data in transit to AWS, which helps you save on bandwidth and storage costs.
Let’s move to the next scenario. In this scenario, a customer currently uses physical tapes to archive some of their more important data and stores it off-site. However, they find they’re having some challenges using physical tapes. It takes them time to order these tapes, and tapes can expire over time so restoring that data is challenging. The customer is looking to use AWS as a virtual tape library. Which gateway should they use?
The Tape Gateway is a good answer to this customer problem, as it enables you to create a virtual tape library that is stored on AWS. This virtual tape library serves as a drop-in replacement for your physical tape library, and stores your backups on virtual tapes instead of physical ones.
This concept of a “drop-in replacement” means that it integrates directly into your existing backup infrastructure, with support for backup software from vendors like Dell EMC, Commvault, IBM, and more. After you deploy and activate your Tape Gateway, you have to mount virtual tape drives and a media changer on your on-premises servers. The media changer loads and unloads the virtual tapes that you create into the virtual tape drives for read and write operations.
The virtual tape drives, the media changer, and the virtual tapes themselves are all presented to your on-premises servers over iSCSI. Once your backup software recognizes the devices, you can begin to write your data to the virtual tapes that you create. The Tape Gateway will then compress that data, encrypt it, and store it in a virtual tape library that’s backed by Amazon S3.
Once you no longer need immediate access to a tape, you can eject or export it from your backup software. The Tape Gateway will then archive it to either S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive. Thus you can stop maintaining physical tapes and still reliably access your archived data.
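As an illustration of creating virtual tapes, the sketch below builds parameters for Storage Gateway's CreateTapes operation. The `PoolId` determines which archive storage class a tape lands in once it is ejected. The tape count, size, and barcode prefix are assumptions, and for simplicity this helper only accepts the two default pools.

```python
import re
import uuid

GIB = 1024 ** 3
DEFAULT_POOLS = ("GLACIER", "DEEP_ARCHIVE")  # Flexible Retrieval, Deep Archive


def build_create_tapes_request(gateway_arn, num_tapes, tape_size_gib,
                               barcode_prefix, pool_id="GLACIER"):
    """Build keyword arguments for the CreateTapes operation."""
    if pool_id not in DEFAULT_POOLS:
        raise ValueError(f"pool_id must be one of {DEFAULT_POOLS}")
    if not re.fullmatch(r"[A-Z]{1,4}", barcode_prefix):
        raise ValueError("barcode prefix must be 1 to 4 uppercase letters")
    return {
        "GatewayARN": gateway_arn,
        "TapeSizeInBytes": tape_size_gib * GIB,
        "ClientToken": str(uuid.uuid4()),  # idempotency token for the request
        "NumTapesToCreate": num_tapes,
        "TapeBarcodePrefix": barcode_prefix,
        "PoolId": pool_id,  # archive tier used when the tape is ejected
    }


tapes = build_create_tapes_request(
    gateway_arn="arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-12A3456B",
    num_tapes=5,
    tape_size_gib=100,
    barcode_prefix="ARCH",
    pool_id="DEEP_ARCHIVE",
)

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("storagegateway").create_tapes(**tapes)
```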
In summary, there are a lot of ways to architect with Storage Gateway, and these are just a few scenarios that might help you determine which gateway type to use. In addition to this, here’s a rundown of when to use each gateway in general:
The S3 File Gateway is best used as a cost-effective way to back up and archive your data. It’s also a very popular ingest mechanism for data analytics, data lake, and machine learning workloads.
The FSx File Gateway is commonly used for multi-user interactive file sharing, such as group shares, project shares, home directories, and media editing.
Volume Gateways are ideal for workloads where you back up on-premises data to the cloud, migrate volumes to the cloud, or need disaster recovery.
And tape gateways are used when transitioning off of physical tapes, or backing up and archiving data.
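The rundown above can be condensed into a small decision helper. This is only a teaching sketch: the `data_kind` labels are made up for illustration, while the returned strings are the `GatewayType` values that Storage Gateway's ActivateGateway operation accepts. Real decisions will weigh more factors than this function does.

```python
def choose_gateway_type(data_kind, needs_full_local_copy=False):
    """Map a workload description to a Storage Gateway GatewayType value.

    data_kind is a hypothetical label:
      "s3_files"       files backed up or ingested as S3 objects
      "windows_shares" multi-user Windows (SMB) file shares
      "block_volumes"  iSCSI block storage for applications
      "tapes"          replacement for a physical tape library
    """
    if data_kind == "s3_files":
        return "FILE_S3"
    if data_kind == "windows_shares":
        return "FILE_FSX_SMB"
    if data_kind == "tapes":
        return "VTL"
    if data_kind == "block_volumes":
        # Stored mode keeps everything on-premises; cached mode keeps hot data only.
        return "STORED" if needs_full_local_copy else "CACHED"
    raise ValueError(f"unknown data_kind: {data_kind!r}")


# The lecture's scenarios, expressed with the hypothetical labels above.
print(choose_gateway_type("s3_files"))
print(choose_gateway_type("block_volumes", needs_full_local_copy=True))
print(choose_gateway_type("tapes"))
```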
That’s all for this one! See you next time.
Alana Layton is an experienced technical trainer, technical content developer, and cloud engineer living out of Seattle, Washington. Her career has included teaching about AWS all over the world, creating AWS content that is fun, and working in consulting. She currently holds six AWS certifications. Outside of Cloud Academy, you can find her testing her knowledge in bar trivia, reading, or training for a marathon.