AWS Storage Services
With an on-premises data backup solution within your data center, it’s critical for your business to have a disaster recovery plan built into your business continuity plans. You need to have a plan in place should a disaster occur that affects your operation of the business. The same is true when you start to leverage the cloud for its storage capabilities for your backed up data.
This course explains how cloud storage fits in with DR and the different considerations when preparing to design a solution to back up your on-premises data to AWS. It will explain how Amazon S3, AWS Snowball, and AWS Storage Gateway can all be used to help with the transfer and storage of your backup data.
You should not assume that backing data up to the cloud will solve your every need. There are many points to consider when planning a DR backup solution in a cloud such as AWS. However, it does also open up opportunities that may not have been possible with a standard on-premises backup solution. It’s these points of interest that many enterprises are focusing on to gain a significant advantage when it comes to disaster recovery.
AWS offers a number of different services to help you architect the best solution for your needs. To set up the solution that works for you, you must first understand how each of these services can be of benefit to you.
To help you implement effective solutions, you must first have answers to the following:
- What is your RTO (Recovery Time Objective)?
- What is your RPO (Recovery Point Objective)?
- How quickly do you need to retrieve your data?
- How much data do you need to import/export?
- What durability is required for your data?
- How sensitive is your data?
- What security mechanisms are required to protect your data?
- Do you have any compliance controls that you need to abide by?
When you have answers to these questions, you will be able to start working towards a cost-efficient, highly reliable, durable and secure data backup storage solution.
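As a rough illustration of how the first two questions turn into numbers, the sketch below checks a backup schedule against an RPO and an estimated restore time against an RTO. The class, function and field names are hypothetical, and it assumes that the worst-case data loss equals the interval between backups.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    rto_hours: float  # maximum tolerable time to restore service
    rpo_hours: float  # maximum tolerable window of lost data


def meets_rpo(objectives: RecoveryObjectives, backup_interval_hours: float) -> bool:
    # Worst case: the disaster strikes just before the next backup runs,
    # so up to one full backup interval of data is lost.
    return backup_interval_hours <= objectives.rpo_hours


def meets_rto(objectives: RecoveryObjectives, estimated_restore_hours: float) -> bool:
    # The restore estimate should include data retrieval and redeployment time.
    return estimated_restore_hours <= objectives.rto_hours


# Example values only: a 4 hour RTO and a 1 hour RPO.
objectives = RecoveryObjectives(rto_hours=4, rpo_hours=1)
print(meets_rpo(objectives, backup_interval_hours=0.5))  # True
print(meets_rto(objectives, estimated_restore_hours=6))  # False
```

In this example a backup every 30 minutes satisfies the RPO, but a 6 hour restore would miss the RTO, which would point towards a faster retrieval option.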
- Gain an understanding of how your storage solution can affect your business continuity and DR plans
- Learn when to use specific AWS storage solutions to your advantage, choosing between Amazon S3, Amazon Glacier, AWS Snowball, and AWS Storage Gateway
- Understand how each of these services can provide a DR solution to fit your specific needs
This course has been designed for:
- Engineers who need to manage and maintain AWS storage services
- Architects who are implementing effective data backup solutions from on-premise to AWS
- Business continuity managers
- Anyone looking to prepare for the AWS Solutions Architect - Professional certification
As a prerequisite to this course you should have a basic understanding of the following:
- Business continuity
- Disaster recovery
- Data backup terms and methodologies
- Amazon S3
- Amazon EC2
- Elastic Block Store (EBS)
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Hello, and welcome to this lecture. I want to explain when you may want to use AWS Storage Gateway along with the different options available when using this service.
Storage Gateway allows you to provide a gateway between your own data center's storage systems, such as your SAN, NAS or DAS, and Amazon S3 and Glacier on AWS.
The Storage Gateway itself is a software appliance that can be installed within your own data center, allowing integration between your on-premise storage and that of AWS. This connectivity allows you to scale your storage both securely and cost efficiently.
The software appliance can be downloaded from AWS as a virtual machine which can then be installed on your VMware or Microsoft hypervisors.
Storage Gateway offers different configurations and options allowing you to use the service to fit your needs. It offers file, volume and tape gateway configurations which you can use to help with your DR and data backup solutions.
Let me explain the differences between each of these configurations, starting with file gateways.
File gateways allow you to securely store your files as objects within S3. They act as a type of file share, allowing you to mount or map drives to an S3 bucket as if the share was held locally on your own corporate network. When storing files using the file gateway, they are sent to S3 over HTTPS and are also encrypted with S3's own server-side encryption, SSE-S3.
In addition to this, a local on-premise cache is also provisioned for your most recently accessed files to optimize latency, which also helps to reduce egress traffic costs. When your file gateway is first configured, you must associate it with your S3 bucket, which the gateway will then present as an NFS v3 or v4.1 file system to your internal applications.
This allows you to view the bucket as a normal NFS file system, making it easy to mount as a drive on Linux or map a drive to it in Microsoft Windows. Any files that are then written to these NFS file systems are stored in S3 as individual objects, with a one-to-one mapping of files to objects.
The second gateway configuration is the volume gateway, which can be configured in one of two different ways: stored volume gateways and cached volume gateways.
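To make the one-to-one mapping concrete, here is a minimal sketch of how a file path under a mounted share could correspond to an object key. It assumes the key is simply the file's path relative to the share root. The mount point `/mnt/backups` and the function name are hypothetical.

```python
from pathlib import PurePosixPath


def object_key_for(share_root: str, file_path: str) -> str:
    """Return the object key for a file written under the NFS share mount.

    Assumption for illustration: the key is the path relative to the share root.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(share_root))
    return str(relative)


# Hypothetical files written to a share mounted at /mnt/backups.
files = ["/mnt/backups/db/2024-01-01.dump", "/mnt/backups/logs/app.log"]
keys = {path: object_key_for("/mnt/backups", path) for path in files}

for path, key in keys.items():
    print(f"{path} -> {key}")
```

Each file produces exactly one key, so two files in the share never collapse into a single object.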
Let me explain stored volume gateways first.
Stored volume gateways are often used as a way to back up your local storage volumes to Amazon S3 whilst ensuring your entire data library also remains locally on-premise for very low latency data access. Volumes created and configured within the storage gateway are backed by Amazon S3 and are mounted as iSCSI devices that your applications can then communicate with.
During the volume creation, these are mapped to your on-premise storage devices, which can either hold existing data or be a new disk. As data is written to these iSCSI devices, the data is actually written to your local storage, such as your own NAS, SAN or DAS storage solution. However, the storage gateway then asynchronously copies this data to Amazon S3 as EBS snapshots.
Having your entire dataset remain locally ensures you have the lowest latency possible to access your data which may be required for specific applications or security compliance and governance controls whilst at the same time providing a backup solution which is governed by the same controls and security that S3 offers.
Volumes created can be between 1 GiB and 16 TB in size, and up to 32 stored volumes can be created for each storage gateway, giving you a maximum total of 512 TB of storage per gateway. Stored volume gateways also provide a buffer which uses your existing storage disks. This buffer is used as a staging point for data that is waiting to be written to S3.
During the upload process, the data is sent over an encrypted SSL channel and stored in an encrypted format within S3. To add to the management and backup of your data, storage gateway makes it easy to take snapshots of your storage volumes at any point, which are then stored as EBS snapshots on S3. It's worth pointing out that these snapshots are incremental, ensuring that only the data that's changed since the last backup is copied, helping to minimize storage costs on S3.
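The idea behind an incremental snapshot can be sketched in a few lines. This is an illustration of the general technique only, not how Storage Gateway implements it internally: the volume is split into fixed-size blocks, and only blocks whose content differs from the previous snapshot need to be copied. The block size and function names are assumptions chosen to keep the example small.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    # Hash each fixed-size block of the volume contents.
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> List[int]:
    # A block must be copied if it is new or its hash differs from last time.
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


previous = b"AAAABBBBCCCC"      # volume contents at the last snapshot
current = b"AAAAXXXXCCCCDDDD"   # block 1 modified, block 3 appended
print(changed_blocks(previous, current))  # [1, 3]
```

Only two of the four blocks would be copied here, which is where the storage saving comes from.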
As you can see, gateway stored volumes make recovering from a disaster very simple. For example, let's consider the scenario that you lost your local application and storage layers on-premise. Providing you had provisioned for such a situation, you may have AMI templates that mimic your application tier, which you could provision as EC2 instances within AWS. You could then attach EBS volumes to these instances, created from the storage gateway volume snapshots stored on S3, giving you access to the production data you require. Your application's storage infrastructure could potentially be up and running again in a matter of minutes within a VPC with connectivity from your on-premise data center.
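The recovery step described above, creating an EBS volume from a snapshot and attaching it to an EC2 instance, might look something like the following sketch. It assumes you are using the boto3 EC2 client and that the instance already exists in the chosen Availability Zone. The function name and the default device name are hypothetical choices for illustration.

```python
def restore_volume(ec2, snapshot_id: str, instance_id: str,
                   availability_zone: str, device: str = "/dev/sdf") -> str:
    """Create an EBS volume from a snapshot and attach it to an instance.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    The volume must be created in the same Availability Zone as the instance.
    """
    volume = ec2.create_volume(SnapshotId=snapshot_id,
                               AvailabilityZone=availability_zone)
    volume_id = volume["VolumeId"]

    # A volume has to reach the "available" state before it can be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

You would call this with something like `restore_volume(boto3.client("ec2"), snapshot_id, instance_id, availability_zone)`, supplying your own snapshot ID, instance ID and Availability Zone. Passing the client in as a parameter keeps the function easy to exercise against a stub before running it against a real account.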
Let me now compare this option to the second volume gateway option of cached volume gateways. Cached volume gateways differ from stored volume gateways in that the primary data storage is actually Amazon S3 rather than your own local storage solution.
However, cached volume gateways do utilize your local data storage as a buffer and a cache for recently accessed data to help maintain low latency, hence the name, cached volumes.
Again, during the creation of these volumes they are presented as iSCSI volumes which can be mounted by an application service. The volumes themselves are backed by the Amazon S3 infrastructure as opposed to your local disks as seen in the stored volume gateway deployment. As a part of this volume creation you must also select some local disks on-premise to act as your local cache and a buffer for data waiting to be uploaded to S3.
Again, this buffer is used as a staging point for data that is waiting to be written to S3. During the upload process, the data is sent over an encrypted SSL channel and is then stored encrypted within S3 using SSE-S3. The limits are slightly different with cached volume gateways, in that each volume created can be up to 32 TB in size with support for up to 32 volumes, meaning a total storage capacity of 1024 TB per cached volume gateway.
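The per-gateway totals quoted for the two volume gateway types follow directly from the per-volume limits. The short sketch below works through that arithmetic using the figures from this lecture and adds a simple check of whether a planned set of volumes fits on one gateway. The dictionary layout and function names are hypothetical.

```python
# Limits as quoted in this lecture (sizes in TB).
LIMITS = {
    "stored": {"max_volume_tb": 16, "max_volumes": 32},
    "cached": {"max_volume_tb": 32, "max_volumes": 32},
}


def max_capacity_tb(gateway_type: str) -> int:
    limit = LIMITS[gateway_type]
    return limit["max_volume_tb"] * limit["max_volumes"]


def fits_on_one_gateway(gateway_type: str, volume_sizes_tb: list) -> bool:
    # Every volume must be within the per-volume limit, and the number of
    # volumes must not exceed the per-gateway volume count.
    limit = LIMITS[gateway_type]
    return (len(volume_sizes_tb) <= limit["max_volumes"]
            and all(size <= limit["max_volume_tb"] for size in volume_sizes_tb))


print(max_capacity_tb("stored"))  # 512
print(max_capacity_tb("cached"))  # 1024
print(fits_on_one_gateway("stored", [16, 8, 20]))  # False: 20 TB exceeds 16 TB
print(fits_on_one_gateway("cached", [16, 8, 20]))  # True
```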
Although all of your primary data used by applications is stored in S3 across volumes, it is still possible to take incremental backups of these volumes as EBS snapshots. In a DR scenario, and as I mentioned in the previous section, this then enables quick deployment of the datasets which can be attached to EC2 instances as EBS volumes containing all of your data as required.
The final option with AWS Storage Gateway is a tape gateway, known as Gateway VTL (Virtual Tape Library). Again, this allows you to back up your data to S3 from your own corporate data center, but also to leverage Amazon Glacier for data archiving. Virtual Tape Library is essentially a cloud-based tape backup solution, replacing physical components with virtual ones.
This functionality allows you to use your existing tape backup application infrastructure within AWS providing a more robust and secure backup and archiving solution. The solution itself is comprised of the following elements.
- Storage Gateway. The gateway itself is configured as a tape gateway, which has the capacity to hold 1500 virtual tapes.
- Virtual Tapes. These are the virtual equivalent of a physical backup tape cartridge and can be anything from 100 GB to 2.5 TB in size. Any data stored on the virtual tapes is backed by Amazon S3 and appears in the virtual tape library.
- Virtual Tape Library (VTL). As you may have guessed, these are virtual equivalents to a tape library that contain your virtual tapes.
- Tape Drives. Every VTL comes with ten virtual tape drives, which are presented to your backup applications as iSCSI devices.
- Media Changer. This is a virtual device that moves tapes between the tape drives and your VTL, and again it's presented as an iSCSI device to your backup applications.
- Archive. This is equivalent to an off-site tape backup storage facility, where you can archive tapes from your virtual tape library to Amazon Glacier which, as we already know, is used as a cold storage solution. If retrieval of the tapes is required, storage gateway uses the standard retrieval option, which can take between 3 and 5 hours to retrieve your data.
Once your storage gateway has been configured as a tape gateway, your application and backup software can mount the tape drives along with the media changer as iSCSI devices to make the connection.
You can then create the required virtual tapes as you need them for backup, and your backup software can use these to back up the required data which is stored on S3.
For a list of the supported third-party backup applications, please visit the link on screen.
When you want to archive virtual tapes, perhaps for cost optimization, compliance and governance, or even DR, the data is simply moved from Amazon S3 to Amazon Glacier.
That has now brought me to the end of this lecture. Coming up next I shall be summarizing the key points taken from the previous lectures throughout this course.
About the Author
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data centre and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 60+ courses relating to Cloud, most within the AWS category with a heavy focus on security and compliance.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.