Using AWS Storage Gateway for On-premise Data Backup
Start course
2h 3m

** Not all content covered in the course introduction has been added to the course at this time. Additional content is scheduled to be added to this course in the future. **

In this section of the AWS Certified: SAP on AWS Specialty learning path, we introduce you to strategies for configuring high availability and disaster recovery for SAP workloads on AWS.

Learning Objectives

  • Understand how to configure high availability with Amazon RDS
  • Identify backup and disaster recovery strategies using the AWS Cloud
  • Describe various approaches for business continuity and diaster recovery for SAP workloads on AWS


The AWS Certified: SAP on AWS Specialty certification has been designed for anyone who has experience managing and operating SAP workloads. Ideally you’ll also have some exposure to the design and implementation of SAP workloads on AWS, including migrating these workloads from on-premises environments. Many exam questions will require a solutions architect level of knowledge for many AWS services. All of the AWS Cloud concepts introduced in this course will be explained and reinforced from the ground up.


Resources referenced within this lecture

Supported Third-Party Backup Applications for Tape Gateway

Lecture Transcript

Hello, and welcome to this lecture. I want to explain when you may want to use AWS Storage Gateway along with the different options available when using this service.

Storage Gateway allows you to provide a gateway between your own data center's storage systems such as your SAN, NAS or DAS and Amazon S3 and Glacier on AWS.

The Storage Gateway itself is a software appliance that can be stored within your own data center which allows integration between your on-premise storage and that of AWS. This connectivity can allow you scale your storage requirements both securely and cost efficiently.

The software appliance can be downloaded from AWS as a virtual machine which can then be installed on your VMware or Microsoft hypervisors.

Storage Gateway offers different configurations and options allowing you to use the service to fit your needs. It offers file, volume and tape gateway configurations which you can use to help with your DR and data backup solutions.

Let me explain the differences between each of these configurations, starting with file gateways.

File gateways allow you to securely store your files as objects within S3. Using as a type of file share which allows you mount on map drives to an S3 bucket as if the share was held locally on your own corporate network. When storing files using the file gateway they sent to S3 over HTTPS and are also encrypted with S3's own server side encryption SSE-S3.

In addition to this local a on-premise cache is also provisioned for accessing your most recently accessed files to optimize latency with also helps to reduce egress traffic costs. When your file gateway's first configured you must associate it with your S3 bucket which the gateway will then present as a NFS V.3 or V4.1 file system to your internal applications.

This allows you to view the bucket as a normal NFS file system, making it easy to mount as a drive on Linux or map a drive to it in Microsoft. Any files that are then written to these NFS file systems are stored in S3 as individual objects as a one to one mapping of files to objects.

The second option we have as a gateway configuration are volume gateways, and these can be figured in one of two different ways, Stored volume gateways and cached volume gateways.

Let me explain stored volume gateways first.

Stored volume gateways are often used as a way to backup your local storage volumes to Amazon S3 whilst ensuring your entire data library also remains locally on-premise for very low latency data access. Volumes created and configured within the storage gateway are backed by Amazon S3 and are mounted as iSCSI devices that your applications can then communicate with.

During the volume creation, these are mapped to your on-premise storage devices which can either hold existing data or be a new disk. As data is written to these iSCSI devices the data is actually written to your local storage services such as your own NAS, SAN or DAS storage solution. However the storage gateway then asynchronously copies this data to Amazon S3 as EBS snapshots.

Having your entire dataset remain locally ensures you have the lowest latency possible to access your data which may be required for specific applications or security compliance and governance controls whilst at the same time providing a backup solution which is governed by the same controls and security that S3 offers.

Volumes created can be between 1GiB and 16TB and for each storage gateway up to 32 stored volumes can be created which can give you a maximum total of 512TB of storage per gateway. Storage volume gateways also provide a buffer which uses your existing storage disks. This buffer is used as a staging point for data that is waiting to be written to S3.

During the upload process the data is sent over an encrypted SSL channel and stored in an encrypted format within S3. To add to the management and backup of your data storage gateway makes it easy to take snapshots of your storage volumes at any point, which are then stored as EBS snapshots on S3. It's worth pointing out that these snapshots are incremental ensuring that only the data that's changed since the last backup is copied helping to minimize storage costs on S3.

As you can see, gateway stored volumes makes recovering from a disaster very simple. For example, let's consider the scenario that you lost your local application and storage layers on-premise Providing you had provision for such a situation you may have AMI templates that mimic your application tier which you could provision as EC2 instances within AWS. You could then attach EBS volumes to these instances which could be created from the storage gateway volume snapshots which would be stored on S3 giving you access to your production data required. Your applications storage infrastructure could potentially be up and running again in a matter of minutes within a VPC with connectivity from your on-premise data center.

Let me now compare this option to the second volume gateway option of cached volume gateways. Cached volume gateways are differed to stored volume gateways in that the primary data storage is actually Amazon S3 rather than your own local storage solution.

However cache volume gateways do utilize your local data storage as a buffer and the cache for recently accessed data to help maintain low latency, hence the name, Cache Volumes.

Again, during the creation of these volumes they are presented as iSCSI volumes which can be mounted by an application service. The volumes themselves are backed by the Amazon S3 infrastructure as opposed to your local disks as seen in the stored volume gateway deployment. As a part of this volume creation you must also select some local disks on-premise to act as your local cache and a buffer for data waiting to be uploaded to S3.

Again, this buffer is used as a staging point for data that is waiting to be written to S3 and during the upload process, the data is encrypted using an SSL channel where the data is then encrypted within SSE S3. The limitations is slightly different with cache volume gateways in that each volume created can be up to 32TB in size with support for up to 32 volumes meaning a total storage capacity of 1024TB per cache volume gateway.

Although all of your primary data used by applications is stored in S3 across volumes, it is still possible to take incremental backups of these volumes as EBS snapshots. In a DR scenario, and as I mentioned in the previous section, this then enables quick deployment of the datasets which can be attached to EC2 instances as EBS volumes containing all of your data as required.

The final option with AWS Storage Gateway is a tape gateway known as Gateway VTL. Virtual Tape Library. This allows you again to back up your data to S3 from your own corporate data center but also leverage Amazon Glacier for data archiving. Virtual Tape Library is essentially a cloud based tape backup solution replacing physical components with virtual ones.

This functionality allows you to use your existing tape backup application infrastructure within AWS providing a more robust and secure backup and archiving solution. The solution itself is comprised of the following elements.

- Storage Gateway. The gateway itself is configured as a tape gateway which as a capacity to hold 1500 virtual tapes.

- Virtual Tapes. These are a virtual equivalent to a physical backup tape cartridge which can by anything from 100 gig to two and a half terabytes in size. And any data stored on the virtual tapes are backed by AWS S3 and appear in the virtual tape library.

- Virtual Tape Library. VTL. As you may have guessed these are virtual equivalents to a tape library that contain your virtual tapes.

- Tape Drives. Every VTL comes with ten virtual tape drives which are presented to your backup applications is iSCSI devices.

- Media Changer. This is a virtual device that manages tapes to and from the tape drive to your VTL and again it's presented as an iSCSI device to your backup applications.

- Archive. This is equivalent to an off-site tape backup storage facility where you can archive tapes from your virtual tape library to Amazon Glacier which as we already know, is used as a cold storage solution. If retrieval of the tapes are required, storage gateway uses the standard retrieval option which can take between 3 - 5 hours to retrieve your data.

Once your storage gateway has been configured as a tape gateway, your application and backup software can mount the tape drives along with the media changer is iSCSI devices to make the connection.

You can then create the required virtual tapes as you need them for backup, and your backup software can use these to back up the required data which is stored on S3.

For a list of the support and third party backup applications, please visit the link on screen.

When you want to archive virtual tapes for maybe cost optimization or compliance and governance or even DR, then the data is simply moved from Amazon S3 to Amazon Glacier.

About the Author
Learning Paths

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.