Introduction to Amazon Storage Gateway


Storage on AWS
Elastic Block Store
Overview of EBS
Simple Storage Service
Advanced services: Glacier and Storage Gateway

Storage is a central part of any computing infrastructure. Amazon provides many cloud services that can replace traditional on-premises storage systems. These range from short-term storage for running instances that perform computation on small batches of data, up to long-term archives saved on redundant disks or even tapes.

In this course, Computer Engineer and Cloud Expert Mohammad Ali Tabibi gives you an overview of AWS storage services such as EBS, S3, Glacier, and Storage Gateway. The goal is to help you understand what they are for, how they are built, and how best to use them.

Who should take this course

As this is a beginner course, no prerequisites are needed to understand its concepts. Nevertheless, some knowledge of what AWS is and some experience with the Linux command line interface might help you follow along.

If you want to test your knowledge of the basic topics covered in this course, we strongly suggest taking our AWS questions. If you want to learn more about other AWS services, please consider checking out our other AWS courses.


AWS Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage. It provides seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure. The service lets you securely store data in the AWS cloud for scalable and cost-effective storage. It provides low-latency performance by keeping frequently accessed data on premises, while securely storing all of your data encrypted in Amazon S3 or Amazon Glacier.

AWS Storage Gateway has several benefits. It securely transfers your data to AWS over SSL and stores it encrypted in Amazon S3 and Amazon Glacier using the Advanced Encryption Standard (AES). It durably stores your on-premises application data by uploading it to Amazon S3 and Amazon Glacier. Those services in turn store data redundantly in multiple facilities and on multiple devices within each facility, and they perform regular, systematic data integrity checks. Finally, there's no need to restructure your on-premises applications.

Gateway-cached volumes and gateway-stored volumes expose a standard iSCSI block storage device interface, and gateway-VTL presents a standard iSCSI virtual tape library interface. Gateway-stored volumes and gateway-cached volumes are designed to integrate seamlessly with Amazon S3, Amazon EBS, and Amazon EC2. They let you store point-in-time snapshots of your on-premises application data in Amazon S3 as Amazon EBS snapshots, for future recovery on premises or in Amazon EC2. AWS Storage Gateway supports industry-standard storage protocols that work with your existing applications.
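The snapshot-and-recover workflow described above can be sketched with the AWS SDK for Python (boto3). The example assumes boto3's Storage Gateway `create_snapshot` call and EC2 `create_volume` call. It is a minimal illustration, not a complete recovery procedure. The helper function names, the placeholder ARN, and the Region and Availability Zone values are hypothetical. The helpers take a client object as a parameter, so a stub client can stand in for AWS when experimenting.

```python
def take_adhoc_snapshot(sgw_client, volume_arn, description):
    """Ask Storage Gateway for a point-in-time snapshot of one volume.

    `sgw_client` is expected to behave like boto3.client("storagegateway").
    The snapshot is stored in Amazon S3 as an Amazon EBS snapshot.
    """
    response = sgw_client.create_snapshot(
        VolumeARN=volume_arn,
        SnapshotDescription=description,
    )
    return response["SnapshotId"]


def restore_to_ec2_volume(ec2_client, snapshot_id, availability_zone):
    """Create an EBS volume from a gateway snapshot, e.g. for recovery in EC2.

    `ec2_client` is expected to behave like boto3.client("ec2").
    Assumption: the snapshot has completed and is eligible to be restored
    as an EBS volume (EBS size limits apply).
    """
    response = ec2_client.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    return response["VolumeId"]


# Example usage (requires boto3 and configured AWS credentials; the ARN,
# Region, and Availability Zone are placeholders, not real resources):
#
#   import boto3
#   sgw = boto3.client("storagegateway", region_name="us-east-1")
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   snap = take_adhoc_snapshot(sgw, "arn:aws:storagegateway:...", "ad hoc backup")
#   # In practice, wait for the snapshot to complete before restoring it.
#   vol = restore_to_ec2_volume(ec2, snap, "us-east-1a")
```

Passing the client in, rather than creating it inside the functions, keeps credentials and Region configuration in one place. It also makes the helpers easy to exercise without touching a real account.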

AWS Storage Gateway offers both volume-based and tape-based storage solutions.

Volume gateways

Volume gateways provide cloud-backed storage volumes that you can mount as iSCSI devices from your on-premises application servers. The gateway supports the following volume configurations.

Gateway-cached volumes

With gateway-cached volumes, you store your data in Amazon S3 and retain a copy of frequently accessed data subsets locally.

Gateway-cached volumes offer substantial cost savings on primary storage and minimize the need to scale your storage on premises, while still giving you low-latency access to your frequently accessed data.
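The idea behind gateway-cached volumes is to keep the full data set in Amazon S3 and only a working set of recently used data locally. That idea can be illustrated as a read-through cache with least-recently-used (LRU) eviction. This is a toy model under stated assumptions, not how the gateway is actually implemented. The class name `CachedVolume` is hypothetical, and a plain dict stands in for Amazon S3. LRU is assumed as the eviction policy purely for illustration. Writes are also simplified to update the remote copy immediately, whereas the real gateway buffers them for asynchronous upload.

```python
from collections import OrderedDict


class CachedVolume:
    """Toy model of a gateway-cached volume (hypothetical names throughout).

    The full data set lives in `remote` (a dict standing in for Amazon S3);
    only the most recently used blocks are kept in a small local cache.
    """

    def __init__(self, remote_store, cache_blocks):
        self.remote = remote_store        # block id -> bytes (stands in for S3)
        self.cache = OrderedDict()        # ordered least -> most recently used
        self.capacity = cache_blocks      # how many blocks fit in the local cache
        self.remote_reads = 0             # how often a read had to go to "S3"

    def _remember(self, block_id, data):
        # Insert or refresh the block, then evict the least recently used one
        # if the cache is over capacity (LRU is an assumption of this sketch).
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def read(self, block_id):
        # Check the local cache first; only on a miss go to the remote store.
        if block_id in self.cache:
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        self.remote_reads += 1
        data = self.remote[block_id]
        self._remember(block_id, data)
        return data

    def write(self, block_id, data):
        # Simplification: update the remote copy right away (the real gateway
        # buffers writes locally and uploads them asynchronously), and keep the
        # recently written block in the local cache.
        self.remote[block_id] = data
        self._remember(block_id, data)
```

Repeated reads of a hot block are served locally, and only cache misses reach the remote store. That is the low-latency, low-local-capacity trade-off the course describes.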

You can create storage volumes up to 32 terabytes in size and mount them as iSCSI devices from your on-premises application servers. Data written to these volumes is stored in Amazon S3. Only a cache of recently written and recently read data is stored locally on your on-premises storage hardware.

Gateway-stored volumes

If you need low-latency access to your entire data set, you can configure your on-premises gateway to store all your data locally and then asynchronously back up point-in-time snapshots of this data to Amazon S3. This configuration provides durable, inexpensive off-site backups that you can recover to your local data center or to Amazon EC2. For example, if you need replacement capacity for disaster recovery, you can recover the backups to Amazon EC2. You can create storage volumes of up to one terabyte in size and then mount them as iSCSI devices from your on-premises application servers. Data written to your gateway-stored volumes is stored on your on-premises storage hardware and asynchronously backed up to Amazon S3 in the form of Amazon EBS snapshots.

Gateway-virtual tape library (VTL)

With gateway-VTL, you can cost-effectively and durably archive backup data in Amazon Glacier.

Gateway-VTL provides a virtual tape infrastructure that scales seamlessly with your business needs. It eliminates the operational burden of provisioning, scaling, and maintaining a physical tape infrastructure. Each virtual tape can be stored in a virtual tape library backed by Amazon S3 or in a virtual tape shelf backed by Amazon Glacier. Each virtual tape library holds up to 1,500 virtual tapes with a maximum aggregate capacity of 150 terabytes.

There are many possible use cases for AWS Storage Gateway. It enables your existing on-premises backup applications to store primary backups on Amazon S3's scalable, reliable, secure, and cost-effective storage service. Together with Amazon EC2, AWS Storage Gateway can mirror your entire production environment for disaster recovery (DR). Using Amazon EC2, you can configure virtual machine images of your DR application servers and pay for these servers only when you need them.

Managing on-premises storage for departmental file shares and home directories typically results in high capital and maintenance costs, underutilized hardware, and restrictive user quotas. AWS Storage Gateway addresses these on-premises scaling and maintenance issues. It lets you seamlessly store your corporate file shares on Amazon S3 while keeping a copy of your frequently accessed files on premises.

You might want to use Amazon EC2's on-demand compute capacity for extra capacity during peak periods, for new projects, or as a more cost-effective way to run your normal workloads. In that case, you can use AWS Storage Gateway to mirror your volume data to Amazon EC2 instances. You might also run development and user acceptance testing environments in Amazon EC2 to take advantage of AWS's on-demand compute capacity. AWS Storage Gateway can give those environments ongoing access to the latest data from your on-premises production systems.

By using gateway-VTL, you can store data that requires long-term retention and infrequent access without changing your existing backup applications and tape-based processes. You can also avoid owning and operating on-premises physical tape infrastructure by storing your archive and long-term backup data on a virtually limitless collection of virtual tapes.

The following diagram provides an overview of the AWS Storage Gateway deployment. Gateway-cached volumes let you use Amazon S3 for your primary data while retaining a portion of it locally in a cache for frequently accessed data. As your applications write data to and read data from a gateway-cached volume, this data is initially stored on premises. It can sit on direct-attached storage (DAS), network-attached storage (NAS), or storage area network (SAN) storage. This local storage serves two purposes. First, it prepares and buffers your data for upload to your storage volume in Amazon S3. Second, it caches your applications' recently written and recently read data on premises for low-latency access. When your application reads data from your gateway-cached volume, your on-premises gateway first checks its local cache for this data before checking Amazon S3.

Gateway-stored volumes store your primary data locally while asynchronously backing up the data to AWS. Your gateway-stored volumes are mapped to on-premises DAS, NAS, or SAN storage. You can start with either new data or existing data. As your on-premises applications read from and write to your storage volume, this data is retrieved from, or stored to, the on-premises DAS, NAS, or SAN storage that is mapped to your storage volume. Your on-premises gateway also temporarily stores this data on local DAS, NAS, or SAN storage to prepare and buffer it for upload to Amazon S3. There it is stored in the form of Amazon EBS snapshots.
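The gateway-stored write path keeps primary data local and queues each write in an upload buffer that is drained asynchronously to the cloud. It can be modeled in a few lines. This is a minimal sketch under stated assumptions. The class `StoredVolume` and its methods are hypothetical, and dicts stand in for local DAS/NAS/SAN storage and for Amazon S3. The `drain` method stands in for the background upload that a real gateway performs.

```python
from collections import deque


class StoredVolume:
    """Toy model of a gateway-stored volume (hypothetical names throughout).

    `local` holds the primary copy (standing in for DAS/NAS/SAN storage);
    every write is also queued in `upload_buffer`, which is drained later
    into `remote_backup` (standing in for Amazon S3).
    """

    def __init__(self):
        self.local = {}              # primary data: block id -> bytes
        self.upload_buffer = deque() # writes waiting to be uploaded, in order
        self.remote_backup = {}      # what has reached the cloud so far

    def write(self, block_id, data):
        # The write completes as soon as local storage is updated;
        # uploading happens separately.
        self.local[block_id] = data
        self.upload_buffer.append((block_id, data))

    def read(self, block_id):
        # Reads are always served locally, so the whole data set is low latency.
        return self.local[block_id]

    def drain(self, max_items=None):
        """Upload buffered writes in order; a background task in a real system."""
        uploaded = 0
        while self.upload_buffer and (max_items is None or uploaded < max_items):
            block_id, data = self.upload_buffer.popleft()
            self.remote_backup[block_id] = data
            uploaded += 1
        return uploaded
```

Draining in write order means the remote copy may lag behind the local one, but it never applies an older write after a newer one to the same block.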

Whether you're using gateway-cached or gateway-stored volumes, you can take point-in-time, incremental snapshots of your storage gateway volume and store them in Amazon S3 in the form of Amazon EBS snapshots. For gateway-stored volumes, where your volume data is stored on premises, snapshots provide durable off-site backups in Amazon S3. Snapshots can be initiated on a scheduled or ad hoc basis. When you take a new snapshot, only the data that has changed since your last snapshot is stored. For example, suppose you have a volume with 100 gigabytes of data and only five gigabytes have changed since your last snapshot. Only those five gigabytes are stored in Amazon S3. When you delete a snapshot, only the data not needed by any other snapshot is removed.

The following diagram provides an overview of the gateway-VTL deployment.
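The two snapshot rules above can be checked with a small model. The first rule is to store only what changed since the last snapshot. The second is to delete only data that no other snapshot needs. This sketch uses block-level sharing with reference counts, an assumed technique chosen for illustration, not a description of how AWS implements snapshots. The class `SnapshotStore` is hypothetical, and each "block" represents one gigabyte so the 100 GB / 5 GB example maps directly onto it.

```python
import itertools


class SnapshotStore:
    """Toy model of incremental, block-level snapshots (hypothetical names).

    Each snapshot maps block ids to chunk ids. A chunk is stored only when a
    block's content differs from the most recent snapshot; otherwise the
    earlier chunk is shared. Reference counts let deletion remove only the
    chunks that no remaining snapshot needs.
    """

    def __init__(self):
        self.chunks = {}                   # chunk id -> stored block content
        self.refcount = {}                 # chunk id -> snapshots using it
        self.snapshots = {}                # snapshot id -> {block id: chunk id}
        self._chunk_ids = itertools.count()
        self._snap_ids = itertools.count()
        self._latest = None                # id of the most recent snapshot

    def take_snapshot(self, volume):
        """Snapshot `volume` (block id -> content); return (id, chunks stored)."""
        previous = self.snapshots.get(self._latest, {})
        mapping, newly_stored = {}, 0
        for block_id, content in volume.items():
            old_chunk = previous.get(block_id)
            if old_chunk is not None and self.chunks[old_chunk] == content:
                chunk_id = old_chunk       # unchanged since last snapshot: share it
            else:
                chunk_id = next(self._chunk_ids)  # new or changed: store it
                self.chunks[chunk_id] = content
                self.refcount[chunk_id] = 0
                newly_stored += 1
            self.refcount[chunk_id] += 1
            mapping[block_id] = chunk_id
        snap_id = f"snap-{next(self._snap_ids)}"
        self.snapshots[snap_id] = mapping
        self._latest = snap_id
        return snap_id, newly_stored

    def delete_snapshot(self, snap_id):
        """Delete a snapshot; return how many chunks were actually removed."""
        removed = 0
        for chunk_id in self.snapshots.pop(snap_id).values():
            self.refcount[chunk_id] -= 1
            if self.refcount[chunk_id] == 0:   # no other snapshot needs it
                del self.refcount[chunk_id]
                del self.chunks[chunk_id]
                removed += 1
        if snap_id == self._latest:
            # Fall back to the newest remaining snapshot (dicts keep insertion order).
            self._latest = next(reversed(self.snapshots), None)
        return removed


# The course's example: a 100 GB volume (one block per GB) with 5 GB changed.
volume = {i: f"v0-{i}" for i in range(100)}
store = SnapshotStore()
first, stored_first = store.take_snapshot(volume)     # stores all 100 blocks
for i in range(5):
    volume[i] = f"v1-{i}"                             # change 5 GB
second, stored_second = store.take_snapshot(volume)   # stores only the 5 changed blocks
removed = store.delete_snapshot(first)                # frees only the 5 superseded blocks
```

Deleting the first snapshot frees only the five superseded blocks, because the other 95 are still referenced by the second snapshot.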

Gateway-VTL presents your existing backup application with an industry-standard, iSCSI-based virtual tape library (VTL) consisting of a virtual media changer and virtual tape drives. You create virtual tapes in your virtual tape library using the AWS Management Console. Each virtual tape library can hold up to 1,500 virtual tapes with a maximum aggregate capacity of 150 terabytes. Once created, virtual tapes are discovered by your backup application using its standard media inventory procedure. They are then available for immediate access, backed by Amazon S3. Your backup application can read from and write to virtual tapes by mounting them on virtual tape drives using the virtual media changer.

When you no longer need immediate or frequent access to the data on a virtual tape, you can use your backup application to move it from your virtual tape library to your virtual tape shelf. The virtual tape shelf is backed by Amazon Glacier, which further reduces your storage costs. You can retrieve virtual tapes from your virtual tape shelf using the AWS Management Console. A retrieved tape takes about 24 hours to become available in your virtual tape library.
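The tape lifecycle described above can be summarized as a small state model. Tapes are created in the library within its limits, archived to the shelf, and retrieved back. The class `VirtualTapeLibrary` and its methods are hypothetical. The limit constants are the figures quoted in this course, not values read from a live service. One assumption is marked in the code: that a retrieved tape counts against the same library limits as a newly created one. The roughly 24-hour retrieval delay is only noted in a comment, not simulated.

```python
from dataclasses import dataclass, field

MAX_TAPES_PER_VTL = 1500        # per-library tape count quoted in this course
MAX_VTL_CAPACITY_TB = 150       # per-library aggregate capacity quoted in this course


@dataclass
class VirtualTapeLibrary:
    """Toy model of the VTL / virtual tape shelf lifecycle (hypothetical names).

    Tapes in `library` are immediately accessible (backed by Amazon S3);
    tapes in `shelf` are archived (backed by Amazon Glacier).
    """
    library: dict = field(default_factory=dict)   # barcode -> size in TB
    shelf: dict = field(default_factory=dict)     # barcode -> size in TB

    def create_tape(self, barcode, size_tb):
        # Enforce both library limits before adding the tape.
        if len(self.library) + 1 > MAX_TAPES_PER_VTL:
            raise ValueError("library already holds the maximum number of tapes")
        if sum(self.library.values()) + size_tb > MAX_VTL_CAPACITY_TB:
            raise ValueError("aggregate library capacity would be exceeded")
        self.library[barcode] = size_tb

    def archive(self, barcode):
        # Done by the backup application: move the tape from library to shelf.
        self.shelf[barcode] = self.library.pop(barcode)

    def retrieve(self, barcode):
        # Requested via the AWS Management Console; in the real service this
        # takes about 24 hours. Assumption: a retrieved tape counts against the
        # same library limits, so the shelf copy is removed only if it fits.
        self.create_tape(barcode, self.shelf[barcode])
        del self.shelf[barcode]
```

Archiving a tape frees library capacity, which is why moving rarely used tapes to the shelf both lowers cost and makes room for new tapes.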

About the Author
Mohammad Ali Tabibi
Software Engineer

Computer Engineer and Cloud Expert