Long-term backups with Amazon Glacier



Storage is a central part of any computing infrastructure. Amazon provides many cloud services that replace traditional, on-premises storage systems, ranging from short-term storage for running instances that process small batches of data, up to long-term archives saved on redundant disks or even tapes.

In this course, Computer Engineer and Cloud Expert Mohammad Ali Tabibi will give you an overview of AWS storage services such as EBS, S3, Glacier, and Storage Gateway, so you can better understand what they are for, how they are built, and how they can best be used.

Who should take this course

This is a beginner course, so no prerequisites are needed to understand its concepts. Nevertheless, some knowledge of what AWS is and some experience with the Linux command-line interface might help you follow along.

If you want to test your knowledge of the basic topics covered in this course, we strongly suggest taking our AWS questions. Also, if you want to learn more about the other AWS services, please consider checking out our other AWS courses.


Amazon Glacier is an extremely low-cost storage service that provides secure and durable storage for data archiving and backup. In order to keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. With Amazon Glacier, you can reliably store large or small amounts of data for as little as one cent per gigabyte per month, a significant saving compared to on-premises solutions.
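To put the quoted price in perspective, the arithmetic can be sketched as below. The rate of $0.01 per gigabyte per month is the figure mentioned in the transcript; real pricing varies by region and over time, and retrieval and request fees are ignored here. The function name and the 1 TB = 1024 GB convention are assumptions made for illustration.

```python
def monthly_storage_cost(size_gb: float, price_per_gb_month: float = 0.01) -> float:
    """Return the estimated monthly storage cost in US dollars.

    The default price is the "one cent per gigabyte per month" figure
    quoted in the transcript, not a current price list.
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * price_per_gb_month


# Example: a 50 TB archive, taking 1 TB = 1024 GB (an assumed convention).
archive_gb = 50 * 1024
monthly = monthly_storage_cost(archive_gb)
yearly = 12 * monthly
print(f"{archive_gb} GB: ${monthly:.2f} per month, ${yearly:.2f} per year")
```

Under these assumptions, 50 TB comes to roughly $512 per month for storage alone.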

There are many features and benefits to using Amazon Glacier. Amazon Glacier supports secure transfer of your data over SSL and automatically encrypts data at rest using AES 256-bit symmetric keys. You can also control access to your data using IAM. Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility.

Amazon Glacier scales to meet your growing and often unpredictable storage requirements. There is no limit to the amount of data you can store in the service. In addition, you can choose to store your data in the Amazon Glacier region that meets your regulatory, throughput, and geographic redundancy criteria. Amazon S3 allows you to seamlessly move data between itself and Amazon Glacier using data lifecycle policies. You can also use AWS Import/Export to accelerate moving large amounts of data into Amazon Glacier, using portable storage devices for transport.
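As a rough illustration of the S3 lifecycle policies mentioned above, the sketch below builds a rule that transitions objects to the Glacier storage class after a number of days. It assumes the boto3 S3 client and its `put_bucket_lifecycle_configuration` call; the helper names, the rule ID, the `logs/` prefix, and the bucket name are hypothetical.

```python
def glacier_transition_rule(prefix: str, days: int,
                            rule_id: str = "archive-to-glacier") -> dict:
    """Build one S3 lifecycle rule that moves objects under `prefix`
    to the GLACIER storage class `days` days after creation."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "ID": rule_id,                      # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


def apply_lifecycle(s3_client, bucket: str, rules: list) -> None:
    """Attach the rules to a bucket. This replaces any lifecycle
    configuration the bucket already has."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )
```

With boto3 installed and credentials configured, this could be used as `apply_lifecycle(boto3.client("s3"), "example-bucket", [glacier_transition_rule("logs/", 30)])`.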

Amazon Glacier can be used to support a wide variety of use cases.

Off-site enterprise information archiving. Amazon Glacier allows you to cost-effectively and securely store large volumes of enterprise data off-site, making it simple, inexpensive, and safe to retain archive data for as long as desired.

Archiving media assets. Media companies' core assets are their content. The number and size of these assets can grow to tens or hundreds of petabytes, and storing them safely and securely is of critical importance.

Archiving research and scientific data. The very same logic applies to research and scientific organizations such as pharmaceutical and biotech companies, as well as universities and research institutes, which have large archiving needs for their data sets.

Digital preservation. Digital preservationists and organizations such as libraries, historical societies, non-profits, and governments can take advantage of the high durability offered by Amazon Glacier to save their data.

Magnetic tape replacement. Amazon Glacier can also replace on-premises or off-site tape libraries. Replacing your tape library with Amazon Glacier removes the operational burden of managing tapes and the cost of buying dedicated hardware.

The core concepts of the Amazon Glacier data model are vaults and archives. Amazon Glacier is a REST-based web service; in REST terms, vaults and archives are the resources. In addition, the data model includes job and notification configuration resources, which complement the core resources.

In Amazon Glacier, a vault is a container for storing archives. When you create a vault, you specify a name and select the AWS region where you want to create it. Each vault resource has a unique address, whose general form you can see here. For example, suppose you create a vault named examplevault in the US East (Northern Virginia) region. This vault can then be addressed by the URI you can see on the screen.
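The on-screen addresses are not reproduced in this transcript. As a sketch of what they look like, the AWS documentation describes a vault address of the general form `https://<region-specific endpoint>/<account-id>/vaults/<vault-name>`, with archives, jobs, and the notification configuration addressed beneath the vault. The helper names and the account ID below are hypothetical, and the `glacier.<region>.amazonaws.com` endpoint pattern is assumed.

```python
def glacier_endpoint(region: str) -> str:
    """Region-specific endpoint (pattern assumed from the AWS documentation)."""
    return f"https://glacier.{region}.amazonaws.com"


def vault_uri(region: str, account_id: str, vault_name: str) -> str:
    return f"{glacier_endpoint(region)}/{account_id}/vaults/{vault_name}"


def archive_uri(region: str, account_id: str, vault_name: str, archive_id: str) -> str:
    return f"{vault_uri(region, account_id, vault_name)}/archives/{archive_id}"


def job_uri(region: str, account_id: str, vault_name: str, job_id: str) -> str:
    return f"{vault_uri(region, account_id, vault_name)}/jobs/{job_id}"


def notification_configuration_uri(region: str, account_id: str, vault_name: str) -> str:
    return f"{vault_uri(region, account_id, vault_name)}/notification-configuration"


# "111122223333" is a placeholder account ID, not a real one.
print(vault_uri("us-east-1", "111122223333", "examplevault"))
# prints: https://glacier.us-east-1.amazonaws.com/111122223333/vaults/examplevault
```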

An AWS account can create vaults in any supported AWS region. You can store an unlimited number of archives in a vault. Depending on your business or application needs, you can store these archives in one or multiple vaults. Amazon Glacier supports various vault operations. Note that vault operations are region-specific: when you request a vault list, you request it from a specific AWS region, and the resulting list includes only vaults created in that region.

An archive can be any object, such as a photo, video, or document, and is the base unit of storage in Amazon Glacier. Each archive has a unique ID and an optional description.
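The region-specific vault listing and the archive upload described above might look like this with the boto3 Glacier client. This is a minimal sketch: the helper names are hypothetical, the client is passed in by the caller, and `accountId="-"` is the documented shorthand for the account that owns the credentials.

```python
def list_vault_names(glacier_client) -> list:
    """Return vault names in the client's region.

    Only the first page of results is read; a full listing would
    follow the pagination marker in the response.
    """
    response = glacier_client.list_vaults(accountId="-")
    return [vault["VaultName"] for vault in response["VaultList"]]


def upload_archive(glacier_client, vault_name: str, data: bytes,
                   description: str = "") -> str:
    """Upload `data` as a single archive and return the archive ID
    assigned by Glacier. The description can only be set here, at upload."""
    response = glacier_client.upload_archive(
        accountId="-",
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    return response["archiveId"]
```

Because a client is bound to one region, `list_vault_names(boto3.client("glacier", region_name="us-east-1"))` would return only the vaults created in that region.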

Archive IDs are unique within a vault. Amazon Glacier assigns each archive an ID that is unique within the AWS region in which it is stored. Note that you can specify the optional description only when you upload the archive. Each archive has a unique address, whose general form you can see here.

Retrieving an archive and retrieving a vault inventory (the list of archives in a vault) are asynchronous operations in Amazon Glacier: you first initiate a job and then download the job output after Amazon Glacier completes it. With Amazon Glacier, your data retrieval requests are queued, and most jobs take about four hours to complete. Each job is uniquely identified by a URI. You can see the general form of this URI here. After Amazon Glacier completes a job, you can download the job output. You can download the entire output or, optionally, only a portion of it by specifying a byte range.

Because jobs take time to complete, Amazon Glacier supports a notification mechanism to tell you when a job has completed. You can configure a vault to send notifications to an Amazon Simple Notification Service (SNS) topic when a job completes. You can specify one SNS topic per vault in the notification configuration.
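The initiate-then-download flow described above can be sketched with the boto3 Glacier client as follows. The function name, the polling interval, and the injectable `sleep` parameter are assumptions for illustration; in practice an SNS notification would usually replace the polling loop.

```python
import time


def retrieve_archive(glacier_client, vault_name: str, archive_id: str,
                     byte_range: str = None, poll_seconds: int = 900,
                     sleep=time.sleep) -> bytes:
    """Start an archive-retrieval job, wait for it, and return the output bytes."""
    job = glacier_client.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]

    # Jobs are queued and typically take hours, so poll infrequently.
    while True:
        status = glacier_client.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id)
        if status["Completed"]:
            break
        sleep(poll_seconds)

    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"job {job_id} ended with status {status['StatusCode']}")

    request = {"accountId": "-", "vaultName": vault_name, "jobId": job_id}
    if byte_range is not None:
        # Optional partial download, e.g. "bytes=0-1048575".
        request["range"] = byte_range
    return glacier_client.get_job_output(**request)["body"].read()
```

Passing `sleep` as a parameter keeps the function easy to exercise in tests without waiting for real polling intervals.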

Each notification configuration resource is uniquely identified by a URI. You can see the general form of this URI here.

You can create vaults programmatically or by using the Amazon Glacier console. Here we'll see how to use the console to create a vault. Sign in to the AWS Management Console and click Amazon Glacier. Select a region from the region selector, then click Create Vault. In the Create Vault window, enter a name in the Vault Name box. If you want notifications sent to you or your application whenever certain Amazon Glacier jobs complete, click Continue to Notifications to set up Amazon SNS notifications, either by creating a new SNS topic or by using an existing one. To complete the process, click Create Vault Now. Your new vault now appears in the Amazon Glacier vaults list, and selecting it shows its details in the panel below.
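The transcript only walks through the console, but it notes that vaults can also be created programmatically. A minimal sketch with the boto3 Glacier client might look like this; the function name and the topic ARN in the usage note are hypothetical, and the two event names are the job-completion events the Glacier API defines.

```python
def create_vault_with_notifications(
        glacier_client, vault_name: str, sns_topic_arn: str,
        events=("ArchiveRetrievalCompleted", "InventoryRetrievalCompleted")) -> dict:
    """Create a vault in the client's region and point its
    notification configuration at one existing SNS topic."""
    glacier_client.create_vault(accountId="-", vaultName=vault_name)

    config = {"SNSTopic": sns_topic_arn, "Events": list(events)}
    glacier_client.set_vault_notifications(
        accountId="-",
        vaultName=vault_name,
        vaultNotificationConfig=config,
    )
    return config
```

For example, with a client from `boto3.client("glacier", region_name="us-east-1")`, calling `create_vault_with_notifications(client, "examplevault", "arn:aws:sns:us-east-1:111122223333:glacier-jobs")` would mirror the console steps above. The SNS topic must already exist.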

You can delete a vault only if there are no archives in it as of the last inventory that Amazon Glacier has computed, and only if there have been no writes to the vault since that inventory. Note that Amazon Glacier prepares an inventory for each vault periodically, every 24 hours.

Because the inventory might not reflect the latest information, Amazon Glacier ensures that the vault is indeed empty by checking whether there have been any write operations since the last vault inventory.

You can delete a vault programmatically or by using the Amazon Glacier console. Here we'll see how to use the console to delete a vault. Sign in to the AWS Management Console and open the Amazon Glacier console, then select the AWS region where the vault that you want to delete exists.

Select the vault and click Delete Vault. Finally, confirm the deletion by clicking OK.
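For the programmatic route, a cautious sketch with the boto3 Glacier client could first check the archive count from the last inventory and only then request deletion. The function name is hypothetical. Glacier itself still rejects the request if the vault has been written to since the last inventory, in which case the `delete_vault` call raises an error.

```python
def delete_vault_if_empty(glacier_client, vault_name: str) -> bool:
    """Delete the vault if its last inventory reports no archives.

    Returns True if deletion was requested, False if the vault still
    holds archives according to the last inventory.
    """
    info = glacier_client.describe_vault(accountId="-", vaultName=vault_name)

    # NumberOfArchives reflects the last computed inventory,
    # which may be up to a day old.
    if info.get("NumberOfArchives", 0) > 0:
        return False

    glacier_client.delete_vault(accountId="-", vaultName=vault_name)
    return True
```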

About the Author
Mohammad Ali Tabibi
Software Engineer

Computer Engineer and Cloud Expert