
Enabling S3's LifeCycle feature



Data management is a key part of the infrastructure of most organizations, especially those dealing with large data stores. For example, imagine a team involved in scientific analysis of data: they probably require one system to store the raw data in, another to analyze chunks of data quickly and cost-efficiently, and long-term archival to keep both the raw data and the results of their computation. In cases like that, it's important to deploy an automated system that can move data efficiently with integrated automatic backups.

In this course, the experienced System Administrator and Cloud Expert David Clinton will talk about implementing such a data management and backup system using EBS, S3, and Glacier, taking advantage of the S3 LifeCycle feature and of Data Pipeline to automate data transfers among the various pieces of the infrastructure. This system can be enabled easily and cheaply, as is shown in the last lecture of the course.

Who should take this course

This is a beginner-to-intermediate course, so some basic knowledge of AWS is expected. A basic knowledge of programming is also needed to follow along with the Glacier lecture. In any case, even those who are complete newcomers to these topics should be able to grasp at least the key concepts.

If you want to learn more about the AWS solutions discussed in this course, you might want to check our other AWS courses. Also, if you want to test your knowledge of the basic topics covered in this course, we strongly suggest taking our AWS questions. You will learn more about every single service cited in this course.

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.


Hi, and welcome to cloudacademy.com's video series on Data Management. In this video we're going to explore some techniques for creating automated backups.

Backups of your working data are a critical element in any data management system. But backups are only useful if the guy on the other side of the keyboard, that's you, remembers to do the backup. Therefore, leveraging the automation that is possible with computing systems is a very useful addition to a backup regime. Now, the tool that we're going to focus on the most in this video is a service that is integrated with S3 buckets themselves, and that is life cycle.

Here we have an empty S3 bucket. There don't happen to be any folders or archives stored in it right now, but this will do for demonstration purposes. We'll click on properties, and then click on versioning. Versioning allows your data to be identified by its latest version or by earlier versions. If you accidentally overwrite some important data, that data can, if you've enabled versioning, be stored and available for retrieval should you need it. So make sure that versioning is enabled; in this case it already is. Now click on life cycle.
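For readers who prefer to script this step rather than click through the console, here is a minimal sketch of the same versioning switch using the boto3 library. It is not part of the video. The bucket name is a placeholder, the helper names versioning_request and enable_versioning are made up for illustration, and the call assumes boto3 is installed and AWS credentials are already configured.

```python
def versioning_request(bucket_name):
    """Build the keyword arguments for S3's PutBucketVersioning call."""
    return {
        "Bucket": bucket_name,
        "VersioningConfiguration": {"Status": "Enabled"},
    }


def enable_versioning(bucket_name):
    """Send the request to S3 (assumes boto3 and configured credentials)."""
    import boto3  # imported here so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(**versioning_request(bucket_name))


if __name__ == "__main__":
    # "my-example-bucket" is a placeholder, not the bucket shown in the video.
    print(versioning_request("my-example-bucket"))
```

Calling enable_versioning("my-example-bucket") would have the same effect as the console switch described above. Keeping the request builder separate makes it easy to inspect what will be sent before touching a real bucket.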

Let's add a rule. Have this rule apply to the whole bucket. We could of course restrict the operations that we're going to associate with this bucket to a particular folder, let's say mydata.

Anything that's found in the directory mydata will be backed up, or copied, or otherwise be the subject of this process. But for now, especially since there is no data in our particular bucket, we'll just apply the rule to everything in the bucket.

Configure the rule. On this bucket we will, let's say, archive only. That is, at a certain set time we will take the data in this bucket and save it to Glacier. We will archive to the Glacier storage class after, let's say, five days. Naturally, you can set that to any time you like. On previous versions, versions that have already been superseded by subsequent work, perhaps we might permanently delete, or archive and then permanently delete. So, once a version is superseded, we will archive it after five days and then permanently delete it after, let's say, a total of 10 days. Again, these times are entirely arbitrary, and you set the configuration as best fits your needs.

Let's review. We'll need a rule name, and this will be myrule for our purposes. The rule would apply to the whole bucket of Elastic BeanStalk AP Northeast 1. And the times and actions have been set. Let's create and activate the rule. Don't forget, when necessary, to save the rule before moving on.

What life cycle will do now is automatically, without our intervention and without the need for any manual work at all, move the contents of our bucket to Glacier storage, where we will receive very, very secure and safe storage at a very low price. There's a latency of course, as we've mentioned before. There could be three or four hours between a request for the restored data and the data actually being copied back to where we might need it. But it's safe, and it's available.
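The rule configured in the console can also be written down as a lifecycle configuration and applied with boto3. The sketch below is not from the video. It assumes the values used in the walkthrough (rule name myrule, archive after 5 days, delete superseded versions after 10 days), and the helper names build_lifecycle_config and apply_lifecycle, along with the bucket name, are hypothetical. An empty prefix applies the rule to the whole bucket; a prefix such as "mydata/" would restrict it to that folder.

```python
def build_lifecycle_config(rule_id="myrule", prefix="",
                           archive_after_days=5,
                           delete_noncurrent_after_days=10):
    """Build a lifecycle configuration mirroring the console walkthrough."""
    rule = {
        "ID": rule_id,
        # An empty prefix matches every object in the bucket.
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Current versions: move to the Glacier storage class.
        "Transitions": [
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
        # Superseded versions: archive first, then delete permanently.
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": archive_after_days, "StorageClass": "GLACIER"},
        ],
        "NoncurrentVersionExpiration": {
            "NoncurrentDays": delete_noncurrent_after_days,
        },
    }
    return {"Rules": [rule]}


def apply_lifecycle(bucket_name, config):
    """Send the configuration to S3 (assumes boto3 and configured credentials)."""
    import boto3  # imported here so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=config,
    )


if __name__ == "__main__":
    # Whole-bucket rule, as in the video; pass prefix="mydata/" to restrict it.
    print(build_lifecycle_config())
```

Note that this call replaces any lifecycle configuration already on the bucket, so existing rules would need to be included in the same configuration to be kept.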

About the Author
David Clinton
Linux SysAdmin

David taught high school for twenty years, worked as a Linux system administrator for five years, and has been writing since he could hold a crayon between his fingers. His childhood bedroom wall has since been repainted.

Having worked directly with all kinds of technology, David derives great pleasure from completing projects that draw on as many tools from his toolkit as possible.

Besides being a Linux system administrator with a strong focus on virtualization and security tools, David writes technical documentation and user guides, and creates technology training videos.

His favorite technology tool is the one that should be just about ready for release tomorrow. Or Thursday.