This section of the Solutions Architect Associate learning path introduces you to the core storage concepts and services relevant to the SAA-C03 exam. We start with an introduction to the AWS storage services, look at the options available, and learn how to select and apply AWS storage services to meet specific requirements.
- Obtain an in-depth understanding of Amazon S3 - Simple Storage Service
- Get both a theoretical and practical understanding of EFS
- Learn how to create an EFS file system, manage EFS security, and import data into EFS
- Learn about EC2 storage and Elastic Block Store
- Learn about the services available in AWS to optimize your storage
- Learn how to use AWS DataSync to move data between storage systems and AWS storage services
Hello and welcome to this lecture covering Amazon S3 storage classes. As we just saw in the demonstration, I had an option to select which storage class I wanted my uploaded object to reside in. Amazon S3 offers these different storage classes to allow you to select a class based on performance, features, and cost, and it's down to you to select the storage class that you require for the data. The storage classes available are as follows: S3 Standard, S3 Intelligent Tiering, S3 Standard Infrequent Access, S3 One Zone Infrequent Access, S3 Glacier, and S3 Glacier Deep Archive.
S3 Standard. This storage class is considered a general-purpose storage class. It is ideal for a range of use cases where you need high throughput with low latency, with the added ability of being able to access your data frequently. By copying data to multiple availability zones, S3 Standard offers eleven nines of durability, meaning the data remains protected against a single availability zone failure. It also offers 99.99% availability across the year, which is the highest availability that S3 offers. From a security standpoint, this storage class also has the added support of SSL, Secure Sockets Layer, for encrypting data in transit, in addition to encryption options for when the data is at rest. With management features such as lifecycle rules, objects in S3 Standard can automatically be moved to another storage class. For those unfamiliar with lifecycle rules, they provide an automatic method of managing the life of your data while it is being stored on Amazon S3. By adding a lifecycle rule to a bucket, you are able to configure and set specific criteria that can automatically move your data from one class to another or delete it from Amazon S3 altogether. You may want to do this as a cost-saving exercise by moving data to a cheaper storage class after a set period of time.
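To make the idea of a lifecycle rule more concrete, here is a minimal Python sketch that builds a lifecycle configuration as a plain dictionary. The key names follow the structure that the S3 lifecycle API expects, but the helper `build_lifecycle_configuration`, the `logs/` prefix, the rule ID, and the 30 and 365 day values are assumptions chosen purely for illustration.

```python
import json


def build_lifecycle_configuration(prefix: str, transition_days: int,
                                  target_class: str, expire_days: int) -> dict:
    """Build a lifecycle configuration containing a single rule.

    The rule moves objects under `prefix` to `target_class` after
    `transition_days` days and deletes them after `expire_days` days.
    """
    if expire_days <= transition_days:
        raise ValueError("expiration must come after the transition")
    return {
        "Rules": [
            {
                # Hypothetical rule name derived from the prefix.
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": transition_days, "StorageClass": target_class}
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


# Illustrative values only: move logs to Standard-IA after 30 days,
# then delete them after a year.
config = build_lifecycle_configuration("logs/", 30, "STANDARD_IA", 365)
print(json.dumps(config, indent=2))
```

If you are using boto3, a dictionary like this could be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call, assuming you have a bucket and the permissions to change it.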
S3 Intelligent Tiering. This storage class is ideal for those circumstances where the frequency of access to the object is unknown. Effectively, we have unpredictable data access patterns, and so using this storage class can help to optimize your storage costs. Depending on the access patterns of objects in the Intelligent Tiering class, S3 will move your objects between two different tiers, these being frequent access and infrequent access. Now, these tiers are a part of the Intelligent Tiering class itself and are separate from the existing storage classes I listed earlier. When objects are moved to Intelligent Tiering, they are placed within the frequent access tier, which is the more expensive of the two tiers. If an object is not accessed for 30 days, then AWS will automatically move the object to the cheaper tier, known as the infrequent access tier. Once that same object is accessed again, it will automatically be moved back to the frequent access tier. Much like S3 Standard, S3 Intelligent Tiering also offers eleven nines of durability across multiple availability zones, offering protection against the loss of a single AZ. However, its availability isn't quite as high as S3 Standard, as it is set at 99.9%. This storage class also has the added support of SSL for encrypting data in transit, in addition to encryption options for when the data is at rest. S3 Intelligent Tiering also supports lifecycle rules and matches the same throughput and low latency as S3 Standard.
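The tiering behaviour just described can be sketched in a few lines of Python. This is only a toy model of the rule as stated in this lecture (30 days without access moves an object to the infrequent access tier, and any access moves it back); the function name and tier labels are assumptions, not anything AWS exposes.

```python
from datetime import date

INFREQUENT_AFTER_DAYS = 30  # threshold quoted in the lecture


def current_tier(last_access: date, today: date) -> str:
    """Return the tier an object would sit in under the 30-day rule."""
    idle_days = (today - last_access).days
    if idle_days >= INFREQUENT_AFTER_DAYS:
        return "INFREQUENT"
    # A recent access (including one made today) keeps or returns
    # the object to the frequent access tier.
    return "FREQUENT"


today = date(2024, 3, 31)
print(current_tier(date(2024, 3, 1), today))   # idle for 30 days
print(current_tier(date(2024, 3, 20), today))  # idle for 11 days
```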
S3 Standard Infrequent Access. This can be seen as the equivalent of the infrequent access tier from the Intelligent Tiering class, as it is designed for data that does not need to be accessed as frequently as data within the Standard class, and yet it still offers high throughput and low latency access, much like S3 Standard does. As with all other S3 storage classes, it carries that eleven nines durability across multiple AZs, again by copying your objects to multiple availability zones within a single region to protect against AZ outages. It shares the same availability as Intelligent Tiering, at 99.9%. As a result, this storage class comes at a cheaper cost than S3 Standard. Common security features such as SSL for encryption in transit and encryption of data at rest are supported, as well as management controls such as lifecycle rules to automatically move objects to an alternate storage class based on your requirements.
S3 One Zone Infrequent Access. By now you can probably guess what this storage class offers based on the previous classes that I've already discussed. Again, being an infrequent access storage class, it is designed for objects that are unlikely to be accessed frequently. It also carries the same throughput and low latency. However, the durability, although remaining at eleven nines, only exists across a single availability zone. As the name of this class implies, it is one zone, as in one availability zone. So the objects will be copied multiple times to different storage locations within the same availability zone instead of across multiple availability zones. This results in a 20% storage cost reduction when compared to S3 Standard Infrequent Access. One Zone IA does, however, offer the lowest level of availability, which is currently 99.5%, and this is down to the fact that your data is being stored in a single availability zone. Should the AZ storing your data become unavailable, then you will lose access to your data, or even worse, it may become completely lost should the AZ be destroyed in a catastrophic event. Again, lifecycle rules and encryption mechanisms are in place to protect your data both in transit and at rest.
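To put those availability percentages into perspective, the short Python sketch below converts each figure quoted so far into the maximum number of hours per year that a class could be unavailable while still meeting it. It assumes a 365-day year, and it treats the percentages purely as arithmetic targets rather than as guarantees of how the service will actually behave.

```python
from fractions import Fraction

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

# Availability percentages as quoted in the lecture so far.
AVAILABILITY_PERCENT = {
    "S3 Standard": "99.99",
    "S3 Intelligent Tiering": "99.9",
    "S3 Standard Infrequent Access": "99.9",
    "S3 One Zone Infrequent Access": "99.5",
}


def max_unavailable_hours(availability_percent: str) -> float:
    """Hours per year a class may be unavailable at the given percentage."""
    # Fractions avoid floating-point noise when subtracting from 100.
    unavailable_fraction = (100 - Fraction(availability_percent)) / 100
    return float(unavailable_fraction * HOURS_PER_YEAR)


for name, percent in AVAILABILITY_PERCENT.items():
    hours = max_unavailable_hours(percent)
    print(f"{name}: {percent}% -> up to {hours:.2f} hours per year")
```

Under these assumptions, 99.99% works out to a little under an hour per year, 99.9% to just under nine hours, and 99.5% to almost 44 hours.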
S3 Glacier. The next two storage classes are associated with S3 Glacier, which is used for archival data. Firstly, let me explain more about S3 Glacier, as it can be accessed separately from the Amazon S3 service but closely interacts with it. S3 Glacier storage classes directly interact with the Amazon S3 lifecycle rules discussed previously. However, the fundamental difference is that the Amazon Glacier storage classes come at a fraction of the cost of the S3 storage classes when storing the same amount of data. So what's the catch? Well, it doesn't provide you with the same features as Amazon S3, but more importantly, it doesn't provide you with instant access to your data.
So what do Amazon Glacier classes offer exactly? Well, they offer an extremely low-cost, long-term, durable storage solution, which is often referred to as cold storage, ideally suited for long-term backup and archival requirements. It's capable of storing the same data types as Amazon S3, effectively any object. However, like I just mentioned, it doesn't provide instant access to your data. In addition to this, there are other fundamental differences which make this service fit for purpose for other use cases. The service itself has eleven nines of durability, making this just as durable as Amazon S3. Again, this is achieved by replicating the data across multiple availability zones within a single region, but it provides the storage at a considerably lower cost compared to that of Amazon S3. And this is because retrieval of data stored in Glacier is not an instant access retrieval process. When retrieving your data, it can take up to several hours to gain access to it, depending on certain criteria. The data structure within Glacier is centered around vaults and archives. Buckets and folders are not used. They are purely used for S3.
A Glacier vault simply acts as a container for Glacier archives. These vaults are regional and as such during the creation of these vaults, you are asked to supply the region in which they will reside. Within these vaults, we then have our data which is stored as an archive and these archives can be any object similar to S3. Thankfully you can have unlimited archives within your Glacier vaults, so from a capacity perspective, it follows the same rule as S3. Effectively you have access to an unlimited quantity of storage for your archives and vaults. Now whereas Amazon S3 provided a nice graphical user interface to view, manage, and retrieve your data within buckets and folders, Amazon Glacier does not offer this service.
The Glacier dashboard within the AWS Management Console allows you to create your vaults and set data retrieval policies and event notifications. When it comes to moving data into S3 Glacier for the first time, it's effectively a two-step process. Firstly, you need to create your vaults as the containers for your archives, and this can be completed using the Glacier console. Secondly, you need to move your data into the Glacier vault using the available APIs or SDKs. As you may be thinking, there's also another method of moving your data into Glacier, and this is by using the S3 lifecycle rules that I discussed earlier. When it comes to retrieving your archives, which is your data, you will again have to use some form of code to do so, either the APIs, SDKs, or the AWS CLI. Either way, you must first create an archive retrieval job, then request access to all or part of that archive.
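As a rough illustration of what an archive retrieval job looks like in code, the Python sketch below builds the job parameters as a plain dictionary, for either a whole archive or part of one. The keys `Type`, `ArchiveId`, and `RetrievalByteRange` follow the Glacier job parameter format as I understand it; the helper name, the archive ID, and the alignment check (which is deliberately stricter than the service, since it does not allow a range that ends at the end of the archive) are assumptions for this example.

```python
from typing import Optional, Tuple

MIB = 1024 * 1024  # partial retrievals are expected to be megabyte aligned


def build_retrieval_job(archive_id: str,
                        byte_range: Optional[Tuple[int, int]] = None) -> dict:
    """Build parameters for an archive retrieval job.

    `byte_range` is an inclusive (start, end) pair; leave it as None to
    request the whole archive.
    """
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id}
    if byte_range is not None:
        start, end = byte_range
        # Simplified check: both ends of the range must sit on 1 MiB boundaries.
        if end < start or start % MIB != 0 or (end + 1) % MIB != 0:
            raise ValueError("byte range must be megabyte aligned")
        params["RetrievalByteRange"] = f"{start}-{end}"
    return params


# "example-archive-id" is a placeholder, not a real archive identifier.
print(build_retrieval_job("example-archive-id"))
print(build_retrieval_job("example-archive-id", byte_range=(0, MIB - 1)))
```

With boto3, a dictionary like this could be supplied as `jobParameters` to the Glacier client's `initiate_job` call, together with the name of the vault, assuming the vault and archive already exist.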
Now that you have more of an understanding of S3 Glacier, let me review the two S3 Glacier storage classes. Firstly, S3 Glacier. This is the default standard storage class within S3 Glacier, offering a highly secure (using in-transit and at-rest encryption), low-cost, and durable storage solution. The durability matches that of other S3 storage classes, being eleven nines across multiple availability zones, and the availability of S3 Glacier is 99.9%. It's simple to add data to this storage class using the S3 PUT APIs in addition to S3 lifecycle rules. It also offers a variety of retrieval options depending on how urgently you need the data back, each offering a different price point. These are expedited, standard, and bulk.
Expedited. This is used when you have an urgent requirement to retrieve your data but the request has to be less than 250 megabytes. The data is then made available to you in one to five minutes and this is the most expensive retrieval option of the three.
Standard. This can be used to retrieve any of your archives no matter their size but your data will be available in three to five hours, so much longer than the expedited option and this is the second most expensive of the three options.
And finally, bulk. This option is used to retrieve petabytes of data at a time. However, this typically takes between five and twelve hours to complete. This is the cheapest of the retrieval options, so it really depends on how much data you need and how quickly you need it, as the retrieval speed and cost are determined by your retrieval option.
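The trade-off between the three retrieval options can be sketched as a small Python helper that picks the cheapest option able to meet a deadline. The timings and the 250 megabyte limit are the figures quoted in this lecture; the helper name, the use of worst-case times, and the relative cost ranking (rather than real prices) are assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1000 * 1000


@dataclass(frozen=True)
class RetrievalOption:
    name: str
    worst_case_hours: float            # upper end of the quoted time range
    max_request_bytes: Optional[int]   # None means no size limit
    cost_rank: int                     # 1 = cheapest, 3 = most expensive


OPTIONS = [
    RetrievalOption("Bulk", 12.0, None, 1),
    RetrievalOption("Standard", 5.0, None, 2),
    RetrievalOption("Expedited", 5 / 60, 250 * MB, 3),
]


def cheapest_option(request_bytes: int, deadline_hours: float) -> Optional[str]:
    """Return the cheapest option that fits the size and the deadline."""
    candidates = [
        option for option in OPTIONS
        if option.worst_case_hours <= deadline_hours
        and (option.max_request_bytes is None
             or request_bytes < option.max_request_bytes)
    ]
    if not candidates:
        return None  # nothing quoted in the lecture satisfies the request
    return min(candidates, key=lambda option: option.cost_rank).name


print(cheapest_option(10 * MB, deadline_hours=24))
print(cheapest_option(10 * MB, deadline_hours=6))
print(cheapest_option(10 * MB, deadline_hours=1))
```

A request that is both large and urgent, for example 500 megabytes needed within the hour, returns `None` here, because expedited retrievals are limited to requests under 250 megabytes.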
S3 Glacier Deep Archive. Out of all the storage classes offered by S3, Glacier Deep Archive is the cheapest, and again, being a Glacier class, it focuses on long-term storage. This is an ideal storage class for circumstances that require specific data retention regulations and compliance with minimal access, such as those within the financial or health sectors, where data records might need to be legally retained for seven years or even longer. The durability and availability match those of S3 Glacier, with eleven nines of durability across multiple AZs and 99.9% availability.
Adding data into Deep Archive follows the same processes as S3 Glacier, using the S3 PUT APIs in addition to S3 lifecycle rules. Deep Archive, however, does not offer multiple retrieval options. Instead, AWS states that the retrieval of the data will be within 12 hours. To summarize some of the common features between the storage classes, this table clearly shows how they differ. As you can see, the main difference between the classes is the durability and availability percentages, in addition to the pricing.
So when selecting your class for your data, you really need to be asking yourself the following questions: How critical is my data? Does it require the highest level of durability? How reproducible is the data? Can it be easily created again if need be? And how often is the data likely to be accessed? For detailed information on Amazon S3 pricing covering all storage classes discussed, please see our existing course Understanding and Optimizing Costs with AWS Storage Services.
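One possible way to turn those questions into a first-pass decision is sketched in Python below. This is only an illustrative heuristic built from the points made in this lecture, not an official AWS decision procedure; the function name, the parameter names, and the four access pattern labels are all assumptions, and pricing is deliberately left out.

```python
def suggest_storage_class(access_pattern: str,
                          reproducible: bool = False,
                          can_wait_12_hours: bool = False) -> str:
    """Suggest a storage class from the lecture's selection questions.

    access_pattern: "frequent", "infrequent", "unknown", or "archival".
    reproducible: True if the data can easily be created again.
    can_wait_12_hours: True if a retrieval time of up to 12 hours is fine.
    """
    if access_pattern == "unknown":
        return "S3 Intelligent Tiering"
    if access_pattern == "frequent":
        return "S3 Standard"
    if access_pattern == "infrequent":
        # A single AZ is only a reasonable risk for data you can recreate.
        if reproducible:
            return "S3 One Zone Infrequent Access"
        return "S3 Standard Infrequent Access"
    if access_pattern == "archival":
        if can_wait_12_hours:
            return "S3 Glacier Deep Archive"
        return "S3 Glacier"
    raise ValueError(f"unknown access pattern: {access_pattern!r}")


print(suggest_storage_class("infrequent", reproducible=True))
print(suggest_storage_class("archival", can_wait_12_hours=True))
```

In practice you would weigh the result against the pricing of each class, which this sketch does not attempt to model.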
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.