AWS Data Services
The course is part of these learning paths
One of the core building blocks of Infrastructure as a Service (IaaS) is that of storage, and AWS provides a wide range of storage services that allow you to architect the correct solution for your needs. Understanding what each of these services is and what they have been designed and developed for, gives you the knowledge to implement best practices ensuring your data is stored, transmitted and backed up in the most efficient and scalable way. This course will focus on each of the storage services provided through AWS training and will explain what the service is, its key features and when and why you might use the service within your own environment.
The objectives of this course are to provide:
- An overview and introduction to the different AWS storage services, including EBS Storage
- An understanding of how to transfer data into and out of AWS
- The knowledge to confidently select the most appropriate storage service for your needs
This course is designed as an introduction to the AWS storage services and methods of storing data. As a result, this course is suitable for:
- Those who are starting out their AWS journey to understand the various services that exist and their use case
- Storage engineers responsible for maintaining and storing data within the enterprise
- Security engineers who secure and safeguard data within AWS
- Those who are looking to begin their certification journey with either the AWS Cloud Practitioner or one of the 3 Associate level certifications
This is an entry-level course to AWS storage services and so no prior knowledge of these services are required, however, a basic understanding of Cloud Computing and awareness of AWS would be beneficial but not essential.
If you have thoughts or suggestions for this course, please contact Cloud Academy at email@example.com.
- API reference for Amazon Glacier
- Using the AWS SDK's with Amazon Glacier
- Policy evaluation logic
- Amazon Glacier Pricing page
Hello and welcome to this lecture focusing on Amazon Glacier. Amazon Glacier is similar in part to Amazon S3. It even directly interacts with the Amazon S3 lifecycle rules discussed in the previous lecture. However, the fundamental difference with Amazon Glacier is that it's a fraction of the cost when it comes to storing the same amount of data. So what's the catch?
Well it doesn't provide you the same features, but more importantly, it doesn't provide you instant access to your data. So what is Amazon Glacier exactly? It's an extremely low cost, long-term, durable storage solution which is often referred to as cold storage, ideally suited for long-term backup and archival requirements. It's capable of storing the same data types as Amazon S3 effectively any object. However, it does not provide instant access to your data. In addition to this, there are other fundamental differences which makes this service fit for purpose for other use cases.
The service itself again has eleven 9's of durability, making this just as durable as Amazon S3. Again, this is achieved by replicating your data across multiple different availability zones within a single region. But it provides the storage at considerable lower cost compared to that of Amazon S3. This is because retrieval of data stored in Glacier is not an instant access retrieval process. When retrieving your data, it can take up to several hours to gain access to it, depending on certain criteria. The data structure within Glacier is centered around vaults and archives. Buckets and folders are not used. They are purely for S3.
A Glacier Vault simply acts as a container for Glacier Archives. These vaults are regional. And as such, during the creation of these vaults, you are asked to supply the region in which they will reside. Within these vaults, we then have our data which is stored as an archive. And these archives can be any object similarly to S3. For example, any document, audio, movie or image file, etc, and each will be saved as an archive. Thankfully, you can have unlimited archives within your Glacier Vaults. So from a capacity perspective, it follows the same rule as S3. Effectively, you have access to an unlimited quantity of storage for your archives and vaults.
Now whereas Amazon S3 provided a nice graphical user interface to view, manage and retrieve your data within buckets and folders, Amazon Glacier does not offer this service. The Glacier Dashboard within AWS management console, only allows you to create your vaults. Any operational process to upload and retrieve data has to be done using some form of code, either with the Glacier web app service API, or by using the AWS SDKs which simplifies the process of authentication. More information on the API and the SDKs can be found with these links on screen. When it comes to moving data into Amazon Glacier for the first time, it's effectively a two-step process. Firstly, you need to create your vaults as a container for your archives, and this could be completed using the Glacier console. Secondly, you need to move your data into the Glacier vault using the API or SDKs. As you may be thinking, there's also another method of moving data into Glacier if it already exists in S3. This is by the S3 Lifecycle rules. There is an option to move your S3 data into Glacier after a set period of time as defined by you within the Lifecycle rule. When the rule is met, S3 will move the data into Glacier as required. When it comes to retrieving your archives, your data, you will again have to use some form of code to do so, either the APIs, SDKs or the AWS CLI. Either way, you must first create an archival retrieval job. Request an access to all or part of the archive. When doing so, you can also specify a retrieval option which can be one of three options.
Expedited. This is used when you have an urgent requirement to retrieve your data but the request has to less than 250 meg. The data is then made available to you in one to five minutes. And the cost of this service is based upon three cents per gig and one cent per request.
Standard. This can be used to retrieve any of your archives no matter their size, but your data will be available in three to five hours. So much longer than the Expedited option. This cost for the service is one cent per gig requested and five cent per thousand requests.
Bulk. Finally, this option is used to retrieve petabytes of data at a time. However, this typically takes between five and twelve hours to complete. This is the cheapest of the retrieval options which is set at 0.25 cent per gig and 2.5 cents per thousand requests. So it really depends on how much data and how quickly you need it as to the retrieval speed and cost to you made by your retrieval option.
By default, Amazon Glacier encrypts your data using the AES-256 algorithm which is the Advanced Encryption Standard 256 bit, and will manage all of the encryption keys on your behalf. In addition to IAM policies which govern access control to all of you AWS services, Glacier also uses additional methods of access control for protecting your data. And this comes in the form of Vault Access Policies and Vault Lock Policies. Whereas IAM policies are identity based, meaning the policies are associated to a particular user group or role, Vault Access Policies are classed as a resource-based policy as they are applied directly to your vault resource. This is similar to a bucket policy in Amazon S3. These Vault Access Policies govern access control to a particular vault and each vault can only have a single associated Vault Access Policy. They follow the same policy patent as identity policies using the JSON format. However, they also include a principal component to identify who is permitted or refused access. If a user has access to a vault through an identity policy, and there also happens to be a Vault Access Policy attached to the vault as well, then all access will be evaluated between the two policies to see if access is allowed. If there is an explicit deny in either policy, the identity will be refused access. For more information on policy evaluation logic, please view the link here.
Vault Lock Policies are similar to Vault Access Policies. However, once they are set, they cannot be changed. This allows you to implement stringent security controls to help you abide by specific governance and compliance controls. For example, you may not be allowed to delete archives for three years due to regulatory requirements. In this case, you could set a Vault Lock Policy denying anyone from deleting archives that are less than 1,095 days old. This will ensure that no data less than three years old can be deleted. You would Vault Access Policies to govern access control features that may change over time and you would use Vault Lock Policies to help you maintain compliance using access controls that must not be changed.
The pricing structure for Amazon Glacier is very simple. It has a single storage cost for all data despite how much data you are storing. However, this will vary between region. For example, within the London region, the cost of storage is set at 0.45 cents per gig. Similarly to Amazon S3, there also additional costs such as data transfer, request and retrieval pricing. Data transfer into Glacier is free. However, there is a charge for data transferred out of this service. For example, to another region, the transfer is set at 2 cent per gigabyte. The price varies if transferring out across the internet depending on the quantity of data transferred. As we know, there are three variants of data retrieval. Each with their own cost per gig, which is already been covered for standard, expedited and bulk. However, there's also a cost associated to how many of these retrieval request you make. As an example for the London region, this is priced as follows: For the latest pricing information, please refer to the Amazon Glacier Pricing Page.
To quickly summarize, Amazon Glacier is designed to archive data for extended periods of time in cold storage for a very small cost. And so it is ideally suited for retaining data for regulatory reasons. However, it is not a good choice if you need to store data that changes frequently, and that requires immediate access.
About the Author
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data centre and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 60++ courses relating to Cloud, most within the AWS category with a heavy focus on security and compliance
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.