AWS Storage Services
With an on-premises data backup solution within your data center, it’s critical for your business to have a disaster recovery plan built into your business continuity plans. You need to have a plan in place should a disaster occur that affects your operation of the business. The same is true when you start to leverage the cloud for its storage capabilities for your backed up data.
This course explains how cloud storage fits in with DR and the different considerations when preparing to design a solution to back up your on-premises data to AWS. It will explain how Amazon S3, AWS Snowball, and AWS Storage Gateway can all be used to help with the transfer and storage of your backup data.
You should not assume that backing data up to the cloud will solve your every need. There are many points to consider when planning a DR backup solution to a cloud such as AWS. However, it does also open opportunities to you that may not have been possible with a standard on-premises backup solution. It’s these points of interest that many enterprises are focusing on to gain a significant advantage when it comes to disaster recovery.
AWS offers a number of different services available to help you architect the best solution for your needs. To allow you to set up the correct solution that works for you, you must first understand how each of these services can be of benefit to you.
To help you implement effective solutions, you must first have answers to the following:
- What is your RTO (Recovery Time Objective)?
- What is your RPO (Recovery Point Objective)?
- How quickly do you need to retrieve your data?
- How much data do you need to import/export?
- What durability is required for your data?
- How sensitive is your data?
- What security mechanisms are required to protect your data?
- Do you have any compliance controls that you need to abide by?
When you have answers to these questions, you will be able to start working towards a cost-efficient, highly reliable, durable and secure data backup storage solution.
- Gain an understanding of how your storage solution can affect your business continuity and DR plans
- Obtain the knowledge to know when to use specific AWS storage solutions to your advantage between Amazon S3, Amazon Glacier, AWS Snowball, and AWS Storage Gateway
- Understand how each of these services can provide a DR solution to fit your specific needs
This course has been designed for:
- Engineers who need to manage and maintain AWS storage services
- Architects who are implementing effective data backup solutions from on-premises to AWS
- Business continuity management managers
- Anyone looking to prepare for the AWS Solutions Architect - Professional certification
As a prerequisite to this course you should have a basic understanding of the following:
- Business continuity
- Disaster recovery
- Data backup terms and methodologies
- Amazon S3
- Amazon EC2
- Elastic Block Store (EBS)
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Resources referenced within this lecture
Storage Fundamentals for AWS
AWS: Overview of AWS Identity & Access Management (IAM)
AWS Security Best Practises: Abstract & Container Services
AWS Big Data Security: Encryption
Automated Data Management with EBS, S3, and Glacier
Hello, and welcome to this lecture. I want to discuss using Amazon S3 as a backup solution, and some of the features it provides that make it a viable solution for many scenarios.
As you probably know, Amazon S3 is a highly available and durable service, with huge capacity for scaling. It can store files from 1 byte in size up to 5 TB, with numerous security features to maintain a tightly secure environment.
This makes S3 an ideal storage solution for static content, which makes Amazon S3 perfect as a backup solution.
Amazon S3 provides three different classes of storage, each designed to provide a different level of service and benefit.
I briefly touched on these classes earlier, but to reiterate, the classes are Standard, Standard Infrequent Access, and Amazon Glacier. It's important to understand the differences between these classes if you intend to use S3 as a backup solution. The best way to define these differences is to break down their key features.
Let me start by looking at Standard. Standard offers eleven nines of durability, and four nines of availability over a given year. This reliability is also coupled with the SLA for the service. This level of durability is achieved by the automatic replication of your data to multiple devices and multiple availability zones within a specified region. It also has a number of security features, enabling it to support encryption both in transit and at rest, with both client-side and server-side encryption mechanisms. It also offers data management capabilities through the use of lifecycle policies that can automatically move data to another storage class for cost optimization, or delete the data altogether, depending on your lifecycle policy configuration.
The Standard Infrequent Access class also offers eleven nines of durability, but only three nines of availability, whereas Standard offers four. This class, however, is predominantly used for data that is accessed less frequently than data in Standard, hence the name. As a result, the availability takes a hit, but in return the cost for this class is far less than the Standard class. This makes it an effective choice when using S3 to store backup data for DR purposes, as you're still getting the eleven nines of durability, and have high-speed data retrieval access as and when you need it. This class also adheres to the same SLA listed previously, plus the same encryption and data management mechanisms. The main difference here is the cost of the actual storage itself.
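To put the availability figures for these two classes into perspective, the following is a minimal sketch in Python that converts an availability percentage into the downtime it would permit over a year. The function name is illustrative only, and the calculation assumes a 365-day year.

```python
# Convert an availability percentage into the downtime it permits per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


standard_minutes = allowed_downtime_minutes(99.99)  # four nines (Standard)
ia_minutes = allowed_downtime_minutes(99.9)         # three nines (Standard Infrequent Access)

print(f"Standard:                   about {standard_minutes:.1f} minutes per year")
print(f"Standard Infrequent Access: about {ia_minutes / 60:.1f} hours per year")
```

Four nines works out to a little under an hour of potential unavailability per year, and three nines to a little under nine hours, which is worth weighing against your RTO.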
Finally, Amazon Glacier. Although this is classed as a different service, it operates in conjunction with S3, and is considered the third storage class of S3. Amazon Glacier stores data in archives as opposed to S3 buckets, and these archives can be up to 40 TB in size. The archives are saved within Vaults, which act as containers for archives. Specific security measures can be applied to these Vaults to offer additional protection of your data. This class is essentially used for data archiving and long-term data retention, and is commonly referred to as the cold storage service within AWS.
When you need to move data into Glacier, you have different options available to you. You can use the lifecycle rules within S3 to move data from a Standard or Infrequent Access Class directly to Amazon Glacier. You can use the AWS SDKs, or the underlying Amazon Glacier API.
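As a rough illustration of the lifecycle option, the sketch below builds a lifecycle configuration that moves backups to Infrequent Access, then to Glacier, and finally deletes them. The rule ID, the `backups/` prefix, the day counts, and the bucket name in the closing comment are all assumptions made for the example. The dictionary follows the shape that the boto3 S3 client accepts for lifecycle configurations.

```python
import json


def build_backup_lifecycle(prefix: str, days_to_ia: int, days_to_glacier: int,
                           days_to_expire: int) -> dict:
    """Build a lifecycle configuration that ages backups through the classes."""
    if not (days_to_ia < days_to_glacier < days_to_expire):
        raise ValueError("day counts must increase: IA < Glacier < expiry")
    return {
        "Rules": [
            {
                "ID": "age-out-backups",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": days_to_expire},
            }
        ]
    }


lifecycle = build_backup_lifecycle("backups/", 30, 90, 365)
print(json.dumps(lifecycle, indent=2))

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-backup-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle)
```

The day counts should be driven by your own RTO and retention requirements rather than the values shown here.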
Amazon Glacier also retains the same level of durability as the other two classes, and also offers support for encryption both in transit and at rest to protect sensitive information. In addition to this, it enforces its own security measures that differ from S3, such as Vault Locks, enabling you to enforce a write once, read many (WORM) control, and vault access policies, which provide access to specific users or groups. Amazon Glacier can also be used to help comply with specific governance controls, such as HIPAA and PCI, within an overall solution.
The most significant difference between Glacier and the other two classes is that you do not have immediate data access when you need it. If you need to retrieve data from Amazon Glacier, then depending on your retrieval option, it can take anything from a number of minutes to a number of hours for that data to be made available for you to then explore.
This can be a problem if you're in a DR situation whereby you need immediate access to specific data, as is possible within S3. As a result, you need to be aware of the data that you are archiving to Amazon Glacier, specifically if you have a low RTO. There are a number of retrieval options available, each with a different cost and time for retrieving your data.
Expedited. This is used when urgent access is required to a subset of an archive, which is less than 250 MB, and the data will typically be available within 1-5 minutes. The cost of this is $0.03 per GB and $0.01 per request.
Standard. This allows you to retrieve any of your archives, regardless of size, and it will take between 3-5 hours to retrieve the data. As you can see, the cost of this method is cheaper than Expedited.
Bulk. This is the cheapest option for data retrieval, and can retrieve petabytes of archive data. This option normally takes between 5-12 hours, and as you can see, it is significantly cheaper than the other options.
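To show how these three options relate to an RTO, here is a small Python sketch that picks the cheapest option whose worst-case retrieval time still fits. The timings, the 250 MB limit, and the Expedited price are the figures quoted in this lecture and should be treated as illustrative rather than current. The function and constant names are invented for the example, and the sketch ignores the time needed to transfer and restore the data once it is available.

```python
# Figures as quoted in the lecture; treat them as illustrative, not current.
EXPEDITED_MAX_MB = 250
TIERS = [
    # (name, worst-case hours until data is available), ordered cheapest first
    ("Bulk", 12.0),
    ("Standard", 5.0),
    ("Expedited", 5 / 60),
]


def cheapest_tier_for_rto(rto_hours: float, size_mb: float):
    """Return the cheapest option whose worst-case retrieval time fits the RTO.

    Returns None when nothing fits, which suggests the data should not be
    archived to Glacier in the first place.
    """
    for name, worst_case_hours in TIERS:
        if name == "Expedited" and size_mb >= EXPEDITED_MAX_MB:
            continue  # Expedited is described as for retrievals under 250 MB
        if worst_case_hours <= rto_hours:
            return name
    return None


def expedited_cost_usd(size_gb: float, requests: int) -> float:
    """Estimate Expedited cost at $0.03 per GB plus $0.01 per request."""
    return 0.03 * size_gb + 0.01 * requests


print(cheapest_tier_for_rto(rto_hours=24, size_mb=10_000))
print(cheapest_tier_for_rto(rto_hours=6, size_mb=10_000))
print(cheapest_tier_for_rto(rto_hours=1, size_mb=100))
print(cheapest_tier_for_rto(rto_hours=1, size_mb=10_000))
```

The last call returns None: a large archive with a one-hour RTO cannot be served by any of the three options, so that data belongs in Standard or Infrequent Access instead.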
Finally, Amazon Glacier is the cheapest of the three classes to store data, as we can see here, in this price comparison within the US East Region.
As I've discussed, S3 offers a great level of durability and availability of your data, which is perfect for DR use cases. However, if from either a business continuity or compliance perspective you had a requirement to access S3 in multiple regions, then the default configuration of having your data stored within a single region and copied across multiple AZs is simply not enough.
In this scenario, you would need to ensure you configured Cross Region Replication. By default, S3 will not copy your data across multiple regions; it has to be explicitly configured and enabled.
From a DR point of view, you may want to configure Cross Region Replication to help with the following points.
Reduce latency of data retrieval. If you had multiple data centers, and one of these became unavailable in a disaster, another data center could potentially take the load and run your production environment. However, this data center would likely be in another geographic location. If your data was critical, you would want to ensure latency is minimized when accessing the backup data to bring your service back online as soon as possible. By configuring Cross Region Replication to a location that is closer to your secondary data center, latency in data retrieval will be kept to a minimum.
Governance and Compliance. Depending on specific compliance controls, there may be a requirement to store backup data at a specific distance from the source. By enabling Cross Region Replication, you can maintain compliance whilst still having the data in the local region for optimum data retrieval latency.
As a result of Cross Region Replication, you would have eleven nines of durability, and four nines availability for both the source and the replicated data.
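Because Cross Region Replication has to be configured explicitly, a short sketch may help. The Python below builds a replication configuration for backups under a given prefix. The role ARN, bucket ARN, rule ID, and prefix are placeholders invented for the example. Note that versioning must be enabled on both the source and destination buckets before replication can be turned on.

```python
import json


def build_replication_config(role_arn: str, destination_bucket_arn: str,
                             prefix: str = "") -> dict:
    """Build a replication configuration for objects under the given prefix."""
    if not destination_bucket_arn.startswith("arn:"):
        raise ValueError("destination must be a bucket ARN, not a bucket name")
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-backups",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


replication = build_replication_config(
    role_arn="arn:aws:iam::111122223333:role/example-replication-role",
    destination_bucket_arn="arn:aws:s3:::example-backup-bucket-replica",
    prefix="backups/",
)
print(json.dumps(replication, indent=2))

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="example-backup-bucket",  # hypothetical source bucket
#       ReplicationConfiguration=replication)
```

The destination bucket would be created in the region closest to your secondary data center, or in whichever region satisfies your compliance distance requirement.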
From a performance perspective, S3 is able to handle multiple concurrent uploads, and as a result, Amazon recommends that for any file larger than 100 MB that you're trying to upload to S3, you should implement multipart upload. This feature helps to increase the performance of the backup process.
Multipart upload essentially allows you to upload a single large object by breaking it down into smaller contiguous chunks of data. The different parts can be uploaded in any order, and if there is an error during the transfer of data for any part, then only that part will need to be resent. Once all parts are uploaded to S3, S3 will then reassemble the data for the object.
There are a number of benefits to multipart upload.
Speed and Throughput. As multiple parts can be uploaded at the same time in parallel, the speed and throughput of uploading can be enhanced.
Interruption Recovery. Should there be any transmission issues or errors, they will only affect the part being uploaded, leaving the other parts unaffected. When this happens, only the affected part will need to be resent.
Management. There is an element of management available whereby you can, if required, pause your uploads and then resume them at any point. Multipart uploads do not have an expiry time, allowing you to manage the upload of your data over a period of time.
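The way multipart upload splits an object can be sketched in a few lines of Python. The example below only plans the parts and does not talk to S3; the function names and the 100 MiB default part size are assumptions for illustration. The 5 MiB minimum part size (for every part except the last) and the 10,000-part maximum are S3 limits.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts per upload


def plan_parts(total_size: int, part_size: int = 100 * MIB):
    """Return (part_number, offset, length) tuples covering the whole object."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    part_count = -(-total_size // part_size)  # ceiling division
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    parts = []
    for index in range(part_count):
        offset = index * part_size
        length = min(part_size, total_size - offset)
        parts.append((index + 1, offset, length))  # part numbers start at 1
    return parts


def parts_to_resend(parts, failed_part_numbers):
    """Only failed parts are retried; completed parts are left alone."""
    failed = set(failed_part_numbers)
    return [part for part in parts if part[0] in failed]


parts = plan_parts(250 * MIB)
for number, offset, length in parts:
    print(f"part {number}: offset={offset} length={length}")

# If part 2 failed in transit, it is the only part that has to be sent again.
print(parts_to_resend(parts, [2]))
```

Because each part is described by its own offset and length, the parts can be uploaded in parallel and in any order, which is where the speed and recovery benefits come from.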
Finally, I want to touch on Security, which after all is a key factor for any data storage solution.
Getting your data into AWS and on to S3 is one thing, but ensuring that it can't be accessed or exposed to unauthorized personnel is another. More and more I hear on news feeds where data has been exposed or leaked out into the public, due to incorrect security configurations made on S3 buckets, inadvertently exposing what is often very sensitive information to the general public.
This can have a significant detrimental effect on an organization's reputation. Security options must be understood, defined, and managed correctly, if using S3 as a storage backup solution. As a result, S3 comes with a range of security features, which you should be aware of and know how to implement.
Some of these security features, which can help you maintain a level of data protection are:
- IAM Policies. These are identity and access management policies that can be used to both allow and restrict access to S3 buckets and objects at a very granular level, depending on an identity's permissions.
- Bucket Policies. These are JSON policies assigned to individual buckets, whereas IAM Policies are permissions relating to an identity: a user, group, or role. These Bucket Policies can also define who or what has access to that bucket's contents.
- Access Control Lists. These allow you to control which user or AWS account can access a Bucket or object, using a range of permissions, such as read, write, or full control, et cetera.
- Lifecycle Policies. Lifecycle Policies allow you to automatically manage and move data between classes, allowing specific data to be relocated based on compliance and governance controls you might have in place.
- MFA Delete. Multi-Factor Authentication Delete ensures that a user has to enter a 6-digit MFA code to delete an object, which prevents accidental deletion due to human error.
- Versioning. Enabling versioning on an S3 bucket, ensures you can recover from misuse of an object or accidental deletion, and revert back to an older version of the same data object. The consideration with versioning is that it will require additional space as a separate object is created for each version, so that's something to bear in mind.
If S3 is to be used as a backup solution, then there should be no reason to expose or configure these buckets as publicly accessible, and so you should enforce as many of these security mechanisms as possible, based on the risk factor of the data being stored.
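As an example of one of these mechanisms, the sketch below builds a bucket policy that denies any request not made over an encrypted connection, which supports the encryption in transit mentioned earlier. The bucket name and statement ID are placeholders, and this single statement is only a starting point, not a complete security configuration.

```python
import json


def build_backup_bucket_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects requests not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedTransport",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = build_backup_bucket_policy("example-backup-bucket")
print(json.dumps(policy, indent=2))

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_policy(
#       Bucket="example-backup-bucket", Policy=json.dumps(policy))
```

An explicit Deny takes precedence over any Allow granted elsewhere, so this statement applies regardless of the IAM policies attached to the caller.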
The following resources relate to topics covered and mentioned throughout this course.
- Storage fundamentals for AWS
- AWS, an Overview of AWS Identity & Access Management
- AWS Security Best Practices, looking at Abstract and Container Services
- AWS Big Data Security, specifically looking at Encryption, and
- Automated Data Management with EBS, S3, and Glacier.
There are two labs: Create Your First Amazon S3 Bucket, and Using S3 Bucket Policies and Conditions to Restrict Specific Permissions.
There are also two Blogs:
- S3 Lifecycle Policies Versioning & Encryption
- and looking at S3 Security, and mastering S3 bucket policies and ACL.
That now brings me to the end of this lecture. Coming up next, I will discuss the AWS Snowball service, and how this can be used for your data transfer.
About the Author
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data centre and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 60+ courses relating to Cloud, most within the AWS category with a heavy focus on security and compliance.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.