This section of the Solution Architect Associate learning path introduces you to the High Availability concepts and services relevant to the SAA-C03 exam. By the end of this section, you will be familiar with the design options available and know how to select and apply AWS services to meet the specific availability scenarios you can expect in the exam.
- Learn the fundamentals of high availability, fault tolerance, and backup and disaster recovery
- Understand how a variety of Amazon services such as S3, Snowball, and Storage Gateway can be used for backup purposes
- Learn how to implement high availability practices in Amazon RDS, Amazon Aurora, and DynamoDB
In this lecture, I shall be looking at the architecture behind the Backup & Restore recovery strategy.
Firstly, this method has the longest RTO and RPO values, so it should only be used for deployments where services not being back in operation quickly would not pose a significant business risk or impact. Backup & Restore generally assumes an RTO of 24 hours or less, with an RPO measured in hours.
You can, of course, use Backup & Restore across multiple AZs within a region, but let’s look at how this strategy can protect you from an entire regional outage. How can you use this approach to recover from that kind of failure?
A number of AWS services can help you in this scenario by backing up your resources, allowing you to recover either to a specific point in time or on a more scheduled, periodic basis. Point-in-time recovery gives you additional flexibility, and some examples of services that offer this capability are as follows:
Amazon Aurora DB snapshot
Amazon Redshift snapshot
Amazon DocumentDB snapshot
Amazon Neptune snapshot
Amazon Elastic Block Store snapshot
Amazon Elastic File System (EFS) backup, when used with AWS Backup
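To make the trade-off concrete: whichever service you use, a restore can only return you to the newest recovery point taken at or before the moment you want to recover to, and everything after that point is lost. The sketch below models that selection with plain timestamps; the daily 02:00 UTC schedule and the incident time are hypothetical values chosen for illustration, not AWS defaults.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def latest_recovery_point(points: Sequence[datetime], target: datetime) -> Optional[datetime]:
    """Return the newest recovery point at or before `target`, or None if there is none."""
    candidates = [p for p in points if p <= target]
    return max(candidates) if candidates else None


def data_loss(points: Sequence[datetime], target: datetime) -> Optional[timedelta]:
    """Gap between the chosen recovery point and the moment you want to restore to."""
    chosen = latest_recovery_point(points, target)
    return None if chosen is None else target - chosen


# Hypothetical daily snapshots taken at 02:00 UTC.
daily = [datetime(2024, 5, d, 2, 0, tzinfo=timezone.utc) for d in (1, 2, 3)]
incident = datetime(2024, 5, 3, 17, 30, tzinfo=timezone.utc)
print(data_loss(daily, incident))  # prints 15:30:00
```

With only daily snapshots, an incident late in the day costs most of a day's data; finer-grained point-in-time recovery shrinks that gap.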
Of course, other AWS services offer additional backup methods that can be used to restore across regions. One example is Amazon S3, which can copy objects to a bucket in another region using its Cross-Region Replication (CRR) feature.
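As a rough sketch of what enabling CRR involves, the function below builds the `ReplicationConfiguration` body that the S3 `PutBucketReplication` API accepts. The IAM role ARN and bucket names are hypothetical, and it assumes versioning is already enabled on both buckets (S3 requires this for replication) and that the role grants S3 the necessary read and replicate permissions.

```python
import json


def replication_configuration(role_arn: str, destination_bucket: str,
                              rule_id: str = "dr-replication") -> dict:
    """Build a ReplicationConfiguration body for S3 PutBucketReplication."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # an empty prefix matches every object
                # Required whenever a rule uses Filter.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


config = replication_configuration(
    role_arn="arn:aws:iam::111122223333:role/s3-crr-role",  # hypothetical role
    destination_bucket="example-dr-bucket-us-west-2",       # hypothetical bucket
)
print(json.dumps(config, indent=2))

# Applying it with boto3 (not executed here):
# boto3.client("s3").put_bucket_replication(
#     Bucket="example-source-bucket", ReplicationConfiguration=config)
```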
One key AWS service to mention here is AWS Backup, a very versatile AWS-managed service that helps you manage and implement backups across a number of supported AWS services. For the latest list of services supported by AWS Backup, please see the AWS documentation.
AWS Backup acts as a central hub for managing backups across your environment and across multiple regions, centralizing management, providing full auditability, and assisting with specific compliance controls. Because a single managed service monitors and controls your backups, all logging is consolidated in one place, and you can see the status of completed backups and monitor any restores.
The service itself uses the backup features of existing services. For example, if you were to manage your EBS backups with it, AWS Backup would perform them through the EBS snapshot feature. Other AWS services supported by AWS Backup include:
Amazon EC2 instances
Amazon RDS databases
Amazon Aurora databases
Amazon DynamoDB tables
Amazon EFS file systems
AWS Storage Gateway volumes
Amazon FSx (for both Windows File Server and Lustre)
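Resources from several of these services can be protected together: AWS Backup assigns resources to a backup plan through a backup selection, which can list resource ARNs directly. The sketch below builds a `BackupSelection` body for the `CreateBackupSelection` API and rejects ARNs whose service namespace is not in the list above. The namespace set mirrors this lecture's list only (AWS Backup supports more services), and the account ID, role, and resource identifiers are hypothetical.

```python
from typing import Dict, List

# ARN service namespaces for the resource types listed above. EC2 instances and
# EBS volumes share "ec2"; RDS instances and Aurora clusters share "rds".
SUPPORTED_NAMESPACES = {"ec2", "rds", "dynamodb", "elasticfilesystem", "storagegateway", "fsx"}


def arn_namespace(arn: str) -> str:
    """Return the service field of arn:partition:service:region:account:resource."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[2]


def backup_selection(name: str, role_arn: str, resource_arns: List[str]) -> Dict:
    """Build the BackupSelection body for AWS Backup CreateBackupSelection."""
    unsupported = [a for a in resource_arns if arn_namespace(a) not in SUPPORTED_NAMESPACES]
    if unsupported:
        raise ValueError(f"not in the supported list: {unsupported}")
    return {"SelectionName": name, "IamRoleArn": role_arn, "Resources": list(resource_arns)}


# Hypothetical account ID, role, and resource identifiers.
selection = backup_selection(
    "web-tier",
    "arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole",
    [
        "arn:aws:ec2:eu-west-1:111122223333:instance/i-0abcd1234example",
        "arn:aws:rds:eu-west-1:111122223333:db:orders-db",
        "arn:aws:elasticfilesystem:eu-west-1:111122223333:file-system/fs-0123example",
    ],
)
print(sorted(arn_namespace(a) for a in selection["Resources"]))  # ['ec2', 'elasticfilesystem', 'rds']

# With boto3 (not executed here), given the ID of an existing backup plan:
# boto3.client("backup").create_backup_selection(
#     BackupPlanId=plan_id, BackupSelection=selection)
```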
When using AWS Backup, you will need to create backup policies, or backup plans. These define the exact requirements for your backups and contain information such as:
A backup schedule
Lifecycle rules, such as the transition of data to cold storage after a set period
A backup vault, which is where your backups are stored and encrypted through the use of KMS encryption keys
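A backup plan bundles exactly these elements. The sketch below builds a `BackupPlan` body for the AWS Backup `CreateBackupPlan` API with a cron schedule, a lifecycle rule, a target vault, and an optional copy action to a vault in another region. The plan name, vault names, ARN, and the 60/180-minute windows are hypothetical; the 90-day check reflects AWS Backup's minimum cold-storage retention, and not every resource type supports the cold-storage transition.

```python
from typing import Dict, Optional


def backup_plan(plan_name: str, vault_name: str, schedule: str,
                cold_after_days: int, delete_after_days: int,
                dr_vault_arn: Optional[str] = None) -> Dict:
    """Build the BackupPlan body for AWS Backup CreateBackupPlan."""
    # Recovery points moved to cold storage must stay there for at least 90 days.
    if delete_after_days < cold_after_days + 90:
        raise ValueError("DeleteAfterDays must be at least MoveToColdStorageAfterDays + 90")
    lifecycle = {"MoveToColdStorageAfterDays": cold_after_days,
                 "DeleteAfterDays": delete_after_days}
    rule = {
        "RuleName": f"{plan_name}-rule",
        "TargetBackupVaultName": vault_name,  # vault whose KMS key encrypts the backups
        "ScheduleExpression": schedule,       # AWS cron expression, evaluated in UTC
        "StartWindowMinutes": 60,             # hypothetical window sizes
        "CompletionWindowMinutes": 180,
        "Lifecycle": lifecycle,
    }
    if dr_vault_arn:
        # Copy every recovery point to a vault in the DR region.
        rule["CopyActions"] = [{"DestinationBackupVaultArn": dr_vault_arn,
                                "Lifecycle": lifecycle}]
    return {"BackupPlanName": plan_name, "Rules": [rule]}


plan = backup_plan(
    "daily-dr", "primary-vault", "cron(0 5 ? * * *)", 30, 120,
    dr_vault_arn="arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault",  # hypothetical
)
# With boto3 (not executed here):
# boto3.client("backup").create_backup_plan(BackupPlan=plan)
```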
Applying these backup policies and plans lets you control your RPO for different resources and the applications running on them. The backup schedule and window tie directly to your RPO requirements: the longer the gap between backups, the more data you could lose if a failure strikes just before the next backup completes.
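One way to reason about that link is a conservative worst-case bound. The sketch below assumes a recovery point captures data as of the moment its job starts, and that the worst failure lands just before the next job (and, for regional DR, its cross-region copy) finishes, so the interval, both windows, and any copy lag all add up. The window and lag values in the example are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, start_window: timedelta,
                   completion_window: timedelta,
                   copy_lag: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on data lost with a periodic schedule (assumptions in the lead-in)."""
    return interval + start_window + completion_window + copy_lag


def max_interval_for(target_rpo: timedelta, start_window: timedelta,
                     completion_window: timedelta,
                     copy_lag: timedelta = timedelta(0)) -> timedelta:
    """Longest schedule interval whose worst-case RPO still fits within target_rpo."""
    slack = target_rpo - start_window - completion_window - copy_lag
    if slack <= timedelta(0):
        raise ValueError("the windows alone already exceed the target RPO")
    return slack


# A "daily" schedule with a 1 h start window and 3 h completion window:
print(worst_case_rpo(timedelta(hours=24), timedelta(hours=1), timedelta(hours=3)))  # 1 day, 4:00:00
# Meeting a 12 h RPO across regions with a hypothetical 2 h copy lag:
print(max_interval_for(timedelta(hours=12), timedelta(hours=1),
                       timedelta(hours=3), timedelta(hours=2)))  # 6:00:00
```

The point of the example is that a daily schedule does not by itself guarantee a 24-hour RPO once job windows and copy time are counted.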
Let’s take a quick look at an example configuration in which you could apply a backup and restore strategy to recover from the loss of a single AZ and, more importantly, the loss of an entire region.
Within this diagram, we have our source region on the left, and our destination or DR region on the right. We can see that in this scenario we have an EC2 instance with an associated EBS volume, in addition to an RDS instance running in a single AZ. We also have an EFS file system which has multiple mount targets, one in each AZ within the region.
From a backup perspective, the EBS volume is backed up using EBS snapshots, which are then stored on Amazon S3. The EFS file system is backed up using AWS Backup. The RDS instance is backed up using a DB snapshot, which is stored on Amazon S3 along with the database transaction logs.
If the availability zone hosting the EBS volume, EC2 instance and RDS instance fails, then a restore to another AZ is easily achieved. A new EC2 instance is launched from an existing AMI and connects to the EFS file system through the mount target in its new AZ. The EBS snapshot is restored onto a new EBS volume in that same AZ, and the volume is attached to the new instance. A new RDS instance is launched from the DB snapshot. Your infrastructure is now back up and running, good times!
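The AZ-level restore above can be scripted. This is a minimal sketch assuming `ec2` and `rds` are boto3 clients (for example `boto3.client("ec2")` and `boto3.client("rds")`); the function name, its parameters, the instance type, and the device name are hypothetical choices, and the subnet is assumed to sit in the target AZ. Mounting EFS happens inside the instance (for example via user data), so it is not shown.

```python
def restore_into_az(ec2, rds, *, az: str, subnet_id: str, ami_id: str,
                    snapshot_id: str, db_snapshot_id: str, new_db_id: str,
                    instance_type: str = "t3.micro", device: str = "/dev/sdf") -> dict:
    """Rebuild the single-AZ stack in a surviving AZ using boto3 EC2 and RDS clients."""
    # 1. Launch a replacement instance from the existing AMI in a subnet of the new AZ.
    instance_id = ec2.run_instances(
        ImageId=ami_id, InstanceType=instance_type, MinCount=1, MaxCount=1,
        SubnetId=subnet_id,
    )["Instances"][0]["InstanceId"]

    # 2. Restore the EBS snapshot to a new volume; it must be in the instance's AZ.
    volume_id = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=az)["VolumeId"]

    # 3. Wait until both are ready, then attach the volume to the instance.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=volume_id)

    # 4. Launch a new RDS instance from the DB snapshot in the same AZ.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_db_id, DBSnapshotIdentifier=db_snapshot_id,
        AvailabilityZone=az,
    )
    return {"InstanceId": instance_id, "VolumeId": volume_id,
            "DBInstanceIdentifier": new_db_id}
```

Passing the clients in, rather than creating them inside the function, keeps the restore order testable and lets you point the same code at whichever region you are recovering in.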
However, what would happen if the entire Region was lost? Let’s take a look!
In this scenario, we could rely on the features of the storage-layer services, both S3 and AWS Backup, in addition to Amazon Data Lifecycle Manager, which can be used to automate the copying of EBS snapshots to another region. We would launch a new EC2 instance, again from the same AMI used in the source region, as long as you copied the AMI between the regions. The EBS volume can again be restored from its snapshot, which Amazon Data Lifecycle Manager has copied between regions, and then attached to the EC2 instance. The EFS file system could then be restored using AWS Backup within the new region, thanks to cross-region copies in your backup plans; once it is restored, the EC2 instance can mount it through a mount target created in its new Availability Zone. A new RDS instance can also be easily launched as long as you have implemented cross-region automated backups, which are again stored on S3.
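For the EBS part of this scenario, the cross-region copy is defined in an Amazon Data Lifecycle Manager policy. The sketch below builds the `PolicyDetails` body for the DLM `CreateLifecyclePolicy` API: daily snapshots of volumes carrying a given tag, with each snapshot copied to a DR region. The tag, the 03:00 UTC time, and the retention counts are hypothetical, and the execution role ARN in the usage comment is an assumption about your account setup.

```python
from typing import Dict


def ebs_dr_policy_details(tag_key: str, tag_value: str, target_region: str,
                          local_keep: int = 7, dr_keep_days: int = 7) -> Dict:
    """PolicyDetails for Amazon Data Lifecycle Manager CreateLifecyclePolicy."""
    return {
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],  # volumes to protect
        "Schedules": [
            {
                "Name": "daily-with-dr-copy",
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
                "RetainRule": {"Count": local_keep},  # snapshots kept in the source region
                "CopyTags": True,
                "CrossRegionCopyRules": [
                    {
                        "Target": target_region,  # region that receives the copies
                        "Encrypted": True,        # encrypt copies with the default EBS KMS key
                        "RetainRule": {"Interval": dr_keep_days, "IntervalUnit": "DAYS"},
                    }
                ],
            }
        ],
    }


details = ebs_dr_policy_details("backup", "dr", "us-west-2")
# With boto3 in the source region (not executed here):
# boto3.client("dlm").create_lifecycle_policy(
#     ExecutionRoleArn="arn:aws:iam::111122223333:role/AWSDataLifecycleManagerDefaultRole",
#     Description="Daily EBS snapshots copied to the DR region",
#     State="ENABLED",
#     PolicyDetails=details,
# )
```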
So, by implementing a simple architecture using some core AWS-managed services, you can manage your RTOs and RPOs using the Backup & Restore strategy. Do bear in mind that this was a simple diagram with minimal services. When looking at this strategy for your own resources, you may have hundreds or thousands of resources that need restoring, and depending on how much data your environment holds, the restore could significantly affect your RTO.
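A back-of-the-envelope estimate shows why data volume matters. The sketch below treats RTO as detection time plus provisioning time plus the time to move the data at a given restore throughput, optionally split across parallel restore jobs. Every figure in the example (2,000 GiB, 250 GiB per hour, 1 hour to detect, 2 hours to provision) is hypothetical; real restore rates vary by service and are best measured with regular restore tests.

```python
def estimated_rto_hours(data_gib: float, restore_gib_per_hour: float,
                        detect_hours: float, provision_hours: float,
                        parallel_streams: int = 1) -> float:
    """Rough RTO: detection + provisioning + time to restore the data."""
    if restore_gib_per_hour <= 0 or parallel_streams < 1:
        raise ValueError("throughput and stream count must be positive")
    restore_hours = data_gib / (restore_gib_per_hour * parallel_streams)
    return detect_hours + provision_hours + restore_hours


# Hypothetical figures: 2,000 GiB restored at 250 GiB/hour.
print(estimated_rto_hours(2000, 250, detect_hours=1, provision_hours=2))  # 11.0
print(estimated_rto_hours(2000, 250, detect_hours=1, provision_hours=2,
                          parallel_streams=4))                            # 5.0
```

Parallelizing restores shortens only the data-transfer term; detection and provisioning still set a floor on how low the RTO can go.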
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to cloud, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.