Managing RTO and RPO for AWS Disaster Recovery
This course covers the core learning objective to meet the requirements of the 'Designing for disaster recovery & high availability in AWS - Level 2' skill
- Analyze the amount of resources required to implement a fault-tolerant architecture across multiple AWS Availability Zones
- Evaluate an effective AWS disaster recovery strategy to meet specific business requirements
- Understand SLAs for AWS services to ensure the high availability of a given AWS solution
- Analyze which AWS services can be leveraged to implement a decoupled solution
Hello and welcome to this lecture, where I shall be explaining the difference between Recovery Time Objective (RTO) and Recovery Point Objective (RPO), and why these are important factors when your organization comes to build and develop an effective DR strategy.
Disaster recovery is all about having the capability to restore your operations as quickly as possible with minimal data loss. But what would we consider a disaster? A disaster usually signifies a catastrophic event, something that has a huge impact on your AWS resources and workloads. This can be anything from a natural disaster, such as an earthquake that affects AWS Availability Zones, to unauthorized access that leaves your resources maliciously misconfigured or even deleted. Geographical disasters, such as earthquakes or floods, can affect an entire Region, and this should be considered when designing your infrastructure to recover from a disaster.
As a part of this process you need to think of an effective backup strategy in addition to having redundant resources and workloads ready to take the load should a disaster occur. Both our RTO and RPO metrics help us to establish a maximum tolerance level of business impact before regaining a stable operational state.
RTO is defined as the maximum amount of time for which a service can remain unavailable before it is classed as damaging to the business. For example, if your RTO for a service was two hours and you experienced an outage at 09:00, then your service must be back up and running by 11:00 to avoid unacceptable damage to the business.
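The RTO arithmetic in this example can be written out as a minimal Python sketch. The function names and the calendar date are assumptions made purely for illustration; only the two-hour RTO and the 09:00 outage come from the example.

```python
from datetime import datetime, timedelta


def recovery_deadline(outage_start: datetime, rto: timedelta) -> datetime:
    """Latest time the service may come back without breaching the RTO."""
    return outage_start + rto


def rto_met(outage_start: datetime, restored_at: datetime, rto: timedelta) -> bool:
    """True if the service was restored on or before the RTO deadline."""
    return restored_at <= recovery_deadline(outage_start, rto)


# The date is arbitrary (an assumption); only the times of day matter here.
outage = datetime(2024, 1, 15, 9, 0)
rto = timedelta(hours=2)

print(recovery_deadline(outage, rto).strftime("%H:%M"))  # 11:00
```

A restore completed at 10:45 would satisfy this RTO, whereas one completed at 11:30 would not.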
RPO is defined as the maximum period of time for which data could be lost for a service. So this effectively measures the acceptable data loss between your last backup and the service interruption. For example, let’s assume your RPO was 8 hours and you had a service outage at 15:00. You would need to restore your operational state with data from a backup taken no earlier than 07:00 that same morning. As a result, you need to be mindful of your backup strategy to ensure that it meets the demands of your RPO for different services and solutions.
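The same kind of check works for RPO, this time counting backwards from the outage. This is a rough sketch only: the function names and the calendar date are assumptions for illustration, while the 8-hour RPO and the 15:00 outage come from the example.

```python
from datetime import datetime, timedelta


def oldest_acceptable_backup(outage_start: datetime, rpo: timedelta) -> datetime:
    """Earliest backup timestamp that still keeps data loss within the RPO."""
    return outage_start - rpo


def rpo_met(last_backup: datetime, outage_start: datetime, rpo: timedelta) -> bool:
    """True if the most recent backup is recent enough to satisfy the RPO."""
    return last_backup >= oldest_acceptable_backup(outage_start, rpo)


# The date is arbitrary (an assumption); only the times of day matter here.
outage = datetime(2024, 1, 15, 15, 0)
rpo = timedelta(hours=8)

print(oldest_acceptable_backup(outage, rpo).strftime("%H:%M"))  # 07:00
```

One practical consequence is that backups need to run at least as often as the RPO. With an 8-hour RPO, a once-a-day backup could leave the most recent copy up to 24 hours old at the moment of an outage.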
So how can you implement an architecture suitable to fit your required RTO timescales, especially when working with multi-regional solutions?
There are a number of recovery strategies to consider when architecting in AWS for your DR plan to help you meet your RTO and RPO needs. These can be grouped into four methods, each more complex and usually more expensive than the previous one, but with a decreased RTO and RPO at each stage.
Backup & Restore - This provides you with the highest RTO and RPO, and by high I am referring to the time associated with your RTO and RPO metrics. Typically, Backup & Restore provides an RTO of 24 hours or less, and an RPO measured in hours.
Pilot Light - This method decreases your RTO to a number of hours, and your RPO down to minutes.
Warm Standby - With Warm Standby, your RTO can be measured in minutes, and your RPO in seconds.
Multi-Site Active/Active - This is the most expensive recovery strategy; however, it does mean that both your RTO and RPO are close to zero!
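To show how these four tiers trade cost against recovery targets, here is a small Python sketch that picks the least expensive strategy able to meet a required RTO and RPO. The numeric bounds are illustrative assumptions chosen only to match the rough orders of magnitude mentioned above (hours, minutes, seconds, close to zero). They are not AWS guarantees, and the names `Strategy`, `STRATEGIES` and `cheapest_strategy` are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    rto: timedelta  # assumed typical worst-case recovery time
    rpo: timedelta  # assumed typical worst-case data-loss window


# Ordered from least to most expensive. All figures are assumptions.
STRATEGIES = [
    Strategy("Backup & Restore", timedelta(hours=24), timedelta(hours=12)),
    Strategy("Pilot Light", timedelta(hours=4), timedelta(minutes=30)),
    Strategy("Warm Standby", timedelta(minutes=30), timedelta(seconds=30)),
    # "Close to zero" is modelled here as a few seconds.
    Strategy("Multi-Site Active/Active", timedelta(seconds=5), timedelta(seconds=5)),
]


def cheapest_strategy(required_rto: timedelta,
                      required_rpo: timedelta) -> Optional[Strategy]:
    """Return the first (cheapest) strategy meeting both targets, or None."""
    for strategy in STRATEGIES:
        if strategy.rto <= required_rto and strategy.rpo <= required_rpo:
            return strategy
    return None


choice = cheapest_strategy(timedelta(hours=2), timedelta(hours=8))
print(choice.name if choice else "No strategy meets these targets")
```

Under these assumed figures, the two-hour RTO and eight-hour RPO from the earlier examples rule out Backup & Restore and Pilot Light on RTO alone, so Warm Standby is selected. In practice, the bounds for each tier should come from your own recovery testing.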
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for sharing his knowledge of cloud services with the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.