
Summary

Difficulty: Intermediate
Duration: 1h
Description

This course covers the core learning objectives needed to meet the requirements of the 'Designing for disaster recovery & high availability in AWS - Level 2' skill.

Learning Objectives:

  • Analyze the amount of resources required to implement a fault-tolerant architecture across multiple AWS Availability Zones
  • Evaluate an effective AWS disaster recovery strategy to meet specific business requirements
  • Understand the SLAs for AWS services to ensure the high availability of a given AWS solution
  • Analyze which AWS services can be leveraged to implement a decoupled solution
Transcript

In this final lecture, I want to recap some of the key points taken from the previous lectures.  

I began by looking at the definitions of both RTO and RPO, which were defined as follows:

  • RTO - Recovery Time Objective - the maximum amount of time a service can remain unavailable before the outage is classed as damaging to the business. 

  • RPO - Recovery Point Objective - the maximum period of time, measured back from the point of failure, for which a service's data can be lost. 
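
To make the two definitions concrete, here is a minimal sketch that compares a single incident against a pair of objectives. The class name, function name, timestamps, and the 4 hour / 1 hour targets are all hypothetical values chosen for illustration; they do not come from the lecture.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable window of lost data


def assess_incident(objectives, last_recovery_point, failed_at, restored_at):
    """Compare one incident against the RTO and RPO targets."""
    # Downtime runs from the failure until service is restored.
    downtime = restored_at - failed_at
    # Data written after the last recovery point (backup or replica) is lost.
    data_loss_window = failed_at - last_recovery_point
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }


# Hypothetical targets and incident timeline.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
report = assess_incident(
    objectives,
    last_recovery_point=datetime(2024, 1, 1, 9, 30),
    failed_at=datetime(2024, 1, 1, 10, 15),
    restored_at=datetime(2024, 1, 1, 13, 0),
)
print(report["rto_met"], report["rpo_met"])
```

In this example the service was down for 2 hours 45 minutes and lost 45 minutes of data, so both targets are met.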

I then highlighted the four recovery strategies:

  1. Backup & Restore

  2. Pilot Light

  3. Warm Standby

  4. Multi-Site Active/Active

I then covered how you should approach defining what your RTO and RPO metrics should be, and in that lecture I explained that:

  • Defining your RTO and RPO is an essential part of your disaster recovery and business continuity planning 

  • RTO and RPO should be defined for each of your applications individually

  • The lower the metrics are, the more complex the architecture will need to be to support those values, and in turn, the more it will cost you as a business to implement.  

  • Key questions you should ask to understand your RTO and RPO values include:

      • What impact would the loss of an application have on the business?

      • What are the repercussions of this loss?

      • What would the financial impact be?

      • Are there any SLAs that need to be upheld?

      • What dependencies are in place on the application?

      • Are you bound by any external regulatory requirements?

  • AWS Resilience Hub is an AWS service that acts as a central location to help you define, validate, and manage the resilience of the applications you deploy on your AWS infrastructure. 

  • There is no simple and easy metric or rule to determine what your RPO and RTO should be; it all depends on internal factors within your own business
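
The points above (objectives defined per application, and tighter objectives costing more) can be sketched as a small lookup. This is only an illustration: the application names and their targets are hypothetical, and the RTO and RPO figures attached to each strategy are placeholder numbers, not AWS guidance. Only the ordering of the four strategies from cheapest to most expensive comes from the lecture.

```python
from datetime import timedelta

# Strategies ordered from cheapest to most expensive, as in the lecture.
# The (RTO, RPO) each one can achieve are PLACEHOLDER numbers for
# illustration only. Replace them with figures from your own DR testing.
ASSUMED_CAPABILITY = [
    ("Backup & Restore", timedelta(hours=24), timedelta(hours=12)),
    ("Pilot Light", timedelta(hours=1), timedelta(minutes=5)),
    ("Warm Standby", timedelta(minutes=10), timedelta(minutes=1)),
    ("Multi-Site Active/Active", timedelta(seconds=30), timedelta(seconds=5)),
]


def cheapest_strategy(rto, rpo):
    """Return the first (cheapest) strategy whose assumed capability meets both targets."""
    for name, achievable_rto, achievable_rpo in ASSUMED_CAPABILITY:
        if achievable_rto <= rto and achievable_rpo <= rpo:
            return name
    return None  # no listed strategy meets these targets


# Hypothetical applications, each with its own (RTO, RPO).
APPLICATIONS = {
    "internal-wiki": (timedelta(hours=48), timedelta(hours=24)),
    "order-api": (timedelta(minutes=30), timedelta(minutes=2)),
    "payments": (timedelta(minutes=1), timedelta(seconds=10)),
}

plan = {app: cheapest_strategy(rto, rpo) for app, (rto, rpo) in APPLICATIONS.items()}
for app, strategy in plan.items():
    print(f"{app}: {strategy}")
```

Because the list is scanned from cheapest to most expensive, each application ends up with the least costly option that still satisfies its own targets, which is the trade-off the lecture describes.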

Following this lecture I then began to dive deeper into the individual recovery strategies:

Starting with Backup & Restore:

  • The 1st tier of the 4 recovery strategies

  • This method provides the longest RTO and RPO values and requires the most effort to recover

  • Backup & Restore generally assumes your RTO will be 24 hours or less, with an RPO measured in hours. 

  • The cheapest option of the 4 recovery methods

  • Point-in-time recovery allows you additional flexibility across some database and storage services

  • AWS Backup can help you manage your backups by acting as a central hub to control them across your environment and across multiple regions

  • Upon regional failure, recovery can be achieved by restoring resources using backups in a new region
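
As a rough sketch of the AWS Backup point above, the following builds a backup plan document with an hourly schedule and a copy action to a vault in a second region. The plan name, vault names, account ID, regions, and retention period are assumptions for illustration. The `create_plan` helper shows how the document could be passed to boto3's `create_backup_plan`; it needs AWS credentials and is not called here.

```python
def build_backup_plan(plan_name, source_vault, dr_vault_arn, retention_days=35):
    """Return a BackupPlan document: hourly backups plus a copy to a DR-region vault."""
    lifecycle = {"DeleteAfterDays": retention_days}
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "hourly-with-dr-copy",
                "TargetBackupVaultName": source_vault,
                # Run at minute 0 of every hour, so backup-based RPO is roughly an hour.
                "ScheduleExpression": "cron(0 * * * ? *)",
                "Lifecycle": dict(lifecycle),
                # Copy every recovery point to the vault in the DR region.
                "CopyActions": [
                    {
                        "DestinationBackupVaultArn": dr_vault_arn,
                        "Lifecycle": dict(lifecycle),
                    }
                ],
            }
        ],
    }


def create_plan(plan, region):
    """Submit the plan to AWS Backup (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without the SDK

    client = boto3.client("backup", region_name=region)
    return client.create_backup_plan(BackupPlan=plan)["BackupPlanId"]


# Hypothetical names and account ID.
plan = build_backup_plan(
    plan_name="app-backups",
    source_vault="primary-vault",
    dr_vault_arn="arn:aws:backup:eu-west-1:111122223333:backup-vault:dr-vault",
)
print(plan["Rules"][0]["ScheduleExpression"])
```

A plan on its own does not back anything up; resources still have to be assigned to it with a backup selection, which this sketch leaves out.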

Next, we have the Pilot Light:

  • This is the 2nd tier in complexity and cost, following Backup & Restore

  • The main difference between Backup & Restore and Pilot Light is the introduction of data replication between the source and disaster recovery regions to help you reduce your RPO

  • It also adds critical core infrastructure running in the DR region, which is considered ‘always on’.  

  • Having data replicated continuously from your databases to the disaster recovery region is a great way to achieve a very low RPO

  • Any changes made to core infrastructure in one region need to be deployed in the DR region as well. Using AWS CloudFormation can help with the management of these deployments and changes.  

  • Upon regional failure, recovery can be achieved by launching application servers from pre-configured images in the DR region.  Data stores will already be available due to continuous replication
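
One way to picture the CloudFormation point is to deploy the same template to both regions and vary only the parameters, so that the DR region keeps its core infrastructure on but runs no application servers until failover. The sketch below assumes a hypothetical template that accepts `AppServerCount` and `DeployCoreInfrastructure` parameters; those names, the stack name, and the regions are illustrative. The `deploy` helper uses boto3's `create_stack` and is not called here because it needs AWS credentials.

```python
def stack_parameters(role, full_capacity):
    """Build CloudFormation parameters for the 'primary' or 'dr' region."""
    if role not in ("primary", "dr"):
        raise ValueError(f"unknown role: {role}")
    # Pilot Light: core infrastructure is always on in both regions,
    # but application servers in the DR region stay at zero until failover.
    app_servers = full_capacity if role == "primary" else 0
    values = {
        "AppServerCount": str(app_servers),  # hypothetical template parameter
        "DeployCoreInfrastructure": "true",  # hypothetical template parameter
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def deploy(stack_name, template_body, region, parameters):
    """Create the stack in one region (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the parameter builder works without the SDK

    cfn = boto3.client("cloudformation", region_name=region)
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=parameters,
    )
    return response["StackId"]


primary = stack_parameters("primary", full_capacity=6)
dr = stack_parameters("dr", full_capacity=6)
print(primary)
print(dr)
```

Because both regions are built from one template, a change to core infrastructure is made once and rolled out to each region, which keeps the DR region from drifting away from the primary.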

I then moved on to Warm Standby:

  • The 3rd tier in complexity and cost, following Pilot Light

  • Similar to Pilot Light; however, Warm Standby has a scaled-down version of your primary region up, running, and operational in your designated DR region

  • Reduces RTO when compared to Pilot Light

  • Ability to process incoming requests immediately after failure using the scaled-down resources in the designated DR region

  • Use Auto Scaling to scale out the required resources to meet the desired needs of the workload
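
The scale-out step can be sketched as follows. The group name, the capacity of 12 instances, and the 25% standby fraction are hypothetical. The `fail_over` helper passes the request to boto3's `update_auto_scaling_group`; it needs AWS credentials and is not called here.

```python
import math


def standby_size(production_capacity, fraction):
    """Instances kept running in the DR region before any failure (at least one)."""
    return max(1, math.ceil(production_capacity * fraction))


def scale_out_request(group_name, production_capacity):
    """Keyword arguments for update_auto_scaling_group at failover time."""
    return {
        "AutoScalingGroupName": group_name,
        # Raising the minimum forces the group up to full production size.
        "MinSize": production_capacity,
        "MaxSize": production_capacity,
        "DesiredCapacity": production_capacity,
    }


def fail_over(group_name, production_capacity, region):
    """Scale the DR group out to production size (requires boto3 and credentials)."""
    import boto3  # imported here so the helpers above work without the SDK

    client = boto3.client("autoscaling", region_name=region)
    client.update_auto_scaling_group(
        **scale_out_request(group_name, production_capacity)
    )


# Hypothetical sizing: 12 instances in production, 25% kept warm in the DR region.
print(standby_size(12, 0.25))
print(scale_out_request("web-dr", 12))
```

The scaled-down group can take traffic straight away, and the capacity change brings it up to the size needed for the full workload.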

And then finally, Multi-Site Active/Active:

  • The 4th tier in complexity and cost, following Warm Standby

  • Offers you the lowest RTO and RPO when it comes to defining your DR strategy

  • With Multi-Site Active/Active you are effectively deploying your infrastructure across multiple regions at full scale

  • There is no designated DR region

  • Your customers can access your applications and services from any region they require
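
The idea that every region serves traffic and none is a designated DR region can be modelled with a small routing function. This is a pure-Python simulation of the decision only: the lecture does not name the routing layer, and the regions and latency figures below are made up for illustration.

```python
def route_request(latency_ms, healthy):
    """Pick the lowest-latency region among those currently healthy."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies measured from one customer's location.
latency_ms = {"eu-west-1": 20, "us-east-1": 90, "ap-southeast-2": 280}

# Normal operation: every region is active and the nearest one serves the request.
normal = route_request(latency_ms, healthy={"eu-west-1", "us-east-1", "ap-southeast-2"})

# Regional failure: the request simply goes to the next-best active region.
degraded = route_request(latency_ms, healthy={"us-east-1", "ap-southeast-2"})

print(normal, degraded)
```

Nothing has to be started or scaled when a region fails, because the remaining regions are already running at full scale; that is where the lowest RTO of the four strategies comes from.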

That now brings me to the end of this lecture and to the end of this course, and so you should now have a greater understanding of how to manage RTO and RPO for AWS Disaster Recovery, and some of the strategies that you could use. 

Feedback on our courses here at Cloud Academy is valuable to both us as trainers and any students looking to take the same course in the future. If you have any feedback, positive or negative, it would be greatly appreciated if you could contact support@cloudacademy.com.

Thank you for your time and good luck with your continued learning of cloud computing.




About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.