This section of the Solutions Architect Associate learning path introduces you to the High Availability concepts and services relevant to the SAA-C03 exam. By the end of this section, you will be familiar with the design options available and know how to select and apply AWS services to meet specific availability scenarios relevant to the Solutions Architect Associate exam.
- Learn the fundamentals of high availability, fault tolerance, and backup and disaster recovery
- Understand how a variety of Amazon services such as S3, Snowball, and Storage Gateway can be used for backup purposes
- Learn how to implement high availability practices in Amazon RDS, Amazon Aurora, and DynamoDB
Hello and welcome to this lecture where I'm going to explain what Multi-AZ actually is when we are referring to Amazon RDS. Multi-AZ simply means Multi-Availability Zone, and so right away, we can ascertain that this is a feature that is used to help with resiliency and business continuity. When Multi-AZ is configured, a secondary RDS instance, known as a replica, is deployed within a different availability zone within the same region as the primary instance. That's its single and only purpose, to provide a failover option for a primary RDS instance.
It's not to be used as a secondary replica to offload read-only traffic to. That is the role of the read replica, which is very different. It's important to understand that key difference, and this difference will become clearer as we make our way through this course. The replication of data between the primary RDS database and the secondary replica instance happens synchronously. Amazon RDS offers different configurations for Multi-AZ instances based on the database engine type. So, let's look at the differences between those, and I wanna start off by looking at Oracle, MySQL, MariaDB, and PostgreSQL. All of these database engines use the failover mechanism when Multi-AZ is in use and configured, but what does this mean? If you have configured Multi-AZ for one of these engine types and an incident occurs which causes an outage to the primary RDS instance, then the RDS failover process takes over automatically.
This process is managed by AWS and is not something that you need to manually perform or trigger. RDS will update the DNS record to point to the secondary instance. This process can typically take between 60 and 120 seconds. The length of time is very dependent on the size of the database, its transactions, and the activity of the database at the time of failover. This automatic changeover enables you to continue using the database without the need of an engineer making any changes to your environment.
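Because the endpoint can be unavailable for a minute or two while the DNS record is updated, the application should be ready to retry its connection. The following is a minimal sketch of that idea, not an AWS-provided mechanism: `connect_with_retry`, its parameters, and the 180-second default budget are hypothetical names and values chosen for illustration, and it assumes your driver's connection failures surface as (or are wrapped in) Python's built-in `ConnectionError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_wait_seconds: float = 180.0,   # assumed budget, a little above the 60-120 s window
    initial_delay: float = 1.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the waiting budget is used up.

    connect should open a fresh connection to the RDS endpoint name on each
    call, so that a DNS record updated during failover is picked up.
    """
    waited = 0.0            # sum of the delays slept so far (not wall-clock time)
    delay = initial_delay
    while True:
        try:
            return connect()
        except ConnectionError:
            if waited >= max_wait_seconds:
                raise       # budget exhausted: surface the last error
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay)   # exponential backoff with a cap
```

In use, you would pass a small function that calls your database driver's own connect routine against the RDS endpoint. The `sleep` parameter exists only so the waiting can be replaced in tests.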
This failover process will happen in the following scenarios: if patching maintenance is being performed on the primary instance, if the instance of the primary database has a host failure, if the availability zone of the primary database fails, if the primary instance was rebooted with failover, and if the database instance class on the primary database is modified. As you can see, activating Multi-AZ is an effective precaution to ensure you have resiliency built in should an outage occur, whether that results from patching being performed on the instance or from a complete AZ outage, which does, of course, happen and has happened.
However, if this process is automatic and is performed by RDS, how can you be made aware of when this event occurs? You need to be notified so that you can understand what caused the issue with the primary instance that triggered the failover. The RDS failover triggers an event which is recorded as RDS-EVENT-0025 when the failover process is complete. This allows you to configure RDS to notify you through Amazon SNS, for example by email or SMS, when this event is triggered. For more information on configuring RDS notifications based on events, please visit the following URL.
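As a rough illustration of the notification setup described above, the sketch below builds the arguments for an RDS event subscription that forwards failover events for specific DB instances to an SNS topic. The helper names `failover_subscription_params` and `create_subscription` are hypothetical, as are the subscription name, topic ARN, and instance identifiers you would pass in. The only AWS call used is boto3's `create_event_subscription`, and it is imported lazily so the parameter builder can be run without AWS access.

```python
from typing import Any, Dict, List


def failover_subscription_params(
    subscription_name: str,
    sns_topic_arn: str,
    db_instance_ids: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for an RDS event subscription on failover events."""
    return {
        "SubscriptionName": subscription_name,
        "SnsTopicArn": sns_topic_arn,         # an existing SNS topic (assumed to exist)
        "SourceType": "db-instance",          # subscribe to DB instance events
        "EventCategories": ["failover"],      # only failover-related events
        "SourceIds": list(db_instance_ids),   # limit to the named instances
        "Enabled": True,
    }


def create_subscription(params: Dict[str, Any]) -> Dict[str, Any]:
    """Create the subscription; needs boto3 plus AWS credentials and a region."""
    import boto3  # imported here so the builder above works without boto3 installed

    rds = boto3.client("rds")
    return rds.create_event_subscription(**params)
```

Whoever subscribes to the SNS topic, by email or SMS for instance, then receives a message each time RDS publishes a failover event for one of those instances.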
These events are also recorded within the RDS Console to allow you to gain further information.
Let me now talk about SQL Server Multi-AZ configuration. Instead of using the RDS failover mechanism, SQL Server Multi-AZ is achieved through the use of SQL Server Mirroring. To start with, the use of Multi-AZ is not available on all versions of SQL Server. Currently, at the time of writing this course, it supports the following versions. For the latest supported versions, please see the AWS documentation relating to SQL Server. The principle, however, is much the same between SQL Server Mirroring and RDS Failover, in that both methods provision a secondary instance that takes over as the primary instance in the event of an outage. SQL Server Mirroring provisions a secondary RDS instance in a different AZ from that of the primary RDS instance to help with resilience and fault tolerance.
Previously, I mentioned that with the RDS failover technique for Multi-AZ, AWS automatically updates the DNS record to point to the secondary instance. With SQL Server Mirroring, both the primary and secondary instances use the same endpoint. During an incident, the mirroring process transitions the physical network address from the failed instance to the standby mirrored instance. Before enabling SQL Server Mirroring, you need to ensure you have your environment configured correctly. You need to have a database subnet group configured, which has a minimum of two different AZs within it, and this DB subnet group must then be associated with the SQL Server instance that is going to be mirrored. It's worth noting that you cannot specify which availability zone the standby mirror instance will reside in, so it's always good practice to architect your application that communicates with the RDS database across multiple AZs. To check which AZ the standby instance is in once you have enabled mirroring, you can either use the Console, where it will show the location of the secondary instance, or you can use the AWS CLI command describe-db-instances.
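The two checks just described, that the DB subnet group spans at least two AZs and which AZ the standby ended up in, can be sketched against the responses returned by the RDS API. This is a hedged example rather than a definitive procedure: the function names are hypothetical, and it assumes the response shapes of `describe_db_subnet_groups` and `describe_db_instances` as returned by boto3 (`Subnets[].SubnetAvailabilityZone.Name`, `MultiAZ`, and `SecondaryAvailabilityZone`).

```python
from typing import Any, Dict, Optional, Set


def subnet_group_azs(subnet_group: Dict[str, Any]) -> Set[str]:
    """Return the distinct AZ names covered by one DB subnet group description."""
    return {
        subnet["SubnetAvailabilityZone"]["Name"]
        for subnet in subnet_group.get("Subnets", [])
    }


def spans_two_azs(subnet_group: Dict[str, Any]) -> bool:
    """True if the subnet group covers at least two different AZs."""
    return len(subnet_group_azs(subnet_group)) >= 2


def standby_az(response: Dict[str, Any], db_instance_id: str) -> Optional[str]:
    """Find the standby's AZ in a describe_db_instances response.

    Returns None when the instance is not Multi-AZ; raises KeyError when the
    instance is not present in the response at all.
    """
    for instance in response.get("DBInstances", []):
        if instance.get("DBInstanceIdentifier") == db_instance_id:
            if not instance.get("MultiAZ"):
                return None
            return instance.get("SecondaryAvailabilityZone")
    raise KeyError(db_instance_id)


def fetch_standby_az(db_instance_id: str) -> Optional[str]:
    """Query AWS directly; needs boto3 plus AWS credentials and a region."""
    import boto3  # imported lazily so the helpers above run without boto3

    rds = boto3.client("rds")
    response = rds.describe_db_instances(DBInstanceIdentifier=db_instance_id)
    return standby_az(response, db_instance_id)
```

Keeping the parsing separate from the API call means the same helpers can be applied to JSON saved from the CLI's describe-db-instances output.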
Amazon Aurora is different from the previous database engines that I've already discussed when it comes to resiliency across more than a single availability zone. By default, Amazon Aurora DB clusters are fault tolerant: they are designed to maintain the data and withstand a complete failure of an availability zone. This is achieved within the cluster by copying and replicating data across different instances in different AZs within a single region. Should the primary instance fail, Aurora can automatically provision and launch a new primary instance. However, this process can take up to 10 minutes, which can be a significant amount of time if the database is being used within a critical production environment.
However, this time can be significantly reduced if you enable Multi-AZ on your Aurora cluster, which allows RDS to automatically provision a replica within a different AZ. With this replica in place, should a failure of the primary instance occur, the replica instance is promoted to the new primary instance and the load and processing is taken over by this existing replica automatically, without having to wait the 10 minutes as in the previous example. This creates a highly available and resilient database solution. It's possible to create up to 15 different replicas if required, and you can assign each a priority which defines which replica will take over as primary should an incident occur.
That now brings me to the end of this lecture covering Multi-AZ across RDS database instances. Coming up next, I shall be covering RDS read replicas and the roles that these instances play.
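To make the replica priority from this lecture concrete, here is a small illustration of how a failover target could be chosen from a set of Aurora replicas. It follows the rule described in the Aurora documentation (the lowest tier number wins, and a tie goes to the larger replica), but it is only a model of that rule, not Aurora's implementation. The `Replica` class, its `memory_gib` field (a stand-in for instance size), and `pick_failover_target` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    identifier: str
    promotion_tier: int   # 0 is the highest priority, 15 the lowest
    memory_gib: float     # hypothetical stand-in for "instance size"


def pick_failover_target(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the replica with the lowest tier number, preferring larger ones on a tie."""
    if not replicas:
        return None   # no replica: a new primary has to be created instead
    return min(replicas, key=lambda r: (r.promotion_tier, -r.memory_gib))


if __name__ == "__main__":
    candidates = [
        Replica("reader-a", promotion_tier=1, memory_gib=16),
        Replica("reader-b", promotion_tier=0, memory_gib=16),
        Replica("reader-c", promotion_tier=0, memory_gib=32),
    ]
    chosen = pick_failover_target(candidates)
    print(chosen.identifier if chosen else "no replica available")
```

Giving the replica you most want promoted the lowest tier number, and making sure it is sized to carry the primary's load, keeps the failover outcome predictable.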
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.