High Availability in Amazon Aurora
This section of the Solutions Architect Associate learning path introduces you to the high availability concepts and services relevant to the SAA-C03 exam. By the end of this section, you will be familiar with the design options available and know how to select and apply AWS services to meet specific availability scenarios relevant to the Solutions Architect Associate exam.
- Learn the fundamentals of high availability, fault tolerance, and backup and disaster recovery
- Understand how a variety of Amazon services such as S3, Snowball, and Storage Gateway can be used for backup purposes
- Learn how to implement high availability practices in Amazon RDS, Amazon Aurora, and DynamoDB
Welcome back! In this lecture, you’ll learn about the HA configuration options available within Amazon Aurora. Knowing these options and how to apply them will help your applications run with maximum uptime.
For starters, Amazon Aurora, which is often quoted as AWS’s fastest-growing service, is a database service that provides MySQL- and PostgreSQL-compatible engines with superior performance, and is designed in a way that separates the compute layer from the storage layer. Separating the two layers is a key architectural decision that allows you to dial the availability of your data up and down - mostly in the way that read replicas can be easily introduced and removed at will - more on this later.
The compute layer, when launched, can be provisioned in several configurations, providing varying forms of performance and availability, which I’ll cover individually in the coming slides. The compute layer is implemented using EC2 instances, but since this is a managed service, these will not show up within the EC2 console.
The storage layer is shared amongst all compute nodes within the cluster, regardless of the cluster configuration. Aurora stores data in 10 GB blocks, with six copies of each block kept across three AZs - two copies within each availability zone. From an availability and durability point of view, Aurora can lose up to 3 copies and still serve reads, and can lose up to 2 copies and still accept writes. This makes the data highly redundant, durable, and available. The storage layer is presented to the compute layer as a single logical volume. This same logical volume is shared across all compute instances in the compute layer, whether master or read replica - allowing the read replicas to achieve near-identical query performance to the master itself.
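To make those tolerances concrete, here is a minimal arithmetic sketch. It is not how Aurora is implemented; the function name `availability` and the scenario labels are made up for illustration, and the thresholds are simply the figures quoted in this lecture.

```python
# Illustrative arithmetic only; this is not Aurora's implementation.
AZS = 3
COPIES_PER_AZ = 2
TOTAL_COPIES = AZS * COPIES_PER_AZ  # 6 copies of each block

MAX_LOST_FOR_WRITES = 2  # figure quoted in the lecture
MAX_LOST_FOR_READS = 3   # figure quoted in the lecture


def availability(lost_copies: int) -> str:
    """Describe what the volume can still do after losing some copies."""
    if not 0 <= lost_copies <= TOTAL_COPIES:
        raise ValueError(f"lost_copies must be between 0 and {TOTAL_COPIES}")
    if lost_copies <= MAX_LOST_FOR_WRITES:
        return "reads and writes"
    if lost_copies <= MAX_LOST_FOR_READS:
        return "reads only"
    return "unavailable"


# Hypothetical failure scenarios, expressed as a number of lost copies.
scenarios = {
    "one storage node lost": 1,
    "one whole AZ lost": COPIES_PER_AZ,
    "one AZ plus one more node lost": COPIES_PER_AZ + 1,
    "two whole AZs lost": 2 * COPIES_PER_AZ,
}
for label, lost in scenarios.items():
    print(f"{label}: {availability(lost)}")
```

The practical takeaway is that losing an entire availability zone removes only two copies, so the volume stays writable.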
When compared with RDS, the management of data from a replication viewpoint is fundamentally different. With RDS, data needs to be replicated from the master to each of its replicas. Aurora, on the other hand, has no need to copy data between compute instances, since they all use and share a single logical volume.
Aurora uses a quorum and gossip protocol baked into the storage layer to ensure that the data remains consistent. Together, the quorum and gossip protocol provide a continuous self-healing mechanism for the data. Reads require a quorum of 3 and writes require a quorum of 4. The peer-to-peer gossip protocol is used to ensure that data is copied across each of the 6 storage nodes. If a storage node goes offline intermittently, when it comes back online it will receive all data modifications from its peers via the gossip protocol. Availability zones are connected together using very high-speed backbone interconnects, which means that the gossip protocol is very fast.
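Why 3 and 4? A short sketch of the standard quorum overlap rules shows that these numbers fit 6 copies. The function names are hypothetical, and this is a check of the arithmetic only, not a model of Aurora's storage protocol.

```python
# Illustrative quorum arithmetic; names are hypothetical, numbers are from the lecture.
COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3


def quorums_are_consistent(copies: int, write_q: int, read_q: int) -> bool:
    """Check the two usual quorum overlap rules.

    1. read_q + write_q > copies: every read set overlaps every write set,
       so a read always reaches at least one copy holding the latest write.
    2. 2 * write_q > copies: any two write sets overlap, so two conflicting
       writes cannot both be accepted.
    """
    return read_q + write_q > copies and 2 * write_q > copies


def min_overlap(copies: int, a: int, b: int) -> int:
    """Smallest possible number of copies shared by two sets of size a and b."""
    return max(0, a + b - copies)


print(quorums_are_consistent(COPIES, WRITE_QUORUM, READ_QUORUM))
print(min_overlap(COPIES, READ_QUORUM, WRITE_QUORUM))
print(min_overlap(COPIES, WRITE_QUORUM, WRITE_QUORUM))
```

With these values, any read quorum shares at least one copy with any write quorum, which is what lets a reader find the most recent write.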
Aurora in general, regardless of the compute layer setup, always provides 6-way replicated storage across 3 availability zones. Because of Aurora’s storage layer design, Aurora is only supported in regions that have 3 or more availability zones.
Aurora provides both automatic and manual failover of the master, either of which takes approximately 30 seconds to complete. It’s very quick, and this is one of the great benefits of using Aurora.
In the event that Aurora detects the master going offline, an automatic failover will be performed. In this scenario, Aurora will either launch a replacement master or promote an existing read replica to the role of master, with the latter being the preferred option as it is quicker for this promotion to complete.
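A manual failover can also be requested through the RDS API. The following is a minimal sketch, assuming the boto3 RDS client and its `failover_db_cluster` operation; the helper name `request_failover` and all identifiers are placeholders, and the client is passed in so the function can be tried without AWS credentials.

```python
from typing import Any, Optional


def request_failover(rds_client: Any, cluster_id: str,
                     target_instance_id: Optional[str] = None) -> Any:
    """Ask Aurora to fail the cluster over, optionally naming the replica to promote."""
    params = {"DBClusterIdentifier": cluster_id}
    if target_instance_id is not None:
        # Without a target, the service chooses which replica to promote.
        params["TargetDBInstanceIdentifier"] = target_instance_id
    return rds_client.failover_db_cluster(**params)


# Example usage (requires boto3 and AWS credentials; identifiers are placeholders):
#   import boto3
#   rds = boto3.client("rds")
#   request_failover(rds, "my-aurora-cluster", "my-aurora-replica-1")
```

Naming a target replica is optional; it is shown here only because promoting an existing replica is the quicker path described above.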
Connecting to an Aurora database is performed by using one of the database connection endpoints.
Connection endpoints are created by the service to allow you to connect to particular compute instances of the cluster.
There are 4 different connection endpoint types. Which one you end up using is dependent on your requirements. Let’s now quickly review each of them:
- Cluster Endpoint: The cluster endpoint points to the current master database instance. Using the cluster endpoint allows your application to perform reads and writes against the master instance.
- Reader Endpoint: The reader endpoint load balances connections across the read replica fleet within the cluster.
- Custom Endpoint: A custom endpoint load balances connections across a set of cluster instances that you choose and register within the custom endpoint. Custom endpoints can be used to group instances based on instance size, or maybe to group them on a particular DB parameter group. You can then dedicate the custom endpoint to a specific role or task within your organization. For example, you may have a requirement to generate month-end reports, so you connect to a custom endpoint that has been specifically set up for this task.
- Instance Endpoint: An instance endpoint maps directly to a cluster instance. Each and every cluster instance has its own instance endpoint. You can use an instance endpoint when you want fine-grained control over which instance you need to service your requests.
As a general rule of thumb - read-intensive workloads should connect via the reader endpoint.
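In application code, that rule of thumb often becomes a small routing decision. The sketch below is one possible approach, not a prescribed pattern: the hostnames are placeholders, and `ClusterEndpoints` and `endpoint_for` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterEndpoints:
    writer: str  # cluster endpoint: points at the current master
    reader: str  # reader endpoint: spreads connections over the read replicas


# Placeholder hostnames; real values come from the RDS console or API.
ENDPOINTS = ClusterEndpoints(
    writer="mycluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    reader="mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com",
)


def endpoint_for(sql: str, endpoints: ClusterEndpoints = ENDPOINTS) -> str:
    """Send plain SELECT statements to the reader endpoint, everything else to the writer."""
    stripped = sql.strip()
    first_word = stripped.split(None, 1)[0].upper() if stripped else ""
    return endpoints.reader if first_word == "SELECT" else endpoints.writer


print(endpoint_for("SELECT * FROM orders"))
print(endpoint_for("UPDATE orders SET status = 'shipped'"))
```

Checking only the first keyword is deliberately naive. A real application would also need to send locking reads, and reads that must see a write made moments earlier, to the cluster endpoint.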
Reader and Custom connection endpoints are designed to load balance connections across their members - with the intention of spreading load across the member instances. Connection endpoint load balancing is implemented internally using Route 53 DNS - therefore be careful in the client layer not to cache the connection endpoint lookups longer than their specified TTLs.
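The caching caveat can be illustrated with a small lookup cache that refuses to serve stale entries. This is a sketch under stated assumptions: `TtlCache` and `resolve` are hypothetical helpers, and because `socket.getaddrinfo` does not report a record's TTL, the `ttl_seconds` value must be supplied by the caller to match the TTL published for the endpoint.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple


def resolve(host: str) -> List[str]:
    """Look up the IPv4 addresses currently behind a hostname."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


class TtlCache:
    """Cache lookups, but never serve an entry that is ttl_seconds old or older."""

    def __init__(self, ttl_seconds: float,
                 lookup: Callable[[str], List[str]] = resolve,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._lookup = lookup
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, host: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh, reuse it
        addresses = self._lookup(host)  # expired or missing, look it up again
        self._entries[host] = (now, addresses)
        return addresses
```

The lookup function and clock are injectable so the expiry behaviour can be exercised without real DNS traffic. Many drivers and runtimes keep their own DNS caches, so their settings need the same attention.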
Connection endpoints are mostly applicable and used in “Single Master with Multiple Read Replica” setups.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to cloud, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016, Stuart was awarded the ‘Expert of the Year Award 2015’ from Experts Exchange for sharing his knowledge of cloud services with the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.