Designing for Failure
Managing RTO and RPO for AWS Disaster Recovery
Designing for high availability, fault tolerance and cost efficiency
High Availability in RDS
High Availability in Amazon Aurora
High Availability in DynamoDB
The course is part of this learning path
This section of the Solution Architect Associate learning path introduces you to the High Availability concepts and services relevant to the SAA-C03 exam. By the end of this section, you will be familiar with the design options available and know how to select and apply AWS services to meet specific availability scenarios relevant to the Solution Architect Associate exam.
Want more? Try a lab playground or do a Lab Challenge!
- Learn the fundamentals of high availability, fault tolerance, and back up and disaster recovery
- Understand how a variety of Amazon services such as S3, Snowball, and Storage Gateway can be used for back up purposes
- Learn how to implement high availability practices in Amazon RDS, Amazon Aurora, and DynamoDB
Let's take a quick look at a demo that shows how easy it is to set up and use a multi master Aurora database cluster.
In this example I’ll perform the following sequence:
- Launch a new multi master Aurora MySQL database cluster within the AWS RDS console.
- Create a new database named demo, and within it create a new table named course.
- Use the AWS RDS console to find and determine the connection points for the multi master database and set them up as environment variables named AURORA_NODE1 and AURORA_NODE2 in the local terminal.
- Launch a Python script that implements load balancing and retry connection logic to continuously insert records into the course table.
- Confirm connections are load balanced across both active master database nodes.
- Crash each of the master database nodes individually.
- Confirm connection failover to the remaining active master database node happens and databases inserts still continue.
Note: The commands and script as demonstrated here onwards are available in the following CloudAcademy Github repository.
Ok, let's begin. Starting off in the AWS RDS console - I’ll create a new Amazon Aurora MySQL Multi Master database.
Under the Database features, I’ll select the “Multiple Writers” option - this is what makes the cluster a multi master. I’ll set the DB cluster identifier to be “cloudacademy-db-multi”. I’ll configure the credentials to be admin with a password of cloudacademy. For instance size - I’ll simply choose the smallest size.
I’ll then deploy it into an existing Multi AZ VPC. For security groups - I’ll simply allocate an existing one which allows inbound TCP connections to the default MySQL port 3306. Connections will be made from an existing bastion host which has the standard MySQL client already installed on it.
Ok with all that in place, I can now go ahead and click on the “Create Database” button at the bottom. Provisioning is fairly quick and takes just a matter of minutes to complete.
While we are waiting for the database provisioning process to complete - let’s jump over into GitHub and examine the Aurora Multimaster repo. Here within the readme we can see the commands that we will execute to create the demo database and course table.
The “insert-test” python script implements connection load balancing and retry logic. Lines 7 and 8 query environment variables established in the terminal - these are the connection endpoints for each of the 2 master instances. Lines 10, 11, 12 specify the database name and database credentials for authentication.
The “reconnect” function spanning line 14 through to 19 simply calls the reconnect function on the passed in connection and logs out the fact that a reconnection has been attempted - either succeeding or failing.
The remainder of the script starting from line 32 establishes 2 database connections, one to each of the master nodes. The script then inserts 100 course records into the course table. Connection load balancing is performed by simply testing whether the current value of x within the for loop is even or odd - and then alternating the connections - with one of the connections being considered the primary, and the other one taking on the role of the backup connection.
Ok, let’s jump back into the RDS console and confirm that our Multi Master database is ready - which it is. Next, I’ll need to gather both connection endpoints for the masters and then set them up as environment variables within an SSH session on the bastion host.
I’ll now jump into my local terminal and connect to the bastion host using SSH.
Once connected I’ll git clone the aurora multimaster git repo. Once that is completed I’ll navigate into the aurora multi master directory and do a directory listing to examine its contents. From here I then establish both the AURORA_NODE1 and AURORA_NODE1 environment variables and configure them with the connection endpoints previously highlighted and copied.
Next, I’ll split the terminal up into 3 individual panes using tmux. I’ll use the key sequence control plus b plus double quote to split the terminal horizontally, and then again vertically using control plus b plus percent sign. This will allow me to run 3 commands side by side and see all results at the same time. In the first pane, I will set up a watch to perform a “select count star from course” every one second. This will provide us a running count of how many records have been inserted into the course table. In the second pane, I will launch the main Python script which will be performing the inserts into the database implementing connection load balancing and retry logic. Here we can see that it has started to successfully insert new data records using connection load balancing. In the previous pane, we can see that the course table count is now increasing as expected. In the 3rd pane, I will intermittently execute the command “alter system crash” against both of the database nodes individually to simulate a crash. We should expect that the connection retry logic is exercised and that the table count view remains incrementing without any loss. This appears to be the case, which is great.
If we now let the main python script playout - we should see that the final table count ends up at 100 - which it finally does. This is a great result.
In summary, this demonstration highlighted the following:
- How to provision a new Aurora MySQL multi master read-write database.
- Both database nodes are in an active-active or multi master read write configuration.
- Connection load balancing and retry logic implemented within the Python client script is working successfully without any data loss.
If you’ve followed along, please don’t forget to terminate your database cluster to avoid ongoing charges.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.