High Availability in RDS
Amazon RDS Performance Insights
DynamoDB Backup Capabilities
The course is part of this learning path
This section of the SysOps Administrator - Associate learning path introduces you to the AWS database services relevant to the SOA-C02 exam. We will understand the service options available and learn how to select and apply AWS database services to meet specific design scenarios relevant to the exam.
- Understand the various database services that can be used when building cloud solutions on AWS
- Learn how to build databases using Amazon RDS, DynamoDB, Redshift and Elasticache
- Learn how to perform database backups across different services
- Understand how to implement database solutions with high availability in mind
- Understand the methods and techniques that can be used to enhance performance across your databases
Every RDS database engine has an internal structure that tracks current connections in memory. Each connection is in one of two possible states, active or idle. To measure the load on the database, Performance Insights samples that structure once every second. This might seem to be a high-load activity that could impact a database's performance. However, it does not.
Every RDS database engine stores information about connections, including state, in memory. Performance Insights only samples it. Then, it sends this information to services outside of RDS for processing and analysis. The resulting impact on performance is no more than 1% of the overall load per CPU.
Data is collected about each active connection. It includes: The SQL query The state of the SQL query. For example, is it running on the CPU? Is it waiting on I/O, a lock, a commit to a log wait, or for another database resource. The host where the SQL query originated and the user that submitted the SQL query. This data is condensed into a single value for measurement and presented as the metric "Average Active Sessions," or AAS. The Performance Insights dashboard is divided into two parts; a load chart and a top activity table.
The load chart is in the top half of the dashboard. It displays the recent history of database load measured in units of Average Active Sessions, AAS, and is color-coded by wait type. To the right of the load chart is a legend that lists the AAS components and their values.
The default time frame for this history, shown in the upper-right, is one hour. The load chart is interactive. By clicking and dragging on it, you can zoom into any visible timeframe down to one-second increments.
The top-activity table is below the load chart. It shows what has contributed to the database load for the time interval displayed on the load chart. By default, the top activity table shows SQL statements but it can also show activity by Waits, Hosts, and Users. It is ordered by load with the heaviest being on top.
On the left side of the top activity table is a bar graph color-coded to match the legend of the load chart. To the right of the bar-graph is the SQL statement responsible for the activity in the bar graph. This dashboard is essentially the same for all RDS database engines. The only difference between them is that each database engine has its own name for Wait states. However, they essentially map to the same internal activities.
This graph presents a visualization of database load. For this information to have value, it needs to have meaning. It needs context. That context is best explained showing how Performance Insights transforms connection data sampled from memory and turns it into the AAS metric.
Performance Insights samples database connection activity in memory once every second. This graph shows three users connected to a database. Time runs from left to right and the sample rate is marked in one-second increments. It shows three types of queries.
The first type of query, represented with a light-green rectangle, runs fast. Each time it is run, it takes less than one second to finish. Also, this type of query is run only occasionally. The next type of query takes a little more time to run. It is slower, but still less one second, and is run with some regularity. It's represented by a darker green rectangle. These queries will be sampled most of the time but not always. The third type of query takes more than one second to complete. It will always be captured by the sampling process.
With a one-second sample rate, some SQL queries are missed. This might seem alarming but it is not an issue. There are two reasons for this. The first is that these missed queries are already optimized and put almost zero load on the database. The second is, as I'll talk about in a moment, dropped samples reduce the amount of noise in the load chart.
Missing some SQL queries filters out a fair amount of noise from the load chart. In some ways, it's like being in a crowded room where people are having conversations, exchanging greetings, or just saying, "Hi." The fast query that runs occasionally has no impact on performance. It's already been optimized and puts no load on the database.
In the example of a crowded room, these are the people saying hi and moving on to another place. There's no interaction and it is of little importance. The sub-second query running with some regularity, while fast and fairly optimized, does add some load to the database. This load might not always be sampled but it is being measured with some consistency and will show up in the load graph.
In the crowded room example, these conversations might be considered minor but have enough substance to be noticed. Long-running queries, those taking one second or longer, will always be captured. In that crowded room, these conversations are important and measured. Even with missing queries, these data points are used to create seamless visualizations within the Performance Insights load chart. It does this by using the same principle televisions use to show movement, persistence of vision.
The standard frame rate for high definition television is 24 frames per second. Every second, 24 images flash in front of your eyes to create the illusion of seamless motion. It is an illusion because there are gaps, where visual data is missing, between each frame.
With video, the average person can perceive continuous motion with as few as 16 frames per second. 24 frames per second was chosen because it provided a quality effect at a reasonable cost.
The idea behind the sampling rate for Performance Insights is similar. By sampling once per second, with some data being dropped, Performance Insights can create what appears to be a seamless view of the motion inside the database. Increasing the number of samples would create a clearer picture but collecting every SQL query would consume a substantial amount of resources. This amount of data collection would put an unwanted load on the database as well as its supporting infrastructure. Sampling memory once every second provides a quality view of database load at a cost of no more than 1% of additional CPU utilization.
Going back to the previous graph of samples, let's add a fourth user to increase the amount of activity. Now there's a more consistent load on the database. To create a picture of the load, the dashboard stacks the sampled data. This provides a clear view of concurrent database activity. It shows how much, or how little, load is on the database over time.
Performance Insights samples memory every second. As part of the sampling process, it returns the total number of active connections. Then, once per minute, it divides the total by 60 to get the average. If there are 30 active connections in that minute of sampling, the AAS value is 0.5. Because of this, several different types of activity can appear to be the same. In this case, there is one user and, to simplify things, samples are taken over a 10-second period. This is a long-running query. 10 connections sampled over 10 seconds. Performance Insights shows one active user.
Here's a different scenario that looks exactly the same inside the load graph. Two users each run a five-second query, one after another. 10 samples over 10 seconds, regardless of the number of users, results in one active session. Instead of a pair of five-second queries, the same two users are running a series of shorter SQL queries that take one second each. In this situation, the two users are running queries alternately. This back-and-forth action, without overlap, appears as a single user in the load chart. 10 samples of 10 seconds, one active session.
What happens when the same two users run a series of five single-second queries simultaneously? 10 samples of 10 seconds, one active session. This graph shows 10 users. They each run a query that takes one second to finish and do it at the exact same time. Then, for the next nine seconds, all 10 connections are all idle. 10 samples of 10 seconds, one active session.
Performance Insights was built to reveal the load on an RDS database instance. If you need to know how many people are connected to an RDS instance, Amazon CloudWatch has the metric DatabaseConnections. AAS is calculated based on the work those connections are generating. This is the first step in determining the load on a database instance. In the next lecture, I'll show you how Performance Insights adds context to AAS.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.