This course explores Amazon RDS Performance Insights, a performance monitoring and tuning feature that can quickly assess the load on a database hosted inside Amazon RDS and determine when and where to take action.
In this course, you will learn what the Performance Insights service is, how it provides meaningful information database load, how to use it, and the costs and security features of the service.
If you have any feedback relating to this course, please contact us at support@cloudacademy.com.
Learning Objectives
- Obtain a foundational understanding of Amazon RDS Performance Insights
- Learn how Performance Insights measures database load and adds context to AAS (Average Active Sessions)
- Learn how to use Performance Insights
- Understand how to control access to Performance Insights
- Understand the costs associated with using the service
Intended Audience
This course is intended for both new and experienced database administrators that need to assess, understand, and optimize the load that users and applications put on an RDS database.
Prerequisites
To get the most out of this course, you should have a basic understanding of relational databases, have created at least one RDS database instance, and a working knowledge of the AWS cloud platform.
Every RDS database engine has an internal structure that tracks current connections in memory. Each connection is in one of two possible states, active or idle. To measure the load on the database, Performance Insights samples that structure once every second. This might seem to be a high-load activity that could impact a database's performance. However, it does not.
Every RDS database engine stores information about connections, including state, in memory. Performance Insights only samples it. Then, it sends this information to services outside of RDS for processing and analysis. The resulting impact on performance is no more than 1% of the overall load per CPU.
Data is collected about each active connection. It includes: The SQL query The state of the SQL query. For example, is it running on the CPU? Is it waiting on I/O, a lock, a commit to a log wait, or for another database resource. The host where the SQL query originated and the user that submitted the SQL query. This data is condensed into a single value for measurement and presented as the metric "Average Active Sessions," or AAS. The Performance Insights dashboard is divided into two parts; a load chart and a top activity table.
The load chart is in the top half of the dashboard. It displays the recent history of database load measured in units of Average Active Sessions, AAS, and is color-coded by wait type. To the right of the load chart is a legend that lists the AAS components and their values.
The default time frame for this history, shown in the upper-right, is one hour. The load chart is interactive. By clicking and dragging on it, you can zoom into any visible timeframe down to one-second increments.
The top-activity table is below the load chart. It shows what has contributed to the database load for the time interval displayed on the load chart. By default, the top activity table shows SQL statements but it can also show activity by Waits, Hosts, and Users. It is ordered by load with the heaviest being on top.
On the left side of the top activity table is a bar graph color-coded to match the legend of the load chart. To the right of the bar-graph is the SQL statement responsible for the activity in the bar graph. This dashboard is essentially the same for all RDS database engines. The only difference between them is that each database engine has its own name for Wait states. However, they essentially map to the same internal activities.
This graph presents a visualization of database load. For this information to have value, it needs to have meaning. It needs context. That context is best explained showing how Performance Insights transforms connection data sampled from memory and turns it into the AAS metric.
Performance Insights samples database connection activity in memory once every second. This graph shows three users connected to a database. Time runs from left to right and the sample rate is marked in one-second increments. It shows three types of queries.
The first type of query, represented with a light-green rectangle, runs fast. Each time it is run, it takes less than one second to finish. Also, this type of query is run only occasionally. The next type of query takes a little more time to run. It is slower, but still less one second, and is run with some regularity. It's represented by a darker green rectangle. These queries will be sampled most of the time but not always. The third type of query takes more than one second to complete. It will always be captured by the sampling process.
With a one-second sample rate, some SQL queries are missed. This might seem alarming but it is not an issue. There are two reasons for this. The first is that these missed queries are already optimized and put almost zero load on the database. The second is, as I'll talk about in a moment, dropped samples reduce the amount of noise in the load chart.
Missing some SQL queries filters out a fair amount of noise from the load chart. In some ways, it's like being in a crowded room where people are having conversations, exchanging greetings, or just saying, "Hi." The fast query that runs occasionally has no impact on performance. It's already been optimized and puts no load on the database.
In the example of a crowded room, these are the people saying hi and moving on to another place. There's no interaction and it is of little importance. The sub-second query running with some regularity, while fast and fairly optimized, does add some load to the database. This load might not always be sampled but it is being measured with some consistency and will show up in the load graph.
In the crowded room example, these conversations might be considered minor but have enough substance to be noticed. Long-running queries, those taking one second or longer, will always be captured. In that crowded room, these conversations are important and measured. Even with missing queries, these data points are used to create seamless visualizations within the Performance Insights load chart. It does this by using the same principle televisions use to show movement, persistence of vision.
The standard frame rate for high definition television is 24 frames per second. Every second, 24 images flash in front of your eyes to create the illusion of seamless motion. It is an illusion because there are gaps, where visual data is missing, between each frame.
With video, the average person can perceive continuous motion with as few as 16 frames per second. 24 frames per second was chosen because it provided a quality effect at a reasonable cost.
The idea behind the sampling rate for Performance Insights is similar. By sampling once per second, with some data being dropped, Performance Insights can create what appears to be a seamless view of the motion inside the database. Increasing the number of samples would create a clearer picture but collecting every SQL query would consume a substantial amount of resources. This amount of data collection would put an unwanted load on the database as well as its supporting infrastructure. Sampling memory once every second provides a quality view of database load at a cost of no more than 1% of additional CPU utilization.
Going back to the previous graph of samples, let's add a fourth user to increase the amount of activity. Now there's a more consistent load on the database. To create a picture of the load, the dashboard stacks the sampled data. This provides a clear view of concurrent database activity. It shows how much, or how little, load is on the database over time.
Performance Insights samples memory every second. As part of the sampling process, it returns the total number of active connections. Then, once per minute, it divides the total by 60 to get the average. If there are 30 active connections in that minute of sampling, the AAS value is 0.5. Because of this, several different types of activity can appear to be the same. In this case, there is one user and, to simplify things, samples are taken over a 10-second period. This is a long-running query. 10 connections sampled over 10 seconds. Performance Insights shows one active user.
Here's a different scenario that looks exactly the same inside the load graph. Two users each run a five-second query, one after another. 10 samples over 10 seconds, regardless of the number of users, results in one active session. Instead of a pair of five-second queries, the same two users are running a series of shorter SQL queries that take one second each. In this situation, the two users are running queries alternately. This back-and-forth action, without overlap, appears as a single user in the load chart. 10 samples of 10 seconds, one active session.
What happens when the same two users run a series of five single-second queries simultaneously? 10 samples of 10 seconds, one active session. This graph shows 10 users. They each run a query that takes one second to finish and do it at the exact same time. Then, for the next nine seconds, all 10 connections are all idle. 10 samples of 10 seconds, one active session.
Performance Insights was built to reveal the load on an RDS database instance. If you need to know how many people are connected to an RDS instance, Amazon CloudWatch has the metric DatabaseConnections. AAS is calculated based on the work those connections are generating. This is the first step in determining the load on a database instance. In the next lecture, I'll show you how Performance Insights adds context to AAS.
Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification but it is possible to find the right path and course of study.
Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.
Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.
In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.