The course is part of these learning paths
This course explores Amazon RDS Performance Insights, a performance monitoring and tuning feature that can quickly assess the load on a database hosted inside Amazon RDS and determine when and where to take action.
In this course, you will learn what the Performance Insights service is, how it provides meaningful information database load, how to use it, and the costs and security features of the service.
If you have any feedback relating to this course, please contact us at firstname.lastname@example.org.
- Obtain a foundational understanding of Amazon RDS Performance Insights
- Learn how Performance Insights measures database load and adds context to AAS (Average Active Sessions)
- Learn how to use Performance Insights
- Understand how to control access to Performance Insights
- Understand the costs associated with using the service
This course is intended for both new and experienced database administrators that need to assess, understand, and optimize the load that users and applications put on an RDS database.
To get the most out of this course, you should have a basic understanding of relational databases, have created at least one RDS database instance, and a working knowledge of the AWS cloud platform.
In the previous lecture, I covered what the metric AAS, the Active Average Session, is and how it is calculated. AAS is what Performance Insights uses to reveal load on an RDS instance.
AAS shows activity and this implies access to the CPU. That's important but there's more to database performance than CPU utilization. Performance Insights adds context to session activity. This is a simple graph representing a single connection and three SQL queries. Here's the same activity but now color-coded based on the state of the query.
In this graph, green represents when the query has access to the CPU. Blue is when the query is waiting for IO. Orange is when the query is waiting for a database resource such as a lock, access to a buffer, or a cache. The first query starts and runs entirely on the CPU.
When complete, it returns its results to the requestor. When the second query starts, it also has access to the CPU. Then there's a problem. It needs data that it doesn't have. The query waits while a request is made to the disk. After it gets what it needs, it loads the data into memory and runs on the CPU again. When finished, it returns data to the requestor
The third query starts on the CPU but then has to wait on an internal database resource. Once it gets access, it starts running on the CPU again. Then it has to wait for something like a database lock. After the lock is cleared, access to the CPU is restored and the query finishes and returns data to the requestor.
With color-coded data, the load graph reveals what's happening within the database. Let's take a look a the session state. The CPU load is represented in green. It shows the database is under constant CPU load. Disk IO, in blue, shows that, for about half the time, data is either being sent or retrieved from disk. In orange, some SQL queries are in a wait state.
Performance Insights displays a graph based on averages. If Performance Insights graphed every sample, the load graph would be very busy and difficult to read. Doing this would also increase the number of resources needed to display and analyze activity. Averages based on one-minute increments make the Performance Insights load graph clear and concise.
The default graph shows an hour's worth of activity divided into one-minute segments. Each segment displays the average active session, ASS state for that minute. As a quick aside, using Performance Insights, one-second sampling can be made visible. In the load graph, select the 5-minute timeframe or use the mouse to click-and-drag on the graph to zoom into it.
Looking at this graph, the next thing to figure out is whether or not the load displayed is large or small. By itself, I can't tell whether it should be ignored or if it requires immediate attention. The next step then is to determine how large this load is compared to the available capacity of the database instance.
Performance Insights has, on the load chart, a line showing the number of CPU cores on the instance. This line is labeled Max CPU. Max CPU is extremely important. There can be only as many concurrent connections running as there are CPU cores on the instance. Active sessions require CPU access. If you run out of cores, the work stops.
In this example, Max CPU shows two cores available. When the demand, shown in green, reaches the line, the next session has to wait. This load graph is displaying CPU saturation on the database instance. It is under a heavy load. Every time the graph moves above Max CPU, somebody is waiting for something. It could be CPU, data on disk, or a wait state to be cleared. The color-coding has revealed, in addition to CPU saturation and wait states, disk I/O is an issue. This creates a clearer picture of how much load exists on the instance, how it is impacting performance, and what is causing it.
As a review, I want to go over what the AAS values mean. If the AAS count is greater than zero but less than one, the database is not blocked. Blocking activity happens when a session fails to make progress. There is CPU available and SQL queries are running. If the AAS count is near zero the database is idle. When the AAS count is less than the number of CPUs on the instance, there is CPU available and the instance is not blocked. This is good!
Be careful! If AAS is greater than one but also less than the number of CPU cores, it could be that a single session has been blocked. Even though the overall database load is acceptable, there could be an issue with a single query. If several users run the same query, in time, it could develop into a problem that impacts performance.
Understanding why a single session is 100% active takes some investigation and analysis. Problems start to appear when AAS is greater than the number of available cores on the database instance. If AAS is significantly greater than the available number of CPU cores there is a bottleneck that needs to be investigated and fixed. Knowing this, it's possible to use Performance Insights to avoid arguments and make recommendations about the database's size and shape. I'll talk about that in my next lecture, Using Performance Insights.
Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification but it is possible to find the right path and course of study.
Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.
Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.
In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.