High Availability in RDS
Amazon RDS Performance Insights
DynamoDB Backup Capabilities
The course is part of this learning path
This section of the SysOps Administrator - Associate learning path introduces you to the AWS database services relevant to the SOA-C02 exam. We will understand the service options available and learn how to select and apply AWS database services to meet specific design scenarios relevant to the exam.
- Understand the various database services that can be used when building cloud solutions on AWS
- Learn how to build databases using Amazon RDS, DynamoDB, Redshift and Elasticache
- Learn how to perform database backups across different services
- Understand how to implement database solutions with high availability in mind
- Understand the methods and techniques that can be used to enhance performance across your databases
In the previous lecture, I covered what the metric AAS, the Active Average Session, is and how it is calculated. AAS is what Performance Insights uses to reveal load on an RDS instance.
AAS shows activity and this implies access to the CPU. That's important but there's more to database performance than CPU utilization. Performance Insights adds context to session activity. This is a simple graph representing a single connection and three SQL queries. Here's the same activity but now color-coded based on the state of the query.
In this graph, green represents when the query has access to the CPU. Blue is when the query is waiting for IO. Orange is when the query is waiting for a database resource such as a lock, access to a buffer, or a cache. The first query starts and runs entirely on the CPU.
When complete, it returns its results to the requestor. When the second query starts, it also has access to the CPU. Then there's a problem. It needs data that it doesn't have. The query waits while a request is made to the disk. After it gets what it needs, it loads the data into memory and runs on the CPU again. When finished, it returns data to the requestor
The third query starts on the CPU but then has to wait on an internal database resource. Once it gets access, it starts running on the CPU again. Then it has to wait for something like a database lock. After the lock is cleared, access to the CPU is restored and the query finishes and returns data to the requestor.
With color-coded data, the load graph reveals what's happening within the database. Let's take a look a the session state. The CPU load is represented in green. It shows the database is under constant CPU load. Disk IO, in blue, shows that, for about half the time, data is either being sent or retrieved from disk. In orange, some SQL queries are in a wait state.
Performance Insights displays a graph based on averages. If Performance Insights graphed every sample, the load graph would be very busy and difficult to read. Doing this would also increase the number of resources needed to display and analyze activity. Averages based on one-minute increments make the Performance Insights load graph clear and concise.
The default graph shows an hour's worth of activity divided into one-minute segments. Each segment displays the average active session, ASS state for that minute. As a quick aside, using Performance Insights, one-second sampling can be made visible. In the load graph, select the 5-minute timeframe or use the mouse to click-and-drag on the graph to zoom into it.
Looking at this graph, the next thing to figure out is whether or not the load displayed is large or small. By itself, I can't tell whether it should be ignored or if it requires immediate attention. The next step then is to determine how large this load is compared to the available capacity of the database instance.
Performance Insights has, on the load chart, a line showing the number of CPU cores on the instance. This line is labeled Max CPU. Max CPU is extremely important. There can be only as many concurrent connections running as there are CPU cores on the instance. Active sessions require CPU access. If you run out of cores, the work stops.
In this example, Max CPU shows two cores available. When the demand, shown in green, reaches the line, the next session has to wait. This load graph is displaying CPU saturation on the database instance. It is under a heavy load. Every time the graph moves above Max CPU, somebody is waiting for something. It could be CPU, data on disk, or a wait state to be cleared. The color-coding has revealed, in addition to CPU saturation and wait states, disk I/O is an issue. This creates a clearer picture of how much load exists on the instance, how it is impacting performance, and what is causing it.
As a review, I want to go over what the AAS values mean. If the AAS count is greater than zero but less than one, the database is not blocked. Blocking activity happens when a session fails to make progress. There is CPU available and SQL queries are running. If the AAS count is near zero the database is idle. When the AAS count is less than the number of CPUs on the instance, there is CPU available and the instance is not blocked. This is good!
Be careful! If AAS is greater than one but also less than the number of CPU cores, it could be that a single session has been blocked. Even though the overall database load is acceptable, there could be an issue with a single query. If several users run the same query, in time, it could develop into a problem that impacts performance.
Understanding why a single session is 100% active takes some investigation and analysis. Problems start to appear when AAS is greater than the number of available cores on the database instance. If AAS is significantly greater than the available number of CPU cores there is a bottleneck that needs to be investigated and fixed. Knowing this, it's possible to use Performance Insights to avoid arguments and make recommendations about the database's size and shape. I'll talk about that in my next lecture, Using Performance Insights.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.