Amazon RDS: Monitoring
This introductory course provides a solid foundation in monitoring Amazon RDS using AWS tools.
It begins by getting you acquainted with monitoring databases hosted on the Amazon RDS service and then moves on to explore the available AWS tools that can be used for this purpose.
If you have any feedback relating to this course, please reach out to us at firstname.lastname@example.org.
- Learn about database monitoring in AWS
- Understand how monitoring databases in the cloud differs from monitoring them on-premises
- Understand the AWS tools available inside RDS for monitoring
- Become aware of the AWS infrastructure monitoring tools that can be used to monitor RDS
This course is intended for anyone who is new to database monitoring — or monitoring in general — and needs to monitor databases hosted in Amazon RDS.
To get the most out of this course, you should have a basic knowledge of cloud computing (Amazon Web Services in particular) and have a high-level understanding of how relational databases work.
There are many aspects of database monitoring. While performance is probably the most common, depending on requirements, there are aspects outside of database performance that need to be monitored as well. These include, but are not limited to, areas such as availability, recoverability, and security.
Availability is the condition in which a given resource can be accessed by its consumers. For databases, this means that the users of the data, such as applications, consumers, and end users, can access it. Monitoring availability answers questions like these: Is the RDS instance or endpoint accessible? Are instances stopping or starting? Has an instance been deleted? Have Multi-AZ RDS instances had to fail over?
Related to availability is the percentage of time a database can be used for productive work. This percentage will vary from organization to organization, system to system, and from user to user. Planned downtime is to be expected. Emergency maintenance can and will happen. Effective monitoring will help keep those unplanned outages to a minimum.
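As a rough illustration of the availability percentage described above, the sketch below divides usable time by total time for a reporting period. The function name and the outage durations are made up for the example, and whether planned maintenance counts against the figure is a policy decision each organization makes for itself.

```python
from datetime import timedelta


def availability_percent(period: timedelta, outages: list[timedelta]) -> float:
    """Return the percentage of `period` during which the database was usable."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    downtime = sum(outages, timedelta(0))
    # Clamp at zero in case the recorded outages exceed the period.
    uptime = max(period - downtime, timedelta(0))
    return 100.0 * (uptime / period)


# Hypothetical 30-day period with a 2-hour planned maintenance window
# and a 43-minute unplanned outage.
month = timedelta(days=30)
recorded_outages = [timedelta(hours=2), timedelta(minutes=43)]
print(f"{availability_percent(month, recorded_outages):.3f}%")
```

Tracking planned and unplanned downtime as separate lists makes it easy to report both the overall figure and the share caused by emergencies.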
Recoverability is the ability to reestablish service in the event of an error or component failure. Are database backups being performed both automatically and manually? Are there successful backups of individual databases? Have recovery procedures been tested?
Recoverability is an aspect of durability. Durability means that, once a database transaction has been committed to storage, it will not be lost due to a system failure.
From a database perspective, performance monitoring is usually part of a larger performance management system. In the cloud, the goal of a database performance management system is to optimize resources so that the largest possible workload runs at the lowest cost.
Performance and availability are terms that are often confused with each other. While related, they are different and should be treated as separate issues. The confusion arises when performance degrades to the point that users cannot perform their job functions. When this happens, the database has effectively become unavailable.
Database performance monitoring can identify bottlenecks and points of contention, monitor workloads and throughput, review SQL performance, monitor storage space, and view database instance resource usage.
There are several aspects of a database that must be monitored in order to achieve optimum performance efficiency. Common categories include CPU usage, memory utilization, and storage capacity. Other categories impacting performance can include queue latency, disk I/O, active database connections, blocking or waiting tasks, errors in database log files, and job task failures.
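One simple way to act on these categories is to compare a sample of metric values against thresholds. The sketch below is a minimal example of that idea. The metric names follow the Amazon CloudWatch names for RDS (CPUUtilization, FreeableMemory, FreeStorageSpace, DatabaseConnections), but the threshold values, the sample values, and the function itself are assumptions chosen for illustration. In practice the values would come from a monitoring service rather than a hard-coded dictionary.

```python
# Hypothetical thresholds; tune these to your own workload and instance class.
# "max" means the value should stay at or below the limit,
# "min" means the value should stay at or above it.
THRESHOLDS = {
    "CPUUtilization": ("max", 80.0),             # percent
    "FreeableMemory": ("min", 512 * 1024**2),    # bytes
    "FreeStorageSpace": ("min", 10 * 1024**3),   # bytes
    "DatabaseConnections": ("max", 400),         # count
}


def find_breaches(sample: dict[str, float]) -> list[str]:
    """Return the names of metrics in `sample` that violate their threshold."""
    breaches = []
    for metric, (kind, limit) in THRESHOLDS.items():
        value = sample.get(metric)
        if value is None:
            continue  # metric not present in this sample
        if kind == "max" and value > limit:
            breaches.append(metric)
        elif kind == "min" and value < limit:
            breaches.append(metric)
    return breaches


# Made-up sample: CPU is high and free storage is low.
sample = {
    "CPUUtilization": 93.5,
    "FreeableMemory": 2 * 1024**3,
    "FreeStorageSpace": 4 * 1024**3,
    "DatabaseConnections": 120,
}
print(find_breaches(sample))
```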
Database security is a complex and challenging endeavor that includes multiple aspects of information security technologies and practices. It's often in conflict with database usability. The more accessible a database is, the more vulnerable it tends to be. Likewise, the tighter the security measures on a database, the more difficult it is to access and use. The challenge is to find a balance between security and usability. Monitoring helps establish and maintain this balance.
Monitoring for security and compliance is mostly about access and auditing. Database manageability also plays an important role in database security. Manageability is how efficiently a database can be monitored and maintained to keep it performant, secure, and running smoothly.
A breach in security can take one or more paths to data. These paths include excessive, inappropriate, and unused user privileges, abuse by users with administrative privileges, insufficient application security, database misconfigurations, malware-infected devices, and social engineering such as baiting, phishing, and ransomware.
Weak audit trails and poor monitoring make data breaches worse. They make it difficult to determine who caused the breach, what data was compromised, where it happened, and when it occurred, which allows bad actors to exploit the same security gaps repeatedly.
Actively monitoring RDS databases can help limit the blast radius when a breach occurs. Monitoring will reveal who is connecting to a database instance as well as who has attempted and failed to access the data.
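To make the idea of tracking failed access attempts concrete, here is a small sketch that counts failed connections per user and flags anyone at or above a limit. The record layout (`user` and `success` keys), the user names, and the limit of three are all hypothetical. Real audit data would first have to be parsed from the database engine's own logs, whose format varies by engine.

```python
from collections import Counter


def failed_attempts_by_user(records: list[dict], threshold: int = 3) -> dict[str, int]:
    """Count failed connection attempts per user and keep those at or above `threshold`."""
    counts = Counter(r["user"] for r in records if not r["success"])
    return {user: n for user, n in counts.items() if n >= threshold}


# Made-up, already-parsed connection records.
records = [
    {"user": "alice", "success": False},
    {"user": "bob", "success": True},
    {"user": "alice", "success": False},
    {"user": "carol", "success": False},
    {"user": "alice", "success": False},
]
print(failed_attempts_by_user(records))
```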
Security monitoring can report what changes, if any, have been made to the database security groups, encryption settings, and backups. Database security logging and monitoring is difficult because data is often sensitive, there are legitimate privileged users, and monitoring requires a risk versus reward versus performance trade-off. Again, it comes back to the idea of finding balance.
The hope with monitoring is that, with enough data, we can predict the future by understanding the present. It can be challenging to connect metric values and log entries to a specific cause or event. While causation and correlation can exist at the same time, correlation does not imply causation. Things can be related without one causing the other. For example, a single poorly designed SQL query can put significant load on a CPU. The query runs slowly and, because of how it's designed, it impacts other users, and reports take a long time to finish. The CPU utilization and the slow reports are related. However, the reports are not causing the problem. The cause is the design of the SQL query. Causation explicitly applies to cases where action A causes outcome B.
Correlation is simply a relationship between the action and the outcome. However, the action, by itself, did not cause the outcome. Action A is related to outcome B. Correlation and causation are often confused because the human mind likes to find patterns even when they do not exist. In database monitoring, this means it's easy to tell ourselves stories to explain problems and those stories, however fascinating, can be wrong.
As another example, you might see the number of requests on a database increasing while, at the same time, memory usage increases. They seem to be related. However, after further investigation, you find that someone ran a large, memory-intensive analysis job at the same time. Database monitoring that supports fast, accurate problem resolution is critical for resolving problems before they affect end users.
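The requests-versus-memory example can be checked numerically. The sketch below computes the Pearson correlation coefficient for two made-up metric series. The sample values and variable names are invented for illustration. A coefficient close to 1 only says that the two series rise together. It says nothing about why, which is exactly the trap described above.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length and at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up samples taken at the same timestamps.
requests_per_min = [100, 120, 150, 200, 260, 300]
memory_used_pct = [40, 42, 47, 60, 75, 82]

r = pearson(requests_per_min, memory_used_pct)
print(f"r = {r:.2f}")  # strongly positive, yet the memory growth may come from an unrelated job
```

A high coefficient is a prompt to investigate further, for example by checking what else was running during the same window, rather than a conclusion.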
Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification but it is possible to find the right path and course of study.
Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.
Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.
In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.