Information is at the heart of most software systems and the lifeblood of many organizations, so you want the database that stores this information to be efficient and reliable. But as we know, things happen; sometimes bad things. One of the ways that we can prevent bad things from happening is to know about them in advance like the old saying says, "To be forewarned is to be forearmed."
Azure SQL in its many forms has an abundance of features that help you to monitor the state of your databases and database server. Ranging from prebuilt automated monitoring that is augmented with artificial intelligence through to dynamic management views, SQL Server monitors and logs all aspects of the database engine’s operation and configuration. Intelligent Insights and Azure SQL analytics enable you to easily access the wealth of diagnostic and performance data in an easily digestible format.
This course introduces you to the different database monitoring and notification technologies available, how they work, and how to use them. If you have any feedback relating to this course, feel free to contact us at firstname.lastname@example.org.
- Understand the key elements of database monitoring
- Learn about the features of Intelligent Insights, Azure's AI-based database monitoring service
- Create graphical reports using SQL Server Management Studio
- Understand how wait statistics can show you where threads have to wait and how this can be used to monitor performance
- View and fix index fragmentation
- Monitor database storage
- Implement notification alerts on various database platforms
This course is aimed at database administrators or anyone who wants to learn how to implement systems that can find potential issues that may disrupt the delivery of their database services.
To get the most out of this course, you should have experience with SQL Server Management Studio, be familiar with reading and writing SQL, and have an understanding of basic database architecture and administration tasks, like indexes and backups.
Course Related SQL Scripts
In terms of diagnosing performance issues, wait statistics are one of the most useful metrics as we've seen with Intelligent Insights. While many of you may have an acquaintance with some wait statistics, there are many of them and some of them are more well-known than others in the context of performance troubleshooting.
I'd like to briefly talk about what is a wait. Azure SQL and SQL Server code is executed by threads in the context of a scheduler. If you have a 14-core hyper threading CPU, the database engine will have up to 28 logical cores to work with, which translates into 28 user schedulers plus some system schedulers. You can use sys.dm_os_schedulers to return a list of schedulers. This works for both SQL Server and Azure SQL.
The schedulers manage execution threads which do the actual work of the database server. There are a finite number of threads that reside in the thread pool. The default number of worker threads is determined at start-up and is based on a simple formula related to the number of logical processor cores.
All 64-bit machines, no matter their configuration will have at least 512 threads in the pool. You can get the maximum number of worker thread count from the DMV sys.dm_os_sys_info view. I've included the formula for 32-bit computers but they are largely irrelevant now.
When the database engine goes to execute a query, it will determine the number of threads needed based on the execution plan which also takes into account the degree of parallelism, MAX DOP, the maximum number of cores available.
The threads are distributed amongst the schedulers and when the query's execution has completed the threads are returned to the thread pool. As there is a finite number of threads it is conceivable that SQL Server will run out of them, say when you have query with a high degree of parallelism with lots of connections wanting to execute it.
When this happens, you will see a wait type of THREADPOOL. This can also happen if there is a high degree of resource contention where hundreds of connections all want exclusive access to a particular resource. The inability to log into a database can be a symptom of the thread pool being exhausted as there are no threads to handle the login.
Thinking now about a single thread executing, it will be running on the CPU and the code needs some data to continue executing. That data is not currently in memory, so it needs to go to disk and pull a database page into memory.
In terms of the CPU, disk I/O takes a very long time, so it would be very inefficient to have the thread sit on the CPU waiting for the data to be retrieved from disk. This is especially true when there are hundreds of other threads that also need processing.
The thread tells the CPU what resource it needs and removes itself from the processor and waits for the resource to become available. The thread waits on a list. In software speak, lists are unordered because there is no telling when the resource it is waiting on will become available. When the resource does become available, then the thread will move from the wait list to the runnable queue.
A queue is by nature a FIFO structure, that is first in first out. The thread will be relatively briefly in the runnable queue until it is its turn for the processor again where it can continue executing with the data retrieved from disk.
A thread will continue its processing, transitioning from running to suspended, then runnable, and back to running until the query has finished. Going back to the example of a query needing data from disk, when the thread gets to the point of requiring the page it will enter the suspended state with a wait type of PAGEIOLATCH_SH.
One interesting question is, how does a thread know the resource it is waiting for is ready? The thread is suspended, not on the processor so it is dumb or inert. Each thread, as it is about to leave the processor, checks for completed resource requests and if there are any will change the relevant thread's state from suspended to runnable, moving the thread to the runnable queue. In this way, threads operate a cooperative scheduling model.
The thread will run again when it reaches the front of the queue. If a thread has no resource requirements, will it stay on the processor until it has finished its task? Well, that depends on whether it can complete in less than four milliseconds. SQL Server also uses time slices to help schedule and distribute CPU time between threads. This is called the os_quantum and is a fixed time of four milliseconds.
If a thread cannot complete in that time, then it removes itself from the processor and goes to the end of the runnable queue with the wait type of SOS_SCHEDULER_YIELD. There is a school of thought that excessive SOS_SCHEDULER_YIELD waits means that the system needs more CPUs. SOS_SCHEDULER_YIELD just means that the thread didn't need resources, it may be doing a scan of a large table already in memory, so by itself is not necessarily indicative of more CPUs required. Only in conjunction with degraded performance and other factors might this be the case.
We end up with an overall wait time that is made up of resource wait time, which are those threads in a suspended state and signal wait time which are those with their resources that are in a runnable state. It is the resource wait time that we are most interested in. The signal wait time is usually very short, maybe one to two milli-seconds, and is very rarely a source of performance issues.
Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.