Introduction
Overview
Intelligent Insights
SQL Server Management Studio
Wait Statistics
Index and Statistics
Storage
Alerts
Summary
This course is part of the Azure SQL and SQL Server Query Performance Tuning learning path.
Information is at the heart of most software systems and the lifeblood of many organizations, so you want the database that stores this information to be efficient and reliable. But as we know, things happen; sometimes bad things. One of the ways we can prevent bad things from happening is to know about them in advance; as the old saying goes, "to be forewarned is to be forearmed."
Azure SQL in its many forms has an abundance of features that help you monitor the state of your databases and database servers. Ranging from prebuilt automated monitoring augmented with artificial intelligence through to dynamic management views, SQL Server monitors and logs all aspects of the database engine's operation and configuration. Intelligent Insights and Azure SQL Analytics surface this wealth of diagnostic and performance data in an easily digestible format.
This course introduces you to the different database monitoring and notification technologies available, how they work, and how to use them. If you have any feedback relating to this course, feel free to contact us at support@cloudacademy.com.
Learning Objectives
- Understand the key elements of database monitoring
- Learn about the features of Intelligent Insights, Azure's AI-based database monitoring service
- Create graphical reports using SQL Server Management Studio
- Understand how wait statistics can show you where threads have to wait and how this can be used to monitor performance
- View and fix index fragmentation
- Monitor database storage
- Implement notification alerts on various database platforms
Intended Audience
This course is aimed at database administrators or anyone who wants to learn how to implement systems that can find potential issues that may disrupt the delivery of their database services.
Prerequisites
To get the most out of this course, you should have experience with SQL Server Management Studio, be familiar with reading and writing SQL, and have an understanding of basic database architecture and administration tasks, like indexes and backups.
Course Related SQL Scripts
https://github.com/cloudacademy/azure-sql-database-monitoring-dp-300
The CXPACKET wait type indicates that parallelism is taking place as part of a query's execution plan. This means that data is being exchanged between two simultaneously running threads of a single process. In this respect, CXPACKET is a wait type that we would expect to see and in general, parallelism is a good thing.
Being able to break a query into threads that can be executed simultaneously should mean a faster overall execution, and this is normally the case, but there can be exceptions. A query plan that utilizes parallelism is inevitably more complicated: the multiple threads have to be managed, and the resulting data has to be stitched back together and presented as a unified result set.
So, we have a controller thread that spawns worker threads and then goes into a CXPACKET wait until the worker threads have completed their tasks and returned their rows. There is a point where the benefits of parallelism don't outweigh the setup costs, and this is particularly true of small queries.
There is a server setting called cost threshold for parallelism, which by default is set to five. Any query whose estimated cost exceeds five is eligible to use parallelism. The cost is an abstract unit rather than a literal number of seconds; the default of five dates back to the 1990s, when it roughly represented five seconds of execution time on the hardware of the day, which was obviously a lot slower.
It is also generally accepted nowadays that, depending on the server's specification, a cost threshold of 20 to 25 is more appropriate. You can set the cost threshold for parallelism with sp_configure or via the advanced settings page of the server's properties. You can change this value on all versions of SQL Server except the Azure SQL platform-as-a-service offering, so that's on-premises, in a VM, or on a managed instance.
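As a minimal sketch, assuming you have permission to change server configuration and are on an instance where the setting is exposed (on-premises, in a VM, or a managed instance), raising the threshold to 25 looks like this:

```sql
-- Show advanced options so the cost threshold setting is visible to sp_configure
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Raise the cost threshold for parallelism from the default of 5 to 25
EXEC sp_configure 'cost threshold for parallelism', 25;
RECONFIGURE;
```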
Another area where parallelism can work against you is when table statistics are not up to date. A simple example is a table scan of a large table where many rows have been recently deleted and the row count is not correctly represented in the database statistics. This leads to an incorrect distribution of work to the worker threads.
In this example, the query optimizer believes the table has 8 million rows and creates four threads to fetch 2 million rows each. Because of how the records are distributed after the delete, the first three threads only have to fetch 100,000 rows each before they complete, leaving the fourth to do the bulk of the remaining work. The parallel operation isn't complete until all threads have finished, so the faster threads sit in CXPACKET waits. While there is nothing bad about parallelism itself, in this case the CXPACKET wait is indicating out-of-date database statistics.
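Where stale statistics are the cause, refreshing them is the fix. A minimal sketch, using a hypothetical dbo.Orders table:

```sql
-- Refresh the optimizer's row-count estimates after a large delete so parallel
-- worker threads get an even share of the remaining rows (table name is illustrative)
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```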
PAGEIOLATCH waits are to be expected; after all, one happens every time a database page has to be read from disk. To know whether excessive PAGEIOLATCH waits are a problem or not, you can compare the current count of waits with your baseline figures. If the count is in the same ballpark for a similar-sized database, then next look at the wait time. If the average PAGEIOLATCH wait time has increased, that is indicative of an issue with SQL Server I/O, or perhaps the network.
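To see where those figures stand right now, a query along these lines gives you the count and average wait time for the PAGEIOLATCH family (a sketch against sys.dm_os_wait_stats; on Azure SQL Database the per-database equivalent is sys.dm_db_wait_stats):

```sql
-- Count, total wait time, and average wait time for PAGEIOLATCH waits,
-- for comparison against your baseline figures
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;
```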
When an executing thread needs some data, it first looks in the buffer pool to see if the page it is after is in memory, that is, RAM. If it is, it does a logical read from memory and no waiting takes place. If the page is not in memory, then a physical I/O, or read, happens that pulls the page in from disk to memory while the executing thread waits. This is normal behavior.
If there is an unexpected spike in physical reads, it could be indicative of a sudden change in workload, or the amount of memory available for caching pages has decreased, or maybe both. SQL Server memory can come under pressure from external Windows sources to reduce the buffer pool size, and a bloated plan cache can cause SQL Server to borrow memory from the buffer pool.
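One quick way to gauge whether the buffer pool is under pressure is to check the Page Life Expectancy counter; a sketch:

```sql
-- Page Life Expectancy: roughly how long (in seconds) a page is expected to stay
-- in the buffer pool; a sustained drop suggests memory pressure on the pool
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy';
```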
When you see a large number of CXPACKET and PAGEIOLATCH_SH waits together, this is commonly indicative of a large parallel table scan. If this is an ongoing situation involving the same table, then creating an appropriate index for the query will fix it.
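For example, if the scan is driven by a query filtering on a single column, a covering index lets the query seek instead of scanning the whole table in parallel. A sketch with hypothetical table and column names:

```sql
-- Hypothetical covering index: the filter column as the key, with the columns the
-- query returns included at the leaf level to avoid lookups
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue);
```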
Excessive PAGEIOLATCH_SH waits could also be symptomatic of an I/O subsystem or disk issue. A hard drive may be faulty and close to failure, or a drive may have become badly fragmented; or, if the server instance is not running on a dedicated machine, it is possible that other non-SQL Server processes are also overloading the hard drive.
The WriteLog wait type occurs when transaction data is written to disk from the transaction cache, or log buffer. Transaction information is not written directly to the log file on disk; it is accumulated in a memory buffer and then flushed to disk in one shot.
So, the WriteLog wait type relates only to the action of flushing the log to disk and not to any other part of the transaction process. This is why it is good practice to have the log file on a physically separate drive from the data file, so there is no contention in the I/O subsystem.
Now I just want to demonstrate the kind of impact that writing the log to disk can have. Here I have a very simple piece of code that inserts almost 10,000 records into a table using autocommit transactions, which means a commit on every insert.
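A minimal sketch of the idea, using a hypothetical dbo.WaitDemo table and payload, looks like this:

```sql
-- Each INSERT is its own autocommit transaction, so the log buffer is flushed
-- to disk on every iteration, producing roughly 10,000 WriteLog waits
DECLARE @i int = 1;
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.WaitDemo (Id, Payload) VALUES (@i, REPLICATE('x', 100));
    SET @i += 1;
END;
```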
Before I run it, I'll just clear the wait stats and then run the insert code. This is running on an S0-tier Azure SQL database and takes just over a minute to complete. Now, if we have a look at the wait stats, we can see the WriteLog waits there, and as we would expect there are 9,999 WriteLog waits, with the overall resource wait time of 61.4 seconds being very close to the total time for the whole operation.
Let's see what happens when I wrap the inserts in an explicit transaction. I'll clear the wait stats again and run the inserts. Wow! The first run took 62.7 seconds; 62.7 minus the 61.4 seconds of overall WriteLog wait time gives us 1.3 seconds, which is very close to the elapsed time of this run. When I have a look at the wait stats now, we can see a count of only three, but the average wait time is slightly increased, as you might expect with a larger amount of data being written at one time.
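For reference, the wrapped version of the same sketch simply moves the commit outside the loop:

```sql
-- One explicit transaction around the whole loop: the log is flushed when the
-- log buffer fills and at the final COMMIT, rather than on every insert
BEGIN TRANSACTION;

DECLARE @i int = 1;
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.WaitDemo (Id, Payload) VALUES (@i, REPLICATE('x', 100));
    SET @i += 1;
END;

COMMIT TRANSACTION;
```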
Before we move on from the WriteLog, I just want to quickly look at delayed durability as another possible option for reducing the impact of this wait type. Delayed durability is the asynchronous writing of log data and implicit in this is the possibility of data loss should there be a catastrophic failure. If your business requirements cannot tolerate any data loss, then do not use this feature.
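Enabling it at the database level is a single statement; a sketch, assuming you want it applied to every transaction:

```sql
-- Force delayed durability for the whole database: log writes become asynchronous
-- for every transaction, trading a small window of potential data loss for speed
ALTER DATABASE CURRENT SET DELAYED_DURABILITY = FORCED;
```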
I'll use the ALTER DATABASE command and set delayed durability to forced, meaning that all transactions are subject to the setting. Next, I'll clear the wait stats so we have a clean slate. And as you can see, the BEGIN TRANSACTION and COMMIT have been removed from the insert code. Okay, 2.7 seconds is considerably faster than 62 seconds and in the ballpark of our single-transaction commit.
What is interesting, now that I run the wait stats query, is that there is no WriteLog wait type there, but if I wait a few seconds and run it again, it does turn up. This definitely shows the delayed aspect of delayed durability and its asynchronous nature. Also of interest is that the average wait time for the single WriteLog is nine milliseconds, as it was with the single-transaction commit.
While the update lock wait type isn't very common, well, you'd hope not, it is an indicator of resource contention that can have an adverse impact on your database. This wait type means that a session wants to update some data, but is unable to as the resource is being held by another session. This is more likely to happen with lengthy transactions.
So as a rule of thumb, it is better to keep your transactions short and focused. This isn't always possible, and it may turn out that some other reason has caused a session to hold an update lock. So you need to be able to identify which connection or session is holding the lock and preventing other sessions from completing.
One way to identify the blocking session is by querying the dynamic management view sys.dm_exec_requests, which you can join with sys.dm_exec_sql_text to get the SQL in question. By using a predicate where the blocking session ID is not equal to zero, you can limit the result set to only sessions that are being blocked.
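A sketch of that query might look like this:

```sql
-- Sessions that are currently blocked, which session is blocking them,
-- and the SQL text of the blocked request
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```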
Once you've identified the blocking session, you can investigate the cause using the session ID or, if necessary, kill that connection. If you're using SQL Server, then one of the predefined reports available through SQL Server Management Studio is the All Blocking Transactions report, which will essentially give you the same information as the dynamic management view.
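Killing the connection really is a last resort; if it comes to that, and assuming the blocking_session_id reported above was, say, 72, it is a one-liner:

```sql
-- Terminate the blocking session (hypothetical session id); its open transaction is rolled back
KILL 72;
```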
Hallam is a software architect with over 20 years' experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile, and desktop technologies, good quality, reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.