Storage & Compression
The course is part of this learning path
SQL is the language for interacting with databases, and like all computer languages, there are many ways to achieve the right result. And like other software languages, some ways will be better and more efficient than others. What sets SQL and database programming apart from most other computer programming is the vast amount of data that is regularly dealt with, and how other factors that are not immediately obvious can impact the SQL query code you write.
This course looks at ways that you can see how the SQL Server engine views your queries and how it intends to execute them. Using that information, we look at other factors like table and index design that can positively or negatively impact your query execution.
For any feedback relating to this course, please contact us at firstname.lastname@example.org.
- Understand what impacts query performance when using SQL
- Learn how the SQL Server engine processes queries
- Use execution plans to see how a query is executed
- Instruct the query engine
- Understand how different query types can be used
- Learn how to design tables to make queries more efficient
- Use database file configuration to improve performance
- Understand how database compression can impact performance
This course is aimed at database administrators, data engineers, or anyone who wants to learn how to improve database query execution.
To get the most out of this course, you should be familiar with SQL query syntax, SQL data types, and SQL table structure.
Course Related SQL Scripts
The two main reasons, but by no means all of the reasons, for poor query performance are poor indexing and poor table design. Obviously, indexes cannot exist without tables, but I'm going to talk about index design first in the context of the select into statement we've been looking at.
Here I've highlighted the tables where the data is being pulled from for the select query, and we can see in all cases a table scan is taking place. With the exception of the Items table, these table scans incur a cost. A table scan means that every row in the table is examined to see if it meets the predicate or where clause criteria.
Now, why would it do that? Well, one of the answers to that question could be there are no indexes on these tables, so it has no choice. Well, I know that not to be the case, so for TransLines we have a unique index of branch, transaction number, station number, and line.
For TransHeaders it is a unique index of branch, transaction number, and station. Customers, there is a unique customer code and for Items, there is a UPC, unique product code. Also, we can see here on TransLines we also have an index on UPC. So why isn't the query optimizer using these but is in fact suggesting an index that doesn't exist yet? Well, it's determined that those indexes will not improve performance.
Let's just see by measuring the query in its current form what the performance in terms of execution time is. So that's 5 1/2 seconds which is pretty quick considering that's some 1.9 million rows. So, what if we force the query to use some of these indexes? We can do that by using hints.
So, on TransLines we can tell it to use a particular index, with index, really what we're doing filtering on the unique product code, let's use Index_upc, so how does that look now? Oh, okay, so we've added some operations in here and our query plan has become a little more complex.
How does it relate to performance? It's a little better. Okay, what if we add in this unique index, we're going to be matching against our headers table on that, will it do anything for us? It appears not. Okay, what if I tell it to use the primary key index of the TransHeaders table? No, not really any performance increase there at all.
So what's going on here is that we're doing an index scan, but then we are doing a look-up on the table as well and this is what's going on down here for TransLines, so it's an index seek which is better. Here we're scanning the whole index, here we're not, but you can see that there is an additional cost of looking at those fields that the query needs to get based on the UPC not equaling TPAY2.
So there's a lot going on here, and the relative cost of the insert has decreased quite a bit and we can assume the cost of inserting records will remain relatively static. We're actually creating more cost in the select part of the query with utilizing these indexes.
I'll try one more hint before moving on, and that will be on the Customers table, and I'll tell it to use its primary key, let's just have a look at how that changes the plan. Once again, we're doing an index scan on Customers, but we are then going to the Customers table to retrieve the related fields that we need from each of the records that has matched from the index scan, let's see how that affects performance. Okay, so that's the best so far.
Now you may be wondering why not use the suggested index here? Well, I will, but first of all I want to talk about indexes and the various types of indexes there are, and the implications of using them.
Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.