Storage & Compression
SQL is the language for interacting with databases, and like all computer languages, there are many ways to achieve the right result. And like other software languages, some ways will be better and more efficient than others. What sets SQL and database programming apart from most other computer programming is the vast amount of data that is regularly dealt with, and how other factors that are not immediately obvious can impact the SQL query code you write.
This course looks at ways that you can see how the SQL Server engine views your queries and how it intends to execute them. Using that information, we look at other factors like table and index design that can positively or negatively impact your query execution.
For any feedback relating to this course, please contact us at firstname.lastname@example.org.
- Understand what impacts query performance when using SQL
- Learn how the SQL Server engine processes queries
- Use execution plans to see how a query is executed
- Instruct the query engine using hints
- Understand how different query types can be used
- Learn how to design tables to make queries more efficient
- Use database file configuration to improve performance
- Understand how database compression can impact performance
This course is aimed at database administrators, data engineers, or anyone who wants to learn how to improve database query execution.
To get the most out of this course, you should be familiar with SQL query syntax, SQL data types, and SQL table structure.
Before I became fully involved with software, I worked in the market research industry. It was last century; the data sets were relatively small, and we didn't use relational database management systems. The data was stored in structured record files, and performing analysis essentially involved reading each record one at a time and grabbing one or two fields from it. Depending on the type of analysis, this might involve loops within loops of record reading.
It occurred to me that it would be much more efficient to turn the data set on its side and read each record as a row of columns. In one disk access, you could read all the values you wanted. Conceptually, this is what the columnstore index is like; you can think of it as Excel's Paste Special Transpose function. How it works physically needs to be understood in the context of the database page model.
In a traditional index, known as a rowstore index, the indexed fields for a record are stored together, so the index becomes like a mini table. In the case of a columnstore index, each column of the index, with the exception of included columns, is stored on its own page or pages and in its own order. Straight away we can see how this would be very efficient from a searching and reading perspective, but quite an overhead for inserting, deleting, and updating data.
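To make the distinction concrete, here is a minimal T-SQL sketch comparing the two index types on the same data. The table and column names are hypothetical, chosen only to mirror the sales scenario discussed later:

```sql
-- Hypothetical sales table for illustration
CREATE TABLE dbo.Sales (
    SalesID        int IDENTITY PRIMARY KEY,
    SalesPerson    nvarchar(50),
    Department     nvarchar(50),
    TotalBeforeTax decimal(18, 2),
    Logged         datetime2
);

-- Rowstore: the indexed fields for each row are stored together,
-- so the index behaves like a mini table ordered by the key columns.
CREATE NONCLUSTERED INDEX IX_Sales_Rowstore
    ON dbo.Sales (Department, SalesPerson)
    INCLUDE (TotalBeforeTax);

-- Columnstore: each column is stored and compressed separately,
-- page by page, in its own order.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales
    ON dbo.Sales (Department, SalesPerson, TotalBeforeTax, Logged);
```

Note that a non-clustered columnstore index takes a simple column list rather than a key/include split, which reflects the column-per-page storage model described above.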
Each columnstore index field needs to maintain a pointer to the related table row, so when a record is inserted, deleted, or changed, multiple indexes need to be updated. The columnstore index feature is not available in all Azure SQL offerings. You need to be running service tier S3 or better, an elastic pool with at least 100 eDTUs, or any vCore configuration. These restrictions are largely down to the compute power required to get the benefits from columnstore.
Interestingly, a columnstore index will not be dropped from a database if it is scaled down from a columnstore-supporting tier, but the query optimization engine will ignore it until such time as the database is scaled back up to a tier that does support it.
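One way to confirm both the current tier and that a columnstore index survived a scale-down is to query the standard catalog views. A sketch, using system objects available in Azure SQL Database:

```sql
-- Current Azure SQL service objective (e.g. 'S2', 'S3', 'GP_Gen5_2')
SELECT DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') AS ServiceObjective;

-- Any columnstore indexes still defined in the database
SELECT OBJECT_NAME(object_id) AS TableName,
       name                   AS IndexName,
       type_desc
FROM sys.indexes
WHERE type_desc LIKE '%COLUMNSTORE%';
```

If the first query reports a tier below S3, any indexes returned by the second query will still exist but will be ignored by the optimizer, as described above.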
Going back to our sales query, I'm creating a non-clustered columnstore index on the sales data. I'll take the cue from the previous exercise and use the logged, salesperson, department, and total-before-tax columns, so all the filtering can be done at the index.
Before I run the query, let's just have a look at the overall data size with the columnstore index created. Because all the fields in this index are relatively small, and there are no included columns, the index is much smaller than the rowstore index we saw earlier. This reduces the overall size by a third, to just over two gigabytes.
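The size check in the demonstration can be reproduced with the standard sp_spaceused procedure (the table name here is hypothetical):

```sql
-- Reserved, data, and index sizes for a single table
EXEC sp_spaceused N'dbo.Sales';

-- With no parameter, sp_spaceused reports sizes for the whole database
EXEC sp_spaceused;
```

Comparing the index_size column before and after creating the columnstore index shows where the one-third reduction comes from.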
Also of interest is the fact that the execution plan is not making use of the columnstore index, so I'll have to use a hint to tell SQL Server to use it. Let's run that. 1.7 seconds is a massive improvement. I'll now do the same for the normalized tables by creating a non-clustered columnstore index on the transaction header table, and run that query. This does seem to be taking a long time. 132 seconds, well over two minutes; this is not good.
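Forcing the optimizer to use a specific index is done with a T-SQL table hint. A sketch, assuming a hypothetical index name NCCI_Sales and illustrative column names:

```sql
-- The INDEX table hint overrides the optimizer's index choice
SELECT Department, SalesPerson, SUM(TotalBeforeTax) AS Total
FROM dbo.Sales WITH (INDEX (NCCI_Sales))
WHERE Logged >= '2023-01-01'
GROUP BY Department, SalesPerson;
```

Hints like this should be a last resort: they lock the plan to one index, so if the index is later dropped the query fails outright rather than falling back to another plan.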
The main problem here is that we have no index being used to join our transaction header and detail tables, so I'll reinstate the index on the transaction line table using the transaction header ID. Well, that's more like it, but still only half the speed of the flat table structure. We've also seen a reduction in overall data size, but nowhere near as much as for the flat table, going from 1.55 to 1.36 gigabytes. I don't see this as a slap in the face for properly normalized tables, but more of a vindication of SQL Server's columnstore index technology and how well it accomplishes what it is designed to do.
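Reinstating the join support is just a plain rowstore index on the detail table's foreign key; the table and column names below are hypothetical stand-ins for the transaction line table and its header ID:

```sql
-- Non-clustered rowstore index supporting the header/detail join
CREATE NONCLUSTERED INDEX IX_TransLine_TransHeaderID
    ON dbo.TransLine (TransHeaderID);
```

This is a common pattern: columnstore indexes accelerate the scans and aggregations, while conventional rowstore indexes on foreign keys keep the joins between normalized tables cheap.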
Hallam is a software architect with over 20 years' experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile, and desktop technologies, good quality, reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.