Storage & Compression
The course is part of this learning path
SQL is the language for interacting with databases, and like all computer languages, there are many ways to achieve the right result. And like other software languages, some ways will be better and more efficient than others. What sets SQL and database programming apart from most other computer programming is the vast amount of data that is regularly dealt with, and how other factors that are not immediately obvious can impact the SQL query code you write.
This course looks at ways that you can see how the SQL Server engine views your queries and how it intends to execute them. Using that information, we look at other factors like table and index design that can positively or negatively impact your query execution.
For any feedback relating to this course, please contact us at firstname.lastname@example.org.
- Understand what impacts query performance when using SQL
- Learn how the SQL Server engine processes queries
- Use execution plans to see how a query is executed
- Instruct the query engine
- Understand how different query types can be used
- Learn how to design tables to make queries more efficient
- Use database file configuration to improve performance
- Understand how database compression can impact performance
This course is aimed at database administrators, data engineers, or anyone who wants to learn how to improve database query execution.
To get the most out of this course, you should be familiar with SQL query syntax, SQL data types, and SQL table structure.
The compression feature in SQL Server comes in two varieties, row compression, and page compression. The natural instinct is to think of compression in terms of zip archives, and while page compression does share some similarities in that identical values within data on a page are replaced by a single symbol or identifier, row compression is quite different. The subject of compression could be a course on its own, so I'm just gonna give you a sense of how both types of compression work.
Row compression is achieved by the intelligent substitution of data types. This means a smaller version of a data type will be used where it can be. A four byte integer field will become a two byte integer field if it can still accommodate the values. This rationale also applies to DECIMAL and DATE, TIME data types.
CHAR and NCHAR data types will have blank spaces removed. If you are using Unicode characters to store ANSI values then those fields will be compressed. Depending on language and collation settings you can achieve up to a 50% compression on those columns.
Page compression is more complex in that it involves the analysis of the data on the page and has three steps. Firstly, the rows are compressed, then a process called prefix compression takes place, which happens in the context of a column. For a column, a field that has the most commonality with other values of that column within the page is used as an anchor value. All other values in that column have the number of characters they share with that anchor value substituted by the count of those characters.
In the first column, Andrews is the largest value that has common characters with the other rows, as does Stevenson in the second column and dentist in the third column. The fields where the anchor values were have now become empty, but not null. The next step of the process is dictionary compression where the whole page is scanned ignoring columns and common values are placed in a dictionary at the beginning of the page, but after the column anchors, and then those fields are replaced with an index into the dictionary.
The dictionary values can include prefix compressed values. Compression is a high maintenance task and does come with some limitations. The page will only be compressed if a space-saving of at least 20% can be made. Obviously, as page compression depends on all values on a page, a page will not be compressed until it is full, and records are row compressed when inserted until the page is full.
In the case of a clustered index, the page will be compressed after a page split due to an insert. Let's see what kind of space-saving we can get with compression turned on, and how that will affect query performance. Right-clicking on a table and selecting storage then manage compression from the context menu will take you to a compression window where you can select the type of compression you want to implement. This window has a calculate button that will give you an estimate of the space-savings you might be able to achieve with the selected compression type.
I'm gonna start with the least aggressive type of compression, row compression. And we can expect to save just over half a gigabyte. After all of that, we get a simple ALTER TABLE statement that will rebuild all partitions that table is on with the option of data compression equals row. So, we've seen a reduction in table size of around 400 megabytes, leaving us with a size of 2.6 gigabytes, which is slightly less than we were promised.
Running this query with row compressed table data and an uncompressed index takes 21 and a half seconds. Let's see what happens if I row compress the index as well. This will involve dropping it and recreating it with the same data compression option as we saw for the table. Row compressing both the table and the non-clustered index gives us a total saving that is in the ballpark of the initial estimate, around 2.5 gigabytes.
As expected, this query does take a little bit longer to run at 27 seconds. What if we turn on page compression for the table? Well, now we are starting to see significant space-savings for the flat SalesData table, for the first time dipping under two gigabytes. This query also returns in around the same time of 26 and a half seconds.
So it appears that we have got a free half gigabyte of disk space with no impact on data retrieval performance. All compression, but especially page compression does have a significantly negative impact on data manipulation operations, like inserts and deletes. Now, I want to see how compression affects a partitioned table, so I'll head over to the AKPart database and set up page compression on all the partitions.
You can see here that you have the option of different compression settings on different partitions. I'll just hit calculate and the estimates look very promising. So let's page compress the SalesData table. Again, we see the same ALTER TABLE with REBUILD PARTITION statement, one for each of the partitions. We still have the column store index on this table which I will leave in place. This gives us an overall size including our column store index of just over 750 megabytes, and when we run the query it returns in a blistering 6.2 seconds.
The best compression I could get on the normalized tables was using row compression on TranLine, with almost half the size of that table. This is mostly due to the quantity, item ID, line, price, and SubBeforeTax being considerably smaller than the data types used to store them. While compressing TranLine did give the overall smaller size of 650 megabytes, the query performance was terrible, at over two minutes.
Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.