Optimizing a Power BI Data Model
The course is part of this learning path
In Optimizing a Power BI Data Model we start by investigating the underlying structure and operation of Power BI's VertiPaq database engine. Armed with this knowledge, we investigate various strategies for optimizing performance. Not to give the game away, most of the strategies involve reducing the size of the data model and loading pre-summarized data. We also look at tools within Power BI and Power Query Editor for analyzing report and query performance as well as internal error tracking and diagnosis.
- Remove unnecessary rows and tailor data sets for report audiences
- Change data types to improve storage and split columns into multiple columns to reduce the overall number of unique values
- Remove redundant or duplicate data and replace columns with measures where possible
- Use Performance Analyzer to highlight report bottlenecks
- Create aggregations for imported and direct query data models
- Configure and use Power BI's and Power Query's diagnostic tools
This course is intended for anyone who wants to improve the speed and responsiveness of their reports by improving the underlying data model.
We would highly recommend taking the Developing a Power BI Data Model and Using DAX to Build Measures in Power BI courses first, as they go into depth about how to implement several of the topics discussed in this course.
Let's start by importing the complete MobileSales database from SQL Server. It is almost a star-shaped schema. The SalesSummary table is the fact table that records the sale of items from the Item table to customers from the customer table. Items are classified via a hierarchy described by the Department, Subdepartment, and Class tables. Sales transactions are made at branches by a branch user. Having imported and saved the data model, we can see that it comes in at almost 74 MB in size, so not an enormous model by any stretch. I've also got a very simple report detailing gross sales and margin for the various branch types that are described by the region table.
This report illustrates how you may want to segment your data into multiple models. Suppose you had one model, and you are using row-level security to restrict access to data that is not relevant to particular users. In that case, you could instead have multiple data models. Corporate users have no interest in other data, and the same can be said for dealers. Only a select few of the senior management team would need to see all of this data together. The well-proven divide and conquer strategy is easy to implement and would work well with discrete report audiences. Data models can also be segmented by time. The current model might be the last two financial years, and you could have an archive model of lower resolution dating back, say, five to ten years.
Sticking with the current model, let's look at how it is made up using DAX studio analysis tools. I'll fire up DAX Studio through the external tools menu. Under advanced, select view metrics. This is a great view detailing a table's size, the size of its constituent columns, their data types, and their respective cardinality. Over on the right-hand side, we've got percent table and percent DB columns that tell us what percentage of the table and what percentage of the database each column accounts for. This is very handy for identifying low-hanging fruit in terms of duplicated or redundant data that can be eliminated entirely or replaced with measures. For instance, the receipt number, which is a string, is a concatenation of branch, station, and transaction numbers printed on each invoice.
Looking at customer, we can see that the code field is the unique identifier as its cardinality is the same as the number of rows. The added and updated columns, both date-time fields, primarily used for create, read, update and delete auditing, also have very high cardinality and are candidates for removal. And of course, we have the system-generated local date tables that are used for drilling down into a date hierarchy.
As the customer table has high cardinality, we can remove all the customers who have not recorded any transactions in SalesSummary. We can do this by right-clicking on the customer table in our data model view and selecting edit query. I'll replace the default source statement with an SQL.database statement using a query that returns only customers where they exist in SalesSummary. While I'm at it, I might as well do the same for the Item table as we are only interested in items that have been sold. We have cut the number of customers by 1/3 and items by well over ½, and we've seen a corresponding drop in the data size column. Items make up a relatively small portion of the overall data model size, so they've had a relatively small impact.
Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.