Optimizing a Power BI Data Model
In Optimizing a Power BI Data Model, we start by investigating the underlying structure and operation of Power BI's VertiPaq database engine. Armed with this knowledge, we explore various strategies for optimizing performance. Without giving too much away, most of these strategies involve reducing the size of the data model and loading pre-summarized data. We also look at tools within Power BI and Power Query Editor for analyzing report and query performance, as well as internal error tracking and diagnosis.
- Remove unnecessary rows and tailor data sets for report audiences
- Change data types to improve storage and split columns into multiple columns to reduce the overall number of unique values
- Remove redundant or duplicate data and replace columns with measures where possible
- Use Performance Analyzer to highlight report bottlenecks
- Create aggregations for imported and DirectQuery data models
- Configure and use Power BI's and Power Query's diagnostic tools
This course is intended for anyone who wants to improve the speed and responsiveness of their reports by improving the underlying data model.
We would highly recommend taking the Developing a Power BI Data Model and Using DAX to Build Measures in Power BI courses first, as they go into depth about how to implement several of the topics discussed in this course.
Optimizing a Power BI data model is conceptually not too different from optimizing most other types of databases. You want to reduce the size of the database as much as possible, so the server's CPU has less data to pull from disk into RAM, and then from RAM into cache, before it begins processing the report. And as in a relational database, the relationships between tables in the Power BI data model need to be clean and straightforward, so many-to-many relationships and compound keys should be avoided. This is why we have reporting-oriented designs like the star schema. To see how this is achieved in practical terms, we must first understand the underlying architecture of Power BI's VertiPaq database engine.
First and foremost, data imported into Power BI, as opposed to data accessed by directly querying an external source, is stored as columns rather than rows. This is not merely a case of transposing rows to columns, as you might in Excel with Paste Special; the column data is also compressed by storing only unique values. Storing only unique values in each column is the primary mechanism behind Power BI's relatively quick performance.
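The idea of compressing a column by storing only its unique values is often called dictionary encoding. The following sketch, written in Python purely for illustration (the function name and data are made up, not part of any Power BI API), shows how a repetitive column collapses into a small dictionary plus a list of integer indexes:

```python
def dictionary_encode(column):
    """Return (dictionary, indexes) for a list of column values.

    The dictionary holds each distinct value once; the indexes list
    holds one small integer per row, pointing into the dictionary.
    """
    dictionary = []
    lookup = {}        # value -> its position in the dictionary
    indexes = []
    for value in column:
        if value not in lookup:
            lookup[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(lookup[value])
    return dictionary, indexes

# A low-cardinality column compresses well: 8 rows, only 3 unique values.
colors = ["Red", "Blue", "Red", "Red", "Green", "Blue", "Red", "Green"]
dictionary, indexes = dictionary_encode(colors)
print(dictionary)  # ['Red', 'Blue', 'Green']
print(indexes)     # [0, 1, 0, 0, 2, 1, 0, 2]
```

Eight string values are replaced by three strings and eight small integers; the more the values repeat, the greater the saving, which is why the engine rewards columns with few distinct values.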
Cardinality is the term used to describe the uniqueness of values within a dataset, or in this case, a column. As Power BI's VertiPaq engine stores only unique values, high-cardinality data can negatively impact performance. Reducing cardinality shrinks not only the raw data but also the related index data used to look up column values. Fewer unique values mean smaller indexes, which translates into faster performance.
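This is why splitting a column can pay off, as in the earlier objective of splitting columns to reduce the number of unique values. A minimal Python illustration (the data is synthetic, one event per minute for a non-leap year) compares the cardinality of a single datetime column against the same data split into separate date and time columns:

```python
from datetime import datetime, timedelta

# One event per minute for all of 2023: every datetime stamp is unique.
start = datetime(2023, 1, 1)
stamps = [start + timedelta(minutes=i) for i in range(365 * 24 * 60)]

combined_cardinality = len(set(stamps))             # every row is distinct
date_cardinality = len({s.date() for s in stamps})  # one value per day
time_cardinality = len({s.time() for s in stamps})  # one value per minute of the day

print(combined_cardinality)                 # 525600
print(date_cardinality, time_cardinality)   # 365 1440
```

A single column with 525,600 unique values becomes two columns with 365 and 1,440 unique values respectively, a far smaller set for the engine to store and index.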
I'll start with a freshly imported model, and as I apply the various optimizations, we will see how the data model reduces in size, which in turn will lead to better report performance.
Hallam is a software architect with over 20 years' experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software using web, mobile, and desktop technologies, he believes good-quality, reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.