Power BI has changed the business intelligence (BI) landscape forever, enabling BI professionals and regular Excel users alike to work with big data and build beautiful and insightful dashboards. Integral to the Power BI platform is the data model.
This course discusses fundamental concepts of data modeling design within Power BI. We explore Microsoft's recommended best practices as well as common data modeling complications and the solutions to them. This course will prepare learners to develop data models strategically to achieve the best performance.
Learning Objectives
- Identify the primary components of data models and describe flat schemas and star schemas
- Diagnose common Power BI data modeling issues within models and recommend appropriate solutions to them
- Identify common challenges in Power BI data models, implement smart solutions, and avoid common mistakes
Intended Audience
- Business professionals whose job requires them to design and build data models in Power BI
- Anyone preparing to take the Microsoft DA-100 exam
Prerequisites
To get the most out of this course, you should be familiar with preparing data using Power BI.
Data models are made up of tables and relationships. The way the tables and relationships are laid out is referred to as the schema, we have all seen one very common type of schema, the flat schema. The flat schema is tables. These are two-dimensional with columns and rows and they work great if your data is under a few hundred thousand data points and your goal is to perform some basic analysis on a column of data.
The limitation of this model is that adding just one column results and adding one data point for every single row that already exists. So when we start handling big data in the hundreds of thousands, tens of millions, even billions of records, we need to move to a dimensional schema which is when we graduate to the star schema. The star schema can ignore the irrelevant data in the model while running a calculation. The star schema is so named because it usually has one central events table which has relationships to a number of detailed tables. When we use star schemas, we segregate the data into two purposes, fact data and dimension data.
Fact data is a list of events that drive a business process. The data is usually calculable and the fact tables often have records in the high thousands, millions and billions. Think of sales transactions. Each transaction is an individual numerical data point and we calculate those data points to measure business performance.
Dimension data on the other hand, is all the details that describe the fact data. Dimension tables contain categories and subcategories, locations and business channels, client areas, and employees involved in the fact events. Each cluster of dimension data gets its own table, so we tend to have multiple dimension tables.
We use this dimension data to describe the fact data and because the fact data measures in the millions, we need to distill all that information down to a small understandable measure. This process is called aggregation. Aggregation is the process of taking a bunch of numerical data points and measuring them in such a way that produces a single numerical representation of all that data. You already know the basic aggregations like sum, average, count, min and max, et cetera.
Chelsea Dohemann is a Senior Technical Trainer and Microsoft Certified Master with almost a decade of experience in technology training. She has taught an array of applications from Microsoft products including Office 365 web apps, Microsoft Office Suite, Power BI, VBA for Excel, and SharePoint to Adobe Acrobat Pro and Creative Cloud. Being a persistent learner herself, Chelsea is acutely in-tune with the challenges of learning. She presents her topics in plain language, with real-world examples, reducing complex concepts down to their simple parts.