The course is part of this learning path
Power BI has changed the business intelligence (BI) landscape forever, enabling BI professionals and regular Excel users alike to work with big data and build beautiful and insightful dashboards. Integral to the Power BI platform is the data model.
This course discusses fundamental concepts of data modeling design within Power BI. We explore Microsoft's recommended best practices as well as common data modeling complications and the solutions to them. This course will prepare learners to develop data models strategically to achieve the best performance.
Learning Objectives
- Identify the primary components of data models and describe flat schemas and star schemas
- Diagnose common Power BI data modeling issues within models and recommend appropriate solutions to them
- Identify common challenges in Power BI data models, implement smart solutions, and avoid common mistakes
Intended Audience
- Business professionals whose job requires them to design and build data models in Power BI
- Anyone preparing to take the Microsoft DA-100 exam
Prerequisites
To get the most out of this course, you should be familiar with preparing data using Power BI.
Often, the reason for a many-to-many relationship is that the data has not been properly separated into a star schema. You'll often find dimension data in the fact table alongside the fact data, which flattens out the model, and because all descriptive data is now being populated a million times, all that descriptive data becomes Many data. The solution to these scenarios is usually to star schema the data. When possible, pull all dimension data out into dimension tables. Let's see a real-world example of this, and see how to resolve it.
Here we have some basic information about one of our competitors' sales figures, by channel and also by product. The first comparison we wanna run is a side-by-side comparison of channel sales. So let's start by relating channel names to channel names. We are pretty much relating two fact tables here, so those channel names are gonna show up many times in each table, resulting in a many-to-many relationship. Visually, I can tell that it's many-to-many relationship by looking at the line between these two tables. Notice that the asterisk appears on both sides, indicating a many-to-many cardinality.
Okay, let's try creating our side-by-side comparison of the channel sales. We'll start with our sales from our own fact table, and then add their sales, and then segment it by channel name. Cool, I can see our sales and our competitors' sales side-by-side. While this exact analysis works, I have very little flexibility with my analysis options here. Watch what happens if I try to parse it out by product category. And there we can see that dead giveaway that we have an issue with in missing relationship. This is happening because we specifically related these two tables by channel name, so that's the only related data. So we would need to go back in and add another relationship.
Let's see, switching to model view, dragging and dropping product name on product name, and click OK, and now it appears as a dotted line. This is because its in inactive relationship. And it's an inactive relationship because there can only be one active relationship between two tables at any given time. This is for a very good reason, it prevents circular filtering issues. We would need to deactivate the Channel relationship to enable this new one. This obviously is not very user friendly. So the real, lasting, smart solution is for these two tables to both be filtered by a dimension tables. And in this example our Competitors data is super flat, so that is half of the problem right there.
Convert this flat table to a star schema, pull out the dimension data, and if any of its dimension data is the same as the dimension data from our model, relate them both to that dimension table using one-to-many relationships. This solution would allow our users to filter our data side-by-side with our competitors data using any relevant data, and they wouldn't need to deactivate relationships or program any complex DAX to do that. One of the reasons star schemas are so effective with larger datasets is due to filter propagation behavior.
Let's start by noting the one-to-many relationship between the product categories and subcategories. In these one-to-many relationships, filters applied to the one table, are transferred to the many table. And if that many table is also a one table in another relationship, then that same filter gets propagated down the relationships from one-to-many, one-to-many, until it reaches the fact table.
So if we filter the product category to only audio, the subcategories get filtered for only audio as well, as do the product names, on down. And each different one table can be filtered, of course, which results in the final table, the fact table inheriting the filters from all its parent tables, grand parent tables, and great-grand parent tables and so on. The direction in which filters are inherited between two tables is referred to as the cross filter direction.
While we have only seen one type so far, there are actually two types of cross-filter directions possible in Power BI. The single cross filter direction is the one we have already seen. This allows filters on the One table to flow to the Many table. This is the default cross filter direction for One-to-many relationships, and also the most common. In single types filtering the Many table does not have any impact on the One table. But there are cases when you might need filters to be able to flow upstream if you will. This bi-directional cross-filtering is referred to as Both in Power BI. This cross filter direction is generally discouraged as it can potentially cause problems, but sometimes it's the best way to get the job done.
So let's learn when we can use it, and when we should absolutely avoid it. Many-to-one and One-to-many relationships are by default assigned the single cross filter direction, but you can change them to Both when needed. Let's see an example. The most common reason to adjust cross-filter direction to Both is to sync slicers. For example, on this visual, notice that when I select customers, my other slicers are not being filtered to show the available categories and colors for that customer.
For instance, our customer A Datum Corp does not purchase home appliances from us, but you wouldn't know that from the slicer. The slicer implies that home appliances are a valid category for this customer, but when I select Home Appliances, the whole visual goes blank. That's because they don't purchase any Home Appliances. The reason why the slicer isn't being filtered is that filtering the customer table for A Datum Corp only flows down to the fact table where it stops. This filter won't swim up any other streams to filter those up-steam table, unless we adjust the cross-filter direction.
If I change the relationship between these two tables from the single cross filter direction to the both cross filter direction, notice that the icon changes from a single arrow to a double arrow, and if I return to my report and select a slicer option, now the other slicers are automatically filtering themselves to display only relevant options. This was simple and easy and got the job done. But behind the scenes some important things happened, and this one seemingly small adjustment could cause problems for us, if we aren't careful.
Sometimes the Both or bi-directional cross filter direction introduces ambiguity into the data model, which means that there are multiple paths that Power BI could use to propagate filters between tables. Luckily, we do not have any of that in this model. But let's create a fictional scenario to better understand this real-world problem.
Let's say we had a brands table, and that brands table could be used to propagate filters to the products table, and also to the orders table. In this scenario, this bi-directional cross filter we created earlier would introduce ambiguity into the model, because there would be two different paths that brand filters could be propagated to products table. Either directly from the brand table to the products table, or from brands to orders customers to fact sales to products.
Power BI's algorithm is pretty smart in guessing which route is best, but it's not smarter than humans, so it's up to us to help Power BI avoid these ambiguous scenarios. Earlier we saw that it's possible for two tables to have multiple relationships between them, and we noticed that only one relationship can be active at any time. Usually this happens between a dimension table and the fact table. One of the most common scenarios is when you have two different date fields, for instance sales date and ship date, they both need to be used in different scenarios.
Non-active relationships can be activated by using some custom DAX. And for each custom measure the date table can play different roles with the fact table, which is why these are called role playing tables. So we now know that relationships come in a lot of flavors, and they are absolutely critical elements of data models. But not all relational problems are caused directly by relationships. Having granularity disparities is another major culprit we want to be aware of. Let's dig in to the concept of Granularity.
Chelsea Dohemann is a Senior Technical Trainer and Microsoft Certified Master with almost a decade of experience in technology training. She has taught an array of applications from Microsoft products including Office 365 web apps, Microsoft Office Suite, Power BI, VBA for Excel, and SharePoint to Adobe Acrobat Pro and Creative Cloud. Being a persistent learner herself, Chelsea is acutely in-tune with the challenges of learning. She presents her topics in plain language, with real-world examples, reducing complex concepts down to their simple parts.