Profiling the Data
Troubleshooting in Power Query
In this course, we’ll review the Power BI Desktop interface. Then, we’ll show you how to load data into Power BI Desktop and how to save your file. We will also explain data profiling and look at the various data profiling options in Power Query like column quality, column value distribution and column profiling, and the benefits of using these.
Lastly, we will look at how to resolve cell-level errors, empties, and inconsistencies in Power Query. This section will cover how to replace errors, replace values, remove rows, and how to identify the root cause of the error using Power Query. The demos in this course will provide you with practical examples that will help you to troubleshoot when encountering issues while loading data into Power BI.
- Understand how to load data into Power BI and how to optimise functionality and size by using data profiling
- Understand how to resolve errors, empties, and data inconsistencies in Power Query
- Anyone who would like to learn about importing data into Power BI
- Anyone who needs to resolve cell-level errors or empties in a Power BI model or who would like to understand data profiling to improve the functionality of their model
- Some basic knowledge of, or experience in, working with large datasets
- Some experience with Power BI (not mandatory)
The files used in this course can be found in the following GitHub repo: https://github.com/cloudacademy/loading-data-power-bi
Power Query provides different functionality for identifying data anomalies, and in this section, we cover missing and error values. Let's have a look at what this looks like in Power BI. I have opened the Power BI file that we created and saved in the previous module on loading data. Okay, I'm now in Power BI desktop, and as mentioned, this sits within the Power Query. When I go to the query section, if I hover over the transform button, you can see it says, "Use the Power Query editor to connect, prepare and transform data." Select transform, and next, the Power Query editor will open up.
When we get inside of Power Query, you are going to see a turquoise colored line under the titles of the columns. This gives you information about the quality and statistics of the individual columns. When I hover over the turquoise line, you will see valid cells, errors and empties. This means that if there are any errors or empty cells in this column, you will be able to see it here. So we can already see why this could be useful. You can see the same information if you hover over any of the other columns.
For this example, I will be looking at the promo group column in the promotional groups table. There are 78 valid cells and zero errors and empty cells. This translates to a 100% valid cells and 0% errors and empty cells. If we go to the view tab, there are additional options we can select. If I select column quality, this will just expand the information that I had when I hovered over the turquoise line. You will see valid cells, errors and empties. You can copy the column quality metrics. Right click the area under the column header and select copy quality metrics. Now, paste this into an Excel sheet. You can see that this is all valid cells with no errors or empties.
So for this example, I am going to introduce missing values. Go to the original Excel file that had the data that was loading into Power BI. In the PromoGroup tab, delete the first 3 lines stating "ICED CAKE" in the PromoGroups column. Now, save the file. Go to power desktop, and in the navigator pane select refresh. This will refresh the data in Power BI to include the missing values. Next, we also need to refresh the Power Query preview.
In Power Query, select refresh all in the home tab under the refresh preview button. The null values are now included in the PromoGroups column in the promotional groups table. And when we hover over the turquoise colored line, it reflects that there are three empty cells in this column. For the next example, I will introduce error values. Go back to the original Excel file that had the data that was loaded into Power BI. Correct the blank cells to include ICED CAKE again.
Now, type the letter K at the end of the first three Product ID's in the product ID column in the promotional groups tab and press save. Currently, Power BI reads this column as a number type column, and therefore, including letters in this column will result in an error. Follow the same steps as in the previous example by refreshing the Power BI desktop, and then refresh all under refresh preview in Power Query. In Power Query, you can see in the PromoGroups column that the null values are no longer showing and there are no empty cells.
When we move to the product ID column, we can see that there are now three errors reflected. For now, we just looking at how to identify errors in columns, but in another module, I will show you how to correct these errors. Before I continue with the next module, I will remove the letter K in the product ID column in Excel. Press save and then refresh the Power BI desktop and Power Query.
Bianca is a chartered accountant and finance business partner who works with Power BI regularly to create useful, interactive dashboards to analyze financial metrics. She has worked as a lecturer and as a financial analyst in FMCG companies assisting sales and marketing teams with reviewing and understanding their financial results and forecasts, and identifying risks and opportunities for improvement. Bianca enjoys using technology to automate and simplify financial metrics.