Assessing Column Data looks at the tools and features in Power BI Desktop and Power Query Editor for assessing and profiling data within a dataset's columns. We cover essential characteristics such as column data types through to distribution and statistical properties of a column's data. This course shows you how to use the various graphical tools in Power Query Editor to visualize data distribution, including uniqueness and distinct values. These tools provide access to filtering functions and detection of empty column values and potential errors due to data type mismatches.
This course also looks at factors that impact correctly importing table relationships into a Power BI data model and pitfalls to look for when assessing and profiling column data.
Learning Objectives
- Learn how to carry out column profiling and assess table relationships in Power BI and Power Query Editor
Intended Audience
This course is intended for anyone who wants to use Power BI for data analysis and wants to learn how to assess the characteristics of their data before doing so.
Prerequisites
To get the most out of this course, you should have some experience with Power BI.
Within Power BI, there are several ways to evaluate data characteristics to point you in the right direction when it comes to cleaning the imported data or alert you to more serious data integrity issues. In Power BI's model view, table relationships will be depicted where they exist in the data source except in the case of compound keys. In the data view, column tools display a field's data type, giving you the ability to change it. You can also filter rows in the dataset based on a column's values by checking desired values or using filters, while a row count and distinct value count are displayed in the status bar.
Power Query Editor, available through the transform menu, assesses each column's data on validity, emptiness, correctness, and distribution, graphically displaying the results. Column headers show the field's datatype and overall validity or not with a stacked bar graphic under the column header. The data characteristics can be viewed in greater detail by turning on column quality in the view tab. Turning on column distribution displays a histogram graphic with a count of distinct and unique values. Be aware that the default behavior of Power Query Editor is to analyze the first 1000 rows of a table, so you may not be alerted to all data anomalies. Column profiling displays even more column statics, such as not a number count, minimum, maximum values, average and standard deviation. In addition to more detailed column stats, column profiling also lets you apply filters and replace values directly from the graph.
Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.