Advanced Analysis with Power BI examines various methods for teasing out insights from data using statistical methodologies and presenting significant findings in visually compelling formats. The course starts with basic statistics such as standard deviation and then progresses to AI and machine learning analysis, where Power BI does all the heavy lifting, allowing the user to investigate and dynamically explore significant findings.
Learning Objectives
- Use Z-scores to display outliers and use the Outlier Detection visualization from Microsoft
- Use Power BI's Anomaly Detection and Fluctuation Analysis functionality
- Use time-series forecasting to predict future data points with varying degrees of certainty
- Use groups to classify categorical data and bins to categorize continuous data
- Use the Key Influencers visualization to find the factors that drive a metric
- Use the Decomposition tree to drill down into a metric manually using known factors or let AI functionality determine which factors are the major contributors
- Use the power of Azure's AI and machine learning to analyze text for positive and negative sentiment, keywords and phrases, and image tagging
Intended Audience
This course is intended for anyone who wants to discover insights hidden in their data.
Prerequisites
- Have a basic understanding of statistics, such as the difference between a mean and a median, what a normal distribution is, and conceptually how standard deviation relates to it
- Know how to connect a data source, load data, and generally use the Power BI Desktop and Power Query Editor environments
- The AI Insights demonstration requires a premium account on powerbi.com
Power BI's key influencers visualization uses AI and machine learning to determine which factors in your data contribute to or drive a metric you're analyzing. When this feature was first introduced, you could only analyze categorical data, but since then, the visualization has expanded to include numerical and continuous data along with aggregates and measures.
In this demonstration, I want to look at the drivers of customer ratings of an online service. The rating itself is binary: it is a classification of the original score of 1 to 10, where low ratings are 1 to 5 and high ratings are 6 to 10. We can pick other fields from within our data model that might explain or correlate with the analyzed variable. As with anomaly detection, candidate factors can be dragged onto the Explain by area. As I drag other fields onto Explain by, the analysis is done in real time, working out each factor's contribution. The influence of each Explain by factor determines the order of presentation on the left of the visualization; the most influential factors are at the top. Rating is a categorical field, and if I go into the format pane, we can see the analysis type is indeed categorical. But if I change the analysis variable to the original score, which is a numerical value from 1 to 10, the analysis type changes to continuous.
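The score-to-rating classification described above can be sketched in a few lines of Python. This is only an illustration of the rule stated in the demonstration (1 to 5 is low, 6 to 10 is high); the function name `classify_rating` and the "Low"/"High" labels are assumptions, not fields from the actual data model.

```python
def classify_rating(score: int) -> str:
    """Map an original 1-10 score to the binary rating used in the demo.

    Scores 1-5 are treated as "Low" and 6-10 as "High", as described
    in the transcript. The label strings are illustrative assumptions.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    return "Low" if score <= 5 else "High"


# Classify every possible score to show where the boundary falls.
ratings = {score: classify_rating(score) for score in range(1, 11)}
print(ratings)
```

The binary field is what makes the analysis type categorical; analyzing the raw score instead is what switches it to continuous.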
While it's great to know what customers like about an organization, it's even better to know what they don't like, so I'll change the question from what influences a high rating to what influences a low one. When the role in an organization is consumer, it has significantly more impact than, say, company size or region. When I include theme, which is the main reason a customer gives for their rating, we can see that it is on a par with the consumer role in an organization. What does it mean for a consumer role to be 2.57 times more influential in the likelihood of giving a low rating? The first thing to understand is that these likelihood figures are in the context of the Explain by factor. Across all other roles, 5.78% give a low rating, but 14.93% of consumer roles rate low, and 14.93 is about 2.57 times higher than 5.78. It looks like the administrator role also gives low ratings, but there are only 2,900 administrator roles, whereas there are over 29,000 consumer roles. Hence, the administrator role contributes far less in total.
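The arithmetic behind the likelihood figure can be checked with a short Python sketch. It uses only the two percentages quoted in the demonstration; the function name `likelihood_ratio` is an assumption for illustration and is not part of Power BI.

```python
def likelihood_ratio(rate_in_group: float, rate_in_comparison: float) -> float:
    """Return how many times more likely the outcome is in the group."""
    if rate_in_comparison <= 0:
        raise ValueError("comparison rate must be positive")
    return rate_in_group / rate_in_comparison


consumer_low_pct = 14.93    # % of consumer roles giving a low rating (from the demo)
comparison_low_pct = 5.78   # % giving a low rating in the comparison group (from the demo)

ratio = likelihood_ratio(consumer_low_pct, comparison_low_pct)
print(f"Consumers are about {ratio:.2f} times more likely to rate low")
```

With the rounded percentages the result comes out at roughly 2.58, slightly above the 2.57 shown in the visualization. The small gap is presumably because the displayed percentages are themselves rounded.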
Clicking on the next key influencer, the theme of usability, we get a column chart on the right showing a breakdown of low ratings by theme. At the bottom left of that chart, we can tick "Only show values that are influencers", and that displays usability, security, navigation, and speed, which correspond to the four themes on the left. The influencer graphic is linked to the factor analysis chart on the right, as you would expect.
Like other visualizations on a report, we can drop a slicer on the page to drill down further into the data. I expect that large enterprise organizations will have different concerns and priorities when it comes to online services. If I slice the data by company size, we can see the most influential factor for companies with over 50,000 employees is the security theme.
If I drop a continuous numerical field onto Explain by, such as tenure, which is the number of months a customer has been a subscriber to the service, we can see that the field has been automatically allocated into bins for classification. As you might expect, those customers who have been with the service the longest are more likely to give a low rating. I guess you could say familiarity breeds contempt.
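To give a feel for what allocating a continuous field into bins means, here is a minimal Python sketch using equal-width bins. This is an assumption made for illustration: the demonstration does not say how Power BI chooses its bin boundaries, and the tenure values below are made up.

```python
def bin_edges(values: list[float], bin_count: int) -> list[float]:
    """Return bin_count + 1 equally spaced edges spanning the values."""
    if bin_count < 1:
        raise ValueError("bin_count must be at least 1")
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / bin_count
    return [lowest + i * width for i in range(bin_count + 1)]


def assign_bin(value: float, edges: list[float]) -> int:
    """Return the zero-based index of the bin that contains value."""
    for index in range(len(edges) - 1):
        if value < edges[index + 1]:
            return index
    # The maximum value falls on the last edge, so it goes in the last bin.
    return len(edges) - 2


# Hypothetical tenure values in months, not taken from the course data.
tenure_months = [1, 3, 8, 12, 20, 26, 30, 36]
edges = bin_edges(tenure_months, bin_count=3)
bins = [assign_bin(months, edges) for months in tenure_months]
print(list(zip(tenure_months, bins)))
```

Once each customer has a bin, the continuous field behaves like a categorical one, which is why the visualization can report an influence for a range such as the longest-tenured customers.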
I'll also drop support ticket ID onto the Explain by area because you'd think that customers experiencing more support issues would rate the service lower, and indeed they do. There is a many-to-one relationship between support tickets and customers, so the metric must be aggregated, for example as a count of tickets per customer, before using it in the customer table context. Trying to analyze non-summarized support tickets doesn't work.
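The need for aggregation can be illustrated outside Power BI with a small Python sketch. Each customer has one rating but possibly many tickets, so the tickets are summarized to one number per customer first. The ticket and customer IDs below are hypothetical, and counting is just one possible summarization.

```python
from collections import Counter

# Hypothetical (ticket_id, customer_id) rows: many tickets map to one customer.
support_tickets = [
    ("T1", "C1"),
    ("T2", "C1"),
    ("T3", "C2"),
    ("T4", "C1"),
    ("T5", "C3"),
    ("T6", "C3"),
]

# Hypothetical customer table; C4 has never raised a ticket.
customers = ["C1", "C2", "C3", "C4"]

# Summarize to one value per customer so it lines up with one rating each.
tickets_per_customer = Counter(customer_id for _, customer_id in support_tickets)
ticket_counts = {
    customer_id: tickets_per_customer.get(customer_id, 0)
    for customer_id in customers
}
print(ticket_counts)
```

After this step there is exactly one ticket count per customer row, which is the grain at which the rating is analyzed.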
While exploring the data by drilling down and poking around can be interesting, you can jump straight to the bottom line with the Top segments view. Behind the scenes, it looks like six segments, not seven, have been found. The position of each segment bubble is determined by the proportion of low ratings, while its size is related to the number of data points it includes. Segment one is definitely not the largest, but it is low-hanging fruit, as almost three-quarters of its customers give low ratings. On the left, we have a brief text description of segment one's characteristics that should come as no surprise to us. Segment members have more than four support tickets and are long-time customers whose role in the organization is not publisher. Most segments share the support ticket and tenure characteristics.
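The two numbers that drive each segment bubble, its share of low ratings and its member count, can be sketched in Python. The segment rule below mirrors the description of segment one, but the tenure threshold of 30 months, the `Customer` fields, and all of the sample rows are assumptions for illustration only. How Power BI actually discovers segments is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    role: str
    tenure_months: int
    ticket_count: int
    rating: str  # "Low" or "High"


def in_segment_one(customer: Customer, min_tenure_months: int = 30) -> bool:
    """More than four tickets, long tenure, and not a publisher.

    min_tenure_months is a hypothetical cut-off for "long-time customer".
    """
    return (
        customer.ticket_count > 4
        and customer.tenure_months >= min_tenure_months
        and customer.role != "publisher"
    )


def segment_summary(customers: list[Customer]) -> tuple[int, float]:
    """Return (member count, share of members giving a low rating)."""
    members = [c for c in customers if in_segment_one(c)]
    if not members:
        return 0, 0.0
    low_count = sum(1 for c in members if c.rating == "Low")
    return len(members), low_count / len(members)


# Made-up sample rows, not taken from the course data.
sample = [
    Customer("consumer", 36, 6, "Low"),
    Customer("consumer", 40, 5, "Low"),
    Customer("administrator", 32, 7, "High"),
    Customer("administrator", 31, 5, "Low"),
    Customer("publisher", 48, 9, "Low"),   # excluded: publisher role
    Customer("consumer", 6, 5, "High"),    # excluded: short tenure
    Customer("consumer", 36, 2, "High"),   # excluded: too few tickets
]

size, low_share = segment_summary(sample)
print(f"Segment size: {size}, low-rating share: {low_share:.0%}")
```

In the visualization, the share would set the bubble's position and the count its size, which is how a small segment can still stand out.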
Hallam is a software architect with over 20 years' experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile, and desktop technologies, good-quality, reliable data remains the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.