1. Home
  2. Training Library
  3. Microsoft Azure
  4. Courses
  5. Advanced Analysis with Power BI

Groups and Bins Demo

Contents

keyboard_tab

The course is part of this learning path

Start course
Overview
Difficulty
Intermediate
Duration
1h 4m
Students
79
Ratings
5/5
starstarstarstarstar
Description

Advanced Analysis with Power BI examines various methods for teasing out insights from data using statistical methodologies and presenting significant findings in visually compelling formats. The course starts with basic statistics such as standard deviation and then progresses to AI and machine learning analysis where Power BI does all the heavy lighting allowing the user to investigate and dynamically explore significant findings.

Learning Objectives

  • How to use Z-scores to display outliers and use the Outlier Detection visualization from Microsoft
  • How to use Power BI's Anomaly Detection and Fluctuation Analysis functionality
  • Use time-series forecasting to predict future data points with varying degrees of certainty
  • Use groups to classify categorical data and bins to categorize continuous data
  • Learn about Key Influencers
  • Use the Decomposition tree to drill down into a metric manually using known factors or let AI functionality determine which factors are the major contributors
  • Use the power of Azure's AI and machine learning to analyze text for positive and negative sentiment, keywords and phrases, and image tagging

Intended Audience

This course is intended for anyone who wants to discover insights hidden in their data.

Prerequisites

  • Have a basic understanding of statistics, like knowing the difference between a mean and median, a normal distribution, and conceptually how standard deviation is related to that
  • Know how to connect a data source, load data, and generally use the Power BI Desktop and Power Query Editor environments
  • AI Insights demonstration requires a PowerBi.com premium account
Transcript

Groups and bins are the same concept. In fact, within Power BI, they are accessed through the same menu. The primary difference between groups and bins is that groups classify categorical data while bins allow you to classify continuous data. Another way to think of a group is a new category in your data with a parent-child relationship with existing categorical values. A bin is an artificially created classification that you pour values into and then use the classification like categorical data. 

Let's start by looking at grouping. Here I have a bar chart showing the number of each item sold. There are thousands of them. While the chart is sorted from most frequently sold in descending order, looking at the item descriptions, we can see they're made up of handsets, vouchers, extended warranty, and SIM cards. We can use grouping to bring a little order to this visualization. Obviously, items are part of a product hierarchy, but for the sake of demonstration, let's suppose they aren't.

You can make a group by holding down the control key and clicking on the items you want to join in a group. With three SIM cards selected, I'll right-click and choose group data. Over on the right in my item table, a new column has been created called Description with the word groups in brackets appended to it, and the chart legend has two items in it. The group I've just created and everything else categorized as other. I'll rename the group to items group and right-click and select edit group. In the group's dialogue, we have the name of the top, a list on the left of all ungrouped values, and the list of current groups with their constituent members, on the right. The default group name is made up of the group members and can be cumbersome, so I'll give it a more meaningful name.

Now I'm going to go through and create a bunch of other groups based on brand and description and fast forward through most of them because it takes a little time. It's a case of selecting the ungrouped values and then clicking the group button underneath the list. Creating a new group and the group and members area. I've already got an Ericsson group, and I want to add all the Sony Ericsson items to it. It's a case of selecting the existing group and hitting the group button. This adds all Sony Ericsson items to the Ericsson group, and I'll rename it to reflect that.

Having created all those groups, we now see our items are color-coded according to the group membership. When I hover over the different items, group membership is displayed in the tooltip. Item groups can be used as any other column, so I can create a new graph with it. Item groups will be my axis, and I'll use quantity again for values. What's really cool is these two graphs are now linked, so when I click on the group and the right-hand graph, members of that group a highlighted in the left-hand graph. Clicking on item in the left-hand graph shows us what proportion of the group it accounts for in the right-hand graph, very cool. That was categorical data. 

Now let's do the same with continuous data by looking at the number of sales transactions classified by value. In its raw format, this is not a very helpful graph. We can see that the vast majority of transactions were of a low value, with maybe one or two close to $20,000. Instead of selecting values on the chart, I'll right-click on my LineTotal field and select new group. This group window is slightly different from the last one as it tells us the minimum and maximum values of the field in question. We also have a drop-down called bin type that defaults to the size of bins, with a default size already populated and bin size based on the data distribution. 

I'll go with the defaults and click OK. I need to remove the graph, access the LineTotal, and replace it with LineTotal bins. OK, that has reduced the noise significantly, but the graph is still not super helpful. I'll try changing the bin size through the edit group function. Well, that's a small improvement, but it's still not great. I'll try changing the bin type from size of bins to number of bins. Once again, Power BI has come up with a default value of 237 corresponding to the initial bin size calculation. I think that will be a little messy, so I'll reduce the bin count to 30, which in turn increases the bin size to just over $1300. OK, we have reduced the noise in the graph, but it's still not that helpful. While the bin feature is quick and easy, all bins will be the same size no matter what bin type you choose.

You can create a custom bin column within Power Query by adding a conditional column. Let's head over to Power Query editor through the transform data menu. I will call the new column LineTotalBin and use a series of if-then-else statements to categorize the Line total values. By starting with the highest value and using the greater than operator, the Linetotal values will be automatically assigned to the correct group. For the largest and least common values, I'll make my bins very big and reduce the bin size as I get closer to 0. I'm not going to bother with negative values, and I'll throw them all into a refund bucket. Click close and apply to save the changes, and then back in Power BI, I'll create a new chart where I'll use line total bin is my axis and quantity as the values. Great, that gives us a much better idea of transaction distribution based on values.

About the Author
Avatar
Hallam Webber
Software Architect
Students
19585
Courses
46
Learning Paths
7

Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a  Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.