Advanced Analysis with Power BI
Advanced Analysis with Power BI examines various methods for teasing insights out of data using statistical methodologies and presenting significant findings in visually compelling formats. The course starts with basic statistics, such as standard deviation, and then progresses to AI and machine learning analysis, where Power BI does all the heavy lifting, allowing the user to investigate and dynamically explore significant findings.
- Use Z-scores to display outliers, and apply Microsoft's Outlier Detection visualization
- Use Power BI's Anomaly Detection and Fluctuation Analysis functionality
- Use time-series forecasting to predict future data points with varying degrees of certainty
- Use groups to classify categorical data and bins to categorize continuous data
- Use the Key Influencers visualization to identify the factors that most affect a metric
- Use the Decomposition tree to drill down into a metric manually using known factors, or let its AI functionality determine which factors are the major contributors
- Use the power of Azure's AI and machine learning to analyze text for positive and negative sentiment, keywords and phrases, and image tagging
This course is intended for anyone who wants to discover insights hidden in their data.
- Have a basic understanding of statistics, such as knowing the difference between a mean and a median, what a normal distribution is, and conceptually how standard deviation relates to it
- Know how to connect a data source, load data, and generally use the Power BI Desktop and Power Query Editor environments
- The AI Insights demonstration requires a Power BI Premium account
For most of human history, the ability to predict the future was considered the realm of gods and wizards. In relatively recent times, statistical methods like regression have put the ability to estimate the future into the hands of us mere mortals. Models built on large datasets have become very good at predicting outcomes, and this capability is available within Power BI. Here we have a dataset of Tour de France average speeds and distances from the inaugural 1903 edition until 2016. In terms of size, this is a very small dataset. I want to estimate the average speeds from 2017 until 2021. Not only is the dataset small, but it has a couple of holes in it due to the world wars. Until the 1930s, the race format was also significantly different: it had far fewer but much longer stages, which significantly impacted the cyclists' speed, and the bikes didn't have gears back then.
First off, I'll create a line graph of just average speed. At the bottom of the Analytics tab, we can see the forecast pane. I'll throw in a trendline before doing the forecasting. You can adjust four parameters in your forecast. Forecast length is the number of data points you want to forecast into the future. The drop-down allows you to select time-series-specific units, but in this case, I'll stick with points, as my points are years. My data ends in 2016, and I want to predict through to 2021, so I'll change the forecast length to five.
Whenever you change a parameter, you must click the Apply button for the change to take effect. The confidence level relates to the shaded area around the predicted line: the higher the confidence, the greater the area. A 95% confidence level means there is a 95% probability that actual future values will fall within the shaded area. If I reduce the confidence level to 75%, the area shrinks, meaning that 25% of the time, future data points are expected to fall outside the predicted range.
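Power BI doesn't expose its forecasting algorithm as code (internally it uses exponential smoothing), but the relationship between confidence level and band width can be sketched with a simple stand-in model. The sketch below fits a linear trend to yearly data and computes normal-theory prediction intervals; the function name and the linear model are illustrative assumptions, not Power BI's actual method.

```python
# Conceptual stand-in for Power BI's forecast band (NOT its real algorithm):
# fit a linear trend, then forecast `horizon` future points with a
# prediction interval whose half-width grows with the confidence level.
import math
from statistics import NormalDist

def linear_forecast(years, values, horizon, confidence=0.95):
    """Return (predictions, half_widths) for `horizon` points past the data."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(years, values)) / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard error of the fitted trend
    resid = [y - (intercept + slope * x) for x, y in zip(years, values)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2))
    # Two-sided normal quantile, e.g. ~1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    preds, half_widths = [], []
    for h in range(1, horizon + 1):
        x = years[-1] + h
        preds.append(intercept + slope * x)
        # Prediction interval for a new observation at x: widens both with
        # higher confidence and with distance from the observed data
        half_widths.append(z * se * math.sqrt(1 + 1 / n
                                              + (x - mean_x) ** 2 / sxx))
    return preds, half_widths
```

Dropping the confidence argument from 0.95 to 0.75 shrinks every half-width, which is exactly the shrinking shaded area seen in the visual; the predicted line itself does not move.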
Using the "Ignore the last" field, you can move the start of the forecast back in time so that it overlaps existing data points. I want to start predicting from 2008 through to 2021, which is 13 years of forecast, and my current data ends in 2016, so I want to ignore the last eight points, or years. At 75% confidence, the actual figures fall within the predicted range. Still, the predicted line doesn't take into account year-to-year fluctuations, and I would not expect it to with such a small and limited dataset.
If you bear with me, I'll move the starting point of the forecast back to 1999, the beginning of the Lance Armstrong era, in which he won seven consecutive Tours de France but was later found to have been using performance-enhancing drugs. I'll need to change my confidence level to 80% to get the actual figures to fit within the predicted range. Now I'll update my dataset to include statistics up until 2021, and it initially looks as if the actual figures fall further outside the estimate. However, adding data points has pushed the "ignore the last" setting out of alignment, so I need to change the 17 to 22 to set the forecast start back to 1999. I still need the confidence level set at 80% to have the actuals fall within the estimated range. If I start the forecasting in 2007, after Lance Armstrong's winning streak, the actual values fall comfortably within the estimated range at 75% confidence.
Hallam is a software architect with over 20 years' experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile, and desktop technologies, good-quality, reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.