Getting Data into Power BI
The course is part of this learning path
Getting Data into Power BI from Different Sources looks at connecting to data sources on-premise, in the Azure cloud, and Power platform data sources. The course is focused on how various data sources can impact your data model's performance, rather than looking at every single source that Power BI accommodates, as there are literally dozens and dozens of them. Apart from typical database and file data sources we look at Dataverse, a power platform repository and data flow an online ETL service.
- How to connect to a database and file data source on-premises and in the Azure cloud
- How to change the location of a data source without reimporting the data
- Look at the pros and cons of each Power BI data storage mode
- User Power Platform's Dataverse product as a data source
- How to and why you would use a Data Flow as a data source
- Setting up and connecting to an XMLA endpoint
This course is intended for anyone who wants to get data into Power BI.
- An understanding of Power BI desktop and power BI online
- A Power BI premium account for XMLA endpoint, one of the more specialized data connection topics
Database and file resources
Before we jump into creating and using a data flow, I briefly want to discuss why you would use one. The reasons are twofold. A data flow is a series of transformations, as in ETL processes, that result in a dataset designed for reporting. This results in organizational-wide timesaving as report authors don't have to reinvent the wheel by performing the same transformations on the original data. It also means that the transformations are consistent as the data flow ensures one version of the truth. A data flow pulls together information from various sources, puts it through the necessary transformations, and makes the end product of the flow available through Azure data Lake gen 2 storage.
Let's go through a simple example of creating a data flow that ingests an Excel workbook of sales targets from Azure blob storage and splits it into two gross sales and margin targets tables. We set up a data flow through the Power BI.com portal. I'm going to create the data flow in my GettingData workspace. Go to your workspace and create a new data flow. I'm going to start by adding new tables. As with Power BI desktop, you have a vast array of data source options because we are using online Power Query editor.
My target data source, excuse the pun, is stored in an Azure blob container. After entering the storage account name, I'll choose account key as my authentication method and paste in the key from the Azure portal. After clicking next, I'm presented with a typical data import navigation window. Having selected the Branchtargets container, all the files are stored within are displayed with the metadata. This window is for selecting a container within your storage account, and as I have only one container, there is nothing to do here except click transform data. The next window shows just the Branchtargets container in the Power Query editor that we are familiar with. If you have lots of files in your container, you can use column filtering, like the file extension, to select the files you're interested in.
I'm only interested in the first file, so I'll access its contents by clicking on the binary value within the content column. We are not quite there yet, as now we have to select the sheet within the workbook. There is only one sheet in this case, so that's easy. There is one small gotcha here. The data within the Excel sheet needs to be defined as a table, which involves selecting the data in Excel and using the control + T shortcut. Clicking on table within the data column finally gets us what we are looking for. I'm going to create two tables by duplicating branch targets and renaming it to SalesTargets and renaming the copy to MarginTargets. Next, I'll filter out the rows I don't want for each target type. Margin targets target type is M, and sales targets target type is S. In each case, I'll use the advanced option of the filter row's function. Let's save and close and give the data flow a name.
Now that we've got a data flow, let's use it as a source within Power BI desktop. Under get data, I'll select Power BI data flows. I'm prompted to authenticate before connecting, which I'll do. The data source navigator displays workspaces containing data flows, of which I only have one - GettingData. Within that, there is the Targets data flow, containing margin and sales targets. From here on in, you can treat the table as you would from any other source with the caveat that data retrieval is comparatively sluggish and slow. Well, it was for me.
Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.