Getting Data into Power BI
Getting Data into Power BI from Different Sources looks at connecting to data sources on-premises, in the Azure cloud, and in the Power Platform. The course focuses on how various data sources can impact your data model's performance, rather than covering every single source that Power BI accommodates, as there are literally dozens of them. Apart from typical database and file data sources, we look at Dataverse, a Power Platform data repository, and dataflows, an online ETL service.
- How to connect to database and file data sources on-premises and in the Azure cloud
- How to change the location of a data source without reimporting the data
- The pros and cons of each Power BI data storage mode
- How to use Power Platform's Dataverse product as a data source
- How and why you would use a dataflow as a data source
- How to set up and connect to an XMLA endpoint
This course is intended for anyone who wants to get data into Power BI.
- An understanding of Power BI Desktop and Power BI online
- A Power BI Premium account for the XMLA endpoint, one of the more specialized data connection topics
Database and file resources
We'll start by importing data from two common sources: a relational database, SQL Server, and an Excel workbook, a file-based source. Data sources change over time in terms of their location and format. Whether your data storage moves from file to database or from on-premises to the cloud, you need to be able to change a data source's location without rebuilding the import process. We will look at changing a data source's location settings and how to make those settings dynamic using parameters.
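The parameter idea can be sketched outside Power BI as well. Below is a minimal Python analogue (not Power Query M) of holding a source's location in parameters so it can be repointed without touching the import steps; the server and database names are hypothetical.

```python
# Python analogue of Power BI's data source parameters: the server and
# database names are parameters, so the source location can change
# without rebuilding the import. All names here are hypothetical.

def build_source(server: str, database: str) -> str:
    """Compose a source location from parameters, mimicking how a
    Power Query parameter can be swapped without editing each step."""
    return f"Server={server};Database={database}"

# Moving from on-premises to Azure becomes a parameter change only:
on_prem = build_source("SQL01", "MobileSales")
cloud = build_source("mobilesales.database.windows.net", "MobileSales")
```

The import logic itself never changes; only the parameter values do, which is exactly the benefit parameters give you in Power BI.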
Power BI offers various storage modes, from import to DirectQuery and live connections. We'll look at the pros and cons of each storage mode and the limitations that could impact your data model's performance. Dataverse is Power Platform's data repository, bringing many and varied business data sources together in one location. As Dataverse is part of the same product stable as Power BI, it deserves a special mention.
A dataflow is an online data transformation and ETL service that enables you to separate the data model from the reporting function. Separating these two functions allows data modelers to do what they do best and ensures report writers are using the correct version of the source data. We end the course by looking at XMLA endpoints, which enable the use of Power BI datasets by non-Power BI client applications. Let's head on over to the computer and pull some data into Power BI.
In Power BI Desktop, there are common data source shortcut buttons on the toolbar, in addition to those in the middle of the splash report pane. More data services can be found by clicking the down arrow on the Get Data button on the toolbar, and for a full list of supported data sources, select More from the bottom of the common data services drop-down menu. This brings up the Get Data dialog, where all data sources can be viewed in one list or via the category list on the left-hand side. The number of data sources that Power BI supports connecting to is nothing short of remarkable. All well-known and some less well-known database platforms are supported. Under Power Platform, you can connect to Power BI datasets, dataflows, and Dataverse. We shall be looking at some of these a little later. There is an extensive array of Azure products and services that you can use as data sources, as well as online services from Adobe, Google, Salesforce, and many more.
I'm going to start by connecting to an Azure SQL database. This is very similar to connecting to a standard SQL Server. However, it initially tries to connect using your Windows credentials, and as I'm not logged in to an Azure Active Directory domain, this fails. I can go with a SQL Server login or use my Microsoft account. I'll use the server login I created when I set up the Azure database. Having authenticated, we get the Navigator dialog, which displays the database server and the MobileSales database, the only one on the server. I'll select the tables I want to import into my Power BI model. This will take a little while to import, as there are over 1.6 million rows in the SalesSummary fact table. I'll quickly rearrange the tables in the data model view before importing sales targets from an Excel spreadsheet. I'll use the Excel workbook button on the toolbar and choose the file from a folder on my local hard drive. The Excel import treats the workbook as a database and the sheets as tables.
I'll select sheet one and hit Transform instead of Load, as I want to check the data types and possibly make some modifications before bringing the targets into my data model. Transform opens Power Query Editor, where you can set up transformations that become part of your data model's load process each time the model is refreshed. I'll rename sheet one to branch targets. When working in Power Query Editor, it's very important that you hit Close & Apply, or Apply, for any changes you've made to take effect.
We can see branch targets has been added to the data model, and Power BI has inferred the one-to-many relationship with the branch table. This is a completely new model, and when I save it, a PBIX file is created that contains all of the data from all of the sources that have gone into the data model. Importing data into a local PBIX file is Power BI's default behavior and storage mode. The PBIX file is the native format of the VertiPaq database engine, which is highly optimized for compact storage and fast query performance.
Before moving on, I want to look at loading an Excel file from Azure Blob Storage. Under Get Data, More, go down to Azure and select Azure Blob Storage. I'll enter my account name, which is powerbiexcelfiles, and click OK. As I've already authenticated, I'll type the account name incorrectly to show you the account key prompt. The storage account key can be found under Access keys, in Security + networking, in your Azure storage account. Now I connect using the correct, previously authenticated storage account. I've only got one container in the storage account, called Branchtargets, so I'll select that. You should click the Transform Data button, because clicking Load at this stage will just give you the metadata related to your container.
Transform Data opens Power Query Editor, where we click the combine files button on the right of the Content column header. Now we have a dialog similar to the standard Excel import, where we can select the sheet from the workbook. I want to draw your attention to the Skip files with errors checkbox at the bottom left. This will ignore files that don't share the same layout or schema as the selected sample file. Loading all the files in a container that match a specified layout is useful. If you've ever been in a situation, as I have, where you have to load data from files in a particular folder, and the file names hold information like their origin and date, this is a great timesaver. Once the branch targets are loaded, we can see the source name column, which means you don't have to go through the hassle of extracting the file name as you loop through the files, as you would with an SSIS package, for example.
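The combine-files behavior described above can be sketched in a few lines of Python (again, an analogue rather than Power Query itself): merge the files that share the sample file's schema, skip the ones that don't, and tag each row with its source file name. The file names and columns below are invented for illustration, and CSV text stands in for the Excel sheets.

```python
# Minimal sketch of "combine files": keep files matching the sample
# schema, skip mismatches, and record each row's source file name.
import csv
import io

# In-memory stand-ins for files in a blob container (hypothetical).
files = {
    "targets_north_2023.csv": "Branch,Target\nLeeds,100\nYork,80\n",
    "targets_south_2023.csv": "Branch,Target\nKent,120\n",
    "readme.csv":             "Note\nNot a targets file\n",  # wrong schema
}

def combine_files(files: dict[str, str], sample: str) -> list[dict]:
    expected = next(csv.reader(io.StringIO(files[sample])))  # sample header
    combined = []
    for name, text in files.items():
        header = next(csv.reader(io.StringIO(text)))
        if header != expected:
            continue  # the "skip files with errors" case: schema mismatch
        for row in csv.DictReader(io.StringIO(text)):
            row["SourceName"] = name  # like Power Query's source name column
            combined.append(row)
    return combined

rows = combine_files(files, sample="targets_north_2023.csv")
```

Because the source file name travels with each row, information encoded in the file names (origin, date) survives the merge without any extra extraction step.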
Hallam is a software architect with over 20 years' experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile, and desktop technologies, good-quality, reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.