Getting Data into Power BI
The course is part of this learning path
Getting Data into Power BI from Different Sources looks at connecting to data sources on-premise, in the Azure cloud, and Power platform data sources. The course is focused on how various data sources can impact your data model's performance, rather than looking at every single source that Power BI accommodates, as there are literally dozens and dozens of them. Apart from typical database and file data sources we look at Dataverse, a power platform repository and data flow an online ETL service.
- How to connect to a database and file data source on-premises and in the Azure cloud
- How to change the location of a data source without reimporting the data
- Look at the pros and cons of each Power BI data storage mode
- User Power Platform's Dataverse product as a data source
- How to and why you would use a Data Flow as a data source
- Setting up and connecting to an XMLA endpoint
This course is intended for anyone who wants to get data into Power BI.
- An understanding of Power BI desktop and power BI online
- A Power BI premium account for XMLA endpoint, one of the more specialized data connection topics
The default storage mode for Power BI is import. This means that a copy of the source data is pulled into Power BI and stored in the native Power BI format. Import mode not only enables the use of a wide variety of data sources, but the native file format and the cached nature of the data translate into blisteringly fast performance. While import mode is ideal, it isn't always practical, and there are size limitations related to import mode. Import mode file size needs to be less 1Gb if not using a Premium account. If using premium, the max is 10GB, or up to whatever your storage capacity can accommodate, if using Large Model Support. No matter the original data source, your data model is in the Power BI file format and can use all Power BI and DAX functionality once imported.
Direct query allows you to get around the size limitations of import mode. Direct query does pull in the schema from the data source but not the data. The importing procedure for direct query is essentially the same as for import mode apart from selecting the direct query radio button. As direct query data isn't stored in a Power BI data model, we don't have a data view.
Once imported into the data model view, the tables are color-coded. The small stripe across the top indicates the storage mode. I'll also import a table from another local database. As we've seen, there are a lot of data sources that Power BI can import from, but not all support direct query. The Microsoft site's Power BI data sources page details which sources support direct query and live connection. As a rule of thumb, only database-type products that support ODBC will support direct query.
When a report visualization requires data from more than one table in a direct query model, combining the data happens at the originating data source. This is the most efficient way for the query to work rather than pulling data from the involved tables and then combining them within Power BI. Combining direct query and imported tables in one data model is called a composite model. Suppose we have a visualization that combines data from a direct query table and an imported table. A query from Power BI sent to the original data source will not be an intersection of those two tables because all the data from the direct query table will need to be pulled into Power BI and then joined with the imported table.
This scenario completely negates the efficiencies of the Power BI architecture. The solution to this problem is dual-mode storage. Dual-mode will copy the data from the original source into Power BI, just like import mode. If the dual-mode table is involved in a join with a direct query table, then the result of that join will be pulled from the direct query source. We can change the storage mode in the advanced section of the table's properties. When I try to change the Region table to import mode, I get this warning. It's telling me three things. Firstly, a table in import mode cannot be changed to any other mode. Secondly, refreshing the data of an imported table will take longer than a direct query, which doesn't have to be refreshed. Thirdly, we will introduce weak relationships. This refers to the fact that you can no longer send a query to the original source asking for data from a join of both tables.
This is such a bad idea that Power BI will change the mode from import to dual and display that fact with a striped line on the table header when I try to do it. I have to do the process again to get the table into import mode—the weak relationship is shown by the gap in the line that joins the two tables. I'll set Region back to direct query mode and delete BranchTargets so that my data model comes from a single source. Notice the data view has disappeared now that there are no dual or imported tables in the model.
Once you've set up a data model from a single source, you can export what is essentially the connection string. Go to transform data and data source settings to bring up the model's data source. The export PBIDS button will create a small JSON file that other report authors can use to connect to the same data source. I'll save this one as MobileSales_DQ for direct query. I can then go to that file and double-click it to open Power BI desktop, where it immediately takes me to the table navigator section of the data import process. I'm prompted to choose between import and direct query, whether loading or transforming the data. Looking at the Power BI import data source file, PBIDS, we see that it is not too involved. The connections section is set up with square brackets denoting a JSON collection. Adding another connection manually and using the file will not work, and Power BI desktop will display an error message.
Earlier I mentioned live connections in the context of direct query. A Live Connection is direct query for Analysis services and Power BI data sets. The analysis service can be on-premises or Azure Analysis Services. Unlike direct query, Live connection doesn't even pull the schema, as all live connection data sources share the same underlying data architecture and engine – the Vertipaq engine. You can't make changes to the data model of a live connection as it is viewed as an external data source. When you connect to an existing Power BI dataset, that dataset becomes shared. That is, your reports are sharing the dataset with other reports. As you can see, transform data is disabled, there is no data view, and when a table is selected, you are unable to change any of its properties. You can add measures as they are part of the report and not the dataset, but you can't add calculated columns.
Regarding shared datasets, I can take this existing report that uses a live connection to the MobileSales dataset and change the source to a dataset with the same structure called PowerSales, which is deployed to the Getting Data workspace. After publishing the report, we can go to the PowerSales dataset and view lineage through the toolbar button in the dataset or via the context menu in the workspace. View lineage shows us the reports sharing the dataset, in addition to the dataset's source.
Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.