In this course, we're going to review the features, concepts, and requirements necessary for designing data flows, and how to implement them in Microsoft Azure. We're also going to cover the basics of data flows, common data flow scenarios, and what is involved in designing a typical data flow.
Learning Objectives
- Understand key components that are available in Azure that can be used to design and deploy data flows
- Know how the components fit together
Intended Audience
This course is intended for IT professionals who are interested in earning Azure certification and for those who need to work with data flows in Azure.
Prerequisites
To get the most from this course, you should have at least a basic understanding of data flows and what they are used for.
Let’s talk about what a modern data warehouse looks like. A modern data warehouse allows you to easily bring together all of your data at scale. You can then gain insight into this data through analytical dashboards, operational reports, or even advanced analytics.
The image on your screen shows what a typical modern data warehouse looks like.
- We can see here that structured, semi-structured, and unstructured data, such as logs, files, and media, is combined into Azure Data Lake Storage using Azure Data Factory.
- The data in Azure storage can be leveraged to perform scalable analytics, using Azure Databricks. This results in cleansed and transformed data.
- This cleansed and transformed data is then moved to Azure Synapse Analytics, where it can be combined with existing structured data. This essentially creates a single hub for all of the data. The native connectors between Azure Databricks and Azure Synapse Analytics can be used to access the data and to move the data at scale.
- You can then build things like operational reports and analytical dashboards to gather insights from the data. Azure Analysis Services can be used to serve your end users.
- You can even run ad hoc queries directly against the data right inside Azure Databricks.
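The flow described in the steps above can be sketched in plain Python. This is only an illustrative model and does not call any Azure service or SDK: the function names (`ingest`, `cleanse`, `load_warehouse`), the sample records, and the cleanup rule are all assumptions made for this example. Each function stands in for the role that one service plays in the diagram.

```python
def ingest(sources):
    """Stand-in for the Azure Data Factory step: gather records
    from every source into a single 'lake' list."""
    lake = []
    for source_name, records in sources.items():
        for record in records:
            lake.append({"source": source_name, "raw": record})
    return lake


def cleanse(lake):
    """Stand-in for the Azure Databricks step: drop empty records
    and normalize the text (a hypothetical cleanup rule)."""
    cleaned = []
    for item in lake:
        text = item["raw"].strip().lower()
        if text:
            cleaned.append({"source": item["source"], "value": text})
    return cleaned


def load_warehouse(cleaned, existing_rows):
    """Stand-in for the Azure Synapse Analytics step: combine the
    cleansed data with structured rows that already exist."""
    return existing_rows + cleaned


# Hypothetical sample data: two raw sources plus one existing structured row.
sources = {"logs": [" ERROR disk full ", ""], "files": ["Report.csv"]}
existing = [{"source": "crm", "value": "customer-42"}]

warehouse = load_warehouse(cleanse(ingest(sources)), existing)
for row in warehouse:
    print(row["source"], "->", row["value"])
```

In a real deployment each of these steps would be a managed service rather than a function, but the order stays the same: ingest, cleanse and transform, then combine everything in one hub for reporting.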
In this diagram, Azure Synapse Analytics is the cloud data warehouse, which lets you scale compute and storage independently using its massively parallel processing architecture. Azure Data Factory is a hybrid data integration service that allows you to create, schedule, and orchestrate ETL and ELT workflows.
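To make the difference between ETL and ELT concrete, here is a minimal sketch in plain Python. It is not Azure Data Factory code: the `transform` rule, the function names, and the sample rows are hypothetical and exist only to show that the two patterns run the same steps in a different order.

```python
def transform(rows):
    """Hypothetical cleanup rule: trim whitespace, drop blanks, upper-case."""
    return [row.strip().upper() for row in rows if row.strip()]


def run_etl(source_rows):
    """Extract, Transform, Load: data is cleaned before it reaches the target."""
    steps = ["extract"]
    staged = transform(source_rows)
    steps.append("transform")
    target = list(staged)
    steps.append("load")
    return target, steps


def run_elt(source_rows):
    """Extract, Load, Transform: raw data lands in the target first
    and is cleaned there afterwards."""
    steps = ["extract"]
    target = list(source_rows)
    steps.append("load")
    target = transform(target)
    steps.append("transform")
    return target, steps


rows = [" north ", "", "south"]  # hypothetical sample input
etl_result, etl_steps = run_etl(rows)
elt_result, elt_steps = run_elt(rows)

print(etl_steps)
print(elt_steps)
print(etl_result == elt_result)
```

Both functions end with the same cleaned rows. What changes is where the transformation work happens, which is why ELT is usually paired with a target that has plenty of compute of its own.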
The Azure Data Lake Storage depicted here is built on top of Azure Blob storage, which is a massively scalable object storage offering for all kinds of unstructured data. Azure Databricks is an Apache Spark-based analytics platform, while Azure Analysis Services is an analytics-as-a-service offering that you can use to govern, deploy, test, and deliver BI solutions.
Power BI is a suite of business analytics tools that can connect to hundreds of data sources. You can use Power BI not only to simplify data prep but also to perform ad hoc analysis. It’s really good at producing slick reports that can be published and then accessed on the web and across mobile devices.
So, as you can see, there are quite a few pieces that make up a modern data warehouse. Hopefully, you have an idea of how they fit together now.
Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.
In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.
In his spare time, Tom enjoys camping, fishing, and playing poker.