In this course, we're going to review the features, concepts, and requirements that are necessary for designing data flows and how to implement them in Microsoft Azure. We're also going to cover the basics of data flows, common data flow scenarios, and what is involved in designing a typical data flow.
- Understand key components that are available in Azure that can be used to design and deploy data flows
- Know how the components fit together
This course is intended for IT professionals who are interested in earning Azure certification and for those who need to work with data flows in Azure.
To get the most from this course, you should have at least a basic understanding of data flows and what they are used for.
With big data, unorganized raw data is often stored across multiple storage systems, both relational and non-relational. On its own, the raw data doesn't have much context, nor does it offer any meaningful insight. It therefore requires services that orchestrate and operationalize the processes that refine the data and convert it into useful business information. Azure Data Factory is an Azure offering that's built for complex hybrid ETL, ELT, and data integration projects.
To illustrate the usefulness of Azure Data Factory, consider a casino that collects petabytes of game logs from the slot machines on its floor. The casino needs to analyze these logs in order to gain insight into customer game play, customer demographics, and other useful information. The casino wants to identify reward offers that match the players and develop new games that drive growth and provide a better experience to its players.
To analyze its gaming logs, the casino needs to leverage data such as customer info, game selection info, and marketing campaign info that is hosted in its on-premises data store. The casino wants to combine this on-prem data with additional information that it has in a cloud data store.
To extract meaningful information from this data, the casino needs to process the joined data by using a Spark cluster, such as Azure HDInsight, and then it wants to publish the transformed data to a cloud data warehouse. Doing so will allow the casino to build a report on top of it. The casino wants to automate the workflow, monitoring and managing it on a daily schedule. The entire data flow process needs to kick off when files land in a blob store container.
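The "kick off when files land" requirement can be pictured as a filter over file-arrival events. The following is a minimal sketch in plain Python, not Azure Data Factory code: the `BlobCreated` class, the `should_start` function, and the container name, prefix, and suffix are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobCreated:
    """Hypothetical event describing a file that landed in a storage container."""
    container: str
    path: str


def should_start(event, container="gamelogs", prefix="incoming/", suffix=".csv"):
    """Return True when the landed file should kick off the data flow.

    The container, prefix, and suffix defaults are assumptions for this sketch.
    """
    return (
        event.container == container
        and event.path.startswith(prefix)
        and event.path.endswith(suffix)
    )


# Three illustrative arrivals; only the first matches container, prefix, and suffix.
events = [
    BlobCreated("gamelogs", "incoming/slots-floor1.csv"),
    BlobCreated("gamelogs", "archive/slots-old.csv"),
    BlobCreated("marketing", "incoming/campaign.csv"),
]

started = [e.path for e in events if should_start(e)]
print(started)
```

Filtering on container and path keeps unrelated uploads, such as archived files or files in other containers, from starting the workflow.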
Azure Data Factory fits this example because it's a cloud-based data integration service. It allows organizations to create data-driven workflows in the cloud that orchestrate and automate data movement and data transformation.
By leveraging Azure Data Factory, the casino can create and schedule pipelines, or data-driven workflows, that can ingest data from different data stores. Azure Data Factory can process and transform the casino's data by using several different compute services, including Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and even Azure Machine Learning.
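One way to picture a pipeline is as a set of activities with dependencies between them: ingest from each store, transform the joined data, then publish. The sketch below models the casino's workflow that way in plain Python using the standard library's `graphlib` (Python 3.9+). It is an illustration only, not the Azure Data Factory API, and the activity names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each (hypothetical) activity maps to the set of activities it depends on.
pipeline = {
    "copy_onprem_customer_data": set(),
    "copy_cloud_game_logs": set(),
    "spark_join_and_transform": {
        "copy_onprem_customer_data",
        "copy_cloud_game_logs",
    },
    "publish_to_warehouse": {"spark_join_and_transform"},
}


def run_order(activities):
    """Return one valid execution order in which every dependency runs first."""
    return list(TopologicalSorter(activities).static_order())


order = run_order(pipeline)
for step, name in enumerate(order, start=1):
    print(step, name)
```

The two copy activities have no dependency on each other, so either may come first. The transform step always follows both of them, and the publish step always runs last.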
The output data can then be published to a data store, where BI applications can consume it. Ultimately, Azure Data Factory allows the casino to organize its raw data into meaningful data stores and data lakes for better business decisions.
Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.
In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.
In his spare time, Tom enjoys camping, fishing, and playing poker.