Data Flow Basics
Data Flow Components
Building a Dataflow with Azure Data Factory
In this course, we're going to review the features, concepts, and requirements that are necessary for designing data flows, and how to implement them in Microsoft Azure. We're also going to cover the basics of data flows, common data flow scenarios, and what's involved in designing a typical data flow.
- Understand key components that are available in Azure that can be used to design and deploy data flows
- Know how the components fit together
This course is intended for IT professionals who are interested in earning Azure certification and for those who need to work with data flows in Azure.
To get the most from this course, you should have at least a basic understanding of data flows and what they are used for.
Welcome to Azure Databricks. In this lesson, I just want to quickly review what Azure Databricks is and how it can fit into a data flow.
Microsoft describes Azure Databricks as a data analytics platform that’s optimized for the Microsoft Azure cloud services platform. It offers three environments for developing data intensive applications, including Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning.
The Databricks SQL environment allows you to run SQL queries against data lakes, and it allows you to create multiple visualization types that, in turn, let you explore query results from different perspectives. You can also use it to build and share dashboards.
Databricks Data Science & Engineering is an interactive workspace that you can use to facilitate collaboration between data engineers, data scientists, and machine learning engineers. In big data pipelines, data is typically ingested into Azure via Azure Data Factory. This data can be ingested in batches, or it can be streamed in near real-time using Apache Kafka, Event Hubs, or IoT Hub. The ingested data then typically winds up in Azure Blob Storage or in Azure Data Lake Storage.
Within this workflow, Azure Databricks can read data from these different data sources and turn that data into breakthrough insights using Apache Spark.
The third environment available in Databricks, Databricks Machine Learning, is a machine learning environment that incorporates experiment tracking, model training, feature development, and other capabilities.
Ultimately, you can use Azure Databricks to perform ETL operations as part of your data flow by extracting data from a source like Azure Data Lake Storage Gen2, pulling it into Azure Databricks, running transformations on the data within Azure Databricks, and then loading the transformed data into Azure Synapse Analytics.
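To make the extract-transform-load sequence above more concrete, here's a minimal sketch of what such a notebook might look like. The storage paths, table name, and the cleaning rule are all made up for illustration; the Spark read/write calls are shown as comments (they require a Databricks cluster and configured credentials), while the transform step is written as plain Python so the logic is self-contained.

```python
# Hypothetical ETL sketch following the flow described above.
# Extract -- in a Databricks notebook this would be something like:
#   raw_df = spark.read.parquet(
#       "abfss://raw@exampleaccount.dfs.core.windows.net/sales/")

# Transform -- a stand-in cleaning rule: drop rows missing an amount
# and normalize the currency code. Real logic would vary by workload.
def transform(rows):
    """Drop incomplete records and upper-case the currency code."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue  # skip incomplete records
        cleaned.append({**row, "currency": row["currency"].upper()})
    return cleaned

# Load -- writing to Azure Synapse Analytics from Databricks typically
# uses the built-in Synapse connector, roughly:
#   clean_df.write.format("com.databricks.spark.sqldw") \
#       .option("url", synapse_jdbc_url) \
#       .option("tempDir", staging_path) \
#       .option("dbTable", "dbo.Sales") \
#       .save()

if __name__ == "__main__":
    sample = [
        {"amount": 10.0, "currency": "usd"},
        {"amount": None, "currency": "eur"},
    ]
    print(transform(sample))
```

The transform step runs on the cluster's Spark executors in a real pipeline; the point of the sketch is simply the three-stage shape of the flow: read from Data Lake Storage, reshape the data, write to Synapse.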
The image on your screen shows how this process works.
If you’d like to know more about the nuts and bolts of Azure Databricks, visit the URL that you see on your screen.
Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40,000 seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.
In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.
In his spare time, Tom enjoys camping, fishing, and playing poker.