Azure Databricks
Difficulty
Intermediate
Duration
1h 5m
Students
2197
Ratings
4.6/5
Description

In this course, we're going to review the features, concepts, and requirements that are necessary for designing data flows, and how to implement them in Microsoft Azure. We're also going to cover the basics of data flows, common data flow scenarios, and what's involved in designing a typical data flow.

Learning Objectives

  • Understand key components that are available in Azure that can be used to design and deploy data flows
  • Know how the components fit together

Intended Audience

This course is intended for IT professionals who are interested in earning Azure certification and for those who need to work with data flows in Azure.

Prerequisites 

To get the most from this course, you should have at least a basic understanding of data flows and what they are used for.

Transcript

Welcome to Azure Databricks. In this lesson, I just want to quickly review what Azure Databricks is and how it can fit into a data flow.

Microsoft describes Azure Databricks as a data analytics platform that's optimized for the Microsoft Azure cloud services platform. It offers three environments for developing data-intensive applications: Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning.

The Databricks SQL environment allows you to run SQL queries against data lakes, and it lets you create multiple visualization types that, in turn, allow you to explore query results from different perspectives. You can also use it to build and share dashboards.
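
To make that concrete, here's a minimal sketch of the kind of query you might run, written as PySpark in a Databricks notebook, where the spark session is provided automatically. The sales_lake.orders table and its columns are hypothetical placeholders for data sitting in a data lake.

```python
# Hypothetical query against a lake-backed table; the table name and
# columns are placeholders. In a Databricks notebook, `spark` (the
# SparkSession) is already available.
results = spark.sql("""
    SELECT region, SUM(amount) AS total_sales
    FROM sales_lake.orders
    GROUP BY region
    ORDER BY total_sales DESC
""")

# The resulting DataFrame can feed the visualizations and dashboards
# mentioned above.
results.show()
```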

Databricks Data Science & Engineering is an interactive workspace that facilitates collaboration between data engineers, data scientists, and machine learning engineers. In big data pipelines, data is typically ingested into Azure via Azure Data Factory. This data can be ingested in batches, or it can be streamed in near real-time using Apache Kafka, Event Hubs, or IoT Hub. The ingested data then typically winds up in Azure Blob Storage or in Azure Data Lake Storage.
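
As a rough illustration of the streaming half of that pipeline, the sketch below uses Spark Structured Streaming to read events from a Kafka topic and land them in Azure Data Lake Storage. The broker address, topic name, storage account, and paths are all hypothetical placeholders.

```python
# Hypothetical near-real-time ingestion with Spark Structured Streaming.
# Broker, topic, storage account, and paths are placeholders.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "telemetry")
    .load())

# Land the raw events in Azure Data Lake Storage as Parquet files.
query = (events
    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("parquet")
    .option("path", "abfss://raw@mydatalake.dfs.core.windows.net/telemetry/")
    .option("checkpointLocation",
            "abfss://raw@mydatalake.dfs.core.windows.net/_checkpoints/telemetry/")
    .start())
```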

Within this workflow, Azure Databricks can read data from the different data sources and turn that data into breakthrough insights using Spark.
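
For example, reading from Blob Storage and from Data Lake Storage might look like the sketch below. The storage accounts, containers, and paths are placeholders, and the cluster is assumed to already have credentials configured for both accounts.

```python
# Hypothetical reads from two different sources; storage accounts,
# containers, and paths are placeholders.
logs_df = spark.read.json(
    "wasbs://logs@mystorageacct.blob.core.windows.net/2023/")
customers_df = spark.read.parquet(
    "abfss://curated@mydatalake.dfs.core.windows.net/customers/")

# From here, both DataFrames can be joined, transformed, and analyzed
# with the full Spark API.
```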

The third environment available in Databricks, Databricks Machine Learning, is a machine learning environment that incorporates experiment tracking, model training, feature development, and other capabilities.
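
That experiment tracking is built on MLflow, which ships with Databricks. As a minimal, hypothetical sketch, logging a run's parameters and metrics looks something like this (the values here are illustrative only):

```python
import mlflow

# Hypothetical training run; the parameter and metric values are
# illustrative placeholders.
with mlflow.start_run():
    mlflow.log_param("max_depth", 5)
    mlflow.log_metric("accuracy", 0.91)
```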

Ultimately, you can use Azure Databricks to perform ETL operations as part of your data flow: extracting data from a source like Azure Data Lake Storage Gen2, pulling it into Azure Databricks, running transformations on the data within Azure Databricks, and then loading the transformed data into Azure Synapse Analytics.
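
Putting those steps together, here is a hedged sketch of that ETL flow using the Azure Synapse connector available on Databricks clusters. Every connection string, path, and table name below is a hypothetical placeholder, and storage credentials are assumed to be configured separately.

```python
from pyspark.sql import functions as F

# Extract: read raw data from Azure Data Lake Storage Gen2
# (hypothetical storage account and path).
raw = spark.read.parquet(
    "abfss://raw@mydatalake.dfs.core.windows.net/sales/")

# Transform: clean and aggregate the data within Azure Databricks.
transformed = (raw
    .filter(F.col("amount") > 0)
    .groupBy("region")
    .agg(F.sum("amount").alias("total_sales")))

# Load: write the result to a dedicated SQL pool in Azure Synapse
# Analytics, staging the data through the lake via tempDir.
(transformed.write
    .format("com.databricks.spark.sqldw")
    .option("url",
            "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;database=mydb")
    .option("tempDir", "abfss://staging@mydatalake.dfs.core.windows.net/tmp/")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.regional_sales")
    .mode("overwrite")
    .save())
```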

The image on your screen shows how this process works.

If you’d like to know more about the nuts and bolts of Azure Databricks, visit the URL that you see on your screen.

About the Author
Students
90913
Courses
89
Learning Paths
56

Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.

In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.

In his spare time, Tom enjoys camping, fishing, and playing poker.