Azure Databricks is an analytics platform powered by Apache Spark. Spark is a unified analytics engine capable of working with virtually every major database, data caching service, and data warehouse provider.
Spark clusters in Databricks also support Scala, the language Apache Spark itself is written in. Scala is a high-level programming language that combines functional and object-oriented programming into a concise language that is especially well suited to an environment like Databricks. Combining Databricks's built-in support for data analytics with Scala's ability to interact with resources efficiently and in a customizable way gives companies a high degree of control over their data and analytics.
In this lab, you'll use Scala in an Azure Databricks cluster to interact with Azure Data Lake Storage (ADLS), including ingesting, transforming, and writing data to the store.
Upon completion of this lab, you will be able to:
This lab is intended for:
You should be familiar with:
October 24th, 2022 - Updated the instructions and screenshots to reflect the latest UI
November 3rd, 2021 - Updated the instructions to resolve the login issue with Azure Databricks
October 23rd, 2021 - Provided a workaround for an Azure Active Directory issue that initially prevents logging in to Databricks
July 2nd, 2020 - Updated the "Mounting ADLS onto Azure Databricks" lab step to reflect the actual output of the ls command
Matt has worked for multiple Fortune 500 companies as a DevOps Engineer and Solutions Architect. He is an AWS Certified DevOps Engineer - Professional and an AWS Certified Solutions Architect - Associate. He enjoys reading and learning new technologies.