hands-on lab

Working with Scala in Azure Databricks

Beginner
Up to 1h 15m
323
4.2/5
Lab description

Azure Databricks is an analytics platform powered by Apache Spark. Spark is a unified analytics engine capable of working with virtually every major database, data caching service, and data warehouse provider.

Spark clusters in Databricks also support Scala, the language Apache Spark itself is written in. Scala is a high-level programming language that combines functional and object-oriented programming into a concise language that is especially useful in an environment like Databricks. Combining Databricks's built-in support for data analytics with Scala's ability to interact with resources efficiently and in a customizable way gives companies a high level of control over their data and analytics.
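To make the functional and object-oriented combination concrete, here is a minimal, standalone Scala sketch that needs no Spark. The `Measurable` trait, the `FileRecord` case class, and the sample file names are hypothetical illustrations, not part of the lab: the trait and case class show the object-oriented side, while `filter`, `map`, and `sum` over an immutable `Seq` show the functional side.

```scala
// Hypothetical types for illustration only; they are not part of the lab.

// Object-oriented side: a trait defines a contract with a default method,
// and a case class implements it.
trait Measurable {
  def size: Long
  def isEmptyFile: Boolean = size == 0L
}

final case class FileRecord(path: String, size: Long) extends Measurable

object ScalaStyles {
  // Functional side: immutable data transformed with higher-order functions.
  def totalSize(files: Seq[FileRecord], minSize: Long): Long =
    files
      .filter(_.size >= minSize) // keep only files at or above the threshold
      .map(_.size)               // project each record to its size
      .sum                       // add the remaining sizes together

  def main(args: Array[String]): Unit = {
    val files = Seq(
      FileRecord("raw/a.csv", 120L),
      FileRecord("raw/b.csv", 40L),
      FileRecord("raw/c.csv", 300L)
    )
    println(totalSize(files, minSize = 100L)) // prints 420
  }
}
```

Because the data is immutable and the pipeline is a chain of pure functions, the same style carries over naturally to Spark DataFrame and Dataset transformations.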

In this lab, you'll use Scala in an Azure Databricks cluster to interact with Azure Data Lake Storage Gen2, including ingesting, transforming, and writing data to the store.
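ADLS Gen2 data is commonly addressed from Spark through an `abfss://` URI of the form `abfss://<container>@<storage-account>.dfs.core.windows.net/<path>`. The lab itself mounts the storage onto Databricks, so the helper below is only an illustrative alternative; `AdlsPaths`, `abfssUri`, and the container, account, and file names are hypothetical placeholders. The sketch runs as plain Scala; the Spark call mentioned in the comment assumes a Databricks notebook whose cluster is already authorized to access the storage account.

```scala
// Hypothetical helper: builds an ABFS (Azure Blob File System) URI for ADLS Gen2.
// Container, account, and path values are placeholders, not the lab's real names.
object AdlsPaths {
  def abfssUri(container: String, account: String, path: String): String = {
    require(container.nonEmpty && account.nonEmpty, "container and account are required")
    // Strip leading slashes so the path joins cleanly after the host name.
    val relative = path.dropWhile(_ == '/')
    s"abfss://$container@$account.dfs.core.windows.net/$relative"
  }

  def main(args: Array[String]): Unit = {
    val uri = abfssUri("data", "mystorageacct", "/raw/sales.csv")
    println(uri) // abfss://data@mystorageacct.dfs.core.windows.net/raw/sales.csv
    // In a Databricks Scala notebook (not runnable here), a URI like this could be
    // passed to spark.read.option("header", "true").csv(uri) to create a DataFrame.
  }
}
```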

Learning Objectives

Upon completion of this lab, you will be able to:

  • Load data into Azure Data Lake Storage Gen2
  • Create and manage a Databricks workspace
  • Create and manage a Databricks cluster
  • Use Scala to manage folders and write data to ADLS Gen2
  • Use Scala to create DataFrames from data in ADLS Gen2

Intended Audience

This lab is intended for:

  • Azure administrators
  • Cloud engineers and solutions architects
  • Data engineers
  • Anyone who needs to visualize and analyze data in Azure

Prerequisites

You should be familiar with:

Updates

March 1st, 2024 - Migrated to Azure Data Lake Storage Gen2

October 24th, 2022 - Updated the instructions and screenshots to reflect the latest UI

November 3rd, 2021 - Updated the instructions to resolve the login issue with Azure Databricks

October 23rd, 2021 - Provided a workaround for an Azure Active Directory issue that initially prevented logging in to Databricks

July 2nd, 2020 - Updated the "Mounting ADLS onto Azure Databricks" lab step to reflect the actual output of the ls command

About the author
Matt Martinez
Cloud Content & Labs QA
Students: 108,397
Labs: 41
Learning paths: 9

Matt has worked for multiple Fortune 500 companies as a DevOps Engineer and Solutions Architect. He is an AWS Certified DevOps Engineer - Professional and an AWS Certified Solutions Architect - Associate. He enjoys reading and learning about new technologies.

Lab steps
Logging in to the Microsoft Azure Portal
Creating an Azure Databricks Workspace
Creating a Spark Cluster and Scala Notebook in Azure Databricks
Mounting ADLS Gen2 onto Azure Databricks
Working with Folders in ADLS using Scala on Databricks
Importing Multiple Files into Azure Data Lake Storage Gen2
Using Scala on Azure Databricks to Create Data Frames and Write to ADLS