
Preparing to Start the Using SQL in Azure Databricks to Answer Business Questions Exam


This document prepares you to start the Using SQL in Azure Databricks to Answer Business Questions exam. The exam requires the Azure Databricks and Data Lake Storage Lab Playground, which you will use to answer the exam questions. The following sections guide you through preparing the lab playground environment for the exam.

Starting the Azure Databricks and Data Lake Storage Playground

Introduction

The Azure Databricks and Data Lake Storage Playground provides the Databricks cluster that you will run queries against during the exam. You will also use the playground's built-in notebook feature to write your SQL queries. Databricks will query data that you will upload to the playground's Azure Data Lake Store (Gen1) in an upcoming section. The following instructions ensure you have an operational playground.
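As a quick orientation, a notebook cell chooses its language with a magic command on its first line. The following minimal sketch shows what a SQL cell looks like; SELECT 1 is only a placeholder query, not part of the exam:

%sql
SELECT 1 AS sanity_check

Running it with shift+enter should return a single row, which confirms your cluster is attached and responding.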

Instructions

1. If you have not already done so, start the Azure Databricks and Data Lake Storage Playground.


2. Ensure that you have performed all the instructions in the lab playground so that you have a running Azure Databricks cluster and a notebook ready to use.


Summary

Your playground is now ready to be customized for the exam.


Uploading the Customer and Country Data to Azure Data Lake Store

Introduction

The exam asks you to answer questions about customer and country data using SQL. You will use Azure Data Lake Store as the data source for the queries you run with Databricks. In this section, you will upload the data files containing the customer and country data to the Data Lake Store.

Instructions

1. Download the following sample data files to your computer by right-clicking each link and clicking Save link as...:

  1. Country data
  2. Customer data

 The files are in JSON format.
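The json data source you will use later in this guide reads one JSON object per line by default. As an illustration only, a customer record might resemble the following; the actual field names and values are defined by the files you just downloaded:

{"customer_id": 1, "name": "Example Customer", "country_code": "US"}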


2. In the search bar at the top of the Azure Portal, enter data lake store, and then click Data Lake Store Gen1 under SERVICES.

The playground's Data Lake Store is listed in the results table.

3. Click the name of the Data Lake Store to open its Overview blade.

Note: You may see errors stating that you don't have certain permissions. If you do, disregard them. These messages are expected and simply mean your access has been limited to the scope necessary for this lab.

4. From the Data Lake Store Overview blade, click Data Explorer in the command bar.

Warning: If you see an error message when the data explorer opens, close the data explorer blade and wait a minute for the lab environment to finish initializing. The environment completes initialization within three minutes of when you start the lab.

5. Click Upload in the command bar.

6. In the Upload Files blade, click Select a file and select both the countries and customers data files you previously downloaded.

7. Click Add selected files to upload the files to the Data Lake Store.

The files should take only a few seconds to upload.


8. Close the Upload Files blade once you see a STATUS of Completed.

Summary

You have now uploaded the data files required for answering the questions on the Using SQL in Azure Databricks to Answer Business Questions exam.


Importing the Data into Databricks

Introduction

You are now ready to import and interact with the data in Databricks. You will mount the Azure Data Lake Storage (ADLS) account and register its files as tables, so you can analyze the data with SQL to answer the exam questions.

Instructions

1. In the Data Lake Storage dashboard, copy the name of the only Data Lake store available.

2. In your Databricks notebook, enter the following command and press shift+enter to run it:

%fs ls adl://{YOUR DATA LAKE ACCOUNT NAME}.azuredatalakestore.net/

Replace {YOUR DATA LAKE ACCOUNT NAME} with the Data Lake Storage account name you copied.
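For example, with a hypothetical account name of mydatalake123, the resulting command would read:

%fs ls adl://mydatalake123.azuredatalakestore.net/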

This command will list the files in the root of your data lake. Ensure the two JSON files are listed.
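If you prefer Python, you can get the same listing through dbutils; this sketch uses the same adl:// URL as above and prints each file's name and size:

%python
# dbutils.fs.ls returns FileInfo objects with path, name, and size attributes
for f in dbutils.fs.ls("adl://{YOUR DATA LAKE ACCOUNT NAME}.azuredatalakestore.net/"):
    print(f.name, f.size)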


3. In the new command box, enter the following, once again replacing {YOUR DATA LAKE ACCOUNT NAME} with your account name, and press shift+enter to execute it:

%python
# Use the cluster's credential passthrough token provider to authenticate to ADLS
configs = {
  "fs.adl.oauth2.access.token.provider.type": "CustomAccessTokenProvider",
  "fs.adl.oauth2.access.token.custom.provider": spark.conf.get("spark.databricks.passthrough.adls.tokenProviderClassName")
}
# Mount the ADLS account at /mnt/datalake so its files can be referenced by path
dbutils.fs.mount(
  source = "adl://{YOUR DATA LAKE ACCOUNT NAME}.azuredatalakestore.net/",
  mount_point = "/mnt/datalake",
  extra_configs = configs)

In just a few moments, you'll see output indicating that the mount succeeded.

This command mounted the ADLS account at the /mnt/datalake/ directory, so you can now access your data through a simple path instead of the full adl:// URL.
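To confirm the mount, you can list the mounted directory; the same two JSON files you uploaded should appear:

%fs ls /mnt/datalake/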

Finally, you'll use SQL commands to register the uploaded JSON files as tables that Databricks can query.

4. Enter the following into the new command box to create a customers SQL table for Databricks and press shift+enter to execute it:

%sql
CREATE TABLE customers
USING json
OPTIONS (
  path "/mnt/datalake/customers.json"
)

5. Execute the following to create the countries SQL table for Databricks to use:

%sql
CREATE TABLE countries
USING json
OPTIONS (
  path "/mnt/datalake/countries.json"
)
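Before moving on, you can optionally preview a table to confirm the data loaded; the columns you see depend on the fields in the JSON files:

%sql
SELECT * FROM customers LIMIT 5

Run the same query against the countries table to check it as well.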


Summary

Your lab playground is now ready for you to answer the questions in the Using SQL in Azure Databricks to Answer Business Questions exam.