Azure Data Lake Storage Gen2 (ADLS) is a cloud-based repository for both structured and unstructured data. For example, you could use it to store everything from documents to images to social media streams.
Data Lake Storage Gen2 is built on top of Blob Storage. This gives you the best of both worlds. Blob Storage provides great features like high availability and lifecycle management at a very low cost. Data Lake Storage provides additional features, including hierarchical storage, fine-grained security, and compatibility with Hadoop.
The most effective way to do big data processing on Azure is to store your data in ADLS and then process it using Apache Spark (a faster, largely in-memory alternative to Hadoop MapReduce) on Azure Databricks.
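As a rough illustration of this pattern, the sketch below shows how a file in ADLS Gen2 might be addressed and read from a Databricks notebook. The `abfss://` URI scheme and the `dfs.core.windows.net` endpoint are how Spark reaches Data Lake Storage Gen2; the helper names, the container `raw`, the account `mydatalake`, and the file path are hypothetical and are not taken from the course. It also assumes the cluster has already been given credentials for the storage account.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build the ABFS URI for a path inside an ADLS Gen2 storage account."""
    # Format: abfss://<container>@<account>.dfs.core.windows.net/<path>
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def read_csv_from_adls(spark, container: str, account: str, path: str):
    """Read a CSV file with a header row from ADLS Gen2 into a Spark DataFrame.

    `spark` is the SparkSession that a Databricks notebook provides.
    Assumes access to the storage account is already configured.
    """
    return spark.read.option("header", "true").csv(abfss_uri(container, account, path))


# Hypothetical container, account, and file names, for illustration only.
example_uri = abfss_uri("raw", "mydatalake", "/sales/2020.csv")
print(example_uri)
```

In a notebook, calling `read_csv_from_adls(spark, "raw", "mydatalake", "sales/2020.csv")` would return a DataFrame that can then be analyzed with the usual Spark operations.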
In this course, you will follow hands-on examples to import data into ADLS and then securely access it and analyze it using Azure Databricks. You will also learn how to monitor and optimize your Data Lake Storage.
Learning Objectives
- Get data into Azure Data Lake Storage (ADLS)
- Use six layers of security to protect data in ADLS
- Use Azure Databricks to process data in ADLS
- Monitor and optimize the performance of your data lakes
Intended Audience
- Anyone interested in Azure’s big data analytics services
Prerequisites
- Experience with Azure Databricks
- Microsoft Azure account recommended (sign up for free trial at https://azure.microsoft.com/free if you don’t have an account)
The GitHub repository for this course is at https://github.com/cloudacademy/azure-data-lake-gen2.
Welcome to “Using Azure Data Lake Storage Gen2”. My name’s Guy Hummel, and I’m a Microsoft Certified Azure Solutions Architect and Data Engineer. If you have any questions, feel free to connect with me on LinkedIn and send me a message, or send an email to firstname.lastname@example.org.
This course is intended for anyone who’s interested in Azure’s big data analytics services.
Since this course shows how to use Azure Databricks to perform analytics on Azure Data Lake Storage, you should have some experience with Azure Databricks. If you’re not familiar with it, then I recommend taking our Running Spark on Azure Databricks course first.
The best way to learn is by doing, so I recommend that you try performing these tasks yourself on your own Azure account. If you don’t already have one, then you can create a free trial account.
To save you the trouble of typing in the URLs and commands shown in this course, I’ve put them in a readme file in a GitHub repository. You can find a link to the repository at the bottom of the Overview tab below this video.
By the end of this course, you should be able to get data into and out of Azure Data Lake Storage (ADLS); use six layers of security to protect data in ADLS; use Azure Databricks to process data in ADLS; and monitor and optimize the performance of your data lakes.
We’d love to get your feedback on this course, so please give it a rating when you’re finished.
Now, if you’re ready to learn how to get the most out of Azure Data Lake Storage Gen2, then let’s get started.
Guy launched his first training website in 1995, and he has been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).