Using Azure Data Lake Store and Analytics
Azure Data Lake Store (ADLS) is a cloud-based repository for both structured and unstructured data. For example, you could use it to store everything from documents to images to social media streams.
ADLS is designed for big data analytics in a Hadoop environment. It is compatible with Hadoop Distributed File System (HDFS), so you can run your existing Hadoop jobs by simply telling them to use your Azure data lake as the filesystem.
Alternatively, you can use Azure Data Lake Analytics (ADLA) to do your big data processing tasks. It’s a service that automatically provisions resources to run processing jobs. You don’t have to figure out how big to make a cluster or remember to tear down the cluster when a job is finished. ADLA will take care of all of that for you. It is also simpler to use than Hadoop MapReduce, since it includes a language called U-SQL that brings together the benefits of SQL and C#.
In this course, you will follow hands-on examples to import data into ADLS, then secure, process, and export it. Finally, you will learn how to troubleshoot processing jobs and optimize I/O.
Learning Objectives
- Get data into and out of ADL Store
- Use the five layers of security to protect data in ADL Store
- Use ADL Analytics to process data in a data lake
- Troubleshoot errors in ADL Analytics jobs
Intended Audience
- Anyone interested in Azure’s big data analytics services
Prerequisites
- Database experience
- SQL experience (recommended)
- Microsoft Azure account recommended (sign up for a free trial at https://azure.microsoft.com/free if you don’t have an account)
This Course Includes
- 37 minutes of high-definition video
- Many hands-on demos
The GitHub repository for this course is at https://github.com/cloudacademy/azure-data-lake.
Welcome to the “Introduction to Azure Data Lake Store and Analytics” course. My name’s Guy Hummel and I’ll be showing you how to get started with Microsoft’s storage service for big data analytics. I’m a Research Lead at Cloud Academy and I have over 10 years of experience with cloud technologies. If you have any questions, feel free to connect with me on LinkedIn and send me a message, or send an email to email@example.com.
This course is intended for anyone who’s interested in Azure’s big data analytics services.
To get the most from this course, you should have some experience with databases. It would also be helpful to have some familiarity with writing queries using SQL, but it’s not a requirement. The best way to learn is by doing, so I recommend that you try performing these tasks yourself on your own Azure account. If you don’t already have one, then you can create a free trial account.
To save you the trouble of typing in the URLs and commands shown in this course, I’ve put them in a readme file in a GitHub repository. You can find a link to the repository at the bottom of the “About this course” tab below this video.
We’ll start with an overview of how Azure Data Lake Store (ADLS) is different from Azure SQL Data Warehouse (ASDW). Then I’ll show you how to get data into a data lake.
Next, we’ll look at the five layers of security you can use to protect your data.
After that, I’ll show you how to process your data using Azure Data Lake Analytics (ADLA) and the U-SQL language.
Then we’ll go over the most common problems when running processing jobs and how to troubleshoot them.
Finally, I’ll give you a brief overview of how to optimize data ingestion and I/O-intensive workloads.
By the end of this course, you should be able to get data into and out of ADL Store; use the five layers of security to protect data in ADL Store; use ADL Analytics to process data in a data lake; and troubleshoot errors in ADL Analytics jobs.
We’d love to get your feedback on this course, so please let us know what you think on the Comments tab below or by emailing firstname.lastname@example.org.
Now, if you’re ready to learn how to get the most out of Azure Data Lake Store and Analytics, then let’s get started.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).