Using Azure Data Lake Store and Analytics
Azure Data Lake Store (ADLS) is a cloud-based repository for both structured and unstructured data. For example, you could use it to store everything from documents to images to social media streams.
ADLS is designed for big data analytics in a Hadoop environment. It is compatible with Hadoop Distributed File System (HDFS), so you can run your existing Hadoop jobs by simply telling them to use your Azure data lake as the filesystem.
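As a rough illustration of "telling a job to use the data lake as the filesystem", Hadoop tools address ADLS through an `adl://` URI instead of an `hdfs://` one. The sketch below builds such a URI and the matching `hadoop fs -ls` command line. The account name `mydatalake` and the path `/raw/logs` are hypothetical, and the sketch assumes the cluster already has the ADLS connector and credentials configured.

```python
from typing import List


def adl_uri(account: str, path: str = "/") -> str:
    """Build an adl:// URI for a Data Lake Store account.

    `account` is the short account name only (hypothetical here),
    not the full host name.
    """
    if not account or "/" in account or "." in account:
        raise ValueError("account must be a short name such as 'mydatalake'")
    # Normalize the path so the URI contains exactly one slash after the host.
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


def hadoop_ls_command(account: str, path: str = "/") -> List[str]:
    """Return the argument list for listing a data lake directory with Hadoop."""
    return ["hadoop", "fs", "-ls", adl_uri(account, path)]


if __name__ == "__main__":
    # Only prints the command; running it requires a configured Hadoop cluster.
    print(" ".join(hadoop_ls_command("mydatalake", "/raw/logs")))
```

An existing job would typically take the same kind of URI as its input or output path in place of an HDFS path.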
Alternatively, you can use Azure Data Lake Analytics (ADLA) to do your big data processing tasks. It’s a service that automatically provisions resources to run processing jobs. You don’t have to figure out how big to make a cluster or remember to tear down the cluster when a job is finished. ADLA will take care of all of that for you. It is also simpler to use than Hadoop MapReduce, since it includes a language called U-SQL that brings together the benefits of SQL and C#.
In this course, you will follow hands-on examples to import data into ADLS, then secure, process, and export it. Finally, you will learn how to troubleshoot processing jobs and optimize I/O.
Learning Objectives
- Get data into and out of ADL Store
- Use the five layers of security to protect data in ADL Store
- Use ADL Analytics to process data in a data lake
- Troubleshoot errors in ADL Analytics jobs
Intended Audience
- Anyone interested in Azure’s big data analytics services
Prerequisites
- Database experience
- SQL experience (recommended)
- Microsoft Azure account (recommended; sign up for a free trial at https://azure.microsoft.com/free if you don’t have an account)
This Course Includes
- 37 minutes of high-definition video
- Many hands-on demos
The GitHub repository for this course is at https://github.com/cloudacademy/azure-data-lake.
Why does Microsoft have two different Azure services for storing huge amounts of data? Why does it need to offer Data Lake Store in addition to SQL Data Warehouse? Well, they serve two different, but related, needs.
SQL Data Warehouse gives us a clue about its purpose in the name itself. It’s intended for SQL queries. That also implies that it stores data in structured, relational tables. If you have raw data that’s not in a nicely structured format, then you’ll probably need to process it before you store it in SQL Data Warehouse.
Data Lake Store, on the other hand, will store any kind of data, whether it’s structured or not. For example, you could store everything from documents to images to social media streams.
Data warehouses are generally used for business reporting, while data lakes are more often used for data analytics and exploration. In fact, one common setup is to process data in the data lake and then export it to the data warehouse.
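The "process in the lake, then export to the warehouse" pattern can be sketched in miniature: raw, loosely structured lines go in, and only the rows that fit a fixed schema come out, ready to load into relational tables. The log format, field names, and sample lines below are invented for illustration. A real pipeline would run as a Data Lake Analytics or Hadoop job rather than a local script.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical raw data as it might sit in a data lake: not every line is well formed.
RAW_LINES = [
    "2017-06-01T10:15:00 user=alice action=login",
    "malformed line without fields",
    "2017-06-01T10:17:42 user=bob action=upload",
]

FIELDS = ["timestamp", "user", "action"]


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Turn one raw line into a structured row, or None if it does not fit the schema."""
    parts = line.split()
    if len(parts) != 3:
        return None
    timestamp, user_field, action_field = parts
    if not user_field.startswith("user=") or not action_field.startswith("action="):
        return None
    return {
        "timestamp": timestamp,
        "user": user_field[len("user="):],
        "action": action_field[len("action="):],
    }


def to_csv(lines: List[str]) -> str:
    """Keep only the parseable lines and emit them as CSV for a warehouse load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        row = parse_line(line)
        if row is not None:
            writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(RAW_LINES), end="")
```

The lake keeps every raw line, including the malformed one, while the warehouse only ever sees the cleaned, tabular output.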
The two services are designed to work with different types of software, too. SQL Data Warehouse is built on SQL Server, so it works well with that ecosystem of software. Data Lake Store, in contrast, is built to work with Hadoop. That’s because Hadoop excels at processing unstructured data.
One final difference is that SQL Data Warehouse is certified for compliance with over 20 standards, including HIPAA. Data Lake Store does not have regulatory compliance certifications. This is another reason why it makes sense to use SQL Data Warehouse to serve data to a wider audience.
And that’s it for the overview.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).