We’ve previously discussed Azure Data Lake and Azure Data Lake Store. That post should give you a good foundation for understanding Azure Data Lake Analytics – a newer part of the Data Lake portfolio that lets you run analytics over the data you already have in Azure Data Lake Store or Azure Blob storage.
According to Microsoft, Azure Data Lake Analytics lets you:
Azure Data Lake Analytics allows users to focus on code and analytics logic without having to worry about the intricacies of hardware setup, management, and operation in a distributed environment. Data Lake Analytics works with various Azure data sources, such as Azure Blob storage and Azure SQL Database. However, using Azure Data Lake Analytics with data kept in Azure Data Lake Store provides the best performance for big data workloads. An image from Microsoft Azure neatly represents the various technologies that combine to make Data Lake work:
(Image courtesy: Microsoft Azure)
In basic terms, here are the steps for setting up an Azure Data Lake Analytics operation:
Before getting started, it’s good to be aware of these details:
Along with Data Lake, Microsoft introduced Azure U-SQL. In Microsoft’s own words:
Azure Data Lake Analytics includes U-SQL, a language that unifies the benefits of SQL with the expressive power of your own code. U-SQL’s scalable distributed query capability enables you to efficiently analyze data in the store and across relational stores such as Azure SQL Database.
U-SQL combines the power of SQL and C# with a high level of abstraction over parallelism and distributed programming. U-SQL processes any kind and any size of data. Unlike Hive, which uses a SQL-like syntax (HQL) and works only with structured data, U-SQL works with any kind of data: structured and unstructured.
A U-SQL query might look like this:
@Result =
    SELECT dept, city, COUNT(*) AS NumberOfEmployees
    FROM @Employees
    GROUP BY dept, city
    ORDER BY NumberOfEmployees DESC, dept, city
    FETCH FIRST 10 ROWS;
Look at the query: SELECT, COUNT, FROM, GROUP BY, and ORDER BY certainly use SQL syntax, but the data types follow C# conventions.
What does @Employees mean? It turns out to be:
@Employees =
    EXTRACT emp_id int,
            name string,
            city string,
            salary int,
            country string,
            phone_numbers int
    FROM @INPUT_EMPLOYEES
    USING Extractors.Text(delimiter : '\t', quoting : true, encoding : Encoding.Unicode);
The rowset @Employees is being extracted from a file using Extractors.Text. Similarly, you can use Outputters to write a resulting rowset to a file in a desired format, such as CSV.
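As a minimal sketch (the output path and rowset name are hypothetical), writing a rowset out as CSV with a built-in outputter might look like this:

```sql
// Write the @Result rowset to a CSV file, including a header row
OUTPUT @Result
TO "/output/employees_by_city.csv"
USING Outputters.Csv(outputHeader : true);
```

Extractors read files into rowsets; Outputters do the reverse, serializing a rowset back to files in the store.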
The above example shows how U-SQL, in its simplest form, can:
Readers might notice the similarity between Pig Latin scripts and U-SQL. As in Pig, each U-SQL expression is assigned to a variable that is used in further processing. You can also deploy and register code as an assembly in the U-SQL metadata catalog. This allows you – or anyone else – to reuse the code in future scripts via REFERENCE ASSEMBLY <assembly_name>.
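As a hedged sketch (the database name, assembly name, and DLL path are hypothetical), registering an assembly once and referencing it from a later script could look like:

```sql
// One-time step: register a .NET assembly in the U-SQL metadata catalog
USE DATABASE MyAnalyticsDB;
CREATE ASSEMBLY IF NOT EXISTS EmployeeHelpers
FROM @"/assemblies/EmployeeHelpers.dll";
```

```sql
// In a later script: reference the registered assembly to reuse its code
REFERENCE ASSEMBLY MyAnalyticsDB.EmployeeHelpers;
```

Once registered, the assembly’s types and methods can be called from U-SQL expressions in any script that references it.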
The power of U-SQL goes beyond the simple query given above. U-SQL can also handle:
The Azure Data Lake Analytics query service is currently in preview, and its pricing model will change after general availability. Before looking at pricing, we need to understand what an Analytics Unit (AU) is and what counts as a completed job.
Other standard charges like transactions and data transfer are excluded from Analytics pricing:
Each Azure Data Lake Analytics account has configurable quotas limiting the number of AUs that can be assigned to jobs and the number of concurrent jobs. However, you can request a quota increase by contacting Microsoft.
Azure Data Lake Analytics is an exciting space in which to explore and run big data technologies. Data Lake technologies are built for the cloud and reflect Microsoft’s user-friendly, simple approach to technology.