Azure Stream Analytics (ASA) is Microsoft’s service for real-time data analytics. Examples of its use include stock trading analysis, fraud detection, embedded sensor analysis, and web clickstream analytics. Although these tasks could be performed in a batch job once a day, they are much more valuable if they run in real time. For example, if you can detect credit card fraud immediately after it happens, then you are much more likely to prevent the card from being misused again.
Although you could run streaming analytics using Apache Spark or Storm on an HDInsight cluster, it’s much easier to use ASA. First, Stream Analytics manages all of the underlying resources. You only have to create a job, not manage a cluster. Second, ASA uses Stream Analytics Query Language, which is a variant of T-SQL. That means anyone who knows SQL will have a fairly easy time learning how to write jobs for Stream Analytics. That’s not the case with Spark or Storm.
In this course, you will follow hands-on examples to configure inputs, outputs, and queries in ASA jobs. This includes ingesting data from Event Hubs and writing results to Data Lake Store. You will also learn how to scale, monitor, and troubleshoot analytics jobs.
Learning Objectives
- Create and run a Stream Analytics job
- Use time windows to process streaming data
- Scale a Stream Analytics job
- Monitor and troubleshoot errors in Stream Analytics jobs
Intended Audience
- Anyone interested in Azure’s big data analytics services
Prerequisites
- SQL experience (recommended)
- Microsoft Azure account recommended (sign up for free trial at https://azure.microsoft.com/free if you don’t have an account)
This Course Includes
- 50 minutes of high-definition video
- Many hands-on demos
Resources
The github repository for this course is at https://github.com/cloudacademy/azure-stream-analytics.
I hope you enjoyed learning about Azure Stream Analytics. Let’s do a quick review of what you learned.
In a typical Stream Analytics architecture, you have applications, devices, and gateways that generate events. These data streams are funneled into either Event Hubs or IoT Hubs. Stream Analytics then performs a series of transformations on all of this data. The results are sent to a storage service, a real-time dashboard, or an automation service.
Tumbling windows divide data into non-overlapping windows. Hopping windows divide data into overlapping windows, with a fixed hop size. Sliding windows also divide data into overlapping windows, but with no fixed hop size. The data is divided into all possible unique windows of a given length.
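As a sketch, the three window types correspond to the TumblingWindow, HoppingWindow, and SlidingWindow functions in Stream Analytics Query Language. The names Input, Output, and DeviceId below are hypothetical placeholders, not names from the course:

```sql
-- Count events per device in non-overlapping 10-second windows
SELECT DeviceId, COUNT(*) AS EventCount, System.Timestamp AS WindowEnd
INTO Output
FROM Input
GROUP BY DeviceId, TumblingWindow(second, 10)

-- Overlapping 10-second windows that start every 5 seconds:
--   GROUP BY DeviceId, HoppingWindow(second, 10, 5)

-- Every unique 10-second window (no fixed hop size):
--   GROUP BY DeviceId, SlidingWindow(second, 10)
```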
A record’s timestamp is called System.Timestamp. To refer to it in other parts of your query, you need to give it an alias. Use TIMESTAMP BY to specify which column contains the event timestamp. If you don’t specify one, Stream Analytics uses the arrival time instead, such as the time the record arrived at the Event Hub.
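For example, a query that timestamps records by a column in the event payload might look like this (EventTime and Input are hypothetical names used only for illustration):

```sql
-- Use the payload's EventTime column as each record's timestamp,
-- then alias System.Timestamp so it can appear in the output
SELECT System.Timestamp AS WindowEnd, COUNT(*) AS EventCount
FROM Input TIMESTAMP BY EventTime
GROUP BY TumblingWindow(minute, 1)
```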
To monitor a job, use the job graphs to watch metrics in real-time, or set alerts to be notified if something unexpected happens.
To scale a big job, ideally it should be embarrassingly parallel, which means the whole job can be split into parallel tasks for multiple workers. To achieve this, there must be an equal number of partitions in the input, query, and output.
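A minimal sketch of an embarrassingly parallel query uses PARTITION BY so each input partition is processed independently (PartitionId is the built-in partition column for Event Hubs input; Input and Output are hypothetical aliases):

```sql
-- Each input partition is handled by its own independent task
SELECT PartitionId, COUNT(*) AS EventCount
INTO Output
FROM Input PARTITION BY PartitionId
GROUP BY PartitionId, TumblingWindow(second, 10)
```

For full parallelism, the output must also support partitioning on the same key, so that the partition counts match end to end.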
A streaming unit, or SU, represents a certain capacity of CPU, memory, and I/O. 6 SU represents the full capacity of a single computing node. If a job isn’t parallelizable, then try 6 SU times the number of steps in the query. If a job is parallelizable, then try 6 SU times the number of partitions. Also remember that the number of partitions should be evenly divisible by the number of nodes.
There are three common types of errors when running a Stream Analytics job. The first is connectivity issues with inputs or outputs. If this happens, use the Test feature to see whether the job can connect to the input or output.
The second is issues with input data. To see what input data is being received by your job, change the query to “SELECT * FROM InputStream” and then run a test by sampling the data.
The third is issues with your query. To make it easier to diagnose, reduce it to a simpler query, test it, and then build it back up, testing at every step.
To learn more about Azure Stream Analytics, you can read Microsoft’s documentation. Also watch for new big data courses on Cloud Academy, because we’re always publishing new courses. If you have any questions or comments, please let us know by clicking the “Report an issue” button below. Thanks and keep on learning!
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).