Using Azure Stream Analytics
Azure Stream Analytics (ASA) is Microsoft’s service for real-time data analytics. Typical use cases include stock trading analysis, fraud detection, embedded sensor analysis, and web clickstream analytics. Although these tasks could be performed in batch jobs once a day, they are much more valuable if they run in real time. For example, if you can detect credit card fraud immediately after it happens, then you are much more likely to prevent the credit card from being misused again.
Although you could run streaming analytics using Apache Spark or Storm on an HDInsight cluster, it’s much easier to use ASA. First, Stream Analytics manages all of the underlying resources. You only have to create a job, not manage a cluster. Second, ASA uses Stream Analytics Query Language, which is a variant of T-SQL. That means anyone who knows SQL will have a fairly easy time learning how to write jobs for Stream Analytics. That’s not the case with Spark or Storm.
In this course, you will follow hands-on examples to configure inputs, outputs, and queries in ASA jobs. This includes ingesting data from Event Hubs and writing results to Data Lake Store. You will also learn how to scale, monitor, and troubleshoot analytics jobs.
Learning Objectives
- Create and run a Stream Analytics job
- Use time windows to process streaming data
- Scale a Stream Analytics job
- Monitor and troubleshoot errors in Stream Analytics jobs
Intended Audience
- Anyone interested in Azure’s big data analytics services
Prerequisites
- SQL experience (recommended)
- Microsoft Azure account (recommended; sign up for a free trial at https://azure.microsoft.com/free if you don’t have an account)
This Course Includes
- 50 minutes of high-definition video
- Many hands-on demos
The GitHub repository for this course is at https://github.com/cloudacademy/azure-stream-analytics.
When a job is running, you’ll often want to see how it’s doing. Are inputs and outputs flowing properly? Is it running efficiently? Are there any errors? If you go back into the fraud detection job, you can see the answers to these questions in graphical form at the bottom of the page.
The first graph shows the number of input events, output events, and errors over the past hour. You can see that there were more inputs than outputs, which is what you would expect, considering that we’re only outputting potentially fraudulent calls. There weren’t any errors, either, which is great.
The second graph shows the resource utilization. It peaked at 44%. This is the percentage utilization of the SUs, or Streaming Units. I’ll talk about this in more detail in the Scaling lesson.
You can also see other metrics by clicking on either of the graphs. By the way, if you don’t see any lines on your graph because it’s been longer than an hour since you ran this job, then you can change the time range to “Past 24 hours”, or longer, if necessary. You can also change this from a line graph to a bar graph, if you want.
The metrics you can choose are on the left. For example, if you check “Out of order Events”, you should see that there were lots of them. This is fairly normal because the events are sent from the call generator to the Event Hub to the Stream Analytics job, so there is plenty of opportunity for them to get out of order. If your use case can’t tolerate out-of-order events, then you can change the “Event ordering” configuration.
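To make the idea of an out-of-order event concrete, here is a minimal Python sketch. It is only a conceptual illustration and not how Stream Analytics computes the metric: the function name `count_out_of_order` and the sample timestamps are made up for this example, and it treats an event as out of order when its timestamp is earlier than the latest timestamp already seen.

```python
from datetime import datetime, timedelta


def count_out_of_order(timestamps):
    """Count events that arrive with a timestamp earlier than the latest one seen.

    Hypothetical helper for illustration; the service's own metric may be
    defined differently.
    """
    latest = None
    out_of_order = 0
    for ts in timestamps:
        if latest is not None and ts < latest:
            # This event is older than something that already arrived.
            out_of_order += 1
        else:
            latest = ts
    return out_of_order


# Made-up arrival sequence: offsets in seconds from a base time.
base = datetime(2024, 1, 1, 12, 0, 0)
arrivals = [base + timedelta(seconds=s) for s in (0, 2, 1, 3, 5, 4)]

print(count_out_of_order(arrivals))
```

In this sample, the events stamped at 1 second and 4 seconds each arrive after a later event, so two of the six events count as out of order.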
Notice that a couple of the metrics have their checkboxes greyed out. That’s because all of the metrics have to be of the same type if you’re going to display them on the same graph. All of the metrics other than these two show a count, whereas “SU % Utilization” shows a percentage and “Input Event Bytes” shows a number of bytes. If you uncheck all of the other metrics, then you’ll be able to check one of these two.
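The same-type rule can be sketched in a few lines of Python. This is an illustration of the rule as described in this lesson, not the portal's actual logic: the `METRIC_UNITS` table and the `can_share_graph` function are hypothetical, and "Output Events" is an assumed metric name.

```python
# Hypothetical metric-to-unit table based on the metrics mentioned in this lesson.
METRIC_UNITS = {
    "Input Events": "count",
    "Output Events": "count",  # assumed name for the output event count
    "Out of order Events": "count",
    "SU % Utilization": "percent",
    "Input Event Bytes": "bytes",
}


def can_share_graph(selected):
    """Return True if all selected metrics use the same unit."""
    units = {METRIC_UNITS[name] for name in selected}
    return len(units) <= 1


print(can_share_graph(["Input Events", "Out of order Events"]))
print(can_share_graph(["Input Events", "SU % Utilization"]))
```

Counts can be combined with other counts, but a percentage or a byte total needs a graph of its own, which is why those checkboxes stay greyed out until the count metrics are unchecked.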
If you were most interested in seeing input, output, and out-of-order events, you could save this customized graph and pin it to your dashboard. Then, whenever you go to your dashboard, the graph is easily accessible.
But you don’t want to have to look at the graph all the time to see if the job is running properly. Instead, you can set up alerts to notify you if something unexpected happens. Let’s say you want to make sure that data is always streaming into the job, and you expect that the longest period without any inputs is about 10 minutes. You can create an alert that will notify you if a gap that long occurs.
Click “Add metric alert”. Call it something like “No input for 10 minutes”. Under Metric, choose “Input Events”. For the Condition, choose “less than or equal to”, and set the Threshold to 0. Then set the Period to “Over the last 10 minutes”. So, we’re saying that if the number of input events is equal to 0 over the last 10 minutes, then send an alert.
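The condition in this alert rule can be sketched in Python as follows. This is a rough model under stated assumptions, not the service's implementation: the `should_alert` function, its parameters, and the sample data are all hypothetical, and it assumes the rule sums the metric over the period before comparing it to the threshold.

```python
from datetime import datetime, timedelta


def should_alert(samples, now, period=timedelta(minutes=10), threshold=0):
    """Return True if the total event count in the last `period` is <= threshold.

    `samples` is a list of (timestamp, input_event_count) pairs.
    """
    window_start = now - period
    total = sum(count for ts, count in samples if window_start < ts <= now)
    return total <= threshold


now = datetime(2024, 1, 1, 12, 30)

# Made-up samples: the last input arrived 15 minutes ago.
quiet = [(datetime(2024, 1, 1, 12, 5), 40), (datetime(2024, 1, 1, 12, 15), 12)]

# Same samples plus a few events 5 minutes ago.
active = quiet + [(datetime(2024, 1, 1, 12, 25), 3)]

print(should_alert(quiet, now))
print(should_alert(active, now))
```

Because an event count can never be negative, "less than or equal to 0" fires in exactly the same cases as "equal to 0": only when no input arrived during the whole period.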
Here, you can choose who to send the alert to, and how. You can email owners, contributors, and readers for this job. You can also specify additional emails. If you want to send the alert to an application, you can do that either using a webhook or an Azure Logic App. Click OK.
If you want to see a list of your alert rules, go back to the fraud detection job and select “Alert rules” from the Monitoring menu.
And that’s it for monitoring.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).