Using Azure Stream Analytics
Azure Stream Analytics (ASA) is Microsoft’s service for real-time data analytics. Typical uses include stock trading analysis, fraud detection, embedded sensor analysis, and web clickstream analytics. Although these tasks could be performed in batch jobs once a day, they are much more valuable if they run in real time. For example, if you can detect credit card fraud immediately after it happens, then you are much more likely to prevent the credit card from being misused again.
Although you could run streaming analytics using Apache Spark or Storm on an HDInsight cluster, it’s much easier to use ASA. First, Stream Analytics manages all of the underlying resources. You only have to create a job, not manage a cluster. Second, ASA uses Stream Analytics Query Language, which is a variant of T-SQL. That means anyone who knows SQL will have a fairly easy time learning how to write jobs for Stream Analytics. That’s not the case with Spark or Storm.
In this course, you will follow hands-on examples to configure inputs, outputs, and queries in ASA jobs. This includes ingesting data from Event Hubs and writing results to Data Lake Store. You will also learn how to scale, monitor, and troubleshoot analytics jobs.
Learning Objectives
- Create and run a Stream Analytics job
- Use time windows to process streaming data
- Scale a Stream Analytics job
- Monitor and troubleshoot errors in Stream Analytics jobs
Intended Audience
- Anyone interested in Azure’s big data analytics services
Prerequisites
- SQL experience (recommended)
- A Microsoft Azure account (recommended; sign up for a free trial at https://azure.microsoft.com/free if you don’t have one)
This Course Includes
- 50 minutes of high-definition video
- Many hands-on demos
The GitHub repository for this course is at https://github.com/cloudacademy/azure-stream-analytics.
To show you how to create and run a job, I’ll take you through one of Microsoft’s fictional scenarios.
Suppose that a company has automated its manufacturing, and the machinery in its factories now has sensors that emit data in real time. Let’s say that you want a continuously updated list of sensors that have measured a temperature greater than 100 degrees Fahrenheit over a 30-second period.
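As a rough local illustration of that goal, the Python sketch below groups readings into fixed, non-overlapping 30-second windows (tumbling windows) per sensor and keeps the windows whose average temperature is above 100°F. It assumes that “over a 30-second period” means the average across each window; the function names and the (timestamp, sensor, temperature) tuple shape are hypothetical, and this is not how ASA executes queries internally.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW_SECONDS = 30  # tumbling window length (assumed from the scenario)
THRESHOLD_F = 100    # temperature threshold in degrees Fahrenheit


def window_start(ts: datetime) -> datetime:
    """Align a timezone-aware timestamp to the start of its 30-second window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def hot_sensors(readings):
    """Return (window_start, sensor, avg_temp) for windows whose average exceeds the threshold.

    readings: iterable of (timestamp, sensor_name, temperature_f) tuples (hypothetical shape).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (window_start, sensor) -> [sum, count]
    for ts, sensor, temp in readings:
        acc = totals[(window_start(ts), sensor)]
        acc[0] += temp
        acc[1] += 1
    hot = []
    for (start, sensor), (total, count) in sorted(totals.items()):
        avg = total / count
        if avg > THRESHOLD_F:
            hot.append((start, sensor, avg))
    return hot


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    sample = [
        (t0, "sensorA", 110),
        (t0 + timedelta(seconds=10), "sensorA", 100),  # sensorA averages 105 in window 1
        (t0 + timedelta(seconds=5), "sensorB", 90),    # sensorB averages 90 in window 1
        (t0 + timedelta(seconds=35), "sensorB", 130),  # sensorB averages 130 in window 2
    ]
    for start, sensor, avg in hot_sensors(sample):
        print(start.isoformat(), sensor, avg)
```

In Stream Analytics itself, this kind of grouping is written declaratively with a windowing function in the query’s GROUP BY clause rather than with hand-written loops.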
You would likely create an architecture like this: the sensors would send their data to Event Hubs or IoT Hub, which would then feed the data to a Stream Analytics job for processing. To view the results easily, you could send them to Power BI.
Since you’re just getting started with Stream Analytics, let’s make this much simpler. The sensor data will come from a file, and we won’t be sending the results anywhere outside of Stream Analytics.
First, you’ll need to download the GitHub repository for this course. The URL is in the course overview. Open that URL, click the “Clone or download” button, and then click “Download ZIP”. When the download finishes, unzip the file.
OK, now we need to create a Stream Analytics job, so go to the Azure portal and click “Create a resource”, then “Data + Analytics”, then “Stream Analytics job”. Call it “iot-job”. For the resource group, I’ll use one I created before called “examplerg”. If you don’t already have a resource group you can use, create one.
For the location, choose the same region where your resource group resides. For the hosting environment, if you choose “Edge”, the job will be deployed to an on-premises IoT Edge gateway device. That’s not what we’re doing, so leave it on “Cloud”. Then check “Pin to dashboard” and click “Create”.
It’s deploying now. While it’s doing that, let’s have a look at the data file. Go into the Samples folder and then the GettingStarted folder. The file is in JSON format. Each entry has the timestamp when the sensor reading was taken, the name of the sensor, the temperature, and the humidity. The temperature for this one is 123. It must be in the middle of a desert or something.
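To make the record shape concrete, here is a minimal Python sketch that loads the sample file into typed records. The JSON field names (`time`, `dspl`, `temp`, `hmdt`) are assumptions based on Microsoft’s GettingStarted sample, so check them against the file in the repository. Because this sketch doesn’t assume whether the file is a single JSON array or one JSON object per line, it accepts both; `SensorReading`, `load_records`, and `parse_readings` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class SensorReading:
    time: str           # reading timestamp, kept as the raw ISO 8601 string
    sensor: str         # sensor name
    temperature: float  # degrees Fahrenheit
    humidity: float


def load_records(text: str) -> list:
    """Accept either one JSON array or newline-delimited JSON objects."""
    text = text.strip()
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_readings(text: str) -> list:
    """Map raw records to SensorReading objects (field names are assumed)."""
    return [
        SensorReading(
            time=rec["time"],
            sensor=rec["dspl"],
            temperature=float(rec["temp"]),
            humidity=float(rec["hmdt"]),
        )
        for rec in load_records(text)
    ]


if __name__ == "__main__":
    # Made-up record in the assumed shape, not copied from the sample file.
    demo = '{"time": "2024-01-01T00:00:00", "dspl": "sensorA", "temp": 123, "hmdt": 34}'
    print(parse_readings(demo))
```

Keeping the timestamp as a string avoids depending on how a given Python version parses high-precision fractional seconds.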
OK, the deployment is done. Normally, the first thing you would do is specify your inputs, but since we’re going to use a sample data file instead of a real streaming input, we’ll specify that in a different place.
Here, the portal has provided a basic template for a query. It kind of looks like you can edit it in this box, but you can’t. You have to click “Edit query” to do that. The simplest query we could write would just read in the data. To do that, replace the output placeholder, “[YourOutputAlias]”, with a name for the output, such as just “output”. Then replace the input placeholder, “[YourInputAlias]”, with a name for the input, such as “InputStream”. You can use a different input name if you want, but it will create more work for you later on, so please call it “InputStream”.
Now you can tell it where to get the sample data. Click the three dots over here and select “Upload sample data from file”. Then click the folder icon and go into the GitHub repository you downloaded. Go back into the Samples folder and then the GettingStarted folder, and select the JSON file. Click OK. Now click Test.
It should take only 5 or 10 seconds. OK, it’s done. The output appears at the bottom of the page, nicely formatted as a table.
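The Test button runs the query against the uploaded sample data. As a loose local analogue (not how ASA executes queries), the Python sketch below treats the input as a list of dictionaries, passes every event through unchanged the way `SELECT * INTO output FROM InputStream` does, and prints the result as a plain-text table. The function names and the sample events are hypothetical.

```python
def select_all(input_stream):
    """Analogue of SELECT * INTO output FROM InputStream: copy every event unchanged."""
    return [dict(event) for event in input_stream]


def format_table(rows):
    """Render a list of dicts as a simple text table (columns taken from the first row)."""
    if not rows:
        return ""
    columns = list(rows[0])
    widths = {c: max(len(c), *(len(str(r.get(c, ""))) for r in rows)) for c in columns}
    header = " | ".join(c.ljust(widths[c]) for c in columns)
    rule = "-+-".join("-" * widths[c] for c in columns)
    body = [" | ".join(str(r.get(c, "")).ljust(widths[c]) for c in columns) for r in rows]
    return "\n".join([header, rule, *body])


if __name__ == "__main__":
    # Made-up events in an assumed shape, for illustration only.
    events = [
        {"time": "2024-01-01T00:00:00", "dspl": "sensorA", "temp": 123, "hmdt": 34},
        {"time": "2024-01-01T00:00:05", "dspl": "sensorB", "temp": 80, "hmdt": 40},
    ]
    print(format_table(select_all(events)))
```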
Alright, let’s run a more useful query. Go back to the GettingStarted folder and open the Filtering text file. This query selects all four columns again, but this time it renames the last three with more readable names. You’ll notice that this query uses the name “InputStream” for the input, just as we did in the last query. If you named it something different, then you’ll either have to change the name here to what you used last time or upload the sample data again and associate it with this new input name. Since I used the same name, I won’t have to do either of those things.
Finally, at the end, there’s a WHERE clause that filters out every record except those from sensorA.
Copy the entire query and paste it into the query box, replacing what’s there. Then click Test again. This time, it shows only records from sensorA, which is what we wanted.
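The Filtering query combines two ideas: a projection that renames columns and a WHERE filter. Here is a minimal Python analogue, assuming the raw field names are `time`, `dspl`, `temp`, and `hmdt` and using illustrative aliases (check Filtering.txt for the real names); `ALIASES` and `filter_query` are hypothetical.

```python
# Assumed mapping from raw field names to readable aliases (illustrative; check Filtering.txt).
ALIASES = {
    "time": "time",         # the first column keeps its name
    "dspl": "SensorName",
    "temp": "Temperature",
    "hmdt": "Humidity",
}


def filter_query(input_stream, sensor="sensorA"):
    """Analogue of SELECT ... AS ... FROM InputStream WHERE <sensor column> = 'sensorA'."""
    output = []
    for event in input_stream:
        if event["dspl"] != sensor:  # WHERE: keep only the chosen sensor
            continue
        # SELECT with AS: copy each column under its readable alias
        output.append({alias: event[field] for field, alias in ALIASES.items()})
    return output


if __name__ == "__main__":
    # Made-up events in the assumed shape, for illustration only.
    events = [
        {"time": "2024-01-01T00:00:00", "dspl": "sensorA", "temp": 123, "hmdt": 34},
        {"time": "2024-01-01T00:00:05", "dspl": "sensorB", "temp": 80, "hmdt": 40},
    ]
    print(filter_query(events))
```

In this sketch, the filter checks the original field name before the columns are renamed.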
Now that you know how to run a job, go to the next video to find out how to divide the data into 30-second periods. Stay on this Azure portal page because we’re going to use it again in the next lesson.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).