Big Data Analytics with IoT Device Data
Preview (16m 24s)

This course looks at how to deal with analytics at scale, specifically how to empower your big data analytics with real-time data from IoT devices using Alibaba Cloud. This course also includes a demo of real-time IoT data acquisition, storage, processing, and visualization.

Learning Objectives

  • Learn how big data and IoT technologies impact enterprises, and what key trends are driving these changes
  • Learn about Alibaba Cloud's IoT big data offering
  • Understand the architectural principles and best practices for designing analytics solutions with IoT data

Intended Audience

This course is intended for anyone looking to use Alibaba Cloud IoT services to power their big data analytics.


To get the most out of this course, you should have a foundational knowledge of Alibaba Cloud and IoT concepts.


The last section of my webinar will be a quick demo. We're going to demonstrate how to develop an end-to-end, real-time IoT data acquisition solution. As the data source, we'll use data simulated by KEPServerEX. The goal is to design a working serverless solution and demonstrate the process of real-time IoT data acquisition, storage, processing, and visualization. The demo process is straightforward.

We start with simulated data generation. Then we ingest the simulated data into the cloud. We store the raw data, process it, store the clean data, and visualize a performance report. Here is the architecture of our demo. As I said, the data comes from the simulator in the form of MQTT messages, which are received by IoT Hub. IoT Hub transforms these MQTT messages into a more readable form as data products. The first stop is Table Store: we forward the data there so that we can leverage the key-value structure of NoSQL storage.
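For readers without KEPServerEX, the first stage (simulated data generation) can be approximated in a few lines of Python. This is a minimal sketch, assuming the JSON layout that the KEPServerEX IoT Gateway MQTT agent typically emits (a top-level `timestamp` plus a `values` array of `id`/`v`/`q`/`t` entries). The tag names are hypothetical, and publishing the payload to IoT Hub over MQTT is left out.

```python
import json
import random
import time

# Hypothetical tag IDs of the form "Company.Protocol.System.Tag";
# the demo's real tag names are not shown.
TAG_IDS = [
    "DemoCo.Modbus.Line1.Temperature",
    "DemoCo.Modbus.Line1.Pressure",
    "DemoCo.Modbus.Line2.Speed",
]


def simulate_message(tag_ids=TAG_IDS, now_ms=None, rng=random):
    """Build one KEPServerEX-style JSON payload with a random value per tag.

    Assumes the IoT Gateway layout {"timestamp": ms, "values": [{"id", "v", "q", "t"}]}.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    values = [
        {"id": tag, "v": round(rng.uniform(0, 100), 2), "q": True, "t": now_ms}
        for tag in tag_ids
    ]
    return json.dumps({"timestamp": now_ms, "values": values})


if __name__ == "__main__":
    demo_rng = random.Random(42)  # seeded so repeated runs print the same values
    for _ in range(3):
        print(simulate_message(rng=demo_rng))
```

Each call produces one message carrying a reading for every simulated tag, which is the kind of payload IoT Hub would receive in the demo.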

The data is also sent to Function Compute, to showcase the serverless product available on Alibaba Cloud, where we apply very simple Python code to parse it. The parsed data is then transferred to DataHub. DataHub forwards the messages to StreamCompute for real-time transformation, then receives the results back from StreamCompute and loads them into OSS for permanent storage. At the same time, we send the data to MaxCompute for further batch processing and visualization in QuickBI. An optional step is Log Service, to ensure that all logs are received and processed by StreamCompute in real time.
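The parsing step in Function Compute could look roughly like the following. It is a sketch, not the demo's actual code. It assumes a KEPServerEX IoT Gateway-style payload (`timestamp` plus a `values` array of `id`/`v`/`q`/`t` entries), a tag `id` made of four dot-separated parts that map onto the ID company code, protocol name, system code, and tag name columns of the shard schema, and a Python handler with the `handler(event, context)` signature where `event` arrives as bytes. The snake_case field names are illustrative, and delivering the records to DataHub is omitted.

```python
import json

# Assumed order of the dot-separated parts of each tag id; the real split rule
# used in the demo is not shown.
ID_PARTS = ("id_company_code", "id_protocol_name", "id_system_code", "id_tag_name")


def parse_kepserver_payload(raw):
    """Flatten one KEPServerEX-style payload into one record per tag value."""
    message = json.loads(raw)
    records = []
    for item in message.get("values", []):
        parts = item["id"].split(".", len(ID_PARTS) - 1)
        if len(parts) != len(ID_PARTS):
            continue  # skip ids that do not follow the assumed four-part convention
        record = {"timestamp": message["timestamp"]}
        record.update(zip(ID_PARTS, parts))
        record.update({"value": item["v"], "quality": item["q"], "time": item["t"]})
        records.append(record)
    return records


def handler(event, context):
    """Function Compute-style entry point; `event` is assumed to arrive as bytes."""
    records = parse_kepserver_payload(event.decode("utf-8"))
    # Writing the records to DataHub is omitted; this sketch only returns them as JSON.
    return json.dumps(records)
```

Flattening one record per tag value is what turns the nested raw message into the row-shaped data that the downstream SQL-based products expect.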

So we start with data ingestion. When KEPServerEX sends the MQTT messages, you can see the result of the processed data here. Then we send the data from IoT Hub to Function Compute, where we parse it, and then we store it. The next step is data storage. We again use Table Store for key-value storage, where we can separate the timestamps and values that we receive from KEPServerEX. The clean data is stored permanently in OSS.
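The key-value layout in Table Store, with the message timestamp as the primary key and the values stored alongside it, can be illustrated with an in-memory stand-in. The column names and the `InMemoryKeyValueStore` class are hypothetical. The `(name, value)` pair lists loosely resemble how the Table Store SDKs describe rows, but no SDK is called here, and the payload shape (`timestamp` plus `values`) is an assumption.

```python
import json


def to_table_store_row(raw_payload):
    """Split a raw payload into (primary_key, attribute_columns) for key-value storage.

    The column names "timestamp" and "raw_values" are hypothetical; the demo only
    states that the timestamp is the primary key and the values are stored with it.
    """
    message = json.loads(raw_payload)
    primary_key = [("timestamp", message["timestamp"])]
    attribute_columns = [("raw_values", json.dumps(message["values"]))]
    return primary_key, attribute_columns


class InMemoryKeyValueStore:
    """Hypothetical stand-in for a NoSQL table: one row per unique primary key."""

    def __init__(self):
        self.rows = {}

    def put_row(self, primary_key, attribute_columns):
        key = tuple(value for _, value in primary_key)
        self.rows[key] = dict(attribute_columns)

    def get_row(self, *key):
        return self.rows.get(tuple(key))
```

Keeping the payload as a single raw attribute preserves the original message untouched, which is the point of storing it in raw format before any parsing.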

For data processing, we use real-time processing by connecting DataHub and StreamCompute. We also process the data in batch, transferring the clean data from DataHub into MaxCompute. Lastly, we do data visualization, with MaxCompute generating the reporting data for QuickBI. Optionally, we also demonstrate how Log Service can help you check the quality of the IoT data that we receive from the simulated KEPServerEX.

So let's get started with the demo. I'll move to the Alibaba Cloud console. Here is the console, the UI of the IoT Platform. I've already opened it on the products page, where you can see the name of our product; I called it Demo IoT data. Here you see the detailed product information and several functions that you can perform using the IoT Platform. For example, you can receive notifications, define features, set up service subscriptions, even parse the data, view device logs, and do online debugging. The next thing we need to look at in the IoT Platform is how to define the rules that forward the messages into the cloud.

So let's have a look. Here, we define the rule using SQL for data processing, and we also define two data forwarding rules. One sends the data to Function Compute for parsing; the other sends the data to Table Store to store it in its raw format. Let's start by checking the data in Table Store, where we save it in key-value form. Here you can see the timestamp, which is the primary key in our case. It is a unique timestamp recording the time when the message was generated, along with the value that comes with it from KEPServer. Here we can see the raw details: the structure of the messages that we receive from the different devices configured to send random data to the cloud.
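In the IoT Platform, the forwarding rules are written as SQL over a device topic. The sketch below swaps the SQL for plain Python to show the routing idea: standard MQTT wildcard matching (`+` for exactly one level, `#` for all remaining levels) decides which rules fire, and each matching rule sends the payload to its destination. The topic filter `/demo_product/+/user/update` and the destination names are hypothetical.

```python
from collections import defaultdict


def topic_matches(topic_filter, topic):
    """Standard MQTT wildcard matching: '+' matches one level, '#' matches the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard also matches the parent level
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Two forwarding rules mirroring the demo: parse in Function Compute,
# archive raw in Table Store. Filter and destination names are hypothetical.
RULES = [
    ("/demo_product/+/user/update", "function_compute"),
    ("/demo_product/+/user/update", "table_store"),
]


def forward(topic, payload, rules=RULES):
    """Return {destination: [payloads]} for every rule whose filter matches the topic."""
    routed = defaultdict(list)
    for topic_filter, destination in rules:
        if topic_matches(topic_filter, topic):
            routed[destination].append(payload)
    return dict(routed)
```

Because both rules match the same topic, one incoming message fans out to both destinations, which is how the same raw data ends up in Function Compute and in Table Store.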

As you can see, the data is not yet in a format ready for further processing by Alibaba Cloud's data intelligence products. So the next stop is Function Compute, where we parse the data. Function Compute supports various programming languages, but for this demonstration we chose to write simple Python code to parse the data and then send it on for further processing by the data intelligence products.

Now we can see the data that comes from Function Compute, in the form that will be transformed by StreamCompute. Let's check what kind of shards we have received from Function Compute and how they differ from the raw format that we saw in Table Store. Here's the schema of the shard: shard ID, system time, timestamp, ID company code, ID protocol name, ID system code, ID tag name, value, quality, and time, which is the time when this particular device generated the message.

Let's check how we can transform this data to match the reporting requirements that we'll use in MaxCompute. Here is the StreamCompute interface. Unfortunately, because my demo environment is set up in mainland China, the default interface is in Chinese, so I'll guide you through it. The most important part of this demo is the overall processing structure of this StreamCompute job.

Here you can see that we receive the extract result table from DataHub, do a very simple data transformation, and then forward the transformed result back to DataHub for storage in OSS and further batch processing in MaxCompute. As I said, StreamCompute supports SQL for simple transformations. Here are several examples of how we use StreamCompute in this demo: we create several DDL statements to create the tables, plus very simple transformation queries that run in StreamCompute. Let's see what the data looks like after the transformation.

We can go back to DataHub and look at the transformed result table. Again, we can open a sample of the shard data, and we can see that the schema after the transformation is different. We simplified the schema for the reporting that we'll do in MaxCompute: we keep the timestamp, use a tag ID instead of the complicated device parameters, and keep the streamed value, quality, and time. Let's move to OSS, where we store the data permanently.
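The demo performs this simplification with StreamCompute SQL. As an illustration of the same projection, here it is in Python. It assumes illustrative snake_case names for the parsed fields (`id_company_code`, `id_protocol_name`, and so on) and assumes the tag ID is built by joining the four ID parts with dots; the demo does not show how its tag ID is derived.

```python
# Illustrative names for the four ID columns of the parsed (pre-transformation) schema.
ID_FIELDS = ("id_company_code", "id_protocol_name", "id_system_code", "id_tag_name")


def simplify_record(record):
    """Map a parsed record to the reporting schema: timestamp, tag_id, value, quality, time.

    Assumption: tag_id is rebuilt by joining the four ID columns with '.'.
    """
    tag_id = ".".join(str(record[key]) for key in ID_FIELDS)
    return {
        "timestamp": record["timestamp"],
        "tag_id": tag_id,
        "value": record["value"],
        "quality": record["quality"],
        "time": record["time"],
    }


def transform_stream(records):
    """Apply the projection record by record, as a streaming SELECT would."""
    for record in records:
        yield simplify_record(record)
```

Using a generator keeps the sketch record-at-a-time, matching the streaming nature of the transformation rather than collecting a full batch first.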

Here you can see the different folders created for the three shards that we saw in StreamCompute. The transformed data is also transferred to MaxCompute for further analysis. Here you can see the data schema received from StreamCompute: timestamp, tag ID, value, quality, and time, partitioned by day, hour, and minute. We can also use DataWorks as the IDE to run simple SQL queries; in this example, we create the table. MaxCompute, via DataWorks, can then be connected to QuickBI for very simple dashboard development.
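Partitioning by day, hour, and minute means each record's time has to be mapped to partition values. A minimal sketch, assuming millisecond epoch timestamps, UTC (the demo's actual time zone is not stated), and the hypothetical partition columns `ds`, `hh`, and `mm`:

```python
from datetime import datetime, timezone

# Hypothetical partition column names; the demo only says the table is
# partitioned by day, hour, and minute.
PARTITION_COLUMNS = ("ds", "hh", "mm")


def partition_for(time_ms, tz=timezone.utc):
    """Derive day/hour/minute partition values from a millisecond epoch timestamp."""
    moment = datetime.fromtimestamp(time_ms / 1000, tz=tz)
    values = (moment.strftime("%Y%m%d"), moment.strftime("%H"), moment.strftime("%M"))
    return dict(zip(PARTITION_COLUMNS, values))


def partition_spec(time_ms, tz=timezone.utc):
    """Render the partition as a comma-separated "col='value'" string."""
    return ",".join(f"{col}='{val}'" for col, val in partition_for(time_ms, tz).items())
```

For example, `partition_spec(1_700_000_000_000)` yields `ds='20231114',hh='22',mm='13'`. Minute-level partitions keep each partition small, at the cost of many partitions per day.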

Here is what the dashboard might look like. We choose one tag type; we have the timestamp when we received the data, and we also have the value. So operators who are monitoring the performance of a particular device or sensor can follow its performance using historical information. For convenient visualization, we can also extend the monitoring time range in QuickBI. For business operators, we have a preview mode, and we can even create a very simple reporting portal and give business users access to it.
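Behind the dashboard, the view amounts to filtering one tag and a time window. A minimal sketch of that query in Python, assuming rows in the simplified schema (timestamp, tag ID, value, quality, time) with illustrative field names and millisecond times:

```python
def series_for_tag(rows, tag_id, start_ms, end_ms):
    """Return (time, value) points for one tag within [start_ms, end_ms), oldest first.

    Rows are dicts in the simplified reporting schema; field names are illustrative.
    """
    points = [
        (row["time"], row["value"])
        for row in rows
        if row["tag_id"] == tag_id and start_ms <= row["time"] < end_ms
    ]
    return sorted(points)
```

Extending the monitoring window is then just a larger `end_ms - start_ms`; the filter itself does not change.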

The last part of the demo shows how we can track the logs that are sent to Log Service for our demo. Let's check what kind of data is generated by the different tag IDs. We can see the distribution, an equal distribution, of the data that we receive from the tag IDs. This means that all the devices generate correct data and we're not losing any logs between data ingestion and data consumption.
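The reasoning here (equal counts per tag ID mean nothing was dropped) can be expressed as a simple check. Log Service is queried in its own console and query language, and this sketch does not call it. It assumes each log record carries a `tag_id` field and that all simulated tags publish at the same rate; the 10% default tolerance is an arbitrary illustrative choice.

```python
from collections import Counter


def records_per_tag(log_records):
    """Count log records per tag ID, like a group-by on tag_id in a log query."""
    return Counter(record["tag_id"] for record in log_records)


def evenly_distributed(counts, tolerance=0.1):
    """True if every tag's count is within `tolerance` (a fraction) of the largest count.

    With simulated tags that all publish at the same rate, a tag falling well below
    the others suggests its messages were dropped somewhere along the pipeline.
    """
    if not counts:
        return False
    highest = max(counts.values())
    return all(count >= highest * (1 - tolerance) for count in counts.values())
```

A tolerance is used instead of exact equality because counts taken over a time window can differ slightly at the window edges even when nothing is lost.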

If we want to look at information from one particular device, we can just click its tag ID, and we'll get only the records received from that tag ID. This was a quick demonstration of how you can build a comprehensive solution for processing IoT data on Alibaba Cloud. To close the session, I'd like to highlight three takeaways. First, Alibaba Cloud provides an advanced IoT platform that can help you establish an end-to-end process, from collecting data from devices to delivering it to the cloud for further analysis.

Second, as one of the biggest cloud providers, we empower our customers with powerful big data and AI capabilities for various types of data processing, analysis, and consumption. Lastly, we provide not only the IoT platform and big data and AI capabilities, but also a comprehensive cloud computing offering that can help you build complete, end-to-end industrial applications for your IoT use case.

So I think at this point we can wrap up our session. Thank you again for joining us today, and I hope you found this webinar useful. If you have further questions, please don't hesitate to get in touch with us. We'll try to answer your questions, or perhaps even support you on your digital transformation journey. Thanks again, and have a great day.

About the Author

Alibaba Cloud, founded in 2009, is a global leader in cloud computing and artificial intelligence, providing services to thousands of enterprises, developers, and government organizations in more than 200 countries and regions. Committed to the success of its customers, Alibaba Cloud provides reliable and secure cloud computing and data processing capabilities as a part of its online solutions.