Messaging Services in Azure
In this course, we explore three Azure messaging services: Azure Event Grid, Azure Event Hubs, and Azure Service Bus, and what they are used for.
- Gain a good understanding of what these three Azure messaging services are, and how they are used.
- Those who wish to learn what Azure Event Grid, Azure Event Hubs, and Azure Service Bus are, and what they are used for.
- Basic familiarity with Azure
- Basic familiarity with distributed apps
Welcome to Azure Event Hubs. In this lesson, we’ll take a look at what Azure Event Hubs is, why you would use it, and we’ll highlight key Event Hubs architecture components.
Azure Event Hubs is a distributed stream processing platform and event ingestion service. It is designed to receive millions of events per second. Data sent to an event hub can be transformed and stored by using any real-time analytics provider or batching/storage adapters.
Event Hubs can be used in a wide range of scenarios, including anomaly detection, application logging, analytics pipelines, live dashboards, archiving data, transaction processing, and device telemetry streaming.
Data is only valuable when there is an easy way to process it and get timely insights from it. Event Hubs offers low latency and seamless integration with data and analytics services both inside and outside of Azure, so it can be used to build a complete big data pipeline. Event Hubs is often referred to as an event ingestor, the "front door" for an event pipeline. An event ingestor sits between event publishers and event consumers and decouples the creation of an event stream from the consumption of its events. Because Event Hubs holds events in a time-based retention buffer, producers and consumers do not need to run at the same pace.
Azure Event Hubs contains several key components that make up its architecture, including Event Producers, Partitions, Consumer Groups, Throughput Units, and Event Receivers.
Event producers, also called event publishers, are entities that send data to an event hub. They can publish events using HTTPS, AMQP 1.0, or the Apache Kafka protocol (1.0 and above).
Partitions divide the event stream into ordered subsets. Each consumer reads only a specific subset, or partition, of the stream.
A consumer group represents a view, state, position, or offset of an entire event hub. Consumer groups enable consuming applications to have their own views of the event stream, meaning they read the event stream at their own pace and with their own offsets.
Throughput units, also known as processing units in the premium tier or capacity units in the dedicated tier, are pre-purchased units of capacity that control the throughput capacity of Event Hubs.
Event receivers, also called event consumers, are entities that read event data from an event hub. Consumers connect through an AMQP 1.0 session, and Event Hubs delivers events through that session as they become available.
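To make the idea of consumer groups reading "at their own pace and with their own offsets" more concrete, here is a minimal sketch in plain Python. It does not use the Event Hubs SDK. The names PartitionLog and ConsumerGroupReader are hypothetical and exist only for this illustration, and a real event hub tracks positions per partition rather than in a single list.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Append-only list standing in for one partition (illustration only)."""
    events: list = field(default_factory=list)

    def append(self, body):
        self.events.append(body)
        return len(self.events) - 1  # position of the event just appended


@dataclass
class ConsumerGroupReader:
    """One consumer group's private read position over a partition."""
    name: str
    position: int = 0

    def read(self, log, max_events):
        # Reading never removes events from the log; it only advances
        # this reader's own position.
        batch = log.events[self.position:self.position + max_events]
        self.position += len(batch)
        return batch


log = PartitionLog()
for reading in ["t=20.1", "t=20.4", "t=21.0", "t=35.7"]:
    log.append(reading)

dashboard = ConsumerGroupReader("dashboard")
archiver = ConsumerGroupReader("archiver")

dashboard_first = dashboard.read(log, max_events=3)
archiver_first = archiver.read(log, max_events=1)
archiver_second = archiver.read(log, max_events=10)
```

Both readers see every event, but each one advances independently, which is the behavior a consumer group is meant to provide.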
An Event Hubs namespace is a management container for event hubs. A namespace provides DNS-integrated network endpoints and a range of access control and network integration management features, such as IP filtering, virtual network service endpoints, and Private Link.
To publish an event, an entity sends data to an event hub. Such an entity is called an event publisher or producer, and it can publish events using HTTPS, AMQP 1.0, or the Kafka protocol. To gain publishing access, event publishers use Azure Active Directory-based authorization (Azure Active Directory is now named Microsoft Entra ID) with OAuth2-issued JWT tokens, or an event hub-specific Shared Access Signature (SAS) token. Event Hubs offers a REST API and client libraries for publishing events to an event hub. For other runtimes and platforms, any AMQP 1.0 client can be used. Events can be published individually or in batches.
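As an illustration of the Shared Access Signature option, the sketch below builds a SAS token string using only the Python standard library. It follows the commonly documented scheme: an HMAC-SHA256 signature over the URL-encoded resource URI and an expiry time. The function name build_sas_token, the namespace and hub names, the policy name, and the key are all placeholders invented for this example. Treat it as a sketch to check against the current Microsoft documentation rather than as a definitive implementation.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def build_sas_token(resource_uri, key_name, key, ttl_seconds=3600, now=None):
    """Return a SAS token string for resource_uri (illustrative helper)."""
    issued_at = time.time() if now is None else now
    expiry = str(int(issued_at + ttl_seconds))

    # The signature covers the URL-encoded URI and the expiry, separated
    # by a newline.
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))

    return (
        "SharedAccessSignature "
        f"sr={encoded_uri}&sig={signature}&se={expiry}&skn={key_name}"
    )


# Placeholder values only; "now" is fixed so the result is reproducible.
token = build_sas_token(
    "https://example-namespace.servicebus.windows.net/example-hub",
    key_name="send-policy",
    key="not-a-real-key",
    ttl_seconds=600,
    now=1_700_000_000,
)
```

When publishing over HTTPS, a token like this would typically be sent in the Authorization header of the request. The client libraries normally build the token for you from a connection string.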
Event Hubs throughput is scaled through the use of partitions and throughput-unit allocations. A partition can be thought of as a commit log that holds event data. Each event in the partition contains the body of the event, a user-defined property bag that describes it, and metadata such as its offset in the partition, its number in the stream sequence, and the service-side timestamp at which it was accepted.
Partitions enable the processing of large volumes of events in two ways. First, partitioning allows multiple parallel logs to be used for the same event hub, multiplying the available raw I/O throughput capacity. Second, consuming applications must be able to keep up with the volume of events, and partitions are how a solution feeds those processors in parallel while ensuring that each event has a clear processing owner.
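For the throughput-unit side of scaling, a rough sizing calculation can help. The sketch below assumes the standard-tier ingress limits as commonly documented: up to 1 MB per second or 1,000 events per second per throughput unit, whichever is reached first. Those figures are not stated in this lesson and should be checked against the current Azure quotas. The function name is hypothetical, and treating 1 MB as 1,000,000 bytes is also an assumption.

```python
import math

# Assumed standard-tier ingress limits per throughput unit.
INGRESS_BYTES_PER_TU = 1_000_000   # assumption: 1 MB counted as 10**6 bytes
INGRESS_EVENTS_PER_TU = 1_000


def throughput_units_needed(events_per_second, avg_event_bytes):
    """Estimate throughput units for ingress; whichever limit binds first wins."""
    by_count = math.ceil(events_per_second / INGRESS_EVENTS_PER_TU)
    by_size = math.ceil(events_per_second * avg_event_bytes / INGRESS_BYTES_PER_TU)
    return max(1, by_count, by_size)


# Many small events: the events-per-second limit is the binding one.
small_events = throughput_units_needed(events_per_second=4_500, avg_event_bytes=200)

# Fewer, larger events: the bytes-per-second limit is the binding one.
large_events = throughput_units_needed(events_per_second=500, avg_event_bytes=8_000)
```

With these inputs the first workload needs 5 units and the second needs 4, which shows that either limit can be the one that determines the allocation.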
In Azure Event Hubs, a partition key maps incoming event data to specific partitions for the purpose of organizing the data. The partition key is a sender-supplied value that is passed into an event hub. It is then processed through a static hashing function, which creates the partition assignment.
Specifying a partition key keeps related events together in the same partition, and in the same exact order in which they arrived. The partition key is a string that's derived from the application context. It identifies the interrelationship of the events. A sequence of events identified by a partition key is called a stream. A partition is a multiplexed log store for many such streams.
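The mapping from partition key to partition can be sketched in a few lines of Python. The hashing function that Event Hubs uses internally is not described in this lesson, so SHA-256 is used here purely as a stand-in. The function name assign_partition, the device names, and the partition count of 4 are all assumptions made for the illustration.

```python
import hashlib
from collections import defaultdict


def assign_partition(partition_key, partition_count):
    """Map a key to a partition index with a static hash (SHA-256 stand-in)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


PARTITION_COUNT = 4
partitions = defaultdict(list)  # partition index -> events in arrival order

events = [
    ("device-a", "boot"),
    ("device-b", "boot"),
    ("device-a", "reading-1"),
    ("device-b", "reading-1"),
    ("device-a", "reading-2"),
]
for key, body in events:
    partitions[assign_partition(key, PARTITION_COUNT)].append((key, body))

# The stream for one key lives in exactly one partition, in arrival order,
# possibly interleaved with the streams of other keys.
device_a_partition = assign_partition("device-a", PARTITION_COUNT)
device_a_stream = [
    body for key, body in partitions[device_a_partition] if key == "device-a"
]
```

Because the hash is static, the same key always lands in the same partition, so the events for device-a stay in order even though the partition may also hold streams for other keys.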
Using partition keys to group related events together can be beneficial for two main reasons. First, it ensures that related events are stored together and delivered in order of arrival, which can be important for applications that require processing events in a specific order. Second, a consumer that needs one stream of related events can read it from a single partition instead of merging events from several partitions. Keep in mind that a partition key does not balance load by itself: if a few key values dominate the traffic, the partitions they map to receive a disproportionate share of the events.
In general, use a partition key in Azure Event Hubs when related events must be kept together and processed in order. When ordering does not matter, events can be sent without a key, and Event Hubs distributes them across the available partitions.
Event Hubs Capture allows you to automatically capture the streaming data in Event Hubs and save it to either an Azure Blob Storage account or an Azure Data Lake Storage account. Capture can be enabled from the Azure portal, where you specify a minimum size and a time window that control when a capture is performed. Captured data is written in the Apache Avro format, so each file produced by Event Hubs Capture carries an Avro schema.
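The size and time window can be pictured as two triggers, where whichever one fires first closes the current capture file. The sketch below is a simplified simulation of that policy, not the service's actual implementation. The function name capture_files is hypothetical, the windows are measured from the first event in each file rather than by wall clock, and empty windows are ignored.

```python
def capture_files(events, time_window_seconds, size_window_bytes):
    """Group (arrival_seconds, size_bytes) events into simulated capture files.

    A file is closed when either the time window or the size window is
    reached, whichever happens first (simplified model).
    """
    files = []
    current = []
    current_bytes = 0
    window_start = None

    for arrival, size in events:
        if window_start is None:
            window_start = arrival

        # Time trigger: the window elapsed before this event arrived.
        if current and arrival - window_start >= time_window_seconds:
            files.append(current)
            current, current_bytes, window_start = [], 0, arrival

        current.append((arrival, size))
        current_bytes += size

        # Size trigger: enough data has accumulated in the current file.
        if current_bytes >= size_window_bytes:
            files.append(current)
            current, current_bytes, window_start = [], 0, None

    if current:
        files.append(current)  # data still waiting for a trigger
    return files


sample = [(0, 40), (10, 40), (20, 40), (100, 10), (130, 10), (400, 10)]
files = capture_files(sample, time_window_seconds=300, size_window_bytes=100)
file_sizes = [len(f) for f in files]
```

In this run the first file closes on the size trigger, the second closes on the time trigger, and the last event is still waiting in an open window.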
So, to recap:
Event Hubs is a big data streaming platform and event ingestion service that provides low latency and seamless integration with data and analytics services inside and outside Azure to build a complete big data pipeline.
Event Hubs ensures reliable and ordered delivery of events by using partitions and partition keys to organize sequences of events, assigning unique offsets within each partition, and allowing consumers to specify offsets and use checkpointing to track their progress in the stream.
Additionally, Event Hubs Capture enables automatic capture of streaming data to Blob storage or Azure Data Lake Storage.
Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.
In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.
In his spare time, Tom enjoys camping, fishing, and playing poker.