How to Use Pub/Sub
Organizing and responding to events in a decentralized system (such as one built on microservices or IoT devices) can be a challenge. In this course, you will learn how to use Google Cloud Pub/Sub to create a reliable, asynchronous messaging service at any scale.
Learning Objectives
- Understand what Cloud Pub/Sub is
- How to send and receive messages
- What the typical use cases are
- How to get started
Intended Audience
- GCP Developers
- GCP Data Engineers
- Anyone preparing for a Google Cloud certification (such as the Professional Data Engineer exam)
Prerequisites
- Access to a Google Cloud Platform account is recommended
In this section, I'm going to describe how Cloud Pub/Sub actually works. As the name implies, Pub/Sub supports the publisher/subscriber model. This means the main ways to use it are: number one, publishing messages to a topic; number two, subscribing to a topic; and number three, receiving messages from a topic. I'm going to go through each of these terms in detail. A message simply refers to the data that is going to be exchanged between services. The message data is stored as a text string or a byte string, and it can be formatted however you want. It can be as simple as a name or a number, or it can be more complicated, like a JSON object. Typically, messages represent some sort of change event. They're commonly used to convey things like "a field has changed in a database" or "this task has been completed."
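As a rough illustration of the point about message formats, here is a minimal sketch of turning a change event into a byte string and back. The event fields (`type`, `task_id`) and the helper names are made up for this example; Pub/Sub itself does not prescribe any particular format.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event as UTF-8 JSON bytes, usable as a message payload."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def decode_event(data: bytes) -> dict:
    """Reverse of encode_event: parse the JSON payload back into a dict."""
    return json.loads(data.decode("utf-8"))


# Hypothetical change event: "this task has been completed".
event = {"type": "task_completed", "task_id": 42}
payload = encode_event(event)
restored = decode_event(payload)
```

The receiving service only needs to agree with the sender on the format; a plain name or number encoded as bytes would work just as well.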
Next, let's define what a topic is. A topic is basically a queue that related messages can be added to and read from. You typically have multiple topics, and you can think of each topic as representing a different message category. Messages are assigned to topics, and you pick which topics you want to receive messages from. In this way, you can control which messages you will receive. So you can create as many different topics as you need.
Generally, you want them specific enough that you will only get the messages you're interested in, but you don't want to have too many topics either. You need to consider the types of messages that will be sent as well as the needs of the applications that will be reading them. A good example would be a new user topic. Whenever a new user account is created, a message would be added to the topic containing the user ID. Any service that needed to do something when a user is added could be notified. So you might have a system that sends out a confirmation email. Another system might add the user to a mailing list. All these systems can use the new user topic to be notified when a new user was added and to know which user it was.
Now, this gives you a lot of flexibility. Without Pub/Sub, you would probably need to update your registration code every time you added or removed a service. But now, you just need your new service to read messages from the new user topic. Plus, removing a service is even simpler. Just delete the service and that's it. Topics allow you to make significant changes to one system without disrupting any others. An empty topic isn't very useful. In order to use it, you need to add some messages. This is called publishing.
A publisher is a service that adds new messages to a topic. Topics can have a single publisher, or they can have multiple. Any application that can make HTTPS requests to googleapis.com can be a publisher. Before you can start reading the messages from a topic, you have to be made aware that they exist. This is called subscribing.
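To make the publishing side a little more concrete, the sketch below builds (but does not send) the kind of HTTPS request a publisher would make. It assumes the Pub/Sub REST `publish` method, where message data is base64-encoded inside a JSON body. The project and topic names are placeholders, and a real request would also need an OAuth access token in an `Authorization` header, which is left out here.

```python
import base64
import json
from typing import List, Tuple


def build_publish_request(project: str, topic: str, payloads: List[bytes]) -> Tuple[str, str]:
    """Return the (url, json_body) for publishing the payloads to a topic."""
    url = (
        "https://pubsub.googleapis.com/v1/"
        f"projects/{project}/topics/{topic}:publish"
    )
    # Each message carries its data as a base64 string.
    messages = [{"data": base64.b64encode(p).decode("ascii")} for p in payloads]
    return url, json.dumps({"messages": messages})


# Placeholder names: "my-project" and "new-user" are assumptions for this sketch.
url, body = build_publish_request("my-project", "new-user", [b"user-123"])
```

In practice most applications would use one of Google's client libraries rather than hand-building requests, but the shape of the request is the same idea.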
A subscriber is a service that wishes to receive messages from one or more topics. In order to accomplish this, each subscriber needs a subscription. A subscriber can subscribe to multiple topics, but each topic needs to have a separate subscription. You cannot use the same subscription for multiple topics. If you want to subscribe to three topics, you need three subscriptions. So a subscription can only have one topic, but it can have multiple subscribers. This is important to note. Each subscription will only receive one copy of a message. That means if one subscription has five subscribers, then only one of those subscribers will get the message. If instead you wanted all five subscribers to each get their own copy of every message, then they should not share the same subscription. Instead, each subscriber should have its own separate subscription.
It's very important you understand exactly how this works. If you have three subscribers that each do something different, then you should create a separate subscription for each. Otherwise, each message will only be delivered to one of the subscribers at random, and only one of your three tasks will get done for any given message. What if you have 10 subscribers and all 10 of those do the same task? Well, in this case, you'd want to have all 10 subscribers share a single subscription. That way, only one of the subscribers works on each message. If they each have their own subscription, then each task would be repeated 10 times. Make sure you understand which behavior is desired when you create a subscription.
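The difference between separate and shared subscriptions can be sketched with a small in-memory model. This is a toy simulation, not the Pub/Sub API: the `Topic`, `Subscription`, and `Subscriber` classes are invented for illustration, and the shared subscription hands out messages round-robin only so the result is predictable (the real service makes no such promise about which subscriber gets which message).

```python
import itertools


class Subscriber:
    def __init__(self, name):
        self.name = name
        self.received = []


class Subscription:
    """Gets one copy of each message and gives it to exactly one subscriber."""

    def __init__(self, subscribers):
        self._turns = itertools.cycle(list(subscribers))

    def deliver(self, message):
        next(self._turns).received.append(message)


class Topic:
    def __init__(self):
        self.subscriptions = []

    def publish(self, message):
        # Every subscription on the topic receives its own copy.
        for subscription in self.subscriptions:
            subscription.deliver(message)


new_user = Topic()
emailer = Subscriber("confirmation-email")
mailing_list = Subscriber("mailing-list")
worker_a = Subscriber("worker-a")
worker_b = Subscriber("worker-b")

new_user.subscriptions = [
    Subscription([emailer]),             # different task: own subscription
    Subscription([mailing_list]),        # different task: own subscription
    Subscription([worker_a, worker_b]),  # same task: shared subscription
]

for user_id in ["user-1", "user-2", "user-3", "user-4"]:
    new_user.publish(user_id)
```

Here the emailer and the mailing-list service each see all four user IDs, while the two workers split the four messages between them.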
Also, you should be aware that you will only receive messages that were published after a subscription was created. Creating a subscription does not give you access to every single message ever published to a topic. If you create your subscription today, you won't get any messages from yesterday. It also means any message published to a topic with zero subscriptions is automatically discarded.
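The timing rule above can be pictured with another toy model. Again, this is only a simulation with invented names (`Topic`, `create_subscription`, `backlogs`), assuming the default behavior described in this lesson: a subscription only accumulates messages published after it exists, and a message published while there are no subscriptions at all is dropped.

```python
class Topic:
    def __init__(self):
        self.backlogs = {}   # subscription name -> messages waiting for it
        self.discarded = 0   # messages published with nobody subscribed

    def create_subscription(self, name):
        # A new subscription starts with an empty backlog.
        self.backlogs[name] = []

    def publish(self, message):
        if not self.backlogs:
            self.discarded += 1
            return
        for backlog in self.backlogs.values():
            backlog.append(message)


topic = Topic()
topic.publish("published yesterday")   # no subscriptions yet: discarded
topic.create_subscription("reports")
topic.publish("published today")       # kept for the "reports" subscription
```

So if a new service needs historical data, it has to get it some other way; the subscription alone won't replay it.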
Now, you can receive messages from your subscribed topics in one of two ways. The subscriber can manually pull the latest messages through a direct request, or the subscriber can request that any new messages be pushed to a provided endpoint. When you create a new subscription, you have to specify the type. With a pull subscription, you control how and when you retrieve new messages. A good example would be a batch job that wakes up, gets the latest updates, and then does a bunch of processing. After it's done processing, it goes to sleep for a while. It then repeats this process over and over again.
A push subscription means that the service doesn't have to constantly ask for updates. Instead, the topic will notify you about every new message as it comes in. It does this by sending a notification to a provided endpoint. So, when you create a push subscription, you also have to specify an endpoint that can accept notifications. With a push subscription, you're processing new messages as they are published.
A good example here would be an auto-scaling serverless function that spawns a new instance for every new message that comes in. A subscriber can be any application that can make HTTPS requests to googleapis.com. A push subscriber also needs to provide a valid endpoint that can accept HTTPS POST requests. There's one additional term you should also be familiar with. After receiving a list of new messages, the subscriber needs to acknowledge, or ACK, each message. This is what ensures that the same messages will not be delivered again.
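For a push subscriber, the endpoint logic might look roughly like the sketch below. It assumes the usual push request format, a JSON envelope whose `message.data` field is base64-encoded, and it assumes that returning a success status (such as 204) counts as the acknowledgement while any other status leads to a retry. The `handle_push` function and its `process` callback are hypothetical; wiring this into an actual web framework or serverless function is left out.

```python
import base64
import json


def handle_push(body: bytes, process) -> int:
    """Handle one push request body and return the HTTP status to respond with."""
    try:
        envelope = json.loads(body)
        message = envelope["message"]
        data = base64.b64decode(message.get("data", ""))
        attributes = message.get("attributes", {})
    except (ValueError, KeyError, TypeError, AttributeError):
        return 400  # not a recognizable push envelope

    try:
        process(data, attributes)
    except Exception:
        return 500  # processing failed: no ACK, so expect the message again

    return 204  # success status acts as the ACK


# Simulated request body for a "new user" message (placeholder values).
seen = []
request_body = json.dumps({
    "message": {
        "data": base64.b64encode(b"user-123").decode("ascii"),
        "messageId": "1",
    },
    "subscription": "projects/my-project/subscriptions/send-confirmation-email",
}).encode("utf-8")

status = handle_push(request_body, lambda data, attrs: seen.append(data))
```

The key design point is that the handler only reports success after the work is done, so a crash or error partway through means the message comes back.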
For pull subscriptions, all unacknowledged messages will be delivered on a pull request. It doesn't matter if they were sent in the last pull or not. If a message wasn't acknowledged, then it is assumed that it needs to be resent. For push subscriptions, it works in a similar fashion. The topic will keep trying to resend unacknowledged messages over and over again, until they're finally acknowledged. There is a certain waiting period between each attempt, and eventually, if the subscriber does not respond, the notification will time out. But essentially, the topic will keep notifying you over and over again, until you confirm receipt. This is called at-least-once delivery.
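The pull-and-acknowledge cycle described above can be sketched with a simplified in-memory subscription. The `PullSubscription` class is invented for this example, and it deliberately ignores details of the real service such as acknowledgement deadlines: here, anything not yet acknowledged simply shows up again on the next pull.

```python
class PullSubscription:
    def __init__(self):
        self._unacked = {}  # ack_id -> message
        self._counter = 0

    def enqueue(self, message):
        """Stand-in for a message arriving from the topic."""
        self._counter += 1
        self._unacked[f"ack-{self._counter}"] = message

    def pull(self):
        """Return every message that has not been acknowledged yet."""
        return list(self._unacked.items())

    def acknowledge(self, ack_id):
        self._unacked.pop(ack_id, None)


sub = PullSubscription()
sub.enqueue("task A")
sub.enqueue("task B")

first = sub.pull()                 # both messages are delivered
sub.acknowledge(first[0][0])       # the subscriber only ACKs "task A"

second = sub.pull()                # "task B" is delivered a second time
for ack_id, _message in second:
    sub.acknowledge(ack_id)

third = sub.pull()                 # nothing left to deliver
```

Notice that "task B" was seen twice by the subscriber, which is exactly why a subscriber should be prepared for the occasional repeat.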
All Pub/Sub messages will be delivered at least once, but possibly more. Duplicates usually happen when a subscriber fails to properly acknowledge a message in a timely fashion. Keep in mind that you will occasionally get multiple copies of some messages. Now there are different ways of dealing with duplicates, but that's a more advanced topic. And so I'm not gonna cover that here.
This acknowledgement feature allows your system to be more resilient and to handle failures gracefully. If a message is delivered, but the subscriber crashes before acknowledgement, you won't lose anything. Once the subscriber has recovered, the topic will redeliver any unacknowledged messages.
Daniel began his career as a Software Engineer, focusing mostly on web and mobile development. After twenty years of dealing with insufficient training and fragmented documentation, he decided to use his extensive experience to help the next generation of engineers.
Daniel has spent his most recent years designing and running technical classes for both Amazon and Microsoft. Today at Cloud Academy, he is working on building out an extensive Google Cloud training library.
When he isn’t working or tinkering in his home lab, Daniel enjoys BBQing, target shooting, and watching classic movies.