The Google Cloud Operations suite (formerly Stackdriver) includes a wide variety of tools to help you monitor and debug your GCP-hosted applications. This course will give you hands-on demonstrations of how to use the Monitoring, Logging, Error Reporting, Trace, and Profiler components of the Cloud Operations suite. You can follow along with your own GCP account to try these examples yourself.
If you have any feedback relating to this course, feel free to reach out to us at firstname.lastname@example.org.
Learning Objectives
- Use the Cloud Operations suite to monitor, log, report on errors, trace, and profile
Intended Audience
- System administrators
- People who are preparing to take the Google Associate Cloud Engineer certification exam
Prerequisites
- Overview of Google Cloud Platform course or experience with Google Cloud Platform
Resources
- The GitHub repository for this course is at https://github.com/cloudacademy/google-cloud-ops.
Looking at real-time monitoring is great, but there will be many times when you’ll want to look at what happened in the past. In other words, you need logs.
For example, suppose I wanted to see when a VM instance was shut down. Compute Engine, like almost every other Google Cloud Platform service, writes to the Cloud Audit Logs. These logs keep track of who did what, where, and when.
There are four types of audit logs. Admin Activity logs track any actions that modify a resource. This includes everything from shutting down VMs to modifying permissions.
System Event logs track Google Cloud’s actions on resources. Some examples are maintenance of an underlying virtual machine host and reclaiming a spot VM.
Data Access logs are pretty self-explanatory. They track data requests. Note that this also includes read requests on configurations and metadata. Since these logs can grow very quickly, they’re disabled by default. One exception is BigQuery Data Access logs, which are enabled by default and can’t be disabled. Fortunately, you won’t get charged for them.
Policy Denied logs have an entry for each time a user or service account was denied access to a resource because of a security policy violation.
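Each of these four audit log types is written to its own log, which helps when you want to filter by log name. Here is a minimal Python sketch that builds those log names for a project. The project ID “my-project” and the function name are placeholders I made up for illustration.

```python
from urllib.parse import quote

# Log IDs that Cloud Audit Logs uses for each of the four audit log types.
AUDIT_LOG_IDS = {
    "admin_activity": "cloudaudit.googleapis.com/activity",
    "data_access": "cloudaudit.googleapis.com/data_access",
    "system_event": "cloudaudit.googleapis.com/system_event",
    "policy_denied": "cloudaudit.googleapis.com/policy",
}


def audit_log_name(project_id: str, log_type: str) -> str:
    """Return the full log name, with the log ID URL-encoded ("/" becomes "%2F")."""
    log_id = AUDIT_LOG_IDS[log_type]
    return f"projects/{project_id}/logs/{quote(log_id, safe='')}"


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    for log_type in AUDIT_LOG_IDS:
        print(audit_log_name("my-project", log_type))
```

The URL-encoded form is the one you would typically paste into a logName filter.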
Okay, now let’s see if we can find out when a particular VM instance was shut down. In the console, search for “Logging”.
There are lots of options for filtering what you see here. You can look at the logs for particular resources, such as your firewall rules, projects, or uptime checks.
In this case, we need to look at the VM instance logs. You can choose all instances or a specific one. I’d like to look at the logs for a VM called instance-1.
Here you can choose which logs you want from the instance. The one we want is the activity log in the Cloud Audit Logs section because it’ll include entries for when the VM was stopped. Since I installed the Ops Agent on this instance, it also sends its operating system logs (that is, its syslog entries) to Cloud Logging. Syslog includes entries for when the operating system was shut down, so we could look there as well, but let’s just stick with the activity log for now.
You can also filter by severity level. For example, you could look only at entries with a severity of Critical or higher. I’ll leave it unset, so it’ll show all severity levels.
Finally, you can change how far back it will look for log entries. I’ll set it to the last day.
Okay, now I’ll search for any entries that contain the word “stop” so I can see if this instance was shut down in the last day. These entries show when the instance was stopped.
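The choices we just made in the console (resource, log, severity, time range, and search term) can also be written as a single filter in the Logging query language. Here is a rough Python sketch that assembles one. Note that it matches “stop” against the audit entry’s method name, which is narrower than the free-text search I did in the demo. The function name, the project ID, and the instance ID are all placeholders.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_stop_filter(project_id: str, instance_id: str, now: datetime,
                      hours_back: int = 24,
                      min_severity: Optional[str] = None) -> str:
    """Build a Logging filter for "stop" entries in the Admin Activity log."""
    # Timestamps in the filter are written in UTC (RFC 3339 format).
    since = (now.astimezone(timezone.utc) - timedelta(hours=hours_back))
    clauses = [
        'resource.type="gce_instance"',
        f'resource.labels.instance_id="{instance_id}"',
        f'logName="projects/{project_id}/logs/cloudaudit.googleapis.com%2Factivity"',
        'protoPayload.methodName:"stop"',  # ":" is the "has" (substring) operator
        f'timestamp>="{since.strftime("%Y-%m-%dT%H:%M:%SZ")}"',
    ]
    if min_severity is not None:
        # Leaving this out shows all severity levels, as in the demo.
        clauses.append(f"severity>={min_severity}")
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Placeholder project and instance IDs.
    print(build_stop_filter("my-project", "1234567890",
                            datetime.now(timezone.utc)))
```

Passing min_severity="CRITICAL" would add the “Critical or higher” restriction I mentioned earlier.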
If you need to do really serious log analysis, then you can export the logs to BigQuery, which is Google’s data warehouse and analytics service. Before you can do that, you need to have the right permissions to export the logs. If you are the project owner then, of course, you have permission. If you’re not, then you’ll have to ask a project owner to give you the Logs Configuration Writer role.
Before I export the logs, I’m going to clear the search field and also check syslog so there’ll be more entries to export.
Okay, first, go into the “More actions” menu, and select “Create Sink”. A sink is a place where you want to send your data. Give your sink a name, such as “example-sink”. Click “Next”. For the sink service, there are quite a few options, such as BigQuery, Cloud Storage, or Cloud Pub/Sub. We’ll choose BigQuery.
Now you need to choose a BigQuery dataset to receive the logs. If you don’t have one already, then click “Create new BigQuery dataset”. Give it a name, such as “example_dataset”. Note that I used an underscore instead of a dash because dashes are not allowed in BigQuery dataset names. Click “Create Dataset”. Now click the “Create Sink” button.
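If you ever script this step instead of clicking through the console, it helps to check the dataset name before you try to create the sink. Here is a small Python sketch that does that and then builds the destination string a BigQuery sink points to. I’m assuming only ASCII letters, digits, and underscores are allowed, and the function names and project ID are made up for illustration.

```python
import re

# Assumption: ASCII letters, digits, and underscores only, up to 1,024 characters.
_DATASET_ID = re.compile(r"[A-Za-z0-9_]{1,1024}")


def is_valid_dataset_id(name: str) -> bool:
    """Return True if the name looks like a legal BigQuery dataset ID."""
    return _DATASET_ID.fullmatch(name) is not None


def bigquery_sink_destination(project_id: str, dataset_id: str) -> str:
    """Return the sink destination string for a BigQuery dataset."""
    if not is_valid_dataset_id(dataset_id):
        raise ValueError(f"invalid BigQuery dataset ID: {dataset_id!r}")
    return f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset_id}"


if __name__ == "__main__":
    print(is_valid_dataset_id("example_dataset"))  # underscore is fine
    print(is_valid_dataset_id("example-dataset"))  # dash is rejected
    print(bigquery_sink_destination("my-project", "example_dataset"))
```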
It says the sink was created, so let’s jump over to BigQuery and see what’s there. Hmmm. It created our example dataset, but it doesn’t contain any tables, which means it doesn’t have any data. That’s weird, right? Well, it’s because when you set up a sink, it only starts exporting log entries that were made after the sink was created.
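To make that behavior concrete, here is a toy Python model of it. This is only an illustration of the rule, not how Cloud Logging is actually implemented, and the LogEntry class and the timestamps are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class LogEntry:
    timestamp: datetime
    message: str


def entries_exported(entries: List[LogEntry],
                     sink_created: datetime) -> List[LogEntry]:
    """Keep only the entries that were written after the sink was created."""
    return [e for e in entries if e.timestamp > sink_created]


if __name__ == "__main__":
    sink_created = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    entries = [
        LogEntry(datetime(2024, 1, 2, 11, 0, tzinfo=timezone.utc), "stop (before sink)"),
        LogEntry(datetime(2024, 1, 2, 13, 0, tzinfo=timezone.utc), "restart (after sink)"),
    ]
    for entry in entries_exported(entries, sink_created):
        print(entry.message)
```

Only the second entry makes it through, which is why the dataset is empty until something new gets logged.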
OK, then let’s generate some more log entries and see if they get exported. I’ll restart the VM, which will generate lots of log entries. Okay, I’ve restarted it. Now if we go back to the Logging page, do we see the new messages? Yes, we do.
Now let’s go back to BigQuery and see if the data’s there. You might have to refresh the browser. Yes, there are two tables there now: one for the Cloud Audit activity log and one for syslog. Click on the syslog table.
Let’s search for “shutdown” this time instead of “stop”. Click “Query”, then “In new tab”. To do a search in BigQuery, you need to use SQL statements. Thankfully, it already gave me the skeleton of a SQL statement. I just need to fill in what I’m selecting. I’ll put in an asterisk to select everything, but I’ll restrict it by using a WHERE clause with the column name “jsonPayload.message” (which is the column that contains the text in the log entry), followed by “LIKE '%shutdown%'”. The percent signs are wildcards, so this SQL statement says to find any log entries that have the word “shutdown” in them somewhere.
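Here is roughly what the finished statement looks like, assembled by a short Python helper so you can swap in a different keyword. The table name below is a placeholder (yours will depend on your project, dataset, and the table BigQuery created for syslog), and the helper only accepts plain letters and digits to keep the sketch simple.

```python
def build_syslog_query(table: str, keyword: str) -> str:
    """Return a SQL statement that finds log entries containing the keyword."""
    # This sketch pastes the keyword into the SQL text, so it only accepts
    # letters and digits. Real code should use query parameters instead.
    if not keyword.isalnum():
        raise ValueError("keyword must contain only letters and digits")
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        f"WHERE jsonPayload.message LIKE '%{keyword}%'"
    )


if __name__ == "__main__":
    # Placeholder table name; substitute the syslog table from your own dataset.
    print(build_syslog_query("my-project.example_dataset.syslog_20240102",
                             "shutdown"))
```

One thing to keep in mind is that LIKE is case-sensitive in BigQuery, so this pattern would not match an entry that only says “Shutdown” with a capital S.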
Now we click the “Run” button...and it returns the matching log entries. If we scroll to the right, then we can see the jsonPayload.message field, and it does indeed contain the word “shutdown” in each of the entries.
Of course, we did a very similar search on the Logging page, and it was way easier, so why would we want to go through all of this hassle of exporting to BigQuery and writing SQL statements? Well, because sometimes you may need to search through a huge number of log entries and need to do complicated queries. BigQuery is lightning fast when searching through big data, and if you build a complex infrastructure in Google Cloud Platform, then the volume of log data it will generate will easily qualify as big data.
And that’s it for logging.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).