The Google Cloud Operations suite (formerly Stackdriver) includes a wide variety of tools to help you monitor and debug your GCP-hosted applications. This course will give you hands-on demonstrations of how to use the Monitoring, Logging, Error Reporting, Trace, and Profiler components of the Cloud Operations suite. You can follow along with your own GCP account to try these examples yourself.
If you have any feedback relating to this course, feel free to reach out to us at support@cloudacademy.com.
Learning Objectives
- Use the Cloud Operations suite to monitor, log, report on errors, trace, and profile
Intended Audience
- System administrators
- People who are preparing to take the Google Associate Cloud Engineer certification exam
Prerequisites
- Overview of Google Cloud Platform course or experience with Google Cloud Platform
Resources
- The GitHub repository for this course is at https://github.com/cloudacademy/google-cloud-ops.
What can you do if your application is working properly but performing too slowly? You can use Cloud Trace and Cloud Profiler. Cloud Trace shows you the latency of each application request. That is, it tells you how long each request takes.
The Trace List is probably where you will spend most of your time. It shows you all of the traces over a specific period of time in this cool graph. It is set to “1 hour” right now, but we can change that to give a longer view. Each one of these dots is a trace of an individual request to the application. If you click on one of the dots, it brings up two more panes underneath. The Waterfall View shows what happened during the request. The first bar shows the total end-to-end time, which was 215 milliseconds in this case. The bars underneath show the time it took to complete calls performed when handling the request. In this case, we have one bar for an HTTP GET request.
Of course, this timeline would be a lot more useful if we were running a more complex application with multiple calls so you could see which ones were taking the most time. Each of those calls would have a bar on this chart. The Hello World application is about the simplest application possible, so you’ll just have to use your imagination here.
Analysis reports show you the latency distribution for your application and also attempt to identify performance bottlenecks, which is a great feature. You need at least 100 traces before you can run a report, though.
If you’re running your applications in App Engine, traces are captured and submitted automatically. If you want to trace code that’s running outside of App Engine, you’ll have to add instrumentation code to your applications using an open-source library that supports the Cloud Trace API, such as OpenTelemetry or OpenCensus.
Cloud Trace shows you which requests take the longest to run. Once you’ve determined which requests might need to be optimized, you can use Cloud Profiler to see which parts of the code for those requests are using the most CPU and memory.
To use Cloud Profiler, you have to add instrumentation code to your application even if it’s running in App Engine. Google has provided a sample application called shakesapp that includes this instrumentation. It’s written in the Go language. Here’s what it looks like in Cloud Profiler. This is called a flame graph, and it can be a bit confusing until you know how it works.
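The Cloud Profiler agent itself is a separate package and is not shown here. As a local stand-in, this Go sketch collects a CPU profile with the standard library's runtime/pprof package, which gives you a feel for the kind of data that ends up in a flame graph. The busyWork and profileCPU functions are made up for illustration and have nothing to do with shakesapp.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// busyWork is a made-up CPU-bound workload to give the profiler something to sample.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += (i * i) % 7
	}
	return sum
}

// profileCPU runs work while the CPU profiler is active and returns
// the raw profile bytes in pprof format.
func profileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the collected profile into buf
	return buf.Bytes(), nil
}

func main() {
	result := 0
	data, err := profileCPU(func() { result = busyWork(50_000_000) })
	if err != nil {
		fmt.Println("could not start CPU profile:", err)
		return
	}
	fmt.Printf("workload result: %d, profile size: %d bytes\n", result, len(data))
}
```

If you write those bytes to a file, you can open it with go tool pprof to see which functions used the most CPU time.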
Since CPU time is selected, the bars represent the CPU time taken by each function. I ran the application seven times, so these results show the average of those seven runs. The first bar is for the entire application, which took about 13 seconds of CPU time, on average.
The bars underneath are color-coded according to the package they’re in. Most of these functions are part of the standard libraries for the Go language. The ones that are part of the actual application, shakesapp, are dark green in this graph. The first one just calls the second one, so the second bar is the one that matters. It calls a Go language function called MatchString. This single function takes up 58% of the CPU time for this application, so we might want to see if there’s a more efficient way to perform this operation.
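Here is one possible optimization, sketched in Go. It assumes the hot call is the package-level regexp.MatchString, which compiles the pattern every time it is called. The demo does not show the shakesapp source, so treat that as an assumption; the function names and sample lines below are also made up. Compiling the pattern once and reusing it moves that cost out of the loop.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatchesPerCall uses regexp.MatchString, which compiles the
// pattern again for every line it checks.
func countMatchesPerCall(pattern string, lines []string) (int, error) {
	count := 0
	for _, line := range lines {
		ok, err := regexp.MatchString(pattern, line)
		if err != nil {
			return 0, err
		}
		if ok {
			count++
		}
	}
	return count, nil
}

// countMatchesPrecompiled compiles the pattern once and reuses it.
func countMatchesPrecompiled(pattern string, lines []string) (int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return 0, err
	}
	count := 0
	for _, line := range lines {
		if re.MatchString(line) {
			count++
		}
	}
	return count, nil
}

func main() {
	lines := []string{
		"To be, or not to be, that is the question",
		"Love looks not with the eyes, but with the mind",
		"The course of true love never did run smooth",
	}
	slow, err := countMatchesPerCall(`(?i)love`, lines)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fast, err := countMatchesPrecompiled(`(?i)love`, lines)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("per-call count:", slow, "precompiled count:", fast)
}
```

Both functions return the same count, so the change should be safe to make. You would then profile the application again to confirm that the bar for MatchString actually shrinks.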
And that’s it for tracing and profiling.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).