Tracing and Profiling
Start course
1h 13m

Once you have implemented your application infrastructure on Google Cloud Platform, you will need to maintain it. Although you can set up Google Cloud to automate many operations tasks, you will still need to monitor, test, manage, and troubleshoot it over time to make sure your systems are running properly.

This course will walk you through these maintenance tasks and give you hands-on demonstrations of how to perform them. You can follow along with your own GCP account to try these examples yourself.

Learning Objectives

  • Use the Cloud Operations suite to monitor, log, report on errors, trace, and debug
  • Ensure your infrastructure can handle higher loads, failures, and cyber-attacks by performing load, resilience, and penetration tests
  • Manage your data using lifecycle management and migration from outside sources
  • Troubleshoot SSH errors, instance startup failures, and network traffic dropping

Intended Audience

  • System administrators
  • People who are preparing to take the Google Professional Cloud Architect certification exam






In the last lesson, we looked at how to debug errors in your application, but what do you do if your application is working properly but performing too slowly? That’s what Cloud Trace and Cloud Profiler are used for.  Cloud Trace shows you the latency of each application request. That is, it tells you how long each request takes.

The Trace List is probably where you will spend most of your time. It shows you all of the traces over a specific period of time in this cool graph. It is set to “1 hour” right now, but we can change that to give a longer view. Each one of these dots is a trace of an individual request to the application. If you click on one of the dots, it brings up two more panes underneath. The Waterfall View shows what happened during the request. The first bar shows the total end-to-end time, which was 215 milliseconds in this case. The bars underneath show the time it took to complete calls performed when handling the request. In this case, we have one bar for an HTTP GET request.

Of course, this timeline would be a lot more useful if we were running a more complex application with multiple calls so you could see which ones were taking the most time. Each of those calls would have a bar on this chart. The Hello World application is about the simplest application possible, so you’ll just have to use your imagination here.

Analysis reports show you the latency distribution for your application and also attempt to identify performance bottlenecks, which is a great feature. You have to have at least 100 traces before you can run a report, though.

If you’re running your applications in App Engine, then it’ll automatically capture and submit traces, but if you want to trace code that’s running outside of App Engine, then you’ll have to either add instrumentation code to your applications using the Trace SDK or submit traces through the API.

Cloud Trace shows you which requests take the longest to run. Once you’ve determined which requests might need to be optimized, you can use Cloud Profiler to see which parts of the code for those requests are using the most CPU and memory.

To use Cloud Profiler, you have to add instrumentation code to your application even if it’s running in App Engine. Google has provided a sample application called shakesapp that includes this instrumentation. It’s written in the Go language. Here’s what it looks like in Cloud Profiler. This is called a flame graph, and it can be a bit confusing until you know how it works.

Since CPU time is selected, the bars represent the CPU time taken by each function. I ran the application seven times, so these results show the average of those seven runs. The first bar is for the entire application, which took about 13 seconds of CPU time, on average.

The bars underneath are color-coded according to the package they’re in. Most of these functions are part of the standard libraries for the Go language. The ones that are part of the actual application, shakesapp, are dark green in this graph. The first one just calls the second one, so the second bar is the one that matters. It calls a Go language function called MatchString. This single function takes up 58% of the CPU time for this application, so we might want to see if there’s a more efficient way to perform this operation.

Okay, that’s it for the Cloud Operations suite. Before we go, you might want to delete your application, so it doesn’t incur any more charges. Go to App Engine and then go to Settings. Click “Disable application”. It will ask you to type the app’s ID before you can click “DISABLE”. This doesn’t delete the application, but it does stop it from serving requests. To start the application up again, you can just click “Enable application”.

If you want to permanently delete the application, then you’ll have to delete the project it’s associated with, which you can do in the “IAM & Admin” page. Be aware that if you delete a project, you will never be able to use that project ID again. That is, you won’t be able to create a new project with the same ID.

That’s it for this lesson.

About the Author
Learning Paths

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).