Course length: 1h 13m

Once you have implemented your application infrastructure on Google Cloud Platform, you will need to maintain it. Although you can set up Google Cloud to automate many operations tasks, you will still need to monitor, test, manage, and troubleshoot it over time to make sure your systems are running properly.

This course will walk you through these maintenance tasks and give you hands-on demonstrations of how to perform them. You can follow along with your own GCP account to try these examples yourself.

Learning Objectives

  • Use the Cloud Operations suite to monitor, log, report on errors, trace, and debug
  • Ensure your infrastructure can handle higher loads, failures, and cyber-attacks by performing load, resilience, and penetration tests
  • Manage your data using lifecycle management and migration from outside sources
  • Troubleshoot SSH errors, instance startup failures, and dropped network traffic

Intended Audience

  • System administrators
  • People who are preparing to take the Google Professional Cloud Architect certification exam

So far, we've been talking about monitoring and debugging your applications in production, but you'll also need to test your application and infrastructure to see how it will perform under different conditions.

There are at least three types of tests you should run: load tests, where you stress your application with a heavy load; resilience tests, where you see what happens when various infrastructure components fail; and vulnerability tests, where you see whether your application can withstand attacks from hackers.

Ideally, you should run load tests before you put your application into production. Your tests should simulate real-world traffic as closely as possible. You should test at the maximum load you expect to encounter, which admittedly can be difficult to predict for some applications, but hopefully you'll have a reasonably good idea of how much traffic you're likely to get. You should also measure how your Google Cloud costs increase as the number of users grows.
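
To make the load-testing advice more concrete, here is a minimal Python sketch of a load-test driver. It assumes you supply a hypothetical `send_request` function that makes one request to a test deployment of your application; the driver fires a fixed number of requests across a pool of worker threads and reports the median, 95th-percentile, and maximum latency along with an error count. The function names and the thread-based design are illustrative choices, not part of any Google Cloud tool, and for realistic traffic you would more likely use a dedicated load-testing tool.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def run_load_test(send_request: Callable[[], None],
                  total_requests: int,
                  concurrency: int) -> Dict[str, float]:
    """Call send_request total_requests times using `concurrency` threads.

    Returns latency statistics (in seconds) for successful calls plus an
    error count. send_request is a hypothetical, caller-supplied function.
    """

    def timed_call(_: int) -> Tuple[float, bool]:
        start = time.perf_counter()
        try:
            send_request()
            ok = True
        except Exception:  # any exception counts as a failed request
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for elapsed, ok in results if ok)
    summary: Dict[str, float] = {
        "requests": total_requests,
        "errors": sum(1 for _, ok in results if not ok),
    }
    if latencies:
        # Nearest-rank 95th percentile.
        p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
        summary.update(
            p50=statistics.median(latencies),
            p95=latencies[p95_index],
            max=latencies[-1],
        )
    return summary


# Example usage (hypothetical staging URL; run only against a test deployment):
# import urllib.request
# def hit_staging() -> None:
#     with urllib.request.urlopen("https://staging.example.com/", timeout=10) as r:
#         r.read()
# print(run_load_test(hit_staging, total_requests=1000, concurrency=50))
```

Running the same test at several concurrency levels, and comparing your billing data afterward, is one way to see how both latency and cost change as load grows.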

If you expect wide variations in traffic, you should also test how your application performs when traffic suddenly spikes.
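
A spike test differs from a steady load test mainly in the shape of the traffic. The Python sketch below builds a per-second schedule that holds a baseline rate, jumps abruptly to a much higher rate, then drops back, and replays that schedule through a hypothetical `send_batch` function you would supply (for example, one that fires that many requests in parallel). The function names, rates, and durations are placeholders to adapt to your own traffic, not values from any Google Cloud tool.

```python
import time
from typing import Callable, List


def spike_profile(baseline_rps: int, spike_rps: int,
                  warmup_s: int, spike_s: int, cooldown_s: int) -> List[int]:
    """Return the target requests per second for each second of a spike test."""
    if min(baseline_rps, spike_rps, warmup_s, spike_s, cooldown_s) < 0:
        raise ValueError("rates and durations must be non-negative")
    return ([baseline_rps] * warmup_s       # steady, normal traffic
            + [spike_rps] * spike_s         # sudden jump in traffic
            + [baseline_rps] * cooldown_s)  # back to normal


def run_profile(profile: List[int],
                send_batch: Callable[[int], None],
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Send profile[i] requests during second i via a caller-supplied send_batch."""
    for rps in profile:
        started = time.monotonic()
        send_batch(rps)
        # Wait out whatever remains of this one-second slot.
        sleep(max(0.0, 1.0 - (time.monotonic() - started)))
```

For example, `spike_profile(100, 1000, 60, 120, 60)` describes a tenfold jump held for two minutes. While the spike runs, watch how quickly autoscaling adds capacity: new instances need time to boot, so latency and error rates early in the spike are often the most revealing numbers.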

Resilience testing is similar to disaster recovery testing because in both cases you're testing what happens when infrastructure fails. The difference is that in resilience testing, you expect your application to keep running with little or no downtime.

One common testing scenario is to terminate a random instance within an autoscaling instance group. Netflix created software called Chaos Monkey that automates this sort of testing. If your application in the autoscaling instance group is stateless, then it should be able to survive this sort of failure without any noticeable impact on users.
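
A Chaos Monkey-style test can be approximated with a few lines of Python that pick one instance at random and delete it with the gcloud CLI. This is a minimal sketch, not Chaos Monkey itself. It assumes you already have the group's instance names (for example, from `gcloud compute instance-groups managed list-instances`), and the function names, instance names, and zone used here are hypothetical. Deleting a VM directly, rather than through the managed instance group, simulates an unexpected failure; the group is expected to recreate the instance to restore its target size. The script defaults to a dry run that only prints the command.

```python
import random
import shlex
import subprocess
from typing import Callable, List, Sequence


def pick_victim(instances: Sequence[str], rng: random.Random) -> str:
    """Choose one instance at random from the group."""
    if not instances:
        raise ValueError("no instances to terminate")
    return rng.choice(list(instances))


def kill_command(instance: str, zone: str) -> List[str]:
    """Build a gcloud command that deletes the VM outside the group's control."""
    return ["gcloud", "compute", "instances", "delete", instance,
            f"--zone={zone}", "--quiet"]  # --quiet skips the confirmation prompt


def chaos_round(instances: Sequence[str], zone: str, rng: random.Random,
                dry_run: bool = True,
                runner: Callable[..., object] = subprocess.run) -> List[str]:
    """Terminate one random instance, or just print the command in dry-run mode."""
    cmd = kill_command(pick_victim(instances, rng), zone)
    if dry_run:
        print("DRY RUN:", shlex.join(cmd))
    else:
        # With subprocess.run, check=True raises CalledProcessError if gcloud fails.
        runner(cmd, check=True)
    return cmd
```

Run it with `dry_run=False` only against a test or staging group first, and check whether users (or a concurrent load test) notice the loss of the instance.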

Since cyber attacks are extremely common these days, your organization should put processes in place to test the security of your applications. Here are a few important ones:

First, your software development team should ideally have a peer review process in which developers check each other's code for security flaws. Second, you should integrate a static code analysis tool, such as HP Fortify, into your continuous integration/continuous deployment (CI/CD) pipeline to automate security checks.

Third, at least once a year, you should run penetration tests on your applications and infrastructure to see whether they're vulnerable. You can either do this yourself or hire a third party to do it. Other cloud providers typically require you to request permission before you perform penetration testing on your cloud infrastructure. Surprisingly, Google does not require you to contact them.

Google also provides a useful tool called Web Security Scanner. It crawls your application, starting from its base URL and following all of the links it finds, while scanning for vulnerabilities such as cross-site scripting (XSS), mixed content, and outdated libraries. It can scan applications hosted on App Engine, Compute Engine, and Google Kubernetes Engine.
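
To illustrate one of the issues Web Security Scanner reports, here is a toy Python check for mixed content: an HTTPS page that loads subresources such as scripts, images, or stylesheets over plain HTTP. This is not how Web Security Scanner is implemented. The sketch only inspects a single HTML string with the standard library's `html.parser`, and it checks just `src` attributes and `<link href>`, whereas a real scanner crawls the whole application and tests for many more problems. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MixedContentFinder(HTMLParser):
    """Collect (tag, url) pairs for subresources requested over plain HTTP."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: List[Tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        for name, value in attrs:
            if value is None:
                continue
            # src loads a resource on any tag; href only does so on <link>
            # (an <a href> is a navigation link, not a subresource).
            loads_resource = name == "src" or (tag == "link" and name == "href")
            if loads_resource and value.strip().lower().startswith("http://"):
                self.findings.append((tag, value))


def find_mixed_content(html: str) -> List[Tuple[str, str]]:
    """Return insecure subresource URLs in a page that is served over HTTPS."""
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

Note that mixed content only matters for pages served over HTTPS; on a plain-HTTP page, everything is already unencrypted.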

And that's it for this lesson.

Course Outline

  • Course Introduction
  • Monitoring
  • Logging
  • Error Reporting and Debugging
  • Tracing
  • Storage Management
  • Cloud SQL Configuration
  • Cloud CDN Configuration
  • Instance Startup Failures
  • SSH Errors
  • Network Traffic Dropping
  • Conclusion

About the Author

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).