Once you have implemented your application infrastructure on Google Cloud Platform, you will need to maintain it. Although you can set up Google Cloud to automate many operations tasks, you will still need to monitor, test, manage, and troubleshoot it over time to make sure your systems are running properly.
This course will walk you through these maintenance tasks and give you hands-on demonstrations of how to perform them. You can follow along with your own GCP account to try these examples yourself.
Learning Objectives
- Use the Cloud Operations suite to monitor, log, report on errors, trace, and debug
- Ensure your infrastructure can handle higher loads, failures, and cyber-attacks by performing load, resilience, and penetration tests
- Manage your data using lifecycle management and migration from outside sources
- Troubleshoot SSH errors, instance startup failures, and network traffic dropping
Intended Audience
- System administrators
- People who are preparing to take the Google Professional Cloud Architect certification exam
Prerequisites
- Google Cloud Platform: Fundamentals course or experience with Google Cloud Platform
Resources
- The GitHub repository for this course is at https://github.com/cloudacademy/managing-gcp.
So far, we’ve been looking at alerts and log messages from system software. Now it’s time to look at how to get error information from your applications. That’s where the Cloud Error Reporting service comes in. To show you how this works, I’m going to install Google’s Hello World application in App Engine and then get it to generate an error.
Normally, you’d use your own workstation for the development environment, but to simplify this demo, I’m going to use Cloud Shell. The nice thing about Cloud Shell is that it already has all of the packages installed that you need. When you do want to write and test Java code on your own workstation, remember that you need to install the Google Cloud SDK, the Java SE 11 Development Kit, Git, and Maven 3.5 or greater on your system.
First, open Cloud Shell. Next, get a copy of the Hello World application with this “git clone” command. Then go into the directory where the app is.
Now use the local development server to make sure the app works. To see if it’s working, click the “Web Preview” icon, and select “Preview on port 8080”. You should see a “Hello world!” message. OK, it’s working, so let’s stop the development server and upload the application to App Engine. You can stop the development server with a Ctrl-C.
Now use the “gcloud app deploy” command to upload it to App Engine. If you’re doing this yourself, then the output may look slightly different from mine because I’ve already configured App Engine. If this is your first time deploying to App Engine, then it will likely ask you to choose a region. OK, it’s done deploying.
There are a couple of ways to test it. If you’re not using Cloud Shell, then you could do a “gcloud app browse”, which is pretty handy. Since we are using Cloud Shell, we’ll have to go to this URL. There’s “Hello World!” again.
Now, in order to see an error on the Error Reporting page, we need to generate an error. Let’s edit the code and mess something up. I’m going to add a line that I know will cause a problem. This’ll throw an exception because Java can’t divide an integer by zero.
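To make the failure concrete, here is a minimal standalone Java sketch of the kind of line that causes it. It is not the actual Hello World source: the class name `DivideByZeroDemo` and both method names are made up for illustration, and the real demo adds its line inside the app's request handler instead.

```java
// Hypothetical stand-in for the deliberately broken line in the demo app.
class DivideByZeroDemo {

    // Integer division by zero throws java.lang.ArithmeticException at runtime.
    static int brokenCalculation() {
        int divisor = 0;
        return 1 / divisor;
    }

    // Runs the broken code and returns the exception message,
    // or null if no exception was thrown.
    static String runAndCapture() {
        try {
            brokenCalculation();
            return null;
        } catch (ArithmeticException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("Caught: " + runAndCapture());
    }
}
```

Note that this only applies to integer arithmetic. Dividing a floating-point value by zero in Java does not throw; it yields infinity or NaN, so it would not produce an error for Error Reporting to pick up.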
Now run “gcloud app deploy” again. When it’s done, bring the app up in your browser again. This time it gives you a big error message, which is actually what we want, for once.
Let’s see if Error Reporting picked it up. The Google Cloud Console shows errors on the main dashboard, so you don’t have to go to the Error Reporting page to see them. I’ll refresh the page. There it is. Click on “Go to Error Reporting” to see what it shows. If you click on the error, you’ll see more details, including the stack trace. You can even click on the line where the error occurred and it will take you into your source code in the Debugger. However, in this case, the line it’s showing at the top of the stack trace is not in our code. We can click on a line further down, though, which is in our code, and it should take us there.
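The point about the top of the stack trace not being in our code can be shown with a small Java sketch. This is only an illustration of reading a stack trace, not something Error Reporting or the Debugger runs; the class `FirstOwnFrame` and its methods are hypothetical names, and the sample error comes from `Integer.parseInt` simply because it throws from inside the JDK.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper: finds the first stack frame that belongs to our own code.
class FirstOwnFrame {

    // Returns the first frame whose class name starts with the given prefix,
    // scanning from the top of the stack trace downward.
    static Optional<StackTraceElement> firstFrameIn(Throwable t, String classPrefix) {
        return Arrays.stream(t.getStackTrace())
                .filter(frame -> frame.getClassName().startsWith(classPrefix))
                .findFirst();
    }

    // Produces an exception that is thrown from inside the JDK, not from our code.
    static Throwable sampleError() {
        try {
            Integer.parseInt("not a number");
            return null;
        } catch (NumberFormatException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable error = sampleError();
        // The top frame is in a java.lang class, not in FirstOwnFrame.
        System.out.println("Top frame: " + error.getStackTrace()[0]);
        System.out.println("First frame in our code: "
                + firstFrameIn(error, "FirstOwnFrame").orElse(null));
    }
}
```

In a real application you would match on your own package name instead of a single class name.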
The Debugger is a great tool that you can use whether an error occurred or not. Let’s put a more subtle problem in the code and see how we can use the Debugger to figure out what’s wrong.
Suppose we want to check the operating system running our app, and if it’s Ubuntu, then we’ll print “Ubuntu rocks!”
First, we have to fix the bug that we introduced previously, so we’ll remove that line. We have to go back to the editor to do that. Now I’ll add the new code. Even if you’re not familiar with Java, this is pretty straightforward. It gets the name of the operating system, then it checks to see whether it’s equal to Ubuntu or not, and if it is, it says, “Ubuntu rocks!”, and if it isn’t, it says, “Hello world!”.
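The check described here might look roughly like the following sketch. It is not the course's actual code: the class name `OsGreeting` and the method `greetingFor` are hypothetical, and reading the name from the standard `os.name` system property is an assumption, since the lesson does not say how the app obtains it.

```java
// Hypothetical sketch of the OS check described in the lesson.
class OsGreeting {

    // Returns "Ubuntu rocks!" only when the name is exactly "Ubuntu";
    // any other value (including null) falls through to "Hello world!".
    static String greetingFor(String osname) {
        if ("Ubuntu".equals(osname)) {
            return "Ubuntu rocks!";
        }
        return "Hello world!";
    }

    public static void main(String[] args) {
        // Assumption: the OS name comes from the JVM's os.name system property.
        String osname = System.getProperty("os.name");
        System.out.println(greetingFor(osname));
    }
}
```

Comparing with `"Ubuntu".equals(osname)` instead of `==` matters in Java, because `==` on strings compares object references and not their contents.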
Now we’ll upload it to App Engine again. OK, now we’ll refresh the webpage. And it says “Hello world!” again, not “Ubuntu rocks!” That might be because the underlying operating system isn’t Ubuntu, but let’s go back to the Debugger and see if that’s the reason.
You’ll notice that this is still the old version of the file. First, refresh the browser. It’s still showing the old version. To get to the new version, you have to click on this drop-down menu and select the right one. The latest version should say 100% at the end. Sometimes you have to tell it where the source code is. Find “App Engine” in the list, and click the “Select source” button.
Now find the file. You’ll see some text on the right-hand side that says to click a line number to take a snapshot of the variables and call stack. It also points out that taking a snapshot does not stop the running application, which is good to know.
Click in the left-hand gutter on the line just after the “osname” variable is set. Now that the snapshot point is set, we can refresh the webpage and trigger the snapshot. If we go back to the Debugger tab, you’ll see that it’s showing the variables and call stack on the right-hand side. There’s “osname”. It’s set to “Linux”, not anything more specific. I guess it doesn’t know the specific distribution of Linux that’s running, so let’s change our code to check for Linux instead.
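A sketch of the adjusted check, under the same caveats as before: `LinuxGreeting` and `greetingFor` are hypothetical names, and keeping the original “Ubuntu rocks!” message is an assumption, since the lesson does not say whether the text was changed along with the condition.

```java
// Hypothetical sketch of the revised check: match "Linux" instead of "Ubuntu".
class LinuxGreeting {

    // The snapshot showed osname set to "Linux", so the comparison
    // now targets that value.
    static String greetingFor(String osname) {
        if ("Linux".equals(osname)) {
            return "Ubuntu rocks!"; // Assumption: message text left unchanged.
        }
        return "Hello world!";
    }

    public static void main(String[] args) {
        // Simulates the value observed in the Debugger snapshot.
        System.out.println(greetingFor("Linux"));
    }
}
```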
And deploy the new version. Now refresh the webpage. It worked!
Let’s go back to the main Error Reporting page and I’ll show you a couple of other things. First, if you’re sitting on this page watching for errors in real time, then you should click the “AUTO RELOAD” button, which will refresh the page every 5 seconds. If you don’t want to hang around here and just want to get an email when an error occurs, then click the “Turn on notifications” button.
Alright, that’s it for this lesson.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).