Instance Startup Failures
1h 13m

Once you have implemented your application infrastructure on Google Cloud Platform, you will need to maintain it. Although you can set up Google Cloud to automate many operations tasks, you will still need to monitor, test, manage, and troubleshoot it over time to make sure your systems are running properly.

This course will walk you through these maintenance tasks and give you hands-on demonstrations of how to perform them. You can follow along with your own GCP account to try these examples yourself.

Learning Objectives

  • Use the Cloud Operations suite to monitor, log, report on errors, trace, and debug
  • Ensure your infrastructure can handle higher loads, failures, and cyber-attacks by performing load, resilience, and penetration tests
  • Manage your data using lifecycle management and migration from outside sources
  • Troubleshoot SSH errors, instance startup failures, and network traffic dropping

Intended Audience

  • System administrators
  • People who are preparing to take the Google Professional Cloud Architect certification exam






What can you do if your VM instance fails to boot up completely? You can't use SSH because the SSH server isn't running yet. If you were running the VM on your desktop, you could look at its console. But how do you do that for a Google Cloud instance? Luckily, there's a solution: you look at the serial port.

By default, you can see the output of the serial port by clicking on the instance and then, at the bottom of the page, clicking the view serial port button. This might be enough information to help you troubleshoot your problem, but in many cases, you'll need to interact with the VM to see what's going on. You'll notice that there's a button called connect to serial port, but it's grayed out. How frustrating. To enable interactive access, you need to add metadata to the instance. This isn't a terribly user-friendly way of enabling a feature, but it's actually not too difficult.
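The read-only serial port output described above can also be fetched from the command line. Here is a minimal sketch, assuming the gcloud CLI is installed and authenticated; the function name and the zone value in the usage note are hypothetical.

```shell
# Print the read-only serial port output of an instance.
#   $1 = instance name, $2 = zone, $3 = serial port number (defaults to 1)
show_serial_output() {
  gcloud compute instances get-serial-port-output "$1" --zone "$2" --port "${3:-1}"
}
```

For example, `show_serial_output instance-1 us-central1-a` would print the boot messages for instance-1, assuming it lives in that zone.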

First you have to decide whether you want to enable interactive access for an individual instance or for an entire project. If you enable it on individual instances, then you'll have to enable it manually for every instance. For convenience, you might want to enable it for an entire project, but there is a higher security risk in enabling serial port access for all of your instances because there is currently no way to restrict access by IP address. So hackers could try to break into any of your VMs through the serial port. It wouldn't be easy though, because they'd need to know the correct SSH key, username, project ID, zone, and instance name.
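If you do decide on the project-wide option, the setting goes into project metadata rather than instance metadata. Here is a rough sketch, assuming gcloud is already pointed at the right project; the function name is hypothetical.

```shell
# Turn interactive serial console access on or off for every instance
# in the current project.
#   $1 = TRUE or FALSE (defaults to TRUE)
set_project_serial_console() {
  gcloud compute project-info add-metadata --metadata "serial-port-enable=${1:-TRUE}"
}
```

Calling `set_project_serial_console FALSE` afterwards would switch the project-wide setting back off, which is worth doing once you've finished troubleshooting, given the security trade-off.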

To enable interactive access to an individual instance, you can use the gcloud compute instances add-metadata command. Put in the instance name, which is instance-1 in my case, and then --metadata=serial-port-enable=1. So the full command is: gcloud compute instances add-metadata instance-1 --metadata=serial-port-enable=1

Now when I refresh the page, the connect to serial port button lights up. If I click on it, then it brings up another window where I can interact with the serial console.

By the way, if you're connecting to a Windows instance, then you'll need to go into the drop-down menu and select port 2.
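The interactive console can also be opened from a terminal instead of the browser. This is a sketch only, assuming interactive access has already been enabled and that your SSH key is known to the instance; the function name and the zone in the usage note are hypothetical.

```shell
# Open an interactive serial console session.
#   $1 = instance name, $2 = zone, $3 = serial port number (defaults to 1)
connect_serial_console() {
  gcloud compute connect-to-serial-port "$1" --zone "$2" --port "${3:-1}"
}
```

For a Linux VM, `connect_serial_console instance-1 us-central1-a` should be enough. For a Windows VM, you would pass 2 as the third argument to match the port selection described above.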

If the serial port output showed that you have a problem with the file system on your boot disk, then you can attempt to fix it by attaching the disk to another instance.

First, delete the instance, but be sure to include the --keep-disks option. Notice that it still gives me a warning about deleting disks, even though I used the --keep-disks option. That's normal.
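The delete step might look something like this. It's a sketch that assumes you only need to preserve the boot disk (the --keep-disks flag also accepts data and all); the function name is hypothetical.

```shell
# Delete an instance but keep its boot disk for later inspection.
#   $1 = instance name, $2 = zone
delete_instance_keep_boot_disk() {
  gcloud compute instances delete "$1" --zone "$2" --keep-disks boot
}
```

gcloud asks for confirmation before deleting, so this is where you would see the warning about disks.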

Then create a new instance. I'll call it debug-instance.

Now attach the disk that we saved from the original instance. Notice that by default the name of a boot disk is the same as the name of the instance, instance-1 in this case. You can also add the --device-name flag so it will be obvious which device corresponds to this disk, which will be helpful in a later step.
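Creating the debug instance and attaching the saved disk could be sketched as follows. The function names are hypothetical, and the device name debug-disk used in the usage note is an assumption chosen for illustration; the new instance must be in the same zone as the disk.

```shell
# Create a fresh instance with default settings to act as the debug host.
#   $1 = new instance name, $2 = zone
create_debug_instance() {
  gcloud compute instances create "$1" --zone "$2"
}

# Attach an existing disk to the debug host under an explicit device name.
#   $1 = debug instance, $2 = disk name, $3 = device name, $4 = zone
attach_debug_disk() {
  gcloud compute instances attach-disk "$1" --disk "$2" --device-name "$3" --zone "$4"
}
```

For example, `create_debug_instance debug-instance us-central1-a` followed by `attach_debug_disk debug-instance instance-1 debug-disk us-central1-a`.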

Then SSH into the new instance.

Now you need to find out the device name for the debug disk. Look in the /dev/disk/by-id directory.
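On Compute Engine Linux images, the entries in that directory are typically symlinks named google-, then the device name, then an optional -partN suffix. Assuming that convention holds on your image, a small helper can resolve the real device for you. The function name and the BY_ID_DIR override are hypothetical, and readlink -f assumes GNU coreutils.

```shell
# Resolve the block device behind a named disk partition.
#   $1 = device name given at attach time, $2 = partition number
# BY_ID_DIR can be overridden; it defaults to /dev/disk/by-id.
resolve_partition() {
  readlink -f "${BY_ID_DIR:-/dev/disk/by-id}/google-$1-part$2"
}
```

On the debug instance, `resolve_partition debug-disk 1` would print the partition's device path if the disk was attached with the device name debug-disk.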

Remember when I mentioned that naming the disk device would be helpful? You can see that the debug disk is sdb. The file system is on the first partition, or part1. So the device name we need to use is sdb1. Now you can run an fsck on it.

Of course, I'm doing this on a good disk, so fsck doesn't see any problems. But if this disk had come from an instance that couldn't boot properly, then there's a good chance that an fsck would find lots of problems.
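The file system check itself is a one-liner on the debug instance. Here is a sketch, assuming the partition turned out to be /dev/sdb1 as in the demo and is not currently mounted; the function name is hypothetical.

```shell
# Run a file system check on an unmounted partition.
#   $1 = partition device, for example /dev/sdb1
check_filesystem() {
  sudo fsck "$1"
}
```

If fsck finds problems, it prompts before repairing them, so expect to answer some questions on a damaged disk.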

Let's pretend that fsck had to clean the file system and it was successful. After that, you should verify that it will mount properly.
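A mount test could be sketched like this. The mount point in the usage note is an arbitrary choice for illustration and the function name is hypothetical.

```shell
# Create a mount point if needed, then mount the partition on it.
#   $1 = partition device, $2 = mount point
mount_debug_partition() {
  sudo mkdir -p "$2" && sudo mount "$1" "$2"
}
```

For example, `mount_debug_partition /dev/sdb1 /mnt/debug`. If the mount command returns without an error, the file system is at least mountable.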

You should also check that it has a kernel file. It does, but before you celebrate, you should check one more thing, that the disk has a valid master boot record.
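Both checks can be scripted. This sketch assumes the kernel image sits under boot/ on the mounted disk with a name starting with vmlinuz, which is common on Linux distributions but not guaranteed, and it uses parted to print the partition table. The function names are hypothetical.

```shell
# Succeed if a kernel image exists under <mount point>/boot.
#   $1 = mount point of the debug disk
has_kernel() {
  for f in "$1"/boot/vmlinuz*; do
    # If nothing matches, the pattern stays literal and the -e test fails.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Print the partition table and file system details for a disk.
#   $1 = disk device
show_partition_table() {
  sudo parted "$1" print
}
```

For example, `has_kernel /mnt/debug && echo "kernel found"`, then `show_partition_table /dev/sdb`. The whole disk is used in the second call, rather than sdb1, because the master boot record belongs to the disk and not to a single partition.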

It printed out information about the file system, so this disk is good to go. Now you would create a new instance and use this disk as its boot disk.
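That last step might look like the sketch below. It assumes the disk has to be detached from the debug instance first (after unmounting it inside the VM with sudo umount), since a disk attached in read-write mode can't be shared with a second instance. The function name is hypothetical.

```shell
# Detach the repaired disk from the debug host and boot a new instance from it.
#   $1 = debug instance, $2 = disk name, $3 = new instance name, $4 = zone
recreate_from_disk() {
  gcloud compute instances detach-disk "$1" --disk "$2" --zone "$4" &&
    gcloud compute instances create "$3" --zone "$4" --disk "name=$2,boot=yes"
}
```

For example, `recreate_from_disk debug-instance instance-1 instance-1 us-central1-a` would bring the VM back under its original name, which is free again because the original instance was deleted.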

That took a bit of work, but it was relatively straightforward. For a tougher challenge, try the next lesson where we tackle SSH errors.




About the Author

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).