Managing and troubleshooting the boot process

The second course in this Linux certification series (the first was a series introduction, and the third will focus on boot and package management) focuses on System Architecture. It explores how Linux works within its hardware environment and how you can use Linux tools to optimize your system for your specific needs.

You'll learn how to identify and manage hardware peripherals, how the Linux boot process and runlevels work, and how you can control them.

If you have thoughts or suggestions for this course, please contact Cloud Academy at


In the previous video we explored the Linux start up process and the roles played by various resources. In particular, we mentioned the contributions of the GRUB bootloader and the kernel. Now we're going to learn how to manage the process as it moves through the GRUB stage, and also how to listen to the kernel when it tries to tell us about problems it encounters along its journey.

Understanding and managing the GRUB bootloader

Depending on your boot settings, the GRUB menu may or may not be visible for a few seconds as your computer starts up. If it's not visible, you can manually cause it to display by repeatedly pressing the right Shift key during the early boot stages. If that doesn't work, then you can always manually abort the startup while it's in progress (by holding the power button down for five seconds) and the GRUB menu should appear the next time you boot.

Once you're in, you will be presented with a number of boot options. The default will be a full, normal boot of the latest version of your main operating system. There might also be alternate operating systems and a couple of memory tests - for troubleshooting RAM. 

Pressing the "e" key will allow you to edit the boot parameters of your default option. We'll come back to that in a minute. Highlighting the Advanced Options item with your arrow key and then pressing enter will reveal a number of further options, including booting to older kernels - something that can be especially useful if a recent upgrade has gone wrong. These kernel files, by the way, actually live in the /boot directory.
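If you'd like to see which kernels GRUB has to choose from, something like the following rough sketch could help. The helper name list_kernels is purely illustrative, and the vmlinuz-* file naming is an assumption based on what common distributions use.

```shell
#!/bin/sh
# list_kernels is a hypothetical helper: it prints the kernel image files
# found in the given directory (default: /boot), one per line.
list_kernels() {
    dir="${1:-/boot}"
    for f in "$dir"/vmlinuz-*; do
        # Skip the unexpanded pattern when no kernel images are present.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

list_kernels    # kernel images sitting in /boot
uname -r        # version of the kernel that is running right now
```

Comparing the two outputs tells you whether you're currently running the newest kernel on the system or one of the older ones.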

There are also recovery mode options for each kernel version that provide a menu of troubleshooting tools to rescue a damaged installation. Pressing "c" or ctrl-c, by the way, will give you a limited command line interface.

But let's get back to the main GRUB menu and press "e" for edit. Feel free to edit any of the parameters you see here. Just remember: anything you do may well render your computer unbootable. Fortunately, as Linux system administrators, you can easily access the tools you'll need to edit yourself out of any trouble you've caused.

Here are a couple of examples. Look at the Linux line towards the bottom. The vmlinuz-3.13 value points to the latest Linux kernel available on the system. The value of root is the identifier for my hard drive. Either of these values can be changed.

You could also append parameters like rw init=/bin/bash to the end of the Linux line to boot into a root shell session that will allow you full write access to the whole filesystem. This is one way to restore key system files and recover from a non-bootable condition. If you're uncomfortable leaving your computer open to such access, you might consider protecting it with a BIOS password.
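Once a system is up, you can check which parameters the kernel actually received by reading /proc/cmdline. Here's a minimal sketch of how you might test for one particular parameter. The helper has_boot_param is hypothetical, and the sample command line (including the root device) is made up for illustration rather than taken from a real machine.

```shell
#!/bin/sh
# has_boot_param is a hypothetical helper: it succeeds if the exact
# parameter (for example "rw" or "init=/bin/bash") appears as a word
# in the given kernel command line.
has_boot_param() {
    cmdline="$1"
    param="$2"
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Illustrative command line only; the root device here is an assumption.
sample="BOOT_IMAGE=/boot/vmlinuz-3.13 root=/dev/sda1 rw init=/bin/bash"

if has_boot_param "$sample" "init=/bin/bash"; then
    echo "this boot goes straight to a bash shell"
fi

# On a running Linux system, the real parameters live here:
if [ -r /proc/cmdline ]; then
    cat /proc/cmdline
fi
```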

To review, you can force access to the GRUB boot menu by hitting the right shift repeatedly during start up. "e" will send you to a simple text editor where you can edit the boot parameters of a particular GRUB item. "c" will send you to a limited command line (from which esc will send you back to the main GRUB menu). And selecting a recovery mode option from the Advanced Options menu will provide a list of utilities suited for recovering from some system problem.

Diagnosing boot problems in Linux

If something does fail to load properly during the boot process, you've got lots of good options. I'm not sure there's anything that happens on a Linux system that doesn't leave behind some kind of record in the logs, most of which live happily in /var/log. Now I can hear some of you complaining: if the system won't boot properly, what good are inaccessible log files? To which the answer is: but they ARE accessible. You will often be able to get in using one of the recovery mode options we described before.

But, assuming you didn't encrypt your filesystem when you installed it, you can also boot your PC to a live Linux session using a DVD or USB stick containing an ISO image of any of the popular Linux distributions, or using a super lightweight purpose-built utility distribution like SystemRescueCD or Puppy Linux. Once that's running, you can mount your main hard drive and set about exploring the log files to find out what went wrong, and then repairing your broken files.
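From a live session, the mounting step might look roughly like the sketch below. The device name and mount point are assumptions (check the real partition name with lsblk first), and rescue_logs is just an illustrative helper. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# DEVICE and MOUNTPOINT are assumptions: confirm the real partition name
# with lsblk before mounting anything.
DEVICE="${DEVICE:-/dev/sda1}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/rescue}"

# RUN defaults to "echo", so the commands are printed, not executed.
# Run the script as root with RUN="" to execute them for real.
RUN="${RUN-echo}"

# rescue_logs is a hypothetical helper that mounts the damaged system's
# root partition and lists its log directory.
rescue_logs() {
    $RUN mkdir -p "$MOUNTPOINT"
    $RUN mount -o ro "$DEVICE" "$MOUNTPOINT"   # read-only while investigating
    $RUN ls -l "$MOUNTPOINT/var/log"
}

rescue_logs
```

Mounting read-only keeps you from making things worse while you're still looking around. When you're ready to repair files, you can switch to write access with mount -o remount,rw followed by the mount point.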

The logs that most closely relate to boot problems are kern.log, boot.log, and dmesg. The problem with any of these is that a single boot will generate so many hundreds of lines of messages that they become nearly unreadable. So now's a great time to properly introduce you to your new best friend, grep. 

Let's output the contents of the dmesg logfile using cat, but rather than writing directly to the screen as we normally would, we'll pipe it using the pipe character (which you get by pressing shift and backslash together) and then type grep followed by the string we'd like to search for. Let's say that we were alarmed by a serious-sounding warning message about power/level that appeared during boot. We can simply enclose the entire string in quotation marks and see what comes up.
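Typed out, the pipeline just described might look something like this. The path /var/log/dmesg is an assumption (some distributions don't keep that file, and you'd use the dmesg command instead), and search_log is only an illustrative wrapper.

```shell
#!/bin/sh
# LOG defaults to the dmesg log file. Assumption: adjust the path for
# your distribution, since not every system keeps this file.
LOG="${LOG:-/var/log/dmesg}"

# search_log is a hypothetical helper wrapping the cat | grep pipeline:
# it prints every line of $LOG that contains the search string.
search_log() {
    cat "$LOG" | grep "$1"
}

if [ -r "$LOG" ]; then
    search_log "power/level" || echo "no matching lines in $LOG"
fi
```

The cat step mirrors the explanation above, but grep can also read the file directly: grep "power/level" /var/log/dmesg gives the same result.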

That's a whole lot easier to read! But it might not provide enough context to help. So we could also simply open dmesg within a pager like "less" and then search for our string to see it in its full context. We can start a search by pressing the forward slash key and then typing in our string. We now have the warning's full context. In this particular case that might not be all that helpful, but I'm sure you can see how it could.
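If you'd rather stay with grep, its -C option is another way to get some context: it prints a few lines before and after each match. This is a different technique from the less search described above, shown here only as a sketch. The log path is again an assumption and show_context is a hypothetical helper.

```shell
#!/bin/sh
# Assumption: adjust the path for your distribution.
LOG="${LOG:-/var/log/dmesg}"

# show_context is a hypothetical helper: print each match with its line
# number (-n) plus N lines of context on either side (-C N, default 3).
show_context() {
    grep -n -C "${2:-3}" "$1" "$LOG"
}

if [ -r "$LOG" ]; then
    show_context "power/level" 3 || echo "no matching lines in $LOG"
fi

# Interactive alternative: open the file in less and jump straight to
# the first match (press n to move to the next one).
#   less +/power/level /var/log/dmesg
```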

If the dmesg, kern, and boot log files prove unhelpful, you should next turn to your favorite Internet search engine or perhaps the user forum associated with your Linux distribution. I am constantly amazed at how much genuinely useful information is readily available online.

While the range of hardware issues that should concern system administrators of virtual machines like Amazon EC2 instances is narrower than for physical deployments, those issues still matter. An Ubuntu virtual machine on Amazon Web Services will, by default, include most of the regular log files like boot.log, dmesg, and kern.log. However, AWS provides a more direct method for troubleshooting system issues. In the Instances dashboard of the AWS console, right-click on the instance that concerns you, select Instance Settings, and then Get System Log. When you find a suspicious-looking entry, you can copy it and go to AWS's "Troubleshooting Instances with Failed Status Checks" to see if there's anything helpful there. Naturally you can also draw on all your usual troubleshooting resources in your search for a solution.
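If you prefer the command line, the AWS CLI has a get-console-output command that retrieves the same kind of system log. This isn't covered in the video, so treat the following as a sketch under assumptions: the instance ID is a placeholder, get_system_log is a hypothetical helper, and the script only prints the command unless you tell it otherwise.

```shell
#!/bin/sh
# INSTANCE_ID is a placeholder; substitute the ID of your own instance.
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"

# RUN defaults to "echo", so the command is only printed. Set RUN="" on a
# machine with the AWS CLI installed and configured to actually run it.
RUN="${RUN-echo}"

# get_system_log is a hypothetical helper around the AWS CLI call.
get_system_log() {
    $RUN aws ec2 get-console-output --instance-id "$1" --output text
}

get_system_log "$INSTANCE_ID"
```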

About the Author

David taught high school for twenty years, worked as a Linux system administrator for five years, and has been writing since he could hold a crayon between his fingers. His childhood bedroom wall has since been repainted.

Having worked directly with all kinds of technology, David derives great pleasure from completing projects that draw on as many tools from his toolkit as possible.

Besides being a Linux system administrator with a strong focus on virtualization and security tools, David writes technical documentation and user guides, and creates technology training videos.

His favorite technology tool is the one that should be just about ready for release tomorrow. Or Thursday.
