In this course, we will explore how to set up a network of virtual machines, implement SSH key authentication, and execute commands on remote systems. We'll look at how to install and remove software on local and remote systems. You'll learn about the continue and break statements, their benefits, and their use cases. You'll also learn how to automate processes on Linux with cron jobs and how to examine running processes.
This course is part of the Linux Shell Scripting learning path. To follow along with this course, you can download all the necessary resources here.
Learning Objectives
- Learn how to create a network of virtual machines and how to configure SSH key authentication and execute commands on remote systems via SSH
- Learn how to install and remove software packages both on your local system as well as on remote systems
- Understand continue and break statements in loops and what they're used for
- Understand what cron is and how to use it to schedule the running of scripts in Linux at various intervals
- Learn how to examine running processes on a Linux system and how to determine their process IDs
Intended Audience
- Anyone who wants to learn Linux shell scripting
- Linux system administrators, developers, or programmers
Prerequisites
To get the most out of this course, you should have a basic understanding of the Linux command line.
All right, welcome to this exercise walkthrough. You know the drill. What we're going to do is look at our requirements, and then we're gonna go ahead and build a script to satisfy those requirements.
So the requirements for this particular scripting exercise are that the script is named run-everywhere.sh, and that it executes all arguments as a single command on every server listed in a file. By default that file is /vagrant/servers. The script also executes the provided command as the user executing the script. So we don't want people to execute this with sudo; we want them to execute it as themselves. That way they can have their own SSH keys set up and that sort of thing. And if they want to use root privileges, we'll provide them a way to do that, which we'll talk about in just a second here.
Another thing we want to do is use an SSH command that times out in a relatively short period of time. We may not have really talked about this in one of our demonstrations or exercises before this, but what can happen is that if you have a server that's down on the network and you try to SSH to it, SSH can take a minute or two before it times out. And of course, network commands such as ping or SSH often have a way to control the timeout. The way to do it with SSH is to use -o ConnectTimeout= followed by some value. I've just decided to use two seconds. You could leave this out and let it fail on its own, or you could use a longer duration if you want, but I think this is good for our particular purposes.
We also decided that this script is going to allow the user to specify some options. The first option that we're concerned with is -f followed by a path to a file. What this does is allow a user to override the default file of /vagrant/servers.
This way they can create their own list, or even a sublist, of servers and execute commands against the list that they provide to the script. So it's not hard-coded, if you will. The next option is -n, which stands for dry run, and this allows for commands to be displayed instead of executed.
Another option we're going to give the user is -s, which runs the command with sudo, or superuser, privileges on the remote server. For example, if they need to run a system administration command that requires root privileges, they'll just use -s. They will be connected via SSH as their normal user, and then sudo will be prepended to their command, which allows it to be executed as root on the remote system. Of course, sudo has to be configured first, and second, there is a log file that tracks who is running commands with sudo. So that is a very good feature, indeed.
Okay, so the last option we're going to allow for is -v, which enables a verbose mode that simply displays the name of the server on which the command is being executed. As you know, some commands generate output, and if you wanna see what output is coming from what server, then use -v and it will show you. So that's the whole idea behind that.
Again, we mentioned this: we're going to enforce that the script be executed without superuser privileges. If they do execute it with superuser privileges, let's just tell them to use the -s option, give them a usage statement, and then exit the script. Speaking of a usage statement, we're going to create one much like you would see in a man page for a normal command on a Linux system. We're going to create our own little mini man page, if you will, for this script.
If for any reason a command was not able to be executed successfully on a remote host, we wanna let the user know about that.
And also we want to exit with a zero exit status if everything went okay, or if everything didn't go okay, we'll just exit with the most recent non-zero exit status from the SSH command. That could mean that we didn't connect to a host, or that could mean that the command itself failed.
The first step for this project is to actually create a small internal network, if you will, of three virtual machines. We're going to create an administration server and then two other servers that are going to act as the workhorses for our little network here. So obviously the first thing to do is open up a command line session on your local machine. We'll go into this shell class folder and we're going to create a multi net Vagrant project. That's what we're gonna name it; there's nothing special about the name. It's just something I chose. And then what we're going to do is initialize our Vagrant project here, and we're going to use the class image.
Now we need to edit our Vagrantfile to specify the three machines that we want to have created by Vagrant. So I'm just gonna come down here below the config.vm.box line and add my configuration. We'll create the admin01 server first. We'll set the host name to admin01 and give it an IP address of 10.9.8.10. Then I'm just gonna copy this configuration to the next one here. Anywhere I see admin01, I'm just gonna change that to server01, and the IP address is gonna be .11. It's a very similar process for server02: anywhere I see 01, just change it to 02. Well, almost anywhere I see 01.
Okay. So let me double check my work here. I have admin01 with an IP address of 10.9.8.10, server01 with 10.9.8.11, and server02 with 10.9.8.12. Okay, this looks good. So now what we need to do is run vagrant up, and it will create all three of these systems. Once they're up, we'll connect to the admin01 server and start writing our script. Okay, this looks good. Let me run vagrant status to confirm.
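The three-machine layout described above could be written out roughly as follows. This is only a sketch under stated assumptions: the box name `your/class-box` is a placeholder for whatever image the class uses, the `private_network` setting is a guess at how the static IP addresses were assigned, and the Vagrantfile is written through a shell heredoc into a scratch directory purely to keep the example self-contained.

```shell
#!/bin/bash
# Sketch: write a three-machine Vagrantfile into a scratch directory.
# ASSUMPTIONS: "your/class-box" is a placeholder box name, and the
# private_network lines are a guess at how the static IPs were configured.
PROJECT_DIR="$(mktemp -d)"

cat > "${PROJECT_DIR}/Vagrantfile" <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "your/class-box"

  config.vm.define "admin01" do |admin01|
    admin01.vm.hostname = "admin01"
    admin01.vm.network "private_network", ip: "10.9.8.10"
  end

  config.vm.define "server01" do |server01|
    server01.vm.hostname = "server01"
    server01.vm.network "private_network", ip: "10.9.8.11"
  end

  config.vm.define "server02" do |server02|
    server02.vm.hostname = "server02"
    server02.vm.network "private_network", ip: "10.9.8.12"
  end
end
EOF

echo "Wrote ${PROJECT_DIR}/Vagrantfile"
```

From inside a real project directory, `vagrant up` would then create the machines, `vagrant status` would list their state, and `vagrant ssh admin01` would open a session on the administration server.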
Sure enough, admin01, server01 and server02 are all in the running state, according to Vagrant. Now what we're going to do is connect to the admin01 system and do our work from there. So I'll run vagrant ssh admin01. Once here, I'm going to move into the shared folder of /vagrant, and now I'm going to start editing our script. We're going to name it run-everywhere.sh.
By default, we're going to have a list of servers located at /vagrant/servers. I'm just gonna put that variable at the very top of the script here, and I'm gonna call it SERVER_LIST. Another variable I'm going to define at the top of our shell script is for the SSH options that we're going to use for our timeout settings. Now, this may be something that I would personally adjust over time, depending on how my network works and things like that. Maybe I'll have other options that I think I'll need to use in the future. So I'm gonna put these types of things at the very top of my script here. So we'll just say...
Unlike some of our other scripts, we do not want this script run with superuser privileges. So what we're going to do is a check like this. This says that if the UID is equal to zero, we know they're executing this with root privileges, and then we're gonna tell them not to do that and to use the -s option that we're going to allow them to use instead. The -s, in my mind, stands for sudo or superuser.
We're also gonna give them a usage statement here, and I'm just going to build that into a function because I know we're going to need this later anyway, when we're doing things like parsing options. So I'm going to put our usage function here. We're going to allow for the -nsv options. We're also going to allow for the -f option, but that requires a file if they use it, and then we're going to require a command. By the way, anything they provide on the command line after the options is going to be treated as a single command.
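The top of the script as described so far might look something like the sketch below. The exact wording of the messages is not given in the walkthrough, so the text here is illustrative. Two things are assumptions made so the fragment can be sourced and experimented with safely: the root check is wrapped in a hypothetical `refuse_root` helper that takes the UID as an argument, and both functions return instead of calling `exit`.

```shell
#!/bin/bash
# Sketch of the top of run-everywhere.sh. Message wording is illustrative.

# Default list of servers, one per line.
SERVER_LIST='/vagrant/servers'

# Options passed to every ssh invocation.
SSH_OPTIONS='-o ConnectTimeout=2'

# Print a mini man page to standard error.
# NOTE: the real script would follow this with "exit 1".
usage() {
  echo 'Usage: run-everywhere.sh [-nsv] [-f FILE] COMMAND' >&2
  echo 'Executes COMMAND as a single command on every server.' >&2
  echo "  -f FILE  Use FILE for the list of servers. Default: ${SERVER_LIST}." >&2
  echo '  -n       Dry run mode. Display the COMMAND instead of executing it.' >&2
  echo '  -s       Execute the COMMAND using sudo on the remote server.' >&2
  echo '  -v       Verbose mode. Display the server name before executing COMMAND.' >&2
}

# Hypothetical helper: refuse to continue when the given UID is 0 (root).
# In the script itself this would be an inline check against ${UID}.
refuse_root() {
  if [ "${1}" -eq 0 ]; then
    echo 'Do not execute this script as root. Use the -s option instead.' >&2
    usage
    return 1
  fi
  return 0
}
```

In a real script the call would be along the lines of `refuse_root "${UID}" || exit 1`, placed before any option parsing.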
We're just gonna pass that over to SSH and allow that to be executed. So we're not gonna do anything super fancy here. This is going to allow us to do what we need to do. If a user wants multiple commands to be run on a remote host, then they just need to enclose them in quotation marks, so that the entire command gets sent over and is not changed by any shell expansion or anything like that. We'll get to some of these examples in a minute, but I just wanted to point out really quickly that we're not promising the user anything other than executing one command on every remote system. I see a typing mistake here; let me fix that.
Okay, let's continue here. So we're going to have them execute the script, and we're gonna make sure that they're not executing it with root privileges. If they make it past that check, then we can start parsing the options. The f option requires an argument, so we're going to put a colon after that option, and the n, s and v options do not. For the f option, we're going to allow them to override the SERVER_LIST. The -n option is just going to be DRY_RUN, and we'll set that variable to true and check it later on in our script. S is for SUDO, and we'll set that to sudo; you'll see why I do that in just a minute. Then we're going to allow for verbose, and we'll set that to true and check it later on. Any other option is going to be invalid, so we're going to teach them how to use the script with a usage statement here.
Now, anything that's left on the command line after we've parsed the options should be a command. So let's shift everything over after the options. The whole point of this script is for a user to give us a command, and then this script will execute that command on all the systems in the SERVER_LIST. If they don't supply an argument after the options, then there are no commands to execute. So we should really tell them how to use the script and exit.
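The option handling just described can be sketched with the `getopts` builtin. To keep the fragment easy to experiment with, it is wrapped in a hypothetical `parse_args` function that returns a non-zero status where the script itself would print the usage statement and exit; the `COMMAND` variable name is likewise an assumption.

```shell
#!/bin/bash
# Sketch of the option parsing, wrapped in a function for experimentation.
# ASSUMPTIONS: parse_args and COMMAND are illustrative names; the script
# itself would call usage and exit where this function returns 1.

SERVER_LIST='/vagrant/servers'

parse_args() {
  # Reset state so the function can be called more than once.
  OPTIND=1
  DRY_RUN=''
  SUDO=''
  VERBOSE=''

  # "f:" means -f takes an argument; n, s and v do not.
  while getopts f:nsv OPTION; do
    case "${OPTION}" in
      f) SERVER_LIST="${OPTARG}" ;;
      n) DRY_RUN='true' ;;
      s) SUDO='sudo' ;;
      v) VERBOSE='true' ;;
      *) return 1 ;;
    esac
  done

  # Remove the options, leaving only the command.
  shift "$(( OPTIND - 1 ))"

  # No remaining arguments means there is no command to run.
  if [ "${#}" -lt 1 ]; then
    return 1
  fi

  # Everything left is treated as one command.
  COMMAND="${*}"
}
```

Because `getopts` stops at the first non-option argument, something like `parse_args -nsv -f /tmp/list hostname -s` leaves `hostname -s` intact as the command rather than treating the trailing `-s` as a script option.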
So, if there are no commands to execute, give them usage and then exit. At this point, anything that remains is the command.
So now that we've checked that they're executing the script with non-superuser privileges and that they actually gave us a command to execute over a list of servers, what we need to do is make sure that that list of servers actually exists. Remember, we allowed them to change that with the -f option above, and that means they can give us any kind of data that they want. So we need to make sure that they gave us a path to a file and that we can read that file. We'll do a check right here. The if statement reads: if SERVER_LIST does not exist, then we'll do these two commands. We echo that we cannot open the SERVER_LIST and tell them what file we tried to open. We send that message to standard error, and we do that, of course, with a greater-than sign, an ampersand, and a two. Then we exit our script with a non-zero exit status.
Speaking of exit statuses, you know that one of our requirements is to make sure the script does not exit with zero if there was an error somewhere along the way. One way we can accomplish this goal is to set an exit status of zero at the beginning and then check for any non-zero exit statuses from the SSH commands as we execute them. If there happens to be a non-zero exit status, we override our default exit status for the script and then exit with that exit status. So what we're going to do here is just say, hey, expect the best, but prepare for the worst. We do that with an exit status of zero: we'll assume zero, and otherwise we'll override it.
Okay, so now what we need to do is loop through the SERVER_LIST. What this for loop does is take the contents of the SERVER_LIST file and assign them, one at a time, to the variable SERVER. So the first item in SERVER_LIST will be assigned to SERVER and we'll go through this loop.
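A minimal sketch of the file check, the default exit status, and the loop skeleton is shown below. The hypothetical `for_each_server` wrapper and the "Would contact" message are assumptions added so the fragment can be run without any remote hosts; in the script these statements would sit at the top level and the check would end with `exit 1`.

```shell
#!/bin/bash
# Sketch: verify the server list exists, then loop over its entries.
# ASSUMPTION: for_each_server is a hypothetical wrapper for illustration.

for_each_server() {
  SERVER_LIST="${1}"

  # Make sure the server list file exists.
  if [ ! -e "${SERVER_LIST}" ]; then
    echo "Cannot open server list file ${SERVER_LIST}." >&2
    return 1
  fi

  # Expect the best, prepare for the worst.
  EXIT_STATUS='0'

  # Word splitting hands each whitespace-separated entry to the loop in turn.
  for SERVER in $(cat "${SERVER_LIST}"); do
    echo "Would contact: ${SERVER}"
  done

  return "${EXIT_STATUS}"
}
```

Note that `-e` only tests that the path exists; a stricter variant could use `-r` to confirm the file is also readable.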
And then the second line, or the second server, will be assigned to SERVER, and we'll do that until there are no more servers in the SERVER_LIST. Now, we promised that if they're in verbose mode, we would tell them what server we're about to execute a command on before we do it. So let's just put a quick check in here. Simple enough.
Then what we're going to do is actually build an SSH command with the SSH command itself, the options that we specified at the beginning of the script, the server that we're going to be executing the command on, sudo if we need to execute with sudo privileges, and then of course the command itself. The reason I'm going to put this into a variable is that we're gonna need to use it twice. If it's a dry run, we're just going to print this variable, but if it's not, then we're actually going to execute it. In theory, I could have written this out twice, but in the spirit of not repeating myself, I'm just going to create a variable to store this command that we can use in two different ways, which you'll see in just a second. So I'm gonna store this command in the SSH_COMMAND variable.
The SSH_OPTIONS variable contains that -o ConnectTimeout setting. We're going to connect to a server that comes from the SERVER_LIST, which is assigned to ${SERVER} in the for loop. Then we're going to execute the remote command with sudo if they provided the -s option. If you remember, above, if they provide -s we set SUDO equal to the word sudo. Now, if they didn't supply the -s option, then the variable SUDO is not set and it will just return an empty string, so the command will not run with sudo privileges. We'll show this in a few minutes.
There are a couple of different ways to handle that. That is one way. You could also have put in a check: you could have had SUDO set to true, and then later on down here checked whether SUDO is true, and if so, accommodated it there.
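The way the empty-or-"sudo" variable slots into the command string can be sketched as follows. The `build_ssh_command` helper is hypothetical and exists only so both cases can be shown side by side; in the script this is a single assignment inside the loop.

```shell
#!/bin/bash
# Sketch: assemble the ssh command line as a single string.
SSH_OPTIONS='-o ConnectTimeout=2'

# build_ssh_command SERVER SUDO COMMAND  (hypothetical helper)
# SUDO is either the word "sudo" or an empty string.
build_ssh_command() {
  SERVER="${1}"
  SUDO="${2}"
  COMMAND="${3}"
  SSH_COMMAND="ssh ${SSH_OPTIONS} ${SERVER} ${SUDO} ${COMMAND}"
  echo "${SSH_COMMAND}"
}

# With -s supplied, the word sudo sits in front of the remote command.
build_ssh_command server01 sudo 'id -un'

# Without -s, SUDO is empty and simply contributes nothing.
build_ssh_command server01 '' 'id -un'
```

When `SUDO` is empty the string contains two consecutive spaces where it would have been. That is harmless here, because the shell collapses runs of whitespace when the string is later expanded unquoted and split into words.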
Or you can do it like I've done it here. Again, as long as the script works, I don't really care how you get there, as long as it makes sense to you and it's actually functional.
Okay, so obviously the last thing left to do here is run the command. If it's a dry run, we're not going to execute anything. We're just going to display what we would have executed. So let's put a check in right here. If it's a dry run, we just display the words dry run and then the command that we would have executed. Otherwise, we're just gonna execute the command. We're going to store the exit status because we're going to check it in a minute.
Let's talk through this last little bit of code here. We execute the SSH command and store the return code, or exit value, in the SSH_EXIT_STATUS variable. Then we check that variable. If it's anything other than zero, we set the exit status of our program, the EXIT_STATUS variable, to that value. This allows the default value of zero to be overwritten if anything goes wrong along the way. Then we tell the user that the execution failed, and we tell them on what server it failed. Again, we're sending that to standard error because it is an error message.
Here, I chose not to exit right after an error. The reason is this: let's say you have a long list of servers, say 100 servers, and on server number three you get an error. You may want to continue executing that command on all the servers that are up or available, without immediately exiting, because perhaps you just wanna know which ones failed so you can look at them manually later. Otherwise you have to figure out that the command got executed on these two servers but not the others, and then you have to fix server number three, or take it out of the list, or create a new list that doesn't contain the first two that the command was already run on, and so on. So that was my thinking on not exiting the script there.
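The dry-run branch and the exit-status bookkeeping might be sketched like this. The `run_on_server` wrapper is hypothetical, and the command string is passed in as an argument so the logic can be exercised with local commands such as `true` and `false` instead of a real `ssh` call. The status capture is written with `||` as a defensive choice so it behaves the same if `set -e` happens to be active; running the command and reading `${?}` on the next line is equivalent otherwise.

```shell
#!/bin/bash
# Sketch: display the command on a dry run, otherwise run it and remember
# the most recent non-zero exit status.
# ASSUMPTION: run_on_server is a hypothetical wrapper for illustration.

EXIT_STATUS='0'
DRY_RUN=''

run_on_server() {
  SERVER="${1}"
  SSH_COMMAND="${2}"

  if [ "${DRY_RUN}" = 'true' ]; then
    echo "DRY RUN: ${SSH_COMMAND}"
  else
    # Left unquoted on purpose so the string splits into a command
    # and its arguments.
    SSH_EXIT_STATUS='0'
    ${SSH_COMMAND} || SSH_EXIT_STATUS="${?}"

    # Capture any non-zero exit status and report it, but keep going.
    if [ "${SSH_EXIT_STATUS}" -ne 0 ]; then
      EXIT_STATUS="${SSH_EXIT_STATUS}"
      echo "Execution on ${SERVER} failed." >&2
    fi
  fi
}
```

Because the function never exits on failure, a later success does not reset `EXIT_STATUS`; the script can finish the whole list and then end with `exit "${EXIT_STATUS}"`.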
Of course, if you wanna allow the user an option, perhaps you can use a -e option that says, hey, exit on error. Then you can check for that option here and exit if the user set it, for example. So again, I love scripting. You can make these scripts do anything you want.
So now that I have my script written, what I'm gonna do is go to the very top of the script, scan down it, and look for some common issues and mistakes that everyone makes: for example, not providing a matching quote, a missing opening or closing bracket, misspelling a word, any of those types of errors. I may not find them all right away, and if not, that's okay. When I execute my script, I'll see an error and then I can go back and fix it. Again, I'm working here on a virtual machine, on a test system, and it's okay if I break things here, right? I'm not on a production server, and this is not going to wake anyone up in the middle of the night or break anything super important.
So let me just jump to the very top of the script, start working my way down, and see if I find anything that looks obviously wrong to me. As I'm working my way down, I'm looking to make sure I don't have spaces around the equal signs when I'm doing variable assignments. I'm making sure that if I have a single quote at the beginning, it ends with a single quote and not a double quote. I have done that in the past; mismatched quotes are a common mistake to make. Where I'm using a bracket, I'm gonna look to see that I have an opening bracket and a closing bracket. If I'm referencing a variable inside a statement, I'm gonna make sure that the statement is in double quotes so the variable gets expanded. Here, I have it in double quotes, so that looks good. I'm just gonna keep going down here. I don't see anything on that screen. Okay, here is an error. I actually need this dollar sign to be before the variable.
And I actually need a closing quotation mark. So if I had executed my script, I would have run into this bug. I see a very similar thing right here that I did as well: I left off a closing quotation mark. Same thing here. Okay, so today I was not paying very close attention to my closing quotation marks, and I'm really glad I took the extra minute here to look over my script. No promises that I'm not gonna have an error in a second, but if I do, we'll just troubleshoot it and figure it out at that time. So I'm just going to go ahead and save my changes.
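One of the points from the read-through, that a variable is only expanded inside double quotes, can be seen in a small standalone sketch. The message text is illustrative and is not taken from the script.

```shell
#!/bin/bash
# Sketch: double quotes expand a variable, single quotes keep it literal.
SERVER='server01'

# Double quotes: ${SERVER} is replaced by its value.
EXPANDED="Execution on ${SERVER} failed."

# Single quotes: the text is kept as-is, dollar sign and all.
LITERAL='Execution on ${SERVER} failed.'

echo "${EXPANDED}"
echo "${LITERAL}"
```

For unbalanced quotes and brackets specifically, `bash -n run-everywhere.sh` asks Bash to parse the file without executing it, which can complement a manual read-through.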
Jason is the founder of the Linux Training Academy as well as the author of "Linux for Beginners" and "Command Line Kung Fu." He has over 20 years of professional Linux experience, having worked for industry leaders such as Hewlett-Packard, Xerox, UPS, FireEye, and Amazon.com. Nothing gives him more satisfaction than knowing he has helped thousands of IT professionals level up their careers through his many books and courses.