Data Manipulation and Text Transformations with Sed

Overview
Difficulty: Intermediate
Duration: 1h 33m
Description

In this course, we'll explore a range of options for manipulating data. We'll look at some commands that can be used in shell scripts to transform and process data. We'll then walk you through an example of parsing through log files to find data that matches certain criteria. Finally, you'll learn how to manipulate data and transform text with sed.

This course is part of the Linux Shell Scripting learning path. To follow along with this course, you can download all the necessary resources here.

Learning Objectives

  • Learn a range of commands used for transforming, processing, and reporting data including cut, awk, sort, and uniq
  • Learn how to parse log files for failed login attempts and identify the IP addresses behind them
  • Manipulate data and transform text with sed

Intended Audience

  • Anyone who wants to learn Linux shell scripting 
  • Linux system administrators, developers, or programmers

Prerequisites

To get the most out of this course, you should have a basic understanding of the Linux command line.

Transcript

In this lesson, you'll learn how to use the most important features of the sed command. Sed stands for stream editor, and it's used for, well, editing streams. You might not be familiar with the term stream, but if you've performed any I/O redirection or piping, you've been using streams. You can think of a stream as the data that travels from one process to another through a pipe, or from one file to another as a redirect, or from one device to another. So you can think of standard input as the standard input stream, standard output as the standard output stream, and standard error as the standard error stream. By the way, these data streams are typically textual data. The sed command is used to perform basic text transformations on an input stream. For example, you can use sed to substitute some text for other text, remove lines, append text after given lines, and insert text before certain lines. Sed differs from other editors, such as Vim, Emacs, and Nano, in that it's used programmatically. Of course, you can substitute some text for other text, remove lines, append text after given lines, and insert text before certain lines using Vim, but that requires interaction from a human. Sure, you can write macros to perform these functions in Vim, but you have to have someone to start Vim and then execute the macros. With sed, you can programmatically perform these edits without the need of interaction, and this makes sed ideal to be used in shell scripts, for example. The most common use of sed is to act as a command line version of find and replace. I'm going to use this particular application of sed to teach you the basics of the sed command. Notice that I said sed command. Sed is a standalone utility and not a shell builtin. We can prove this by using the type shell builtin. So you can see that when you type sed and press Enter, the /usr/bin/sed program will be executed. 
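The on-screen command is not captured in the transcript. A minimal sketch of the check described above, assuming a typical Linux system where sed is installed somewhere on the PATH (the variable name `sed_path` is only illustrative):

```shell
# Ask the shell how it resolves the name "sed".
# For a standalone program this reports a path such as /usr/bin/sed.
type sed

# 'command -v' prints just the path, which is handy inside scripts.
sed_path=$(command -v sed)
echo "sed resolves to: $sed_path"
```

If sed were a shell builtin, `type` would say so instead of printing a path.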
Of course, for standalone commands, you can use man to get help and documentation on how to use that command. So let's quickly look at sed's man page. As we've already mentioned, sed is a stream editor. The man page says it's used for filtering and transforming text. If we look at the synopsis section of the man page, we can see that we can execute sed with zero or more options followed by a sed script and optionally followed by one or more input files. So let's exit out of this man page by typing q and get to our first sed command. Let's create a file to work with real quick. Let's do this. Let's make sure our contents made it in the file. I'm sure it did, but let's just prove it here by cat-ing that file. So sure enough, it says, Dwight is the assistant regional manager. And that's the text that's stored in manager.txt. Now, as I said a minute ago, the most common use of sed is to act as a command line version of find and replace. Let's say you want to replace the text of assistant with the text of assistant to the. To do that, you would use the substitute command in sed. The substitute command is represented by the character s. So let me show you how that works. We'll start out with our sed command. Next, we'll put our sed script in single quotes. We'll tell sed to perform a substitution by specifying the letter s. After the s we use a forward slash followed by the text we want to substitute and then another forward slash. In this case, that text is assistant. By the way, the forward slash is acting as a delimiter and the text between the forward slashes is called the search pattern. This search pattern is actually a regular expression, so you can make very advanced searches if you want to. In this case, we're keeping things simple to demonstrate the concept of finding and replacing some text using sed. Next, we'll type the text that is going to replace the previously specified text. We want to change assistant to assistant to the, so we'll type this. 
Next, we'll close our substitution with another forward slash. We'll call this text the replacement string. That is the end of our sed script, so we'll supply the closing single quote. Finally, we'll tell sed what file to use as input. In this example, that file is manager.txt. When we execute the command, you can see that the text of assistant has been replaced with assistant to the. The original text is, Dwight is the assistant regional manager. The text displayed after being transformed by our sed command is, Dwight is the assistant to the regional manager. Notice that I used the word displayed. I used it very intentionally here. In this example, sed is not altering the contents of the file. It's just sending its output via standard out, which is our terminal. We can prove that the original file is unaltered by looking at its contents. Sed can alter the contents of the specified file, and we'll get to that shortly, but first, let's look at another example of using sed to find and replace text. Let's use this text to work with. Let's change the text of my wife to sed. In this example, we didn't reuse any of the text that we searched for. Instead, we completely replaced it with brand new text. It should probably go without saying that sed is case sensitive by default. So here's an example of that. Notice that the lowercase text of my wife was not substituted. That's because it wasn't an exact match. If nothing matches, sed doesn't perform any alterations and therefore the text is returned without alteration. If you want to perform a case insensitive search, you'll need to supply a flag. So really the format of the sed s command is this. You have s, a forward slash, and then a search pattern, another forward slash to terminate the search pattern, then the replacement string, another forward slash, and then any flags you want to use. So I'm just gonna type Control + C here because I don't wanna execute that command. 
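The commands typed on screen are not in the transcript, so here is a minimal sketch of the steps described so far. The manager.txt text comes from the lesson; the single love.txt line and the uppercase search pattern are assumptions chosen to reproduce the case-sensitivity behavior.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Create the sample file and confirm its contents.
echo 'Dwight is the assistant regional manager.' > manager.txt
cat manager.txt

# s/SEARCH/REPLACEMENT/ : substitute the first match on each line.
sed 's/assistant/assistant to the/' manager.txt
# prints: Dwight is the assistant to the regional manager.

# The file itself is untouched; sed only wrote to standard output.
cat manager.txt
# prints: Dwight is the assistant regional manager.

# Assumed sample text for the second example.
echo 'I love my wife.' > love.txt
sed 's/my wife/sed/' love.txt
# prints: I love sed.

# Matching is case sensitive by default, so nothing is replaced here.
sed 's/MY WIFE/sed/' love.txt
# prints: I love my wife.
```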
I just wanna quit that command line there without running it. So we're going to supply the i flag to the command, which you can think of as standing for insensitive. Now the text gets replaced because you told sed to ignore case. By the way, and it strikes me as a little funny, you can also use a capital I as the flag to do this insensitive case matching. So this command works as well. Let's add some more lines to our love.txt file and see how sed handles that. Notice that I've used two greater than signs, and that causes text to be appended to the file. If I just used one, it would have truncated the file. But I was careful here and used two. So we can look at our contents here. And sure enough, there are two lines in that file. So let's go ahead and add a third line. Okay, there it is, love.txt with three lines. Now, let's try the replacement command again. What happens is that sed reads one line from the file and executes the sed commands in quotes against that line. Next, it moves onto the next line and does the same thing until it reaches the end of the file. You'll see that my wife was replaced by the word sed on lines one and three. Line two was unchanged because our search pattern of my wife was not found on that line. Let's add yet another line to this file to demonstrate another important concept. Now, let's run the replacement command again. Notice that, on the last line, only the first occurrence of my wife was replaced. It says, I love sed and my wife loves me. Also, my wife loves the cat. By default, sed just replaces the first occurrence of the search pattern on a line. To override this behavior, we'll need to use the g flag. You can think of g as standing for global, as in a global replace. Now, the last line says that I love sed and sed loves me. Also, sed loves the cat. If you want to replace the second occurrence of the search pattern, use the number two as a flag. 
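A sketch of the flag examples above, assuming GNU sed (the `i`/`I` flags are a GNU extension). Only the last line of love.txt is quoted in the lesson; the wording of lines one and three is an assumption.

```shell
cd "$(mktemp -d)" || exit 1

# Build the sample file. '>' truncates the file, '>>' appends to it.
echo 'I love my wife.' > love.txt
echo 'This is line 2.' >> love.txt
echo 'I love my wife with all of my heart.' >> love.txt
echo 'I love my wife and my wife loves me. Also, my wife loves the cat.' >> love.txt

# Case-insensitive match: lowercase i and uppercase I behave the same.
sed 's/MY WIFE/sed/i' love.txt
sed 's/MY WIFE/sed/I' love.txt

# Default: only the first occurrence on each line is replaced.
sed 's/my wife/sed/' love.txt
# last line: I love sed and my wife loves me. Also, my wife loves the cat.

# g flag: replace every occurrence on each line.
sed 's/my wife/sed/g' love.txt
# last line: I love sed and sed loves me. Also, sed loves the cat.
```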
If you want to replace the third occurrence, use three, the fourth, use four, et cetera, et cetera. So let's do this. This time, none of the first occurrences of the text my wife were replaced, but only the second occurrences were. On the line where only one occurrence of my wife existed, that line was unaltered because there was no second occurrence. On the last line where my wife appears three times, only the second one was changed. By the way, when I'm saying things like changed, I don't mean that the original file has changed. Again, the contents of the file remain the same. Sometimes you don't want to just see the output displayed to your terminal, but instead you want to save that output. One way to do that is to redirect the output of the sed command to a file. Of course, this isn't anything unique with sed. It's just how Linux works with any command. So let's create a new file called my-new-love.txt. Again, the original file is left unaltered. So there you have it, the original file with the original text and the new file with the redirected and altered text. If you want sed to alter the file, use the -i option to sed, which you can think of as in-place editing. If you want sed to make a backup copy of the file before it alters it, supply some text right after the -i and sed will append that text to the copy of the file. So now, we have our original file named love.txt and another one named love.txt.bak. Sed has performed an in-place edit of the original file. So let's look at that now. So there you can see where sed has replaced the text of my wife where appropriate, given the command we ran. If we look at the backup file, we'll see that it has the original text in it. It's important that you don't use a space after the -i option. If you do, you'll get an error. So let me demonstrate that here. So this is what not to do. Okay, notice I'm using a space. This is going to cause an error. So sure enough, we get an error. 
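A sketch of the numeric flag, output redirection, and in-place editing described above. The file names love.txt, my-new-love.txt, and love.txt.bak come from the lesson; the contents of lines one and three and the exact substitutions used with the redirect and with `-i` are assumptions.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'I love my wife.' \
  'This is line 2.' \
  'I love my wife with all of my heart.' \
  'I love my wife and my wife loves me. Also, my wife loves the cat.' > love.txt

# Numeric flag: replace only the second occurrence on each line.
sed 's/my wife/sed/2' love.txt
# last line: I love my wife and sed loves me. Also, my wife loves the cat.

# Save the transformed text with ordinary shell redirection.
sed 's/my wife/sed/g' love.txt > my-new-love.txt

# In-place edit with a backup: the suffix must touch the -i (no space).
sed -i.bak 's/my wife/sed/' love.txt

cat love.txt       # edited text
cat love.txt.bak   # original text
```

Writing `-i .bak` with a space makes sed treat `.bak` as something other than the backup suffix, which is the mistake the lesson warns about.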
That is one little gotcha you can avoid. So just do not use a space after -i. If you only want to save the lines where matches were made, you can use the w flag followed by a file name. So let's do this. Let's change love to like. We'll do a global replace so that no matter how many times love appears on the line, it will get replaced by like, and then we'll use the w flag and a file name after it, so we'll use like.txt, and then the file we're going to use as input is love.txt. In this example, sed displays the entire contents of the love.txt file to the screen with any replacements it made, and it created a new file named like.txt that only contains the lines where the replacements were performed. So let's look at that like.txt file now. While we're talking about input and output, I would like to point out that sed can be used in a pipeline instead of specifying a file to work on. The simplest example would be to cat a file and pipe that to sed like this. That is the exact same thing as doing this. This is a very common pattern with Linux commands, where they can be given a file to operate on or they can use the data sent through a pipe to operate on. We've seen this with other commands, such as cut, awk, sort, uniq, and others. Command pipelining is very powerful because you can string as many commands together as needed to make the data look the way you want it to. I'm going to generate some text to work on with an echo command. Let's say we want to change /home/jason to be /export/users/jasonc. The challenging thing about this is that the strings we want to manipulate have forward slashes in them, and we've been using forward slashes as a delimiter with sed. One way to get around this is to escape the forward slashes. I'm going to show you a better way in just a second, but let's see how we would do it with escaping first. Now, to escape a character, simply precede it with a backslash. 
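A sketch of the w flag and of sed in a pipeline, as described above. The love-to-like substitution and the like.txt file name come from the lesson; the sample file contents and the substitution used in the pipeline are assumptions.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'I love my wife.' \
  'This is line 2.' \
  'I love my wife with all of my heart.' \
  'I love my wife and my wife loves me. Also, my wife loves the cat.' > love.txt

# g replaces every match; w writes only the changed lines to like.txt.
# Everything after "w " up to the end of the script is the file name.
sed 's/love/like/gw like.txt' love.txt

# like.txt holds just the lines where a replacement happened.
cat like.txt

# Reading from a pipe and reading from a named file give the same result.
cat love.txt | sed 's/my wife/sed/g'
sed 's/my wife/sed/g' love.txt
```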
So we're starting our search pattern with a forward slash, but the next forward slash is not a delimiter, so it needs to be escaped. So we're gonna use a backslash to escape this forward slash. Same thing again here. So now, we end our search pattern with a forward slash and, again, our next pattern starts with a forward slash, so we have to escape it here. Again, we're escaping a forward slash, and yet again. And then, finally, we're closing out our command here. That feels like a lot to keep straight, and it's easy to make a typing mistake in this situation. A nice feature with sed is that you can use any character as the delimiter. So let's use a pound sign instead. The first character that follows the s will be treated as the delimiter, no matter what that character is. So let me give you an example here. I'm going to use a pound sign, and now sed will treat that pound sign as the delimiter. So now, I can just go ahead and specify the text I want to use here in my search pattern, which is /home/jason, close that search pattern, and now the replacement string is /export/users/jasonc, and then we'll close out the replacement text there with a pound sign, and the single quote finishes our sed script command there. So let me hit Enter. And so, as you can see, that works as well. Just to prove that any character works as a delimiter, let's use another character. How about a colon this time? Okay, that works as well. So you get the idea. If you need to use forward slashes in your search pattern or replacement text, choose a delimiter other than the forward slash. All right, so enough with actually performing substitutions. Let's take a second here to talk about when you would actually want to perform them. One idea is to use templates or template files. 
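A sketch of the three delimiter variants walked through above. The exact text passed to echo is not in the transcript, so the bare path used here is an assumption.

```shell
# Default delimiter: every slash inside the paths must be escaped.
echo '/home/jason' | sed 's/\/home\/jason/\/export\/users\/jasonc/'

# Pound sign as the delimiter: no escaping needed.
echo '/home/jason' | sed 's#/home/jason#/export/users/jasonc#'

# Colon as the delimiter works the same way.
echo '/home/jason' | sed 's:/home/jason:/export/users/jasonc:'

# Each command prints: /export/users/jasonc
```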
For example, if you are constantly deploying new websites and use the same configuration, except for the website name, then it would be a good idea to create a template file that contains all the standard configuration and a placeholder for the website name. Then, you could use sed to simply replace all the placeholders with the actual website name when you're ready to deploy it. Another example of when you could use sed's substitution feature is when you are migrating from one server to another or when you are using a restore of one server to create another new server. In this example, you would need to find and replace the old host name with a new one for all the files in the /etc directory, files such as /etc/hosts and maybe /etc/hostname, and there could be others depending on that system's configuration. Yet another example would be when you are copying configuration for a given service from one host to another. You might find yourself doing this, especially if you are working on clusters. You would just copy the configuration from one host to the new host that you are adding to the cluster. Typically, you'll need to change the host name in the configuration file, and you could use sed to do this for you. Sed would be especially helpful if that host name appeared many times in that configuration. Okay, so that's pretty much it with the sed substitution command. Let's say we want to remove or delete some lines with sed. Well, how you do that is with the d command. Let's review the contents of our love.txt file. Let's say we want to remove the line that says, This is line 2. All you have to do is come up with a search pattern that will match that line and not the other ones. So we could use a search pattern of This because the word This only appears on the line we want to delete. Our search pattern is This and the command is d. 
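A sketch of that delete command. The sample love.txt is recreated here with assumed text for lines one and three; only "This is line 2." and the last line are quoted in the lesson.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'I love my wife.' \
  'This is line 2.' \
  'I love my wife with all of my heart.' \
  'I love my wife and my wife loves me. Also, my wife loves the cat.' > love.txt

# /PATTERN/d deletes every line that matches PATTERN.
# Only line two contains "This", so only that line is dropped.
sed '/This/d' love.txt
```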
So to be clear here, the syntax is a delimiter, and, by default, we use the forward slash as the delimiter, followed by the search pattern and a closing delimiter, again, this is a forward slash, and, finally, a d, which tells sed to delete the line that matches the provided pattern. We can see that this worked, as our desired line is gone. Let's say we want to remove all the lines that contain the search pattern of love. Here's how we would do that. So now, all the lines that matched love are gone. Sometimes when I'm working with a configuration file with many lines, I like to strip out the comments and blank lines to compact the output. That way, I can see just the configuration and nothing else. It makes it easier to read. At least, it does for me. I'm going to create an example configuration file to demonstrate this concept with. So this sample made-up configuration file is just a few lines long, but you can easily imagine one with many, many more lines. First, let's remove the comments. Remember that the search pattern is actually a regular expression. In regular expressions, the caret symbol matches the beginning of a line. It matches a position and not a character. So if we want to match all the lines that start with a pound sign, we use caret pound sign. This makes sure we don't accidentally delete lines that have actual configuration at the beginning of a line but have a comment later on that same line. So now, we have our configuration file without the comment lines, but we still have the blank lines. To delete a blank line, we'll use another regular expression. Again, the caret symbol matches the beginning of the line. The dollar sign matches the end of the line. So caret dollar sign matches a line if the beginning of the line is immediately followed by the end of the line. Said another way, caret dollar sign matches blank lines. So we know how to do the two things we want to do, but now we need to do them at the same time. We need to combine them. 
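A sketch of the deletions described above. The configuration file is not shown in the transcript, so the file name `config` and its five lines are an assumed reconstruction that fits the details mentioned in the lesson; the love.txt lines are likewise partly assumed.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'I love my wife.' \
  'This is line 2.' \
  'I love my wife with all of my heart.' \
  'I love my wife and my wife loves me. Also, my wife loves the cat.' > love.txt

# Delete every line containing "love".
sed '/love/d' love.txt
# prints: This is line 2.

# Hypothetical sample configuration file (line three is blank).
printf '%s\n' \
  '#User to run service as.' \
  'User apache' \
  '' \
  '#Group to run service as.' \
  'Group apache' > config

# ^ anchors the match to the start of the line: delete comment lines.
sed '/^#/d' config

# ^$ is start of line immediately followed by end of line: delete blank lines.
sed '/^$/d' config
```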
To use multiple sed commands or sed expressions, we can separate them with semicolons. For each line, sed will perform the first command, which deletes lines that start with a pound sign, and then the second command, which deletes blank lines. To show you that you can combine different types of commands, let's perform a substitution. Let's change apache to httpd. So that particular command did three things. It deleted lines that started with a pound sign, it deleted blank lines, and it also changed apache to httpd. Just to be thorough, I wanna share with you another way to execute multiple sed commands. Mainly, I wanna show you this just in case you happen to see it in a script or in someone else's work. That way, you'll know exactly what is going on. So the other way to do this is to use multiple -e options to sed, one for each sed command to execute. Sed also allows you to specify a file containing the sed commands. So let's demonstrate this now. I'm going to create a file using the echo command here. Of course, you can use an editor if you want to. So what we've done here is put one sed command per line. Now, we'll use the -f option and supply the path to the file that has the sed commands in it. So we can do this, sed -f, and then the path to our file, and then the path to the file that we want to use as input. So here, again, we get the exact same result. Before we wrap up this lesson on sed, let's take a look at using addresses. An address determines which lines the sed command will be executed on. If no address is given, the command is performed on all lines. An address is specified before the sed command. The simplest of addresses is a line number. Here are the contents of our sample configuration file to refresh our memories. Here's an example that will only execute against line two of the file. As you can see, the search pattern of apache was replaced by httpd only on the second line. 
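A sketch of the three ways to run several sed commands. The file names `config` and `script.sed` and the contents of the configuration file are assumptions, since neither is shown in the transcript.

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical sample configuration file (line three is blank).
printf '%s\n' \
  '#User to run service as.' \
  'User apache' \
  '' \
  '#Group to run service as.' \
  'Group apache' > config

# 1. Separate commands with semicolons inside one script.
sed '/^#/d ; /^$/d ; s/apache/httpd/' config

# 2. Give each command its own -e option.
sed -e '/^#/d' -e '/^$/d' -e 's/apache/httpd/' config

# 3. Put one command per line in a file and pass it with -f.
echo '/^#/d' > script.sed
echo '/^$/d' >> script.sed
echo 's/apache/httpd/' >> script.sed
sed -f script.sed config

# All three print:
# User httpd
# Group httpd
```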
There's also a match on the last line of the file, but that was left alone because it didn't match the address of line two. By the way, some people do not use spaces after the address, so this is the same command without that extra space. You can also use a regular expression as an address to match lines. Let's say you want to replace apache with httpd but only on lines that contain the word Group. As you can see, apache was changed to httpd only on the line that also contained the word Group. You can also specify a range by separating two address specifications with a comma. For example, if we wanted to change the word run to execute but only on lines one through three, we would do this. There was only one occurrence of run from lines one through three, and it was changed. The occurrence on line four was not. So let's extend the range to include line four. Again, you can use regular expressions instead of line numbers. Let's say we want to change run to execute starting with the line that matches #User and ending at the next blank line. So here's how we would do that. So there, you can see that every instance of run that was in between the line that started with #User and the next blank line was replaced with the text execute. Of course, there was only one match, so only one substitution was made, but you get the point. You could specify a large portion of the file if you wanted to, with a starting point and an ending point as your address, and then change everything in that section while leaving everything else alone. It's very powerful to use addresses with sed. In this lesson, you learned the most common use case of sed, which is to perform text substitutions. You learned how to replace specific occurrences as well as how to replace all occurrences of the search pattern. In addition to finding and replacing text, you learned how to delete text with the d command. 
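A sketch of the address forms covered above, assuming GNU sed. The file name `config` and its contents are an assumed reconstruction that matches the line positions mentioned in the lesson.

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical sample configuration file (line three is blank).
printf '%s\n' \
  '#User to run service as.' \
  'User apache' \
  '' \
  '#Group to run service as.' \
  'Group apache' > config

# Line-number address: act on line 2 only (the space is optional).
sed '2 s/apache/httpd/' config
sed '2s/apache/httpd/' config

# Regular-expression address: act only on lines containing "Group".
sed '/Group/ s/apache/httpd/' config

# Range of line numbers: lines 1 through 3, then lines 1 through 4.
sed '1,3 s/run/execute/' config
sed '1,4 s/run/execute/' config

# Range of patterns: from a line matching #User to the next blank line.
sed '/#User/,/^$/ s/run/execute/' config
```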
From there, you learned how to save the alterations performed by sed as well as how to make backups of the original file so that your data is safe. Next, you learned three different ways to execute multiple sed commands on the same set of data. Finally, you learned how to use addresses to work on very specific sections of input.

About the Author
Jason Cannon
Founder, Linux Training Academy

Jason is the founder of the Linux Training Academy as well as the author of "Linux for Beginners" and "Command Line Kung Fu." He has over 20 years of professional Linux experience, having worked for industry leaders such as Hewlett-Packard, Xerox, UPS, FireEye, and Amazon.com. Nothing gives him more satisfaction than knowing he has helped thousands of IT professionals level up their careers through his many books and courses.