Process text streams using filters

This course will focus on the Linux command line and the text-manipulation tools that let you effectively control just about anything on your system. We'll learn about terminal environments, working with text streams, file management and archives, system processes, advanced text searches, and terminal text editors.

The previous course covered installation and package management, while "Filesystems and Partitions" is up next.

Here's the complete first set of Linux certification courses.

If you have thoughts or suggestions for this course, please contact Cloud Academy at


When people talk about Linux, you'll often hear the expression "nearly everything in Linux is a text file."

Whether it's the configuration files kept in /etc, executable scripts or logs, so much of what goes on in Linux is really nothing more than the system reading plain text or lightly formatted files. With so much controlled through text, it stands to reason that a great deal can be accomplished by filtering and editing that text, especially by filtering and editing it on the fly while processes are actually in execution.

We're going to illustrate the text filtering and manipulation tools any Linux sysadmin will need by performing rather insignificant actions, but the skills themselves are directly transferable to the most sophisticated and even elegant tasks. This is one of those times I referred to earlier when you're unfortunately going to be deluged with commands and their arguments. Take it slowly, don't be afraid to listen to something over again, and don't forget that I will insert periodic reviews to help all of this digest properly. And most of all, work through all these commands on your own.

Working with Linux text manipulation tools

You should note that each of these tools has many possible arguments and use cases that we're not going to go into right now. For the LPI exam you'll generally need a decent overview-level knowledge of text processing commands, and you won't be expected to dig deeply into complex options.

As we've seen many times already in this series, cat will print a file to the screen. We didn't yet mention that cat is actually short for "concatenate," which means to join things together. Adding the "-n" argument will print the text with numbered lines, and the uppercase "-A" argument will show all characters, including ones that are normally invisible: tabs appear as ^I and the end of each line as $. If you need a simplified version of a file like the password file, /etc/passwd, we can use cut to strip away everything we don't need. Here "-d" followed by a colon sets the field delimiter to a colon, meaning that every instance of a colon marks the beginning of a new field. "-f1" means that we only want to print the first field.

This doesn't actually change the contents of the file, nor could we change the password file without becoming the administrator, but we could easily save our newly edited text by redirecting the output to a file with ">". Let's see what it looks like.
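To try cat and cut without touching anything on the system, here is a minimal sketch that works on a small stand-in for the password file. The file name sample_passwd and its two entries are hypothetical, made up for illustration; you could run the same cut command against the real /etc/passwd, which ordinary users can read.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing on the system is modified.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical stand-in for /etc/passwd: colon-separated fields, with a tab
# in the second entry's comment field so that cat -A has something to show.
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000:Alice\tSmith:/home/alice:/bin/bash\n' > sample_passwd

cat sample_passwd          # print the file as-is
cat -n sample_passwd       # same, with numbered lines
cat -A sample_passwd       # tabs shown as ^I, line ends shown as $

# -d: sets the field delimiter to a colon; -f1 keeps only the first field.
cut -d: -f1 sample_passwd  # root, then alice

# The file itself is unchanged; redirect the output to save the result.
cut -d: -f1 sample_passwd > usernames.txt
cat usernames.txt

cd / && rm -rf "$tmp"
```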

"Expand" will convert tabs in a text string to spaces. We'll work with a file called file that contains two lines: one whose words are separated by tabs and the other separated by spaces. Running "expand -t 15" followed by the name of our file sets tab stops every 15 columns, so each tab is replaced with however many spaces it takes to reach the next stop (a full 15 for a tab at the very start of a line). "Unexpand" does the reverse, converting runs of spaces back into tabs. Running "unexpand -t 1" on our file tells it to assume a tab stop at every column, and running it again with the value two assumes a stop every two columns; runs of spaces that reach one of those stops are replaced with tabs.
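Here is a minimal sketch of expand and unexpand, assuming GNU coreutils. The sample text is hypothetical; its first line uses a tab and its second a single space, loosely mirroring the lesson's file.

```shell
#!/usr/bin/env bash
f=$(mktemp)
# Hypothetical sample: the first line uses a tab, the second a space.
printf 'one\ttwo\nthree four\n' > "$f"

# Tab stops every 15 columns: "one" fills columns 0-2, so the tab
# becomes 12 spaces and "two" starts at column 15.
expand -t 15 "$f"

# Without -t, tab stops default to every 8 columns (5 spaces here).
expand "$f"

# unexpand goes the other way: with tab stops every 15 columns,
# the 12 spaces after "one" turn back into a single tab.
# (-a converts all blanks, not just leading ones; -t also implies it.)
expand -t 15 "$f" | unexpand -a -t 15

rm -f "$f"
```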

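"fmt" formats large bodies of text. "fmt -w" followed by a number sets the maximum line width, so lines wrap at word boundaries before they exceed that many characters. "fmt -t" (tagged paragraph mode) gives the first line of each paragraph a different indentation from the lines that follow it. "pr" also formats text, mainly to prepare it for printing by breaking it into pages with a header on each.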
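A minimal sketch of "fmt -w", assuming GNU coreutils. The exact break points depend on fmt's line-filling algorithm, so the comments only promise what is guaranteed: no line longer than the limit, and no words lost.

```shell
#!/usr/bin/env bash
text='the quick brown fox jumps over the lazy dog'

# Wrap the 43-character line so that no output line exceeds 20 characters.
# fmt breaks only between words, so the words themselves stay intact.
printf '%s\n' "$text" | fmt -w 20

# Check the longest resulting line.
printf '%s\n' "$text" | fmt -w 20 | awk '{ if (length > max) max = length } END { print "longest line:", max }'
```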

"pr -d" will print double-spaced text, and "-l" sets the page length, the total number of lines on each page. Head will print only the indicated number of lines from the top, or head, of the file (ten lines if you don't specify a number).
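A short sketch of pr and head, assuming GNU coreutils and a hypothetical 20-line file built with seq. The pr output includes a header with the date and file name, so only the head results are spelled out in the comments.

```shell
#!/usr/bin/env bash
f=$(mktemp)
seq 1 20 > "$f"    # twenty numbered lines to experiment with

head "$f"          # first 10 lines (the default)
head -n 3 "$f"     # first 3 lines only

# pr formats text for printing. -d double-spaces the output and
# -t omits the page header and trailer; -l 30 sets 30-line pages.
pr -d -t "$f" | head -n 6
pr -l 30 "$f"

rm -f "$f"
```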

"od," which stands for octal dump, will print the characters of a file in different formats. "od -a," for instance, will display our text with "ht" representing each tab and "sp" each space. "od -c" will print tabs as "\t" and newlines as "\n." And here's od's default output, which shows the file's bytes as octal numbers.
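A minimal sketch of the three od views, assuming GNU coreutils; the sample string is hypothetical and simply contains a tab, a space and a newline.

```shell
#!/usr/bin/env bash
# Sample text containing a tab, a space and a trailing newline.
sample=$'a\tb c\n'

printf '%s' "$sample" | od -a   # named characters: ht (tab), sp (space), nl (newline)
printf '%s' "$sample" | od -c   # C-style escapes: \t for the tab, \n for the newline
printf '%s' "$sample" | od      # default: octal, two bytes at a time
```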

Let's review. Cat prints to the screen, and cut will isolate a single column, or field, and print only that. Expand converts tabs to spaces, while unexpand converts spaces to tabs. "fmt -w" sets the width of the text that's displayed to the screen, while "pr -d" and "-l" control line spacing and page length. Head prints only the first defined number of lines of a file, and "od" will print files in different formats.

Less is an old friend we've used previously to view text. Less will display large bodies of text one screen at a time allowing you to use regular controls like the arrow keys or page up and page down to move through a document.

Join is a tool for merging columns of data from two files, assuming that they share a common field (and that both files are sorted on it). I've created two very simple files, part one and part two, containing simple numbered columns. By running join and specifying our two files, the data from both will be usefully displayed together. Paste, besides printing a single file, will by default print the data from two files side by side; adding "-s" will print the two files sequentially rather than side by side, although I admit that this particular example doesn't look that useful.
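Here is a minimal sketch of join and paste. The files part1 and part2 and their contents are hypothetical stand-ins for the lesson's "part one" and "part two"; both are already sorted on the shared first field, which join expects.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical files sharing a numbered first field.
printf '1 cat\n2 dog\n3 bird\n' > part1
printf '1 meow\n2 woof\n3 tweet\n' > part2

join part1 part2       # 1 cat meow / 2 dog woof / 3 bird tweet
paste part1 part2      # lines side by side, separated by a tab
paste -s part1 part2   # each file's lines joined onto a single line

cd / && rm -rf "$tmp"
```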

"nl," like "cat -n," will print the lines of a file with numbers. Sort will reorder the contents of a file in either numeric or alphabetic sequence. Running sort with "-n," telling Linux that we want the lines sorted by number, will display them in ascending order. "sort -nr" will do the same but in reverse, or descending, order. Sort without the "-n" switch will list the lines in alphabetic order. Here's a version of the file without numbers; again, adding "-r" will reverse the output.
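A minimal sketch of nl and sort, assuming GNU coreutils. The two sample files are hypothetical. The numbers show why "-n" matters: without it, sort compares the lines as text, so 100 lands before 9.

```shell
#!/usr/bin/env bash
nums=$(mktemp)
words=$(mktemp)
printf '10\n9\n100\n' > "$nums"
printf 'pear\napple\nfig\n' > "$words"

nl "$words"        # numbers each line like cat -n (nl skips blank lines by default)
sort "$nums"       # text comparison: 10, 100, 9
sort -n "$nums"    # numeric, ascending: 9, 10, 100
sort -nr "$nums"   # numeric, descending: 100, 10, 9
sort "$words"      # alphabetic: apple, fig, pear
sort -r "$words"   # reversed: pear, fig, apple

rm -f "$nums" "$words"
```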

Split is one of those rare text tools that actually does something permanent by default. Running split will take a file or text stream and create smaller files from it according to your specified size. So to create files each no longer than, say, two lines, use "split -2" (or the more explicit "split -l 2") followed by the file name. In this case, listing the files in this directory reveals six new files named xaa to xaf, each no longer than two lines. If you don't specify a length, split will default to 1,000 lines per file.
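A minimal sketch of split in a scratch directory, assuming GNU coreutils; the five-line file called numbers is hypothetical. Five lines split two at a time give three pieces, the last holding the leftover line.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1

seq 1 5 > numbers       # a five-line file to split
split -l 2 numbers      # same effect as the older form "split -2 numbers"
ls                      # numbers, xaa, xab, xac
cat xac                 # the last piece holds the leftover line: 5

cd / && rm -rf "$tmp"
```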

Tail will print only the last lines of a file; how many depends on your specification (ten if you don't give one). Therefore, "tail -n 3" will print the last three lines of the /etc/group file. When you add the "-f" switch, tail will continue printing any new lines that are subsequently added to the file. This can be really useful when you're trying to monitor a system event: running tail against a log file like syslog will allow you to see system events as they happen. "tr" will translate text from one format to another.
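A minimal sketch of tail, assuming GNU coreutils. Because "tail -f" runs until it is interrupted, this example wraps it in timeout so the script ends by itself; the temporary file stands in for a real log.

```shell
#!/usr/bin/env bash
tail -n 3 /etc/group    # last three lines of /etc/group

f=$(mktemp)
seq 1 10 > "$f"
tail -n 3 "$f"          # 8, 9, 10

# -f keeps watching the file and prints lines as they are appended.
# timeout stops it after two seconds; in real use you'd press Ctrl+C.
timeout 2 tail -f "$f" &
sleep 1
echo 11 >> "$f"         # shows up in the tail -f output
wait
rm -f "$f"
```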

Let's take one of the new split files that we created and read it using cat, but rather than display it to the screen as is, we'll pipe it (using the | character, typed with Shift and backslash on a US keyboard) to tr, in this case converting all lowercase characters to uppercase. Uniq deals with repeated lines. Here's a file I created with a couple of such duplicates.

"uniq -u" will print only unique lines, those that aren't repeated. Quickly printing a document's statistics is the job of "wc," which will tell you the total number of lines, words and bytes in a file.
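A minimal sketch covering tr, uniq and wc, assuming GNU coreutils. The sample text and the file of duplicates are hypothetical. Note that uniq only compares neighboring lines, which is why its input is usually sorted first.

```shell
#!/usr/bin/env bash
# tr reads only standard input, so text is piped (or redirected) into it.
printf 'the quick brown fox\n' | tr 'a-z' 'A-Z'   # THE QUICK BROWN FOX

f=$(mktemp)
# Hypothetical file with a couple of adjacent duplicates.
printf 'apple\napple\nbanana\ncherry\ncherry\n' > "$f"

uniq "$f"      # collapses repeats: apple, banana, cherry
uniq -u "$f"   # only lines that are never repeated: banana
uniq -d "$f"   # one copy of each repeated line: apple, cherry

wc "$f"        # line, word and byte counts, then the file name
wc -l < "$f"   # reading standard input prints just the number: 5
rm -f "$f"
```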

Introduction to sed on Linux

Finally, we come to "sed," which stands for stream editor. In truth, sed could easily fill a course all its own and is treated with some awe by those admins familiar with its magic. It's almost embarrassing to reduce sed to just a couple of simple examples, but this should actually cover you for the LPI exam, and I'll let you get away with it as long as you make a solemn promise to dig deeper on your own later.

Let's take another look at our part one file and note its contents. Suppose you've grown tired of dogs and would prefer a horse. You can cat the file again and pipe it to sed; the "s" and forward slashes in "s/dog/horse/" tell sed to substitute horse for dog, which is exactly what it does. Running sed again with a "d" following dog, as in "/dog/d," will delete the dog line entirely. My apologies to dog lovers.
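A minimal sketch of the two sed commands, assuming GNU sed. The file contents are a hypothetical stand-in for part one. It also shows the "g" flag, since a plain "s" command only replaces the first match on each line.

```shell
#!/usr/bin/env bash
f=$(mktemp)
printf '1 cat\n2 dog\n3 bird\n' > "$f"   # hypothetical stand-in for part one

cat "$f" | sed 's/dog/horse/'   # substitute: "2 dog" becomes "2 horse"
sed '/dog/d' "$f"               # delete every line that matches "dog"
echo 'dog dog' | sed 's/dog/horse/g'   # g replaces every match: horse horse

# sed writes to standard output; the file itself is unchanged.
cat "$f"
rm -f "$f"
```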

Let's review. Less displays text files one screen at a time. By the way, there's another similar tool called "more." Join merges columns from two files that share a common field. Paste prints multiple files together. "nl" prints a file with its line numbers. Sort controls the order in which a file's contents are printed. Split creates smaller files out of a single large file. Tail prints or monitors the end of a file. "tr" translates text from one format to another. Uniq will filter out or isolate duplicated lines. "wc" displays a document's statistics. And sed does just about everything else.

About the Author

David taught high school for twenty years, worked as a Linux system administrator for five years, and has been writing since he could hold a crayon between his fingers. His childhood bedroom wall has since been repainted.

Having worked directly with all kinds of technology, David derives great pleasure from completing projects that draw on as many tools from his toolkit as possible.

Besides being a Linux system administrator with a strong focus on virtualization and security tools, David writes technical documentation and user guides, and creates technology training videos.

His favorite technology tool is the one that should be just about ready for release tomorrow. Or Thursday.
