Search text files using regular expressions
Difficulty: Intermediate
Duration: 1h
Students: 2236
Ratings: 4.9/5
Description

This course will focus on the Linux command line and the text-manipulation tools that let you effectively control just about anything on your system. We'll learn about terminal environments, working with text streams, file management and archives, system processes, advanced text searches, and terminal text editors.

The previous course covered installation and package management, while "Filesystems and Partitions" is up next.

Here's the complete first set of Linux certification courses.

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.

Transcript

Regular expressions, often abbreviated as regex, are text strings that contain some mixture of regular letters, which are meant to be understood literally as simple characters, and meta-characters, which have meaning at the system or processing level. Thus, characters like the dot, brackets, the caret, dollar sign, backslash and asterisk can all be used as instructions for tools like grep.
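As a rough illustration of how each of those meta-characters behaves, here is a minimal sketch. The words.txt file and its contents are hypothetical, invented only for this example, and each result is kept in a shell variable so it can be inspected afterwards.

```shell
# Hypothetical sample data, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'cat' 'cot' 'cut' 'coat' 'ct' 'a.b' 'axb' > "$workdir/words.txt"

# Dot: exactly one arbitrary character between c and t.
dot=$(grep 'c.t' "$workdir/words.txt")

# Brackets: exactly one character taken from the listed set.
brackets=$(grep 'c[au]t' "$workdir/words.txt")

# Caret and dollar sign: anchor the match to the start or end of a line.
starts_with_a=$(grep '^a' "$workdir/words.txt")
ends_with_at=$(grep 'at$' "$workdir/words.txt")

# Asterisk: zero or more repetitions of the preceding character.
star=$(grep 'co*t' "$workdir/words.txt")

# Backslash: removes the dot's special meaning so it matches a literal period.
escaped=$(grep 'a\.b' "$workdir/words.txt")

printf '%s\n' "--- c.t" "$dot" "--- c[au]t" "$brackets" "--- ^a" "$starts_with_a"
printf '%s\n' "--- at\$" "$ends_with_at" "--- co*t" "$star" "--- a\\.b" "$escaped"

# Remove the temporary sample data.
rm -rf "$workdir"
```

Single quotes around each pattern keep the shell from interpreting the dollar sign, asterisk, or backslash before grep sees them.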

In this video, we'll learn how to properly notate and use these instructions in both their regular and extended forms. We're also going to learn how to use the grep tool much more effectively. By the way, as I don't believe I've mentioned this before, grep stands for Global Regular Expression Print.

How to run sophisticated text searches using regex and other text streaming tools

Here's a silly but widely used example. Looking at the fruits.txt file I created, we see that it contains the word "banana." Rather than just searching for the whole word, something even a Windows user could do, we'll do it like trained Linux admins and use grep to search for "banana," specifically taking advantage of the repeated incidence of the letters A-N. Naturally, this won't be so efficient in our case, but you can no doubt imagine how it can be applied to more complicated cases. This will search for any letter B followed by any number of A-Ns and then an A.

The problem is that telling grep to look for A-N, and just A-N, as a repeating group requires the use of parentheses, and basic grep reads plain parentheses as literal characters. To tell grep to treat the parentheses the way we'd like, we need to prefix each one with a backslash, which can quickly become tiresome. Using egrep, or extended grep, on the other hand, tells Linux to treat the parentheses exactly the way we want them in this case. This produces the exact same results with a simpler command.
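The commands themselves are not captured in the transcript, so the following is only a sketch of what the two searches might look like. The fruits.txt contents are a hypothetical stand-in, and `grep -E` is used as the equivalent of egrep, which newer GNU grep releases flag as obsolescent.

```shell
# Hypothetical stand-in for the fruits.txt file mentioned in the transcript.
workdir=$(mktemp -d)
printf '%s\n' 'apple' 'banana' 'cherry' 'blueberry' > "$workdir/fruits.txt"

# Basic grep: the parentheses must be escaped before they act as a group.
basic=$(grep 'b\(an\)*a' "$workdir/fruits.txt")

# Extended grep (egrep): the same group, no backslashes needed.
extended=$(grep -E 'b(an)*a' "$workdir/fruits.txt")

# Basic grep with unescaped parentheses searches for literal "(" and ")",
# so nothing in this sample file matches.
literal_parens=$(grep 'b(an)*a' "$workdir/fruits.txt")

echo "basic:    $basic"
echo "extended: $extended"
echo "literal:  ${literal_parens:-<no match>}"

# Remove the temporary sample data.
rm -rf "$workdir"
```

Because the group may repeat zero times, the pattern also matches any line that merely contains "ba", which is worth keeping in mind on larger files.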

If you need to search through a file or directories of files for a complicated string, you can use either fgrep or grep -f. Fgrep, which stands for fast grep, works by ignoring any meta meanings that a special character like a dollar sign or an asterisk might sometimes have and searches for their literal presence. Let's say that you've got a file that contains a high-level password that you'd like to find. Since the password contains, say, a dollar sign, which is normally a special character, you can use fgrep on the text.txt file to get it. You could also use grep with the "-f" switch to read the password from a file I've created, which contains nothing but the password, like this. This can be very useful if you often search for the same complicated string and don't want the hassle of retyping it over and over again. By the way, from a security perspective it's very bad practice to store passwords in unencrypted files.
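A minimal sketch of both approaches follows. The password, the text.txt contents, and the pattern.txt file name are all made up for illustration, and `grep -F` stands in for fgrep. One detail to be aware of: `-f` on its own still treats each line of the pattern file as a regular expression, so this sketch combines it with `-F` to keep the search literal.

```shell
# Hypothetical sample files; the password is invented for this example.
workdir=$(mktemp -d)
printf '%s\n' 'user=admin' 'password=top$ecret*1' 'note=nothing here' > "$workdir/text.txt"

# Plain grep reads the asterisk as "zero or more t", so the literal
# string is not found.
regex_hit=$(grep 'top$ecret*1' "$workdir/text.txt")

# grep -F (fgrep) searches for the characters exactly as typed.
literal_hit=$(grep -F 'top$ecret*1' "$workdir/text.txt")

# Store the search string in a file, then read it back with -f.
# -F keeps the stored string literal as well.
printf '%s\n' 'top$ecret*1' > "$workdir/pattern.txt"
from_file=$(grep -F -f "$workdir/pattern.txt" "$workdir/text.txt")

echo "regex:     ${regex_hit:-<no match>}"
echo "literal:   $literal_hit"
echo "from file: $from_file"

# Remove the temporary sample data.
rm -rf "$workdir"
```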

Processing text while taking into account special characters can add a dimension of complexity to just about any streaming operation. You can always simply remove special characters either in a live stream or by saving to a new file.

Let's use the substitution tool "sed" to illustrate. We've created an HTML file called text.html that, predictably, contains HTML formatting tags. Suppose you'd like to use the file's text but without the tags. You can pipe the text to sed and have sed strip all the formatting. Here we're simply removing each pair of left and right angle brackets and the characters between them. The "/g," by the way, tells sed to apply the changes globally, on every instance it encounters, rather than only on the first one in each line.

We've already seen how to pipe text to grep, which then filters for specific strings. In fact, however, grep can also be used on its own in a significant range of functions. Let's try to find the string "sdb1," the designation for one of my disk partitions, in the dmesg log file. We're used to running it after cat like this. We can do the exact same thing without cat this way.
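The sed expression and the two grep invocations are not preserved in the transcript, so what follows is one plausible sketch and not necessarily the exact commands used in the video. The text.html contents and the dmesg-style log lines are hypothetical sample data written to a temporary directory.

```shell
# Hypothetical sample files, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' '<h1>Fruit list</h1>' '<p>A <b>banana</b> is yellow.</p>' > "$workdir/text.html"
printf '%s\n' 'kernel: sda: sda1 sda2' 'kernel: sdb: sdb1' \
  'kernel: EXT4-fs (sdb1): mounted filesystem' \
  'kernel: usb 1-1: new device number 3' > "$workdir/dmesg"

# Pipe the file to sed and delete every "<", the non-">" characters
# after it, and the closing ">". The trailing g repeats this along each line.
stripped=$(cat "$workdir/text.html" | sed 's/<[^>]*>//g')

# Without g, only the first tag on each line is removed.
first_only=$(cat "$workdir/text.html" | sed 's/<[^>]*>//')

# grep fed by cat, and grep reading the file directly: same result.
with_cat=$(cat "$workdir/dmesg" | grep 'sdb1')
without_cat=$(grep 'sdb1' "$workdir/dmesg")

printf '%s\n' "--- stripped" "$stripped" "--- first tag only" "$first_only"
printf '%s\n' "--- grep sdb1" "$without_cat"

# Remove the temporary sample data.
rm -rf "$workdir"
```

To save the stripped text to a new file instead of viewing it in the stream, the sed output can be redirected with `>` to a file name of your choosing.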

If you'd like to find the string "sdb1" wherever it occurs in any of the log files in the /var/log directory, or in sub-directories below it, add the "-r" recursive switch. You can use grep with "-v" to display only those lines without a particular string. Looking at the dmesg file, we can see that the number three does show up a whole lot of times. We can exclude all lines containing the number three this way.
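Here is a small sketch of both switches. To keep it runnable anywhere, it searches a temporary directory laid out like a miniature log tree instead of the real /var/log; the file names and log lines are hypothetical.

```shell
# Hypothetical miniature log tree standing in for /var/log.
workdir=$(mktemp -d)
mkdir -p "$workdir/log/sub"
printf '%s\n' 'sdb: sdb1' 'EXT4-fs (sdb1): mounted' \
  'usb 1-1: new device number 3' 'eth0: link up, 1000 Mbps' > "$workdir/log/dmesg"
printf '%s\n' 'mount: /dev/sdb1 mounted on /mnt' 'cron: job 13 finished' > "$workdir/log/sub/syslog"

# -r: search every file in the directory and in the directories below it.
# Each matching line is printed with the name of the file it came from.
recursive=$(grep -r 'sdb1' "$workdir/log")
recursive_count=$(printf '%s\n' "$recursive" | wc -l | tr -d ' ')

# -v: invert the match, keeping only the lines that do NOT contain a 3.
without_three=$(grep -v '3' "$workdir/log/dmesg")

printf '%s\n' "--- grep -r sdb1 ($recursive_count lines)" "$recursive"
printf '%s\n' "--- grep -v 3" "$without_three"

# Remove the temporary sample data.
rm -rf "$workdir"
```

On a real system the equivalent search would point at /var/log itself, where reading some files may require elevated privileges.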

Let's do that again, adding the "-n" switch to print line numbers along with our output. This can be really useful if you now want to find or edit specific lines within the file itself.
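As a sketch of how the line numbers can be put to use, the example below repeats the inverted search with "-n" and then prints one of the reported lines with sed. The log contents are hypothetical, and the choice of `sed -n` for the follow-up step is only one of several ways to jump to a numbered line.

```shell
# Hypothetical dmesg-style sample file.
workdir=$(mktemp -d)
printf '%s\n' 'sdb: sdb1' 'EXT4-fs (sdb1): mounted' \
  'usb 1-1: new device number 3' 'eth0: link up, 1000 Mbps' > "$workdir/dmesg"

# -v drops lines containing a 3; -n prefixes each surviving line
# with its line number in the original file.
numbered=$(grep -vn '3' "$workdir/dmesg")

# Use one of the reported numbers to print just that line.
line_four=$(sed -n '4p' "$workdir/dmesg")

printf '%s\n' "--- grep -vn 3" "$numbered" "--- line 4" "$line_four"

# Remove the temporary sample data.
rm -rf "$workdir"
```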

Let's review. Regex meta-characters include the dot (or period), brackets, the caret, dollar sign, backslash and asterisk. To apply regex functionality to grep filters you'll often need the backslash character. Egrep will produce the same effect without the need for escape characters. Fgrep will search for literal strings, while "grep -f" will read a search string from any file. You can strip special formatting characters from a file using sed. "Grep -r" will search all files within a directory tree for a string. "-v" will display all the lines that don't contain a specified string. And "-n" will number the lines displayed.

About the Author
Students: 16233
Courses: 12
Learning Paths: 5

David taught high school for twenty years, worked as a Linux system administrator for five years, and has been writing since he could hold a crayon between his fingers. His childhood bedroom wall has since been repainted.

Having worked directly with all kinds of technology, David derives great pleasure from completing projects that draw on as many tools from his toolkit as possible.

Besides being a Linux system administrator with a strong focus on virtualization and security tools, David writes technical documentation and user guides, and creates technology training videos.

His favorite technology tool is the one that should be just about ready for release tomorrow. Or Thursday.
