Selecting Records – grep

June 16, 2006

The grep command selects and prints lines from a file (or a bunch of files) that match a pattern. Let's say your friend Bill sent you an email recently with his phone number, and you want to call him ASAP to order some books. Instead of launching your email program and sifting through all the messages, you can scan your in-box file, like this:

grep 'number' /var/mail/hermie
can call No Starch Press at 800/420-7240. Office hours are
noted that recently, an alarming number of alien spacecrafts
among colleagues at a number of different organizations

Here, grep has pulled out just the lines that contain the string number. The first line is obviously what you were after, while the others just happened to match the pattern. The general form of the grep command is this:

grep <flags> <pattern> <files>
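
Because <files> can name more than one file, grep prefixes each matching line with the name of the file it came from. Here is a minimal sketch of that behavior; the files notes.txt and todo.txt and their contents are made up for illustration and live in a temporary directory:

```shell
# Scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Two hypothetical files, invented for this example.
printf 'call Bill\nbuy milk\n' > "$workdir/notes.txt"
printf 'email Bill\nwalk dog\n' > "$workdir/todo.txt"

# Run grep from inside the scratch directory (in a subshell, so the
# current directory of this shell does not change).
matches=$(cd "$workdir" && grep 'Bill' notes.txt todo.txt)
echo "$matches"
# notes.txt:call Bill
# todo.txt:email Bill

rm -rf "$workdir"
```

With a single file on the command line, the file-name prefix is left off, as in the in-box example earlier.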

The most useful grep flags are shown here:

-i  Ignore uppercase and lowercase when comparing.
-v  Print only lines that do not match the pattern.
-c  Print only a count of the matching lines.
-n  Display the line number before each matching line.

When grep performs its pattern matching, it expects you to provide a regular expression for the pattern. Regular expressions can be very simple or quite complex, so we won't get into a lot of details here. Here are the most common types of regular expressions:

abc  Match lines containing the string "abc" anywhere.
^abc  Match lines starting with "abc".
abc$  Match lines ending with "abc".
a..c  Match lines containing "a" and "c" separated by any two characters (the dot matches any single character).
a.*c  Match lines containing "a" and "c" separated by any number of characters (the dot-asterisk means match zero or more characters).
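
The difference between a..c and a.*c is easiest to see side by side. The following is a small sketch, assuming a made-up four-line input that is piped to grep instead of being read from a file:

```shell
# Hypothetical sample input, one candidate string per line.
sample=$(printf 'abbc\nabc\naxyc\nac\n')

# a..c needs exactly two characters between "a" and "c".
two_between=$(printf '%s\n' "$sample" | grep 'a..c')
echo "$two_between"
# abbc
# axyc

# a.*c allows any number of characters in between, including none.
any_between=$(printf '%s\n' "$sample" | grep 'a.*c')
echo "$any_between"
# abbc
# abc
# axyc
# ac
```

Note that "abc" and "ac" fail the first pattern because they have fewer than two characters between the "a" and the "c".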

Regular expressions also come into play when using vi, sed, awk, and other Unix commands. If you want to master Unix, take time to understand regular expressions. Here is a sample poem.txt file and some grep commands to demonstrate regular-expression pattern matching:

Mary had a little lamb
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a lamb spamwich

To print all lines containing spam (respecting uppercase and lowercase), enter

grep 'spam' poem.txt
Mary fried a lot of spam
Jill had a lamb spamwich

To print all lines containing spam (ignoring uppercase and lowercase), enter

grep -i 'spam' poem.txt
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a lamb spamwich

To print just the number of lines containing the word spam (ignoring uppercase and lowercase), enter

grep -c -i 'spam' poem.txt
3

To print all lines not containing spam (ignoring uppercase and lowercase), enter

grep -i -v 'spam' poem.txt
Mary had a little lamb

To print all lines starting with Mary, enter

grep '^Mary' poem.txt
Mary had a little lamb
Mary fried a lot of spam

To print all lines ending with ich, enter

grep 'ich$' poem.txt
Jack ate a Spam sandwich
Jill had a lamb spamwich

To print all lines containing had followed by lamb, enter

grep 'had.*lamb' poem.txt
Mary had a little lamb
Jill had a lamb spamwich
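
The -n flag from the list of flags is not exercised in the examples above. Here is a short sketch of it, which also shows that flags can be combined; it recreates poem.txt in a temporary directory so the commands run on their own:

```shell
# Recreate the sample poem in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/poem.txt" <<'EOF'
Mary had a little lamb
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a lamb spamwich
EOF

# -n prefixes each matching line with its line number in the file.
numbered=$(grep -n 'lamb' "$workdir/poem.txt")
echo "$numbered"
# 1:Mary had a little lamb
# 4:Jill had a lamb spamwich

# Flags combine: count the lines that do NOT contain "lamb".
without_lamb=$(grep -c -v 'lamb' "$workdir/poem.txt")
echo "$without_lamb"
# 2

rm -rf "$workdir"
```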

If you want to learn more about regular expressions, start with the man regexp command (on many Linux systems the corresponding page is man 7 regex). There's also a good book called Mastering Regular Expressions, by Jeffrey Friedl, published by O'Reilly & Associates.

For more information on the grep command, see the grep manual (man grep).
