Crunching Data – awk

June 16, 2006

The awk command combines the functions of grep and sed, making it one of the most powerful Unix commands. Using awk, you can plug words from each line of an input file into a template or perform calculations on numbers within a file. (In case you're wondering how awk got such an offbeat name, it's derived from the surnames of the three programmers who invented it: Aho, Weinberger, and Kernighan.)

To use awk, you write a miniature program in a C-like language that transforms each line of the input file. We'll concentrate only on the print function of awk, since that's the most useful and the least confusing of all the things awk can do. The general form of the awk command is

awk '<pattern> {print <stuff>}' <file>

In this case, stuff is going to be some combination of text, special variables that represent each word in the input line, and perhaps a mathematical operator or two. As awk processes each line of the input file, each word on the line is assigned to variables named $1 (the first word), $2 (the second word), and so on. (The variable $0 contains the entire line.)
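As a quick check of how awk splits a line into fields, the sketch below feeds a single sample line to awk on standard input and prints the whole line, the first field, and the number of fields. NF, the built-in count of fields on the current line, is standard awk but is not covered elsewhere in this article, so treat it as an extra.

```shell
# Feed one sample line to awk on standard input (no input file needed).
# $0 is the whole line, $1 is the first word, NF is the number of words.
printf 'nail hammer wood\n' |
  awk '{print "line:", $0; print "first:", $1; print "fields:", NF}'
```

The commas inside print separate the items with a single space in the output.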

Let's start with a file, words.data, that contains these lines:

nail hammer wood
pedal foot car
clown pie circus

Now we'll use the print function in awk to plug the words from each input line into a template, like this:

awk '{print "Hit the",$1,"with your",$2}' words.data
Hit the nail with your hammer
Hit the pedal with your foot
Hit the clown with your pie

Say some of the data in your input file is numeric, as in the grades.data file shown here:

Rogers 87 100 95
Lambchop 66 89 76
Barney 12 36 27

You can perform calculations like this:

awk '{print "Avg for",$1,"is",($2+$3+$4)/3}' grades.data
Avg for Rogers is 94
Avg for Lambchop is 77
Avg for Barney is 25
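
The averages above happen to come out as whole numbers. When they don't, awk's printf statement lets you control how many decimal places are shown. Here is a minimal sketch, assuming a made-up file named grades2.data whose names and scores are hypothetical and chosen only so the averages do not divide evenly:

```shell
# Hypothetical data: these scores do not divide evenly by 3.
cat > grades2.data <<'EOF'
Ernie 90 85 78
Bert 70 71 74
EOF

# %s inserts the name and %.1f rounds the average to one decimal place.
# The \n is needed because printf, unlike print, adds no newline itself.
awk '{printf "Avg for %s is %.1f\n", $1, ($2+$3+$4)/3}' grades2.data
```

This should print an average of 84.3 for Ernie and 71.7 for Bert.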

So far, we haven't specified any value for pattern in these examples, but if you want to exclude lines from being processed, you can enter something like this:

awk /^clown/'{print "See the",$1,"at the",$3}' words.data
See the clown at the circus

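Here, we told awk to consider only the input lines that start with clown. Note also that there is no space between the pattern and the print specifier. If you put a space there, the shell passes them to awk as two separate arguments: awk takes /^clown/ as the entire program and then tries to open the {print ...} text as an input file, so the command will not work as intended. A more robust habit is to put the pattern inside the single quotes along with the action, as in awk '/^clown/ {print "See the",$1,"at the",$3}' words.data, where the spacing no longer matters. But all this is just the tip of the awk iceberg; entire books have been written on this command. If you are a programmer, try the man awk command.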
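
One more kind of pattern is worth a look: in standard awk, a pattern does not have to be a regular expression; it can also be a comparison on a field. The sketch below recreates the grades.data file from earlier, keeps the pattern and the action inside one pair of single quotes, and prints only the students whose first score is at least 60. The cutoff of 60 is an arbitrary value chosen for illustration.

```shell
# Recreate the grades.data file used earlier in the article.
cat > grades.data <<'EOF'
Rogers 87 100 95
Lambchop 66 89 76
Barney 12 36 27
EOF

# Pattern and action share one quoted program. The pattern $2 >= 60
# selects lines whose second field (the first score) is 60 or more.
awk '$2 >= 60 {print $1, "passed the first test with", $2}' grades.data
```

With this data, Rogers and Lambchop are printed and Barney is skipped.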


