An Introduction to AWK

How to Create Reports with AWK on Linux

AWK - A Linux Data Processing Language - Mark Alexander Bain
This article looks at AWK - a powerful Linux tool that can be used to analyse and format data such as CSV files containing stock quotes from Yahoo! Finance.

If anyone wants a single reason for moving to Linux, that reason could well be AWK. AWK is a text data processing language and one of the most powerful tools in the Linux toolbox, and all a developer has to do to use it is open a Linux terminal and start programming.

What Exactly is AWK?

To say that AWK is a text data processing language is correct, but it doesn't fully convey how powerful and adaptable AWK is. It can:

  • take input from files or from standard input
  • process a file line by line
  • run special BEGIN and END sections at the start and end of processing
  • work with both string and numeric values
  • use associative arrays (every AWK array is indexed by strings)
  • build complex algorithms from simple pattern/action pairs
  • call built-in and user-defined functions
  • format the output with commands such as printf

and it comes as standard in Linux.
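Several of these features can be seen together in one short sketch. It reads words from standard input, counts them in an associative array, calls a user-defined function and formats the result with printf in the END section. The names word_counts and bracket, and the sample text, are made up for illustration.

```shell
#!/bin/sh
# Sketch only: word_counts and bracket are hypothetical names for this example.
word_counts() {
    awk '
    # user-defined function: wrap a string in square brackets
    function bracket(s) { return "[" s "]" }

    # no pattern, so this action runs for every input line
    {
        for (i = 1; i <= NF; i++)
            count[$i]++        # AWK arrays are indexed by strings
    }

    END {
        # for-in order is unspecified, so sort the output for a stable report
        for (w in count)
            printf "%-10s %d\n", bracket(w), count[w] | "sort"
        close("sort")
    }'
}

printf 'apple banana\napple cherry\nbanana apple\n' | word_counts
```

With the sample text above this prints one line per word ([apple] 3, [banana] 2, [cherry] 1). Piping through sort is used because AWK does not promise any particular order when looping over an array.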

Why is AWK Called AWK?

Many Linux commands tell a programmer what they're to be used for just by their names (get, find, sort and whois, for example), but awk is not like that: its name is made up of the initials of its creators, Alfred Aho, Peter Weinberger and Brian Kernighan.

How is AWK Pronounced?

AWK is not pronounced as three separate letters (A-W-K) but as a single word, the same as "auk" (the auk is a seabird, and is sometimes used as an emblem for AWK).

My System Doesn't Seem to Have AWK Installed

In most cases a Linux distribution ships one of several AWK implementations rather than the original AWK itself, and that implementation may be:

  • gawk (GNU AWK)
  • mawk (Mike Brennan's AWK)
  • nawk (new AWK)
  • oawk (old AWK)

In fact, even when an awk command does seem to be installed, it is probably a link to one of these implementations (the ls -l output below is shortened to show just the link targets):

$ ls -l $(which awk)
/usr/bin/awk -> /etc/alternatives/awk
$ ls -l /etc/alternatives/awk
/etc/alternatives/awk -> /usr/bin/mawk
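Following the links one at a time works, but readlink -f can resolve the whole chain in one step. This sketch assumes a readlink that supports -f, as the GNU coreutils version does, and an awk command somewhere on the PATH.

```shell
#!/bin/sh
# Sketch: print the real file behind the awk command.
# Assumes a readlink that supports -f (follow every link), as GNU coreutils does.
awk_path=$(command -v awk) || { echo "no awk on PATH" >&2; exit 1; }
real_awk=$(readlink -f "$awk_path")
echo "$awk_path -> $real_awk"
```

On a system set up like the one above, the chain would end at /usr/bin/mawk; if awk is not a link at all, readlink -f simply prints the original path.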

The Structure of an AWK Program

All AWK programs have the same structure:

  • an optional BEGIN section (for printing headers and initialising any variables), which runs once before any input is read
  • a series of pattern/action pairs: the action (or set of actions) is run on every line that matches the pattern, and if the pattern is omitted it runs on every line
  • an optional END section (for printing any results such as sums or counts), which runs once after the last line has been read
  • when the program is typed on the command line, all of the code must be enclosed in single quotes
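A minimal sketch of that structure, assuming a small throwaway input file with one number per line (sample.txt and sign_report are hypothetical names used only for this example):

```shell
#!/bin/sh
# Sketch only: sign_report and sample.txt are hypothetical names.
# Write a tiny input file with one number per line.
printf '3\n-1\n4\n0\n' > sample.txt

sign_report() {
    awk '
    BEGIN {                    # runs once, before any input is read
        print "Sign report"
        pos = 0; neg = 0       # initialise the counters
    }
    $1 > 0  { pos++ }          # pattern/action pair for positive values
    $1 < 0  { neg++ }          # pattern/action pair for negative values
            { total += $1 }    # no pattern: runs for every line
    END {                      # runs once, after the last line
        print "positive:", pos
        print "negative:", neg
        print "total:", total
    }
    ' "$1"
}

sign_report sample.txt
```

For the four sample numbers this reports two positive values, one negative value and a total of 6; the zero matches neither pattern, so it is only added to the total.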

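The awk command will normally also be told about:

  • the field delimiter used in the file (by default, fields are separated by whitespace)
  • the input file name (if none is given, awk reads standard input)

so a complete command takes this form: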

awk -F<field delimiter> '
BEGIN {initial actions}
pattern 1 {action set 1}
pattern 2 {action set 2}
END {final actions}
' <input file>
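The -F option is not the only way to set the delimiter: assigning the built-in FS variable in the BEGIN section has the same effect. In this sketch the quote line is made-up sample data, and because no file name follows the program, awk reads it from standard input.

```shell
#!/bin/sh
# Sketch: two equivalent ways to make AWK split fields on commas.
# The quote line is made-up sample data; because no file name follows
# the program, awk reads it from standard input.
line='"ABC",1.50,"1/1/2008"'

# 1. field delimiter given on the command line with -F
printf '%s\n' "$line" | awk -F, '{ print $2 }'

# 2. field delimiter assigned to the FS variable in the BEGIN section
printf '%s\n' "$line" | awk 'BEGIN { FS = "," } { print $2 }'
```

Both commands print the second field, 1.50. Setting FS inside the program can be handy when the program is kept in its own file rather than typed on the command line.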

An Example of an AWK Program

AWK is a data processing language, so some data is needed, such as the following stock quote information from the Yahoo! Finance web site:

"NOVL",4.37,"10/7/2008","4:00pm",-0.47,4.93,5.00,4.37,5034992
"MSFT",23.23,"10/7/2008","4:00pm",-1.68,25.00,25.21,23.14,144142064
"HOLL",2.07,"10/7/2008","3:59pm",+0.47,1.42,2.32,1.42,91085

If this data (from http://finance.yahoo.com/q/cq?d=v1&s=NOVL,MSFT,HOLL) is saved as quotes.csv then a simple AWK program can be used to create a useful report:

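awk -F, '
BEGIN {
    print "Stock Quote Analysis\n"
    # start every counter at 0 so that unused ones still print as 0
    inc_count = 0
    same_count = 0
    dec_count = 0
}
# the fifth field holds the price change
$5 > 0  { inc_count++ }
$5 == 0 { same_count++ }
$5 < 0  { dec_count++ }
# no pattern: strip the quotes from the symbol and print every line
{
    gsub("\"", "", $1)
    printf "%5s %6.2f %6.2f\n", $1, $2, $5
}
END {
    print ""
    print inc_count " shares have increased"
    print dec_count " shares have decreased"
    print same_count " shares are unchanged"
}
' quotes.csv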

In this example the code:

  • prints a heading and sets the three counters to zero in the BEGIN section
  • increments a different counter depending on whether the fifth field (the price change) is greater than, equal to or less than zero
  • removes the quotation marks from the first field with gsub (AWK's global substitution function), then prints a formatted line for every record
  • prints the totals in the END section
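The zero initialisation in the BEGIN section matters more than it might seem. An AWK variable that has never been assigned behaves as 0 in arithmetic but as an empty string when it is concatenated, so without it the last line of the report would start with a blank instead of 0. A small sketch (show_unset and the variable n are arbitrary names chosen for illustration):

```shell
#!/bin/sh
# Sketch: show_unset is a hypothetical name. An AWK variable that has
# never been assigned is an empty string when concatenated, and 0 in arithmetic.
show_unset() {
    awk 'BEGIN {
        print "[" n " shares]"          # concatenation: n is empty
        print "[" (n + 0) " shares]"    # arithmetic: n + 0 is 0
    }'
}

show_unset
```

The first line comes out as "[ shares]" and the second as "[0 shares]". Adding 0 at print time is an alternative to initialising in BEGIN; the article's program uses the explicit initialisation.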

and the output is:

Stock Quote Analysis

 NOVL   4.37  -0.47
 MSFT  23.23  -1.68
 HOLL   2.07   0.47

1 shares have increased
2 shares have decreased
0 shares are unchanged
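The Yahoo! Finance address given earlier may no longer return this data, so here is a sketch for reproducing the report offline. It writes the three sample lines to quotes.csv with a here-document and keeps the program in a separate file, report.awk (a made-up name), which awk -f then reads. Keeping the code in a file means it no longer has to be wrapped in shell quotes.

```shell
#!/bin/sh
# Sketch: recreate the sample data and keep the AWK program in its own file.
# report.awk is a hypothetical file name; awk -f reads the program from it.
cat > quotes.csv <<'EOF'
"NOVL",4.37,"10/7/2008","4:00pm",-0.47,4.93,5.00,4.37,5034992
"MSFT",23.23,"10/7/2008","4:00pm",-1.68,25.00,25.21,23.14,144142064
"HOLL",2.07,"10/7/2008","3:59pm",+0.47,1.42,2.32,1.42,91085
EOF

cat > report.awk <<'EOF'
BEGIN {
    print "Stock Quote Analysis\n"
    inc_count = 0; same_count = 0; dec_count = 0
}
$5 > 0  { inc_count++ }
$5 == 0 { same_count++ }
$5 < 0  { dec_count++ }
{
    gsub(/"/, "", $1)     # strip the double quotes around the symbol
    printf "%5s %6.2f %6.2f\n", $1, $2, $5
}
END {
    print ""
    print inc_count " shares have increased"
    print dec_count " shares have decreased"
    print same_count " shares are unchanged"
}
EOF

awk -F, -f report.awk quotes.csv
```

The delimiter is still passed with -F on the command line; it could equally be set with FS = "," at the top of the BEGIN section in report.awk.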

Conclusion

AWK is very simple to use, yet a BEGIN section, an END section and a few pattern/action pairs are enough to create a professional report that analyses delimited text data quickly and easily.

Mark Alexander Bain - Mark Alexander Bain is a writer, Mo Bro and consultant for all aspects of software development at dsquared. He has also written regularly ...


Comments

Oct 29, 2008 9:24 PM
Guest :
great work. well done.
Aug 19, 2009 12:35 PM
Yuen Kit Mun :
I've been using AWK for years, on Windows from the DOS command line! There's a GNU version of AWK for DOS.

I've used AWK to
- analyse server log files
- strip out tables and other junk tags from HTML pages so that I can view them on my PDA
- convert text files to HTML (with some heuristics to detect and markup chapter headings, detect paragraph beginnings)
- summarize nested directory sizes from DOS dir command output
- draw data structures using ASCII text, from program source code
- generate HTML albums for JPEG files

Tried converting people into using AWK but nobody's interested. Not as much buzz as PERL or Python. I just love the simplicity of AWK. More than enough to get almost any text file job done.

There's an example in the GNU manual for a simple HTTP server written in AWK. There are stories of compilers being written in AWK.

