If anyone wants a single reason for moving to Linux, that reason could well be AWK. AWK is a text data processing language and one of the most powerful tools in the Linux toolbox; all that a developer has to do in order to use it is open a Linux terminal and start programming.
What Exactly is AWK?
To say that AWK is a text data processing language is correct, but it doesn't fully convey just how powerful and adaptable AWK is. It can:
- take inputs from files or from the standard input
- process a file line by line
- have subroutines that run at the start and end of processing
- use data types (numbers and strings)
- use arrays and associative arrays
- use simple pattern/action pairs to build complex algorithms
- use built-in and user-defined functions
- use commands such as printf to format the output
and it comes as standard in Linux.
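Several of these features can be seen together in a very small program. The following is only a minimal sketch: the input words and the names seen, total and counts are invented for illustration. It reads from the standard input, processes it line by line, uses an associative array and formats its result with printf.

```shell
# Count how often each word appears on standard input (illustrative input only).
counts=$(printf 'apple\npear\napple\n' | awk '
BEGIN { total = 0 }        # runs once, before any input is read
{ seen[$1]++; total++ }    # runs for every line; seen is an associative array
END {                      # runs once, after the last line has been read
    printf "apple=%d pear=%d total=%d\n", seen["apple"], seen["pear"], total
}')
echo "$counts"
```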
Why is AWK Called AWK?
Many Linux commands tell a programmer what they are used for just by their names (get, find, sort and whois, for example), but AWK is not like that: it is named after its creators, Alfred Aho, Peter Weinberger and Brian Kernighan.
How is AWK Pronounced?
AWK is not pronounced as three separate letters (A-W-K), but as a single word rhyming with auk (the auk is a seabird, and is sometimes used as an emblem for AWK).
My System Doesn't Seem to Have AWK Installed
In most cases a Linux distribution will have an AWK interpreter installed rather than AWK itself, and the interpreter may be:
- gawk (GNU AWK)
- mawk (Mike Brennan's AWK)
- nawk (new AWK)
- oawk (old AWK)
In fact, even if AWK does seem to be installed, it is probably a link to one of these interpreters.
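One way to check this on a given system is sketched below. It assumes only that an awk command is on the PATH; the variable name awk_path is purely illustrative, and the result will differ from one distribution to another.

```shell
# Locate the awk command on the PATH (the result differs between systems).
awk_path=$(command -v awk)
echo "awk is found at: $awk_path"

# A long listing shows the link target if the file is a symbolic link.
ls -l "$awk_path"
```

If the listing shows an arrow (->), awk is a link, and the name after the arrow is the interpreter (or, on some distributions, a further link that leads to it).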
The Structure of an AWK Program
All AWK programs have the same structure:
- an optional BEGIN subroutine (for printing headers and initialising any variables)
- a series of pattern/action pairs - the action (or set of actions) is run on every line matching the pattern. If the pattern is omitted then the action (or set of actions) will run for every line
- an optional END subroutine (for printing any results such as sums or counts)
- when the program is typed on the command line, all of the code must be enclosed in a pair of single quotes
The AWK program will also need to be told about:
- any delimiters used in the file
- the file name
so, this takes the form of:
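A minimal sketch of that general form is given here, assuming a comma-delimited file. The sample data and the names data_file, sum and report are hypothetical and are there only to make the sketch runnable. The -F option tells AWK which delimiter to use, the quoted program comes next, and the file name comes last.

```shell
# Create a small comma-delimited file to work on (hypothetical sample data).
data_file=$(mktemp)
printf 'pen,2\nink,5\npad,1\n' > "$data_file"

# General shape: awk -F'<delimiter>' '<program>' <file name>
report=$(awk -F',' '
BEGIN  { print "item qty"; sum = 0 }   # optional: headers and initial values
$2 > 1 { print $1, $2; sum += $2 }     # pattern { action }: only lines where field 2 exceeds 1
END    { print "total", sum }          # optional: results
' "$data_file")
echo "$report"
rm -f "$data_file"
```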
An Example of an AWK Program
AWK is a data processing language and so some data is needed, such as the following stock quote information from the Yahoo! Finance web site:
If this data (from http://finance.yahoo.com/q/cq?d=v1&s=NOVL,MSFT,HOLL) is saved as quotes.csv then a simple AWK program can be used to create a useful report:
In this example the code:
- initialises the variables to be used in the BEGIN section
- runs different code for when the fifth field is greater than zero, equal to zero and less than zero
- removes any quotation marks from the first field (using AWK's global substitution function, gsub), and prints a formatted output for every line
- prints out the analysed results in the END section
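The steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the original listing: the three rows are invented placeholder values (not real quotes), and the column layout (quoted symbol, price, date, time, change) is assumed so that the change falls in the fifth field. The names up, down, flat, workdir and report are also illustrative.

```shell
# Write invented placeholder rows to quotes.csv in a temporary directory.
# Assumed layout: "symbol",price,"date","time",change
workdir=$(mktemp -d)
cat > "$workdir/quotes.csv" <<'EOF'
"NOVL",6.00,"1/1/2010","4:00pm",+0.25
"MSFT",28.50,"1/1/2010","4:00pm",-0.50
"HOLL",1.20,"1/1/2010","4:00pm",0.00
EOF

report=$(awk -F',' '
BEGIN {                                  # initialise counters and print a header
    up = 0; down = 0; flat = 0
    printf "%-6s %8s %8s\n", "Symbol", "Price", "Change"
}
$5 + 0 > 0  { up++ }                     # adding 0 forces a numeric comparison
$5 + 0 == 0 { flat++ }
$5 + 0 < 0  { down++ }
{                                        # no pattern: runs for every line
    gsub(/"/, "", $1)                    # strip the quotation marks from field 1
    printf "%-6s %8.2f %8.2f\n", $1, $2, $5
}
END {                                    # print the analysed results
    printf "Up: %d, unchanged: %d, down: %d\n", up, flat, down
}
' "$workdir/quotes.csv")
echo "$report"
rm -rf "$workdir"
```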
and the output is:
Conclusion
AWK is very simple to use, yet BEGIN and END sections and a few pattern/action pairs are enough to create a professional report, one that can analyse data quickly and easily.