If You Want to Be a Programmer, Then Use a Pipe, My Son

Make Scripting Elegant with Pipes - Mark Alexander Bain
In this article, Mark Alexander Bain proposes the use of pipes: not for smoking, of course, but to improve the structure of script files.

It was long, long ago in a computer department far, far away that an aged and learned programmer said to me "Remember, a programmer's strength flows from the Pipe", thus setting me on the path to true scripting enlightenment and a truly elegant way of dealing with data flow in a command line script.

What is a Pipe?

In the same way that a pipe can take water or gas from a service provider in one city to be used as required by a consumer in another city, so a pipe can also take the output of one computing process and direct it to the input of another.

The idea is simple, but I've found that even the simplest programming concepts can seem difficult at first. So it is worth looking at life without pipes before considering life with them.

Scripting Without Pipes

Let's start with a simple (but realistic) scenario. The task is to take a file and then:

  1. remove the blank lines, keeping only lines that contain data
  2. extract the third comma-separated field from each line
  3. change the word 'pope' to 'pipe'
  4. count the number of lines remaining
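The article does not show what test.csv contains. To experiment with the scripts that follow, you could create a small hypothetical file like this one. The values are invented for illustration, and the blank lines give step 1 something to remove.

```shell
# Write a hypothetical test.csv: three comma-separated fields per line,
# with two blank lines mixed in.
cat > test.csv <<'EOF'
1,alice,pope
2,bob,plumber

3,carol,pope
4,dave,piper

EOF
```

With this file, each version of the script should keep four lines (pipe, plumber, pipe and piper) and report a count of 4.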

This looks like four separate processes and, in fact, it can be handled that way:

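    IPFILE=test.csv
    grep -v '^$' "$IPFILE" > "$IPFILE.2"                 # 1. drop blank lines
    awk -F ',' '{print $3}' "$IPFILE.2" > "$IPFILE.3"    # 2. keep the third field
    sed 's/pope/pipe/g' "$IPFILE.3" > "$IPFILE.final"    # 3. replace pope with pipe
    wc -l < "$IPFILE.final"                              # 4. count the lines
    rm "$IPFILE.2" "$IPFILE.3"                           # delete the temporary files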

The script works, but it is rather inelegant and needs two intermediate files. That is not a serious problem in itself, since the final step deletes them. However, a much neater solution is to use a pipe.
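If intermediate files really are needed, one common safeguard (not part of the original script) is to create them in a directory made by mktemp -d and to register its removal with trap ... EXIT. Cleanup then happens whenever the script exits normally, including through an early exit, instead of depending on a final rm line. This sketch assumes mktemp is available (it is on typical Linux and BSD systems, although it is not a POSIX utility). It writes a small hypothetical input file, sample.csv, so that it can run on its own.

```shell
#!/bin/sh
# Sketch only: the pipe-free version, with safer temporary files.
IPFILE=sample.csv
# Hypothetical input: four data lines plus two blank lines.
printf '1,alice,pope\n2,bob,plumber\n\n3,carol,pope\n4,dave,piper\n\n' > "$IPFILE"

# mktemp -d creates a uniquely named empty directory and prints its path.
WORK=$(mktemp -d) || exit 1
# Delete the directory and everything in it when the script exits.
trap 'rm -rf "$WORK"' EXIT

grep -v '^$' "$IPFILE" > "$WORK/step2"                 # 1. drop blank lines
awk -F ',' '{print $3}' "$WORK/step2" > "$WORK/step3"  # 2. keep the third field
sed 's/pope/pipe/g' "$WORK/step3" > "$IPFILE.final"    # 3. replace pope with pipe
wc -l < "$IPFILE.final"                                # 4. count the lines
```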

Scripting With Pipes

Now the hardest part of scripting with pipes is finding the pipe key. The pipe symbol is | (a vertical bar, not the number 1, a lowercase L or an uppercase i). On my current US-layout keyboard it is Shift-\ and sits just below F11. On my British laptop it is still Shift-\, but this time the key is to the left of Z. Once you've found the key, its use is simple:

process 1 | process 2
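As a minimal illustration of this form, the commands below use printf as process 1. The three words are arbitrary sample input.

```shell
# printf writes three lines to its standard output; the pipe delivers them
# to the standard input of wc, which counts them.
printf 'red\ngreen\nblue\n' | wc -l

# Pipes chain: sort's output feeds head, which keeps only the first line.
printf 'red\ngreen\nblue\n' | sort | head -n 1
```

The second command prints blue, the alphabetically first of the three words.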

So, our script now becomes:

    IPFILE=test.csv
    grep -v '^$' "$IPFILE" |        # 1. drop blank lines
      awk -F ',' '{print $3}' |     # 2. keep the third field
      sed 's/pope/pipe/g' |         # 3. replace pope with pipe
      wc -l                         # 4. count the lines
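Each stage reads standard input, so the same chain works on data from any source, not just a file. In this sketch the input is a few hypothetical CSV lines produced by printf. Command substitution, $( ), captures the final count in a variable instead of printing it straight away, and tr -d ' ' removes the padding that some versions of wc put in front of the number.

```shell
# Hypothetical sample data: grep drops the blank lines, awk keeps field 3,
# sed swaps 'pope' for 'pipe', and wc counts what is left.
count=$(printf '1,alice,pope\n2,bob,plumber\n\n3,carol,pope\n4,dave,piper\n\n' |
  grep -v '^$' |
  awk -F ',' '{print $3}' |
  sed 's/pope/pipe/g' |
  wc -l |
  tr -d ' ')
echo "Lines remaining: $count"
```

With this input it prints "Lines remaining: 4".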

Notice that the pipe removes the need for intermediate files: the output of each command simply becomes the input of the next. However, this version only prints the final count to the screen; it does not save the formatted data to a file. For that, the tee command is required.

Time for tee

The tee command does two things at once:

  1. writes its input to a file
  2. passes the same data on to standard output (the screen, for instance, or the next command in a pipe)
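A quick way to see both jobs side by side. The file name copy.txt is only an example.

```shell
# tee copies whatever arrives on its standard input into copy.txt
# and also passes it, unchanged, to its standard output (here, the terminal).
printf 'pipe\n' | tee copy.txt

# tee -a appends to the file instead of overwriting it.
printf 'another pipe\n' | tee -a copy.txt
```

Both lines appear on the screen, and copy.txt ends up holding both of them.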

The script now becomes:

    IPFILE=test.csv
    grep -v '^$' "$IPFILE" |
      awk -F ',' '{print $3}' |
      sed 's/pope/pipe/g' |
      tee "$IPFILE.final" |         # save the formatted lines to a file...
      wc -l                         # ...while still counting them
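If the same clean-up is needed for several files, one option is to wrap the pipeline in a shell function so that the input file becomes an argument. The function name format_and_count and the file sample.csv are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: runs the pipeline on the file named in $1,
# saves the formatted lines to "$1.final" via tee, and prints the count.
format_and_count() {
  grep -v '^$' "$1" |
    awk -F ',' '{print $3}' |
    sed 's/pope/pipe/g' |
    tee "$1.final" |
    wc -l
}

# Hypothetical sample input with two blank lines.
printf '1,alice,pope\n2,bob,plumber\n\n3,carol,pope\n4,dave,piper\n\n' > sample.csv
format_and_count sample.csv
cat sample.csv.final
```

Here the function reports 4, and sample.csv.final holds pipe, plumber, pipe and piper, one per line.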

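The pipes create a single flow of data from the first command to the last. They may even save a little time, because the commands in a pipeline run concurrently, and a little disk space, because no intermediate files are written. However, that's not really the point. The point is to create an elegant and efficient piece of code.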

Mark Alexander Bain - Mark Alexander Bain is a writer, Mo Bro and consultant for all aspects of software development at dsquared. He has also written regularly ...
