Today I worked with the Unix awk utility. It is a very powerful tool for examining text files on a Unix platform. It is invoked from the terminal command line, and the command must start with awk.
The keyword awk is followed by a script placed between single quotes. After the quotes comes the name of the text file (say ww-ii-data.txt).
When some items need to be initialised, we use the BEGIN clause. The actions of the BEGIN clause are placed between braces {}.
After that, lines can be selected with a pattern between slashes. The actions for the selected lines are likewise placed between braces. Finally, after the keyword END, an end clause may be included; it runs once all lines have been read. We then have:
awk 'BEGIN {} /selection/ {} END {}' file
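As an example, the following takes the three characters starting at position 37 of every line as a temperature in Fahrenheit (the empty selection // matches all lines), counts the lines, and prints the maximum converted to Celsius:
awk ' BEGIN {count=0; max=0} // { temp = substr($0,37,3) + 0; count++; if (max < temp) max = temp } END { print "lines: ", count, " max in Celsius", (5/9)*(max-32) } ' ww-ii-data.txt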
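The example above uses an empty selection, so every line is processed. To see the selection do some work, here is a minimal sketch that only handles lines matching a pattern. The sample file and its layout (station, date, maximum temperature in Fahrenheit as the third whitespace-separated field) are invented for illustration; they are not the layout of ww-ii-data.txt.

```shell
# Hypothetical sample data: station, date, max temperature in Fahrenheit.
tmpfile=$(mktemp)
printf '%s\n' \
  'OSLO 1942-01-01 23' \
  'ROME 1942-01-01 59' \
  'OSLO 1942-01-02 14' \
  'ROME 1942-01-02 68' > "$tmpfile"

# Only lines matching /ROME/ reach the middle action block.
# BEGIN runs once before the first line, END once after the last.
result=$(awk '
  BEGIN  { count = 0; max = 0 }
  /ROME/ { temp = $3 + 0; count++; if (max < temp) max = temp }
  END    { print count, max, (5 / 9) * (max - 32) }
' "$tmpfile")

echo "$result"   # matched lines, max in Fahrenheit, max in Celsius
rm -f "$tmpfile"
```

Changing /ROME/ to /OSLO/ would restrict the count and the maximum to the other station.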
I noticed that variables can be used. No declaration is needed. Nice.
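A small sketch of what that means in practice: a variable that has never been assigned behaves as 0 when used as a number, so the BEGIN clause can often be left out. The variable names lines and total and the three input numbers are made up for this illustration.

```shell
# No BEGIN clause: "lines" and "total" come into existence on first use
# and start out as 0 in a numeric context.
summary=$(printf '10\n20\n30\n' | awk '{ lines++; total += $1 } END { print lines, total }')

echo "$summary"   # number of lines, then the sum of the first column
```

Initialising in BEGIN, as in the example above, is still a reasonable habit because it makes the starting values explicit.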
A second programme works on a file whose columns are separated by commas. In that case, the separator must be set in the BEGIN clause, with FS="separator" (here FS=","). Once that is done, the columns are available as $1, $2, etc., so each column can be used directly as if it were a variable.
awk ' BEGIN {count=0; max=0; FS=","} // { temp = $3 + 0; count++; if (max < temp) max = temp } END { print "lines: ", count, " max ", max } ' /home/hadoop/a.csv
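To try this without the real a.csv, here is a minimal sketch on a three-line sample file whose contents are invented for illustration. It also shows the -F command-line option, which as far as I can tell is an equivalent way to set the separator. Note the assumption that no field contains a comma itself: a plain FS="," does not understand quoted CSV fields.

```shell
# Hypothetical comma-separated sample; the third column holds the number.
csvfile=$(mktemp)
printf '%s\n' \
  'a,x,12' \
  'b,y,31' \
  'c,z,7' > "$csvfile"

# Variant 1: set the field separator in the BEGIN clause.
with_fs=$(awk 'BEGIN { FS = ","; max = 0 } { if (max < $3 + 0) max = $3 + 0 } END { print NR, max }' "$csvfile")

# Variant 2: set the same separator with the -F option instead.
with_flag=$(awk -F, 'BEGIN { max = 0 } { if (max < $3 + 0) max = $3 + 0 } END { print NR, max }' "$csvfile")

echo "$with_fs"     # number of lines read (NR), then the maximum of column 3
echo "$with_flag"
rm -f "$csvfile"
```

Here the built-in variable NR replaces the hand-made counter; in the END clause it holds the number of lines read.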
Finally, a statement to remove end-of-line characters in a UNIX file:
{Processed_File}