Awk BuiltIn Variables

2021-06-24

The following built-in variables either control how awk reads and writes data (FS, OFS, RS, ORS) or are set automatically by awk to provide information to your program (NR, NF, FILENAME, FNR, RLENGTH, RSTART). All of them are available in every awk implementation. Where a particular behavior is a gawk extension, it is noted in the text; in other awk implementations, or if gawk is in compatibility mode, that behavior is not available.

VARIABLE NAME	DESCRIPTION
FS	Input Field Separator
OFS	Output Field Separator
RS	Input Record Separator
ORS	Output Record Separator
NR	Number of Records
NF	Number of Fields in a record
FILENAME	Name of the current input file
FNR	Number of Records relative to the current input file
RLENGTH	Length of the substring matched by the match() function
RSTART	First position in the string matched by the match() function

FS - Input Field Separator

FS is the input field separator. It can be set inside the program, usually in a BEGIN block, or with the -F command-line option.

By default, awk splits each input record on whitespace and assigns the pieces to the variables $1, $2, and so on. FS can be set to any single character or to a regular expression that matches the separators between fields in an input record. If the value is the null string (""), then each character in the record becomes a separate field. (This behavior is a gawk extension. POSIX awk does not specify the behavior when FS is the null string.)

The default value is " ", a string consisting of a single space. As a special exception, this value means that any sequence of spaces, TABs, and/or newlines is a single separator. It also causes spaces, TABs, and newlines at the beginning and end of a record to be ignored.
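The difference between the default separator and an explicit one can be sketched with a few one-liners. The sample strings and the shell variable names (`default_nf`, `colon_nf`, `regex_second`) are made up for illustration; only POSIX awk features are used.

```shell
# Default FS: leading and trailing blanks are ignored, and any run of
# spaces and TABs counts as a single separator, so this record has 3 fields.
default_nf=$(printf '  a   b\tc  \n' | awk '{ print NF }')
echo "$default_nf"

# Single-character FS set with -F: every ":" separates two fields,
# so the empty field between the two colons is counted (4 fields).
colon_nf=$(printf 'a::b:c\n' | awk -F: '{ print NF }')
echo "$colon_nf"

# Regular-expression FS: a comma or semicolon followed by optional spaces.
regex_second=$(printf 'a, b;c\n' | awk -F'[,;] *' '{ print $2 }')
echo "$regex_second"
```
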

The following script uses FS to read the /etc/passwd file, which has ":" as its field delimiter.

#!/usr/bin/awk -f

BEGIN {
    FS = ":";
    print "Name\tUserID\tGroupID\tHomeDirectory";
}

{
    print $1"\t"$3"\t"$4"\t"$6;
}

END {
    print NR,"Records Processed";
}

~] ./passwd.awk /etc/passwd
Name    UserID  GroupID HomeDirectory
root    0       0       /root
daemon  1       1       /usr/sbin
bin     2       2       /bin
sys     3       3       /dev
sync    4       65534   /bin
games   5       60      /usr/games
man     6       12      /var/cache/man
lp      7       7       /var/spool/lpd
mail    8       8       /var/mail
news    9       9       /var/spool/news
...
...
...
36 Records Processed

OFS - Output Field Separator

OFS is the output field separator. The print statement writes it between the fields it outputs, that is, between the expressions separated by commas. Its default value is " ", a string consisting of a single space.

~] awk 'BEGIN{ OFS = "="; FS = ":" } {print $1, $3;}' /etc/passwd
root=0
daemon=1
bin=2
sys=3
sync=4
games=5
...
...
...
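Two details of OFS are easy to miss: it is inserted only where print arguments are separated by commas, and it is also used when awk rebuilds $0 after a field is assigned. A minimal sketch, with made-up input and illustrative shell variable names:

```shell
# $1 $2 is string concatenation (no separator); the comma inserts OFS.
joined=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { print $1 $2, $3 }')
echo "$joined"

# Assigning to any field (even to itself) rebuilds $0 using OFS.
rebuilt=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')
echo "$rebuilt"
```

The `$1 = $1` assignment is a common way to force the whole record to be re-joined with the new separator.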

RS - Input Record Separator

RS defines where one input record ends and the next one begins. By default, awk reads its input one line at a time.

Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines. If it is a regexp, records are separated by matches of the regexp in the input text.

The ability for RS to be a regular expression is a gawk extension. In most other awk implementations, or if gawk is in compatibility mode, just the first character of RS’s value is used.

Suppose student marks are stored in a file in which records are separated by a blank line (two consecutive newlines) and the fields within each record are separated by a single newline character.

~] cat students.txt
Jones
2143
78
84
77

Gondrol
2321
56
58
45

RinRao
2122
38
37
65

Edwin
2537
78
67
45

Dayan
2415
30
47
20

The following awk script prints the student name and roll number from the input file above.

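#!/usr/bin/awk -f

# A multi-character RS is treated as a regular expression, which is a
# gawk extension; RS = "" is the portable way to split on blank lines.
BEGIN {
    RS = "\n\n";    # records are separated by a blank line
    FS = "\n";      # each field is on its own line
}

{
    print $1, $2;   # student name and roll number
}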

~] ./students.awk students.txt 
Jones 2143
Gondrol 2321
RinRao 2122
Edwin 2537
Dayan 2415
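The script above sets RS to "\n\n", which relies on RS being treated as a regular expression. A portable sketch of the same idea sets RS to the null string, so that records are separated by runs of blank lines. The inline sample data is a shortened, made-up version of the students file, and `names` is an illustrative shell variable.

```shell
# RS = "" selects "paragraph mode": blank lines separate records.
# FS = "\n" makes each line of a record its own field.
names=$(printf 'Jones\n2143\n78\n\nGondrol\n2321\n56\n' |
    awk 'BEGIN { RS = ""; FS = "\n" } { print $1, $2 }')
echo "$names"
```
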

ORS - Output Record Separator

ORS is the output equivalent of RS. It is written at the end of every print statement, so each output record is terminated by it. Its default value is "\n", the newline character. Following is an awk ORS example:

~] cat students-marks.txt 
Jones   2143 78 84 77
Gondrol 2321 56 58 45
RinRao  2122 38 37
Edwin   2537 78 67 45
Dayan   2415 30 47

~] awk 'BEGIN { ORS=" <==> "; } { print; }' students-marks.txt 
Jones   2143 78 84 77 <==> Gondrol 2321 56 58 45 <==> RinRao  2122 38 37 <==> Edwin   2537 78 67 45 <==> Dayan   2415 30 47 <==>

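In the above command, each record from students-marks.txt is followed by the string " <==> " instead of a newline, so all the records end up on a single output line, including a trailing " <==> " after the last record.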
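Because ORS is written after every print, the last record is also followed by the separator. One way to join records without a trailing separator, shown here as a sketch with made-up input, is to set ORS to the empty string and print the separator before every record except the first:

```shell
# ORS = "" suppresses the automatic newline; the separator is printed
# explicitly before records 2, 3, ... and a final newline is added in END.
csv=$(printf 'a\nb\nc\n' |
    awk 'BEGIN { ORS = "" } NR > 1 { print ", " } { print } END { print "\n" }')
echo "$csv"
```
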

NR - Number of Records

NR is the number of input records awk has processed since the beginning of the program’s execution. awk increments NR each time it reads a new record.

In the following example, NR holds the current record (line) number while the main rule runs, and in the END block it holds the total number of records read.

~] awk '{ print "Processing Record - ", NR; } END { print NR, "Students Records are processed"; }' students-marks.txt 
Processing Record -  1
Processing Record -  2
Processing Record -  3
Processing Record -  4
Processing Record -  5
5 Students Records are processed
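NR can also be used in a pattern to select records by position. A small sketch, using made-up input and an illustrative shell variable name:

```shell
# The pattern NR == 2 is true only for the second record read.
second=$(printf 'one\ntwo\nthree\n' | awk 'NR == 2 { print NR ": " $0 }')
echo "$second"
```
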

NF - Number of Fields in a record

awk NF gives you the total number of fields in a record. NF is set each time a new record is read, when a new field is created, or when $0 changes.

NF is very useful for checking whether all the expected fields exist in a record.

In the students-marks.txt file shown below, a test score is missing for two students.

~] cat students-marks.txt 
Jones   2143 78 84 77
Gondrol 2321 56 58 45
RinRao  2122 38 37
Edwin   2537 78 67 45
Dayan   2415 30 47

The following awk script prints the record (line) number and the number of fields in that record, which makes it easy to see that the Test3 score is missing from records 3 and 5.

~] awk '{ print NR, "->", NF}' students-marks.txt 
1 -> 5
2 -> 5
3 -> 4
4 -> 5
5 -> 4
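The same check can be turned into a filter that reports only the incomplete records, and $NF gives the last field of a record regardless of how many fields it has. The sketch below assumes that a complete record has exactly 5 fields, as in the sample file; the shell variable names are illustrative.

```shell
# Report records that do not have the expected 5 fields.
bad=$(printf 'Jones 2143 78 84 77\nRinRao 2122 38 37\n' |
    awk 'NF != 5 { print NR, $1, "has", NF, "fields" }')
echo "$bad"

# $NF is the field whose number is NF, that is, the last field.
last=$(echo 'Jones 2143 78 84 77' | awk '{ print $NF }')
echo "$last"
```
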

FILENAME - Name of the current input file

The FILENAME variable gives the name of the file currently being read. Awk can accept any number of input files to process.

~] awk '{ print FILENAME }' students-marks.txt 
students-marks.txt
students-marks.txt
students-marks.txt
students-marks.txt
students-marks.txt
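With several input files, FILENAME changes as awk moves from one file to the next. The following sketch prints a header line whenever that happens. The temporary files first.txt and second.txt and the awk variable `prev` are made up for the example.

```shell
# Create two small sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\n' > "$tmpdir/second.txt"

# prev remembers the last file name seen; a header is printed on change.
headers=$(cd "$tmpdir" && awk '
    FILENAME != prev { print "==> " FILENAME " <=="; prev = FILENAME }
    { print }
' first.txt second.txt)
echo "$headers"

rm -rf "$tmpdir"
```
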

FNR - Number of Records relative to the current input file

FNR is the current record number in the current file. awk increments FNR each time it reads a new record.

awk resets FNR to zero each time it starts a new input file, so the first record of every file has FNR equal to 1.

~] awk '{ print FILENAME, FNR; }' students-marks.txt books.txt 
students-marks.txt 1
students-marks.txt 2
students-marks.txt 3
students-marks.txt 4
students-marks.txt 5
books.txt 1
books.txt 2
books.txt 3
books.txt 4
books.txt 5
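Printing FNR and NR side by side shows the difference between them: FNR restarts for each file while NR keeps counting across all input. A sketch with two made-up temporary files:

```shell
# Create two small sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\nd\ne\n' > "$tmpdir/second.txt"

# Columns: file name, record number within the file, overall record number.
counts=$(cd "$tmpdir" && awk '{ print FILENAME, FNR, NR }' first.txt second.txt)
echo "$counts"

rm -rf "$tmpdir"
```
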

RLENGTH - Length of the substring matched by match()

RLENGTH is set by invoking the match() function. Its value is the length of the matched substring, or -1 if no match is found.

~] awk 'BEGIN { if (match("I have 1000 dollars", "dollar")) { print RLENGTH } }'
6

~] awk 'BEGIN { if (match("I have 1000 dollars", /[0-9]+/)) { print RLENGTH } }'
4

~] echo "I have 1000 dollars" | awk 'match($0, /[0-9]+/) { print RLENGTH; }'
4
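The examples above print RLENGTH only when match() succeeds. The no-match case can be sketched as follows; the pattern /euros/ is chosen only because it does not occur in the string.

```shell
# match() finds nothing, so RLENGTH is set to -1.
no_match=$(awk 'BEGIN { match("I have 1000 dollars", /euros/); print RLENGTH }')
echo "$no_match"
```
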

RSTART - First position of the substring matched by match()

RSTART is set by invoking the match() function. Its value is the position, counted in characters starting from 1, at which the matched substring starts, or zero if no match was found.

~] awk 'BEGIN { if (match("I have 1000 dollars", "dollar")) { print RSTART } }'
13

~] awk 'BEGIN { if (match("I have 1000 dollars", /[0-9]+/)) { print RSTART } }'
8

~] echo "I have 1000 dollars" | awk 'match($0, /[0-9]+/) { print RSTART; }'
8
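RSTART and RLENGTH are typically used together with substr() to pull the matched text, or the text around it, out of a string. A minimal sketch using the same sample sentence; the shell variable names are illustrative.

```shell
# Extract the matched substring itself.
amount=$(echo 'I have 1000 dollars' |
    awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }')
echo "$amount"

# Extract the text that follows the match.
rest=$(echo 'I have 1000 dollars' |
    awk 'match($0, /[0-9]+/) { print substr($0, RSTART + RLENGTH) }')
echo "$rest"

# When nothing matches, match() returns 0, RSTART is 0 and RLENGTH is -1.
none=$(awk 'BEGIN { print match("abc", /x/), RSTART, RLENGTH }')
echo "$none"
```
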