Training: AWK Introduction

Created on 2010-11-08 by Surabhi Thorayintavida

Switzernet

 

Requirements

Introduction

Training

Validation

References

 

Requirements

Before you start, please make sure you have completed the following training session:

-Simple UNIX bash commands

Introduction

Awk is essentially a stream editor: you pipe text to it, and it processes that text line by line. It is also a programming language.

It can remember context, do comparisons, and do most things a full programming language can do. For example, it is not limited to single lines: it can join multiple lines if the program is written to do so.
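As a small illustration of remembering context across lines, the sketch below joins all input lines into one. The sample text and the names line and joined are made up for this example; it is one possible way to join lines, not the only one.

```shell
# Join every input line into a single line, separated by spaces.
# The awk variable "line" keeps its value from one input line to the next.
joined=$(printf 'one\ntwo\nthree\n' |
  awk '{ line = (NR == 1) ? $0 : line " " $0 } END { print line }')
echo "$joined"
```

The END block runs once after the last input line, which is why the joined text is printed only once.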

Syntax for one line awk commands

 

awk: awk -Fs '/search/ {action}' awkvar=$shellvar infile

nawk: nawk -Fs -v awkvar=$shellvar '/search/ {action}' infile

gawk: gawk -Fs -v awkvar=$shellvar '/search/ {action}' infile
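The -v form can be tried with a short sketch like the one below. The input text, the value of shellvar and the name matched are assumptions made for this example. Instead of a /search/ pattern it uses a comparison as the condition, which awk also accepts.

```shell
# Pass a shell variable into awk with -v and use it in a comparison.
shellvar="admin"   # hypothetical value, chosen for this example
matched=$(printf 'root:0\nadmin:100\nguest:200\n' |
  awk -F: -v awkvar="$shellvar" '$1 == awkvar { print $2 }')
echo "$matched"
```

Quoting "$shellvar" in the shell keeps the value intact even if it contains spaces.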

 

Awk scans ASCII files or standard input. It can search for strings easily and offers many ways to process the matching lines and output them in a new format. It does not change the input file; it sends its results to standard output.
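A quick way to see that the input file stays untouched is to redirect the output to a second file and compare. This is only a sketch: the temporary files and their contents are invented for the example.

```shell
# awk reads infile and writes to standard output; the shell redirects
# that output to outfile. The input file itself is not modified.
infile=$(mktemp)
outfile=$(mktemp)
printf 'alpha 1\nbeta 2\n' > "$infile"
awk '{ print $2, $1 }' "$infile" > "$outfile"   # swap the two columns
original=$(cat "$infile")
swapped=$(cat "$outfile")
rm -f "$infile" "$outfile"
echo "$swapped"
```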

 

awk/nawk/gawk

 

Awk is the original awk. Nawk is the new awk, and gawk is GNU awk. Gawk has the most features but is not available everywhere.

Training

By default, awk automatically splits each line on whitespace. The fields are stored in $1 through $NF (NF is the number of fields in the line), and the whole line is in $0.

Search patterns are written between slashes ("//") and actions between braces ("{}"). The most common action is print.
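The field variables can be inspected with a sketch like this one. The input line is taken from Exercise 1; the name fields is made up for the example.

```shell
# Show the number of fields, the first field, the last field
# and the whole line for one line of input.
fields=$(printf 'We learn awk here\n' |
  awk '{ print NF; print $1; print $NF; print $0 }')
echo "$fields"
```

Because the line has four words, NF is 4 and $NF is the same as $4.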

1. Simple print command: print the content of a file named infile.txt.

 

awk '{print}' infile.txt

 

2. Print lines that contain a particular string, for example the word ‘test’.

 

awk '/test/ { print }' infile.txt

 

3. Print the first entry (column) of the lines that contain a particular string, here ‘value’.

 

awk '/value/ { print $1 }' infile.txt
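The three commands above can be tried on a small test file, for example as sketched below. The file content and the variable names are invented for this illustration; a temporary file stands in for infile.txt.

```shell
# Build a small test file, then run the three commands from steps 1 to 3.
infile=$(mktemp)
printf 'first test line\nsecond value here\nthird test value\n' > "$infile"
all=$(awk '{ print }' "$infile")                  # every line
with_test=$(awk '/test/ { print }' "$infile")     # lines containing "test"
first_col=$(awk '/value/ { print $1 }' "$infile") # first word of lines containing "value"
rm -f "$infile"
echo "$with_test"
echo "$first_col"
```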

 

-------------------------------------------------------------------------------------------------------------------------------------

Exercise 1:

Create a file with your name using the vi command and enter the following text.

Enter in to this world

We learn awk here

It is a stream editor

It is a training

 

a) Display the entire file

b) Display the second line.

c) Display the output  

 

---------------------------------------------------------------------------------------------------------------------------------------

 

4. Retrieve fields separated by a special character.

If we have to retrieve data from a file whose fields are separated by some other character, such as a colon in this example, we can use the syntax:

awk -F: '{statement}'

Example:

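The file named sample contains the two lines below. The command that follows takes the fourth word of each line and then prints its second colon-separated value (field and type, respectively):

This is a variable:field:type line

There can be multiple:type:values here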

 

awk '{print $4}' sample | awk -F: '{print $2}'
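The same result can also be obtained with a single awk call by using the built-in split() function. The sketch below compares both approaches; the temporary file and the names piped, single and parts are chosen for this example only.

```shell
# Compare the two-step pipeline with a single awk call that uses split().
sample=$(mktemp)
printf 'This is a variable:field:type line\nThere can be multiple:type:values here\n' > "$sample"
# Two awk processes: first pick word 4, then split it on ":".
piped=$(awk '{ print $4 }' "$sample" | awk -F: '{ print $2 }')
# One awk process: split word 4 on ":" into the array "parts".
single=$(awk '{ split($4, parts, ":"); print parts[2] }' "$sample")
rm -f "$sample"
echo "$single"
```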

 

5. Retrieving data from a CSV file. CSV files are simply text files whose fields are separated by “,” instead of spaces.

 

Example CSV file entries:

 

"212.249.XX.X","001804539XXXX","4121550XXXX","SWITZERLAND","Swiss SIP","2009-10-31 23:52:43","0:36","36","0.00300"

"212.249.XX.X","4179748XXXX","4121550XXXX","SWITZERLAND","Swiss SIP","2009-10-31 23:52:38","0:25","25","0.00209"

 

To retrieve columns 2 and 8 from the above file and send the result to a new CSV file, you can use the command below:

 

awk -F\",\" '{print"\""$2"\",\""$8"\""}' test.txt > out.csv

 

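We put a backslash (\) before every double quote (") that must be taken literally. Quotes have two meanings here: they normally delimit strings (for the shell and for awk), but in this file they are also part of the data.

Hence the field separator "," is written as \",\".

Similarly, to produce a CSV file, a quote must appear before and after each column value, with a comma between the values. These characters are added in the print command.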
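As an alternative sketch, the same extraction can be written with fewer backslashes by putting the field separator in single quotes for the shell and by using printf inside awk. The temporary file and the names csv and out are made up for this example; the two data lines are the sample entries shown above.

```shell
# Extract columns 2 and 8 from the sample CSV lines.
csv=$(mktemp)
printf '%s\n' \
  '"212.249.XX.X","001804539XXXX","4121550XXXX","SWITZERLAND","Swiss SIP","2009-10-31 23:52:43","0:36","36","0.00300"' \
  '"212.249.XX.X","4179748XXXX","4121550XXXX","SWITZERLAND","Swiss SIP","2009-10-31 23:52:38","0:25","25","0.00209"' \
  > "$csv"
# -F'","' : the shell passes "," (quotes included) to awk as the separator.
# printf   : re-adds the surrounding quotes and the comma in the output.
out=$(awk -F'","' '{ printf "\"%s\",\"%s\"\n", $2, $8 }' "$csv")
rm -f "$csv"
echo "$out"
```

Note that with this separator the first field keeps its leading quote and the last field keeps its trailing quote, so columns 1 and 9 would need extra handling.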

 

 

---------------------------------------------------------------------------------------------------------------------------------------

 

Exercise 2:

 

1. Copy the text below into a text file.

 

"4179367XXXX","4121550XXXX","SWITZERLAND","Swiss SIP","2009-10-01 23:58:52","2:42","0"

"4179520XXXX","4121550XXXX","SWITZERLAND","Swiss SIP","2009-10-01 23:58:43","0:32","0"

"4176307XXXX","4121550XXXX","SWITZERLAND","Swiss SIP","2009-10-01 23:58:43","0:00","0"

"4179367XXXX","4121550XXXX","SWITZERLAND","Swiss SIP","2009-10-01 23:58:33","0:06","0"

 

2. Retrieve the content of the 1st and 6th columns.

3. Append 0: to the content of the 6th column.

4. Send the output to a file named out.csv.

---------------------------------------------------------------------------------------------------------------------------------------

Validation

Create a validation document containing a screenshot of each step and its output.

Upload the validation document to the training session web site, according to the guidelines.

 

 

References

http://www.bolthole.com/AWK.html

 

*   *   *