Emin Gabrielyan

Shell pipelines

The echo shell command prints its arguments on the standard output. The “-e” option makes the command interpret escape sequences, such as “\n” for a new line.

$ echo "aaaa"

aaaa

$ echo "aaaa\nbbb\ncccc"

aaaa\nbbb\ncccc

$ echo -e "aaaa\nbbb\ncccc"

aaaa

bbb

cccc
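The handling of backslash sequences by echo differs between shells: some interpret “\n” without “-e”, and some print “-e” literally. As a hedged alternative, the sketch below uses printf, which interprets “\n” in its format string in any POSIX shell. The variable name “out” is only an illustration.

```shell
# printf interprets "\n" in its format string, so no "-e" option is needed.
out=$(printf 'aaaa\nbbb\ncccc\n')
# Print the three captured lines.
printf '%s\n' "$out"
```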

The output of a command can be piped into the input of another command using the symbol “|”. The command cat prints on its standard output whatever it receives as input.

$ echo -e "aaaa\nbbb\ncccc" | cat

aaaa

bbb

cccc
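More than two commands can be chained in the same way. The sketch below is an illustration only: it assumes the standard sort command with its “-r” (reverse order) option as the middle stage, and uses printf in place of echo -e to produce the input.

```shell
# Three stages: printf produces the lines, sort -r reverses their order,
# and cat passes the result through unchanged.
out=$(printf 'aaaa\nbbb\ncccc\n' | sort -r | cat)
printf '%s\n' "$out"
```

Each “|” connects the standard output of the command on its left to the standard input of the command on its right.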

The following perl command does exactly the same as the command cat: it prints the lines received on its input to the standard output without modifying them. The option “-e” tells perl that the instructions (such as print) are given on the command line. The option “-n” tells perl to repeat the instructions for each input line.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print'

aaaa

bbb

cccc

By default the print command of perl prints the input line, which is stored in the variable “$_”. Therefore “print” and “print $_” are two equivalent commands:

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print $_'

aaaa

bbb

cccc

Matching

The matching operator “/…/” contains an expression that is searched for in the default variable “$_”. The match can be used as the condition of an if-statement. The command(s) in the body of the if-statement are executed only for the lines that match the expression “aa”.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'if(/aa/){print}'

aaaa

Perl allows the same if-statement to be written in reverse order when it controls a single command:

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print if(/aa/)'

aaaa

The expression can be more complex than simply a sub-string. For example “[ab]” signifies either “a” or “b”:

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print if(/[ab]/)'

aaaa

bbb

The following reminds you that the “print” command above prints by default the variable “$_”.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print $_ if(/[ab]/)'

aaaa

bbb

The dot operator concatenates two strings:

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "line: ".$_ if(/[ab]/)'

line: aaaa

line: bbb

We use the same dot operator to append a new line symbol “\n”. The variable “$_” already contains the new line symbol (provided by the input string), which is why each printed line is followed by an empty line in the output (two consecutive new-line symbols).

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "line: ".$_."\n" if(/[ab]/)'

line: aaaa

line: bbb

Here we meet a new special variable of perl, “$&”. This variable contains the substring of the input line that matched the regular expression. In this piece of code the new line character “\n” is required, as the matched substring does not contain a new-line character of its own.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "matched: ".$&."\n" if(/[ab]/)'

matched: a

matched: b

Here we match a substring containing “a” or “b” repeated 2 times. The expression “[ab]{2}” is equivalent to “[ab][ab]”.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "matched: ".$&."\n" if(/[ab]{2}/)'

matched: aa

matched: bb

Now the same as above but with 3 characters:

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "matched: ".$&."\n" if(/[ab]{3}/)'

matched: aaa

matched: bbb

With 4 characters, the input line “bbb” no longer matches the pattern “[ab]{4}”.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "matched: ".$&."\n" if(/[ab]{4}/)'

matched: aaaa

Substitution

The “s” perl command replaces a substring of the input line with a new substring. The printout below shows the replacement of the character “a” with the character “A”. Only one substitution per line is carried out.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/a/A/; print'

Aaaa

bbb

cccc

Below we replace the two characters “aa” with “A”:

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/a{2}/A/; print'

Aaa

bbb

cccc

Now we replace any two consecutive characters, each being “a” or “b”, with “K”.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/[ab]{2}/K/; print'

Kaa

Kb

cccc

The character “^” denotes the beginning of a line. It does not correspond to a character that exists in the input line; it only refers to the position at the beginning of the line. Here we add the character “K” at the beginning of each input line.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/^/K/; print'

Kaaaa

Kbbb

Kcccc

Similarly to “^” representing the beginning of the line, the character “$” appearing in the regular expression represents the end of the line.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/$/K/; print'

aaaaK

bbbK

ccccK

We use the pattern “[ab]” to indicate any character which is either “a” or “b”. If we want to indicate absolutely any character we can use the dot “.” in the regular expression. The following piece of code replaces the last character of each string with “K”.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.$/K/; print'

aaaK

bbK

cccK

The sequence of characters appearing in square brackets represents the list of possibilities that can match. For example “[ab]” means “a” or “b” and “[abc]” means “a”, “b”, or “c”. If the characters in the list are in alphabetical order you can specify a range “[a-c]”. Therefore “[abc]” and “[a-c]” are two equivalent notations of the same expression.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/[a-c]/K/; print'

Kaaa

Kbb

Kccc

Match “a”, “b” or “c” two times and replace by “K” in each input line:

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/[a-c]{2}/K/; print'

Kaa

Kb

Kcc

You already learned that “{2}” and “{3}” are quantifiers and signify that the previous entity repeats 2 or 3 times respectively. The symbol “*” is also a quantifier and signifies that the previous entity can be repeated any number of times, from 0 upwards. As a result every string matches in full and is replaced by a single symbol “K”.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/[a-c]*/K/; print'

K

K

K

The following has the same effect. The dot “.” means any symbol and the quantifier asterisk “*” means any number of times.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.*/K/; print'

K

K

K

Now we match the entire string, and replace it by the special variable “$&” representing the substring being matched. It means we do nothing as we match a substring (or the entire string) and replace it by itself.

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.*/$&/; print'

aaaa

bbb

cccc

The use of the variable becomes more interesting in the following example, where we duplicate the strings:

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.*/$& $&/; print'

aaaa aaaa

bbb bbb

cccc cccc

Do not hesitate to triplicate the strings if you wish:

$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.*/$& $& $&/; print'

aaaa aaaa aaaa