Emin Gabrielyan
The echo shell command prints its arguments to the standard output. The -e option makes echo interpret backslash escape sequences, such as \n for a new line. In bash, without -e these sequences are printed literally.
$ echo "aaaa"
aaaa
$ echo "aaaa\nbbb\ncccc"
aaaa\nbbb\ncccc
$ echo -e "aaaa\nbbb\ncccc"
aaaa
bbb
cccc
$
The output of a command can be piped, with the symbol |, into the input of another command. The cat command prints to its standard output whatever it receives as input.
$ echo -e "aaaa\nbbb\ncccc" | cat
aaaa
bbb
cccc
$
The following perl command does exactly the same as cat: it prints the lines received as input to the standard output without modifying them. The -e option tells perl that the program (here the single instruction print) is given on the command line. The -n option tells perl to run the program once for each input line.
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print'
aaaa
bbb
cccc
$
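The -n option can be pictured as a loop around the -e program. perlrun describes it as wrapping the code in while (<>) { ... }. The Perl sketch below writes that loop out by hand. To keep it self-contained, it reads from an in-memory string rather than from standard input. The names $input and $output are illustrative and are not part of the original example.

```perl
use strict;
use warnings;

# Sample input: the same three lines that echo -e produces above.
my $input = "aaaa\nbbb\ncccc\n";

# Read from an in-memory filehandle instead of standard input,
# so that the sketch runs on its own.
open(my $in, '<', \$input) or die "cannot open in-memory input: $!";

# perl -n wraps the -e program in a loop of this shape.
# Each iteration stores the next line, including its "\n", in $_.
my $output = '';
while (<$in>) {
    $output .= $_;    # collected here; the one-liner simply prints $_
}
close($in);

print $output;
```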
By default, Perl's print command prints the current input line, which is stored in the special variable $_. Therefore print and print $_ are equivalent:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print $_'
aaaa
bbb
cccc
$
The match operator / / contains an expression that is searched for in the default variable $_. A match can serve as the condition of an if statement: the commands inside the braces are executed only for the lines that match the expression, here aa.
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'if(/aa/){print}'
aaaa
$
When the body consists of a single command, Perl lets you write the same if statement in reverse order, with the condition after the command:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print if(/aa/)'
aaaa
$
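In both forms above, /aa/ is shorthand for $_ =~ /aa/. The sketch below assumes the three sample lines are held in an array. It writes out both the if block and the trailing if, and it makes the =~ binding explicit in the second form. The array names are hypothetical.

```perl
use strict;
use warnings;

my @lines = ("aaaa\n", "bbb\n", "cccc\n");
my @with_block;       # lines selected by an if block
my @with_modifier;    # lines selected by a trailing if

for (@lines) {    # each element is aliased to $_ in turn
    # /aa/ on its own is shorthand for $_ =~ /aa/
    if (/aa/) { push @with_block, $_ }

    # the same test written after the statement, with the binding made explicit
    push @with_modifier, $_ if $_ =~ /aa/;
}

print @with_block;    # aaaa
```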
The expression can be more than a plain substring. For example, [ab] means either a or b, so both the aaaa and the bbb lines match:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print if(/[ab]/)'
aaaa
bbb
$
As a reminder, the print command above prints the variable $_ by default. Naming the variable explicitly gives the same result:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print $_ if(/[ab]/)'
aaaa
bbb
$
The dot operator . concatenates two strings:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "line: ".$_ if(/[ab]/)'
line: aaaa
line: bbb
$
Here we use the same dot operator to append a new-line symbol \n. The variable $_ already ends with the new-line character of the input line. Each printed line therefore ends with two new-line characters, which shows up as an empty line after each output line:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "line: ".$_."\n" if(/[ab]/)'
line: aaaa

line: bbb

$
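The doubling can be checked by building the strings directly. This sketch assumes a single input line stored with its new-line, which is how perl -n stores it. It also uses chomp, a Perl built-in that removes a trailing new-line. The original examples do not use chomp; it is only one common way to avoid the empty lines.

```perl
use strict;
use warnings;

my $line = "aaaa\n";    # $_ as perl -n reads it: the new-line is included

# What the one-liner above prints: the line's own "\n" plus the appended one.
my $doubled = "line: " . $line . "\n";

# chomp removes the trailing new-line, so only the appended "\n" remains.
my $trimmed = $line;
chomp($trimmed);
my $single = "line: " . $trimmed . "\n";

print $doubled, $single;
```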
Here we introduce another special Perl variable, $&. It contains the substring of the input line that matched the regular expression. In this code the new-line character \n is required, because the matched substring does not include the line's own new-line character.
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "matched: ".$&."\n" if(/[ab]/)'
matched: a
matched: b
$
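The sketch below writes the same loop as a script and collects $& after each successful match. The array names are hypothetical. The sketch shows that $& holds only the text of the leftmost match, which is a single character here.

```perl
use strict;
use warnings;

my @lines = ("aaaa\n", "bbb\n", "cccc\n");
my @matches;

for (@lines) {
    # $& holds the text matched by the last successful match;
    # for /[ab]/ that is the first a or b found in the line.
    push @matches, $& if /[ab]/;
}

print "matched: $_\n" for @matches;
```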
Here we match a substring of two characters, each of which is a or b. The expression [ab]{2} is equivalent to [ab][ab].
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "matched: ".$&."\n" if(/[ab]{2}/)'
matched: aa
matched: bb
$
The same as above, but with 3 characters:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "matched: ".$&."\n" if(/[ab]{3}/)'
matched: aaa
matched: bbb
$
With 4 characters, the line bbb no longer matches the pattern [ab]{4}:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 'print "matched: ".$&."\n" if(/[ab]{4}/)'
matched: aaaa
$
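The three one-liners above differ only in the repetition count, so a short loop can run them together. In this sketch the pattern is built as a string for each count. The hash %found and its layout are assumptions made for illustration.

```perl
use strict;
use warnings;

my @lines = ('aaaa', 'bbb', 'cccc');
my %found;    # repetition count => substrings matched by [ab]{count}

for my $n (2 .. 4) {
    my $pattern = '[ab]{' . $n . '}';    # e.g. [ab]{3}, same as [ab][ab][ab]
    $found{$n} = [];
    for my $line (@lines) {
        push @{ $found{$n} }, $& if $line =~ /$pattern/;
    }
    print $pattern, ': ', join(' ', @{ $found{$n} }), "\n";
}
```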
The Perl s command (substitution) replaces a substring of the input line with a new substring. The output below shows the character a being replaced by the character A. Only the first match on each line is replaced.
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/a/A/; print'
Aaaa
bbb
cccc
$
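For comparison, Perl's /g modifier (documented in perlop) makes s replace every match instead of only the first. The sketch below applies both forms to the string aaaa. It also keeps the return value of s, which is the number of substitutions made. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $once = 'aaaa';
my $all  = 'aaaa';

# s/// returns the number of substitutions it made.
my $n_once = ($once =~ s/a/A/);     # without /g only the first a is replaced
my $n_all  = ($all  =~ s/a/A/g);    # with /g every a is replaced

print "$once ($n_once)\n";    # Aaaa (1)
print "$all ($n_all)\n";      # AAAA (4)
```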
Below we replace the two characters aa with A:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/a{2}/A/; print'
Aaa
bbb
cccc
$
Now we replace any two consecutive characters, each of which is a or b, with K. The line cccc contains neither character, so it is left unchanged.
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/[ab]{2}/K/; print'
Kaa
Kb
cccc
$
$
The character ^ denotes the beginning of a line. It does not match any character of the input line. Instead, it refers to the position at the start of the line. Here we add the character K at the beginning of each input line:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/^/K/; print'
Kaaaa
Kbbb
Kcccc
$
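Just as ^ represents the beginning of the line, the character $ in a regular expression represents the end of the line. In Perl, $ also matches just before a final new-line character, which is why K is inserted before the new-line rather than after it: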
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/$/K/; print'
aaaaK
bbbK
ccccK
$
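The sketch below checks both anchors on a single line stored with its new-line, which is how perl -n reads it. Because ^ and $ match positions rather than characters, the substitution only inserts K and removes nothing. The variable names are illustrative.

```perl
use strict;
use warnings;

my $start = "aaaa\n";    # a line as perl -n reads it
my $end   = "aaaa\n";

# ^ and $ match positions, not characters, so nothing is removed.
$start =~ s/^/K/;    # inserts K at the beginning of the line
$end   =~ s/$/K/;    # $ matches just before the final "\n"

print $start, $end;
```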
The pattern [ab] denotes any character that is either a or b. To denote any character at all, use the dot . in the regular expression. By default, the dot matches any character except the new-line character. The following code replaces the last character of each line with K:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.$/K/; print'
aaaK
bbK
cccK
$
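The sketch below repeats the substitution on one line held in a variable. It also tests the default behaviour of the dot against a lone new-line character. The variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "aaaa\n";

# Copy the line and replace its last character before the "\n" with K.
(my $replaced = $line) =~ s/.$/K/;

# By default . matches any character except the new-line character.
my $dot_matches_newline = ("\n" =~ /./) ? 1 : 0;

print $replaced;                                    # aaaK
print "dot matches \\n: $dot_matches_newline\n";    # dot matches \n: 0
```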
The characters inside square brackets list the possibilities, any one of which can match. For example, [ab] means a or b, and [abc] means a, b, or c. If the characters are consecutive in alphabetical order, you can specify a range such as [a-c]. Therefore [abc] and [a-c] are two notations for the same expression:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/[a-c]/K/; print'
Kaaa
Kbb
Kccc
$
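One way to check that [abc] and [a-c] are equivalent is to filter the same characters through both with grep. The sample characters a to e are an assumption chosen for illustration.

```perl
use strict;
use warnings;

my @chars    = ('a' .. 'e');               # a b c d e
my @by_list  = grep { /[abc]/ } @chars;    # explicit list of characters
my @by_range = grep { /[a-c]/ } @chars;    # range notation

print "@by_list\n";     # a b c
print "@by_range\n";    # a b c
```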
Match two consecutive characters, each of which is a, b, or c, and replace them with K in each input line:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/[a-c]{2}/K/; print'
Kaa
Kb
Kcc
$
You have already seen that {2} and {3} are quantifiers, meaning that the preceding item repeats 2 or 3 times respectively. The symbol * is also a quantifier: it means that the preceding item may repeat any number of times, from 0 upwards. Because * is greedy, [a-c]* matches the whole run of a, b, and c characters at the start of each line. Each line is therefore replaced by a single K:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/[a-c]*/K/; print'
K
K
K
$
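Because * allows zero repetitions, [a-c]* can also match an empty string. The sketch below uses hypothetical sample strings beyond the three lines above to show three cases:

- a line made only of a to c characters;
- a line in which only the first run is replaced;
- a line with no a to c characters at all, where an empty match at the start means that K is simply inserted.

```perl
use strict;
use warnings;

my @inputs = ('aaaa', 'abcxabc', 'xyz');    # hypothetical sample strings
my @results;

for my $s (@inputs) {
    (my $r = $s) =~ s/[a-c]*/K/;    # replace the first (leftmost) match only
    push @results, $r;
}

print "$_\n" for @results;    # K, Kxabc, Kxyz
```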
The following has the same effect: the dot . means any character and the quantifier * means any number of times, so .* matches the whole line except its final new-line character:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.*/K/; print'
K
K
K
$
Now we match the entire line and replace it with the special variable $&, which holds the matched substring. As a result nothing changes, because the matched text is replaced by itself.
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.*/$&/; print'
aaaa
bbb
cccc
$
The variable becomes more useful in the following example, where we duplicate each line:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.*/$& $&/; print'
aaaa aaaa
bbb bbb
cccc cccc
$
Feel free to triplicate the lines if you wish:
$ echo -e "aaaa\nbbb\ncccc" | perl -ne 's/.*/$& $& $&/; print'
aaaa aaaa aaaa
bbb bbb bbb
cccc cccc cccc
$
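To close, here is a sketch of the last one-liner written out as a script. It combines the -n loop, the s command, and $&. It reads from an in-memory string instead of standard input, and the names $input and $output are illustrative.

```perl
use strict;
use warnings;

my $input = "aaaa\nbbb\ncccc\n";
open(my $in, '<', \$input) or die "cannot open in-memory input: $!";

my $output = '';
while (<$in>) {            # the loop that perl -n adds implicitly
    s/.*/$& $& $&/;        # .* stops before "\n"; $& is the line's text
    $output .= $_;
}
close($in);

print $output;
```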
What is a regular expression? [This document]
http://switzernet.com/3/public/101024-regex/
Advanced Perl regular expressions
http://perldoc.perl.org/perlre.html
* * *
Copyright © 2010 by Switzernet