Emin Gabrielyan
2009-02-09
1. Searching missing final responses using an Excel file
2. Displaying provisional replies not followed by final replies
3. Hourly loss rate of final responses
3.1. Uploading the script to SIP servers
3.2. Parallel execution of the script on all SIP servers
In this document we continue the study of call interruptions [090203-interrupt-ratio], [ch2]. When a B2BUA SIP proxy (PortaSIP) sends a re-INVITE and receives a provisional ‘100’ reply, the server knows that the re-INVITE is successfully delivered to the destination node (OpenSER). Otherwise the SIP server will retransmit the re-INVITE until provisional or final response is received (see section ‘No reply to re-INVITES’ of [2009-02-06]). But when provisional response is already received it is the responsibility of the replying peer (i.e. of OpenSER) to make sure that the final reply is successfully delivered to the SIP server (PortaSIP, which invoked the re-INVITE method). OpenSER expects an ACK, as a confirmation of the reception of its final response to B2BUA. Therefore OpenSER must keep transmitting its reply until ACK is received.
Well our OpenSER was erroneously configured for a stateless handling of re-INVITEs. Joined is a configuration with a stateless re-INVITE processing [cfg.txt] and with state-full processing [cfg.txt]. Below is the difference between these two files:
$ diff ab99.cfg.txt ab98.cfg.txt
144,148c144,156
< sl_send_reply("100","Your Re-INVITE is received");
< # sl_send_reply("200","OK");
< # Test for call interruptions, 2009-01-30 17:00
< sl_send_reply("488","Your Re-INVITE is ignored");
< exit;
---
> if(t_newtran())
> {
> t_reply("100","Your Re-INVITE is received");
> t_reply("488","Your Re-INVITE is ignored");
> exit;
> }
> else
> {
> xlog("L_INFO","$Ts ==> Newtran error $oU\n");
> sl_send_reply("100","Your Re-INVITE is received (SL)");
> sl_send_reply("488","Your Re-INVITE is ignored (SL)");
> exit;
> }
With stateless processing the provisional and final responses are sent and are forgotten, meaning that there is no sense of a transaction and of there is no a follow-up mechanism. As a result, the server does not wait for an ACK after the transmission of the final reply. Neither, the server will retransmit the final reply, in case of a missing ACK. At the other end, if the provisional reply is received but the final one is lost (or dropped), the B2BUA will wait for 300 seconds and will disconnect the dialog. This explains the interruptions observed at every 32 second starting from the 364th second (corresponding to the second re-INVITE). These interruptions are occurring due to losses of final replies occurred 300 seconds ago. In case of a lost reply to the very first re-INVITE, the delay is of 900 seconds [2009-02-06]. Therefore the missing final reply of the very first re-INVITE causes an interruption at 932nd second while all the following losses cause interruptions 300 seconds later. The following distribution of call durations for 2008 shows the interruption at 932nd second and all interruptions of following losses starting from the 364th second and repeating with intervals of 32.1 seconds.
An important amount of interruptions at the 632nd second still remain without explication.
We developed scripts permitting to catch in voluminous log files of SIP servers (PortaSIP B2BUA) all provisional replies not followed by final replies. The following sections introduce these scripts. The most complete analysis is provided by the third script (Hourly loss rate of final responses), while the two others help to understand the messages and their processing.
Here we introduce a script getting all provisional and final replies sent in response to re-INVITES, as well as all BYE messages retrieved from log files of SIP servers (PortaSIP) [sh.txt].
The scrip creates a CSV file with 5 columns. The first column contains the time. The date and time are written into the text CSV file in the format of the following example "Feb 08 00:00:06.137". The Excel application interprets such date format correctly, understanding also the millisecond values. Once saved in binary XLS format, you can display in Excel the time with milliseconds using the following custom format ‘yyyy-mm-dd hh:mm:ss.000’. The second column is the Call-ID of the dialog. In the third column, the symbols ‘T’ or ‘R’ indicate the transmission or reception. The forth column is the peer, and the last fifth column is the SIP header’s start-line.
Before further processing with Excel, we shall sort the retrieved entries first by the second column (Call-ID), then by the first column (Time). Thus, all exchanges of individual dialogs (calls) will be grouped and will follow a chronological order. A formula in the 6th column (F) of the joined Excel file [xls] is for identification of all cases when a provisional reply ‘100’ is not followed by a final reply ‘488’. When the final response is lost, we display the duration in seconds from the provisional reply until the next message of the dialog, which is usually a BYE message arriving after a 300 second delay.
=IF(AND(E1=$E$5,E2<>$E$6),(A2-A1)*24*3600,"")
For the short period of logs loaded into the Excel file, the formulas show 4 missing final replies. The BYE can be initiated by user before expiration of the 300 second delay (one case in the Excel file). For other 3 cases, BYE was triggered by the 300 second timer of the SIP server [xls].
The joined script finds in a SIP log file all cases with missing final replies [sh.txt]. For each case the script displays the delay in seconds and the message (usually BYE) following the provisional replies.
sona@fr4$ ./a5.sh
Feb 08 00:04:00 33.7250 BYE sip:41215502294@89.236.132.53:50000 SIP/2.0
Feb 08 00:06:12 299.9540 BYE sip:+3232940588@146.188.127.1:5060 SIP/2.0
Feb 08 00:28:01 300.0350 BYE sip:+41613112347@146.188.127.1:5060 SIP/2.0
Feb 08 00:32:50 300.0210 BYE sip:212.249.15.4;r2=on;lr=on;ftag=89f1a111a73bd044afcc8a673e5fe18d SIP/2.0
Feb 08 08:28:28 300.1210 BYE sip:+41622961134@146.188.127.2:5060 SIP/2.0
Feb 08 09:15:18 9.1570 BYE sip:91.121.75.124:5061 SIP/2.0
Feb 08 09:32:51 300.0940 BYE sip:+213770943631@146.188.127.2:5060 SIP/2.0
Feb 08 09:45:59 299.9960 BYE sip:+41435269393@146.188.127.1:5060 SIP/2.0
Feb 08 10:06:52 256.1540 BYE sip:91.121.75.124:5061 SIP/2.0
Feb 08 10:06:58 300.0480 BYE sip:212.249.15.4;r2=on;lr=on;ftag=7fd49f3105478de4df3c13bde7ed2890 SIP/2.0
Feb 08 10:17:19 221.1620 BYE sip:91.121.75.124:5061 SIP/2.0
Feb 08 10:21:25 300.0800 BYE sip:212.249.15.3;r2=on;lr=on;ftag=64a19ddaa12846f90ad6a2c759abebe2 SIP/2.0
Feb 08 10:30:22 300.0200 BYE sip:+393392484678@146.188.127.2:5060 SIP/2.0
Feb 08 10:36:29 165.9900 BYE sip:91.121.75.124:5061 SIP/2.0
Feb 08 10:45:39 300.0430 BYE sip:+41216014436@146.188.127.1:5060 SIP/2.0
Feb 08 10:45:51 248.4790 BYE sip:91.121.75.124:5061 SIP/2.0
Feb 08 10:47:22 900.1480 BYE sip:212.249.15.3;r2=on;lr=on;ftag=058c876a885fb1cfb49da38cb0c9b5cf SIP/2.0
Feb 08 10:47:59 64.1320 BYE sip:91.121.75.124:5061 SIP/2.0
Feb 08 10:58:09 300.0360 BYE sip:212.249.15.4;r2=on;lr=on;ftag=56e3fe47c93f2737b6f7c4407eca836c SIP/2.0
With this script there is no need to post-process the output with Excel file. Here is a portion of the code in Perl, responsible for identifying the lost final responses.
$answ1="SIP/2.0 100 Your Re-INVITE is received"; $answ2="SIP/2.0 488 Your Re-INVITE is ignored"; |
The first expected answer is the provisional ‘100’ response. The second answer is the final ‘488’ response. |
if($h2 eq $answ1) { $call{$id}=$time; $nansw1++; } else { if(defined $call{$id}) { if($h2 ne $answ2) { printf "%s %s %s:%s:%s %9.4f %s\n", $mmm,$dd,$hh,$mm,$ss, $time-$call{$id},$h2; } delete $call{$id}; } } |
If a provisional response is received we store its time in an associative array indexed by the Call-ID of the dialog. As soon as another message with the same Call-ID is received, we delete the entry. But if we deal with a message which is not the final reply, then we display its start-line with the delay in seconds from the moment the provisional response was received. |
Finally, we developed a script computing the hourly loss rate of final replies by processing the daily log file containing all received and transmitted SIP messages. The script computes the numbers of provisional replies and the number of cases when final reply does not follow the provisional reply.
if($h2 eq $answ1) { $call{$id}=$time; $nansw1{$hour}++; } else { if(defined $call{$id}) { if($h2 ne $answ2) { $nlost2{$hour}++; } delete $call{$id}; } } |
If provisional reply ‘100’ is received the Call-ID is saved and re-INVITE counter is incremented. If another message for the same Call-ID is received, the Call-ID is deleted, and if that message is not a final ‘488’ reply the loss counter is incremented. |
Results of logs of four days on 6 SIP servers are collected into an Excel file [xls]. With a set of drop-down menus of the Excel file we can select the characteristics of each histogram (a particular day or all days, a particular hour or all hours, and a particular server or all servers):
[xls]
In the chart below, the first 10 green histograms correspond to the losses detected on fr2.youroute.net during hours from 15h to 00h for all days of observation. The next 10 blue histograms show the losses (or dropped packets) detected on fr4.youroute.net during the same hours for all days of observation. Finally the rose histograms show the losses on fr4.youroute.net on 2009-2-9 during hours from 8h to 17h. The range between 11h to 14h is the period when the state-full processing of re-INVITEs was activated on OpenSER.
[xls]
The part of the script responsible for state-full processing of re-INVITEs activated on OpenSER from 8h to 17h is shown below. The state-full processing will be definitively launched on 2009-02-10:
if(loose_route())
{
$var(comm)="LooseR.";
route(11);
if(method=="INVITE")
{
if(t_newtran())
{
t_reply("100","Your Re-INVITE is received");
t_reply("488","Your Re-INVITE is ignored");
exit;
}
else
{
xlog("L_INFO","$Ts ==> Newtran error $oU\n");
sl_send_reply("100","Your Re-INVITE is received (SL)");
sl_send_reply("488","Your Re-INVITE is ignored (SL)");
exit;
}
}
t_relay();
exit;
}
Modifications of script are carried out locally. The modified version of script is uploaded to all servers with a single bash line:
$ for h in us1 fr1 fr2 fr3 fr4 dk1; do scp a6.txt sona@$h.youroute.net:~/090208-lost-reinvite-answers/a6.sh; done 2>/dev/null
a6.txt 100% 2109 2.1KB/s 00:00
a6.txt 100% 2109 2.1KB/s 00:00
a6.txt 100% 2109 2.1KB/s 00:00
a6.txt 100% 2109 2.1KB/s 00:00
a6.txt 100% 2109 2.1KB/s 00:00
a6.txt 100% 2109 2.1KB/s 00:00
You must configure ssh on all servers and on your local machine so as to login to any of these servers without being asked for a password. You need to create a pair of private and public RSA keys locally. Your public RSA key (one long line) must be inserted into .ssh/authorized_keys file (of user ‘sona’) at all concerned SIP servers (there are already other lines).
$ ls ~/.ssh
id_rsa id_rsa.pub known_hosts known_hosts2
$ ssh sona@us1.youroute.net "ls .ssh"
authorized_keys
known_hosts
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAy77jmVwfzxnnK3ZTyYKpzCOIEvYCEDEKb2EQ+CBG7118Xs86Dvwacq
z9htSzw1AiKAq7QC6ZBiMy5fRLpWGI2ZwfavtF0HuGofBDM0aZyaFT8CTtsZvfilLcovauc0TVHefX918IZkalTSSS
7enakdEYxu2q5uK52/lv5Uqx5DHKgAa3FCwwh5dJiTZA6HLWYguTieL4+Q== Emin.Gabrielyan@pc8
$ ssh sona@us1.youroute.net "cat .ssh/authorized_keys"
...
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAy77jmVwfzxnnK3ZTyYKpzCOIEvYCEDEKb2EQ+CBG7118Xs86Dvwacq
z9htSzw1AiKAq7QC6ZBiMy5fRLpWGI2ZwfavtF0HuGofBDM0aZyaFT8CTtsZvfilLcovauc0TVHefX918IZkalTSSS
7enakdEYxu2q5uK52/lv5Uqx5DHKgAa3FCwwh5dJiTZA6HLWYguTieL4+Q== Emin.Gabrielyan@pc8
...
Processing of the log file with all daily SIP messages may take tens of minutes on each server. The processing is launched simultaneously on all machines in parallel. The data is collected back via ssh and is merged into a local CSV file:
$ for h in us1 fr1 fr2 fr3 fr4 dk1; do ssh sona@$h.youroute.net "cd 090208-lost-reinvite-answers; ./a6.sh $h .0" & done 2>/dev/null | tee a6.csv
Feb 08 00:00:00, us1, 869, 0, 0.000
Feb 08 01:00:00, us1, 471, 0, 0.000
Feb 08 02:00:00, us1, 519, 0, 0.000
Feb 08 03:00:00, us1, 217, 0, 0.000
Feb 08 04:00:00, us1, 303, 0, 0.000
Feb 08 05:00:00, us1, 205, 0, 0.000
...
Feb 08 16:00:00, fr4, 1050, 13, 1.238
Feb 08 17:00:00, fr4, 1375, 28, 2.036
Feb 08 18:00:00, fr4, 1502, 38, 2.530
Feb 08 19:00:00, fr4, 2103, 56, 2.663
Feb 08 20:00:00, fr4, 2096, 65, 3.101
Feb 08 21:00:00, fr4, 1989, 57, 2.866
Feb 08 22:00:00, fr4, 1148, 24, 2.091
Feb 08 23:00:00, fr4, 956, 18, 1.883
The argument “.0” indicates the suffix of the sip.log file and in particular means the log of the previous day. The SIP servers keep unzipped the files of three past days (with “.0”, “.1”, and “.2” suffixes respectively). For processing the log of the current day, omit this argument.
Current day:
$ for h in us1 fr1 fr2 fr3 fr4 dk1; do ssh sona@$h.youroute.net "cd 090208-lost-reinvite-answers; ./a6.sh $h" & done 2>/dev/null | tee out.csv
Yesterday:
$ for h in us1 fr1 fr2 fr3 fr4 dk1; do ssh sona@$h.youroute.net "cd 090208-lost-reinvite-answers; ./a6.sh $h .0" & done 2>/dev/null | tee out.csv
Two days before:
$ for h in us1 fr1 fr2 fr3 fr4 dk1; do ssh sona@$h.youroute.net "cd 090208-lost-reinvite-answers; ./a6.sh $h .1" & done 2>/dev/null | tee out.csv
Three days before:
$ for h in us1 fr1 fr2 fr3 fr4 dk1; do ssh sona@$h.youroute.net "cd 090208-lost-reinvite-answers; ./a6.sh $h .2" & done 2>/dev/null | tee out.csv
The file ‘out.csv’ will be created on your local machine. Use different names, rename, or use the append ‘-a’ key for tee.
Retrieving times, call-ids, and SIP message start-lines for detecting losses of final replies within an Excel file [sh.txt]
Post-processing of the output of the previous script: an example detecting lost final replies within an Excel file [xls]
Detecting the losses of final replies and computing the time between the provisional reply and the BYE message [sh.txt]
Computing hourly loss rate of final replies [sh.txt]
CSV files of hourly rates concurrently computed on six servers and downloaded via ssh: February 6 [csv], 7 [csv], 8 [csv], 9 [csv], 10 [csv] (until 13h only)
Excel file with drop-down menus for selection of days, hours, and servers to be considered in loss statistics [xls] (until 19h for February 9)
Log file of execution of scripts of this project [log.txt]
Configuration of OpenSER with stateless processing of re-INVITEs [cfg.txt]
Configuration of OpenSER with state-full processing of re-INVITEs [cfg.txt] (this configuration will enter into force tomorrow morning on 2009-2-10)
Interruptions of calls caused due to looses of final replies of re-invites, computing the loss rate (this document):
http://unappel.ch/people/emin-gabrielyan/public/090209-reinvite-repl-losses-siplog/
http://switzernet.com/people/emin-gabrielyan/090209-reinvite-repl-losses-siplog/
Tracking and understanding the behavior of PortaSIP server with respect to re-INVITES by using a test-bed OpenSER in UK:
http://switzernet.com/public/090206-interruptions-lost-response-3/
http://unappel.ch/public/090206-interruptions-lost-response-3/
Identifying and explaining the reason of interruption of calls at 32 second intervals:
http://switzernet.com/public/090205-interruptions-lost-response-2/
http://unappel.ch/public/090205-interruptions-lost-response-2/
Introduction of the Interrupt Ratio and analyzing annual statistics of 2008:
http://unappel.ch/people/emin-gabrielyan/public/090203-interrupt-ratio/
http://switzernet.com/people/emin-gabrielyan/090203-interrupt-ratio/
Searching interruption reason by analyzing SIP message flow charts (with wireshark) directly on the B2BUA machines:
http://switzernet.com/public/090203-interruptions-lost-response/
Call interruption problem in the network of Switzernet, study of durations and probabilities:
http://unappel.ch/people/emin-gabrielyan/public/090201-call-interrupts/
http://4z.com/people/emin-gabrielyan/public/090201-call-interrupts/
http://switzernet.com/people/emin-gabrielyan/090201-call-interrupts/
Identifying the interruptions on call statistics with a statistical simulation:
http://switzernet.com/public/081118-call-interruptions/
http://unappel.ch/public/081118-call-interruptions/
Identifying abnormal interruptions at 10m32s and at ~11m on a real traffic:
http://switzernet.com/public/081119-interuption-d-appel/
http://unappel.ch/public/081119-interuption-d-appel/
Identifying interruptions on analytically simulated model:
http://unappel.ch/people/emin-gabrielyan/public/081120-call-interruptions/
An attempt to find a connection between interruptions and different factors such as device type, SIP server, and user’s ISP:
http://switzernet.com/public/081223-call-interruptions-statistics/
http://switzernet.com/company/081223-call-interruptions-statistics/
Procedure of refunds to affected customers:
http://switzernet.com/company/081124-refund-interruption-calls/
RSA in cryptography stands for the founders' last names: Ron Rivest, Adi Shamir and Len Adleman
SIP stands for Service Initiation Protocol [rfc3261]
B2BUA stands for Back to Back User Agent
ISP stands for Internet Service Provider
ACK in SIP protocol stands for Acknowledgment
CSV stands for Comma Separated Values
RFC stands for Request for Comments
* * *
Copyright © 2009 Switzernet