Research on the cause of call interruptions (part 4)

Christian Lathion, 2009-02-09

Switzernet

 

Research on the cause of call interruptions (part 4)

OpenSER configuration

Identifying individual interrupted calls

Further work

References

 

 

This document describes the tests we carried out to identify possible causes of call interruptions (see [2009-02-01] for a statistical study of the call interruptions observed on our network). It follows previous studies of the problem [2009-02-03] [2009-02-05] [2009-02-06] and applies the configuration change they proposed to our production servers.

OpenSER configuration

As a reminder, our initial OpenSER configuration, which handled re-INVITEs statelessly, was as follows:

 

  if(loose_route())
  {
    $var(comm)="LooseR.";
    route(11);
    if(method=="INVITE")
    {
      sl_send_reply("100","Your Re-INVITE is received");
      sl_send_reply("488","Your Re-INVITE is ignored");
      exit;
    }
    t_relay();
    exit;
  }

 

The previous studies on call interruptions proposed the following modification, which handles re-INVITEs statefully by creating a new transaction. The goal is to enable retransmission of the reply in case of packet loss on the network:

 

  if(loose_route())
  {
    if(method=="INVITE")
    {
      sl_send_reply("100","Your Re-INVITE is received");
      t_newtran();
      t_reply("488","Your Re-INVITE is ignored");
      exit;
    }
    t_relay();
    exit;
  }

 

The modification now applied to the production OpenSER servers follows. It has been slightly updated for clarity and error handling. If the new transaction cannot be created (which should not happen), we reply statelessly, as in the initial configuration. The 100 provisional response is also sent through the stateful logic, even though it will never be retransmitted:

 

  if(loose_route())
  {
    $var(comm)="LooseR.";
    route(11);
    if(method=="INVITE")
    {
      if(t_newtran())
      {
        t_reply("100","Your Re-INVITE is received");
        t_reply("488","Your Re-INVITE is ignored");
        exit;
      }
      else
      {
        xlog("L_INFO","$Ts ==> Newtran error $oU\n");
        sl_send_reply("100","Your Re-INVITE is received (SL)");
        sl_send_reply("488","Your Re-INVITE is ignored (SL)");
        exit;
      }
    }
    t_relay();
    exit;
  }


Identifying individual interrupted calls

To obtain immediate feedback after the configuration update, we need a way to catch individual interruptions rather than rely on statistical analysis, which requires a large number of calls. To this end, we catch all “unmatched” BYE requests, that is, BYE requests issued by the SIP proxy on its own, received from neither the user nor the vendor side.

 

As shown in the previous research on call interruptions, the BYE that follows the loss of a re-INVITE’s 488 reply fits this pattern of unmatched BYE, whereas in normal call clearing the SIP proxy should never issue a BYE by itself. This holds only on a perfect setup, however: we will still observe unmatched BYE packets in case of media session or signaling failure, but these can be distinguished from the interruption pattern we are currently tracking.

 

To filter these unmatched BYE requests, we use the following awk script. It records in an array the Call-ID of each BYE request it sees. When a BYE request is sent by the SIP proxy, the script checks whether its Call-ID is already present in the array. If so (normal situation), the BYE request is being relayed from the user or vendor side. If not, the SIP proxy issued the BYE by itself, and the script outputs the corresponding Call-ID for manual study of the disconnect reason:

 

awk '
  BEGIN {
    max=0;             # index of the last Call-ID stored
    found=0;           # set to 1 when the current Call-ID is already stored
    array[0]="beg";
  }

  {
    # ngrep header line: record the direction of the packet
    # (s = sent by the SIP proxy, whose address is matched below; r = received)
    if ($0 ~ /^U /) {
      if ($0 ~ /^U 91\.121\.75\.124/) d="s";
      else d="r";
    }

    if ($0 ~ /^Call-ID: (.*)\./) {
      cid=$2;
      # look for this Call-ID among those already seen
      for (i in array) {
        if (cid==array[i]) {
          found=1;
          break;
        }
      }

      # first BYE seen for this call: remember its Call-ID
      if (found==0) {
        max++;
        array[max]=cid;
      }

      # BYE sent by the proxy with no earlier BYE for the same call
      if (d=="s" && found==0) {
        print "dis : " cid;
      }
      found=0;
    }
  }
'

 

The script is used in conjunction with ngrep, which allows filtering all BYE packets issued or received by the SIP proxy:

 

ngrep -pql -W byline "^BYE" port 5060 | ./a2.txt
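To illustrate the matching logic, the sketch below feeds a small synthetic capture to a condensed variant of the script. Everything in the sample is made up for illustration (the Call-IDs, the `example.invalid` URIs and the 192.0.2.x / 198.51.100.x addresses); only the proxy address is taken from the script above. The condensed awk program uses array membership (`cid in seen`) instead of a loop and is meant as an equivalent sketch, not as the script running in production.

```shell
# Synthetic ngrep-style input. The trailing "." on each SIP line stands for
# the carriage return, which ngrep prints as a dot.
capture='U 192.0.2.10:5060 -> 91.121.75.124:5060
BYE sip:bob@example.invalid SIP/2.0.
Call-ID: aaa111@192.0.2.10.

U 91.121.75.124:5060 -> 198.51.100.20:5060
BYE sip:bob@example.invalid SIP/2.0.
Call-ID: aaa111@192.0.2.10.

U 91.121.75.124:5060 -> 198.51.100.20:5060
BYE sip:carol@example.invalid SIP/2.0.
Call-ID: bbb222@192.0.2.11.
'

result=$(printf '%s\n' "$capture" | awk '
  /^U /        { sent = ($0 ~ /^U 91\.121\.75\.124/) }  # 1 if the proxy is the source
  /^Call-ID: / {
    cid = $2
    if (sent && !(cid in seen)) print "dis : " cid       # proxy BYE with no earlier BYE
    seen[cid] = 1                                        # remember every Call-ID seen
  }
')
echo "$result"
```

The first call is cleared normally (a BYE is received, then relayed by the proxy), so nothing is reported for it. The second BYE leaves the proxy without a matching incoming BYE, so the sketch prints `dis : bbb222@192.0.2.11.` (the final dot is the carriage return as rendered by ngrep).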

 

Our initial tests support the earlier hypotheses and the proposed configuration change: after the update, the script caught no interrupted call matching the described interruption pattern. This must now be confirmed by a statistical analysis over several days.

Further work

Confirm with statistical data that the configuration update resolves all targeted call interruptions. We must also make sure that creating a new transaction for each re-INVITE has no side effects.

References

Research on the cause of call interruptions (part 1) [ch1] [ch2]

Research on the cause of call interruptions (part 2) [ch1] [ch2]

Research on the cause of call interruptions (part 3) [ch1] [ch2]

Statistics on the global interruptions problem [ch1] [ch2]

Statistics on all call interruptions for 2008 [ch1] [ch2]

 

 

 
