Unprocessed ACK messages and retransmission of 487 replies by Verizon

 

Emin Gabrielyan

Switzernet

2007-05-24

 

We believe we have located the probable source of the problem on the Verizon side. We observed that Verizon is using OpenSER v1.2.0 on Solaris. We built a test bed in which the real Verizon server is replaced by a pseudo-Verizon SIP server, and on that server we reproduced exactly the same erroneous behaviour. Here we explain the error and show how to fix it.

 

Recalling the problem

How it must look like

Simulation of the problem at the Verizon side

Explanation of the error

 

Recalling the problem

 

The graph below recalls the problem. In this graph our SIP server, which has two interfaces (212.249.15.4 and 195.129.125.74), receives an INVITE from 128.179.67.76 and forwards it to Verizon at 212.190.89.137. When the call is cancelled, Verizon sends us the 487 (Request Terminated) reply and we send back the ACK message. However, Verizon does not close the transaction and keeps sending us the 487 reply again and again.

 

[png]

 

You may also wish to look at the diagram of this transaction as seen at the previous hop (128.179.67.76) [png].

 

How it must look like

 

A normally cancelled call transaction must look as shown in the graph below. In this example, the SIP server 192.168.1.15 represents our SIP server and the SIP server 192.168.1.16 represents Verizon. In this scenario the pseudo-Verizon server sends us no further 487 replies once it has received our ACK.

 

192.168.1.10 is a SIP phone.

 

[png]

 

For this scenario we provide the configuration file of the pseudo-Verizon SIP server [cfg] (running at 192.168.1.16 in this example).

 

Simulation of the problem at the Verizon side

 

We were able to reproduce the current problem on the 192.168.1.16 SIP server, which represents Verizon in this test bed. The graph below shows the faulty behaviour of 192.168.1.16: as in the real-life case, it does not process the ACK from our server (represented by 192.168.1.15) and keeps retransmitting the 487 reply.

 

[png]

 

You may wish to look at the diagram of this transaction at 192.168.1.16 (the pseudo-Verizon server) [png]. We provide the erroneous configuration file running on 192.168.1.16 [cfg]. We suspect that Verizon is currently using a configuration file with a similar error.

 

Explanation of the error

 

According to RFC 3261, the ACK for a non-2xx final response (such as 487) is handled hop-by-hop: a stateful proxy consumes it and does not forward it to the next hop. For this reason, one may think that the t_relay() function must not be invoked for such ACK messages. However, the t_relay() function of the OpenSER transaction module is intelligent enough not to forward such ACK messages. Moreover, t_relay() is very much needed here, not for relaying (as its name may suggest), but for telling the module that the transaction is closed and that the 487 reply no longer needs to be retransmitted.

 

In conclusion, the t_relay() function must be invoked for both types of ACK message: those processed in the loose_route() section (corresponding to answered calls) and those outside the loose_route() section (corresponding to cancelled calls).
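
The rule above can be sketched as the following fragment of an OpenSER routing script. This is only a minimal illustration, not the configuration file used in our test bed and not Verizon's file: it assumes that the tm, sl and rr modules are loaded elsewhere in the file, and the overall layout of the route block is hypothetical. The point it shows is that an ACK outside the loose_route() branch is still passed to t_relay() instead of being silently dropped.

```cfg
# Hypothetical main route block; module loading (tm, sl, rr) and
# the handling of all other requests are omitted.
route {
	# Requests that match loose routing, e.g. the ACK of an
	# answered call.
	if (loose_route()) {
		if (!t_relay()) {
			sl_reply_error();
		};
		exit;
	};

	# ACK that did not pass through the loose_route() branch,
	# e.g. the ACK that follows the 487 reply of a cancelled call.
	# t_relay() is still invoked so that the transaction module
	# learns that the transaction is closed and stops sending 487.
	if (method=="ACK") {
		t_relay();
		exit;
	};

	# ... handling of all other requests (not shown) ...
}
```

A script that leaves the second branch without t_relay() (for example, one that only ends processing for such ACK messages) would, according to the explanation above, leave the transaction open and keep the 487 replies coming.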

 

Note that all next-hop SIP servers (if there are any and if they have the same problem) must also be fixed, because only the first one or two 487 messages are answered with an ACK, while all the following 487 messages are propagated back.

 

*   *   *