Loops in SIP proxy servers and 483 Too Many Hops due to NOTIFY methods

 

Emin Gabrielyan

Switzernet

2009-04-19

 

405 “Method not Allowed” to all NOTIFY methods

 

Notify message is used by certain UA as keep-alive pings.

 

NOTIFY sip:sip11.youroute.net SIP/2.0

Via: SIP/2.0/UDP 192.168.1.2:5060;branch=z9hG4bK-d8f57e0e

From: 41215502482 <sip:41215502482@sip11.youroute.net>;tag=4480be708ede0800o0

To: <sip:sip11.youroute.net>

Call-ID: 4775cc30-57d65540@192.168.1.2

CSeq: 69818 NOTIFY

Max-Forwards: 70

Event: keep-alive

User-Agent: Linksys/PAP2T-3.1.15(LS)

Content-Length: 0

[log1]

 

Notify method is not expected from UA. According to RFC3265, NOTIFY is transmitted back only when a subscription is made. Well, our proxy server seems never sends SUBSCRIBE to UA, therefore no NOTIFY shall be expected.

 

   Subscriber          Notifier

       |-----SUBSCRIBE---->|     Request state subscription

       |<-------200--------|     Acknowledge subscription

       |<------NOTIFY----- |     Return current state information

       |--------200------->|

       |<------NOTIFY----- |     Return current state information

       |--------200------->|

[rfc3265]

 

NOTIFY methods received from UA are retransmitted in a loop in our proxy, until the number of hops expires. We guess that loops during NOTIFY processing would not occur if an accurate and up to date configuration of alias names would be used. Function subst_hostport_by_maddr() is followed by a uri’s test to ‘myself’ which fails [dir] due to uncompleted list of aliases (cannot be kept constant due to dynamic changes of names).

 

The excessive number of SIP messages overloaded the proxy servers often causing a crash. The following printout shows the statistics of 483 “Too Many Hopes” responses grouped by the method and by sip proxy.

 

$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; ssh sona@$h.youroute.net "cat \`ls /var/sipenv-*/log/sip.log.2 | head -1\` | awk '/^SIP\\/2.0/{bad=0} /^SIP\\/2.0 483/{bad=1} /^CSeq:/ && bad==1 {print \$3}' | sort | uniq -c"; done 2> /dev/null

[fr1]

 382 ACK

 186 BYE

  68 CANCEL

  31 INFO

2091 INVITE

1229807 NOTIFY

3786 OPTIONS

  73 PUBLISH

14720 REGISTER

 423 SUBSCRIBE

 

[fr2]

 736 ACK

 208 BYE

  48 CANCEL

  36 INFO

3013 INVITE

411064 NOTIFY

 710 OPTIONS

7512 REGISTER

1012 SUBSCRIBE

 

[fr3]

1565 ACK

 478 BYE

 123 CANCEL

  49 INFO

5590 INVITE

819347 NOTIFY

1663 OPTIONS

   6 PUBLISH

13149 REGISTER

2318 SUBSCRIBE

 

[fr4]

   1 ACK

   4 BYE

  32 INVITE

   3 OPTIONS

  86 REGISTER

3739 SUBSCRIBE

 

[fr5]

 890 ACK

 340 BYE

 108 CANCEL

  32 INFO

5868 INVITE

1184536 NOTIFY

1303 OPTIONS

16816 REGISTER

5941 SUBSCRIBE

 

[dk1]

 212 ACK

 128 BYE

  42 CANCEL

  51 INFO

1475 INVITE

183348 NOTIFY

 376 OPTIONS

 352 PING

6885 REGISTER

54958 SUBSCRIBE

 

[us1]

  78 ACK

  39 BYE

  79 INVITE

[log1]

 

The percentage of 483 “Too Many Hopes” response is significant on many servers. Two servers not having, by a chance, any customers using NOTIFY methods as a keep-alive messages, are not exposed to the looping problem.

 

$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo $h; ssh sona@$h.youroute.net "tail -400000 \`ls /var/sipenv-*/log/sip.log | head -1\` | awk '/^SIP\/2.0 483/{i++} /^SIP\/2.0/{n++} END{printf \"%10.4f %10d %10d\n\n\",i/n*100,i,n}'"; done 2> /dev/null

fr1

   60.8848       5918       9720

 

fr2

   27.4874       2666       9699

 

fr3

   41.1126       3865       9401

 

fr4

    0.3541         35       9883

 

fr5

   71.3987       6830       9566

 

dk1

   21.8199       2031       9308

 

us1

    0.0000          0      11611

[log1]

 

The troublemaking UA are identified by the following script:

 

$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n$h"; ssh sona@$h.youroute.net "cat /var/sipenv-*/log/sip.log.0 | awk '/: (SENDING|RECEIVED) message (from|to) /{t=\$0;i=NR} NR==i+1 && /483 Too Many/ {gsub(\"^.*: \",\"\",t); print t}' | sort | uniq -c"; done 2>/dev/null

...

fr4

...

fr5

489095 RECEIVED message from 91.121.167.75:5060:

5751 SENDING message to 144.85.164.27:10005:

  28 SENDING message to 62.203.176.54:5060:

  27 SENDING message to 62.203.76.51:5060:

   6 SENDING message to 81.62.111.243:5060:

  55 SENDING message to 81.62.25.114:5060:

  53 SENDING message to 81.62.30.94:5060:

5735 SENDING message to 83.219.110.204:5060:

4094 SENDING message to 83.76.209.9:55212:

1639 SENDING message to 83.78.32.28:55283:

5730 SENDING message to 84.74.91.143:5060:

   5 SENDING message to 85.1.23.138:5060:

5728 SENDING message to 85.5.122.4:49155:

489095 SENDING message to 91.121.167.75:5060:

 

dk1

...

[log1] [list]

 

The UA types of troublemakers are identified, using a the list of concerned IP addresses grouped per proxy:

 

$ cat 483-*.txt | while read h; do echo -e "\n[$h]"; read ip; echo $ip; ssh sona@$h.youroute.net "cat /var/sipenv-*/log/sip.log.0 | awk -v i=-100 '/: RECEIVED message from $ip/{i=NR;} NR>=i && NR<i+15 && /User-Agent: /{print} NR==i+1{print \$1}' | sort | uniq -c" 2>/dev/null </dev/null; done

 

[fr1]

(217.162.53.151|77.58.247.34|77.58.247.34|92.104.5.95)

...

24949 User-Agent: Linksys/PAP2-3.1.8(LS)

6492 User-Agent: Sipura/SPA941-4.1.8

  13 User-Agent: X-Lite release 1100l stamp 47546

 

[fr2]

(84.72.172.220|85.4.73.249|90.185.178.208|90.185.178.208)

...

236 User-Agent: AVM FRITZ!Box Fon WLAN 7140 Annex A 39.04.57 (Apr 29 2008)

19479 User-Agent: Linksys/SPA941-5.1.8

6211 User-Agent: Sipura/SPA941-4.1.8

 

[fr3]

(189.143.39.156|194.178.110.131|201.102.34.121|217.162.106.142|62.152.197.241|62.152.197.241|84.74.174.104|84.75.112.2|85.218.21.62|85.218.41.129)

...

205 User-Agent: AVM FRITZ!Box Fon WLAN 7170 29.04.57 (Apr 29 2008)

6648 User-Agent: Linksys/PAP2-3.1.22(LS)

6523 User-Agent: Linksys/PAP2-3.1.9(LSc)

67283 User-Agent: Linksys/PAP2T-5.1.1(LS)

6332 User-Agent: Linksys/SPA942-4.1.18

 645 User-Agent: Nimbuzz Single

  60 User-Agent: Nokia RM-227 3.0633.69.00

   8 User-Agent: Purple/2.5.2devel

  41 User-Agent: X-Lite release 1014k stamp 47051

 

[fr4]

(84.72.77.125|84.73.84.240)

...

172 User-Agent: AVM FRITZ!Box Fon 06.04.01 (Jan 25 2006)

 190 User-Agent: AVM FRITZ!Box Fon WLAN 7050 (UI) 14.04.33 (May 10 2007)

 

[fr5]

(144.85.164.27|62.203.176.54|62.203.76.51|81.62.111.243|81.62.25.114|81.62.30.94|83.219.110.204|83.76.209.9|83.78.32.28|84.74.91.143|85.1.23.138|85.5.122.4)

...

175 User-Agent: AVM FRITZ!Box Fon WLAN 7050 14.04.33 (May 10 2007)

 178 User-Agent: AVM FRITZ!Box Fon WLAN 7170 Annex A 58.04.50 (Jan 25 2008)

 189 User-Agent: AVM FRITZ!Box Fon WLAN 7170 Annex A 58.04.59 (Aug 21 2008)

18026 User-Agent: Linksys/PAP2T-3.1.15(LS)

6173 User-Agent: Linksys/SPA1001-3.1.19(SE)

5977 User-Agent: Linksys/WRP400-1.00.04.c

 

[dk1]

(85.4.19.156|85.5.235.37|92.105.152.68|92.105.165.111)

...

6815 User-Agent: AVM FRITZ!Box Fon WLAN 7141 (UI) 40.04.68 (Nov 21 2008)

 216 User-Agent: AVM FRITZ!Box Fon WLAN 7270 54.04.70 (Feb 18 2009)

 286 User-Agent: AVM FRITZ!Box Fon WLAN 7270 54.04.97 (Mar 17 2009)

13317 User-Agent: Linksys/SPA2102-3.3.6

  55 User-Agent: SipPhone v1.1.10/iPhoneOS

 

[us1]

212.249.15.9

...

515 User-Agent: Kamailio (1.4.1-notls (i386/linux))

[log1]

 

Suggested solution

 

We added in the main routing block an immediate reply to the NOTIFY messages.

 

       if (method == "NOTIFY") {

               sl_send_reply("405", "Stay Alive / Method Not Allowed");

               break;

       }

[dir]

 

All templates responsible for generation of ser.cfg files are updated:

 

$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; scp sona@$h.youroute.net:/usr/home/porta-sip/etc/ser.cfg ser-template-$h-v2.cfg; done 2> /dev/null

$ u2d ser-template-*-v2.cfg

ser-template-dk1-v2.cfg: done.

ser-template-fr1-v2.cfg: done.

ser-template-fr2-v2.cfg: done.

ser-template-fr3-v2.cfg: done.

ser-template-fr4-v2.cfg: done.

ser-template-fr5-v2.cfg: done.

ser-template-us1-v2.cfg: done.

 

$ diff ser-template-dk1-v2.cfg ser-template-fr1-v2.cfg

$ diff ser-template-dk1-v2.cfg ser-template-fr2-v2.cfg

$ diff ser-template-dk1-v2.cfg ser-template-fr3-v2.cfg

$ diff ser-template-dk1-v2.cfg ser-template-fr4-v2.cfg

$ diff ser-template-dk1-v2.cfg ser-template-fr5-v2.cfg

$ diff ser-template-dk1-v2.cfg ser-template-us1-v2.cfg

19a20

> alias="%%CANONIC_NAME2%%"

[log1]

 

The excessive number of NOTIFY methods resulting in 483 “Too Many Hops” disappeared in the logs of the following days:

 

$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; ssh sona@$h.youroute.net "cat \`ls /var/sipenv-*/log/sip.log.0 | head -1\` | awk '/^SIP\\/2.0/{bad=0} /^SIP\\/2.0 483/{bad=1} /^CSeq:/ && bad==1 {print \$3}' | sort | uniq -c"; done 2> /dev/null

[fr1]

  70 INFO

   3 OPTIONS

 

[fr2]

 

[fr3]

  38 ACK

   3 BYE

  48 INVITE

   7 NOTIFY

   9 OPTIONS

  77 REGISTER

2004 SUBSCRIBE

 

[fr4]

   6

   3 BYE

   6 CANCEL

  24 INVITE

   3 OPTIONS

  71 REGISTER

3752 SUBSCRIBE

 

[fr5]

 263 ACK

   6 BYE

   2 CANCEL

 286 INVITE

  26 NOTIFY

   3 OPTIONS

 108 REGISTER

5866 SUBSCRIBE

 

[dk1]

  35 ACK

  32 BYE

   8 CANCEL

  35 INFO

 304 INVITE

  26 NOTIFY

  31 OPTIONS

 108 PING

1606 REGISTER

53106 SUBSCRIBE

 

[us1]

[log1]

 

The frequent crashes accompanied with high CPU level of ser processes are solved since the first upgrade, which took place on 2009-04-09 in afternoon:

 

[080212 ii iii] [080210] [xls]

 

SUBSCRIBE and PING methods

 

Later, on 2009-04-19, we added replies to SUBSCRIBE and PING methods.

 

      if (method == "SUBSCRIBE") {

            sl_send_reply("501", "Not Implemented");

            break;

      }

 

      if (method == "NOTIFY") {

            sl_send_reply("405", "Stay Alive / Method Not Allowed");

            break;

      }

 

      if (method == "PING") {

            sl_send_reply("200", "OK Alive");

            break;

      }

[dir]

 

The SIP method PING is suggested as an RFD not long time ago [draft]. For keep-alive flow and NAT issues, PING and not NOTIFY should be used. All servers are updated and the sip proxies are restarted.

 

$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; scp sona@$h.youroute.net:/usr/home/porta-sip/etc/ser.cfg ser-template-$h-v3.download.cfg; done 2> /dev/null

 

[fr1]

ser.cfg                                                                               100% 7210     7.0KB/s   00:00

 

[fr2]

ser.cfg                                                                               100% 7210     7.0KB/s   00:00

 

[fr3]

ser.cfg                                                                               100% 7210     7.0KB/s   00:00

 

[fr4]

ser.cfg                                                                               100% 7210     7.0KB/s   00:00

 

[fr5]

ser.cfg                                                                               100% 7210     7.0KB/s   00:00

 

[dk1]

ser.cfg                                                                               100% 7210     7.0KB/s   00:00

 

[us1]

ser.cfg                                                                               100% 7237     7.1KB/s   00:00

 

$ u2d *.download.cfg

ser-template-dk1-v3.download.cfg: done.

ser-template-fr1-v3.download.cfg: done.

ser-template-fr2-v3.download.cfg: done.

ser-template-fr3-v3.download.cfg: done.

ser-template-fr4-v3.download.cfg: done.

ser-template-fr5-v3.download.cfg: done.

ser-template-us1-v3.download.cfg: done.

 

$ for f in *.download.cfg; do cmd=diff\ `ls *.download.cfg | head -1`\ $f; echo -e "\n$cmd"; $cmd; done

 

diff ser-template-dk1-v3.download.cfg ser-template-dk1-v3.download.cfg

 

diff ser-template-dk1-v3.download.cfg ser-template-fr1-v3.download.cfg

 

diff ser-template-dk1-v3.download.cfg ser-template-fr2-v3.download.cfg

 

diff ser-template-dk1-v3.download.cfg ser-template-fr3-v3.download.cfg

 

diff ser-template-dk1-v3.download.cfg ser-template-fr4-v3.download.cfg

 

diff ser-template-dk1-v3.download.cfg ser-template-fr5-v3.download.cfg

 

diff ser-template-dk1-v3.download.cfg ser-template-us1-v3.download.cfg

19a20

> alias="%%CANONIC_NAME2%%"

 

$

[log2] [dir]

 

Reject SUBSCRIBE before record route routing and loose route

 

To avoid loops with SUBSCRIBE methods they must be rejected before record-route routing. Replies to SUBSCRIBE, NOTIFY, and PING methods are sent before the loose_route() test.

 

      # Registration

      if (method == "REGISTER") {

            route(3);

            break;

      };

 

      # Removing the detected reasons of loops ending by 483 too many hops

      if (method == "SUBSCRIBE") {

            sl_send_reply("501", "Subscribe Not Implemented");

            break;

      }

 

      if (method == "NOTIFY") {

            sl_send_reply("405", "Use PING, Method Not Allowed");

            break;

      }

 

      if (method == "PING") {

            sl_send_reply("200", "OK, Alive here");

            break;

      }

 

      # Do strict routing if pre-loaded route headers present

      if (loose_route() && !(method == "INVITE")) {

            t_relay();

            break;

      };

[dir]

 

The new version of script is uploaded and the sip service is restarted on all servers. As soon as all servers get updated, the tails of last 2'500'000 lines of all sip.log files started to progressively loose the 483 replies.

 

$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; ssh sona@$h.youroute.net "tail -3000000 \`ls /var/sipenv-*/log/sip.log | head -1\` | awk '/^SIP\\/2.0/{bad=0} /^SIP\\/2.0 483/{bad=1} /^CSeq:/ && bad==1 {print \$3}' | sort | uniq -c"; done 2> /dev/null; echo; date

 

[fr1]

 

[fr2]

 

[fr3]

   1 REGISTER

 105 SUBSCRIBE

 

[fr4]

   3 ACK

   2 REGISTER

 105 SUBSCRIBE

 

[fr5]

  35 ACK

  42 INVITE

   8 NOTIFY

   3 OPTIONS

  16 REGISTER

1055 SUBSCRIBE

 

[dk1]

   2 ACK

   3 INVITE

  18 REGISTER

 624 SUBSCRIBE

 

[us1]

 

Sun Apr 19 18:00:59 WEST 2009

[fr1]

 

[fr2]

 

[fr3]

   1 REGISTER

 105 SUBSCRIBE

 

[fr4]

   3 ACK

   2 REGISTER

 105 SUBSCRIBE

 

[fr5]

  35 ACK

  42 INVITE

   8 NOTIFY

   3 OPTIONS

  16 REGISTER

1055 SUBSCRIBE

 

[dk1]

   2 ACK

   1 INVITE

   8 REGISTER

 333 SUBSCRIBE

 

[us1]

 

Sun Apr 19 18:06:00 WEST 2009

[fr1]

 

[fr2]

 

[fr1]

 

[fr2]

 

[fr3]

   1 REGISTER

 105 SUBSCRIBE

 

[fr4]

   1 REGISTER

  70 SUBSCRIBE

 

[fr5]

  35 ACK

  42 INVITE

   7 NOTIFY

   3 OPTIONS

  13 REGISTER

 950 SUBSCRIBE

 

[dk1]

 

[us1]

 

Sun Apr 19 18:15:37 WEST 2009

[fr1]

 

[fr2]

 

[fr3]

  70 SUBSCRIBE

 

[fr4]

   1 REGISTER

  70 SUBSCRIBE

 

[fr5]

  35 ACK

  42 INVITE

   7 NOTIFY

   3 OPTIONS

  10 REGISTER

 840 SUBSCRIBE

 

[dk1]

 

[us1]

 

Sun Apr 19 18:29:29 WEST 2009

[fr1]

 

[fr2]

 

[fr3]

 

[fr4]

 

[fr5]

 

[dk1]

 

[us1]

 

Sun Apr 19 22:12:46 WEST 2009

[log3]

 

After a while all 483 “Too Many Hops” disappeared. For a final conclusion we sill need a full day statistics.

 

Further work

 

[Done] Seems that SUBSCRIBE must be processed before the loose_route() test of the main routing logic.

 

We may need a crontab script, which monitors the CPU load, restarts the b2b, and if failure occurred, kills the blocked ser processes.

 

References:

 

Breaking the loops caused by NOTIFY messages [090419 ii] (this web page)

 

SIP codes http://free.oxymium.net/Asterisk/SIPCodes.html

 

E. Gabrielyan and C. Lathion, Solving a significant amount of call interruptions by the upgrade of SIP signaling [090320 ii iii]

 

RFC3265 http://www.ietf.org/rfc/rfc3265.txt

 

The SIP PING Method http://ftp.ist.utl.pt/pub/drafts/draft-fwmiller-ping-03.txt

 

Monitoring the CPU loads of all SER processes of B2B agents [080212 ii iii]

 

SER CPU charts http://www.unappel.ch/public/080210-ser-cpu/

 

 

*   *   *

Copyright © 2009, Switzernet