Loops in SIP proxy servers and 483 Too Many Hops due to NOTIFY methods
Emin Gabrielyan
Switzernet
2009-04-19
Notify message is used by certain UA as keep-alive pings.
NOTIFY sip:sip11.youroute.net SIP/2.0
Via: SIP/2.0/UDP 192.168.1.2:5060;branch=z9hG4bK-d8f57e0e
From: 41215502482 <sip:41215502482@sip11.youroute.net>;tag=4480be708ede0800o0
To: <sip:sip11.youroute.net>
Call-ID: 4775cc30-57d65540@192.168.1.2
CSeq: 69818 NOTIFY
Max-Forwards: 70
Event: keep-alive
User-Agent: Linksys/PAP2T-3.1.15(LS)
Content-Length: 0
[log1]
Notify method is not expected from UA. According to RFC3265, NOTIFY is transmitted back only when a subscription is made. Well, our proxy server seems never sends SUBSCRIBE to UA, therefore no NOTIFY shall be expected.
Subscriber Notifier
|-----SUBSCRIBE---->| Request state subscription
|<-------200--------| Acknowledge subscription
|<------NOTIFY----- | Return current state information
|--------200------->|
|<------NOTIFY----- | Return current state information
|--------200------->|
[rfc3265]
NOTIFY methods received from UA are retransmitted in a loop in our proxy, until the number of hops expires. We guess that loops during NOTIFY processing would not occur if an accurate and up to date configuration of alias names would be used. Function subst_hostport_by_maddr() is followed by a uri’s test to ‘myself’ which fails [dir] due to uncompleted list of aliases (cannot be kept constant due to dynamic changes of names).
The excessive number of SIP messages overloaded the proxy servers often causing a crash. The following printout shows the statistics of 483 “Too Many Hopes” responses grouped by the method and by sip proxy.
$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; ssh sona@$h.youroute.net "cat \`ls /var/sipenv-*/log/sip.log.2 | head -1\` | awk '/^SIP\\/2.0/{bad=0} /^SIP\\/2.0 483/{bad=1} /^CSeq:/ && bad==1 {print \$3}' | sort | uniq -c"; done 2> /dev/null
[fr1]
382 ACK
186 BYE
68 CANCEL
31 INFO
2091 INVITE
1229807 NOTIFY
3786 OPTIONS
73 PUBLISH
14720 REGISTER
423 SUBSCRIBE
[fr2]
736 ACK
208 BYE
48 CANCEL
36 INFO
3013 INVITE
411064 NOTIFY
710 OPTIONS
7512 REGISTER
1012 SUBSCRIBE
[fr3]
1565 ACK
478 BYE
123 CANCEL
49 INFO
5590 INVITE
819347 NOTIFY
1663 OPTIONS
6 PUBLISH
13149 REGISTER
2318 SUBSCRIBE
[fr4]
1 ACK
4 BYE
32 INVITE
3 OPTIONS
86 REGISTER
3739 SUBSCRIBE
[fr5]
890 ACK
340 BYE
108 CANCEL
32 INFO
5868 INVITE
1184536 NOTIFY
1303 OPTIONS
16816 REGISTER
5941 SUBSCRIBE
[dk1]
212 ACK
128 BYE
42 CANCEL
51 INFO
1475 INVITE
183348 NOTIFY
376 OPTIONS
352 PING
6885 REGISTER
54958 SUBSCRIBE
[us1]
78 ACK
39 BYE
79 INVITE
[log1]
The percentage of 483 “Too Many Hopes” response is significant on many servers. Two servers not having, by a chance, any customers using NOTIFY methods as a keep-alive messages, are not exposed to the looping problem.
$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo $h; ssh sona@$h.youroute.net "tail -400000 \`ls /var/sipenv-*/log/sip.log | head -1\` | awk '/^SIP\/2.0 483/{i++} /^SIP\/2.0/{n++} END{printf \"%10.4f %10d %10d\n\n\",i/n*100,i,n}'"; done 2> /dev/null
fr1
60.8848 5918 9720
fr2
27.4874 2666 9699
fr3
41.1126 3865 9401
fr4
0.3541 35 9883
fr5
71.3987 6830 9566
dk1
21.8199 2031 9308
us1
0.0000 0 11611
[log1]
The troublemaking UA are identified by the following script:
$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n$h"; ssh sona@$h.youroute.net "cat /var/sipenv-*/log/sip.log.0 | awk '/: (SENDING|RECEIVED) message (from|to) /{t=\$0;i=NR} NR==i+1 && /483 Too Many/ {gsub(\"^.*: \",\"\",t); print t}' | sort | uniq -c"; done 2>/dev/null
...
fr4
...
fr5
489095 RECEIVED message from 91.121.167.75:5060:
5751 SENDING message to 144.85.164.27:10005:
28 SENDING message to 62.203.176.54:5060:
27 SENDING message to 62.203.76.51:5060:
6 SENDING message to 81.62.111.243:5060:
55 SENDING message to 81.62.25.114:5060:
53 SENDING message to 81.62.30.94:5060:
5735 SENDING message to 83.219.110.204:5060:
4094 SENDING message to 83.76.209.9:55212:
1639 SENDING message to 83.78.32.28:55283:
5730 SENDING message to 84.74.91.143:5060:
5 SENDING message to 85.1.23.138:5060:
5728 SENDING message to 85.5.122.4:49155:
489095 SENDING message to 91.121.167.75:5060:
dk1
...
The UA types of troublemakers are identified, using a the list of concerned IP addresses grouped per proxy:
$ cat 483-*.txt | while read h; do echo -e "\n[$h]"; read ip; echo $ip; ssh sona@$h.youroute.net "cat /var/sipenv-*/log/sip.log.0 | awk -v i=-100 '/: RECEIVED message from $ip/{i=NR;} NR>=i && NR<i+15 && /User-Agent: /{print} NR==i+1{print \$1}' | sort | uniq -c" 2>/dev/null </dev/null; done
[fr1]
(217.162.53.151|77.58.247.34|77.58.247.34|92.104.5.95)
...
24949 User-Agent: Linksys/PAP2-3.1.8(LS)
6492 User-Agent: Sipura/SPA941-4.1.8
13 User-Agent: X-Lite release 1100l stamp 47546
[fr2]
(84.72.172.220|85.4.73.249|90.185.178.208|90.185.178.208)
...
236 User-Agent: AVM FRITZ!Box Fon WLAN 7140 Annex A 39.04.57 (Apr 29 2008)
19479 User-Agent: Linksys/SPA941-5.1.8
6211 User-Agent: Sipura/SPA941-4.1.8
[fr3]
(189.143.39.156|194.178.110.131|201.102.34.121|217.162.106.142|62.152.197.241|62.152.197.241|84.74.174.104|84.75.112.2|85.218.21.62|85.218.41.129)
...
205 User-Agent: AVM FRITZ!Box Fon WLAN 7170 29.04.57 (Apr 29 2008)
6648 User-Agent: Linksys/PAP2-3.1.22(LS)
6523 User-Agent: Linksys/PAP2-3.1.9(LSc)
67283 User-Agent: Linksys/PAP2T-5.1.1(LS)
6332 User-Agent: Linksys/SPA942-4.1.18
645 User-Agent: Nimbuzz Single
60 User-Agent: Nokia RM-227 3.0633.69.00
8 User-Agent: Purple/2.5.2devel
41 User-Agent: X-Lite release 1014k stamp 47051
[fr4]
(84.72.77.125|84.73.84.240)
...
172 User-Agent: AVM FRITZ!Box Fon 06.04.01 (Jan 25 2006)
190 User-Agent: AVM FRITZ!Box Fon WLAN 7050 (UI) 14.04.33 (May 10 2007)
[fr5]
(144.85.164.27|62.203.176.54|62.203.76.51|81.62.111.243|81.62.25.114|81.62.30.94|83.219.110.204|83.76.209.9|83.78.32.28|84.74.91.143|85.1.23.138|85.5.122.4)
...
175 User-Agent: AVM FRITZ!Box Fon WLAN 7050 14.04.33 (May 10 2007)
178 User-Agent: AVM FRITZ!Box Fon WLAN 7170 Annex A 58.04.50 (Jan 25 2008)
189 User-Agent: AVM FRITZ!Box Fon WLAN 7170 Annex A 58.04.59 (Aug 21 2008)
18026 User-Agent: Linksys/PAP2T-3.1.15(LS)
6173 User-Agent: Linksys/SPA1001-3.1.19(SE)
5977 User-Agent: Linksys/WRP400-1.00.04.c
[dk1]
(85.4.19.156|85.5.235.37|92.105.152.68|92.105.165.111)
...
6815 User-Agent: AVM FRITZ!Box Fon WLAN 7141 (UI) 40.04.68 (Nov 21 2008)
216 User-Agent: AVM FRITZ!Box Fon WLAN 7270 54.04.70 (Feb 18 2009)
286 User-Agent: AVM FRITZ!Box Fon WLAN 7270 54.04.97 (Mar 17 2009)
13317 User-Agent: Linksys/SPA2102-3.3.6
55 User-Agent: SipPhone v1.1.10/iPhoneOS
[us1]
212.249.15.9
...
515 User-Agent: Kamailio (1.4.1-notls (i386/linux))
[log1]
We added in the main routing block an immediate reply to the NOTIFY messages.
if (method == "NOTIFY") {
sl_send_reply("405", "Stay Alive / Method Not Allowed");
break;
}
[dir]
All templates responsible for generation of ser.cfg files are updated:
$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; scp sona@$h.youroute.net:/usr/home/porta-sip/etc/ser.cfg ser-template-$h-v2.cfg; done 2> /dev/null
$ u2d ser-template-*-v2.cfg
ser-template-dk1-v2.cfg: done.
ser-template-fr1-v2.cfg: done.
ser-template-fr2-v2.cfg: done.
ser-template-fr3-v2.cfg: done.
ser-template-fr4-v2.cfg: done.
ser-template-fr5-v2.cfg: done.
ser-template-us1-v2.cfg: done.
$ diff ser-template-dk1-v2.cfg ser-template-fr1-v2.cfg
$ diff ser-template-dk1-v2.cfg ser-template-fr2-v2.cfg
$ diff ser-template-dk1-v2.cfg ser-template-fr3-v2.cfg
$ diff ser-template-dk1-v2.cfg ser-template-fr4-v2.cfg
$ diff ser-template-dk1-v2.cfg ser-template-fr5-v2.cfg
$ diff ser-template-dk1-v2.cfg ser-template-us1-v2.cfg
19a20
> alias="%%CANONIC_NAME2%%"
[log1]
The excessive number of NOTIFY methods resulting in 483 “Too Many Hops” disappeared in the logs of the following days:
$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; ssh sona@$h.youroute.net "cat \`ls /var/sipenv-*/log/sip.log.0 | head -1\` | awk '/^SIP\\/2.0/{bad=0} /^SIP\\/2.0 483/{bad=1} /^CSeq:/ && bad==1 {print \$3}' | sort | uniq -c"; done 2> /dev/null
[fr1]
70 INFO
3 OPTIONS
[fr2]
[fr3]
38 ACK
3 BYE
48 INVITE
7 NOTIFY
9 OPTIONS
77 REGISTER
2004 SUBSCRIBE
[fr4]
6
3 BYE
6 CANCEL
24 INVITE
3 OPTIONS
71 REGISTER
3752 SUBSCRIBE
[fr5]
263 ACK
6 BYE
2 CANCEL
286 INVITE
26 NOTIFY
3 OPTIONS
108 REGISTER
5866 SUBSCRIBE
[dk1]
35 ACK
32 BYE
8 CANCEL
35 INFO
304 INVITE
26 NOTIFY
31 OPTIONS
108 PING
1606 REGISTER
53106 SUBSCRIBE
[us1]
[log1]
The frequent crashes accompanied with high CPU level of ser processes are solved since the first upgrade, which took place on 2009-04-09 in afternoon:
[080212 ii iii] [080210] [xls]
Later, on 2009-04-19, we added replies to SUBSCRIBE and PING methods.
if (method == "SUBSCRIBE") {
sl_send_reply("501", "Not Implemented");
break;
}
if (method == "NOTIFY") {
sl_send_reply("405", "Stay Alive / Method Not Allowed");
break;
}
if (method == "PING") {
sl_send_reply("200", "OK Alive");
break;
}
[dir]
The SIP method PING is suggested as an RFD not long time ago [draft]. For keep-alive flow and NAT issues, PING and not NOTIFY should be used. All servers are updated and the sip proxies are restarted.
$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; scp sona@$h.youroute.net:/usr/home/porta-sip/etc/ser.cfg ser-template-$h-v3.download.cfg; done 2> /dev/null
[fr1]
ser.cfg 100% 7210 7.0KB/s 00:00
[fr2]
ser.cfg 100% 7210 7.0KB/s 00:00
[fr3]
ser.cfg 100% 7210 7.0KB/s 00:00
[fr4]
ser.cfg 100% 7210 7.0KB/s 00:00
[fr5]
ser.cfg 100% 7210 7.0KB/s 00:00
[dk1]
ser.cfg 100% 7210 7.0KB/s 00:00
[us1]
ser.cfg 100% 7237 7.1KB/s 00:00
$ u2d *.download.cfg
ser-template-dk1-v3.download.cfg: done.
ser-template-fr1-v3.download.cfg: done.
ser-template-fr2-v3.download.cfg: done.
ser-template-fr3-v3.download.cfg: done.
ser-template-fr4-v3.download.cfg: done.
ser-template-fr5-v3.download.cfg: done.
ser-template-us1-v3.download.cfg: done.
$ for f in *.download.cfg; do cmd=diff\ `ls *.download.cfg | head -1`\ $f; echo -e "\n$cmd"; $cmd; done
diff ser-template-dk1-v3.download.cfg ser-template-dk1-v3.download.cfg
diff ser-template-dk1-v3.download.cfg ser-template-fr1-v3.download.cfg
diff ser-template-dk1-v3.download.cfg ser-template-fr2-v3.download.cfg
diff ser-template-dk1-v3.download.cfg ser-template-fr3-v3.download.cfg
diff ser-template-dk1-v3.download.cfg ser-template-fr4-v3.download.cfg
diff ser-template-dk1-v3.download.cfg ser-template-fr5-v3.download.cfg
diff ser-template-dk1-v3.download.cfg ser-template-us1-v3.download.cfg
19a20
> alias="%%CANONIC_NAME2%%"
$
To avoid loops with SUBSCRIBE methods they must be rejected before record-route routing. Replies to SUBSCRIBE, NOTIFY, and PING methods are sent before the loose_route() test.
# Registration
if (method == "REGISTER") {
route(3);
break;
};
# Removing the detected reasons of loops ending by 483 too many hops
if (method == "SUBSCRIBE") {
sl_send_reply("501", "Subscribe Not Implemented");
break;
}
if (method == "NOTIFY") {
sl_send_reply("405", "Use PING, Method Not Allowed");
break;
}
if (method == "PING") {
sl_send_reply("200", "OK, Alive here");
break;
}
# Do strict routing if pre-loaded route headers present
if (loose_route() && !(method == "INVITE")) {
t_relay();
break;
};
[dir]
The new version of script is uploaded and the sip service is restarted on all servers. As soon as all servers get updated, the tails of last 2'500'000 lines of all sip.log files started to progressively loose the 483 replies.
$ for h in fr1 fr2 fr3 fr4 fr5 dk1 us1; do echo -e "\n[$h]"; ssh sona@$h.youroute.net "tail -3000000 \`ls /var/sipenv-*/log/sip.log | head -1\` | awk '/^SIP\\/2.0/{bad=0} /^SIP\\/2.0 483/{bad=1} /^CSeq:/ && bad==1 {print \$3}' | sort | uniq -c"; done 2> /dev/null; echo; date
[fr1]
[fr2]
[fr3] 1 REGISTER 105 SUBSCRIBE
[fr4] 3 ACK 2 REGISTER 105 SUBSCRIBE
[fr5] 35 ACK 42 INVITE 8 NOTIFY 3 OPTIONS 16 REGISTER 1055 SUBSCRIBE
[dk1] 2 ACK 3 INVITE 18 REGISTER 624 SUBSCRIBE
[us1]
Sun Apr 19 18:00:59 WEST 2009 |
[fr1]
[fr2]
[fr3] 1 REGISTER 105 SUBSCRIBE
[fr4] 3 ACK 2 REGISTER 105 SUBSCRIBE
[fr5] 35 ACK 42 INVITE 8 NOTIFY 3 OPTIONS 16 REGISTER 1055 SUBSCRIBE
[dk1] 2 ACK 1 INVITE 8 REGISTER 333 SUBSCRIBE
[us1]
Sun Apr 19 18:06:00 WEST 2009 |
[fr1]
[fr2]
[fr1]
[fr2]
[fr3] 1 REGISTER 105 SUBSCRIBE
[fr4] 1 REGISTER 70 SUBSCRIBE
[fr5] 35 ACK 42 INVITE 7 NOTIFY 3 OPTIONS 13 REGISTER 950 SUBSCRIBE
[dk1]
[us1]
Sun Apr 19 18:15:37 WEST 2009 |
[fr1]
[fr2]
[fr3] 70 SUBSCRIBE
[fr4] 1 REGISTER 70 SUBSCRIBE
[fr5] 35 ACK 42 INVITE 7 NOTIFY 3 OPTIONS 10 REGISTER 840 SUBSCRIBE
[dk1]
[us1]
Sun Apr 19 18:29:29 WEST 2009 |
[fr1]
[fr2]
[fr3]
[fr4]
[fr5]
[dk1]
[us1]
Sun Apr 19 22:12:46 WEST 2009 |
[log3]
After a while all 483 “Too Many Hops” disappeared. For a final conclusion we sill need a full day statistics.
[Done] Seems that SUBSCRIBE must be processed before the loose_route() test of the main routing logic.
We may need a crontab script, which monitors the CPU load, restarts the b2b, and if failure occurred, kills the blocked ser processes.
Breaking the loops caused by NOTIFY messages [090419 ii] (this web page)
SIP codes http://free.oxymium.net/Asterisk/SIPCodes.html
E. Gabrielyan and C. Lathion, Solving a significant amount of call interruptions by the upgrade of SIP signaling [090320 ii iii]
RFC3265 http://www.ietf.org/rfc/rfc3265.txt
The SIP PING Method http://ftp.ist.utl.pt/pub/drafts/draft-fwmiller-ping-03.txt
Monitoring the CPU loads of all SER processes of B2B agents [080212 ii iii]
SER CPU charts http://www.unappel.ch/public/080210-ser-cpu/
* * *
Copyright © 2009, Switzernet