Reconstructing the CDR file from syslogs of the Kamailio SIP router – in relation to the fraud involving Slovenia mobiles


Emin Gabrielyan

Switzernet.com

2010-10-28

 

1.     Introduction

2.     Extracting the sufficient subset of records for building the CDR of answered calls

2.1.       Syslog file

2.2.       Extraction of sufficient subset

3.     Reading the fields of INVITE and BYE transactions

3.1.       The greediness options of regex quantifiers

3.2.       Retrieving the fields

3.3.       Used vendors

4.     Building call records

5.     Processing the multiple call-id cases

6.     Call Data Records (CDR)

6.1.       Statistics per country being called

6.2.       Statistics over the from user field being used

6.3.       Statistics over the phone number being called

6.4.       CDR file

7.     Traffic distribution chart

8.     Calls to Slovenia-Mobile-Kosovo Ipkonet

8.1.       Comparison of syslog and vendor records

8.2.       The number of simultaneous calls to Slovenia mobiles

9.     References

10.       Glossary

11.       Syslog and CDR files

12.       Formatting particularities of this document

12.1.     Styles

12.2.     File reference style

12.3.     Numbering of references

12.4.     Deleting the reference number bookmark before printing

12.5.     Conventions on the new versions of the document

 

 

1.       Introduction

 

Our experimental SIP server, installed for testing and development of the ACD quality routing [101] [102] [103] [104] [105] [106] [107] [108] [109] and of the system designed for routing emergency calls [110], was hacked in October 2010. The first call logged via the hacked server is dated October 13th. A significant volume was registered during the weekend from 2010-10-15 through 2010-10-17 [111] [112] [113] [114] [115]. The fraudulent traffic was terminated via the hacked server to several destinations. The server was not integrated into the main billing system, and the calls were not accounted for in the central CDR database. To discover the traffic details, only the syslog files [116] [117] [118] of the hacked UNIX server are used.

 

The Kamailio server [119] [120] [121] logged the SIP transactions via the Unix syslog service. This document presents the construction of a CDR file from the syslog file (see sections 2 to 5), provides the output CDR file of fraudulent calls (section 6.4), various statistics on the number of simultaneous calls and the destinations dialed (sections 6.1, 6.2, 6.3, 7, and 8.2), and several hypotheses on possible motives of the fraud (section 8.2).

 

This document has three target audiences. The first is training in the processing of text log files and the testing of skills such as the ability to form dialog (call) records from transaction logs (SIP methods/responses) [122] [123] [124] [125]. The logs that do not belong to traffic generated by legitimate users will be used publicly for pre-recruitment tests and for internal staff training. Second, the authorities processing the complaints related to the fraudulent calls to Slovenia mobiles can find additional statistics on the traffic. Finally, this document publicly provides the data to other operators for comparison of patterns and for the prevention of fraud in their own networks.

 

2.       Extracting the sufficient subset of records for building the CDR of answered calls

 

2.1.  Syslog file

 

Our objective is to select the transactions that determine the call durations and to assemble them into phone calls.

 

The syslog file [126] [127] [128] contains records of completed (answered) SIP transactions. A SIP transaction is often an exchange of two SIP packets, a method followed by a response [129] [130] [131] [132]. The transaction is sent to syslog upon reception of the response concluding the transaction, and it is recorded as a single syslog line: the Kamailio server sends to syslog a single string of semicolon-separated fields. The following is an example of such a record, where for visual clarity each field is shown on a separate line aligned on the equal sign. The newlines and leading spaces shown here are not present in the syslog file.

 

101014-syslog.txt:Oct 13 13:13:08 ks301129 ser[9108]: NOTICE: acc [acc.c:275]: ACC: transaction answered:

 

timestamp=1286968388;

   method=INVITE;

 from_tag=7e6d1b44;

   to_tag=65212929765820101013131233;

  call_id=ae5d7f4125027e66;

     code=183;

   reason=Session Progress;

  src_user=101;

src_domain=91.121.73.130;

 dst_ouser=00972599870738;

  dst_user=+972599870738;

dst_domain=212.249.15.9
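As a minimal sketch (not part of the original processing, which relies on Perl one-liners), the semicolon-separated record shown above could be split into named fields in Python as follows. The helper name `parse_acc_record` is hypothetical, and the sketch assumes that no field value contains a semicolon.

```python
# Hypothetical helper: split the accounting part of a syslog line into a dict.
def parse_acc_record(line):
    """Return the name=value fields that follow 'transaction answered: '."""
    # Everything after the marker is the semicolon-separated field list.
    _, _, payload = line.partition("transaction answered: ")
    fields = {}
    for item in payload.strip().split(";"):
        name, sep, value = item.partition("=")
        if sep:  # ignore fragments that carry no equal sign
            fields[name.strip()] = value.strip()
    return fields


# The example record of this section, written as the single line found in syslog.
sample = (
    "101014-syslog.txt:Oct 13 13:13:08 ks301129 ser[9108]: NOTICE: acc [acc.c:275]: "
    "ACC: transaction answered: timestamp=1286968388;method=INVITE;from_tag=7e6d1b44;"
    "to_tag=65212929765820101013131233;call_id=ae5d7f4125027e66;code=183;"
    "reason=Session Progress;src_user=101;src_domain=91.121.73.130;"
    "dst_ouser=00972599870738;dst_user=+972599870738;dst_domain=212.249.15.9"
)

record = parse_acc_record(sample)
print(record["method"], record["code"], record["call_id"])
```

A dictionary keeps the field names explicit, which may be easier to read than positional regex captures when only a few fields are needed.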

 

 

All lines of the syslog file containing Kamailio transaction records where the “call_id” field is present are extracted into a separate log file, which serves as the main input for building the CDR file.

 

Description:

All transactions with call_id field

File:

data1\101013+6-11'callid.zip

Size:

6.44MB

 

We limit our study to answered calls; the output CDR contains only answered calls. You can skip the processing of the file and go directly to section 6.4, which contains a link to the output CDR file. Sections 2.2, 3, 4, and 5 are provided for training purposes only and are not needed for the administrative handling of the fraud complaints.

 

2.2.  Extraction of sufficient subset

 

The transaction record appears in the syslog file [133] [134] [135] when the reply to a method is received [136] [137] [138] [139]. At that point, both the key data of the method’s request and of the reply are logged into a single syslog line. To obtain phone call data, we need the records of all INVITE methods having 200 OK replies and of all BYE methods.

 

The syslog file contains slightly more BYE transactions than INVITE transactions with 200 success replies. The excess of BYEs can be due to losses and retransmissions. Additionally, as shown below, most transaction records are duplicated. The cause of the duplicates is unknown (possibly the multi-process nature of the Kamailio server).

 

$ grep ";method=INVITE;.*;code=200;" 101013+6-callid.txt | wc -l

137536

 

$ grep ";method=BYE;" 101013+6-callid.txt | wc -l

144532

 

$ grep ";method=INVITE;.*;code=200;" 101013+6-callid.txt | sort | uniq | wc -l

68766

 

$ grep ";method=BYE;" 101013+6-callid.txt | sort | uniq | wc -l

72249

 

$

 

We eliminate the duplicates and save the successful INVITE methods as well as all the BYE methods into a new text file.

 

$ egrep "(;method=INVITE;.*;code=200;|;method=BYE;)" 101013+6-callid.txt | sort | uniq | wc -l

141015

 

$ expr 68766 + 72249

141015

 

$ egrep "(;method=INVITE;.*;code=200;|;method=BYE;)" 101013+6-callid.txt | sort | uniq > 101013+6-answered.txt

 

$ u2d 101013+6-answered.txt

101013+6-answered.txt:

 

$ wc -l 101013+6-answered.txt

141015 101013+6-answered.txt

 

$
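The selection and de-duplication performed above with egrep, sort and uniq can be sketched in Python as follows. This is only an illustration under assumptions: the function name `select_answered` is hypothetical, the sample lines are shortened, invented records rather than real log lines, and Python's default string ordering may differ from the locale-dependent ordering of `sort`.

```python
import re

# Same alternation as in the egrep command of this section.
ANSWERED = re.compile(r";method=INVITE;.*;code=200;|;method=BYE;")


def select_answered(lines):
    """Keep each matching line once (200 OK INVITEs and all BYEs), sorted."""
    unique = {line.rstrip("\r\n") for line in lines if ANSWERED.search(line)}
    return sorted(unique)


# Shortened, hypothetical records; real lines carry the full syslog prefix.
lines = [
    "timestamp=1;method=INVITE;call_id=a;code=183;reason=Session Progress;\n",
    "timestamp=2;method=INVITE;call_id=a;code=200;reason=OK;\n",
    "timestamp=2;method=INVITE;call_id=a;code=200;reason=OK;\n",
    "timestamp=3;method=BYE;call_id=a;code=200;reason=OK;\n",
    "timestamp=3;method=BYE;call_id=a;code=200;reason=OK;\n",
]

answered = select_answered(lines)
print(len(answered))
```

Using a set removes duplicates in one pass, which plays the role of `sort | uniq` for inputs that fit in memory.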

 

 

Description:

All transactions of answered calls

File:

data1\101013+6-12'answered.zip

Size:

3.69 MB

 

 

3.       Reading the fields of INVITE and BYE transactions

 

The Kamailio service sends to syslog a string of semicolon-separated fields, each containing both the field name and the value. This section presents the script used for extracting the fields we are interested in. In our regex [140] [141] the quantifiers are followed by the less commonly used greediness options. If you wish to understand the regex used in our script, read subsection 3.1; otherwise skip it and continue with subsection 3.2.

 

3.1.  The greediness options of regex quantifiers

 

The behavior of the commonly used quantifiers “*”, “?”, “+”, “{n,m}” can be tuned by greediness options. If the quantifier (such as “*”) is followed by “?”, the quantified subpattern matches the minimum number of times. If the quantifier is followed by “+”, it matches the maximum number of times and never gives characters back to the rest of the pattern.

 

By default a quantified subpattern “+” or “*” is greedy. In the following example, the character “a” is matched the maximum possible of times with expression “a+” or “a*” while still allowing the rest of the pattern (the last “a”) to match:

 

$ echo aaaa | perl -ne '/a+a/; print $&'

aaaa

 

$ echo aaaa | perl -ne '/a*a/; print $&'

aaaa

 

$

 

If you want it to match the minimum number of times possible, follow the quantifier with a “?”. In the following example “a+” is matched only once, and “a*” zero times (the respective minimums).

 

$ echo aaaa | perl -ne '/a+?a/; print $&'

aa

 

$ echo aaaa | perl -ne '/a*?a/; print $&'

a

 

$

 

Perl also provides the “possessive” quantifier form, obtained by following the quantifier with “+”. The possessive form matches as much as possible and disregards whether the rest of the pattern can still match.

 

The example below is without the possessive option “+” (after “a+”), and we see that it allows matching of both halves of the regular expression: “a+” and the following “a”.

 

$ echo aaaa | perl -ne '/(a+)a/; print $1'

aaa

 

$

 

However, when the possessive option is added, the first half of the regular expression eats up all “a”s, without leaving any character for the rest of the regex.

 

$ echo aaaa | perl -ne '/(a++)a/; print $1'

 

$

 

The regex with the possessive quantifier matches only without the second half.

 

$ echo aaaa | perl -ne '/(a++)/; print $1'

aaaa

 

$
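For readers more familiar with Python, the same three behaviors can be reproduced with the standard `re` module. This is a sketch, not part of the original scripts. Greedy and lazy quantifiers use the same syntax as in Perl. Native possessive quantifiers exist only in recent Python versions, so the sketch instead emulates `a++` with a lookahead plus a backreference, a commonly used substitute that is assumed here to behave like the Perl examples above.

```python
import re

text = "aaaa"

# Greedy: "a+" takes as many characters as possible but gives one back
# so that the trailing "a" can still match.
greedy = re.search(r"a+a", text).group(0)

# Lazy: "a+?" takes one character and "a*?" takes none.
lazy_plus = re.search(r"a+?a", text).group(0)
lazy_star = re.search(r"a*?a", text).group(0)

# Emulated possessive "a++": the lookahead captures the whole run of "a"
# and the regex engine does not backtrack into it, so nothing is left
# for the trailing "a" and the search fails.
possessive = re.search(r"(?=(a+))\1a", text)

# Without the trailing "a" the emulated possessive form matches everything.
possessive_alone = re.search(r"(?=(a+))\1", text).group(0)

print(greedy, lazy_plus, lazy_star, possessive, possessive_alone)
```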

 

3.2.  Retrieving the fields

 

The following command line script retrieves the Unix timestamp (the seconds counted since 1970-01-01), the SIP method (INVITE or BYE), the unique call-id, the source user (SIP from field), the destination user (SIP to field before translation), and finally the next hop SIP server (i.e. our vendor). As you can see, the non-greedy quantifier is used to stop matching at the first occurrence of the semicolon separator, and the possessive quantifier is used to ensure the full capture of the IP address (normally redundant, as the quantifier is greedy by default).

 

$ head 101013+6-answered.txt | perl -ne '/ timestamp=(\d+);method=(\w+);.*;call_id=(.*?);.*;src_user=(.*?);.*;dst_ouser=(.*?);.*;dst_domain=([^\r\n]*+)/; printf "%10s %10s %20s %20s %20s %20s\n",$1,$2,$3,$4,$5,$6'

1286968402     INVITE     ae5d7f4125027e66                  101       00972599870738         212.249.15.9

1286968407        BYE     ae5d7f4125027e66        +972599870738                  101      188.161.231.133

1286968438     INVITE     9a428e1758434f2e                  101       00972599870738         212.249.15.9

1286968510     INVITE     f0073042b0657a13                  101       00972597516161         212.249.15.9

1286968532     INVITE     ab071445d631b666                  101       00972597516161         212.249.15.9

1286968590     INVITE     b8009e4b3178c70a                  101       00972599870738         212.249.15.9

1286968602        BYE     b8009e4b3178c70a                  101        +972599870738         213.71.2.208

1286968606        BYE     ab071445d631b666                  101        +972597516161         213.71.2.208

1286968607        BYE     f0073042b0657a13                  101        +972597516161         213.71.2.208

1286968611        BYE     9a428e1758434f2e                  101        +972599870738         213.71.2.208

 

As expected, the number of extracted lines matches the total number of syslog records selected in section 2.2.

 

$ cat 101013+6-answered.txt | perl -ne '/ timestamp=(\d+);method=(\w+);.*;call_id=(.*?);.*;src_user=(.*?);.*;dst_ouser=(.*?);.*;dst_domain=([^\r\n]*+)/; printf "%10s %10s %20s %20s %20s %20s\n",$1,$2,$3,$4,$5,$6' | wc -l

141015

 

$ wc -l 101013+6-answered.txt

141015 101013+6-answered.txt

 

$

 

Next, the fields are printed in comma-separated format with the call-id field first (to serve as the matching key between INVITE and BYE records). The symbol “A” in the second field marks the beginning of the call charge and “B” the end of the call.

 

$ cat 101013+6-answered.txt | perl -ne '/ timestamp=(\d+);method=(\w+);.*;call_id=(.*?);.*;src_user=(.*?);.*;dst_ouser=(.*?);.*;dst_domain=([^\r\n]*+)/; printf "%s,%s,%s,%s,%s,%s\n",$3,$2 eq "BYE"?"B":"A",$1,$4,$5,$6' | head

ae5d7f4125027e66,A,1286968402,101,00972599870738,212.249.15.9

ae5d7f4125027e66,B,1286968407,+972599870738,101,188.161.231.133

9a428e1758434f2e,A,1286968438,101,00972599870738,212.249.15.9

f0073042b0657a13,A,1286968510,101,00972597516161,212.249.15.9

ab071445d631b666,A,1286968532,101,00972597516161,212.249.15.9

b8009e4b3178c70a,A,1286968590,101,00972599870738,212.249.15.9

b8009e4b3178c70a,B,1286968602,101,+972599870738,213.71.2.208

ab071445d631b666,B,1286968606,101,+972597516161,213.71.2.208

f0073042b0657a13,B,1286968607,101,+972597516161,213.71.2.208

9a428e1758434f2e,B,1286968611,101,+972599870738,213.71.2.208
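The same extraction can be sketched in Python, reusing the regular expression of the Perl one-liner almost unchanged. Only the possessive `*+` is replaced by a plain greedy `*`, which the text above describes as equivalent here. The function name `to_event` is hypothetical. The sample is the first line of the answered-calls file as quoted in section 4.

```python
import re

# Same pattern as in the Perl command, with a greedy "*" for the last field.
FIELDS = re.compile(
    r" timestamp=(\d+);method=(\w+);.*;call_id=(.*?);.*;src_user=(.*?);"
    r".*;dst_ouser=(.*?);.*;dst_domain=([^\r\n]*)"
)


def to_event(line):
    """Return 'call_id,A|B,timestamp,src_user,dst_ouser,dst_domain' or None."""
    match = FIELDS.search(line)
    if match is None:
        return None
    timestamp, method, call_id, src_user, dst_ouser, dst_domain = match.groups()
    mark = "B" if method == "BYE" else "A"  # A = call start, B = call stop
    return ",".join([call_id, mark, timestamp, src_user, dst_ouser, dst_domain])


sample = (
    "101014-syslog.txt:Oct 13 13:13:22 ks301129 ser[9109]: NOTICE: acc [acc.c:275]: "
    "ACC: transaction answered: timestamp=1286968402;method=INVITE;from_tag=7e6d1b44;"
    "to_tag=65212929765820101013131233;call_id=ae5d7f4125027e66;code=200;reason=OK;"
    "src_user=101;src_domain=91.121.73.130;dst_ouser=00972599870738;"
    "dst_user=+972599870738;dst_domain=212.249.15.9"
)

event = to_event(sample)
print(event)
```

The printed line should correspond to the first record of the comma-separated output shown above.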

 

3.3.  Used vendors

 

Further processing of the INVITE records shows that calls were routed via only two outgoing vendors. The sum of the two counts matches the INVITE total of section 2.2.

 

$ cat 101013+6-answered.txt | perl -ne '/ timestamp=(\d+);method=(\w+);.*;call_id=(.*?);.*;src_user=(.*?);.*;dst_ouser=(.*?);.*;dst_domain=([^\r\n]*+)/; printf "%s,%s,%s,%s,%s,%s\n",$3,$2 eq "BYE"?"B":"A",$1,$4,$5,$6' | awk -F, '$2=="A" {print $6}' | sort | uniq -c

  51073 212.249.15.9

  17693 217.168.45.4

 

$ expr 51073 + 17693

68766

 

$

 

 

4.       Building call records

 

In the following script we are merging together the records having the same call-id. This is achieved by sorting the output lines where the first field is the call-id, and the second field is “A” if the line represents the beginning of the conversation and “B” if it is the end of the call. As a result of merging we will have a set of fields representing the beginning of the call followed by a set of fields representing the end of the call. Normally we will have only pairs of lines with one “A” and one “B” record sharing the same call id. However, more than two A/B records can be merged together in exceptional cases, if multiple INVITEs and BYEs are registered under the same call-id.

 

$ cat 101013+6-answered.txt | perl -ne '/ timestamp=(\d+);method=(\w+);.*;call_id=(.*?);.*;src_user=(.*?);.*;dst_ouser=(.*?);.*;dst_domain=([^\r\n]*+)/; printf "%s,%s,%s,%s,%s,%s\n",$3,$2 eq "BYE"?"B":"A",$1,$4,$5,$6' | sort | awk -F, '$1!=id{printf "\n"} {printf "%s,",$0} {id=$1}' | head

0000221ba3287435,A,1287334561,684168,002522168768,217.168.45.4,0000221ba3287435,B,1287334580,002522168768,684168,41.206.153.251,

0000ae1a6842d52c,A,1287327409,000000000000,002522200370,217.168.45.4,0000ae1a6842d52c,B,1287327417,000000000000,002522200370,217.168.45.4,

0000f6798a5f4f6f,A,1287217382,101,0023224006762,212.249.15.9,0000f6798a5f4f6f,B,1287218431,101,+23224006762,213.71.2.208,

00024779b134e721,A,1287187156,101,0038643281242,212.249.15.9,00024779b134e721,B,1287187704,101,+38643281242,213.71.2.208,

0005c56a98078d50,A,1287167799,0000,0038643281289,212.249.15.9,0005c56a98078d50,B,1287167828,+38643281289,0000,41.206.158.7,

0005f5698e073461,A,1287313970,000000000000,002522200185,217.168.45.4,0005f5698e073461,B,1287313976,000000000000,002522200185,217.168.45.4,

000678011129b805,A,1287340940,000000000000,002522168855,217.168.45.4,000678011129b805,B,1287340947,000000000000,002522168855,217.168.45.4,

0006dd42d51dad7b,A,1287290372,000000000000,0022479910595,217.168.45.4,0006dd42d51dad7b,B,1287290382,0022479910595,000000000000,109.253.170.238,

000c222ec747ab3f,A,1287317096,888,0023222291847,217.168.45.4,000c222ec747ab3f,B,1287317219,888,0023222291847,217.168.45.4,

 

$
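The sort-and-merge step performed above by `sort` and `awk` can be sketched in Python with `itertools.groupby`. The function name `merge_by_call_id` is hypothetical. The four input events are taken from the comma-separated output of section 3.2 and are deliberately shuffled.

```python
from itertools import groupby


def call_id_of(event):
    """The call-id is the first comma-separated field of an event line."""
    return event.split(",", 1)[0]


def merge_by_call_id(events):
    """Join all events sharing a call-id into one line, starts before stops."""
    rows = []
    # Sorting whole lines orders by call-id first, then by the A/B mark,
    # which is what the plain "sort" of the shell pipeline does.
    for _, group in groupby(sorted(events), key=call_id_of):
        rows.append(",".join(group) + ",")  # trailing comma as in the awk output
    return rows


events = [
    "b8009e4b3178c70a,B,1286968602,101,+972599870738,213.71.2.208",
    "ae5d7f4125027e66,A,1286968402,101,00972599870738,212.249.15.9",
    "b8009e4b3178c70a,A,1286968590,101,00972599870738,212.249.15.9",
    "ae5d7f4125027e66,B,1286968407,+972599870738,101,188.161.231.133",
]

rows = merge_by_call_id(events)
for row in rows:
    print(row)
```

Unlike the awk script, this sketch emits no line break before the first record.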

 

If we now sort the merged fields again according to the time (and not call-id), we will see that the chronologically first call record in the obtained list corresponds to the first INVITE in the file of transactions being processed.

 

We can now say that the first fraudulent call was placed at Unix timestamp 1286968402, corresponding to October 13th 13:13 (CEST), to +972599870738. This call was issued from the IP address 188.161.231.133 and lasted 1286968407 - 1286968402 = 5 seconds.

 

 

$ head -1 101013+6-answered.txt

101014-syslog.txt:Oct 13 13:13:22 ks301129 ser[9109]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1286968402;method=INVITE;from_tag=7e6d1b44;to_tag=65212929765820101013131233;call_id=ae5d7f4125027e66;code=200;reason=OK;src_user=101;src_domain=91.121.73.130;dst_ouser=00972599870738;dst_user=+972599870738;dst_domain=212.249.15.9

 

$ cat 101013+6-answered.txt | perl -ne '/ timestamp=(\d+);method=(\w+);.*;call_id=(.*?);.*;src_user=(.*?);.*;dst_ouser=(.*?);.*;dst_domain=([^\r\n]*+)/; printf "%s,%s,%s,%s,%s,%s\n",$3,$2 eq "BYE"?"B":"A",$1,$4,$5,$6' | sort | awk -F, '$1!=id{printf "\n"} {printf "%s,",$0} {id=$1}' | sort -n -t, -k 3 | head

ae5d7f4125027e66,A,1286968402,101,00972599870738,212.249.15.9,ae5d7f4125027e66,B,1286968407,+972599870738,101,188.161.231.133,

9a428e1758434f2e,A,1286968438,101,00972599870738,212.249.15.9,9a428e1758434f2e,B,1286968611,101,+972599870738,213.71.2.208,

f0073042b0657a13,A,1286968510,101,00972597516161,212.249.15.9,f0073042b0657a13,B,1286968607,101,+972597516161,213.71.2.208,

ab071445d631b666,A,1286968532,101,00972597516161,212.249.15.9,ab071445d631b666,B,1286968606,101,+972597516161,213.71.2.208,

b8009e4b3178c70a,A,1286968590,101,00972599870738,212.249.15.9,b8009e4b3178c70a,B,1286968602,101,+972599870738,213.71.2.208,

bd20b03dd73c1c1b,A,1286998646,133,38643322585,212.249.15.9,bd20b03dd73c1c1b,B,1286998649,133,+38643322585,213.71.2.208,

d2228418041d064f,A,1286998656,133,38643322585,212.249.15.9,d2228418041d064f,B,1286998698,133,+38643322585,213.71.2.208,

174abf3e07248517,A,1287092708,0000000000,0038643327405,212.249.15.9,174abf3e07248517,B,1287092787,0000000000,+38643327405,213.71.2.208,

44577f70d5141a07,A,1287092720,0000000000,0038643327405,212.249.15.9,44577f70d5141a07,B,1287092786,0000000000,+38643327405,213.71.2.208,

 

$

 

Before generating the call records, let us count the distinct call-ids in the file of all transactions and in the file of transactions of answered calls only. As expected, fewer call-ids appear in the second file.

 

$ cat 101013+6-callid.txt | perl -ne '{/;call_id=.*?;/; print $&."\n"}' | head

;call_id=ae5d7f4125027e66;

;call_id=ae5d7f4125027e66;

;call_id=ae5d7f4125027e66;

;call_id=ae5d7f4125027e66;

;call_id=ae5d7f4125027e66;

;call_id=ae5d7f4125027e66;

;call_id=ae5d7f4125027e66;

;call_id=ae5d7f4125027e66;

;call_id=9a428e1758434f2e;

;call_id=9a428e1758434f2e;

 

$ cat 101013+6-callid.txt | perl -ne '{/;call_id=.*?;/; print $&."\n"}' | sort | uniq | wc -l

56415

 

$ cat 101013+6-answered.txt | perl -ne '{/;call_id=.*?;/; print $&."\n"}' | sort | uniq | wc -l

48004

 

$

 

Now the output file “101013+6-calls.txt” is generated. Its line count is one more than the number of distinct call-ids (48004) in the file “101013+6-answered.txt” of transactions of answered calls, because the awk script emits a line break before the first record, which leaves one empty line in the output.

 

$ cat 101013+6-answered.txt | perl -ne '/ timestamp=(\d+);method=(\w+);.*;call_id=(.*?);.*;src_user=(.*?);.*;dst_ouser=(.*?);.*;dst_domain=([^\r\n]*+)/; printf "%s,%s,%s,U %s,U %s,%s\n",$3,$2 eq "BYE"?"B":"A",$1,$4,$5,$6' | sort | awk -F, '$1!=id{printf "\r\n"} {printf "%s,",$0} {id=$1}' | sort -n -t, -k 3 > 101013+6-calls.txt

 

$ wc -l 101013+6-calls.txt

48005 101013+6-calls.txt

 

$

 

The call-ids repeated most often occupy the largest number of columns in the output file. They probably correspond to concurrent calls issued under the same call-id (by error or deliberately).

 

$ cat 101013+6-answered.txt | perl -ne '/ timestamp=(\d+);method=(\w+);.*;call_id=(.*?);.*;src_user=(.*?);.*;dst_ouser=(.*?);.*;dst_domain=([^\r\n]*+)/; printf "%s,%s,%s,U %s,U %s,%s\n",$3,$2 eq "BYE"?"B":"A",$1,$4,$5,$6' | sort | cut -d, -f1 | sort | uniq -c | sort -n -r | head -3

     22 xr3808484263002c1834218e1921681f@188.161.237.156

     22 xr11800770335646c9615328e192168f@188.161.229.236

     20 xr94449288617869c3113484e192168f@188.161.237.156

$
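The ranking of the most repeated call-ids can also be sketched with `collections.Counter`. The function name `most_repeated` and the short event lines are hypothetical and serve only to illustrate the counting.

```python
from collections import Counter


def most_repeated(events, top=3):
    """Count event lines per call-id (first field) and return the top entries."""
    counts = Counter(event.split(",", 1)[0] for event in events)
    return counts.most_common(top)


# Hypothetical, shortened event lines: call-id, A/B mark, timestamp.
events = ["x,A,1", "x,B,2", "x,A,3", "x,B,4", "y,A,1", "y,B,5", "z,A,7"]

ranking = most_repeated(events, top=2)
print(ranking)
```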

 

Description:

Call Data Records in Text format

File:

data1\101013+6-13'calls.txt.zip

Size:

1.29MB

 

In the next section we import the text records into an Excel file and we compute the amount of minutes and calls for dialogs sharing the same call-id.

 

5.       Processing the multiple call-id cases

 

The multiple call-id cases are processed in an Excel sheet described in this section. Rows of the Excel file contain merged field sets with call-start “A” and call-stop “B” fields. We first compute the number of start “A” and stop “B” records.

 

See the H2 cell for the formula computing the number of call starts:

=COUNTIF($M2:$DR2,"A")

 

See the cell I2 for the formula computing the number of stops:

=COUNTIF($M2:$DR2,"B")

 

 

Since the transactions were sorted so that in each merged line all start fields appear before all stop fields (see section 4), the average start time is computed by summing the Unix timestamps of the first block (the field sets of the start events) and dividing by the number of starts. Likewise, the average stop time is computed by summing the Unix timestamps of the second block (the field sets of the stop events) and dividing by the number of stops.

 

See the formula in cell J2 for the average start time:

=SUM(OFFSET($M2,0,0,1,5*H2))/H2

 

See the formula in cell K2 for the average stop time. Note that the offset is shifted by 5*H2 positions to skip the start events:

=SUM(OFFSET($M2,0,5*H2,1,5*I2))/I2

 

 

The total duration can be computed as follows (see the D2 field):

=TIME(0,0,(K2-J2)*F2)

 

Here the F2 cell holds the number of simultaneous calls under the same call-id:

=AVERAGE(H2:I2)

 

The value in column H is normally equal to the value in column I (except when non-recoverable packet losses cause transaction records to be lost).

 

 

Finally the date and time of the call is computed by processing the unix time stamp where we consider also the time shift between UTC and CET (CEST) time zones:

=DATE(1970,1,1)+J2/(3600*24)+2/24
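The conversion done by the Excel formula can be cross-checked with a short Python sketch. The function names are hypothetical. As in the formula, a fixed shift of two hours (CEST) is assumed, so timestamps falling after the switch back to CET would need a one-hour shift instead. The constant 25569 is the Excel serial number of 1970-01-01 in the default 1900 date system.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+2 offset, mirroring the "+2/24" term of the Excel formula.
CEST = timezone(timedelta(hours=2))


def unix_to_cest(timestamp):
    """Convert a Unix timestamp to a date and time at a fixed UTC+2 offset."""
    return datetime.fromtimestamp(timestamp, tz=CEST)


def unix_to_excel_serial(timestamp, shift_hours=2):
    """Same arithmetic as =DATE(1970,1,1)+J2/(3600*24)+2/24."""
    return 25569 + timestamp / 86400 + shift_hours / 24


first_call = unix_to_cest(1286968402).strftime("%Y-%m-%d %H:%M:%S")
serial = unix_to_excel_serial(1286968402)
print(first_call, round(serial, 5))
```

The first value agrees with the syslog time of the first answered INVITE (Oct 13 13:13:22).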

 

One may have doubts about the way the duration is computed. When we do not know which start event corresponds to which stop (as a single call-id is used for multiple events), it might be unclear why the average of all start times and the average of all stop times is sufficient for computing the correct call duration.

 

When the same call-id is used for two different calls, there is no way to know how to match the starts with the stops. We have a set of starts on one side and a set of stops on the other.

 

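Consider two starts “A” and two stops “B” under one call-id, with start times $A_1$, $A_2$ and stop times $B_1$, $B_2$ placed on a time axis.

 

Two combinations are possible.

 

The first possibility pairs $A_1$ with $B_1$ and $A_2$ with $B_2$:

$$(B_1 - A_1) + (B_2 - A_2)$$

 

The second possibility pairs $A_1$ with $B_2$ and $A_2$ with $B_1$:

$$(B_2 - A_1) + (B_1 - A_2)$$

 

However, in both cases the sum of the durations of the two calls is the same:

$$(B_1 + B_2) - (A_1 + A_2)$$

 

In the general case of $n$ calls also:

$$\sum_{i=1}^{n} B_i - \sum_{i=1}^{n} A_i = n \, (\bar{B} - \bar{A})$$

 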

Therefore the total duration of all calls sharing the same call-id is the average of stop times minus the average of start times multiplied by the number of simultaneous calls.
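The argument can be checked numerically with a small Python sketch. The function names and the four timestamps are hypothetical. The sketch enumerates every possible pairing of starts with stops and confirms that the total duration does not depend on the pairing.

```python
from itertools import permutations


def total_duration(starts, stops):
    """n * (mean stop - mean start), which reduces to sum(stops) - sum(starts)."""
    if len(starts) != len(stops):
        raise ValueError("each start needs exactly one stop")
    return sum(stops) - sum(starts)


def pairing_totals(starts, stops):
    """Total duration obtained for every way of pairing starts with stops."""
    return {
        sum(stop - start for start, stop in zip(starts, pairing))
        for pairing in permutations(stops)
    }


# Hypothetical timestamps of two calls sharing one call-id.
starts = [100, 130]
stops = [160, 220]

print(total_duration(starts, stops), pairing_totals(starts, stops))
```

Some pairings may assign a negative duration to an individual call, but the sum over all calls stays the same.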

 

 

Description:

Calls sharing the same call-id

File:

data1\101013+6-13'calls.xls.zip

Size:

7.21MB

 

 

The Excel file of this section contains many formulas and is heavy to open. In the next section we present the compact version of the CDR file containing only values resulting from the computation of the number and duration of simultaneous calls.

 

6.       Call Data Records (CDR)

 

The CDR file containing the output values can be downloaded in section 6.4. Next sections show several statistics resulting from the CDR file. The statistics per country, per from-user field, and per destination number are shown in sections 6.1, 6.2, and 6.3 (also available in the Excel file of section 6.4).

 

6.1.  Statistics per country being called

 

 

Country            Code     Calls      Minutes   ACD (min)
Slovenia            386    25'594    153'239.7         6.0
Sierra Leone        232     6'553     32'224.2         4.9
Somalia Republic    252    10'801     24'779.6         2.3
Guinea              224     5'021     12'659.3         2.5
Israel              972        28        118.4         4.2
Macedonia           389         6          4.7         0.8
Zimbabwe            263         1          0.2         0.2

 

6.2.  Statistics over the from user field being used

 

From user             Calls     Minutes   ACD (min)
101                  19'765    76'851.3         3.9
asd300                1'778    22'211.1        12.5
0000                  1'648    19'234.2        11.7
000000000000          8'534    18'356.5         2.2
0000000               2'008    16'017.6         8.0
kalnas600             4'451    13'818.2         3.1
dehka                 4'272    10'121.5         2.4
000000                  868     9'974.5        11.5
foaad_saa               147     6'993.3        47.6
7777                    605     6'424.5        10.6
55202033                324     4'294.3        13.3
kalnas500               271     4'036.9        14.9
1111                    204     2'923.7        14.3
888888888               350     2'374.3         6.8
888                     110     1'745.6        15.9
123                     138     1'362.1         9.9
hisham1970              813     1'339.9         1.6
0000000000              468     1'224.7         2.6
684168                  779       863.6         1.1
111111111111             85       583.6         6.9
11                       71       550.9         7.8
asdf500                  98       365.9         3.7
shikso                   37       335.2         9.1
marryaina123-1001        17       282.4        16.6
999                      13       168.4        13.0
karam155                  8       146.4        18.3
RAMY250                  10       117.2        11.7
00000000                 58       117.0         2.0
5555555                   6       101.0        16.8
133                      21        48.2         2.3
10                       19        32.2         1.7
1001                     24         7.2         0.3
anonymous                 2         2.0         1.0
441932376101              1         0.6         0.6
250                       1         0.1         0.1

 

6.3.  Statistics over the phone number being called

 

The table of phone numbers being dialed shows only the top used numbers. The full list is available in the CDR Excel file (section 6.4).

 

 

To                Calls     Minutes   ACD (min)
0038643281239     1'178    19'643.7        16.7
0038643281242     2'521    11'744.3         4.7
0038643281460     5'267    11'048.3         2.1
0038643281244     3'583     8'592.0         2.4
0023224000936       276     8'249.4        29.9
0023224000935     1'361     7'005.9         5.1
0038643281094       490     6'757.5        13.8
002522200377      1'613     6'564.5         4.1
0038643281081       356     5'629.6        15.8
0023224006762       649     5'587.6         8.6
0038643281286     1'711     5'524.6         3.2
0038643281494       557     5'304.5         9.5
0038643281461     2'259     5'176.2         2.3
0038643281287       366     4'794.5        13.1
0038643281463       695     4'732.7         6.8
0038643281289       330     4'618.0        14.0
0038643281098       779     4'604.2         5.9
0023224000938     2'966     4'385.7         1.5
0038643281234       358     4'065.5        11.4
0038643281498       370     3'840.6        10.4
0038643281288       271     3'811.6        14.1
0038643281230       291     3'742.0        12.9
0038643281233       321     3'741.3        11.7
0038643281499       319     3'711.6        11.6
0038643281238       462     3'619.9         7.8
0038643281231       316     3'566.2        11.3
002522200378      1'387     3'554.2         2.6
0023224006772       919     3'546.2         3.9
0038643281232       292     3'472.5        11.9
0038643281497       243     3'391.4        14.0
0038643281465       505     2'886.7         5.7
0038643281496       272     2'866.9        10.5
0038643281466       374     2'706.8         7.2
002522168653        592     2'477.0         4.2
0038643281241       166     2'323.9        14.0
0038643281476       155     2'292.1        14.8
002522168765        195     1'884.5         9.7
002522168898        403     1'867.7         4.6
002522168652        878     1'840.3         2.1
0022479910583       134     1'714.2        12.8
0022479910596       493     1'678.0         3.4
0038643281080       218     1'654.7         7.6
0022479910594       326     1'484.4         4.6
0022479910584        90     1'214.6        13.5
0022479910595       413     1'198.1         2.9
0022479910597       396     1'171.2         3.0
0022479910598       376     1'125.4         3.0
0023224001570        92     1'024.9        11.1
0038643281190       240       949.9         4.0
0023222291848        29       931.9        32.1
0022479910585       117       835.0         7.1
0038643281464       110       791.9         7.2
0022479910589       191       776.4         4.1

 

6.4.  CDR file

 

Description:

CDR file created from syslog

File:

data1\101013+6-14'cdr.zip

Size:

1.66MB

 

 

7.       Traffic distribution chart

 

The following chart shows the evolution of the traffic distribution by country. The data is presented in hourly intervals. The values represent the number of concurrent calls during a given hour. At first the fraudulent traffic used the connections of Verizon. When Verizon detected the fraud and suspended the calls, the flow stopped for a couple of hours and then restarted, this time using the routes of Colt.

 

 

The following two records show the first and last calls routed via Verizon:

 

    Time: 2010-10-13 13:13:22

    From: 101

      To: Israel

   Phone: 00972599870738

Duration: 00:00:05

     Via: verizonbusiness.com

 

    Time: 2010-10-16 18:09:12

    From: dehka

      To: Sierra Leone

   Phone: 0023224000938

Duration: 00:00:46

     Via: verizonbusiness.com

 

The fraudulent traffic was interrupted when Verizon detected the fraud and blocked the calls. A couple of hours later the fraudulent traffic resumed, this time via Colt. The following two records show the first and last calls routed via Colt. Colt detected the fraud on Sunday and blocked the calls.

 

    Time: 2010-10-16 20:55:18

    From: 250

      To: Israel

   Phone: 00972599916699

Duration: 00:00:07

     Via: colt.net

 

    Time: 2010-10-17 23:07:44

    From: 000000000000

      To: Somalia Republic

   Phone: 002522168598

Duration: 00:00:19

     Via: colt.net

 

Description:

Distribution chart by hours

File:

data1\101013+6-15'chart.zip

Size:

2.18MB

 

 

8.       Calls to Slovenia-Mobile-Kosovo Ipkonet

 

When Verizon’s fraud department detected the pattern, the records of suspected calls to Slovenia were sent to us.

 

8.1.  Comparison of syslog and vendor records

 

The CDR that we generated from the syslog files was compared with the Verizon CDR containing the calls to Slovenia mobiles. The calls in both CDRs matched accurately most of the time. The records in the two files were often identical except for a time shift of 32 to 34 seconds, due to an inaccurate clock on one of the two sides.

 

Description:

Vendor and syslog CDR comparison

File:

data1\101013+6-16'slovenia.zip

Size:

12.8MB

 

The following records represent the first and last calls appearing in the fraud report of Verizon for calls to Slovenia mobiles:

 

Time: 2010-10-15 01:48:46

To: 38643281227

Duration: 191 seconds

 

Time: 2010-10-16 18:08:58

To: 38643281463

Duration: 16 seconds

 

8.2.  The number of simultaneous calls to Slovenia mobiles

 

The entire traffic of 7’554’889 seconds, or 125’914.8 minutes, representing a charge of CHF 38'035.47 (without VAT), was sent to only 32 phone numbers. Except for businesses handling simultaneous hotline calls, multiple simultaneous answers from the same phone number suggest fraud. The following table shows the number of parallel calls to each individual mobile phone number. The first row of the table contains the 32 mobile phone numbers in question. The rows that follow represent one-hour intervals. The value under each phone number is the average number of concurrent calls to that phone during the one-hour interval.

 

The table shows, for example, that during the entire hour from 2010-10-16 04:00 to 04:59 there were on average as many as 34 simultaneous calls to the single phone number +38643281239, generating a total duration of 2’057.65 minutes during this single hour and corresponding to a cost of CHF 621.56 (per hour and per phone number). The number of simultaneous calls to a single phone number reached as high as 91, and the total number of simultaneous calls to Slovenia mobiles reached as high as 180 (the capacity of 6 full E1 lines).
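One plausible way to obtain such hourly averages from a CDR is to divide the call time falling inside the hour by the length of the hour. The Python sketch below illustrates this under assumptions: the function name `average_concurrency` and the sample intervals are hypothetical, and the original table may have been computed differently. The arithmetic at the end only re-derives figures quoted in the text, taking 30 voice channels per E1 line.

```python
def average_concurrency(calls, hour_start, hour_len=3600):
    """Average number of simultaneous calls during one hour.

    calls is an iterable of (start, stop) pairs in seconds.
    """
    hour_end = hour_start + hour_len
    busy_seconds = 0
    for start, stop in calls:
        # Portion of the call that falls inside the hour, if any.
        overlap = min(stop, hour_end) - max(start, hour_start)
        if overlap > 0:
            busy_seconds += overlap
    return busy_seconds / hour_len


# Hypothetical calls: one lasting the whole hour, one starting at half past.
calls = [(0, 3600), (1800, 5400)]
print(average_concurrency(calls, hour_start=0))

# Figures quoted in the text.
average_calls = 2057.65 / 60        # minutes in one hour -> about 34 calls
rate_per_minute = 621.56 / 2057.65  # CHF per minute
e1_lines = 180 / 30                 # 30 voice channels per E1 line
print(round(average_calls, 1), round(rate_per_minute, 3), e1_lines)
```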

 

For real mobile phone subscribers, we see neither a technical possibility nor an economic benefit in sending 126'000 minutes to 32 mobile phones in about one day. It is possible that a vendor of Verizon, or a vendor of its vendor, provided false answer supervision for all calls to Slovenia mobiles. Such a fake intermediary vendor would benefit from the traffic and can therefore be at the origin of the fraudulent calls. The final owner of the number range in the destination country (such as a small MVNO, OLO, or PNS) can also benefit from the incoming traffic and is therefore also a hypothetical suspect for the origin of the fraudulent traffic.

 

The following chart is the graphical version of the previous table. The horizontal positions of the bars represent the hours. The total height of the stacked bar at a given hour is the number of simultaneous calls to Slovenia mobiles. Each color represents one of the 32 individual mobile phone numbers. The height of a single segment of a given color is the number of simultaneous calls to the corresponding mobile phone number. For example, the chart shows that starting from 6 o'clock in the morning of October 16th, during one hour, there were 91 simultaneous calls to a single mobile phone subscriber, +38 64 32 81 23 9.

 

 

Description: Simultaneous calls per phone

File: data1\101013+6-17'phones.zip

Size: 1.08 MB

 

9.       References:

 

Fraud reports [142] [143] [144] [145] [146] [147]:

http://switzernet.com/3/public/101028-fraud-slovenia/ (this page)

http://switzernet.com/public/060801-web/news_detail.php?id=167

http://switzernet.com/public/060801-web/news_detail.php?id=166

http://switzernet.com/3/folders/101018-fraud-slovenia/ (login: fraud)

http://mirror2.switzernet.com/3/folders/101018-fraud-slovenia/  (login: fraud)

http://www.fedpol.admin.ch/content/fedpol/fr/misc/conform.html

 

ACD quality routing [148] [149] [150] [151] [152] [153] [154] [155] [156]:

http://switzernet.com/public/091020-acd-routing/

http://www.unappel.ch/2/public/091020-acd-routing/

http://unappel.ch/public/091020-acd-routing/

http://intarnet.com/2/public/091020-acd-routing/

http://parinternet.ch/2/public/091020-acd-routing/

http://switzernet.com/public/091029-ACDstat/

http://unappel.ch/public/091029-ACDstat/

http://switzernet.com/public/091217-doc-acd-routing/

http://en.wikipedia.org/wiki/Least-cost_routing

 

Emergency numbers [157]:

http://unappel.ch/folders/101004-emergency-calls-planning/ (login: ofcom)

 

Kamailio/OpenSER SIP server/router [158] [159] [160]:

http://www.kamailio.org/

http://sip-router.org/

http://www.iptel.org/ser/

 

Perl regular expressions [161] [162]:

http://switzernet.com/3/public/101024-regex/

http://perldoc.perl.org/perlre.html

 

References on syslog file format [163] [164] [165]:

http://www.facetcorp.com/tnotes/facetwin/tn_syslog.html

http://www.syslog.org/

http://lists.rtpproxy.org/pipermail/users/2009-May.txt

 

References on SIP transactions versus dialogs [166] [167] [168] [169]:

http://www.iptel.org/sip_transaction

http://www.iptel.org/node/20

http://www.ietf.org/rfc/rfc2543.txt

http://www.ietf.org/rfc/rfc3261.txt

 

10. Glossary

 

CDR stands for Call Data Records

ACD stands for Average Call Duration

UTC stands for Coordinated Universal Time

CET stands for Central European Time

CEST stands for Central European Summer Time

MVNO stands for Mobile Virtual Network Operator

OLO stands for Other Licensed Operator

PNS stands for Personal Numbering Service

 

11. Syslog and CDR files

 

This section groups all files used throughout this study. The list contains files with raw syslog records as well as files showing various statistics. The file that contains the call records and is light enough to open easily is the output CDR file [101013+6-14'cdr.xls].

 

 

Description: All transactions of answered calls

File: data1\101013+6-12'answered.zip

Size: 3.69 MB

 

 

Description: Call Data Records in Text format

File: data1\101013+6-13'calls.txt.zip

Size: 1.29 MB

 

 

Description: Calls sharing the same call-id

File: data1\101013+6-13'calls.xls.zip

Size: 7.21 MB

 

 

Call Data Records created from the syslog file:

Description: CDR file created from syslog

File: data1\101013+6-14'cdr.zip

Size: 1.66 MB

 

 

Description: Distribution chart by hours

File: data1\101013+6-15'chart.zip

Size: 2.18 MB

 

 

Description: Vendor and syslog CDR comparison

File: data1\101013+6-16'slovenia.zip

Size: 12.8 MB

 

 

Description: Simultaneous calls per phone

File: data1\101013+6-17'phones.zip

Size: 1.08 MB

 

 

12. Formatting particularities of this document

 

This section is addressed only to persons editing this or similar documents. This section is unrelated to the subject of the document.

 

12.1.           Styles

 

The following image shows the styles used in this document. Do not add new styles when editing and updating this document.

12.2.           File reference style

 

The [file reference] table style is buggy. When you open the document, the font settings are mixed up. In the modify style pane of the [file reference] style, you have to re-apply the [Lucida Console] font to the right column of this table style. This restores all other settings of the style. The procedure must be carried out before printing the document or saving it in HTML format.

12.3.           Numbering of references

 

Microsoft Word field codes are used for the auto-incrementing reference numbers appearing in the document. To toggle the field codes, you first have to remove the hyperlink (Ctrl-K).

To add a new reference, copy any existing reference and change only the hyperlink. You do not need to take care of the numbering: the numbers of all references can be updated in a single step. Select the entire document and choose [Update Field] in the right-click pop-up menu.

12.4.           Deleting the reference number bookmark before printing

 

Before printing the document, update all fields (as explained in section 12.3) and delete the “iref” bookmark (Alt-I-K). Otherwise, all references will appear under the number of the last reference.

 

 

12.5.           Conventions on the new versions of the document

 

The main document file is a numbered index<N>.doc file, where <N> is an incrementing version number of the document. The document must be saved as an index<N>.htm file (accompanied by an automatically generated folder index<N>_files). Every time a new version is released, the index.htm file must be deleted, and the latest index<N>.htm file must be copied and renamed to the new index.htm file. At any moment, the index.htm file is thus a copy of the latest index<N>.htm file, so it can safely be erased whenever a new version is released. There must be no index.doc file. The folder index_files (corresponding to the index.htm file) must be deleted, since the index.htm file refers anyway to the files located in the folder index<N>_files. At every update, you must add your name to the header of the document and, under the date of the update, a link to the current version of the index<N>.htm file (and not to index.htm) for backtracking.

 

Data files accompanying your document (not the files generated automatically when saving in HTML format) must be located in a data<M> folder, where <M> is an incrementing number that is not necessarily equal to <N>. Do not hesitate to create your own data<M> folder each time, instead of adding files to an existing data<M> folder of the previous author.
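
The publication step of the versioning convention above (copying the latest index<N>.htm file to index.htm and deleting the index_files folder) could be automated along the following lines. This is only a sketch under the assumption that all versions are kept in one folder; the function name publish_latest is hypothetical, and the script touches neither the .doc files nor the data<M> folders.

```python
import re
import shutil
from pathlib import Path


def publish_latest(folder):
    """Copy the highest-numbered index<N>.htm to index.htm.

    Also removes the index_files folder, since index.htm keeps
    referring to the files located in index<N>_files.
    """
    folder = Path(folder)
    versions = {}
    for path in folder.glob("index*.htm"):
        match = re.fullmatch(r"index(\d+)\.htm", path.name)
        if match:  # skips index.htm itself, which has no number
            versions[int(match.group(1))] = path
    if not versions:
        raise FileNotFoundError("no index<N>.htm file found")
    latest = versions[max(versions)]  # numeric order: index10 > index9
    shutil.copyfile(latest, folder / "index.htm")  # overwrites the old copy
    shutil.rmtree(folder / "index_files", ignore_errors=True)
    return latest
```

Calling publish_latest(".") in the document folder after saving a new index<N>.htm file leaves index.htm as a copy of the latest version. The version numbers are compared as integers, so that index10.htm is correctly treated as newer than index9.htm.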

 

 

*   *   *

Copyright © 2010 by Switzernet