Synchronization and backup of rotating log file

 

Emin Gabrielyan

 

Created and modified on 2010-12-28 by Emin

 

Fraud monitoring requires analysis of the voluminous log files of the radius and billing servers, which contain much more information than the data stored in the database. A daily log file can grow up to 3GB uncompressed. Recurrent text processing of such files cannot be done directly on the billing servers, but it can be done on a separate fraud-control server without CPU usage restrictions. There are two requirements to fulfill. First, all files must be synced to and stored on the fraud-control server for as long as necessary, while the master server rotates its log files and deletes the oldest in the queue. Second, the currently running file must also be synced, so that the fraud-control system can process it hourly or more frequently. Waiting 24 hours for the file to be closed may cost tens of thousands of dollars in fraud.

 

This document presents a script developed for syncing and storing all radius log files. It tracks the currently running log file on the server, incrementally appends the new chunks of data to the local file, keeps tracking the file after it is renamed (due to rotation) on the remote server, appends the last chunks of data after rotation, opens a new local file for the fresh log, and keeps syncing it. The log files on the local server are prefixed with the date and time (instead of a rotation index). Rotation on the remote server and the eventual cleanup of old files do not affect the local filenames.

 

The script updates are available for downloads:

[Downloads]

 

The code and the description of the script are provided below:

 

Script

Description

#!/bin/bash

# Keep this line (these two UNIX lines) unchanged

# Copyright (c) 2010 by Emin Gabrielyan of switzernet.com

# Created and modified on 2010-12-28 by Emin, on 2010-12-29 by Emin

# Version: aa10

The purpose of this script is to sync remote rotating log files with local files whose names are prefixed by a date and time.

 

The local files do not undergo rotation, do not get renamed or deleted.

 

The active local file grows together with the running remote file.

                                                                         #

                                                                         #

sync="/home/var/log/radius"                                              #

work="/home/var/log/radius/work"                                         #

                                                                         #

                                                                         #

connect="`cat $work/connect.txt`"                                        #

This script is called by cron every 3 minutes or so. The script quits if the previous one is still running.

 

This script also updates itself when new updates are available.

 

The working folder must contain a connect.txt file with ssh settings.

 

Example of connect.txt:

ssh -c blowfish -i ~/key.me user@host

 

The parameter “sync” contains the name of the local folder holding the synced files.

far_running=/var/log/porta-billing.log                                   #

far_rotated="/var/log/porta-billing.log.?"                               #

rewind=222111000                                                         #

Parameter “far_running” contains the name of the remote active log file.

 

Parameter “far_rotated” is a pattern matching the new filename of the log file once it is closed on the remote server.

 

This parameter makes it possible to find the file even after its closure and rotation.

 

When the renamed instance of the file is found, the missing trailing block can be synced before closing the file locally.

 

When the script is launched and the remote file is too large it will not download the entire file from the beginning.

 

The parameter “rewind” tells from which point in the remote file the copying must be started.
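
The rule can be sketched as follows. This is a minimal illustration and not part of the script: the function name start_offset is hypothetical, and the file sizes in the usage lines are made-up examples.

```bash
# start_offset SIZE REWIND
# Prints the byte offset in the remote file from which copying would start:
# SIZE - REWIND when the file is larger than REWIND, otherwise 0.
start_offset() {
  if [ "$1" -gt "$2" ]
  then echo $(($1 - $2))
  else echo 0
  fi
}

rewind=222111000
start_offset 3000000000 $rewind   # large file: only the last $rewind bytes are copied
start_offset 1000 $rewind         # small file: copied from the beginning
```

With the value of “rewind” used in the script, at most about 222MB of an already large remote file is transferred on the first run.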

                                                                         #

function samedownload                                                    #

{                                                                        #

  cd $work                                                                 #

  url1="http://parinternet.ch/2/public/101228-radius-log/ver/download.zip" #

  url2="http://switzernet.com/3/public/101228-radius-log/ver/download.zip" #

  url3="http://unappel.ch/2/public/101228-radius-log/ver/download.zip"     #

  url4="http://www.unappel.ch/2/public/101228-radius-log/ver/download.zip" #

  ok=1                                                                     #

  for((i=1;i<=2;i++))                                                      #

  do                                                                       #

    wget -t 1 -N $url1 && break                                              #

    wget -t 1 -N $url2 && break                                              #

    wget -t 1 -N $url3 && break                                              #

    wget -t 1 -N $url4 && break                                              #

    [ $i -eq 2 ] && ok=0                                                     #

  done                                                                     #

  if [ $ok -eq 0 ]                                                         #

  then                                                                     #

    errlog "Cannot download updates"                                         #

    return 0                                                                 #

  fi                                                                       #

  install=0                                                                #

  if [ -f download.zip ]                                                   #

  then                                                                     #

    ls -li --time=ctime --time-style=+%Y-%m-%d,%H:%M:%S download.zip |       #

    cat > download2.txt                                                      #

    if [ -f download1.txt ]                                                  #

    then diff download1.txt download2.txt > /dev/null || install=1           #

    else install=1                                                           #

    fi                                                                       #

    mv download2.txt download1.txt                                           #

  fi                                                                       #

  if [ $install -eq 1 ]                                                    #

  then                                                                     #

    standard "New update received"                                           #

    rm -r Downloads 2>/dev/null                                              #

    unzip -o download.zip -d Downloads |                                     #

    while read s                                                             #

    do standard "$s"                                                         #

    done                                                                     #

  else standard "No update received"                                         #

  fi                                                                       #

  return $install                                                          #

}                                                                        #

                                                                         #

                                                                         #

This function downloads updates.

 

It returns false if the downloaded version is not the same as the current.

 

Based on the return code the script will decide whether to update itself or not.

 

Here we have four possible addresses to download from. We go through these addresses twice.

 

Whenever a download works we exit from the loop. If the loop ends without a successful download, we quit the function with the “true” return value, meaning that the same code must be executed and no update is needed.

 

If, after the wget commands, the “download.zip” file exists in the current folder, we check whether the previous timestamp (a listing line with the inode number and change time of the ZIP file) was saved in the download1.txt file. If not, we unzip the file. If yes, we compare the old timestamp with the current one, and unzip the file if they differ.

 

Before unzipping the file we remove the [Downloads] folder completely. The function returns true (0 = no error), if the downloaded ZIP file is not unzipped, meaning nothing to install.
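
The "compare with the previously saved signature" step can be tried in isolation. The sketch below is an assumption-laden illustration, not the script's own code: it uses a cksum content checksum instead of the ls listing of inode and change time, and the names same_as_last, pkg and stamp are hypothetical. It keeps the same return convention as “samedownload”: 0 means unchanged, 1 means a new version must be installed.

```bash
# same_as_last FILE STAMP
# Returns 0 if FILE has the same checksum as recorded in STAMP by the
# previous call, 1 otherwise (including the very first call).
# The new checksum is always saved into STAMP.
same_as_last() {
  new_sig=$(cksum < "$1")
  old_sig=$(cat "$2" 2>/dev/null)
  printf '%s\n' "$new_sig" > "$2"
  [ "$new_sig" = "$old_sig" ]
}

tmp=$(mktemp -d)
printf 'version 1' > "$tmp/pkg"
same_as_last "$tmp/pkg" "$tmp/stamp" || echo "first call: install"
same_as_last "$tmp/pkg" "$tmp/stamp" && echo "second call: nothing to install"
```

A checksum reacts only to a change of content, whereas the listing used in the script also reacts to a new inode or change time of the downloaded file.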

 

                                                                         #

                                                                         #

function standard                                                        #

{                                                                        #

  msg=$1                                                                   #

  date +%Y-%m-%d,%H:%M:%S,"$msg" | tee -a $sync/standard.log               #

}                                                                        #

                                                                         #

This function prints messages in the standard output log file.

function errlog                                                          #

{                                                                        #

  msg=$1                                                                   #

  date +%Y-%m-%d,%H:%M:%S,"$msg" | tee -a $sync/error.log                  #

}                                                                        #

                                                                         #

This function prints error messages in the error log file.

function error                                                           #

{                                                                        #

  errlog "$1"                                                              # Quoted: keeps the whole message

  mv $sync/status=busy,lock.log $sync/status=free,lock.log                 #

  exit 1                                                                   #

}                                                                        #

                                                                         #

This function prints an error message, unlocks the process, and exits returning an error code.

                                                                         #

function createifnone                                                    #

{                                                                        #

  cd $sync                                                                 #

  if [ ! -f ??????,??????,*,*,running,radius.log ]                         #

  then                                                                     #

    exe="f=$far_running; ls -i \$f; wc -c \$f"                               #

    info=`$connect "$exe" 2> /dev/null | awk '{printf ",%s",$1}'`            #

    if [ -z "$info" ]                                                        #

    then error "Error getting remote info"                                   #

    fi                                                                       #

    inode=`echo $info | cut -d, -f2`                                         #

    size=`echo $info | cut -d, -f3`                                          #

    if [ $size -gt $rewind ]                                                 #

    then size=$((size-rewind))                                               #

    else size=0                                                              #

    fi                                                                       #

    cp /dev/null `date +%y%m%d,%H%M%S,$inode,$size,running,radius.log`       #

  fi                                                                       #

}                                                                        #

                                                                         #

If the running log does not exist locally, we obtain the inode number and the size of the remote log file.

 

If the size is too large we will rewind the number of bytes specified in the corresponding parameter and will consider copying the remote file from that point. Otherwise we will start copying from the beginning of the remote file.

 

We create an empty file with comma-separated fields in its filename. The first two fields represent the date and time. The third field is the remote inode number and the fourth field is the header size that has to be skipped in the remote file.

 

We keep the remote file’s inode number in the local filename in order to be sure to which remote file it really corresponds, as the remote files change their names due to log rotation.
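
The fields can be read back from the filename with cut, in the same way the script does later. A minimal sketch, assuming a hypothetical filename (the date, time, inode and header values are invented for illustration):

```bash
# Hypothetical local filename in the format described above
name="101228,153000,4711,222111000,running,radius.log"

fdate=$(echo "$name" | cut -d, -f1)   # date, yymmdd
ftime=$(echo "$name" | cut -d, -f2)   # time, HHMMSS
inode=$(echo "$name" | cut -d, -f3)   # inode number of the remote file
head=$(echo "$name" | cut -d, -f4)    # bytes skipped at the start of the remote file
state=$(echo "$name" | cut -d, -f5)   # "running" or "rotated"

echo "date=$fdate time=$ftime inode=$inode head=$head state=$state"
```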

                                                                         #

function logsync                                                         #

{                                                                        #

  cd $sync                                                                 #

  local=??????,??????,*,*,running,radius.log                               #

  fdate=`echo $local | cut -d, -f1`                                        #

  ftime=`echo $local | cut -d, -f2`                                        #

  inode=`echo $local | cut -d, -f3`                                        #

  head=`echo $local | cut -d, -f4`                                         #

  set 1 `wc -c $local`                                                     #

  size=$2                                                                  #

  skip=$((head+size))                                                      #

  exe1="set 1 \`ls -i $far_running $far_rotated | egrep ^$inode\\ \`;"     #

  exe2=" f=\$3; [ ! -z \"\$f\" ] &&"                                       #

  exe3=" echo \$f >&2 && dd bs=1 skip=$skip if=\$f"                        #

  far_file=`$connect "$exe1$exe2$exe3" 2>&1 >> $local`                     # Order is important

  if [ -z "$far_file" ]                                                    #

  then error "Error finding the far inode"                                 #

  fi                                                                       #

  echo "$far_file" | while read s                                          #

  do standard "$s"                                                         #

  done                                                                     #

  if [ ! $far_running = `echo "$far_file" | head -1` ]                     #

  then mv $local $fdate,$ftime,$inode,$head,rotated,radius.log             #

  fi                                                                       #

}                                                                        #

                                                                         #

                                                                         #

We locate the local running log file. Its name gives us the remote inode number and the size of the header skipped in the remote file.

 

By considering also the current size of the local file, we compute exactly how many bytes we have to skip on the remote file (head+size).

 

We prepare a command to be executed on the remote server. First it finds the name of the file corresponding to the inode number. If such a file is found, its content is written to the standard output, starting from the position that follows the skipped bytes.
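
The incremental copy can be tried locally, without ssh. This is a minimal sketch, assuming two ordinary files in a temporary folder stand in for the remote and the local log; the function name append_new_bytes and the file names far.log and near.log are hypothetical.

```bash
# append_new_bytes SRC DST HEAD
# Appends to DST the bytes of SRC that follow the first HEAD + size(DST) bytes.
append_new_bytes() {
  size=$(wc -c < "$2")
  skip=$(($3 + $size))
  dd bs=1 skip="$skip" if="$1" 2>/dev/null >> "$2"
}

tmp=$(mktemp -d)
printf 'abcdef' > "$tmp/far.log"                    # stands in for the remote log
: > "$tmp/near.log"                                 # empty local file
append_new_bytes "$tmp/far.log" "$tmp/near.log" 2   # head=2: "ab" is skipped
printf 'gh' >> "$tmp/far.log"                       # the remote log grows
append_new_bytes "$tmp/far.log" "$tmp/near.log" 2   # only "gh" is appended
cat "$tmp/near.log"; echo
```

With bs=1 the skip value counts bytes, which gives exact positioning at the price of byte-by-byte reads on the remote side.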

 

The standard output is redirected into the local file while the error output is stored in the parameter “far_file”. The error output contains the name of the file actually being copied and the statistics of the dd command.

 

If we discover that the filename corresponding to the remote file (first line in the far_file parameter) is not the default remote log file, we rename the local file by changing the “running” flag into “rotated” (the fifth field in the filename).

 

A local file with the “running” flag will then no longer exist, and upon the next cycle it will be created by the “createifnone” function from the current log file on the remote server.

 

Pay attention to the order of the redirections “2>&1 >> $local”. The redirections are processed from left to right: first the error output is pointed to where the standard output currently goes (the backquote capture), and then the standard output is appended to the file $local. As a result the remote data goes into the file, while the error output is captured in place of the standard output. In the reverse order, everything would be sent into the file. Here we use the error output channel to capture the remote filename and to trigger an eventual switch of the running log file.
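
The effect of the redirection order can be checked with a small local experiment. A minimal sketch; the function emit and the file name out are hypothetical stand-ins for the remote command and the local log file.

```bash
# emit: writes one line to the standard output and one to the error output
emit() {
  echo "payload"
  echo "label" >&2
}

out=$(mktemp)

# Same order as in the script: stderr is captured, stdout is appended to the file
captured=$(emit 2>&1 >> "$out")
echo "captured: $captured"
echo "file:     $(cat "$out")"

# Reverse order: both channels go into the file, nothing is captured
reversed=$(emit >> "$out" 2>&1)
echo "captured in reverse order: '$reversed'"
```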

                                                                         #

function checkfolder                                                     #

{                                                                        #

  cd $sync                                                                 #

  n=`ls ??????,??????,*,*,running,radius.log 2>/dev/null | wc -l`          #

  if [ $n -gt 1 ]                                                          #

  then error "Error with running files"                                    #

  fi                                                                       #

}                                                                        #

                                                                         #

Only one running log file can exist in the folder. If it is not the case we exit with an error. The function “logsync” will identify the corresponding remote log file to sync with by looking at the local filename of the unique running log file.

                                                                         #

function lock                                                            #

{                                                                        #

  ls $sync/status=*,lock.log > /dev/null 2>&1 ||                           #

  cp /dev/null $sync/status=free,lock.log                                  #

  if mv $sync/status=free,lock.log $sync/status=busy,lock.log 2>/dev/null  #

  then                                                                     #

    date +%Y-%m-%d,%H:%M:%S,"pid=$$" | tee -a $sync/status=busy,lock.log     #

    return 0                                                                 #

  else                                                                     #

    errlog "Locked since `tail -1 $sync/status=busy,lock.log`"               #

    return 1                                                                 #

  fi                                                                       #

}                                                                        #

The function “lock” ensures exclusive execution of the synchronization and also prevents the script from being updated while it is running.

 

When the program is launched for the first time, a lock file set to “free” is created. The status changes to “busy” by renaming the file. The lock file is also used for logging all locking events.

 

If the file is already locked, the last line of the lock file, which holds the date, time and PID of the locking process, is written to the error log file.
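
The rename-based locking can be tried in a scratch folder. This is a minimal sketch under the assumption that a temporary folder replaces the sync folder; the function names try_lock and release_lock are hypothetical. The scheme relies on mv failing when the “free” file does not exist, so only one process at a time can turn “free” into “busy”.

```bash
dir=$(mktemp -d)   # stands in for the sync folder

# try_lock: returns 0 if the lock was taken, 1 if it is already busy
try_lock() {
  # First launch: no lock file of any status yet, create a free one
  ls "$dir"/status=*,lock.log > /dev/null 2>&1 ||
  : > "$dir/status=free,lock.log"
  # Taking the lock = renaming free into busy; fails if there is no free file
  mv "$dir/status=free,lock.log" "$dir/status=busy,lock.log" 2>/dev/null
}

# release_lock: renames busy back into free
release_lock() {
  mv "$dir/status=busy,lock.log" "$dir/status=free,lock.log"
}

try_lock && echo "first attempt: locked"
try_lock || echo "second attempt: busy"
release_lock
try_lock && echo "after release: locked again"
release_lock
```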

                                                                         #

function unlock                                                          #

{                                                                        #

  mv $sync/status=busy,lock.log $sync/status=free,lock.log                 #

}                                                                        #

                                                                         #

The function “unlock” changes the status of the system to “free” by renaming the lock file. The next execution of the script launched by cron will thus be able to take over.

 

In the newer version (see the code repository), the unlocking events are logged in the file similarly to the locking events.

                                                                         #

function runlock                                                         #

{                                                                        #

  if lock                                                                  #

  then                                                                     #

    for((i=1;i<=7;i++))                                                      #

    do                                                                       #

      checkfolder                                                              #

      createifnone                                                             #

      logsync                                                                  #

      sleep 10                                                                 #

    done                                                                     #

    return 0                                                                 #

  else return 1                                                            #

  fi                                                                       #

}                                                                        #

                                                                         #

                                                                         #

This function executes the synchronization. If the system is locked by another process, it returns a “false” code (error code = 1). Otherwise it repeats the following cycle seven times, sleeping 10 seconds after each pass: it creates a local running version of the active log if none exists, and syncs the local file with the remote log. The same function “logsync” renames the local file from “running” to “rotated” once it detects that the file is closed on the remote server. The function “createifnone” will create a new local running log file whenever the previous one is closed and renamed permanently to “rotated”.

                                                                         #

if runlock                                                               #

then                                                                     #

  if samedownload                                                          #

  then                                                                     #

    unlock                                                                   #

    exit                                                                     #

  else                                                                     #

    unlock                                                                   #

    cd $work                                                                 #

    standard "Quit and self update"                                          #

    exec cp Downloads/run.sh .                                               # Delete process memory

  fi                                                                       #

fi                                                                       #

                                                                         #

                                                                         #

                                                                         #

                                                                         #

This is where the script starts its execution. We attempt to lock and run the sync. If we succeed, we check for new downloads. If there are no updates, we unlock and exit. If there are updates, we unlock and replace the currently running script with the new version taken from the downloaded package.

 

Pay attention to the “exec” command. Without it, the script risks continuing its execution after the self-update: the bash interpreter keeps reading the script file from its current position, and if the file contains new text at that position (because the script was copied over itself), bash will attempt to interpret and execute it. This is highly undesirable. The “exec” command replaces the shell process image with the executable of the “cp” command, so no interpreter remains to read the rest of the script file. There is no more risk of side effects.
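
The behavior of “exec” can be observed with a one-line experiment. A minimal sketch in which echo plays the role of the “cp” command; the command following “exec” is never reached because the shell running it no longer exists.

```bash
# The inner shell is replaced by the echo executable, so the second
# command of the inner script is never executed.
out=$(sh -c 'exec echo "replaced"; echo "never printed"')
echo "$out"
```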

 

Files:

Log files tracking the development progress [zip]

 

References:

 

This web page:

http://unappel.ch/2/public/101228-radius-log/

http://parinternet.ch/2/public/101228-radius-log/

 

http://marc.info/?l=openssh-unix-dev&m=105414566514173&w=2

 

 

 

*   *   *