Analyzing Apache logs on Debian.

January 2009


I've been toying around a bit with log analyzers on this little webserver.

The simplest of the analyzers is visitors. Installation is trivial (as usual):

apt-get install visitors

Usage goes like this:

visitors -A --prefix http://www.mydomain.com --prefix http://www.alsomydomain.com -T -m 30 -o html >/web/www.mydomain.com/users/myaccount/visitors.html /var/log/apache2/www.mydomain.com-access-log

It can also follow a log that is still being written to with the --stream option, which I didn't try. It cannot follow multiple log files at once (or so it seems), though, and it doesn't group results by smaller units (at least not with the default configuration). I also skipped the creation of dotty visualizations of the most-followed traversal patterns of users on the site, as that didn't work without configuration either. But I did leave some example output.
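
Since visitors seems to want a single log file, one possible workaround is to glue several (rotated) logs together first. The following is only a sketch: the function name merge_logs and the file names in the usage comment are made up for illustration, and it assumes the files are passed oldest first so the result stays in chronological order.

```shell
# Print the contents of the given log files in the order given,
# decompressing any file whose name ends in .gz on the fly.
merge_logs() {
    for f in "$@"; do
        case "$f" in
            *.gz) gzip -dc "$f" ;;
            *)    cat "$f" ;;
        esac
    done
}

# Hypothetical usage (placeholder file names, oldest first):
# merge_logs access-log-20090212.gz access-log-20090213.gz access-log \
#     > /tmp/merged-access-log
```

The merged file can then be handed to visitors as the single log file argument in the command above.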

Note: the version I used (0.4a) is ancient, so things may have improved quite a bit in more recent versions.

Webdruid (ancient version 0.5.4) brings richer output. It groups its output by month by default, and also has a mode that allows reverse lookup of clients, so that a partitioning of traffic by country can be generated.

apt-get install webdruid
gzip -dc /var/log/apache2/www.my-domain.com-access-log-20090213.gz | webdruid -N15 -D /tmp/DNScache

This writes its output to the current directory, where you can see the nice list of top-hitting countries it generates, as well as the most traveled traversals across the sites.
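
To get a rough feel for which client addresses the reverse lookup has to deal with, or to cross-check the analyzer's top lists, a small tally can be made by hand. This is a sketch only, not something webdruid itself does this way: the function name top_clients is made up, and it assumes the client address is the first whitespace-separated field of each line, as in Apache's common and combined log formats.

```shell
# Print the N busiest client addresses (default 10) from a log on stdin,
# one "hits address" pair per line, busiest first.
top_clients() {
    n="${1:-10}"
    # Count lines per first field, then sort by count (descending)
    # and by address (ascending) to break ties deterministically.
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
        sort -k1,1nr -k2,2 |
        head -n "$n"
}

# Usage with the same compressed log as above:
# gzip -dc /var/log/apache2/www.my-domain.com-access-log-20090213.gz | top_clients 15
```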

The third analyzer is lire. The version of ploticus for this host is so ancient that it doesn't even exist, so I had to run lire on a different host, which might have skewed the comparison unfairly in its favor. On the other hand, I didn't get the chance to try out its fairly advanced scheduling features, which might offset the skew.

apt-get install lire ploticus
lr_log2report --output html common www.cs.rug.nl-access-log.20090211 /tmp/weblogreport-cs.html

The output ended up here. Lire seems to be the most versatile of the three analyzers, but it does require the user to absorb some more application-specific terminology before it can be set up.

For a quick overview of a single log file, visitors is probably the quickest solution. Webdruid brings the best performance straight out of the box, but it lacks a bit in versatility when dealing with several log files. It does support near real-time updating of analyses, though. Lire is to me the package of choice, as it supports other applications and log formats too, and doesn't require the manual preprocessing that Webdruid needs in order to do reverse lookup of IP numbers.