Bill Lovett

Tools for Better Logfile Analysis

Posted on November 14th, 2005

I'm anxious to see how Google Analytics is going to work out. I enabled it on this site a few hours ago, but apparently 12 hours need to pass before you get a peek at your first report. I wasn't aware that Google had bought Urchin, but I'll happily take advantage of another freebie.

The timing of this latest offering from Google is great, because it dovetails nicely with my recent interest in doing more and better logfile analysis. With over a year's worth of logfiles in tow, I've got grand visions of deep analysis of the whos and whats and wheres behind my website. For the longest time I looked to Awstats for some of the answers, but as great as that product is compared to other freely-available competitors, it wasn't enough. My current hosting company meanwhile provides Analog reports, but I've never found them very easy to understand.

The one shortcoming I can already see with Google Analytics, and likewise with Measuremap I think, is the way it collects information. It's not so much a shortcoming of functionality, because collecting data with embedded JavaScript actually gets you more tidbits of information, like screen resolution and plugin support. It's more a shortcoming of comprehensiveness. Things occasionally fall apart. What if the service on the other end of your analytics software has a glitch one day? What if you end up missing a slice of your activity for some unexpected technical reason? Analysis based on your server logs is the only road to definitiveness that I know of. If it's not in your server logs, it didn't happen. With JavaScript code embedded in your template, the best you can say is that it probably didn't happen.

I'm interested in seeing the reporting interface for Google Analytics. With Awstats, I could only have a limited set of questions answered in a certain, predetermined way. It could tell me that a certain number of people had hit a certain file a certain number of times, but I couldn't necessarily break that number down into anything more granular. That's where the interesting answers are, the kind you get from asking questions like: Of all the hits I got to this particular page, how many came from a link on another page of this site versus a link on some other site?
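
Here's a rough sketch of how that internal-versus-external question might be answered straight from a logfile. It assumes the server writes Apache's combined log format, which may not match every setup. The function name `referrer_breakdown`, the host `www.example.com`, and the sample lines are all made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed layout (Apache combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrer_breakdown(lines, page, site_host):
    """Count hits to `page` by where the referring link lives."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None or match.group("path") != page:
            continue  # unparseable line, or a hit to some other page
        referer = match.group("referer")
        if referer in ("", "-"):
            counts["none"] += 1  # typed-in URL, bookmark, or stripped referrer
        elif urlparse(referer).hostname == site_host:
            counts["internal"] += 1
        else:
            counts["external"] += 1
    return counts


# Hypothetical sample lines, invented for this example.
SAMPLE = [
    '10.0.0.1 - - [14/Nov/2005:10:00:00 -0500] "GET /portuguese.html HTTP/1.1" '
    '200 5120 "http://www.example.com/index.html" "Mozilla/5.0"',
    '10.0.0.2 - - [14/Nov/2005:10:01:00 -0500] "GET /portuguese.html HTTP/1.1" '
    '200 5120 "http://other.example.org/links.html" "Mozilla/5.0"',
    '10.0.0.3 - - [14/Nov/2005:10:02:00 -0500] "GET /portuguese.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
    '10.0.0.4 - - [14/Nov/2005:10:03:00 -0500] "GET /about.html HTTP/1.1" '
    '200 2048 "http://www.example.com/index.html" "Mozilla/5.0"',
]

breakdown = referrer_breakdown(SAMPLE, "/portuguese.html", "www.example.com")
print(dict(breakdown))
```

Hits with no referrer get their own bucket rather than being lumped in with external links, since a missing referrer doesn't say anything about where the visitor came from.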

Just from pulling one day's worth of traffic into a database and building SQL queries to reach that extra layer of detail, I found that a spider program from MSN had hit my site several hundred times more frequently than anything else. And that was from asking a relatively simple question. There's bound to be all sorts of other juicy tidbits waiting to be uncovered provided you can ask the right questions.
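
For a sense of what that looks like, here's a minimal sketch using Python and SQLite. It isn't the setup I actually used. It assumes Apache's combined log format, and the table name `hits`, the function names, and the sample lines are hypothetical.

```python
import re
import sqlite3

# Assumed layout (Apache combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def load_log(conn, lines):
    """Parse log lines into a `hits` table; return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "host TEXT, time TEXT, method TEXT, path TEXT, "
        "status INTEGER, referer TEXT, agent TEXT)"
    )
    rows = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip lines that don't fit the assumed format
        rows.append((m["host"], m["time"], m["method"], m["path"],
                     int(m["status"]), m["referer"], m["agent"]))
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def hits_by_agent(conn, limit=10):
    """Return (user agent, hit count) pairs, busiest first."""
    return conn.execute(
        "SELECT agent, COUNT(*) AS n FROM hits "
        "GROUP BY agent ORDER BY n DESC, agent LIMIT ?",
        (limit,),
    ).fetchall()


# Hypothetical sample lines, invented for this example.
SAMPLE = [
    '10.0.0.1 - - [14/Nov/2005:10:00:00 -0500] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '10.0.0.1 - - [14/Nov/2005:10:00:05 -0500] "GET /about.html HTTP/1.1" '
    '200 2048 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '10.0.0.9 - - [14/Nov/2005:10:02:00 -0500] "GET /index.html HTTP/1.1" '
    '200 5120 "http://other.example.org/" "Mozilla/5.0"',
]

conn = sqlite3.connect(":memory:")  # use a file path to keep the data around
load_log(conn, SAMPLE)
for agent, count in hits_by_agent(conn):
    print(count, agent)
```

Once the rows are in a table, a new question is just a new `SELECT` rather than a new report format.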

The other thing I noticed from my experiences with Awstats is that I usually zeroed in on two or three areas of the report page and ignored the rest. You can only do so much with the knowledge of which country your visitors come from, for example. What pages did they look at? Was the Brazilian traffic all going to the pages I wrote about learning Portuguese, or to something entirely different?

It's more than just the quality of the interface, although that is important. I think the real key to useful logfile analysis is the luxury of being able to have any question answered, whether it's a whim or something you're going to track from day to day. A local database seems like the only way to approach that. Granted, it'll end up being a very big database as time goes on, and probably bloated with less-than-perfectly-normalized data, but still.
