Did You Notice That Bot Attack?

March 20, 2006

Last Friday, I wrote what I thought was a pretty nice review of StatCounter, a little free web analytics package. I included this line, “And one should never forget that, like many client-side analytic packages (i.e. the kind where you don’t need server logs), they don’t collect data about bots (like the GoogleBot), because most bots don’t read javascript.”

Being a good cross-referencing blogger, I then went to the StatCounter forum, where I had never posted before, and referenced my post in their “Do you like us?” section. I was really surprised to see the Master Member, Christine, reply with this post:

“Compliment or Competitor: Forgive me if I’m wrong or paranoid. But aren’t you in fact a competitor of Statcounter’s? Your flagship website from your sig doesn’t use Statcounter…”

Wow. I was expecting, “Welcome new member” or maybe, “Thanks for the great review.” I never figured out how using their code on only my blog made me a competitor.

But she went on,

“BTW, you do have some factual errors when discussing Statcounter’s capabilities. The most strinking (sic) of them is that bots don’t get tracked by Statcounter. Statcounter tracks image enabled bots. The bots that don’t get tracked are not tracked because they are not image enabled, not because they are not javascript enabled.”

I wasn’t sure what she really meant, so I went back to Hack 23 in Web Site Measurement Hacks and read, “…a solely client-side data collection model (page tags) may not be able to collect all robot/spider traffic information, because some robot/spider agents do not execute JavaScript and generally do not accept cookies.”

Well, the author (Eric Peterson) wrote “may not” instead of “will not.” So I wrote to Fred Kuu from HBX Uncovered. Fred is “the Web Metrics Technical Lead at Adobe Systems,” according to the HBX website. Here’s his answer:

Hi Robbin,
Most bots (aka spiders or crawlers) cannot parse and execute Javascript. This is why all vendors (based on JS tagging approach) tout that they track and report only human activity. Granted, it is possible for a hacker to program a bot to parse the Javascript but it’s not easy and there’s not much of a gain by enabling it.

Now, regarding images, almost all bots (especially search engine ones) will be able to track if a page contains images but majority will not actually request the image. So in the logs, virtually all bot activity are to web pages and not to images or other binary files.

-Fred

Then I went to my SiteCatalyst user manual (as another reference point – it is also a client-side package) and it said, “SiteCatalyst does not track spiders since they do not load images.”

So finally I wrote to Jon B. in London. The great thing about writing to someone on the other side of the ocean is that you write at night and the answer is in your inbox before you wake up the next morning. Here’s what he said:

Hey Robbin, how goes it?

Strictly speaking, bots don’t execute Javascript. The javascript is responsible for loading the image hence there is some indirect truth in the sense that the bots don’t load images. Also bear in mind that bots can load images – that’s exactly how Google images sources its images database-index…

So, I put all this data into my blender, turned it on high, and came up with this: if a bot loads pictures (like the Google Imagebot), a client-side solution like StatCounter can pick it up, provided the company decides to enable that ability. However, most bots are not about pictures. They are about finding your text, and those bots don’t get picked up by client-side solutions because they don’t execute the JavaScript.
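For the code-minded, here is a toy model of that conclusion in Python. It is only a sketch under my own assumptions, not StatCounter's or any other vendor's actual code: the `Visitor` class, the `is_tracked` function, and the `noscript_fallback` flag are names I made up for illustration. The fallback stands in for a plain image tag that a vendor might choose to ship alongside its script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visitor:
    """A hypothetical visitor, described only by what it can do."""
    name: str
    executes_js: bool
    loads_images: bool


def is_tracked(visitor: Visitor, noscript_fallback: bool) -> bool:
    """Return True if the visitor ends up requesting the tracking image.

    Assumption: a hit is counted only when the tracking image is fetched.
    """
    if visitor.executes_js:
        # The script writes the image tag; the hit counts if the image loads.
        return visitor.loads_images
    # No script ran, so only a plain image fallback (if any) can fire.
    return noscript_fallback and visitor.loads_images


# Illustrative visitors, not measurements of real bots.
VISITORS = [
    Visitor("browser", executes_js=True, loads_images=True),
    Visitor("text crawler", executes_js=False, loads_images=False),
    Visitor("image crawler", executes_js=False, loads_images=True),
]


def tracked_names(noscript_fallback: bool) -> list[str]:
    """Names of the visitors that would show up in the client-side stats."""
    return [v.name for v in VISITORS if is_tracked(v, noscript_fallback)]


if __name__ == "__main__":
    print("script only:  ", tracked_names(noscript_fallback=False))
    print("with fallback:", tracked_names(noscript_fallback=True))
```

In this model the text crawler never shows up either way, while the image crawler shows up only when the vendor has enabled the image fallback.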

Robbin Steif, CEO
LunaMetrics