Sunday, August 28, 2005

Masquerading as Googlebot

This is one of my pet peeves. There are a lot of sites of late that show up in Google's search results when you are looking for stuff, but the content itself is not available unless you register or subscribe. During one of my recent searches, I ended up on a page that looked very promising. What ticked me off was that when I clicked through to that page, the search term I used was nowhere in the visible part of the article. I was told to sign up for a monthly online pass if I wanted the rest of the content. Somehow, this did not sit well with me at all. They should either make the content available or not have it indexed at all. This has happened to me in the past (especially with searches that take me to Experts Exchange), but I never thought twice about it.

When I thought more about how these sites managed to let the Google robot index their pages without any subscription but didn't let me view them, the light bulb went off! It was simple: they were just looking at the browser's user-agent (an HTTP header that identifies the requesting browser) to let Google's robot through but not me. So all I had to do to see this content was pretend to be Google's robot.
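
To make the idea concrete, here is a rough sketch of what such a check might look like on the server side. This is only my guess at the logic, not any site's actual code; the names `is_crawler` and `render_article`, the teaser length and the sign-up text are all made up for illustration.

```python
def is_crawler(user_agent: str) -> bool:
    """Naive check: trust any request whose User-Agent mentions Googlebot."""
    return "googlebot" in user_agent.lower()


def render_article(headers: dict, full_text: str, teaser_chars: int = 200) -> str:
    """Return the whole article to the crawler and only a teaser to everyone else.

    Assumption: the site looks at nothing but the User-Agent header.
    """
    user_agent = headers.get("User-Agent", "")
    if is_crawler(user_agent):
        return full_text
    return full_text[:teaser_chars] + "\n[Sign up to read the rest]"
```

With a check like this, a request carrying `User-Agent: Googlebot/2.1` gets the full text, while a normal browser gets only the first `teaser_chars` characters plus the sign-up prompt. Nothing in it verifies that the request really came from Google.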

Changing the user-agent in IE is possible but very cumbersome, and I would not recommend it, because a lot of other things, like Windows Update and sites that use browser detection instead of object detection in JavaScript, will be very confused. I would instead suggest doing this in Firefox (shame on you if you don't also have Firefox on your desktop). There is a wonderful User Agent Switcher plugin for Firefox that allows you to set up your own user-agent. After you download and install it, restart Firefox and go to Tools -> User Agent Switcher -> Options -> Options, go to the User Agents tab, add a new user agent and set:

* Description ==> Google Bot
* User Agent ==> Googlebot/2.1
* App Name ==> Googlebot
* App version ==> 2.1

Now go to the Tools menu and select the new entry: Tools -> User Agent Switcher -> Google Bot. If you go back to the same URL I mentioned in this post above, you will now see the entire article! This is all I do now when I see sites using this technique: I simply switch my user-agent to Googlebot. Some may contend this is borderline hacking, but I am sorry, I think these sites deserve it, considering the amount of my time that I have wasted wading through search results because of them.
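
If you would rather do this from a script than from the browser, the same trick can be sketched with Python's standard `urllib.request` module. Treat this as a minimal sketch: the helper names `build_request` and `fetch` are mine, and it only works on the assumption that the site checks nothing but the User-Agent header.

```python
import urllib.request

# Same value as the "User Agent" field configured in the switcher above.
GOOGLEBOT_UA = "Googlebot/2.1"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that announces itself as Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch(url: str) -> str:
    """Download the page body as text, sending the Googlebot User-Agent."""
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch()` with the URL of the article you are after returns the HTML the site serves to that user-agent, which you can then save to a file and open in any browser.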

