Monday, August 25, 2008

Endless beta

In looking up citations of our 2006 book Open Innovation on Google Scholar, I found that the data miners confused two books on the topic with similar names. Lacking any better recourse, I decided to report the bug to Google. The following day, I got back this reply:

From: "Scholar Support" <>
To: "Joel West" <>
Subject: Re: [#XXXXX] Mis-indexed article

Hello Joel,

Thank you for your note. As you may have noticed, Google Scholar is in beta, and we're currently working out some of the kinks. We appreciate your bringing this indexing issue to our attention, and we'll pass it on to our engineering team. Thank you for your assistance in improving Google Scholar.

The Google Scholar Team
In looking up when I first tried Google Scholar, I note this comment in a November 2004 e-mail that I sent to my SJSU colleagues:
there are still problems, particularly in listing the same book or article twice (because it was cited two different ways).
These reference-matching problems still exist, and perhaps they are unsolvable without human intervention. (Personally, I think that if everyone used DOIs in their references, there would be no problem.) Here, I wonder if trying to compensate for this matching problem erred too far in the other direction by reporting a false match.

That, however, was not my point of raising this. While I love Google Scholar, perpetually hiding behind the “beta” claim is a cop-out. Before Google, no self-respecting software company would run a public beta for 4 years. Of course, this is a bad habit of Google’s: Google News went through beta for at least 3 years, and Grand Central also remains in beta after its 2007 acquisition.

A year ago, Dean Giustini made this exact point in more passionate terms:
At the outset of today's post, let me say that perpetual beta is a pointless Web 2.0 notion (a cop-out) and decidely unhelpful to academics and librarians. Beta-testing. In beta. Not quite finished yet. To be released in full soon. At times, the race that Google scholar seems to be running is against itself - both tortoise and hare. GS has few real competitors, and is cavalier about how it is developing. Why does it do this? you ask...Because it can.
He goes on to berate Google for its secretive responses to an interview with a library-oriented journal.

Dean’s absolutely right: Google does what it can because it can. Total World Domination has its privileges.

Even given that, I don’t understand Google’s motivations in this particular case. For Google Scholar, why is Google either clinging to the “beta” label or being secretive about its usage, given that it has no competitors and no discernible business model? (Yes, it certainly would be possible to charge academic databases for paid download click-throughs, but that’s chump change for the $20b/year Internet behemoth.)

The thing seems like a public research experiment, rather than a production service — a toy made by PhDs for other PhDs.
