Saturday, November 28, 2009

We deserve better commodity information

It’s no news that Wikipedia, with all its flaws, is the default information source of a generation of skulls full of mush. If this wasn’t obvious enough from my college students, it was brought home a week ago when I interviewed FLL robotics contestants (ages 9-14), nearly all of whom said their project “research” consisted of Google and Wikipedia. (One team said Google and Yahoo.)

However, since then, Wikipedia’s problem has been the subject of a front-page Wall Street Journal story Monday (and blog entry) on how Wikipedia is losing volunteers, specifically 49,000 in Q1 2009. The Telegraph had the most comprehensive follow-up stories, although the Times of London had good coverage (including a great article on the four sources of error).

The impetus for the original WSJ article was the academic research of Felipe Ortega, who is part of a group studying open source software but actually did his Ph.D. dissertation on Wikipedia (a related but quite different species). He’s been tweeting to offer his comments on the current news coverage. While the bulk of his research hasn’t gone through peer review, the abstract suggests he’s taken all the research design issues seriously.

After all the articles and the academic study, the official Wikipedia response is pretty unsatisfactory. It changes the subject, arguing that while the tide of new volunteers roughly matches the ongoing losses, at least the site traffic and number of articles continue to grow.

However, none of this relates to two inherent problems in Wikipedia that the current management is unable to solve, plus the third (and potentially catastrophic) outcome of Wikipedia’s commoditization of information.

The first problem is that Wikipedia publishes content by persistent idiots. Now that there are dozens or thousands of individuals trying to edit articles on almost any topic, there are chronic edit wars, with rival editors taking out each other’s changes.

Competition is healthy — if there’s a selection mechanism based on quality or performance. Wikipedia has no such mechanism. Instead, what gets published comes from people who whine and bitch and moan, who win out over people who know what they’re talking about but have better things to do with their life. This works well for chronicling Simpsons episodes but not for summarizing academic research or major historical controversies. (Yes, I know that there are capable contributors, but in every battle between idiots and experts, the idiots are winning.)

I tried to sell this angle to a reporter I spoke with Monday, but I guess he thought it was just the griping of a snobby college professor who gave up years ago after watching his work be mangled by twits. However, among the 27 comments (thus far) on the official Wikipedia response were these five:

  1. Well, I have taken hours editing and polishing a biographical article about a scientist. There is nothing in the article now that is under dispute, yet it is probably going to be taken down and deleted as one editor is exercising his or hers petty power-plays.
  2. My most recent experiences have been quite negative: edits reverted with no reason, pages tagged as grammatically terrible when they were no such thing, or tagged as “not up to WP’s standards” when they were stubs and in some cases *still editing*. These taggings tended to be “drive-by” in the sense that some other editor dropped the tag onto the page or made their reversion but then failed to respond to explanations on the talk page for days.
  3. I used to spend a lot of time writing for Wikipedia, amending entries and creating new articles. Now it seems that a small number of self-appointed editors run the site. If I create new articles then they are nearly always deleted. If I correct information I know for a fact is wrong, it is reverted back and I am warned by the small sub class of elite editors
  4. I took on editing the Albigensian Crusade page a while back, a fairly simple job because what’s known about it comes principally from three contemporary chronicles dealing with the specific subject. A chronicle is self-indexed by time, therefore it should have been adequate to simply point readers in the direction of the sources, but no, that was inadequate, full references please. I got started, went so far, and checked if this was right. The %*$^^@# responsible refused to take the time to feedback, and was quite rude about it, so I stopped. Other appeals to administration went nowhere, and I concluded this is a system full of chiefs who can’t be bothered to get their hands dirty actually editing,
  5. I, for one, am one of those professional contributors who left Wiki in disgust. After spending a lot of time creating pages or adding a lot of content, some amateur came along and dumbed down the content and added fictious pictures that were purported to be of the creatures listed. It became a waste of my time to provide a lot of information that could be cut-and-paste into term papers, dissertations, reports, etc., and have some arm-chair contributor wreck it all.

This is a problem I’ve known about since soon after I joined Wikipedia in November 2003. The entire production process would have to be ripped up to fix this. Even Amazon has a way of providing feedback on user contributions so that readers know whose comments have been useful, even if it (and other processes) is fatally broken for highly polarized topics like politics.

One problem I didn’t see coming was the inevitable shift from original writing to maintenance mode. I started my main burst of Wikipedia contributions (2003-2004) by creating 11 new articles, from venture capitalists Eugene Kleiner and Tom Perkins to adding two missing campuses (CSULB, CSUSM) of the 23-campus CSU system.

Today, thanks to the law of large numbers (and the long tail), there are very few significant articles left to be written. (Yes, Wikipedia has an article on only one Joel West, and it’s a lame one, but I don’t consider that a major omission.)

This reminds me of what I experienced in my first few years as a professional programmer: it is so much more fun to write new code than to maintain someone else’s code. In fact, as I became a manager I learned this is a major recruiting and staffing problem, even when you pay people, let alone when they’re volunteers. Over and over again, I saw that the manager or other “stuck” (high switching cost) programmers had to take the scut work so you could offer the new exciting stuff to attract the best talent.

Clearly, at Wikipedia existing volunteers don’t want to do the scut work, nor do the newcomers. If it’s de minimis, then (to use an analogy) perhaps good citizens will just pitch in and pick up the candy wrapper, but nobody’s going to spend a weekend clearing up the trash along the highway just for the fun of it.

Wikipedia is running out of good jobs to hand out. If you can’t give out fun work, how are you going to attract people? What I didn’t see six years ago was that inevitably Wikipedia’s content base would mature: first in English and eventually in all the major languages. When this happened, the opportunities for adding new content would mainly be limited to current events like new hurricanes or those Simpsons episodes.

However, I find hope in Wikipedia’s current troubles, as they suggest a solution to Wikipedia’s most invidious problem: the commoditization of human knowledge. Monopolies are bad, even if they are for free goods. When I was interviewing open source leaders, the Apache (and most “open source”) types seemed to get this, while the free software types (Linux, OpenOffice) did not.

Competition is inefficient, but it provides choice. Monopolies at best mean benevolent dictators, and few benevolent dictators remain benevolent forever.

The mind-numbing ubiquity of Wikipedia is teaching a generation of kids to be lazy and uncritical consumers of information, whether it’s truth or merely wikitruth. They take what shows up on the first page of Google or in Wikipedia and assume it’s true, even when it’s not.

When I was a kid, I would do my 5th grade reports using World Book, Encyclopedia Britannica, usually one other encyclopedia like Collier’s or Compton’s, and also the Information Please Almanac. (If the report was important, I would also try to find a real book or two.) This wouldn’t make me an expert, but at least I would get multiple perspectives.

Today, Wikipedia’s commoditization of information means that Encyclopedia Britannica is struggling and its previous nemesis (the Encarta CD-ROM) is gone. At least a five-year-old version of the Columbia Encyclopedia survives as

Once upon a time, I assumed that network effects meant that nothing would ever compete with Wikipedia. This week shows that in less than a decade it’s possible to create a significant body of knowledge with volunteer labor. None of the existing rivals has yet succeeded, whether Citizendium, Conservapedia, Liberapedia or Knol. However, with this large body of existing (or potential) would-be Wikipedia labor becoming available, they are certainly trying.

I will be curious to see if we can achieve success from volunteer organizations that focus on the quality rather than the quantity of contributions. In this direction, Citizendium (by Wikipedia co-founder Larry Sanger) is using a somewhat modified version of the Wikipedia process, while Google’s Knol is heading in a different direction by emphasizing authorial integrity over cumulative production.

Given the almost total lack of competition, anything that provides a viable alternative to Wikipedia is a good thing. It will be a good thing if a decade from now we have three or four online encyclopedias to choose from, much as today we can choose from three or four cellphone carriers.

It’s likely that one of these alternatives will be Wikipedia. Perhaps if its leaders take its current problems seriously, it will still be the most popular alternative out there and will be able to meet its current modest fundraising goals.


Julius Beezer said...

Re: school students who researched using Google and wikipedia, what this is telling us is humans prefer ease of access over quality. Particularly when we know nothing, something that is good enough, is a lot better than something that is hard to get hold of. You were fortunate, in global terms, to be so well-endowed with booktexts as a fifth-grader.

As for your struggles in the edit wars, I sympathise. The struggle for universal human enlightenment is not an easy one. Take a break, come back refreshed. Wikipedia's competitors are using the dark force to sap your morale. They don't like a free site being 3x more popular than the lot of them put together, but they are dinosaurs.

I agree with you about a multiplicity of sources though. Citizendium's real world authoring approach is the one that interests me most, but it's only at 12,000 articles, last I read.

Joel West said...

I agree with your point about inherent student laziness, but not the others.

Wikipedia does not (nor, as currently constituted, will it ever) provide anything resembling "universal human enlightenment." At best, it will have the world's biggest collection of factoids.

The people who criticize Wikipedia are not "the dark force," but realists who would like to see it better and at this point have given up.

As I have said many, many times, choice is good. Competitors are forces for good, not evil. Having a viable alternative to Wikipedia will be good for everyone, including Wikipedia.

Julius Beezer said...

Enlightenment is in the person, not the texts.

Laziness is such a loaded word in our culture, that I try to avoid it myself.

Perhaps we might agree that people who don't want to know are lazy*; after that we're just arguing about the quality of the information that those that do want to know can access at this point in history.

*even this is highly problematic: if my 11 year old self would rather play football than swot the encyclopedia--does that mean I was lazy? Or just had other priorities?

Joel West said...

If a kid is asked "who was president between James Buchanan" or "what is the capital of North Dakota," then spending 10 seconds with Google or Wikipedia makes sense.

If a high school or college student has been given a graded research assignment, then it is relevant to expect students to distinguish between reliable and unreliable information.

My concern is not that students take the easy way out — my concern is that students don’t know how to do it the right way, and many don’t even realize the limitations of the GIGO they are consuming.