Monday, February 25, 2013

Correcting Wikipedia's distorted mirror on the world

We had friends over last week, and at one point the discussion drifted to Wikipedia. The husband is an adjunct and asked me what my policy was (after 14+ years of teaching) on using Wikipedia as a reference.

Wikipedia used to emphasize that it is a tertiary source (a compilation created from primary and secondary sources) and thus is not an authoritative reference. However, today it appears to be less modest and more into bragging. Because it is free and comprehensive, Wikipedia has become the first (if not only) reference for the Internet generation.

Certainly Wikipedia is suitable for resolving a bar wager: the sort of questions that people used to argue about are now easily settled via Wikipedia (or via more specialized and authoritative sources such as IMDB).

It’s also well known that the Wikipedia mechanisms do not prevent the intentional fabrication of lies, whether it be libel (falsely accusing a political figure of conspiring to kill JFK) or false history (the so-called Bicholim Conflict). No process will ever prevent this — any more than a library can prevent cutting pages from books on open stacks — but the processes seem to eventually identify these problems and correct them. It also appears that true experts (e.g. actual scientists) will check key topics and make sure that the most egregious factual errors get corrected.

The remaining problem is the distorted mirror of American and other societies that Wikipedia presents to the world and to posterity. (By sheer weight of population, the English-language Wikipedia is clearly dominated by the American perspective, and the English articles outnumber the next three languages combined.)

By nature of its contributors, who are on average younger and have more free time than the average Internet user, Wikipedia reflects their social, cultural and political biases rather than those of the rest of society. In particular, these authors demonstrate a preoccupation with contemporary popular culture over other (more enduring or important) aspects of human knowledge: not everyone can write about string theory, but just about anyone can summarize a Simpsons episode.

And thus Beyoncé Knowles has a 36-page entry, and the late Michael Jackson, 49 pages. John Lennon warrants only 20 pages, but the Beatles have 38 and Elvis 45. Charlie Chaplin merits 35 pages, versus 20 pages for Rembrandt, 19 for Beethoven and 17 for Mozart.

By way of comparison, the most famous scientist of the 20th century, Albert Einstein, rates only 29 pages. Other 20th century Nobelists include Niels Bohr (who gave us the atom) with 14 pages, Enrico Fermi (who split the atom) with 16 pages, and Wilhelm Röntgen (who won the first Nobel Prize in Physics for discovering X-rays) with 5 pages. Martin Luther (who changed the course of European history) has 32 pages, Charlemagne 36 pages (but only 24 pages in German) and Henry VIII 30 pages.

Overall, the depth of coverage of major figures seems adequate. But, lacking limits on resources (either to generate content or to print it), the coverage of trivial topics balloons far beyond all reasonable measure.

Exhibits A-Z are the coverage of cult favorite American TV shows such as South Park or the Simpsons. South Park has 28 pages, but then another 17 pages listing all the episodes. More significantly, there is a 3-10 page entry for each of 237 episodes aired across 16 seasons (thus far). One US TV show with a run of nearly two decades — which will be forgotten a century from now — merits 1000+ pages, more “ink” than all the major painters or composers of 500 years of European history combined.

If journalism is the first draft of history, now Wikipedia offers the first draft of a comprehensive encyclopedia that could, in the end, crowd out all other records of our contemporary society. The result is what one would expect if archeologists centuries from now tried to assess the 20th or 21st century from uncovered copies of US Weekly or videos of “Entertainment Tonight.”

I'm somewhat optimistic that the problem can be corrected in this century, because there are millions of people who have the knowledge to correct these distortions. It’s not some core problem of economics (Wikipedia demonstrates this) or scarcity, but merely a matter of incentives. Right now, people who actually know something that’s scarce have no incentive to give it away in an anonymous crowdsourced encyclopedia. Instead they seek course credit, try (less and less often) to sell it for publication, or build an academic reputation through peer-reviewed journals.

Google’s dead Knol experiment was one attempt to create a new production community. Still, the sheer volume of excess available labor suggests that there will be other attempts. In particular, if there were a way that students or professors got credit for term-paper-quality original contributions shared on the Web, we might have a more coherent (and representative) picture than what Wikipedia presents to the world.

The MIT-spawned BioBricks Foundation is crowdsourcing synthetic biology components through its annual iGEM competition, drawing on a far more skilled and specialized knowledge base than Western history. (Similarly, MIT’s OpenCourseWare has spawned a wave of free online university course content.) This implies that even one visionary university could lead us to a new model of knowledge generation.


Anonymous said...

I believe it's "Niels" (Bohr), not Neil : )

Joel West said...

Thanks for the correction.