Finding the Information Balance Between Quantity and Quality on the Web

Thanks to Google, Yahoo and other search engines, I can find information on almost any topic I can think of. This has had a huge impact on just about everything I do.

I remember having a set of World Book encyclopedias when I was growing up. They were the be-all and end-all for just about everything I needed to research for school, for anything I was curious about and for anything my parents and I wanted or needed to know more about. Most good parents who could afford to had a set of encyclopedias in the house. And the encyclopedias had at least a little bit about just about any topic you could think of, from Shakespeare’s plays, to how to do basic household repairs, to geography, history, politics, you name it. We also had World Book’s Childcraft, the 15-volume “how and why” encyclopedia, which featured a classic blend of photos, illustrations, fiction and nonfiction to capture and keep the interest of a young student.

I used to sit down and just read through each Childcraft book because they were so interesting. I did the same with the World Books, although that was more flipping through and reading about things that caught my eye.

But although we didn’t realize it at the time, we were getting a very limited view of each subject. Sure, we learned the important basic data, but we only got one viewpoint and one interpretation, and it was the most generic view possible. Given that it was the only view we got, that was a good thing, but we still ended up with the most generic, undisputed (at least at the time), mainstream view of everything.

And you would think this would have ensured that at least what we learned was reliable. To a large extent this is true, especially for specific, concrete things like the birds of North America or the names and characteristics of dinosaurs. But how can you teach or explain, or even say that someone “knows” about, Picasso or the history of American Indians or even the culture of a country using just one reference that tries to be as non-confrontational as possible?

And even if you looked at more than one reference book or other “trusted” reference on a subject, you still most likely got a one-sided viewpoint: whichever one was the most likely to be non-confrontational and non-controversial. Whatever was written by people in academia was highly regarded, even though much of it was recycled from what some other academic wrote, without any firsthand knowledge of the subject. In other words, just like bloggers frequently do, they read a lot about something and then put it all together in “their own” paper or thesis or other scholarly work. But much of the time, their knowledge was assumed. They didn’t know from their own experiences; they knew because they had read or studied the work of someone who had done the same thing, and so on down the line. Who knows how far back you might have to go to get to someone who had actually experienced it rather than just “researched” it?

Now we have the opposite problem. We have at our fingertips vast amounts of information on just about any subject you can possibly imagine. Want to know reasons for taking a martial art? I got 2 million plus hits just in Yahoo. How about the Detroit Riots of 1967? Almost 300,000 hits. Information about being left-handed? Over 12 million hits, including stores that sell gadgets for left-handed people, reasons why people are left-handed and famous left-handers. And this was all on page 1 of my search.

But which parts of this vast bank of information are valid, and which only sound reasonable but have no basis in fact? I personally feel that a lot more of what I find doing searches on the internet is reasonably sound than a lot of people seem to think. I constantly hear about how untrustworthy the information on the web is. And yet, most of what I find (granted, I avoid the sites with titles such as ‘Elvis’ Guide to Culture’ or ‘A Klingon’s Guide to Battle’) is reasonable (although from differing viewpoints) and well thought out. And with some cross-referencing, most things can be reasonably well verified.

And people talk about how search engines, being entities that are in the business of making money, skew the results towards sites that advertise with them or that have put a lot of thought into marketing themselves so they show up early in searches. And that of course has some basis in fact. However, it also means that sites with more reliability frequently get presented first. As an aside, I’m surprised at how often a Wikipedia entry shows up for a topic on the first page of a search. This is a relatively recent development; I remember not too long ago when you almost never saw Wikipedia. I don’t know if it is because Wikipedia is increasing its entries or is doing better at search engine optimization or becoming more trusted or what, but I have definitely noticed it.

I just read in the paper about a new Wikipedia tool demo by the UCSC Wiki Lab that examines an entry’s contents to determine how reliable each piece is, mostly by examining the reputation of the contributor responsible for each line. It determines the degree to which each contributor’s work survives subsequent edits by other people. This is an interesting way of looking at it and I can see how that sounds reasonable. However, many accurate contributions get overwritten, not because they were inaccurate but just to expand what was written or, in the case of controversial entries, because someone rewrote them to fit their own viewpoint.

The tool color-codes the background of sections of Wikipedia entries in various shades of orange to indicate that a section may not be reliable (the darker the orange, the less reliable). Sections in white are deemed reliable. Here is the Wikimania 2007 talk on this.

Here is a link to a paper entitled “A Context-Driven Reputation System for the Wikipedia” that explains more about the algorithms used.
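To make the idea concrete for myself, here is a toy sketch in Python of the general approach as I understand it from the description above. It is not the Wiki Lab’s actual algorithm. The scoring rule (plus one point when someone else keeps your line, minus one when someone else drops it), the shade thresholds and the names `author_reputations` and `shade_for` are all my own assumptions, made up for illustration.

```python
def author_reputations(revisions):
    """Score authors by how well their lines survive other people's edits.

    revisions: list of (author, list_of_lines), oldest first.
    Toy rule (an assumption, not the real tool's formula): +1 each time a
    later revision by someone else keeps a line you introduced, -1 each
    time a later revision by someone else drops it.
    """
    reputation = {}
    origin = {}       # line text -> author who first introduced it
    previous = set()  # lines present in the previous revision
    for author, lines in revisions:
        reputation.setdefault(author, 0)
        current = set(lines)
        for line in previous - current:   # lines dropped by this edit
            if origin[line] != author:
                reputation[origin[line]] -= 1
        for line in previous & current:   # lines kept by this edit
            if origin[line] != author:
                reputation[origin[line]] += 1
        for line in current - previous:   # lines added by this edit
            origin.setdefault(line, author)
        previous = current
    return reputation, origin


def shade_for(line, origin, reputation):
    """Map a line to a background shade (thresholds are hypothetical)."""
    score = reputation.get(origin[line], 0)
    if score >= 2:
        return "white"
    if score >= 0:
        return "light orange"
    return "dark orange"


# A made-up three-revision history of a tiny article.
history = [
    ("alice", ["Robins are birds.", "Robins eat worms."]),
    ("bob", ["Robins are birds.", "Robins eat worms.", "Robins are reptiles."]),
    ("carol", ["Robins are birds.", "Robins eat worms."]),
]
reputation, origin = author_reputations(history)
for text in sorted(origin):
    print(shade_for(text, origin, reputation), "|", text)
```

Notice that this sketch punishes any removal, whatever the reason for it. That is exactly the weakness I mentioned above: an accurate line that gets dropped during an expansion or a rewrite counts against its author just as much as a wrong one does.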

This is an interesting idea and many Web 2.0 sites are using some sort of method that helps determine the reliability of what they contain, such as the number of links to a post or blog (assuming that if it is linked to, it contains something that other people deemed valuable).

However, I was not very impressed with the Wikipedia Reliability demo. It uses a subset of a copy of Wikipedia from Feb 2007 (which is fine). But as I randomly went through the articles, I couldn’t make much sense of why it trusted or didn’t trust any given piece of information. Whatever their algorithms may be, it doesn’t look to me like they work very well.

So what is the answer? Should we do as the doomsayers say and avoid anything we find on the web? Is the web the new encyclopedia? Or should we reach a happy medium, discarding the obviously doubtful, looking at several sources to get a general consensus, putting some thought into forming an opinion and when we make some decisions, trusting but verifying?

In other words, should we treat what we find on the web any differently from what we see on the news, read in the paper, watch on TV, read in a book or get from any other source of information?

I don’t think so, and I think most of us are aware enough to practice this regularly anyway. Maybe it is a matter of some people learning that approaching information on the web is not so different from how we approach any other source of information. And ultimately, the web can give a much broader and more thorough view of something than just reading about it in an encyclopedia.

~Susan Mellott

