Using search engines to compile a list (the top 50 greatest blues guitarists by record sales, say) involves a lot of drudge work, because you have to visit many web pages to gather the data you need. But the next step in search engine technology could make it possible to create such lists with a single mouse click. KnowItAll, a search engine under development at the University of Washington, Seattle, trawls the web for data and then collates it in the form of a list.
The approach is unique, says its developer, Oren Etzioni, because it generates information that probably doesn't exist on any single web page. The US Department of Defense's research arm, DARPA, and Google are so impressed that they are providing funding for the project.
Etzioni's ultimate aim is to have KnowItAll answer questions such as "list all British scientists born before 1900". The software cannot do that yet, because it lacks a module that can understand "natural-language" questions of this type. That will come later, he says. What it can do, however, is take a phrase like "list scientists" and return a list of people who it believes, with a high degree of confidence, are (or were) scientists.
For any input noun ("scientists", "guitarists", "gardeners" or "actors", say), KnowItAll tries to find sentences on websites that contain that noun and looks for words that often appear after it. In this way it might find the phrases "scientists such as" and "scientists including". It then feeds these to 12 search engines and extracts the words that tend to follow, which are often scientists' names. But certain phrases like "scientists such as botanists" also fulfil the search criteria. The software can work out that "botanists" is not a name, and it can use this to inject "botanists such as" into the engines to obtain an even fuller list of scientists' names.
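The extraction loop described above can be illustrated with a minimal Python sketch. It is not KnowItAll's actual code: the function names `extract` and `harvest`, the regular expression, and the use of a small in-memory list of sentences in place of live search engine results are all assumptions made for illustration. It also simplifies heavily by treating any run of two or more capitalised words as a name and any lowercase word ending in "s" as a subclass.

```python
import re

# Cue phrases named in the article; a real system would learn many more.
CUES = ("such as", "including")


def extract(noun, sentences):
    """Return (names, subclasses) that follow '<noun> such as/including'."""
    names, subclasses = set(), set()
    cue_alternatives = "|".join(CUES)
    pattern = re.compile(
        rf"\b{re.escape(noun)} (?:{cue_alternatives}) "
        # Either a multi-word capitalised name or a lowercase plural noun.
        r"([A-Z][a-z]+(?: [A-Z][a-z]+)+|[a-z]+s)\b"
    )
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            term = match.group(1)
            if term[0].isupper():
                names.add(term)        # looks like a person's name
            else:
                subclasses.add(term)   # e.g. "botanists": not a name
    return names, subclasses


def harvest(noun, sentences):
    """Collect names for a noun, re-querying with any subclasses found."""
    seen, queue, names = {noun}, [noun], set()
    while queue:
        current = queue.pop()
        found, subs = extract(current, sentences)
        names |= found
        for sub in subs - seen:        # feed "botanists such as" back in
            seen.add(sub)
            queue.append(sub)
    return names


sentences = [
    "Famous scientists such as Marie Curie changed physics.",
    "We admire scientists including Isaac Newton.",
    "There are scientists such as botanists in the team.",
    "Early botanists such as Carl Linnaeus classified plants.",
]
print(sorted(harvest("scientists", sentences)))
```

In this toy corpus, Carl Linnaeus is found only because "botanists" was picked up as a subclass and queried in turn, which mirrors the "botanists such as" step in the article.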
KnowItAll then returns a long list of scientists' names, each one accompanied by its percentage probability of being correct, as measured by the frequency with which the name occurs on websites. Users will be able to choose the level of confidence they want in the data. KnowItAll is also able to find words that often occur close to the search term. In the case of "scientists" these might be words like "DNA" and "quantum". It uses them to refine the probability that a person is indeed a scientist.
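The article does not say how KnowItAll computes these percentages, so the following Python sketch is only one plausible reading of the idea. The function names `score` and `filter_by_confidence`, the equal weighting of frequency and context, and the sample pages are all hypothetical.

```python
from collections import Counter


def score(candidates, pages, context_words):
    """Assign each candidate a 0-100 score from frequency and context.

    Assumed formula (not from the article): half the score comes from how
    often the name appears relative to the most frequent candidate, half
    from the share of its pages that also mention a context word.
    """
    hits, context_hits = Counter(), Counter()
    for page in pages:
        text = page.lower()
        has_context = any(word.lower() in text for word in context_words)
        for name in candidates:
            if name.lower() in text:
                hits[name] += 1
                if has_context:
                    context_hits[name] += 1

    top = max(hits.values(), default=0)
    scores = {}
    for name in candidates:
        if hits[name] == 0:
            scores[name] = 0.0
            continue
        frequency = hits[name] / top
        context = context_hits[name] / hits[name]
        scores[name] = round(100 * (0.5 * frequency + 0.5 * context), 1)
    return scores


def filter_by_confidence(scores, minimum):
    """Keep only candidates at or above the user's chosen confidence."""
    return sorted(name for name, value in scores.items() if value >= minimum)


pages = [
    "Marie Curie pioneered research on radioactivity and quantum theory.",
    "Marie Curie is discussed alongside DNA pioneers.",
    "John Smith sells garden furniture.",
]
scores = score(["Marie Curie", "John Smith"], pages, ["DNA", "quantum"])
print(scores)
print(filter_by_confidence(scores, 50))
```

The threshold passed to `filter_by_confidence` stands in for the user-selected confidence level mentioned in the article.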
Source: Eurekalert & others
Last reviewed: By John M. Grohol, Psy.D. on 21 Feb 2009
Published on PsychCentral.com. All rights reserved.