|
The
Internet is getting to be more and more like
Hollywood. Successful movies spawn numerous
copycats, and this is now happening with search
engines as well.
Since
Google is making money and is expected to bring
in big bucks in a midyear IPO, suddenly search
engines are the next big thing. But unlike in
Hollywood — where the spin-offs, rip-offs, and
clones are seldom as successful as the movie
that spawned them — search engines just keep
giving users better and better tools.
Google
began with a unique page-ranking concept that
determined page popularity by looking at the
number of references from other sites.
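
As a rough illustration of that idea, here is a minimal Python sketch that
ranks pages by counting how many other sites refer to them. The link data and
the names (LINKS, rank_by_inbound_links) are made up for illustration; this is
only the simple reference-counting notion described above, not Google's actual
algorithm.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example", "d.example"],
}

def rank_by_inbound_links(links):
    """Return (site, count) pairs, most-referenced site first."""
    # Start every known site at zero so unreferenced sites still appear.
    counts = Counter({site: 0 for site in links})
    for source, targets in links.items():
        for target in set(targets):   # count each referring site only once
            if target != source:      # a site linking to itself does not count
                counts[target] += 1
    # Sort by descending count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for site, count in rank_by_inbound_links(LINKS):
    print(site, count)
```

Counting each referring site once keeps a single site from inflating a
page's score by repeating the same link.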
Before
Google, the top engine was arguably
AltaVista, which went south after Compaq bought
Digital Equipment Corp., the company that
maintained it. Before and after AltaVista, a
slew of interim hotties came and went. The
progenitor was WebCrawler. Among the
early favorites was the still-popular
Yahoo!, which took the directory approach and
provided search as a secondary mechanism.
Yahoo!
will probably be the only company unscathed
in the upcoming battles, unless it chooses to
associate itself with the looming mess. The
biggest mess-maker will be Microsoft, which
suddenly thinks this is somehow its business
too. It intends to release a new search service
using natural-language queries, much like Ask
Jeeves. Ask Jeeves has never impressed me, even
though the company has invested years in this idea.
What is Microsoft going to do differently?
Natural-language
searching means that instead of typing
"Dvorak writer magazine" to see which
magazines I write for, you would type "What
magazines does John C. Dvorak write for?"
This eventually leads to "If John C. Dvorak
writes for magazines, what are they, where are
they, and does he have a phone number I can
call?"
Do
this with Ask Jeeves and you get a
"sponsored" link to a telephone
directory CD-ROM that you can buy. Then there
are two come-ons for magazine subscriptions and
a link to "books by Dvorak." This is
followed by links to incredibly obscure blogs
that mention Dvorak columns. Useless.
To see
how poorly natural-language parsing works, use
Google's translation function. If companies
can't create decent machine translation, how can
they use natural language for search queries?
Still, Microsoft will do what it does best in
areas outside its core competency: muddy the
waters.
Not
all is lost: A recent issue of MIT's Technology
Review details some of the new approaches to
searching, including the unique clustering
methodology employed by Mooter (www.mooter.com).
Instead of a list of search results, you get
clusters not dissimilar from those found at the
quirky Kartoo (www.kartoo.com).
One
search engine mentioned in the article, and
one I've been toying with, is Teoma
(www.teoma.com), ironically now owned by Ask
Jeeves. It often outperforms Google in accuracy
and in putting exactly what you want at the top
of the results list.
Some of
this may have to do with Google's abandonment of
its ranking methodology because of the emergence
of redundant cross-linking, which is something
many outsiders blame on nearly 5 million
slaphappy bloggers and their so-called
blogrolls.
The
continued advantage of Google over the
competition, though, is the Google Web cache.
It's amazing how many pages are off-line but
still available in the Google cache. Unless a
new engine comes along and duplicates this
feature, there's no way Google won't stay on
top of this game.
Then
there's the issue of the deep Web, or what
was once called the hidden Web or the invisible
Web. This typically refers to database sites
using content management systems that crawlers
can't index. This is why when you do a search on
"John C. Dvorak," you get very few
hits on the hundreds and hundreds of online and
print columns I've written.
At least
one start-up is working on this problem. The
current methodology is to use metasearch engines
(the ones that query all sorts of little
engines and piece together the results) like
WindSeek (www.windseek.com) or specialty search
engines, such as All Academic (www.allacademic.com).
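
To make the metasearch idea concrete, here is a minimal Python sketch of the
"piece together the results" step. It assumes each engine has already
returned a ranked list of URLs; the engine names, the sample URLs, and the
merge_results function are hypothetical, and the reciprocal-rank scoring is
just one simple way to combine lists, not how any particular metasearch
engine does it.

```python
from collections import defaultdict

def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list."""
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for position, url in enumerate(ranked_urls, start=1):
            # A hit near the top of any engine's list earns more credit.
            scores[url] += 1.0 / position
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Made-up results from three hypothetical engines.
sample = {
    "engine_one": ["x.example/1", "y.example/2", "z.example/3"],
    "engine_two": ["y.example/2", "x.example/1"],
    "engine_three": ["y.example/2", "w.example/4"],
}

for url in merge_results(sample):
    print(url)
```

A page that several engines agree on rises to the top, while a page found by
only one small engine still makes the list.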
However
this shakes out, at least one thing is certain:
We'll have a lot of different search choices,
and that's pretty much the only way we can
navigate all this information. Cheer these
people on.
|