::scr googley eyed

Simon Wistow scr@thegestalt.org
Tue, 15 Oct 2002 10:53:24 +0100


Since I'm posting anyway: what about this change in Google's page ranking
technology, then? How does it make you feel?

At first I was all in favour, since it seemed to stop blogs getting too
highly rated (aside from my usual distaste for weblogs, I just found it
absurd that at one point googling for "games theory prisoner's dilemma"
got you a page [0] on Matt "Blackbelt" Jones' blog which quoted a mail
from me - and, much as I'm loath to admit it, I'm not a world authority
on either game theory or game design).

Cards on the table time: I now work for Yahoo as a search engineer.
Contrary to popular belief, we don't 'use' Google for our main searches.
For those we have teams of 'surfers' who maintain a taxonomy of links, a
little like dmoz.org. From that taxonomy we extract meta information and
monitor trends, and the surfers add and rate more pages in areas that
are currently popular. We use Google for what are called 'fall off
searches': stuff that we can't find elsewhere.
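Very roughly, the 'fall off' arrangement is "ask the curated directory
first, and only go to the web engine if that comes up empty". Here is a
minimal Python sketch of that control flow, assuming the directory is a
plain dict from a normalised query to hand-picked links and that
`web_search` is a hypothetical callable standing in for the external
engine. It illustrates the idea only; it is not how the real system is
built.

```python
from typing import Callable

# Hypothetical stand-ins: a hand-maintained directory keyed by normalised
# query, and a callable that queries an external web search engine.
Directory = dict[str, list[str]]
WebSearch = Callable[[str], list[str]]


def search(query: str, directory: Directory,
           web_search: WebSearch) -> tuple[str, list[str]]:
    """Answer from the curated directory first; 'fall off' to web search
    only when the directory has nothing for the query."""
    key = query.strip().lower()  # naive normalisation, for illustration
    hits = directory.get(key, [])
    if hits:
        return ("directory", hits)
    return ("web", web_search(query))


if __name__ == "__main__":
    directory = {"game theory": ["http://example.org/game-theory"]}
    fake_web = lambda q: ["web result for " + q]
    print(search("Game Theory", directory, fake_web))
    print(search("poker", directory, fake_web))
```

Exact-match lookup is obviously far cruder than a real taxonomy, but it
keeps the point visible: the web engine is only consulted as a fallback.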


In a recent, rather breathless article ...

  http://www.oreillynet.com/pub/a/javascript/2002/10/11/morville.html

... Peter "Polar Bear Book" Morville derides Google for saying:

"This page was generated entirely by computer algorithms without human
editors. No humans were harmed or even used in the creation of this
page." 

Which rather misses the point: aggregators will never kill off the
things they aggregate. However, Mr Morville then goes on to give a
list of reasons why he loves Google.

I agree with some of his points. I like Google because of the clean,
simple interface. I like the automatic conversion of documents to HTML
and I like the cached links.

On the other hand, I'm not in love with PageRank [tm]. As an algorithm
it's not that good: it doesn't scale well and it's vulnerable to attack
(manufacture enough inbound links and you can inflate a page's rank).
There are better algorithms out there or, at the very least, ones that
are just as good. The statement that "Google is us" is a bit gushing, but
not nearly as gushing as "Google excites us" - the paragraph

"Perhaps most endearing is the fact that Google energizes us about the
future. Google freed us from the constraints of full-text search,
demonstrating the power of adaptive, emergent solutions." 

screams of someone who's spent too long immersed up to the neck in
corporate bullsh1t and marketing speak.

However, going back to "Google is us": I like the sentiment more than
the hype placed after it.

For a freelance project I've been looking at collaborative filtering,
the technique of learning from a community to provide better
information. PageRank is an attempt at this: a very simplistic one, but
an attempt. However, I think there are better ways to do collaborative
filtering.
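For reference, the core of PageRank as Brin and Page published it is a
power iteration: every page repeatedly splits its score among the pages
it links to, plus a 'random jump' share controlled by a damping factor
(0.85 in the original paper). Below is a minimal Python sketch over a
small in-memory link graph. The function name and the even spreading of
rank from pages with no outgoing links are my assumptions for
illustration; this says nothing about how Google actually computes it
at web scale.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over {page: [pages it links to]}."""
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Every page gets the 'random jump' share up front.
        new = {p: (1.0 - damping) / n for p in pages}
        # Pages with no outgoing links spread their rank over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new[p] += damping * dangling / n
        # Each page splits its rank equally among the pages it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        # Stop once the scores have settled.
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    honest = {"x": ["y"], "y": ["x"]}
    # Five throwaway pages that exist only to link to "x".
    spammed = {**honest, **{f"spam{i}": ["x"] for i in range(5)}}
    print(pagerank(honest))
    print(pagerank(spammed))
```

The second call shows the attack in miniature: "x" and "y" score the
same on their own, but a handful of junk pages pointing at "x" is enough
to push it ahead of "y".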

RDF and the whole semantic web shenanigans are a good start. Filtering
based on knowledge about who you are and your previous habits (no point
giving you taxi firms in Chicago if you live in London), and
extrapolating from the results of others like you - a kind of 'people
who liked these links also liked ...' - would be good. Dipsy and the
other IRC infobots are a weird form of collaborative filtering, in that
they gain knowledge from the community, aggregate it, then spew it out
again. Celia had the idea this morning of heuristically grouping results
by document type: PowerPoint presentations, mailing list archives,
academic papers, blogs and personal web pages.
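The 'people who liked these links also liked ...' idea can be sketched
as a simple co-occurrence count. A minimal Python sketch, assuming each
user's history is just a set of URLs they rated well; `histories` and
`also_liked` are hypothetical names, and a real system would also need
to correct for very popular links and very sparse histories.

```python
def also_liked(histories: dict[str, set[str]], liked: set[str],
               top_n: int = 3) -> list[tuple[str, int]]:
    """Suggest links that other users liked alongside the links in `liked`.

    Each user who shares at least one liked link 'votes' for every link
    of theirs we haven't seen, weighted by how many links they share
    with us.
    """
    scores: dict[str, int] = {}
    for links in histories.values():
        overlap = len(links & liked)
        if overlap == 0:
            continue  # nothing in common, so this user tells us nothing
        for link in links - liked:
            scores[link] = scores.get(link, 0) + overlap
    # Highest score first; ties broken alphabetically so output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    histories = {
        "alice": {"a", "b", "c"},
        "bob": {"a", "c", "d"},
        "carol": {"x", "y"},
    }
    print(also_liked(histories, {"a"}))  # [('c', 2), ('b', 1), ('d', 1)]
```

Weighting by overlap means a user whose tastes match yours closely
counts for more than one who happens to share a single link.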

Anyway, since I need to get on with my work, anybody got any ideas? Or
comments? Or suggestions?

Simon


[0] http://www.blackbeltjones.com/work/mt/archives/000321.html


-- 
: feel the banana karma