::scr What porn sites don't want you to know..
Chris Carline
scr@thegestalt.org
Fri, 11 Oct 2002 14:18:35 +0100
On Fri, Oct 11, 2002 at 01:28:20PM +0100, Ash Argent-Katwala wrote:
> I've stopped purging spam lately, in the hopes of soon switching to one of
> these simple frequency (or preferably something with a larger chain length)
> probabilistic filters. Still at least since it's the same old crud it'll
> presumably compress fairly well, and big disks are pretty cheap. The
> write-up from Paul Graham <http://www.paulgraham.com/spam.html> is fairly
> compelling, although I'm sure this isn't new to many folk who pay more
> attention than me.
I installed a probabilistic filter about six weeks ago, and it's been
great. I had a lot of false-positive problems with SpamAssassin[0] to the
point where I didn't dare killfile anything in case it was legitimate. Not
good. But since installing a Bayesian-style system and training it for
about five weeks, only about 1 in 20 spams get through (with no false
positives - this is really important for me!). With a bit more training
I'm hoping the hit rate will get even better.
Anyway. I'm now happily dropping spams into a purgatory mbox that acts as
a convenient store for even more spam that I can feed back into the
training set.
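For anyone who hasn't read the Graham write-up, here's a rough idea of how this kind of filter scores a message. This is a minimal sketch only, assuming Python 3.8+; it is not the filter I installed. The class name `NaiveSpamFilter` and its methods are made up for illustration. It loosely follows the scheme in "A Plan for Spam", simplified: ham counts are doubled to bias against false positives, per-token probabilities are clamped to 0.01..0.99, unseen tokens get 0.4, and the 15 tokens furthest from neutral are combined. Graham also counts every occurrence and ignores rare tokens. This sketch counts each token once per message and keeps rare ones.

```python
import re
from collections import Counter
from math import prod


def tokenize(text):
    # Crude tokenizer (an assumption): runs of letters, digits, $, ' and -
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class NaiveSpamFilter:
    """Hypothetical word-frequency filter, loosely after Graham's scheme."""

    def __init__(self):
        self.good = Counter()  # token -> number of ham messages containing it
        self.bad = Counter()   # token -> number of spam messages containing it
        self.ngood = 0
        self.nbad = 0

    def train(self, text, is_spam):
        # Count each distinct token once per message (a simplification)
        tokens = set(tokenize(text))
        if is_spam:
            self.bad.update(tokens)
            self.nbad += 1
        else:
            self.good.update(tokens)
            self.ngood += 1

    def token_prob(self, token):
        g = 2 * self.good[token]  # double ham counts to avoid false positives
        b = self.bad[token]
        if g + b == 0 or self.ngood == 0 or self.nbad == 0:
            return 0.4  # unseen token: lean slightly towards ham
        g_rate = min(1.0, g / self.ngood)
        b_rate = min(1.0, b / self.nbad)
        p = b_rate / (g_rate + b_rate)
        return max(0.01, min(0.99, p))

    def score(self, text, n_interesting=15):
        probs = [self.token_prob(t) for t in set(tokenize(text))]
        # Keep the tokens whose probabilities are furthest from a neutral 0.5
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        top = probs[:n_interesting]
        if not top:
            return 0.4
        spam = prod(top)
        ham = prod(1 - p for p in top)
        return spam / (spam + ham)


f = NaiveSpamFilter()
f.train("cheap viagra click here now", True)
f.train("free money click here", True)
f.train("meeting notes for friday", False)
f.train("your receipt for order 1234", False)
print(f.score("click here for free viagra") > 0.9)  # True
print(f.score("receipt for friday meeting") < 0.1)  # True
```

Feeding the purgatory mbox back in amounts to calling `train(..., True)` on each saved spam. Training on the stuff you want to keep matters just as much, since doubling the ham counts is what keeps receipts and the like from being flagged.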
Some of the things it has let through would have been caught by
SpamAssassin, of course, but it would seem that no system is perfect.
Chris
[0] The main problem I had with SA was its tendency to mark online
receipts as spam, as well as information emails I'd requested from
legitimate sites. It even marked a "friendsreunited" message sent to my
Mrs as spam...