Spam detection using Naive Bayes

Saturday, November 24 2007, 4:27
For over a year now, my website has been running rather effective anti-spam software, written for the most part by yours truly. The amount of data it had accumulated over that time, however, was filling up my database quota, and something needed to be done about it.
I patched up the way the system works (it is based on Naive Bayes statistical analysis) to make it distinguish between HTML markup and normal text. I had noticed that the classifier had been mixing up completely unrelated things because it treated markup and ordinary words as the same kind of token.
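For the curious, here is a minimal Python sketch of the idea. It is not the code running on this site; the names (`tokenize`, `NaiveBayes`, the `html:` and `text:` prefixes) and the add-one smoothing are assumptions made for illustration. The point is only that a tag such as `<a>` and the word "a" end up as separate features instead of sharing one count.

```python
import math
import re
from collections import Counter

# Matches either a complete HTML tag or a run of word characters.
TOKEN_RE = re.compile(r"<[^>]+>|[A-Za-z0-9']+")
TAG_NAME_RE = re.compile(r"</?\s*([A-Za-z0-9]+)")


def tokenize(message):
    """Split a message into tokens, keeping markup and text in separate namespaces."""
    tokens = []
    for match in TOKEN_RE.finditer(message):
        piece = match.group(0)
        if piece.startswith("<"):
            # Keep only the tag name, so '<a href="...">' and '</a>' both become 'html:a'.
            name = TAG_NAME_RE.match(piece)
            if name:
                tokens.append("html:" + name.group(1).lower())
        else:
            tokens.append("text:" + piece.lower())
    return tokens


class NaiveBayes:
    """Tiny two-class Naive Bayes classifier with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = Counter()

    def train(self, message, label):
        self.message_counts[label] += 1
        self.token_counts[label].update(tokenize(message))

    def score(self, message, label):
        """Return log P(label) + sum of log P(token | label), both smoothed."""
        vocabulary = set()
        for counts in self.token_counts.values():
            vocabulary.update(counts)
        total_tokens = sum(self.token_counts[label].values())
        total_messages = sum(self.message_counts.values())
        log_prob = math.log(
            (self.message_counts[label] + 1) / (total_messages + len(self.LABELS))
        )
        for token in tokenize(message):
            count = self.token_counts[label][token]
            # The extra +1 in the denominator reserves mass for unseen tokens.
            log_prob += math.log((count + 1) / (total_tokens + len(vocabulary) + 1))
        return log_prob

    def classify(self, message):
        return max(self.LABELS, key=lambda label: self.score(message, label))
```

Trained on a few link-heavy spam messages and a few plain comments, a sketch like this can keep the article "a" in normal sentences from being counted together with anchor tags, which is the kind of mix-up described above.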
Spam statistics
To keep a better eye on current spam trends, I set up a nifty little page where the tracker now shows the status of the unceasing battle against spam here at
Paul: Isn't it typical that this post...
Thom: hahaha, the irony of fate! Still, impressive...
Paul: Well, I can't complain. It's banned over...

© 2005–2018 P. F. Lammertsma