Poco Forums • View topic - Advanced Multi-level filtering.

Advanced Multi-level filtering.

Posted: Wed Sep 29, 2004 2:40 am
by SFCurley
I thought I'd share what has turned out to be a pretty effective spam filtering approach for me, and one that makes good use of a lot of Poco's features. It combines three approaches (whitelisting, Bayesian filtering, and then challenge-response), and the result is that I pay very little attention to the whole junk mail issue and inconvenience very few unknown senders.

I've created a series of filters that flow as follows:

Filter 1: Sort all newsletters into a newsletters folder to read

Filter 2: Run Poco's Bayesian filter (but have the junk score threshold set very high so that the message never gets moved based on the score). Add 100 to the junk score if Bayesian positive.

Filter 3: Stop processing if sender is known or domain is on the "approved domains" list (using the %addressbooks% and %exceptsenders% filter feature)

** ADDED ** Filter 4: Run the dns blacklist filter Hogyt put together. Add 100 to junk score if originating IP is black-listed.

Filter 5: Stop processing if the junk score is less than 100. This means that the message was Bayesian negative and not DNS blacklisted.

Filter 6: Stop processing if the subject has a special code word

Filter 7: Otherwise, assume it's junk, send an automated reply to the sender that says "Don't know who you are. If you're a real person, re-send your message with a code word in the subject line so I know you're not a spammer", and move the message to the junk folder (or a quarantine folder). I send this reply from a different "postmaster" account that I've set up at Hotmail so that the spammers don't get confirmation of my real email address.

Because of the whitelisting and Bayesian filter steps, fewer than 1 in 1000 real senders get a bounce-back message, but almost all the spammers do, and I never hear from them again.
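For anyone who would rather read the flow above as logic than as filter settings, here is a rough Python sketch of the same seven steps. It is not PocoMail script: the `Message` fields, the `classify` function, and the returned folder names are all made up for illustration, and the Bayesian and DNS blacklist results are assumed to have been computed already.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message record; field names are assumptions, not Poco's.
    sender: str
    subject: str
    is_newsletter: bool = False
    bayes_positive: bool = False   # result of the Bayesian filter
    ip_blacklisted: bool = False   # result of the DNS blacklist check
    junk_score: int = 0


def classify(msg, known_senders, approved_domains, code_word):
    """Walk the filters in order and return the action for this message."""
    # Filter 1: newsletters go to their own folder.
    if msg.is_newsletter:
        return "newsletters"
    # Filter 2: Bayesian positive adds 100 to the junk score.
    if msg.bayes_positive:
        msg.junk_score += 100
    # Filter 3: stop if the sender or the sender's domain is whitelisted.
    domain = msg.sender.rsplit("@", 1)[-1].lower()
    if msg.sender.lower() in known_senders or domain in approved_domains:
        return "inbox"
    # Filter 4: a blacklisted originating IP adds another 100.
    if msg.ip_blacklisted:
        msg.junk_score += 100
    # Filter 5: stop if neither test fired.
    if msg.junk_score < 100:
        return "inbox"
    # Filter 6: stop if the subject carries the code word.
    if code_word.lower() in msg.subject.lower():
        return "inbox"
    # Filter 7: treat as junk and send the challenge reply.
    return "challenge_and_junk"
```

Note that the whitelist check sits after the Bayesian step on purpose: a known sender still gets a junk score, but the message is never moved because of it.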

I have a few other steps I include, which I won't go into detail about, but this is the crux of the approach. Overall, it's been very effective for me and I thought I'd post it for anyone else who might find it of interest.

Posted: Wed Sep 29, 2004 10:33 am
by Guest
I will give this a go too! Thanks.

Re: Advanced Multi-level filtering.

Posted: Thu Sep 30, 2004 6:38 am
by Pete
SFCurley wrote: Filter 2: Run Poco's Bayesian filter (but have the junk score threshold set very high so that the message never gets moved based on the score).

Just FYI, I achieve this differently. I set the "Custom Sensitivity" slider to the "Lowest" position.

Posted: Thu Sep 30, 2004 6:44 am
by SFCurley
Actually, THAT is what I do, too. I said threshold but should have said sensitivity.

Posted: Wed Oct 13, 2004 5:46 am
by SFCurley
Added Filter 4 and modified Filter 5 -- FYI for anyone who is interested in doing something similar.

Posted: Thu Oct 13, 2005 6:04 am
by mrQQ
where do i get these filters?

Posted: Thu Oct 13, 2005 6:54 am
by Eric
mrQQ wrote: where do i get these filters?
Have a look here for Hogyt's script.

Posted: Thu Oct 13, 2005 7:21 am
by mrQQ
Yeah, got it. I have a problem with it though: it seems that it takes the first Received: header, which is usually my own mail server.
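One possible way around this, sketched in Python rather than in Hogyt's script (which is not reproduced in this thread): walk the Received: headers from the top and skip any IP that belongs to a relay you trust, so that the first untrusted IP is the one checked against the blacklist. The function names, the `trusted_ips` parameter, and the `dnsbl.example` zone in the usage note are assumptions for illustration. The reversed-octet query name is the usual DNS blacklist convention.

```python
import ipaddress
import re
from email import message_from_string

# Matches a bracketed IPv4 address such as [203.0.113.5] in a Received: header.
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def originating_ip(raw_message, trusted_ips=frozenset()):
    """Return the first IP in the Received: chain that is not a trusted relay."""
    msg = message_from_string(raw_message)
    # Headers are returned top to bottom, i.e. most recent hop first.
    for header in msg.get_all("Received", []):
        for candidate in IP_RE.findall(header):
            try:
                ipaddress.ip_address(candidate)  # reject things like 999.1.1.1
            except ValueError:
                continue
            if candidate in trusted_ips:
                continue  # our own mail server or ISP relay: keep looking
            return candidate
    return None


def dnsbl_query_name(ip, zone):
    """Build the name to look up in a DNS blacklist: reversed octets + zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone
```

With the mail server's address listed in `trusted_ips`, `originating_ip` returns the next hop instead, and `dnsbl_query_name(ip, "dnsbl.example")` gives the hostname a blacklist lookup would resolve. Headers further down the chain can be forged by the sender, so only the hop recorded by a server you trust is really reliable.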

Posted: Thu Oct 27, 2005 1:52 pm
by Mitch Wagner
Neat stuff! Thanks, SFCurley (and everyone else who's participated in this thread).

SFCurley, it appears that your technique works this way:

1) Exempt all mail from mailing lists, known senders, allowed domains and senders, and anybody who uses the codeword.

2) Everybody else gets put through the spam tests.

3) If
3a) The Bayesian filter thinks the message is spam or
3b) The DNSBL thinks it's spam
THEN the message is tagged as spam.

Is that basically it?

Your earlier message said you excluded a few filters--I'd be interested in hearing about them if you have the inclination.

Posted: Fri Oct 28, 2005 1:49 am
by SFCurley
Hi Mitch,

A couple of other things, one of which happens in the course of executing the standard PM filters. I whitelist or blacklist any email that has certain words by assigning either +999 or -999 to that word in the Message Body file. So, I use the eFax service for receiving all of my faxes by email, and there are some companies that send me junk faxes all the time. The sending fax number is contained in the header and message body, so I just assign that number a +999 junk score in the message body file.
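As a rough illustration of how such forced scores behave (this is a Python sketch, not the format of Poco's Message Body file, and `body_score` and the sample fax number are made up): every listed word found in the body contributes its score, so a +999 entry swamps anything else and a -999 entry does the opposite.

```python
def body_score(body, word_scores):
    """Sum the scores of every listed word or phrase that appears in the body.

    word_scores maps a word (or a number such as a fax number) to its score;
    +999 effectively blacklists and -999 effectively whitelists.
    """
    text = body.lower()
    return sum(score for word, score in word_scores.items() if word.lower() in text)
```

For example, `body_score("Fax received from 555-0100", {"555-0100": 999})` comes out at 999, which is far above the 100-point cutoff used in the filter chain.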

I also test for false positives in the Bayesian filters. Essentially, I want to know how accurate the Bayesian filters are, and I want them to be as well trained as they can be. So after the basic PM filters (including the Bayesian filters) have run, I check whether the junk score is greater than 0. If it is, and if the person is in my address book or the domain is on my whitelist, I mark the message in blue to let me know it might be a Bayesian false positive. That way, the Bayesian stats are accurate and I can train the filter.

Also, if the message has my codeword in the subject line, I highlight the message in red to let me know I need to add the sender to my address book.
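The two highlighting rules boil down to something like the following Python sketch. The `highlights` function and its parameters are hypothetical, and in Poco these would be ordinary filter actions rather than code.

```python
def highlights(junk_score, sender_whitelisted, has_code_word):
    """Return the colour marks a message would get under the two rules."""
    marks = []
    # Possible Bayesian false positive: scored as junk but sender is trusted.
    if junk_score > 0 and sender_whitelisted:
        marks.append("blue")
    # Code word present: reminder to add the sender to the address book.
    if has_code_word:
        marks.append("red")
    return marks
```

Blue messages are the ones worth feeding back into the Bayesian training, since the filter scored them as junk even though the sender is trusted.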

That's pretty much it. Let me know if you have any other questions.

Posted: Fri Oct 28, 2005 1:15 pm
by Mitch Wagner
Thanks, SFCurley.

You may be interested in this: For a long time, I was a user of the Bayesian spam filter POPfile.

Like PocoMail's junk filter, POPfile allows you to whitelist addresses, subject lines, etc. For a while, I did what you do--I let the Bayesian spam filter do its work, whitelisted afterwards, and then color-coded whitelisted mail that had been tagged as spam so that I could use those messages to train POPfile.

But after a while I decided it wasn't worth the trouble. I didn't do any rigorous statistical study of it, but it appeared that if I was getting any gain in accuracy from my work, it was only one to three tenths of a percentage point--two or three messages out of every thousand. So then I just whitelisted known ham first, and then did the Bayesian thing on the rest.

I love POPfile and am finding PocoMail's built-in spam filters inferior. I'm about to start a topic to discuss that and get recommendations from readers.