Poco Forums • View topic - Advanced Multi-level filtering.

Advanced Multi-level filtering.

Discussion on Bayesian and standard junk mail filters

Moderators: Eric, Tomas, robin, Michael

Advanced Multi-level filtering.

Postby SFCurley » Wed Sep 29, 2004 2:40 am

I thought I'd share what has turned out to be a pretty effective spam filtering approach for me, and one that makes good use of a lot of Poco's features. It's a combination of approaches (whitelisting, bayesian filtering, and then challenge-response) and results in having to pay very little attention to the whole junkmail issue and inconveniencing very few unknown mail senders.

I've created a series of filters that flow as follows:

Filter 1: Sort all newsletters into a newsletters folder to read

Filter 2: Run Poco's Bayesian filter (but have the junk score threshold set very high so that the message never gets moved based on the score). Add 100 to the junk score if Bayesian positive.

Filter 3: Stop processing if sender is known or domain is on the "approved domains" list (using the %addressbooks% and %exceptsenders% filter feature)

** ADDED ** Filter 4: Run the dns blacklist filter Hogyt put together. Add 100 to junk score if originating IP is black-listed.

Filter 5: Stop processing if Junk score less than 100. This means that it was Bayesian Negative and not DNS blacklisted.

Filter 6: Stop processing if the subject has a special code word

Filter 7: Otherwise, assume it's junk, and send an automated reply to the sender that says "Don't know who you are. If you're a real person, re-send your message with a code word in the subject line so I know you're not a spammer", and move it to the junk folder (or a quarantine folder). I send this from a different "postmaster" account that I've set up at Hotmail so that the spammers don't get confirmation of my real email address.
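The seven filters above can be sketched as a single decision function. This is only an illustration in Python, not PocoMail's filter syntax: the `Message` fields, the `classify` function, and the returned folder names are hypothetical, and the Bayesian and DNS blacklist results are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass

JUNK_BONUS = 100  # points added by each positive spam test, as in the post


@dataclass
class Message:
    sender: str
    subject: str
    is_newsletter: bool = False
    bayes_positive: bool = False  # assumed output of the Bayesian filter
    dnsbl_listed: bool = False    # assumed output of the DNS blacklist check
    junk_score: int = 0


def classify(msg, known_senders, approved_domains, code_word):
    """Return the (hypothetical) destination chosen by the filter chain."""
    # Filter 1: newsletters go to their own folder.
    if msg.is_newsletter:
        return "newsletters"
    # Filter 2: a Bayesian hit only raises the score; it never moves the mail.
    if msg.bayes_positive:
        msg.junk_score += JUNK_BONUS
    # Filter 3: known senders and approved domains stop processing here.
    domain = msg.sender.rsplit("@", 1)[-1].lower()
    if msg.sender.lower() in known_senders or domain in approved_domains:
        return "inbox"
    # Filter 4: a DNS blacklist hit raises the score as well.
    if msg.dnsbl_listed:
        msg.junk_score += JUNK_BONUS
    # Filter 5: below 100 means Bayesian negative and not blacklisted.
    if msg.junk_score < JUNK_BONUS:
        return "inbox"
    # Filter 6: the code word in the subject lets the message through.
    if code_word.lower() in msg.subject.lower():
        return "inbox"
    # Filter 7: everything else gets the challenge reply and goes to junk.
    return "challenge_and_junk"
```

The order matters: because the whitelist check runs before the blacklist lookup and the challenge step, a known sender never receives the automated reply even when the Bayesian filter flags the message.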

Because of the whitelisting and Bayesian filter steps, this approach means that fewer than 1 in 1,000 real senders get a bounce-back message, but almost all the spammers do, and I never hear from them again.

I have a few other steps I include, which I won't go into detail about, but this is the crux of the approach. Overall, it's been very effective for me and I thought I'd post it for anyone else who might find it of interest.
Last edited by SFCurley on Wed Oct 13, 2004 5:45 am, edited 1 time in total.
SFCurley
 

Postby Guest » Wed Sep 29, 2004 10:33 am

I will give this a go too! Thanks.
Guest
 

Re: Advanced Multi-level filtering.

Postby Pete » Thu Sep 30, 2004 6:38 am

SFCurley wrote: Filter 2: Run poco's bayesian filter (but have the junk score threshold set very high so that the message never gets moved based on the score).

Just FYI, I achieve this differently. I set the "Custom Sensitivity" slider to the "Lowest" position.
Pete
 

Postby SFCurley » Thu Sep 30, 2004 6:44 am

Actually, THAT is what I do, too. I said threshold but should have said sensitivity.
SFCurley
 

Postby SFCurley » Wed Oct 13, 2004 5:46 am

Added Filter 4 and modified Filter 5 -- FYI for anyone who is interested in doing something similar.
SFCurley
 

Postby mrQQ » Thu Oct 13, 2005 6:04 am

where do i get these filters?
mrQQ
Frequent Visitor
 
Posts: 66
Joined: Wed Feb 09, 2005 6:03 am

Postby Eric » Thu Oct 13, 2005 6:54 am

mrQQ wrote:where do i get these filters?
Have a look here for Hogyt's script. :wink:
Eric
 

Postby mrQQ » Thu Oct 13, 2005 7:21 am

Yeah, got it.. have a problem with it though - it seems that it takes first Received: header, which is usually my mailserver :?: :(
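One possible way around this, sketched in Python rather than in Poco's script language (so it is not a patch for Hogyt's script): Received headers are stacked newest first, so the top one is normally written by your own provider. Walking down the list and skipping addresses you trust gives the first outside IP, which is the one to look up. The function names, the bracketed-IP pattern, and the `dnsbl.example` zone below are assumptions for illustration only.

```python
import re
from email import message_from_string

# Assumes the relay IP appears in square brackets, a common Received format.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def first_untrusted_ip(raw_message, trusted_ips):
    """Return the first Received-header IP that is not one of our own relays."""
    msg = message_from_string(raw_message)
    # get_all returns the headers top to bottom, i.e. newest hop first.
    for header in msg.get_all("Received", []):
        match = IPV4_IN_BRACKETS.search(header)
        if match and match.group(1) not in trusted_ips:
            return match.group(1)
    return None


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone
```

For a message relayed through a trusted server at 10.0.0.5 from 203.0.113.7, the lookup name would be built from 203.0.113.7, not from the trusted relay. A blacklist normally signals a listing by resolving that name to an address.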
mrQQ
Frequent Visitor
 
Posts: 66
Joined: Wed Feb 09, 2005 6:03 am

Postby Mitch Wagner » Thu Oct 27, 2005 1:52 pm

Neat stuff! Thanks, SFCurley (and everyone else who's participated in this thread).

SFCurley, it appears that your technique works this way:

1) Exempt all mail from mailing lists, known senders, allowed domains and senders, and anybody who uses the codeword.

2) Everybody else gets put through the spam tests.

3) If
3a) The Bayesian filter thinks the message is spam or
3b) The DNSBL thinks it's spam
THEN the message is tagged as spam.

Is that basically it?

Your earlier message said you excluded a few filters--I'd be interested in hearing about them if you have the inclination.
Mitch Wagner
Poco Tourist
 
Posts: 20
Joined: Tue Sep 14, 2004 6:51 am

Postby SFCurley » Fri Oct 28, 2005 1:49 am

Hi Mitch,

A couple of other things, one of which happens in the course of executing the standard PM filters. I whitelist or blacklist any email that has certain words by assigning either +999 or -999 to that word in the Message Body file. So, I use the eFax service for receiving all of my faxes by email, and there are some companies that send me junk faxes all the time. The sending fax number is contained in the header and message body, so I just assign that number a +999 junk score in the message body file.

I also test for false positives in the bayesian filters. Essentially, I want to know how accurate the bayesian filters are and I want them to be as well-trained as they can be, so what I do is to check after the basic PM filters (including the bayesian filters) to see if the junk score is > 0. If it is and if the person is in my address book or the domain is on my whitelist, I mark the message in blue to let me know it might be a bayesian false positive. That way, the bayesian stats are accurate and I can train the filter.

Also, if the message has my codeword in the subject line, I highlight the message in red to let me know I need to add to my address book.
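As a rough sketch of the two ideas above (word overrides and color flags), here is how the logic might look in Python. This is not how PocoMail's Message Body file is actually read; the `WORD_SCORES` entries, the function names, and the choice to check the code word before the false-positive test are assumptions made for the example.

```python
# Hypothetical overrides: +999 forces junk, -999 forces not-junk.
WORD_SCORES = {"555-0100": 999, "project-falcon": -999}


def apply_word_scores(body, word_scores):
    """Sum the override scores of every listed term found in the body."""
    text = body.lower()
    return sum(score for word, score in word_scores.items() if word.lower() in text)


def color_flag(junk_score, sender_known, subject, code_word):
    """Pick a highlight color for the message, or None for no highlight."""
    # Red: the sender used the code word and should be added to the address book.
    if code_word.lower() in subject.lower():
        return "red"
    # Blue: a known sender scored as junk, so this may be a Bayesian false positive.
    if junk_score > 0 and sender_known:
        return "blue"
    return None
```

The blue flag never changes where the message ends up; it only marks candidates for retraining the Bayesian filter.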

That's pretty much it. Let me know if you have any other questions.
SFCurley
 

Postby Mitch Wagner » Fri Oct 28, 2005 1:15 pm

Thanks, SFCurley.

You may be interested in this: For a long time, I was a user of the Bayesian spam filter POPfile.

Like PocoMail's junk filter, POPfile allows you to whitelist addresses, subject lines, etc. For a while, I did what you do--I let the Bayesian spam filter do its work, whitelisted afterwards, and then color-coded whitelisted mail that had been tagged as spam so that I could use those messages to train POPfile.

But after a while I decided it wasn't worth the trouble. I didn't do any rigorous statistical study of it, but it appeared that if I was getting any gain in accuracy from my work, it was only one to three tenths of a percentage point, or two or three messages out of every thousand. So I switched to whitelisting known ham first and then running the Bayesian filter on the rest.

I love POPfile and am finding PocoMail's built-in spam filters inferior. I'm about to start a topic to discuss that and get recommendations from readers.
Mitch Wagner
Poco Tourist
 
Posts: 20
Joined: Tue Sep 14, 2004 6:51 am

