Poco Forums • View topic - Suggestion to improve filtering

Discussion on Bayesian and standard junk mail filters

Postby djgtram » Wed Apr 23, 2008 8:00 am

Since I upgraded to PM 4.5.0.3910, I have noticed a decrease in filtering precision. Although most spam was still caught, some unquestionable spam messages (with plain, unmasked spam words in the subject, for instance) were passed as legitimate. Even stranger, checking these messages with Junk Mail Filter > Apply and Test scored them as totally acceptable, although it did mark the banned spam word in the subject.

After some poking around, I found that the contents of these messages were immediately added to DBGood.ini as legitimate words, so PM misled itself into treating them as good mail. The workaround I found was to fill this file with 1001 lines of a single unimportant entry such as aaa=1 (PM expects to see at least a thousand words before it activates the learning filters) and set it to read-only so that Poco cannot modify it. Not being able to collect good words doesn't seem to harm filtering precision, and it seems to do away with these problematic cases.
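To illustrate why learning a spam message's words as "good" can make it look legitimate, here is a toy Bayesian scorer. It is only a sketch: Poco's actual scoring formula is not described in this post, so this uses a simplified combination loosely modelled on Paul Graham's "A Plan for Spam" (per-token probabilities clamped to 0.01..0.99, 0.4 for unseen tokens). The corpora, counts and function names are all hypothetical.

```python
import math
from collections import Counter

UNSEEN_PROB = 0.4  # default for tokens never seen in either corpus (Graham-style)

def token_spam_prob(token, good, bad, n_good, n_bad):
    """Spam probability for one token from its frequency in each corpus, clamped to [0.01, 0.99]."""
    g = good[token] / n_good if n_good else 0.0
    b = bad[token] / n_bad if n_bad else 0.0
    if g + b == 0:
        return UNSEEN_PROB
    return min(0.99, max(0.01, b / (g + b)))

def message_score(tokens, good, bad, n_good, n_bad):
    """Combine token probabilities: P = prod(p) / (prod(p) + prod(1 - p)), computed in log space."""
    probs = [token_spam_prob(t, good, bad, n_good, n_bad) for t in set(tokens)]
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

if __name__ == "__main__":
    # Hypothetical corpora: counts are the number of messages containing each token.
    good = Counter({"meeting": 40, "report": 30, "lunch": 20})
    bad = Counter({"viagra": 50, "cheap": 40, "offer": 30})
    n_good, n_bad = 100, 100
    spam = ["cheap", "viagra", "offer", "today"]

    print("before:", round(message_score(spam, good, bad, n_good, n_bad), 4))
    # Suppose 50 similar spams slip through and each one is auto-learned as good mail.
    for _ in range(50):
        good.update(set(spam))
        n_good += 1
    print("after: ", round(message_score(spam, good, bad, n_good, n_bad), 4))
```

Each learned copy raises the "good" frequency of the very words that identified the spam, so the same message ends up scoring as legitimate: a self-reinforcing loop like the one described above.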

I can't post real statistics yet, but my accuracy has dropped to 97.12%; I'll report back if it climbs back to the higher values it used to reach. I'd also be interested to hear about your experiences if some of you are ready to try this out (backing up the file before overwriting it makes it easy to restore the previous functionality).
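For anyone who wants to try the workaround with an easy way back, here is a minimal sketch that backs up the file, fills it with 1001 copies of aaa=1 and marks it read-only, plus a restore step. It assumes DBGood.ini is a plain list of word=count lines with no section header (inferred from the aaa=1 example, not confirmed), and the function names and the .bak naming are hypothetical. Where PM keeps DBGood.ini depends on your installation, so pass the path in yourself.

```python
import shutil
import stat
import tempfile
from pathlib import Path

PLACEHOLDER = "aaa=1"  # any unimportant word=count entry, as in the post
LINE_COUNT = 1001      # just over the ~1000 words PM reportedly needs before learning activates

def neutralize_good_db(db: Path) -> Path:
    """Back up the good-word file, fill it with one dummy entry, and make it read-only."""
    backup = db.with_name(db.name + ".bak")
    if db.exists():
        db.chmod(db.stat().st_mode | stat.S_IWRITE)  # undo read-only left by an earlier run
        if not backup.exists():                       # never overwrite the first real backup
            shutil.copy2(db, backup)
    db.write_text("\n".join([PLACEHOLDER] * LINE_COUNT) + "\n", encoding="ascii")
    db.chmod(stat.S_IREAD)  # on Windows this sets the file's read-only attribute
    return backup

def restore_good_db(db: Path, backup: Path) -> None:
    """Copy the backup over the dummy file so PM can learn good words again."""
    if db.exists():
        db.chmod(stat.S_IREAD | stat.S_IWRITE)  # make it writable before overwriting
    shutil.copy2(backup, db)

if __name__ == "__main__":
    # Demo on a throwaway copy rather than a real PM data folder.
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "DBGood.ini"
        db.write_text("hello=3\nworld=2\n", encoding="ascii")
        bak = neutralize_good_db(db)
        print(len(db.read_text(encoding="ascii").splitlines()), "lines; backup at", bak.name)
        restore_good_db(db, bak)
```

Running it while PM is closed is probably safest, on the assumption that a running PM could rewrite the file from memory.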

Actually, I've been using a variation of this technique for many years, with a custom-made utility that strips down DBGood.ini when PM quits. But as I keep the program running all day long, this helped but was not a perfect solution. I can't say for sure without first hearing about your experiences, but based on my earlier and current experiments, I tend to think that many of the problems mentioned in these topics about Poco's junk mail filtering are attributable to the handling of good words. With this feature removed, or at least switched off, I would expect a drastic improvement.

Bye,
Gábor
djgtram
Posts: 100
Joined: Thu Oct 20, 2005 4:51 am
Location: Hungary

Return to Junk Mail Filtering Help and How-To