Poco Forums • View topic - Suggestion to improve filtering

Suggestion to improve filtering

Posted: Wed Apr 23, 2008 8:00 am
by djgtram
Since upgrading to PM 4.5.0.3910, I have noticed a drop in filtering precision. Most spam is still caught, but some unquestionable spam messages (with plain, unmasked spam words in the subject, for instance) are passed as legitimate. Even stranger, checking these messages with Junk Mail Filter > Apply and Test scores them as totally acceptable, even though the test does flag the banned spam word in the subject.

After some poking around, I saw that the contents of these messages were immediately added to DBGood.ini as legitimate words, so PM misled itself into thinking they were all right. The workaround I found was to fill this file with 1001 lines of a single unimportant entry such as aaa=1 (PM expects to see at least a thousand words before it activates the learning filters) and to set it to read-only so that Poco cannot modify it. Not being able to collect good words doesn't seem to harm the filtering precision, and it seems to do away with these problematic cases.
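
For anyone who would rather script this than edit the file by hand, here is a minimal sketch in Python of the steps described above. The function name, the .bak backup name and the path you pass in are my own illustrative choices, not anything PM provides, and I am assuming the file needs nothing beyond the plain aaa=1 lines (no section header). Close PM before running it.

```python
import os
import shutil
import stat
from pathlib import Path

PLACEHOLDER_ENTRY = "aaa=1"  # the unimportant entry suggested in the post
LINE_COUNT = 1001            # just over the thousand-word threshold


def freeze_good_words(db_path, backup=True):
    """Overwrite db_path with placeholder lines and mark it read-only.

    Hypothetical helper: db_path is wherever your DBGood.ini lives.
    """
    db_path = Path(db_path)
    if db_path.exists():
        backup_path = db_path.with_name(db_path.name + ".bak")
        # Keep only the first backup so a second run cannot replace the
        # real good-word list with the placeholder file.
        if backup and not backup_path.exists():
            shutil.copyfile(db_path, backup_path)
        # Clear a read-only flag left by an earlier run so the write works.
        os.chmod(db_path, stat.S_IREAD | stat.S_IWRITE)
    db_path.write_text((PLACEHOLDER_ENTRY + "\n") * LINE_COUNT, encoding="ascii")
    # Owner read-only; on Windows this sets the read-only attribute.
    os.chmod(db_path, stat.S_IREAD)
    return db_path
```

Call it as freeze_good_words(r"path\to\DBGood.ini") with your own mail folder path. To undo it, clear the read-only flag and copy DBGood.ini.bak back over the file.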

I can't post real statistical data yet, but my accuracy has dropped to 97.12%, and I'll report back if it climbs back to the higher values it had before. I'd also be interested to hear your experiences if some of you are willing to try it out (simply backing up the file before overwriting it makes it easy to restore the previous functionality).

Actually, I've been using a variation of this technique for many years now, with a custom-made utility that strips down DBGood.ini when PM quits. But as I keep the program running all day long, this was a help but not a perfect solution. I can't really say without first hearing about your experiences, but based on my earlier and current experiments, I tend to think that many of the problems mentioned in these topics about Poco's junk mail filtering are attributable to the handling of good words. With this feature removed or at least switched off, I would expect a drastic improvement.

Bye,
Gábor