Poco Forums • View topic - Message not considered junk at any sensitivity

Message not considered junk at any sensitivity

Discussion on Bayesian and standard junk mail filters

Moderators: Eric, Tomas, robin, Michael

Postby imtrobin » Tue Oct 04, 2005 8:25 am

Hi

I have a couple of mails which keep getting through the Bayesian filters. They all look very similar and link to the same site. I use Strict Bayesian at the highest sensitivity. When I run an apply test, the first score is -100, which means the message is not considered junk.

Image

How do I solve this?

Robin
imtrobin
Frequent Visitor
 
Posts: 74
Joined: Mon Aug 02, 2004 3:07 pm

Postby SFCurley » Tue Oct 04, 2005 12:23 pm

Hi, I don't have an answer to your question, but the Bayesian stats showing in your post struck me as low and I thought I'd comment. They were shown as 80% accuracy with the threshold set at 90% and with 330,000 junk words (wow!).

Over the last 18,000 messages, my stats are: 99.67%, with 0.42% false positives.

Settings are: 44,000 junk words; 31,000 junk words;
Bias of 3 and Threshold set at 0.99.

I'm just wondering if you've overtrained on junk words and maybe have the threshold set too low.

Again, I hope not to offend, but thought I'd share my settings just to see if you could get better results out of PM's Bayesian filters.
SFCurley
 

Postby imtrobin » Tue Oct 04, 2005 4:12 pm

Why would I be offended? :)

I use the default "Strict Bayesian" settings. I'm quite surprised by how much junk gets through, despite hearing of people getting 99% success with Bayesian. I receive over a hundred spam mails every day.

I will give your settings a go, thanks!
imtrobin
Frequent Visitor
 
Posts: 74
Joined: Mon Aug 02, 2004 3:07 pm

Postby Michael » Tue Oct 04, 2005 4:54 pm

Given your score of -100 for Bayesian filtering, that result will trump all others. I know you say you are using strict bayesian, but that doesn't mean the other filter options are not used (see the quote from the help below). The headers you posted show a few other non-Bayesian filters being used, but their influence is not great enough to have any effect on a message that the BF filters consider to be good.

Under your settings, a message that the BF filters consider good will be given a credit of -100 points. To be deemed junk, a message would need a total score somewhere between +10 and +20. Three other junk mail rules contributed to the final score of -99:
  1. There is no X-Mailer header - score of +3
  2. The message is addressed to someone in one of your address books - score of -4
  3. The message is not from someone in one of your address books - score of +2.
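
To make the arithmetic concrete, here is a rough sketch of how the rule scores add up. It is not PocoMail's actual code. The rule names are made up for illustration, the junk threshold of +10 is an assumption (all we know is that it sits somewhere between +10 and +20), and the third rule is taken as +2 because that is the value that makes the total come out to the quoted -99.

```python
# Hypothetical rule names. Scores are the ones quoted in this thread, except
# the address-book sender rule, taken as +2 (assumption) so that the total
# matches the quoted final score of -99.
RULE_SCORES = {
    "bayesian_says_good": -100,
    "no_x_mailer_header": +3,
    "recipient_in_address_book": -4,
    "sender_not_in_address_book": +2,
}

# Assumption: the real cutoff is only known to lie between +10 and +20.
JUNK_THRESHOLD = 10


def total_score(scores):
    """Add up the points contributed by every rule that fired."""
    return sum(scores.values())


def is_junk(scores, threshold=JUNK_THRESHOLD):
    """A message is junk only if its total reaches the threshold."""
    return total_score(scores) >= threshold


print(total_score(RULE_SCORES))  # -99
print(is_junk(RULE_SCORES))      # False
```

With a -100 credit on the table, the other rules would have to contribute well over +100 between them before the message could be flagged, which is why they have no visible effect here.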


Note as well, from the help file:
you should try to keep sizes of each good and junk corpus about the same, as it also helps with overall accuracy.

and
Strict Bayesian button will increase the importance Bayesian filter has in the junk mail filtering process
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

Postby imtrobin » Wed Oct 05, 2005 5:58 am

So you are recommending that I don't use strict bayesian? Why would the Bayesian filter be so inaccurate? I tried setting the bias to 3 and the junk threshold to 99, but quite a lot of spam is still getting through.
imtrobin
Frequent Visitor
 
Posts: 74
Joined: Mon Aug 02, 2004 3:07 pm

Postby SFCurley » Wed Oct 05, 2005 12:14 pm

I hate to offer this idea up, because I know it's a pain in the neck, but I might try deleting the dbspam and dbgood files and re-building the corpus over time with the new settings in place. My general approach is to only correct when it gets it wrong and thus avoid over-training.

Hope that helps.
SFCurley
 

Postby Michael » Wed Oct 05, 2005 12:49 pm

imtrobin wrote: So you are recommending that I don't use strict bayesian? Why would the Bayesian filter be so inaccurate? I tried setting the bias to 3 and the junk threshold to 99, but quite a lot of spam is still getting through.


I'm not saying not to use strict bayesian. It's just that this setting is often misunderstood, and people think it means that only the bayesian filters are firing. It does not mean that. It simply elevates the BF such that they are almost the sole determinant of whether or not a message is spam. Depending on your requirements this may or may not be desirable. For instance, some users may want to put other rules in place, and the elevation of the BF rule imposed by the strict bayesian setting would almost certainly negate any rule they add.

As to why your bayesian rule might be inaccurate, it's very difficult to say without analyzing your good and spam corpuses, and with 300,000 words in your spam corpus that is a monumental task. In general the recommendation is to train the BF filters only when they mistakenly classify a message.

This is one downside of BF: it is impossible for any of us to say why your filters are not performing as well as they should.
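
For anyone wondering what "train only on mistakes" looks like in practice, here is a toy sketch. It is a generic word-count naive Bayes filter, not PocoMail's algorithm, and the class name, method names and sample messages are all made up for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy word-count naive Bayes filter (illustration only)."""

    def __init__(self):
        self.words = {"junk": Counter(), "good": Counter()}
        self.messages = {"junk": 0, "good": 0}

    def train(self, text, label):
        """Add a message's words to the corpus for the given label."""
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def _log_score(self, tokens, label):
        counts = self.words[label]
        total = sum(counts.values())
        vocab = len(set(self.words["junk"]) | set(self.words["good"])) + 1
        # Smoothed class prior, then smoothed per-word likelihoods.
        score = math.log(
            (self.messages[label] + 1) / (sum(self.messages.values()) + 2)
        )
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + vocab))
        return score

    def classify(self, text):
        tokens = text.lower().split()
        junk = self._log_score(tokens, "junk")
        good = self._log_score(tokens, "good")
        return "junk" if junk > good else "good"

    def train_on_error(self, text, true_label):
        """Train only if the current verdict is wrong. Returns True if trained."""
        if self.classify(text) == true_label:
            return False
        self.train(text, true_label)
        return True


f = TinyBayes()
stream = [
    ("cheap pills buy now", "junk"),
    ("meeting notes attached", "good"),
    ("buy cheap pills today", "junk"),
    ("notes from the meeting", "good"),
]
# One flag per message: True where the filter was wrong and had to be trained.
trained = [f.train_on_error(text, label) for text, label in stream]
print(trained)
print(f.messages)
```

In this small run only the first message ends up in the corpus, because the filter already gets the other three right. That is the point of the recommendation: the corpus grows only where the filter was actually wrong, rather than with every message you see.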
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

Postby imtrobin » Wed Oct 05, 2005 3:31 pm

Actually that's what I did.

I didn't train unnecessarily. I trained as junk only the spam that got through, and that's why the corpus grew so big over time. Whenever there is a false positive, I train again.
imtrobin
Frequent Visitor
 
Posts: 74
Joined: Mon Aug 02, 2004 3:07 pm

Postby SFCurley » Thu Oct 06, 2005 2:19 am

I thought I read that there were some planned changes coming to the internal PM Bayesian logic. Does anyone know if this is true or not, and if so, what they were?
SFCurley
 

Postby Michael » Thu Oct 06, 2005 2:32 am

I believe the algorithm was tweaked to some extent but I cannot find a reference to this in either the release notes or the beta forum.
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

