Poco Forums • View topic - Junk mail question -- in simple terms for me

Junk mail question -- in simple terms for me

Discussion on Bayesian and standard junk mail filters

Moderators: Eric, Tomas, robin, Michael

Postby speerga » Fri Sep 17, 2004 1:01 pm

Hi,

I am trying for about the fourth or fifth time to get Poco's Bayesian filtering to work for me.

I've read, re-read, re-read and RE-READ all the threads on junk mail filtering, Bayesian filter training, etc. :lol:

I'm not the cleverest person around these parts by far, so I probably am missing something pretty obvious.

But here's my basic question. And I DON'T mean to be cynical or sarcastic with this. But here goes: Why, in very simple terms, does PocoMail's Bayesian filter take so INCREDIBLY long to train, and seem so very slow at really learning, when I was able to get 1) PopFile, 2) K9, and 3) Thunderbird Bayesian filters trained far better than Poco's and in far, far less time?

I'm sorry. I truly don't mean to sound like I'm trolling or flaming, or anything else. I'm just extremely frustrated about it all.

I generally use K9. But from time to time I convince myself having PocoMail's builtin Bayesian filtering work would be more elegant. And indeed, it runs noticeably faster than running Poco with K9.

Yet it took me less than 200 or so messages to train K9 to something over 90 percent accuracy.

I started using Thunderbird just today and within 200 or 300 messages, it already seems to be getting over 75 percent.

I have used I don't know how many hundred messages in Poco, with a current good word count of about 13,750 and bad word count of about 18,900 -- and it's limping along at around 40-50 percent.

Ah, well. My apologies if I've gotten carried away here. I just simply don't understand why Poco's Bayesian filtering is that lame compared to the others? :shock:

Gary Speer
speerga
Resident Poster
 
Posts: 116
Joined: Wed Jul 28, 2004 1:49 pm
Location: Springfield, Missouri

Postby robin » Fri Sep 17, 2004 10:24 pm

Presumably the "% accuracy" figures that you are quoting are those reported by the respective applications. Can you instead monitor the number of false negatives and false positives that you are getting - a pain I know but that is a better measure of how effective the respective systems are. i.e. messages downloaded, messages wrongly identified as spam, messages wrongly identified as good, total number of spam messages.
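To make that bookkeeping concrete, here is a minimal sketch in Python of how the four tallies could be turned into comparable figures. The names (`FilterTally`, `summarize`) and the sample numbers are made up for illustration; they do not come from Poco, K9, POPFile, or Thunderbird.

```python
from dataclasses import dataclass


@dataclass
class FilterTally:
    """Hand-kept counts for one filter over one test period (hypothetical structure)."""
    downloaded: int       # all messages downloaded
    total_spam: int       # messages that really were spam
    false_positives: int  # good messages wrongly identified as spam
    false_negatives: int  # spam messages wrongly identified as good


def summarize(t: FilterTally) -> dict:
    """Turn raw counts into rates that can be compared across filters."""
    total_good = t.downloaded - t.total_spam
    correct = t.downloaded - t.false_positives - t.false_negatives
    return {
        # Share of all messages that were classified correctly.
        "accuracy": correct / t.downloaded if t.downloaded else 0.0,
        # Share of good mail that was lost to the junk folder.
        "false_positive_rate": t.false_positives / total_good if total_good else 0.0,
        # Share of spam that slipped through to the inbox.
        "false_negative_rate": t.false_negatives / t.total_spam if t.total_spam else 0.0,
    }


# Illustrative numbers only, not measurements from any real filter.
example = FilterTally(downloaded=200, total_spam=120,
                      false_positives=2, false_negatives=30)
report = summarize(example)
print(report)
```

Keeping the two error rates separate matters because a single "accuracy" percentage hides which kind of mistake the filter is making: losing a good message is usually far more costly than letting one spam through.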
robin
 

Postby Guest » Sat Sep 18, 2004 5:46 am

robin wrote: Presumably the "% accuracy" figures that you are quoting are those reported by the respective applications. Can you instead monitor the number of false negatives and false positives that you are getting - a pain I know but that is a better measure of how effective the respective systems are. i.e. messages downloaded, messages wrongly identified as spam, messages wrongly identified as good, total number of spam messages.


Yeah, well. I suppose I just got carried away. My whole point was this: Straight out of the box, K9 and even PopFile were trained and doing a terrific job within a couple of weeks. Poco's Bayesian filter will go day after day repeatedly plopping almost identical spam into my mailbox without seeming to learn a thing.

I really don't think I should have to get into a bunch of testing and refining to make it work, should I? As an "end user," shouldn't a feature be reasonably workable or work reasonably well off the shelf? :D

Gary Speer
Guest
 

Postby SFCurley » Sat Sep 18, 2004 7:32 am

As somebody who IS now getting very good results from Poco, and spent A LOT of time on the Bayesian issue, AND played extensively with POPFile, I do have to concur with speerga's comment. . . POPFile, which comes with no pre-defined corpus, does learn incredibly fast and does not appear to be terribly user-sensitive. Poco's BF -- by contrast -- does have a much wider range of user experience it would seem and is much slower on the uptake.

I have a few guesses about internal workings, POPFile's inclusion of what are called pseudo-tokens, etc., but these are all just speculative guesses.

Bottom line: I think speerga's is a valid question.
SFCurley
 

