Poco Forums • View topic - Poco's Bayesian Filters

Posted: Fri Aug 27, 2004 6:53 am
by vamp07
I looked at my junk ini, which at this point has 7993 words. Of those, 6248 occur 2 or fewer times. This means that 78% of the words in that file are not being used to determine if a piece of mail is junk. If POPFile does not insist on words occurring some minimum number of times, I bet this is the culprit for the long training time and for the dictionary needing to get so big.

Is there any variable in the poco.ini file I can change to tell it to start using words if they only occur 1 time?
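
The arithmetic in the post above can be checked with a short sketch. The two totals are the ones quoted in the post; the `rare_fraction` helper and the idea of holding the dictionary as a plain word-to-count mapping are hypothetical and say nothing about Poco's actual file format.

```python
def rare_fraction(word_counts, max_count=2):
    """Return the share of words whose count is at most max_count."""
    if not word_counts:
        return 0.0
    rare = sum(1 for count in word_counts.values() if count <= max_count)
    return rare / len(word_counts)


# Figures quoted in the post: 6248 of 7993 words occur 2 or fewer times.
quoted_share = 6248 / 7993
print(f"{quoted_share:.1%}")  # prints 78.2%
```

Run against a real word-to-count mapping, `rare_fraction` would give the same kind of figure for any cutoff, which makes it easy to see how much of a dictionary a minimum-count rule would leave unused.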

Posted: Fri Aug 27, 2004 7:16 am
by SFCurley
I think that w/ good mail bias set to "3", words with a count of "2" will be included because 2 x 3 >= 5. Words with a count of 1 would be included if you had your good mail bias set to 5, since 1 x 5 >= 5. I did have a correspondence w/ Slaven (which I can't find) and think this is how it works re: whether or not words are included. Maybe somebody at PSI could confirm. There is no setting other than this that I am aware of that would affect whether a word has to occur once or more than once to be considered.
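
The rule recalled above can be written as a minimal sketch. It encodes only this post's recollection (count times bias must reach 5); the function name and the constant are hypothetical, the rule is unconfirmed, and whether the bias applies to words in the junk dictionary at all is an open question.

```python
MIN_WEIGHTED_COUNT = 5  # threshold as recalled in the post, not confirmed


def is_word_included(count, good_mail_bias):
    """Hypothetical rule: a word is used once count * bias reaches the threshold."""
    return count * good_mail_bias >= MIN_WEIGHTED_COUNT


print(is_word_included(2, 3))  # True: 2 * 3 = 6
print(is_word_included(1, 3))  # False: 1 * 3 = 3
print(is_word_included(1, 5))  # True: 1 * 5 = 5
```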

Finally, one other point of difference b/w Poco and POPFile . . . Poco evaluates only the "top 30" probability words in an email based on their appearance in the corpus: POPFile includes all words. Hard to say how much of a difference this would make.

Posted: Fri Aug 27, 2004 7:24 am
by vamp07
I would think that good bias affects how it treats good words, not bad words. If it works the way you say it might be something to try. I wish we could get somebody in the know to participate in this thread.

Posted: Fri Aug 27, 2004 11:39 am
by Pete
The problem that I have is that I receive just enough spam to be annoying but not enough to effectively train PocoMail. I've been using PocoMail's BF for more than two months now. I have about 3,000 good words and 18,000 bad words.

On a daily basis, I receive about ten spams. Of those, about three of them automatically go into the Junk folder, but I have to manually classify and move the other seven. So for me, at this point, it would be easier and simpler to just manually delete the spam in my In box than to use PocoMail's Bayesian Filter. I hope that future versions of PocoMail's BF will be more useful.

On the plus side, PocoMail has not flagged very many false positives.

Posted: Fri Aug 27, 2004 2:47 pm
by SFCurley
Yeah, that's kind of like teaching a kid to read by sending them to school one day a week . . . you'll eventually get there, but it might be just in time for the prom. Clearly you've been too parsimonious with your email address!

Posted: Fri Aug 27, 2004 3:47 pm
by tribble
Pete, I'd be happy to forward you all of my spam. You could train Poco in no time at all :-)

Posted: Wed Sep 01, 2004 11:28 pm
by vamp07
How many tokens does PocoMail use to decide if something is spam? Is it the top 10? I think that is what POPFile uses. If it uses more, maybe this would explain why the dictionaries need to get so big?

Posted: Thu Sep 02, 2004 2:23 am
by SFCurley
Top 30 as I recall, which I THINK means the 30 with probabilities closest to 0 or 1 in an absolute sense.
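
To make "closest to 0 or 1" concrete, here is a rough sketch that ranks tokens by how far their spam probability sits from the neutral 0.5 and keeps the top N. The function names and the sample probabilities are made up for illustration. The combining step is the usual naive Bayes product rule, which is an assumption here; nothing in this thread confirms how Poco actually combines the tokens.

```python
def most_extreme_tokens(token_probs, n=30):
    """Pick the n tokens whose spam probability is farthest from 0.5."""
    ranked = sorted(
        token_probs.items(),
        key=lambda item: abs(item[1] - 0.5),
        reverse=True,
    )
    return ranked[:n]


def combined_spam_probability(probs):
    """Combine per-token probabilities with the naive Bayes product rule."""
    spam = 1.0
    ham = 1.0
    for p in probs:
        spam *= p
        ham *= 1.0 - p
    if spam + ham == 0:
        return 0.5  # no usable evidence either way
    return spam / (spam + ham)


# Made-up per-token spam probabilities for one message.
sample = {"viagra": 0.99, "meeting": 0.05, "the": 0.5, "free": 0.9, "invoice": 0.4}
top = most_extreme_tokens(sample, n=3)
score = combined_spam_probability([p for _, p in top])
print([word for word, _ in top], round(score, 3))
```

With only a handful of tokens deciding the outcome, a neutral word like "the" never makes the cut, so a smaller N leans harder on the few strongest words in the dictionary.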