Poco Forums • View topic - Poco's Bayesian Filters

Poco's Bayesian Filters

Discussion on Bayesian and standard junk mail filters

Moderators: Eric, Tomas, robin, Michael

Postby vamp07 » Fri Aug 27, 2004 6:53 am

I looked at my junk ini, which at this point has 7993 words. Of those, 6248 occur 2 or fewer times. This means that 78% of the content of that file is not being used to determine if a piece of mail is junk. If POPFile does not insist on words occurring x number of times, I bet this is the reason training takes so long and the dictionary needs to get so big.
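The 78% figure can be checked with a few lines of Python. This is only a sketch: the junk ini is assumed here to be already loaded into a plain word-to-count dictionary, and the function name `rare_word_share` is made up for illustration, not anything PocoMail provides.

```python
def rare_word_share(counts, max_count=2):
    """Return (rare, total, share) for words seen at most max_count times.

    counts is assumed to be a dict mapping each word to its occurrence count.
    """
    total = len(counts)
    rare = sum(1 for c in counts.values() if c <= max_count)
    share = rare / total if total else 0.0
    return rare, total, share


# The figures quoted in the post: 6248 of 7993 words occur 2 or fewer times.
print(f"{6248 / 7993:.1%}")  # 78.2%
```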

Is there any variable in the poco.ini file I can change to tell it to start using words if they only occur 1 time?
vamp07
Frequent Visitor
 
Posts: 66
Joined: Mon Jul 26, 2004 11:31 am

Postby SFCurley » Fri Aug 27, 2004 7:16 am

I think that w/ good mail bias set to "3", words with a count of "2" will be included because 2 x 3 >= 5. Words with a count of 1 would be included if you had your good mail bias set to 5, since 1 x 5 >= 5. I did have a correspondence w/ Slaven (which I can't find) and think this is how it works re: whether or not words are included. Maybe somebody at PSI could confirm. I am not aware of any other setting that would affect whether a word has to occur once or more than once to be considered.
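Written out as code, the rule as recalled above would look roughly like this. It is a sketch of the recollection only, not confirmed PocoMail behavior: the threshold of 5 is taken from the post, and `is_counted` is a hypothetical name.

```python
INCLUSION_THRESHOLD = 5  # assumed from the ">= 5" in the post, not confirmed


def is_counted(count, good_mail_bias, threshold=INCLUSION_THRESHOLD):
    """True if a word's count, scaled by the good mail bias, reaches the threshold."""
    return count * good_mail_bias >= threshold


# Show which (bias, count) pairs would pass under this reading of the rule.
for bias in (3, 5):
    for count in (1, 2):
        print(f"bias={bias} count={count} counted={is_counted(count, bias)}")
```

Under this reading, a bias of 3 drops every word seen only once, while a bias of 5 keeps them.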

Finally, one other point of difference b/w Poco and POPFile . . . Poco evaluates only the "top 30" probability words in an email based on their appearance in the corpus: POPFile includes all words. Hard to say how much of a difference this would make.
SFCurley
 

Postby vamp07 » Fri Aug 27, 2004 7:24 am

I would think that good bias affects how it treats good words, not bad words. If it works the way you say it might be something to try. I wish we could get somebody in the know to participate in this thread.
vamp07
Frequent Visitor
 
Posts: 66
Joined: Mon Jul 26, 2004 11:31 am

Postby Pete » Fri Aug 27, 2004 11:39 am

The problem that I have is that I receive just enough spam to be annoying but not enough to effectively train PocoMail. I've been using PocoMail's BF for more than two months now. I have about 3,000 good words and 18,000 bad words.

On a daily basis, I receive about ten spams. Of those, about three of them automatically go into the Junk folder, but I have to manually classify and move the other seven. So for me, at this point, it would be easier and simpler to just manually delete the spam in my In box than to use PocoMail's Bayesian Filter. I hope that future versions of PocoMail's BF will be more useful.

On the plus side, PocoMail has not flagged very many false positives.
Pete
 

Postby SFCurley » Fri Aug 27, 2004 2:47 pm

Yeah, that's kind of like teaching a kid to read by sending them to school one day a week . . . you'll eventually get there, but it might be just in time for the prom. Clearly you've been too parsimonious with your email address!
SFCurley
 

Postby tribble » Fri Aug 27, 2004 3:47 pm

Pete, I'd be happy to forward you all of my spam. You could train Poco in no time at all :-)
tribble
Poco Enthusiast
 
Posts: 430
Joined: Wed Jul 28, 2004 8:55 am

Postby vamp07 » Wed Sep 01, 2004 11:28 pm

How many tokens does PocoMail use to decide if something is spam? Is it the top 10? I think that is what POPFile uses. If it uses more, maybe this would explain why the dictionaries need to get so big?
vamp07
Frequent Visitor
 
Posts: 66
Joined: Mon Jul 26, 2004 11:31 am

Postby SFCurley » Thu Sep 02, 2004 2:23 am

Top 30 as I recall, which I THINK means the 30 with probabilities closest to 0 or 1 in an absolute sense.
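If "closest to 0 or 1" is the right reading, picking the tokens comes down to ranking them by distance from 0.5. A minimal sketch of that selection step, assuming each token in the message already has a spam probability; the function name and the sample words are invented for illustration, and how PocoMail then combines the selected probabilities is not covered in this thread.

```python
def most_extreme_tokens(token_probs, limit=30):
    """Return up to `limit` (token, probability) pairs farthest from 0.5.

    token_probs is assumed to map each token in the message to its
    spam probability (0.0 = surely good, 1.0 = surely junk).
    """
    ranked = sorted(
        token_probs.items(),
        key=lambda item: abs(item[1] - 0.5),  # distance from "neutral"
        reverse=True,
    )
    return ranked[:limit]


sample = {"viagra": 0.99, "meeting": 0.02, "the": 0.5, "offer": 0.7}
print(most_extreme_tokens(sample, limit=2))
```

With a limit of 2, the sample keeps "viagra" and "meeting" and drops the neutral "the", which is why very common words would have little effect on the verdict.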
SFCurley
 
