Poco Forums • View topic - Filters not very smart about filtering

Filters not very smart about filtering

Discussion on Bayesian and standard junk mail filters

Moderators: Eric, Tomas, robin, Michael

Filters not very smart about filtering

Postby schwim » Sat Apr 16, 2005 3:33 pm

Hi guys,

I'm using 3.4 and have the junk mail filters pretty much at their standard settings (with non-standard Bayesian checked). I go through the process of declassifying mail that's marked incorrectly, then marking it with its correct classification (i.e. junk mail/good mail), but it doesn't seem to be learning very well.

I get topic reply notifications from one of my sites, and no matter how many times I unmark them as junk and mark them as good, the next one to come into the box gets marked as junk, and they're identical to the last one. This has probably happened 50 times now.

I've searched the forum and found threads with instructions like "declassify 15 times, then classify 15 times...", but this makes absolutely no sense to me.

Is there a simple way to help the filter system learn better? Right now, it's useless, as I'm still manually marking junk mail and unmarking good mail more often than not, but I don't want to install a third party app, because they seem to wreak havoc on PM, creating blank emails, etc...

thanks for any light you can shine on this :)

thanks,
json

Edit Eric: Moved to Junk Mail Filtering
schwim
Poco Tourist
 
Posts: 17
Joined: Sun Apr 03, 2005 9:15 am

Postby Michael » Sun Apr 17, 2005 11:46 am

First check where the junk mail filters are with respect to your regular incoming filters. They should be at or near the bottom of the list. This way other filters have a chance to act on the message and move it before the junk mail filters take hold. This should help eliminate false positives (a good message classified as junk).
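To illustrate why the ordering matters, here is a minimal sketch of a first-match filter chain. It is not Pocomail's actual filter engine: the `Filter` class, the `route` function, the folder names and both example rules are hypothetical, and it assumes the first filter that claims a message decides where it goes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Filter:
    name: str
    # Returns a destination folder if the filter claims the message, else None.
    match: Callable[[dict], Optional[str]]


def route(message: dict, filters: List[Filter]) -> str:
    """Return the folder chosen by the first filter that claims the message."""
    for f in filters:
        folder = f.match(message)
        if folder is not None:
            return folder
    return "Inbox"


# Hypothetical rules: one for forum notifications, one overeager junk rule.
notifications = Filter(
    "forum notifications",
    lambda m: "Forum" if "Topic reply notification" in m["subject"] else None,
)
junk = Filter(
    "junk mail",
    lambda m: "Junk" if "http://" in m["body"] else None,
)

msg = {
    "subject": "Topic reply notification - My Site",
    "body": "A new reply was posted: http://example.com/topic",
}

junk_first = route(msg, [junk, notifications])  # junk rule sees the message first
junk_last = route(msg, [notifications, junk])   # regular rule gets its chance first
print(junk_first, junk_last)
```

With the junk rule at the bottom of the list, the notification is moved by its own rule before the junk rule ever looks at it.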

Since you are also using the non-standard Bayesian filters, what do you have the junk and good scores set to? These values may be totally negating the non-standard filters.
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

Postby seabc » Thu Apr 21, 2005 5:29 am

My experience has been that the standard non-bayesian filters, combined with some filters I have created, and with some banned words, do a reasonably good job - 86% rating, with almost no good emails misclassified as junk.

Now that I have figured out how to get Bayesian filters to process, I have discovered that my training approach has been worthless. I can see that Bayesian filtering is categorizing almost all of my email as good, despite having been trained on thousands of good and bad words. (I am talking about how Bayesian filtering categorizes email, distinguishing it from how these are scored) Particularly entertaining is that today I received three identical spams, which differed only in the From field. Bayesian filters concluded that two were good and one was bad. ?

So, clearly, I need to do better training. But I am increasingly cautious about this, because so many spams add in some text that could easily appear in any good message. So if I train on these messages, I suspect that I am undermining the value of all of prior training; and it seems reasonable to guess that such messages can skip right past Bayesian filters that have learned a lot of good words.
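The second worry, that padded spam can slip past a filter that has learned many good words, can be sketched with a simple naive Bayes combination. This is only an illustration under stated assumptions: the per-token probabilities below are made up, `combined_spam_probability` is a hypothetical helper, and nothing here is taken from PM's word files or its actual scoring.

```python
from math import prod


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities, treating tokens as independent."""
    spam = prod(token_probs)
    good = prod(1.0 - p for p in token_probs)
    return spam / (spam + good)


# Hypothetical per-token spam probabilities (assumed values, not real data).
spammy = [0.95, 0.90, 0.90]          # e.g. "refinance", "homeowner", "rates"
padding = [0.10, 0.15, 0.10, 0.20]   # innocuous words pasted in by the spammer

plain_score = combined_spam_probability(spammy)
padded_score = combined_spam_probability(spammy + padding)
print(round(plain_score, 3), round(padded_score, 3))
```

In this sketch the padded message ends up below 0.5 even though its spammy tokens are unchanged, which is the dilution effect described above.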

So I have the same question you do - what is the best way to train - plus an add-on - how much does Bayesian filtering add to the standard non-Bayesian filtering plus a handful of other filters and banned words?
seabc
Poco Tourist
 
Posts: 33
Joined: Mon Nov 22, 2004 7:18 am

Postby Michael » Thu Apr 21, 2005 3:32 pm

I am using a combination of BF, JMF and custom filters and scripts. I am starting to see more messages make it through this but haven't been keeping track. I will start to keep some stats on this and see what I discover.
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

Postby seabc » Fri Apr 22, 2005 6:18 am

I decided to try Thunderbird, to see how its built-in "adaptive filtering" compares to PM's. Running Thunderbird first, and leaving email on the server, and using none of my own filters, Thunderbird has failed to catch one spam. And I have trained it only on its mistakes, over a two day period.

On the same emails, PM's bayesian filtering has failed to classify several spams, despite or because of the training I have done. Particularly discouraging have been the numerous "Dear Homeowner" refinance offers that PM has missed - all look the same except for sender, and I have "trained" PM that these are spams.

I cannot decide whether to just turn off bayesian filtering, or to try and retrain it. Like you, I haven't seen a clear and concise description of how to train well.
seabc
Poco Tourist
 
Posts: 33
Joined: Mon Nov 22, 2004 7:18 am

Postby speerga » Fri Apr 22, 2005 12:44 pm

Hi all. I will probably sound like a broken record on this, BUT:

What you're finding out about Poco's Bayesian filter has been true ever since it's been around. It's a really "retarded" sort of learner. :lol:

I have tried time and again to follow any instructions or suggestions on these forums for training the thing and simply cannot get it to learn and work effectively.

For the sake of speed and ease in downloading my email, I generally have given up on Spam filtering and run Pocomail with what little it will catch.

From time to time, I reinstall K9 and simply shut off Pocomail's Junk filtering functions altogether. No matter the somewhat slower speed in downloading email -- with K9, I get almost perfect Spam filtering.

Alas, it doesn't seem to matter how often we struggle to understand and/or train Poco, the Junk mail filtering, at least the Bayesian filter part, is simply WILDLY deficient compared to the builtin Spam filtering of Thunderbird or Mozilla Mail and such third-party software as K9.

And, yes, a truly step-by-step, here's-how tutorial for Junk mail filtering using Pocomail WOULD be a good idea. I don't mean one of those percentages, headers, sort of techno-babble things; I mean something by someone knowledgeable doing a "Junk Mail in Poco for Dummies" sort of tutorial. :lol:

Gary Speer
speerga
Resident Poster
 
Posts: 116
Joined: Wed Jul 28, 2004 1:49 pm
Location: Springfield, Missouri

survey says!

Postby schwim » Fri Apr 22, 2005 3:19 pm

hey guys,

so I guess between the replies here and what I've read elsewhere on this forum, I'll just leave PM's filter system alone, as it has so far proven itself to be useless.

I came over from Thunderbird, and before that I was using Spam Inspector with OE. Both were very successful from the start, with Thunderbird only dropping the ball once in a blue moon and Spam Inspector doing very well too. PM, on the other hand, as someone else has noted, will miss THE EXACT SAME EMAIL with only a different sender. I can't imagine being able to teach anything useful to a system that does that ;)

thanks,
json
schwim
Poco Tourist
 
Posts: 17
Joined: Sun Apr 03, 2005 9:15 am

Postby seabc » Sun Apr 24, 2005 1:51 am

Looking at the good and bad word files, I wonder about what PM is learning. Why, for example, is it useful for PM to have learned words like "X-ORIGINALARRIVALTIME-07=1" or "DATE-1000=1" or "RECEIVED-1conyv3sp3nzfpa0=1" or "X-SYMANTEC-TIMEOUTPROTECTION-0=1"?
Last edited by seabc on Mon Apr 25, 2005 5:51 am, edited 1 time in total.
seabc
Poco Tourist
 
Posts: 33
Joined: Mon Nov 22, 2004 7:18 am

Postby Guest » Mon Apr 25, 2005 5:11 am

And how can training be of any value when learned words include words for html formatting, and font names? Is all spam in Times Roman, but good mail is in Arial?
Guest
 

Postby SFCurley » Thu Dec 29, 2005 11:25 am

In general, the PM Bayesian filters, like all Bayesian filters, will examine almost every token (words, fonts, strings, etc.) that is present in the email and categorize the tokens appropriately. When using what it has learned, PM only looks at the (I believe) 30 most meaningful tokens that are truly indicative of whether it is junk or not, so having PM learn the meaningless tokens (arrivaltime, etc.) probably does no harm and, I'm sure, simplifies the programming.
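A rough sketch of that idea, in the general style of token-based Bayesian filters, might look like the following. It is not PM's actual code: the function names, the training counts and the clamping bounds are assumptions, and `top_n=30` simply mirrors the "(I believe) 30" figure above.

```python
from math import prod


def token_spam_probability(token, spam_counts, good_counts, n_spam, n_good):
    """Estimate P(spam | token) from training counts; unseen tokens are neutral."""
    s = spam_counts.get(token, 0) / max(n_spam, 1)
    g = good_counts.get(token, 0) / max(n_good, 1)
    if s + g == 0:
        return 0.5
    # Clamp so that a single token can never be absolutely decisive.
    return min(0.99, max(0.01, s / (s + g)))


def score(tokens, spam_counts, good_counts, n_spam, n_good, top_n=30):
    """Score a message using only the top_n tokens farthest from neutral (0.5)."""
    probs = [
        token_spam_probability(t, spam_counts, good_counts, n_spam, n_good)
        for t in set(tokens)
    ]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    chosen = probs[:top_n]
    spam = prod(chosen)
    good = prod(1.0 - p for p in chosen)
    return spam / (spam + good)


# Hypothetical training data: 50 spam and 50 good messages (assumed counts).
spam_counts = {"refinance": 40, "homeowner": 35, "date-header": 50, "arial": 25}
good_counts = {"meeting": 30, "date-header": 50, "arial": 25, "refinance": 1}

message = ["refinance", "homeowner", "date-header", "arial"]
with_all = score(message, spam_counts, good_counts, 50, 50)
top_two = score(message, spam_counts, good_counts, 50, 50, top_n=2)
print(round(with_all, 4), round(top_two, 4))
```

Tokens that appear equally often in good and junk mail, such as the header and font tokens in this example, come out at 0.5 and leave the result unchanged whether or not they are selected.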
SFCurley
 

Postby saa888 » Fri Jan 05, 2007 4:24 am

The junk mail filtering in Myinfotogo is a joke. When I used Thunderbird, I rarely received junk email in my in box.

With Myinfotogo the junk mail fills up my inbox like crazy. Even the training feature doesn't help. The only reason I am keeping this is because it is the only combined email/PIM that I can find for my U3.

Pocosystems, you need to fix the junk mail filtering. It is total **** right now.
saa888
New Arrival
 
Posts: 2
Joined: Fri Jan 05, 2007 4:21 am

Postby dazbo » Fri Jan 05, 2007 7:42 am

Yeah, I have to agree with a few posts in this thread.
Pocomail's spam filtering, together with a few filters of my own, is excellent! :D

cheers
Daz
dazbo
Drop-in Visitor
 
Posts: 13
Joined: Thu Dec 14, 2006 7:32 am

Postby saa888 » Sat Jan 06, 2007 7:41 pm

I reset the junk mail filters on Myinfotogo. I have received 9 junk emails. The program failed to find a single one of them.

When will you be coming out with email filtering on this program that actually does what it is supposed to do?

If you fixed that problem, this application would be a real winner. Right now it is barely usable.

When I used Thunderbird it caught a lot of spam right from the start, Myinfotogo can't even do it when it is trained to do it.
saa888
New Arrival
 
Posts: 2
Joined: Fri Jan 05, 2007 4:21 am

Postby Eric » Sat Jan 06, 2007 8:00 pm

saa888 wrote: I reset the junk mail filters on Myinfotogo. I have received 9 junk emails. The program failed to find a single one of them.
You've just reset your Junk Mail filters, so MITG hasn't learned enough to judge your emails. It needs to learn at least 1,000 good & 1,000 bad words.

saa888 wrote: When will you be coming out with email filtering on this program that actually does what it is supposed to do?
It actually does what it's supposed to do. For some it will work without much tweaking, for others it won't work great. :roll:

saa888 wrote: If you fixed that problem, this application would be a real winner. Right now it is barely usable.
I agree that the Bayesian filter needs improvement, but to say it's barely usable is simply untrue. At least it works for me and always did. I don't use a spam blocker anymore, and so far I get some false positives from time to time, but all real junk is transferred to Junk. 8)

saa888 wrote: When I used Thunderbird it caught a lot of spam right from the start, Myinfotogo can't even do it when it is trained to do it.
I know its spam filter is quite good, but that's all. For the rest I can't judge the app, since I've never even tried it.
I tried a lot of others before, although their junk mail filtering isn't always good. The one that filtered best was Bloomba with SAProxy. Unfortunately it was taken over by Yahoo, because of their excellent search engine. :?

Hope the Junk Mail filtering will be addressed in an upcoming beta, so it can be improved. :wink:
Eric
 

