Poco Forums • View topic - Junk mail filtering...once again

Junk mail filtering...once again

Discussion on Bayesian and standard junk mail filters

Moderators: Eric, Tomas, robin, Michael

Postby Guest » Thu Feb 10, 2005 8:03 am

I have been reading and re-reading all the Junk mail filtering topics in these forums for the past 1.5 months, trying to figure something out. I have been using Pocomail at home for over a year, and Barca at work since it came out.

I have been using the built in junk mail & bayesian filters since they were introduced in the products, and I will have to say, I can't get a score over the low to mid 80's % range.

I followed the steps in the many threads:

-delete contents of dbgood.ini, dbspam.ini
-reset statistics in the Junk mail filtering
-moved the junk mail filter to the top and bottom of my filter list
-change the sensitivity of the junk mail/bayesian filters
-etc.

I've tried each of them for weeks at a time. When I first log into email for the day, Barca and Poco don't seem to keep up with the incoming mail: out of 20 pieces of spam, 15-18 end up in my inbox and only 3-5 end up in the junk mailbox. I have also noticed that an email left in the inbox is sometimes identical to one that was classified as spam and put in the junk box that same day.

I used both Popfile and K9 (with different email clients) and with Poco for a short period of time, and out of the box (fresh install) these two products are better than what Barca and Poco appear to offer, though they do slow email down.

I am determined to use the built in filtering, since that's what it is for, but I can't get consistent or desirable results. I've read postings from people getting 95%+ accuracy with the built in filters, and I would love to have that accuracy with Barca and Poco's built in filtering.

My current Junk mail filter settings are as follows:
Status tab:
-High Sensitivity
-12400 junk words
-4500 good words
-85% accuracy

General Settings tab:
-After downloading message is selected
-Custom sensitivity is set to Highest
-Run standard non-Bayesian filters is not checked

Bayesian tab:
-Run learning Bayesian filters is selected
-Junk Threshold: .90
-Good mail bias: 2.0
-Junk Score: 100
-Good Score: -100
in other words, I clicked the "Strict Bayesian" button

Word Lists tab:
-haven't messed with anything on this tab.

Currently my Junk Mail Filter is the last item in my Incoming mail Filter List. It was the first item for quite some time, with the same results.

If someone could help me out setting this up, I would much appreciate it.

Thanx
--Kevin
Guest
 

Postby kevinp » Thu Feb 10, 2005 8:04 am

OK, for some reason my session must have logged out as I typed the above message, so it was posted as GUEST. However, I wrote the original message.

--Kevin
kevinp
Drop-in Visitor
 
Posts: 9
Joined: Thu Feb 10, 2005 7:37 am

Re: Junk mail filtering...once again

Postby Eric » Thu Feb 10, 2005 10:37 am

Hi Kevin,
kevinp wrote:I followed the steps in the many threads:

-delete contents of dbgood.ini, dbspam.ini
-reset statistics in the Junk mail filtering
-moved the junk mail filter to the top and bottom of my filter list
-change the sensitivity of the junk mail/bayesian filters
-etc.
What I don't see in this list is putting all newsletters and such into your address books, so they'll automatically get whitelisted.

As you know most newsletters are HTML and your Bayesian filter will put these almost always into the Junk Mail folder.

My current Junk mail filters:
Status tab:
- High Sensitivity
- 10051 junk words
- 4186 good words
- 99,47% accuracy :)

General Settings tab:
- After downloading message is selected
- Custom sensitivity is set to Highest
- Run standard non-Bayesian filters is not checked

Bayesian tab:
- Run learning Bayesian filters is selected
- Junk Threshold: 0.99
- Good mail bias: 3.0
- Junk Score: 100
- Good Score: -100
in other words, I clicked the "Strict Bayesian" button

I hope this helps a bit. :wink:
Eric
 

Postby kevinp » Fri Feb 11, 2005 1:46 am

Thanx for the Reply

My newsletters (listservers I belong to) are being filtered fine with the filters I have set up, they never end up in the junk folder.

I just checked my email for the first time since last night at 10pm EST, and I had a total of 24 pieces of spam, 23 of which ended up in my IN box and only 1 in my Junk mail folder. This is pretty much on par with how things work every day, almost all the spam that come through end up in my IN box.

Should the Spam/Bayesian filter be the first in the filter list, last, middle?

--Kevin.
kevinp
Drop-in Visitor
 
Posts: 9
Joined: Thu Feb 10, 2005 7:37 am

Postby kevinp » Fri Feb 11, 2005 1:52 am

One more quick question: when spam ends up in my In box, what is the best way to move it/categorize it as spam? All I have been doing is clicking the "File as Junk" button in the message window. Should I be doing something else?

--Kevin
kevinp
Drop-in Visitor
 
Posts: 9
Joined: Thu Feb 10, 2005 7:37 am

Postby Michael » Fri Feb 11, 2005 1:54 am

My junk mail filters are near the end of my incoming message filters. I have quite good accuracy on my end but then I am running the non-Bayesian filters as well as several processes I've built myself.

Putting the junk mail filters too high in the list will only result in legitimate messages being treated as junk, so leave them where you have them. The fact that the messages are making it through to your In box means that no filter is catching them.

Do you get many messages from people you don't know? If not then you might consider using other filters to increase the junk mail score for unknown senders. This does carry the risk of missing legitimate messages but if you give a positive "Good" score in the Bayesian filters you can mitigate against this happening.
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

Postby kevinp » Fri Feb 11, 2005 2:04 am

The majority, if not all my personal email(pocomail at home) is from people I know, where some of their addresses are in my address book where others are not. My work email (barca), can contain both, personal and work email from clients. I don't bother putting my clients in my address book, because many of our clients are companies with many contacts.

Since the posting of my other message this AM, I received 3 more pieces of spam, all of which ended up in my Inbox.

What is the "Run standard Non Bayesian Filters" checkbox for and what does it do? What standard non bayesian filters does Barca have?

--Kevin
kevinp
Drop-in Visitor
 
Posts: 9
Joined: Thu Feb 10, 2005 7:37 am

Postby Michael » Fri Feb 11, 2005 2:25 am

kevinp wrote:The majority, if not all my personal email(pocomail at home) is from people I know, where some of their addresses are in my address book where others are not. My work email (barca), can contain both, personal and work email from clients. I don't bother putting my clients in my address book, because many of our clients are companies with many contacts.


Since these people are not in your address book I will revise my earlier recommendation slightly. Add a filter to check for either known or allowed senders and increase the junk score for those who are not in either list.

What are allowed senders? They are a list of addresses and/or domains that you don't necessarily maintain in your address book. This concept comes from the standard non-Bayesian filters (see below). You should be able to implement this new filter without enabling the standard filters (I haven't tried this, so I am not 100% certain).

To add an address or domain to the allowed senders list right click on the address in the message index pane and select "Junk Mail filtering | Allow Sender" (or Allow domain). If you have the option to underline known senders active (Tools | Options | Index Options | Underline known senders) then you can quickly see who is not in your address book.

kevinp wrote:Since the posting of my other message this AM, I received 3 more pieces of spam, all of which ended up in my Inbox.

What is the "Run standard Non Bayesian Filters" checkbox for and what does it do? What standard non bayesian filters does Barca have?

--Kevin


These are filters that Poco used before the Bayesian filters were added. They contain several tests for junk messages and include the ability to create your own white and black lists (including by domain), word lists, etc.
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

Postby kevinp » Fri Feb 11, 2005 2:50 am

So, with the creation of all these "additional" filters, why even have the bayesian filters?

When I used to use Popfile and K9, I never had to create extra filters, as their Bayesian filters were pretty accurate with their scoring.

When junk mail ends up in my inbox, what is the proper way to classify it as spam? Clicking the "File as Junk" button?

I guess I am just trying to understand why setting up all these rules/filters to wipe out spam is so difficult. I've used many different software packages, email, spam filters, etc., and this seems the most complex. I'm not saying I'm a novice, as I've been around the block for a decade and a half. :)


--Kevin
kevinp
Drop-in Visitor
 
Posts: 9
Joined: Thu Feb 10, 2005 7:37 am

Postby kevinp » Fri Feb 11, 2005 3:45 am

I hate to keep flooding this forum with messages, but I am just trying to better understand how all this works. Here are the scores from the past 2 pieces of spam that ended up in my IN box:

Status: U
X-Poco-Score-Detail: -100 [%BAYES%=P=0;T=90;BIAS=+20] (%bayes% P=0;T=90;Bias=+20)
X-Poco-Scored: -100
Subject: Name-brand software at low
Mime-Version: 1.0
Content-Type: text/html; charset="Windows-1251"

Status: U
X-Poco-Score-Detail: -100 [%BAYES%=P=0;T=90;BIAS=+20] (%bayes% P=0;T=90;Bias=+20)
X-Poco-Scored: -100
Subject: Same drugs --- little monetary value!
Mime-Version: 1.0
Content-Type: text/html; charset="utf-8"

Can anyone explain what the Poco-Score-Detail means, and how it would get that score?

Thanx for all your help

--Kevin
kevinp
Drop-in Visitor
 
Posts: 9
Joined: Thu Feb 10, 2005 7:37 am

Postby Michael » Fri Feb 11, 2005 4:18 am

Trying to answer both your last posts:

As to why the Bayesian filters are present, it's basically user demand. Many users saw Bayesian filters as a must have. Poco's standard filters were falling behind as spammers adapted to general anti-spam techniques (not in response to Poco's rules, but others were adopting similar rules).

Many competing products had Bayesian filters and PSI determined they must have them as well. It is my feeling that spammers will ultimately come up with methods to defeat Bayesian filters as it is in their interest to do so.

With regard to your second question, the X-Poco-Score-Detail entries (there can be more than one) are the result of the individual filter tests. A positive score indicates the particular filter thought the message was junk, a negative score means the filter thought the message was good. In your case it looks like you have both good and junk scores set in the Bayesian filters.

All scores are added up to give the X-Poco-Scored value. The last stage in Poco's junk mail filters is to compare this value with your threshold and then to either classify the message as good or junk.
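The adding-up described above can be sketched in a few lines of Python. This is only an illustration, not Poco's actual code: it assumes each X-Poco-Score-Detail header starts with a signed integer (as in Kevin's samples), and the function names and the final threshold of 50 are made up for the example.

```python
import re
from email import message_from_string

# Assumption: every detail header begins with a signed integer score,
# as in the samples quoted in this thread ("-100 [%BAYES%=...]").
SCORE_RE = re.compile(r"^\s*([+-]?\d+)")


def total_score(raw_headers: str) -> int:
    """Sum the leading scores of all X-Poco-Score-Detail headers."""
    msg = message_from_string(raw_headers)
    total = 0
    for value in msg.get_all("X-Poco-Score-Detail", []):
        match = SCORE_RE.match(value)
        if match:
            total += int(match.group(1))
    return total


def classify(raw_headers: str, junk_threshold: int = 50) -> str:
    """Compare the summed score with a threshold (50 is a hypothetical value)."""
    return "junk" if total_score(raw_headers) >= junk_threshold else "good"


# One of the header sets posted earlier in the thread.
SAMPLE = (
    "Status: U\n"
    "X-Poco-Score-Detail: -100 [%BAYES%=P=0;T=90;BIAS=+20] (%bayes% P=0;T=90;Bias=+20)\n"
    "X-Poco-Scored: -100\n"
    "Subject: Name-brand software at low\n"
    "\n"
)

print(total_score(SAMPLE), classify(SAMPLE))
```

With only the Bayesian entry present, the total is the Good Score of -100, so the message stays in the In box no matter what the subject line says.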
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

Postby kevinp » Fri Feb 11, 2005 8:31 am

Michael,

Thanx for your input, I will be doing some more tweaking. I understand that spammers will always be finding ways around our filters, but originally I had my Spam filters in Barca set for a couple months and it still missed the majority of my spam, and this much training shouldn't have to be involved.

Normally, I should have my filters to capture/redirect email that comes in from my listservers, and filter them into the appropriate mailbox, then anything else going to my inbox should be hit by the spam filter and all junk go into the junk folder, but this isn't the case, and this is what I am trying to get resolved.

I can probably install Popfile or another 3rd party spam filter and receive better results within a week, but I don't want to do this since some of you are receiving VERY GOOD results with the built in spam filters in Barca and Poco.

One question has me pondering: if Bayesian filtering is a set of rules, shouldn't all or most Bayesian filters provide pretty much the same result?

Should I wipe out my settings (reset statistics, clear out DBGood.ini and DBSpam.ini) and start over?

--Kevin
kevinp
Drop-in Visitor
 
Posts: 9
Joined: Thu Feb 10, 2005 7:37 am

Postby Michael » Fri Feb 11, 2005 2:00 pm

Kevin:

My accuracy is quite good but not perfect. In fact in my case sometimes the Bayesian filters actually hurt my overall accuracy. My normal filters mark a message as junk then the Bayesian filter thinks the message is good and so adds the good score which stops the message from going into the junk mailbox. This is starting to happen more often and by examining some of the recent messages that this has happened to it is definitely due to spammers using new techniques to defeat the Bayesian filters.

My major problem is that very occasionally my filters will send a good message to the junk mailbox. I'm trying to refine my technique to avoid this.

I don't know if clearing the DBGood and DBSpam files will help. With spammers starting to invent tricks to get around the Bayesian filters, I'm not sure how well clearing the files will work.

For those reading this thread and who have decided to clear the DBGood and DBSpam files, you only really need clear the DBSpam file. The DBGood file should be ok as is. I would advise being somewhat careful in building the DBSpam file, it might be wise to look at the spam messages and not train with them if they contain long lists of words unrelated to the message. I suspect this would be ok in most cases but you do run the risk that if you train the filter using several such messages and if fairly uncommon words are duplicated in these messages then you might end up with the BF filters mis-classifying a good message as junk. The problem with this is if you are receiving many junk messages it would be very easy to miss a legitimate message. This is one case where adding the "Accepted and Known sender" filter I mentioned earlier can help.

As to your question regarding the different Bayesian filters and their performance, there are subtle differences in the way different programs implement the Bayesian filter rules, some of these deal with how headers are treated, others deal with how many times a word needs to be seen before it is added to the spam corpus.
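To make those differences concrete, here is a rough Python sketch in the style of Paul Graham's "A Plan for Spam" scoring. It is not Poco's implementation, which isn't documented in this thread. The parameters good_bias, min_seen and top_n are illustrative knobs (the Bayesian tab's "Good mail bias" may or may not work like good_bias), and the word counts are invented.

```python
from collections import Counter
from math import prod


def token_prob(word, spam_counts, good_counts, n_spam, n_good,
               good_bias=2.0, min_seen=5):
    """Per-word spam probability, or None if the word is too rare to trust."""
    bad = spam_counts.get(word, 0)
    good = good_bias * good_counts.get(word, 0)  # bias counts toward "good"
    if bad + good < min_seen:
        return None  # implementations differ on this cut-off
    p_spam = min(1.0, bad / n_spam)
    p_good = min(1.0, good / n_good)
    p = p_spam / (p_spam + p_good)
    return max(0.01, min(0.99, p))  # never fully certain either way


def spam_probability(words, spam_counts, good_counts, n_spam, n_good,
                     top_n=15, **kwargs):
    """Combine the most decisive per-word probabilities into one score."""
    probs = []
    for word in set(words):
        p = token_prob(word, spam_counts, good_counts, n_spam, n_good, **kwargs)
        if p is not None:
            probs.append(p)
    if not probs:
        return 0.5  # nothing known about this message
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:top_n]
    spammy = prod(probs)
    hammy = prod(1.0 - p for p in probs)
    return spammy / (spammy + hammy)


# Invented training data: 10 junk and 10 good messages.
spam_counts = Counter({"pills": 9, "cheap": 8})
good_counts = Counter({"meeting": 9, "report": 7})

plain_spam = spam_probability(["cheap", "pills"], spam_counts, good_counts, 10, 10)
padded_spam = spam_probability(["cheap", "pills", "meeting", "report"],
                               spam_counts, good_counts, 10, 10)
print(round(plain_spam, 4), round(padded_spam, 4))
```

In this toy data the plain spam scores well above a 0.90 junk threshold, while the same spam padded with two "good" words drops to about 0.5. That is the kind of padding trick mentioned above, and a different bias or word-count cut-off would shift the result.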
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

Postby kevinp » Fri Feb 11, 2005 3:35 pm

When a piece of spam is left in the inbox, what is the best way to "train" the filters so they will learn and start to classify such messages as spam? I keep hitting the "File as Junk" button, but this appears to do nothing except move the message to the junk box.

Before I left work, I left the spam messages that I received at work in my pop box, to see how Pocomail (at home) would classify them. Just before I launched Poco, I logged into my account via webmail to see what I had in there, and I had accumulated 20 pieces of spam in 4 hours. I launched Poco and 18 of those messages were left in the inbox; Poco only classified 2 of them as spam.

I just don't understand how this is (not) working so well.

--Kevin
kevinp
Drop-in Visitor
 
Posts: 9
Joined: Thu Feb 10, 2005 7:37 am

Postby Guest » Fri Feb 11, 2005 6:08 pm

Are many of these messages styled? If so then the following filter may help:

Search: "Content-Type" for "plain", match only if not found AND
Search: "From" for "%addressbook%" (known senders), match only if not found AND
Search: "From" for "%exceptsenders%" (allowed senders), match only if not found.

Action: Increase junk score (by 20?)
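For anyone who finds the rule easier to read as code, the same three conditions can be sketched in Python. This is just a model of the logic, not how Poco evaluates filters: the function name, the lower-cased address sets and the optional allowed_domains argument are all assumptions for the example.

```python
from email import message_from_string
from email.utils import parseaddr


def styled_unknown_sender_bonus(raw_message, address_book, allowed_senders,
                                allowed_domains=frozenset(), bonus=20):
    """Return the junk-score increase for a styled message from a stranger.

    address_book, allowed_senders and allowed_domains are sets of
    lower-case strings standing in for %addressbook% and %exceptsenders%.
    """
    msg = message_from_string(raw_message)

    # Condition 1: "plain" is NOT found in the Content-Type header.
    content_type = msg.get("Content-Type", "").lower()
    is_styled = "plain" not in content_type

    # Conditions 2 and 3: sender is neither known nor allowed.
    _, address = parseaddr(msg.get("From", ""))
    address = address.lower()
    domain = address.rpartition("@")[2]
    is_known = address in address_book
    is_allowed = address in allowed_senders or domain in allowed_domains

    if is_styled and not is_known and not is_allowed:
        return bonus
    return 0


message = (
    "From: Some Stranger <offers@example.com>\n"
    "Subject: Name-brand software at low\n"
    'Content-Type: text/html; charset="Windows-1251"\n'
    "\n"
    "<html>...</html>\n"
)

print(styled_unknown_sender_bonus(message, address_book=set(), allowed_senders=set()))
```

All three tests must match before the score goes up, so a plain-text message from a stranger, or an HTML newsletter from an allowed domain, is left alone.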
Guest
 
