Poco Forums • View topic - Check email against DNSBLs (DNS blacklists)

Scripting questions and ideas

Moderators: Eric, Tomas, robin, Michael

Postby Hogyt » Sun Aug 29, 2004 5:17 am

vamp07 wrote: It's working great for me.


That's great to hear and has made my day :-D If you find any good RBLs or change the spam/ham scores to something better, please do share!

vamp07 wrote: Now I wish I could use it to train my Bayesian filters when it is clearly spam as far as rbl is concerned. I can't think of a way to do this. Does anybody know? You can't mark something as junk from within poco script can you? Will moving a message through script to the junk folder have the same effect?


I don't know how to force PocoMail to learn a message as spam or ham through its Bayes filter, and I haven't tested whether moving a message to the junk folder through PocoScript will do this, but I'll test it tomorrow. It would be great if the script could do this! If anyone else has ideas or would like to test it, be my guest :-)

By the way, I've got a small update to the program which fixes a case where the IP address can't be found even though it's there, but it's minor and I'm still testing it, so I won't release it yet. I have some more ideas for improving spam detection, such as optionally assigning a small negative score (the amount is up to the user) if the sender is in your address book, but these things will have to wait for a bit. Edit: Come to think of it, a standard filter can already be set up to search for a known or junk sender and alter the spam score accordingly, so scratch that idea!
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England

Postby vamp07 » Mon Aug 30, 2004 12:16 am

Here is a suggestion. For each RBL that could be contacted, have the script build a header X-RBL-NAME: [SPAM,NOT-SPAM]. This way you are giving the Bayesian filters something clean to work with that they can classify and use for their own scoring. The header in the format you currently build is not useful to the Bayesian filters.
vamp07
Frequent Visitor
 
Posts: 66
Joined: Mon Jul 26, 2004 11:31 am

Postby Hogyt » Mon Aug 30, 2004 2:27 am

I'm confusing myself here so let me go over what happens :)

1) We receive a new email and the script runs on it
2) Whatever the score, the Bayes filter then checks the email and adds (or subtracts) to the score
3) If the final score is high enough the email is moved to the junk mailbox

So as it stands the Bayes filter always checks the email but the spam/good word list is never updated.
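A minimal sketch of the three steps above, written in Python rather than PocoScript. The function name is made up for illustration, and the threshold of 12 is the PocoMail default mentioned later in this thread; this is not the script's actual code.

```python
def classify(script_score, bayes_score, junk_threshold=12):
    """Combine the DNSBL script score with the Bayes score.

    Step 1: the script has already produced script_score.
    Step 2: the Bayes filter adds to (or subtracts from) that score.
    Step 3: the message is junk if the total is above the threshold.
    """
    total = script_score + bayes_score
    return total, total > junk_threshold


# Example: one DNSBL hit worth +6 plus a Bayes verdict of +15.
print(classify(6, 15))
```

Note that nothing in this flow updates the Bayes word list; it only reads the two scores.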

vamp07 wrote: Now I wish I could use it to train my Bayesian filters when it is clearly spam as far as rbl is concerned.


a) So I think what you're asking for is that the spam/good word list is automatically updated depending on the result of the script? This would update the Bayes filter on the entire contents of the message.

vamp07 wrote: For each rbl that could be contacted have it build a header X-RBL-NAME: [SPAM,NOT-SPAM]. This way you are giving the Bayesian filters something clean to work with that they can classify and use for their own scoring.


b) This is kind of doing the same thing twice. The script already scores the email and I think the Bayes filter should be an independent test. If it did what you say, then the script could make a mistake and the Bayes filter would follow on and make the same mistake.

Have I understood you correctly? I think a) would be useful but b) not so useful. I don't know how to do a) though :?
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England

Postby Hogyt » Mon Aug 30, 2004 2:45 am

Updated to v1.09. The path to runmin.exe and the timeout value should now be set in the "Setup Script" screen. The list of RBLs should still be edited directly in the script. The IP extraction routine is a bit better (it was failing to get the IP in some cases before).
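For anyone curious what the IP extraction and the DNSBL check involve, here is a rough Python sketch. It is not how the script does it (the script is PocoScript and calls runmin.exe); the function names and the regular expression are my own assumptions. The one standard part is the query name: the IPv4 octets are reversed and the DNSBL zone is appended, and a name that resolves means the address is listed.

```python
import ipaddress
import re
import socket

# Assumption: the sending IP appears in square brackets in a Received: header.
IPV4_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def extract_ip(received_header):
    """Return the first bracketed, valid IPv4 address, or None."""
    match = IPV4_RE.search(received_header)
    if not match:
        return None
    try:
        return str(ipaddress.IPv4Address(match.group(1)))
    except ipaddress.AddressValueError:
        return None  # looked like an IP but was out of range


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip, zone):
    """True if the query name resolves. A resolver failure also returns
    False here, so it cannot be told apart from 'not listed'."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

For example, `dnsbl_query_name("66.218.66.66", "relays.nthelp.com")` gives the name `66.66.218.66.relays.nthelp.com`.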
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England

Postby Les Armitage » Mon Aug 30, 2004 4:13 am

Bl***y H**l. I need to lie down in a dark room...
Les Armitage
Poco Tourist
 
Posts: 33
Joined: Sun Jul 25, 2004 7:57 pm
Location: West Lancashire. England

Postby vamp07 » Mon Aug 30, 2004 10:38 am

Mat,

Basically you are adding a score to the spaminess of a message. Poco defaults to sending stuff to spam if the score is above 12. You can make that whatever you want, but 12 is the default. Personally I think Bayesian scoring does an incredible job of knowing if something is spam after it has been trained. Your check against RBLs is cool and should give the Bayesian filters another great indicator of whether a message is spam. The header as constructed now is legible to a human but not useful to the Bayesian filters. If each RBL got its own header with an indicator of yes or no, the Bayesian filter would quickly start using it in its computations, and I bet it could quickly become one of the most used tokens in its calculations since it is rarely wrong. As far as which RBL is more accurate etc., who cares. Let the Bayesian filter figure it out itself over time. You don't need to get into assigning each RBL a score based on your perceived value of that RBL. Your current modification of the score forces me to marry it to the Bayesian filter by giving it the ability to override what the Bayesian filter determines.
vamp07
Frequent Visitor
 
Posts: 66
Joined: Mon Jul 26, 2004 11:31 am

Postby Hogyt » Mon Aug 30, 2004 11:08 am

I'll add it as an option :D I'm still not convinced it'll work better than the current system (which for me is getting over 99% correct classification so far), but you can see how it works for you! As it stands you can set the spam/ham scores to 0 easily enough, so will it be enough for you if we change the headers to something like the following?

X-Poco-Spam-RBL: Received from IP address (66.218.66.66)
Not in spamsources.fabel.dk (+0) spamsourcesfabeldkfalse
Found in relays.nthelp.com (+6) relaysnthelpcomtrue

i.e. keep it how it is but add a token after each line that Bayes can learn from. What do you think?

Edit: I've posted to the spamassassin mailing list to get their opinions on the pros and cons of each method.
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England

Postby vamp07 » Mon Aug 30, 2004 11:35 am

The scoring can stay the same since I use strict Bayesian, which means it assigns each message a score of 100 or -100. As long as the RBL scores stay in the range you are using, they really have no effect. What I think is needed are new headers, one for each list that did not time out:

X-Poco-Spam-RBL-bl.spamcop.net: YES
X-Poco-Spam-RBL-dnsbl.njabl.org: NO


If you want to keep things simple, don't add it as an option. Just stop putting everything under one header and create a new header for each list. You don't even need to use yes/no. You could put the score that the given list added to or subtracted from the overall score. As long as the values stay the same over time, Bayes will pick up on them and start using them.

As far as accuracy goes, I have seen several spams that were not in any RBL but were definitely spam. Bayes caught those just fine.
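To make the one-header-per-list idea concrete, a small Python sketch (the function name and the input format are made up for illustration; the header prefix is the one from the example above). Lists that timed out are skipped, and the value is always the same fixed word so Bayes sees a stable token.

```python
def dnsbl_headers(results):
    """Build one header per DNSBL.

    results maps a DNSBL zone to True (listed), False (not listed)
    or None (the lookup timed out, so no header is written).
    """
    headers = []
    for zone, listed in results.items():
        if listed is None:
            continue  # timed out: give Bayes nothing rather than a guess
        verdict = "YES" if listed else "NO"
        headers.append(f"X-Poco-Spam-RBL-{zone}: {verdict}")
    return headers


for line in dnsbl_headers({"bl.spamcop.net": True, "dnsbl.njabl.org": False}):
    print(line)
```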
vamp07
Frequent Visitor
 
Posts: 66
Joined: Mon Jul 26, 2004 11:31 am

Postby Hogyt » Mon Aug 30, 2004 11:41 am

I see lots of spams that aren't in the RBLs too, but I also run Bayes with +15 for spam and -5 for ham, where 10 is enough to classify the email as spam, so between the two methods it's picking up almost everything correctly.

The way you've described the headers, would Bayes associate the 'yes' or 'no' with the word to the left of it? I thought not, which is why I concatenated the result into one word in the example above.
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England

Postby vamp07 » Mon Aug 30, 2004 11:56 am

I think this is the only way to get tokens in those ini files, with one per header. Your current header changes too much between messages to be useful (I think; I have not looked in the ini file to see exactly what it creates).
vamp07
Frequent Visitor
 
Posts: 66
Joined: Mon Jul 26, 2004 11:31 am

Postby Hogyt » Mon Aug 30, 2004 12:04 pm

I've had a look through the DBSpam.ini file and it looks like both methods would work fine. I think I prefer it like this:

X-Poco-Spam-RBL: Received from IP address (66.218.66.66)
spamsources.fabel.dk-false (+0)
relays.nthelp.com-true (+6)

This is still readable and allows for extra info such as the IP address and debug info. That should result in these tokens being assigned scores:

X-Poco-Spam-RBL-spamsources.fabel.dk-false
X-Poco-Spam-RBL-spamsources.fabel.dk-true
X-Poco-Spam-RBL-relays.nthelp.com-true
X-Poco-Spam-RBL-relays.nthelp.com-false
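A rough Python sketch of how a header like the one above could turn into those tokens. This assumes the Bayes filter prefixes each word with the header name, which is only my reading of DBSpam.ini; the function is hypothetical and keeps just the "-true"/"-false" words, whereas the real tokenizer presumably stores the other words as well.

```python
def header_tokens(header_name, lines):
    """Return header-prefixed tokens for the DNSBL result words only."""
    tokens = []
    for line in lines:
        for word in line.split():
            # Only the concatenated "zone-true" / "zone-false" words are
            # stable between messages; the IP and scores vary.
            if word.endswith("-true") or word.endswith("-false"):
                tokens.append(f"{header_name}-{word}")
    return tokens


body = [
    "Received from IP address (66.218.66.66)",
    "spamsources.fabel.dk-false (+0)",
    "relays.nthelp.com-true (+6)",
]
print(header_tokens("X-Poco-Spam-RBL", body))
```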
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England

Postby vamp07 » Mon Aug 30, 2004 12:15 pm

You are right. Just adding the -false or -true will do the trick.
vamp07
Frequent Visitor
 
Posts: 66
Joined: Mon Jul 26, 2004 11:31 am

Postby Hogyt » Mon Aug 30, 2004 12:20 pm

Either way is fine really, I'm just looking for the easiest change to the code that does what you want ;-) If you can wait until this time tomorrow, I'll update the script then.
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England

Postby Hogyt » Mon Aug 30, 2004 3:12 pm

Updated to v1.10. It has a new option "Use new style headers?" to change which style of headers is displayed. Setting it to True (the default value) gives the new headers and False the original headers. The new script, along with examples of each type of header, is in the first post. I've renamed it to DNSBL since apparently RBL is a trademark and DNSBL is the correct term to use.

Can you give it a try, Vamp, and see if it is working for you?
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England

Postby Hogyt » Wed Sep 01, 2004 12:50 pm

Is anyone else using this script? If so, how are you finding it?

If you're on the borderline of deciding whether to give it a try, check out my stats for the last 3 days :D

Image

At least 2 of the incorrectly classified emails were due to me testing stuff.

This is how I set up the Bayesian filter (not sure if these options are that good but they seem to work) on high sensitivity:

Image

I have the script in the first post running as an incoming filter, another incoming filter which decreases the junk score by 5 if the sender is in my address book, and then the standard Junk Mail filter turned on.
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England
