Poco Forums • View topic - Check email against DNSBLs (DNS blacklists)

Check email against DNSBLs (DNS blacklists)

Scripting questions and ideas

Moderators: Eric, Tomas, robin, Michael

Postby SFCurley » Wed Oct 13, 2004 7:07 am

Thanks. For my IP address it's not a problem. I use extensive whitelisting and Bayesian filtering before I get to the RBL test, so anyone who's whitelisted (including my domain) would never reach the RBL test.
SFCurley
 

Postby Hogyt » Wed Oct 13, 2004 8:44 am

Updated to v1.12 (see first post). The IP reading routine should work better. I've put the 'old' DNSBLs (the ones that were used in v1.10) in the code but commented them out. Feel free to include them if you want.
Mat
Hogyt
Poco Enthusiast
 
Posts: 241
Joined: Thu Jul 29, 2004 11:22 am
Location: England
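For readers unfamiliar with the mechanics: a DNSBL query is conventionally an ordinary DNS lookup in which the octets of the sender's IPv4 address are reversed and the blacklist zone is appended. Any A record in the answer means the address is listed; a failed lookup means it is not. The Python sketch below illustrates that convention only. It is not the PocoScript from the first post, and `dnsbl_query_name`, `is_listed` and the injectable `resolve` parameter are hypothetical names chosen for the example.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: reversed IPv4 octets followed by the DNSBL zone.

    e.g. 192.0.2.99 checked against bl.example.org -> 99.2.0.192.bl.example.org
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if the zone answers for the reversed IP, False otherwise.

    Any answer (conventionally an address in 127.0.0.0/8) means "listed";
    a resolution failure is treated as "not listed".
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

Passing a stub `resolve` function lets the logic be exercised without network access; with the default, `is_listed("192.0.2.99", "bl.spamcop.net")` sends a real DNS query. Treating every lookup failure as "not listed" is a deliberately conservative choice: an unreachable blacklist never causes a message to be flagged.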

Postby Hogyt » Thu Oct 14, 2004 12:09 pm

Updated to v1.13 (see first post). Sorry for so many updates! There was a bug that would affect anyone running this script as a filter after (below) the built-in junk mail filters and then, after that, using a 'Junk score more than' filter. So it might not have affected anyone on the planet other than me, but anyway, it's fixed now :wink:
Mat

Postby SFCurley » Thu Oct 14, 2004 3:55 pm

Believe it or not, that's exactly how I'm using it, but I hadn't noticed any bug yet. Thanks for the fix, though.
SFCurley
 

Postby Hogyt » Fri Oct 15, 2004 1:48 am

The 'Junk score more than' filter would have been using the score from before this script ran instead of the score after it, so emails tagged as spam by the script wouldn't have had their increased spam score taken into account. It works now though :D
Mat
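The bug described above is a stale-value problem: a later filter acted on a score captured before the script had updated it. A minimal Python sketch of the difference, assuming each filter can see the message's junk score and that the script adds 15 when it flags a message; `Message`, `dnsbl_script`, `junk_score_more_than` and the threshold of 14 are hypothetical stand-ins for the PocoMail filters, not their actual internals.

```python
from dataclasses import dataclass


@dataclass
class Message:
    junk_score: int = 0
    folder: str = "Inbox"


def dnsbl_script(msg: Message, listed: bool) -> None:
    """Stand-in for the DNSBL script: add 15 to the junk score if listed."""
    if listed:
        msg.junk_score += 15


def junk_score_more_than(msg: Message, score_seen: int, threshold: int = 14) -> None:
    """Stand-in for a 'Junk score more than' filter: move to Junk above threshold."""
    if score_seen > threshold:
        msg.folder = "Junk"


def run_chain_buggy(listed: bool) -> str:
    msg = Message()
    score_before_script = msg.junk_score            # captured too early
    dnsbl_script(msg, listed)
    junk_score_more_than(msg, score_before_script)  # stale score: never fires
    return msg.folder


def run_chain_fixed(listed: bool) -> str:
    msg = Message()
    dnsbl_script(msg, listed)
    junk_score_more_than(msg, msg.junk_score)       # reads the updated score
    return msg.folder


print(run_chain_buggy(listed=True), run_chain_fixed(listed=True))  # prints: Inbox Junk
```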

Non-integer assigned to integer value

Postby tdzark » Wed Nov 03, 2004 1:37 am

That about sums my problem with this filter up :)

I've changed the settings to this:

Embed $dnsbllist "end"
dnsbl.sorbs.net,30,0
dnsbl.njabl.org,30,0
list.dsbl.org,30,0
bl.spamcop.net,50,0
sbl-xbl.spamhaus.org,30,0
end

...since I use strict Bayesian filtering. I guess I need higher values, then.

When it runs, I get an error box saying "non-integer assigned to integer value".

I've tried removing blank spaces after the numbers just in case, and I can't see anything in there that isn't strictly an integer. Also, I've set the wait time to 1, since I get a lot of emails each day and have a very fast connection.
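Each line of the `$dnsbllist` block above has the form `zone,score,flag`. The Python sketch below parses that format, tolerating blank lines and stray spaces, and reports exactly which line holds a value that is not a clean integer. It is an illustration only, not how the PocoScript itself reads the list, and the thread does not pin the error on the scores. The meaning of the third column is not explained here, so the sketch just keeps it as an integer called `flag` (an assumption); `DnsblEntry` and `parse_dnsbl_list` are hypothetical names.

```python
from typing import List, NamedTuple


class DnsblEntry(NamedTuple):
    zone: str
    score: int
    flag: int  # third column; its meaning isn't explained in this thread


def parse_dnsbl_list(text: str) -> List[DnsblEntry]:
    """Parse 'zone,score,flag' lines, tolerating blank lines and stray spaces."""
    entries: List[DnsblEntry] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(parts)}")
        zone, score, flag = parts
        try:
            entries.append(DnsblEntry(zone, int(score), int(flag)))
        except ValueError:
            raise ValueError(f"line {lineno}: non-integer value in {line!r}") from None
    return entries


config = """
dnsbl.sorbs.net,30,0
bl.spamcop.net,50,0
"""
for entry in parse_dnsbl_list(config):
    print(entry.zone, entry.score)
```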

My filters: first I sort my emails into their appropriate folders, then the PocoMail Bayesian junk filter runs, and at the very bottom the DNSBL script runs. Is this the best way?

And is there somewhere I should use the stop processing function?

The Bayesian filter's success rate was horrible, so I reinstalled it, turned off non-Bayesian filtering, set it to strict, and then started teaching it with only new messages (there's plenty of spam to learn from there...).

It started at 90% and has only decreased since, and 90% seems far too high a number anyway: I just downloaded 60 emails and 20 of them were incorrectly left in my inbox. And they were the simplest spam messages, with porn, Viagra, Rolex, etc. in the text.

What's wrong here?
26,418 bad words, 5,019 good.
5% of incoming (153) filtered as junk, 88.66% accuracy (346 missed) and 1.97% false positive (60 missed).

Since the reinstall I've received around 1,300 emails in all. I can't make sense of the numbers above; I feel my results are much worse.

Please give some guidance!

TJ
tdzark
Drop-in Visitor
 
Posts: 6
Joined: Wed Nov 03, 2004 1:23 am

Re: Non-integer assigned to integer value

Postby Hogyt » Wed Nov 03, 2004 2:58 am

tdzark wrote: I've changed the settings to this:

Embed $dnsbllist "end"
dnsbl.sorbs.net,30,0
dnsbl.njabl.org,30,0
list.dsbl.org,30,0
bl.spamcop.net,50,0
sbl-xbl.spamhaus.org,30,0
end

...since I use strict Bayesian filtering. I guess I need higher values, then.


Yes, probably more like 100 or 200 with the strict settings, but see the screenshot of my filters below, because it depends on how you set things up.

tdzark wrote: When it runs, I get an error box saying "non-integer assigned to integer value".


Can you try v1.14 and see if you still get the error? I made some changes a few days ago and have just uploaded the new script. If the error still occurs, I'll look into it.

tdzark wrote: I've tried removing blank spaces after the numbers just in case, and I can't see anything in there that isn't strictly an integer. Also, I've set the wait time to 1, since I get a lot of emails each day and have a very fast connection.


I'm not sure exactly what triggers the error, but I don't think it's the spam scores. It is more likely another part of the code, but I added more checks in v1.14, so hopefully that has fixed it.

You could also set the filter not to run if the sender is in your address book (see my filters below). I find that speeds things up a lot, since I don't get spam from friends (it could happen, but it hasn't yet!), so there's no point in checking the DNSBLs in that case.

tdzark wrote: My filters: first I sort my emails into their appropriate folders, then the PocoMail Bayesian junk filter runs, and at the very bottom the DNSBL script runs. Is this the best way?


Do the PocoMail junk filter and this script both run on your mail? You can check the headers to see. If a filter moves a message to a new mailbox, that stops the remaining filters from running, which is why I ask. This is how I currently have it set up:

[Screenshot: Hogyt's PocoMail filter list, with the junk-filtering rules highlighted]

I've highlighted the relevant part. The PocoMail junk filter runs first. If it marks the email as spam, it moves it to the junk box and no more processing takes place.

Then, if the sender isn't in my address book, this script runs. If this script marks the message as spam, the score is at least 15. So the next filter checks whether the score is more than 14 (I think 15 would be OK too), and if it is, it moves the email to the junk box and no more processing takes place.

Then I have a million filters that move emails to their correct mailboxes, but only after the junk filters have run.
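The filter order described above can be modelled as a chain in which moving a message ends processing. A Python sketch under stated assumptions: the Bayesian verdict, address-book membership and DNSBL result are passed in as booleans, the script adds 15 to a starting score of 0 when it flags a message, and the threshold filter uses "more than 14". `route_message` and the folder names are hypothetical; PocoMail's own filters are configured in its filter window, not written as code like this.

```python
def route_message(bayes_says_junk: bool,
                  sender_in_address_book: bool,
                  dnsbl_listed: bool,
                  sort_folder: str,
                  base_score: int = 0) -> str:
    """Return the folder a message ends up in under the filter order described.

    Moving a message stops the remaining filters, so each 'return' models
    a move-and-stop.
    """
    # 1. PocoMail's Bayesian junk filter runs first.
    if bayes_says_junk:
        return "Junk"

    score = base_score
    # 2. The DNSBL script only runs for senders not in the address book.
    if not sender_in_address_book and dnsbl_listed:
        score += 15  # a message the script flags scores at least 15

    # 3. 'Junk score more than 14' filter.
    if score > 14:
        return "Junk"

    # 4. The ordinary sorting filters only run once the junk checks have passed.
    return sort_folder


print(route_message(False, False, True, "Lists"))    # listed stranger
print(route_message(False, True, True, "Friends"))   # DNSBL check skipped
```

Because every junk verdict is a move, no separate "stop processing" step is needed, which matches the advice in this thread.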

tdzark wrote: And is there somewhere I should use the stop processing function?


If you set it up the way I have it, you won't need it, because moving an email also stops processing.

tdzark wrote: The Bayesian filter's success rate was horrible, so I reinstalled it, turned off non-Bayesian filtering, set it to strict, and then started teaching it with only new messages (there's plenty of spam to learn from there...).

It started at 90% and has only decreased since, and 90% seems far too high a number anyway: I just downloaded 60 emails and 20 of them were incorrectly left in my inbox. And they were the simplest spam messages, with porn, Viagra, Rolex, etc. in the text.

What's wrong here?
26,418 bad words, 5,019 good.
5% of incoming (153) filtered as junk, 88.66% accuracy (346 missed) and 1.97% false positive (60 missed).

Since the reinstall I've received around 1,300 emails in all. I can't make sense of the numbers above; I feel my results are much worse.

Please give some guidance!

TJ


I have the Bayesian filter set up like this:

[Screenshot: Hogyt's Bayesian junk filter settings]

And the results I'm getting are like this:

[Screenshot: Hogyt's Bayesian filter results and training statistics]

You can see I've taught it a massive amount of spam and ham. Maybe that is the major difference?

I hope you can get it working as you like it!
Mat

Postby tdzark » Thu Nov 04, 2004 2:32 am

Yup :P That worked, no more integer errors on me!

However, I no longer see any messages in my Junk Mail box. It's unlikely that the script has anything to do with it, but it's the only change I knowingly made in the last few days.

I don't trust my Bayesian filter enough yet to let it make all the decisions on what's spam and what's not, so if anyone could help me get my junk back (I didn't think I'd ever ask this question!), I'd be very happy.

Tjalve

Postby Hogyt » Thu Nov 04, 2004 2:48 am

I'm glad the errors are gone. You could try disabling the script for a bit to see if your spam comes back, but from how you described your setup earlier, I didn't see how it would work with the filters in that order. You could try setting them up like the screenshot of my filter window above and see if it works that way.

The other thing to check is the headers of your emails: see whether there is an X-Poco-Spam-DNSBL header and an X-Poco-Score-Detail header.
Mat
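If you save a message's full source to a file, the two headers mentioned above can also be checked outside PocoMail with Python's standard `email` package. A minimal sketch: the header names come from this post, but `poco_headers` is a hypothetical helper and the sample header value is a placeholder, not real PocoMail output. Getting `None` for both headers would suggest the script did not run on that message.

```python
from email import message_from_string
from typing import Dict, Optional

# Header names as given in the thread.
POCO_HEADERS = ("X-Poco-Spam-DNSBL", "X-Poco-Score-Detail")


def poco_headers(raw_message: str) -> Dict[str, Optional[str]]:
    """Return each Poco header's value, or None if the header is absent."""
    msg = message_from_string(raw_message)
    # Header lookup on email.message.Message is case-insensitive.
    return {name: msg.get(name) for name in POCO_HEADERS}


sample = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "X-Poco-Score-Detail: placeholder value\n"
    "\n"
    "Message body.\n"
)
print(poco_headers(sample))
```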

Postby tdzark » Thu Nov 04, 2004 8:31 pm

Heh, I don't know what you guys did to help me, but my email is back, thanks :?

I set it up like your screenshot from the beginning, and it works fine.

But why do you recommend putting the spam filters at the top? My other filters only deal with mail I'm 100% certain I want to get. I just sort those into folders based on email addresses, which is just as reliable as checking against the address book, like you do.

Now the junk filter and then the DNSBL filter have to work through all these emails. Wouldn't it be better to first grab the email I'm sure I want, and then run the junk filter on the rest afterwards?

Or would this affect the learning of the Bayesian filter?

Actually, I'm not sure how this learning works. I know I can teach it bad words by marking emails as junk, but when does the Bayesian filter learn good words? Does it automatically count words as good as long as they're not marked as bad? And if so, will it do so for all my emails or only those that the junk filter runs on? In the latter case I guess the learning will be affected if I move the Bayesian filter to the bottom.

If that's the case, though: as it is now, the DNSBL filter runs after the junk filter, so the junk filter won't be able to use the result from the DNSBL filter, right?

TJ

Postby Hogyt » Thu Nov 04, 2004 11:19 pm

All of my comments below reflect how I believe the filters work, but that doesn't necessarily mean they're correct! ;-)

tdzark wrote: But why do you recommend putting the spam filters at the top? My other filters only deal with mail I'm 100% certain I want to get. I just sort those into folders based on email addresses, which is just as reliable as checking against the address book, like you do.


That's fine the way you describe it. If you are 100% certain you want the mail, there is no need to run the junk filters on it, and skipping them will save some time too (especially this script, which can take a few seconds to run).

In general I don't know in advance whether the emails I receive are ham; e.g. if I get an email from a Yahoo group, I can't move it into a folder first, because then the junk filters would never run on it (since moving an email stops the remaining filters from running). So I always have to have at least some moving filters at the end.

The way I have it set up, I run Bayes on the message, then I only run this DNSBL script if the sender is not in my address book (which saves some time; being in my address book is the only indicator I have that a message is probably ham), and only then, if the message has been classed as ham by both filters, do I move it to its mailbox (otherwise it goes to the junk box).

I could check whether the sender is in my address book right at the start of the filters and move the message immediately, which would be similar to what you're doing and would also work. So in a nutshell, the way you're doing it sounds fine! The only reason I don't do that is that it means having some moving filters at the start and the same filters again at the end (i.e. a more complex setup).

tdzark wrote: Now the junk filter and then the DNSBL filter have to work through all these emails. Wouldn't it be better to first grab the email I'm sure I want, and then run the junk filter on the rest afterwards?

Or would this affect the learning of the Bayesian filter?


That would be fine if you're 100% sure the email is ham; I don't think there is any benefit in running the filters in that case. As I understand it, the Bayesian filter doesn't learn messages as good or bad, or update the scores (the number of times each word has occurred), unless you yourself classify the message. So if you drag the message into or out of the junk mail folder, click the "File as Junk"/"Classify as Good" buttons, or click the Junk/Good buttons in the Junk Mail Filtering window, it will learn the message and update the good/bad words and their scores. Otherwise, all it does is give the message a score based on what it already knows. This is like learning on mistakes only (i.e. you only teach the Bayesian filter when it has made a mistake), which is supposed to be a good method from what I've read.
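The behaviour described above, where scoring a message never changes the word counts and only an explicit classification does, can be illustrated with a toy word-count filter. This is a simplified Python sketch, not PocoMail's implementation: the class name `ToyBayes`, the tokenizer and the scoring rule (a plain average of per-word junk ratios rather than a proper Bayesian combination) are assumptions made for illustration. The structural point is that `score` only reads the counts, while the two `classify_as_*` methods are the only ones that change them.

```python
import re
from collections import Counter
from typing import List


class ToyBayes:
    """Toy word-count junk filter: scoring is read-only, training is explicit."""

    def __init__(self) -> None:
        self.bad: Counter = Counter()   # word -> count from messages classified as junk
        self.good: Counter = Counter()  # word -> count from messages classified as good

    @staticmethod
    def words(text: str) -> List[str]:
        return re.findall(r"[a-z']+", text.lower())

    def score(self, text: str) -> float:
        """Average per-word junk ratio; never modifies the counts."""
        ratios = []
        for w in set(self.words(text)):
            b, g = self.bad[w], self.good[w]  # Counter lookups don't insert keys
            if b + g:
                ratios.append(b / (b + g))
        return sum(ratios) / len(ratios) if ratios else 0.5  # 0.5 = no evidence

    def classify_as_junk(self, text: str) -> None:
        """Loosely like 'File as Junk': the only way bad-word counts grow."""
        self.bad.update(self.words(text))

    def classify_as_good(self, text: str) -> None:
        """Loosely like 'Classify as Good': the only way good-word counts grow."""
        self.good.update(self.words(text))


f = ToyBayes()
f.score("cheap rolex watches")            # scoring alone learns nothing
f.classify_as_junk("cheap rolex watches")
f.classify_as_good("lunch on friday")
print(f.score("cheap rolex"), f.score("lunch friday"))  # prints: 1.0 0.0
```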

tdzark wrote: Actually, I'm not sure how this learning works. I know I can teach it bad words by marking emails as junk, but when does the Bayesian filter learn good words? Does it automatically count words as good as long as they're not marked as bad? And if so, will it do so for all my emails or only those that the junk filter runs on? In the latter case I guess the learning will be affected if I move the Bayesian filter to the bottom.


It learns good words in the same way as it learns bad words, i.e. when you classify a mail as junk or good. It doesn't learn them just by running the filter on your mail, so it doesn't do anything automatically. This means you can put the Bayesian filter at the bottom if you want, but only if you're sure the filters that move mail above it are dealing with ham.

tdzark wrote: If that's the case, though: as it is now, the DNSBL filter runs after the junk filter, so the junk filter won't be able to use the result from the DNSBL filter, right?


That is correct. The Bayesian filter doesn't know what the DNSBL filter has done at this stage. Personally, I don't like the idea of the Bayesian filter's scoring being modified by what the DNSBL filter says. If you would prefer it to work that way, you can reverse the order of the filters, but I'm getting great results as it is, so I don't think it would help.

I hope that explains everything a bit more... :D
Mat

Postby tdzark » Fri Nov 05, 2004 12:26 am

:shock: Uh oh... Houston, I've got a problem!

So if you drag the message in or out of the junk mail folder, if you click on the "File as Junk"/"Classify as Good" buttons or if you click on the Junk/Good buttons in the Junk Mail Filtering window then it will learn the message and update the good/bad words and the scores


Is that true about the dragging? Simply dragging a message from the Junk Mail folder to my Inbox folder will mark it as good?

THAT I didn't know...

I've dragged a lot of messages from Junk to Inbox to test the filtering of new rules I make. I later found that this isn't a necessary method (there are other ways), but I did it quite a lot.

If all of those were marked as good when I did it, that could really help explain why my Bayesian filter is so confused...

If you can confirm this, I will have to delete my good and bad word lists again and start over...

I'm happy to do that though, since I might just get everything right this time :oops: :roll: :P 8)

Postby Hogyt » Fri Nov 05, 2004 12:36 am

Yeah, I'm fairly sure it does that. If you check the number of good and bad words in the Junk Mail Filtering window, then drag a message into or out of the junk box, the word counts change.

I guess that could explain some problems! If you drag an email into the junk box and then back out again, I think it should undo whatever it did. At the moment it doesn't appear to do that.
Mat
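The undo behaviour suggested above (dragging a message into the junk box and straight back out should leave the word counts unchanged) can be sketched by remembering how each message was last trained and reversing that training when the message is dragged the other way. This is a hypothetical Python illustration of the suggested behaviour, not a description of what PocoMail does; `UndoableTrainer`, `drag` and the message ids are invented, and the whitespace tokenizer is a simplification.

```python
from collections import Counter
from typing import Dict


class UndoableTrainer:
    """Word counts where reversing a drag undoes the training it caused."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {"junk": Counter(), "good": Counter()}
        self.trained_as: Dict[str, str] = {}  # message id -> label it was trained with

    def drag(self, msg_id: str, text: str, into_junk: bool) -> None:
        label = "junk" if into_junk else "good"
        tokens = Counter(text.lower().split())
        previous = self.trained_as.get(msg_id)
        if previous is None:
            # First drag for this message: learn it with the new label.
            self.counts[label].update(tokens)
            self.trained_as[msg_id] = label
        elif previous != label:
            # Dragged back the other way: undo the earlier training instead.
            self.counts[previous].subtract(tokens)
            self.counts[previous] = +self.counts[previous]  # drop zero entries
            del self.trained_as[msg_id]
        # previous == label: already trained this way, so do nothing.


t = UndoableTrainer()
t.drag("msg-1", "cheap rolex", into_junk=True)   # junk counts grow
t.drag("msg-1", "cheap rolex", into_junk=False)  # ...and are restored
print(t.counts)
```

A filter that simply learned on every drag would instead end up with "cheap rolex" counted once as junk and once as good, which is the kind of drift the posts above describe.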

Postby tdzark » Fri Nov 05, 2004 1:31 am

Yup, the good and bad word counts changed with drag and drop. OK, word lists deleted once again.

Hm... is there anywhere to download spam messages from? It would be great to have a place where you could enter your email address and then get 1,000-2,000 emails to use for training your filter...

Thanks for all the replies! I'm in learning mode :mrgreen:

TJ

Postby Hogyt » Fri Nov 05, 2004 1:38 am

My pleasure :D

Everyone gets different spam, so it's usually considered not very helpful to get your training spam like that. If you really want it, you could make a spare email account and enter its address into every mailing list you can find ;-)
Mat
