Poco Forums • View topic - Spell check as spam filter?

Spell check as spam filter?

Discussion on Bayesian and standard junk mail filters

Moderators: Eric, Tomas, robin, Michael

Postby wortgames » Tue Jun 27, 2006 1:50 pm

Hi all,

Is there a simple way to spell-check incoming mail in Poco and assign a spam score based on the results?

Most of my legitimate mail tends to be made up of real words, whereas much of my spam uses numbers, spaces and misspellings, presumably to avoid triggering banned-word lists.

Perhaps I could create a white-list of 'known' offences committed by friends (or just edit my own dictionary as appropriate). Or set their address to bypass the check.
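
As a rough illustration of the idea (written in Python, since PocoScript cannot reach the spell checker), the scoring could be sketched as below. The word list, the trusted address and the function name `spelling_score` are all made up for the example; a real filter would load a full dictionary.

```python
import re

# Hypothetical stand-in for a real dictionary (assumption: a full word list
# would be loaded here, plus any personal additions).
DICTIONARY = {"see", "you", "at", "noon", "the", "meeting", "is", "on", "friday"}

# Hypothetical senders whose mail bypasses the check entirely.
TRUSTED_SENDERS = {"friend@example.com"}


def spelling_score(body, sender):
    """Return the fraction of words not found in the dictionary (0.0 to 1.0)."""
    if sender.lower() in TRUSTED_SENDERS:
        return 0.0
    # Letters, digits and apostrophes form a word; everything else separates words.
    words = re.findall(r"[a-z0-9']+", body.lower())
    if not words:
        return 0.0
    unknown = [w for w in words if w not in DICTIONARY]
    return len(unknown) / len(words)


print(spelling_score("See you at noon.", "boss@example.com"))
print(spelling_score("ch3ap m0rtgage at noon", "stranger@example.com"))
print(spelling_score("c u l8r", "friend@example.com"))
```

A message would then be flagged when the score passes some threshold, which would need tuning against real mail.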


Possible? Easy? Difficult?
wortgames
Poco Tourist
 
Posts: 38
Joined: Fri Sep 10, 2004 2:34 am

Postby Michael » Wed Jun 28, 2006 2:46 am

Unfortunately this is something that would require development from Poco Systems; there is no access to the spell-checking engine from PocoScript.
Michael
Moderator
 
Posts: 866
Joined: Mon Jul 26, 2004 12:14 pm
Location: Victoria BC, Canada

Postby robin » Wed Jun 28, 2006 9:00 pm

But it's a nice idea!
robin
 

Postby wortgames » Thu Jun 29, 2006 1:53 pm

Thanks Robin, I think there's definitely some value in it.

One advantage I see is that most important mail (i.e. business mail) would generally have few spelling errors, although the exact rate would no doubt vary depending on the business you are in.

Personal mail is likely to contain a few lazy errors and 'cute' spellings, but I would imagine that in a fairly short space of time it would be possible to 'teach' the dictionary and/or develop rules to reduce the false positives. For example, the 'English' dictionary could comprise English and American spellings, common misspellings, and modern abbreviations.

If we were going to get clever about it, we could also implement a 'hot list' of words that spammers try to misspell (e.g. viagra, mortgage) so the filter could assign a higher score if it thinks the misspelled word is close to a hotlist word. This hotlist could even update periodically / submit itself to a master database.

Any words containing a number should probably be given a higher score, and perhaps the same misspelling appearing more than once should not receive multiple scores (for example a brand name, industry jargon or model number that may be repeated).
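
To make those three rules concrete, here is a minimal Python sketch. The weights, the tiny word list and the 0.8 similarity cutoff are arbitrary assumptions for illustration; the hotlist holds only the two words mentioned above, and "close to a hotlist word" is approximated with the standard library's `difflib`.

```python
import difflib
import re

HOTLIST = ["viagra", "mortgage"]  # the two examples given in the post

# Hypothetical miniature dictionary, for the example only.
DICTIONARY = {"cheap", "rates", "on", "your", "today", "buy", "now", "the", "new"}

# Hypothetical weights; real values would need tuning.
BASE = 1.0           # any word missing from the dictionary
DIGIT_BONUS = 1.0    # extra if the word contains a digit
HOTLIST_BONUS = 3.0  # extra if the word resembles a hotlist word


def misspelling_score(body):
    """Score a message; each distinct unknown word is counted once."""
    # Using a set means a repeated brand name or model number scores only once.
    words = set(re.findall(r"[a-z0-9]+", body.lower()))
    score = 0.0
    for word in words:
        if word in DICTIONARY:
            continue
        score += BASE
        if any(ch.isdigit() for ch in word):
            score += DIGIT_BONUS
        # get_close_matches returns a non-empty list when a hotlist word is similar enough.
        if difflib.get_close_matches(word, HOTLIST, n=1, cutoff=0.8):
            score += HOTLIST_BONUS
    return score


print(misspelling_score("cheap rates today"))
print(misspelling_score("buy v1agra now"))
print(misspelling_score("buy v1agra v1agra v1agra now"))
print(misspelling_score("the new xr500 xr500"))
```

Note that a correctly spelled hotlist word also scores here, because it is absent from the sample dictionary, so this behaves like a banned-word list that tolerates near misses.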

I suspect it might prove quite effective.
wortgames
Poco Tourist
 
Posts: 38
Joined: Fri Sep 10, 2004 2:34 am

Postby Slaven » Mon Jul 03, 2006 10:26 am

Thanks, that is a neat idea. There may be some obstacles in terms of computer resources: there's already quite a lot of activity whenever a single message is checked for spam, so initiating the spell-check engine may slow down the process. I'll add it to our wishlist! :)
Slaven Radic
Poco Systems Inc
Slaven
Poco Systems Inc
 
Posts: 1644
Joined: Fri Jul 23, 2004 7:37 pm

Postby Maximus » Wed Jul 05, 2006 8:45 am

Hi, nice idea... but don't forget that people like me (and many others in Europe) receive messages in various languages. We should be able to select more than just one dictionary (e.g. German, French and English). Sometimes, even a single email message comprises several languages, e.g. a lot of admin messages from airlines, ISPs, etc. contain the text in up to 5 languages. I am living in Switzerland (4 national languages & English), but I guess this is also valid for other nations like Canada (French and English) as well.
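
A minimal sketch of that multi-dictionary behaviour, in Python: a word counts as known if it appears in any of the selected dictionaries, so a message mixing languages is not penalised. The three miniature word lists and the name `unknown_words` are invented for the example.

```python
import re

# Hypothetical miniature word lists; real ones would come from full dictionaries.
DICTIONARIES = {
    "en": {"your", "flight", "is", "cancelled"},
    "de": {"ihr", "flug", "ist", "annulliert"},
    "fr": {"votre", "vol", "est", "complet"},
}


def unknown_words(body, languages):
    """Return the words found in none of the selected dictionaries."""
    # Merge the chosen dictionaries so membership in any one of them is enough.
    known = set().union(*(DICTIONARIES[lang] for lang in languages))
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", body.lower())
    return [w for w in words if w not in known]


message = "Your flight is cancelled. Ihr Flug ist annulliert."
print(unknown_words(message, ["en"]))
print(unknown_words(message, ["en", "de"]))
```

With only English selected, the German half of the message looks like four misspellings; with both selected, nothing is flagged.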

Cheers
Adi
Maximus
Resident Poster
 
Posts: 169
Joined: Fri Aug 13, 2004 8:03 pm
Location: Zürich, Switzerland

