by wortgames » Thu Jun 29, 2006 1:53 pm
Thanks Robin, I think there's definitely some value in it.
One advantage I see is that most important mail (i.e. business mail) would generally have few spelling errors, although the exact rate would no doubt vary depending on the business you are in.
Personal mail is likely to contain a few lazy errors and 'cute' spellings, but I would imagine in a fairly short space of time it would be possible to 'teach' the dictionary and/or develop rules to reduce the false positives. For example, the 'English' dictionary could comprise English and American spellings, common mis-spellings, and modern abbreviations.
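To make the 'teachable' dictionary idea a bit more concrete, here is a minimal sketch in Python. The class name TeachableDictionary, its methods, and the tiny word lists are all made up for illustration; a real filter would load proper word lists from files.

```python
import re


class TeachableDictionary:
    """Hypothetical merged dictionary: several word lists plus learned words."""

    def __init__(self, *word_lists):
        # Merge any number of lists (e.g. English, American, abbreviations).
        self.words = set()
        for word_list in word_lists:
            self.words.update(w.lower() for w in word_list)

    def teach(self, word):
        # Accept a word the user has marked as legitimate.
        self.words.add(word.lower())

    def __contains__(self, word):
        return word.lower() in self.words

    def unknown_words(self, text):
        # Return the words in the text that the dictionary does not know.
        tokens = re.findall(r"[a-z']+", text.lower())
        return [w for w in tokens if w not in self]


english = ["colour", "thanks"]
american = ["color"]
abbreviations = ["btw"]
d = TeachableDictionary(english, american, abbreviations)
```

With this, d.unknown_words("Thanks luv btw") flags only 'luv', and after d.teach("luv") the same text produces no false positives.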
If we were going to get clever about it, we could also implement a 'hot list' of words that spammers try to mis-spell (e.g. viagra, mortgage) so the filter could assign a higher score if it thinks the mis-spelled word is close to a hotlist word. This hotlist could even update itself periodically from, or submit new entries to, a master database.
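A rough sketch of how the hotlist check might work, assuming 'close' means a small edit distance (Levenshtein distance). That choice, the function names, the score values and the distance threshold of 2 are my own assumptions, not a tested design.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Illustrative hotlist taken from the examples in this post.
HOTLIST = {"viagra", "mortgage"}


def hotlist_score(word, dictionary, base=1.0, hot=3.0, max_distance=2):
    """Score one word: 0 if known, 'hot' if near a hotlist word, else 'base'."""
    w = word.lower()
    if w in dictionary:
        return 0.0
    for target in HOTLIST:
        if edit_distance(w, target) <= max_distance:
            return hot
    return base
```

So 'v1agra' and 'mortgaje' would each get the higher hotlist score, while an ordinary typo like 'teh' only gets the base score. The threshold would need tuning, since a loose one will start matching legitimate words.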
Any words containing a number should probably be given a higher score, and perhaps the same mis-spelling appearing more than once should not receive multiple scores (for example a brand name, industry jargon or model number that may be repeated).
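The two rules above could be sketched like this. The function name message_score and the score values are hypothetical, and skipping tokens that are purely digits (dates, prices) is an extra assumption of mine.

```python
import re


def message_score(text, dictionary, base=1.0, digit_bonus=1.0):
    """Sum spelling scores for a message, counting each unknown word once."""
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    seen = set()
    total = 0.0
    for w in tokens:
        if w in dictionary or w in seen:
            continue  # known word, or a mis-spelling already scored
        if w.isdigit():
            continue  # assumption: plain numbers are not mis-spellings
        seen.add(w)
        total += base
        if any(ch.isdigit() for ch in w):
            total += digit_bonus  # letters mixed with digits look more suspicious
    return total


known = {"buy", "cheap", "now", "the", "model", "is", "great"}
```

Here message_score("Buy che4p che4p now", known) comes to 2.0 rather than 4.0, because the repeated 'che4p' is only counted once. A repeated model number such as 'XJ900' is likewise scored a single time, although it still picks up the digit bonus unless it is taught to the dictionary.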
I suspect it might prove quite effective.