Poco Forums • View topic - Grep on Junk Word Lists

Grep on Junk Word Lists

Discussion not related specifically to one of the topics below

Moderators: Eric, Tomas, robin

Postby Mr_Palmer » Mon Aug 23, 2004 10:19 am

I know Poco doesn't have this at the moment, but I'd really like to be able to use GREP pattern matching in the Junk word lists.

Currently, if I want to stop all the combinations of the word viagra, I'd need a humongous list, but a single grep pattern would catch 99% of them.
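To illustrate the idea, here is a minimal sketch in Python (not Poco's own scripting language) of one pattern standing in for a long list of spelling variants. The substitution sets and the separator rule are assumptions chosen for the example, not a tested spam rule, and a pattern this loose could produce false positives.

```python
import re

# Hypothetical pattern: each letter allows a few common look-alike
# substitutions, and up to two non-letter characters may sit between letters.
LETTERS = ["v", "[i1!|]", "[a@4]", "g", "r", "[a@4]"]
SEPARATOR = r"[\W_]{0,2}"
VIAGRA = re.compile(SEPARATOR.join(LETTERS), re.IGNORECASE)


def mentions_viagra(text):
    """Return True if any obfuscated spelling covered by the pattern appears."""
    return VIAGRA.search(text) is not None
```

With this sketch, spellings such as "V1agra" and "v.i.a.g.r.a" are caught by the same pattern, where a plain word list would need one entry per variant.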

A vast number of spam messages include web links to sites with .info or .biz domains. Using grep, it's possible to flag these pesky messages.
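A rough sketch of such a link check, again in Python. The pattern is an assumption for illustration: it only looks at http and https links and only at the host part, so a path that happens to contain ".biz" does not count.

```python
import re

# Match http(s) links whose host name ends in .info or .biz.
# The two lookaheads stop the match from firing on longer names such as
# "information.org" or "site.info.com".
JUNK_TLD_LINK = re.compile(
    r"https?://[^\s/<>]+\.(?:info|biz)(?![\w-])(?!\.\w)",
    re.IGNORECASE,
)


def has_junk_tld_link(text):
    """Return True if the text links to a host in the .info or .biz domains."""
    return JUNK_TLD_LINK.search(text) is not None
```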

Just hope they can add this into a future release.

Mr_P
Mr_Palmer
Drop-in Visitor
 
Posts: 7
Joined: Sat Aug 21, 2004 1:51 am

Postby frazmi » Mon Aug 23, 2004 11:03 am

You might consider using a script to pass the subject and the body of a message to an external grep program. If there's a way to modify the passed file, then Poco could test for some flag indicating spam. A crude version of the script might look like this:

Code:
Random #R1 999999            {Getting part of a 'random' file name
Set $TempFile $temppath      {Windows temporary folder
AddStrings $TempFile "PocoTemp"  #R1 ".tmp"

Set $SpamFlag "[[[SPAM]]]"
Set $GrepCommand "GrepCommand"
AddStrings $GrepCommand " /" $TempFile
{Just guessing at how to construct the grep command.
{GrepCommand would find the spam content in $TempFile and if found
{then would insert a flag (SpamFlag) at the start of the file.
Set $GrepProgram "GrepProgram"
{$GrepProgram might have to be a batch program, which then
{calls the grep program, depending on how grep accepts command line arguments

ReadHeader $Subject "Subject" %message
ReadBody $Body %message
AddStrings $Body " " $Subject
SaveBody $Body $TempFile

ExecuteAndWait $GrepProgram $GrepCommand
{If the grep program can return a "return code" then the script
{might capture and test it. See script language documentation.

{Assuming return code does not work...
OpenBody $NewBody $TempFile
StringPos #Spam $SpamFlag $NewBody

IF #Spam = 0 THEN Done
      AddStrings $Subject $SpamFlag
      DeleteHeader "Subject" %message
      AddHeader %message "Subject" $Subject

:Done
EXIT

The script could be set to run on incoming messages. Please note, this script is not tested at all -- I don't have grep on my system. So it might not work.
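The return-code idea mentioned in the script comments can be sketched outside Poco as well. The Python below is only an illustration under stated assumptions: the function name and the command passed in are hypothetical, and it relies on the convention that grep itself follows, where exit status 0 means a match, 1 means no match, and anything higher means an error.

```python
import subprocess
from typing import Sequence


def is_spam(message_text: str, matcher_cmd: Sequence[str]) -> bool:
    """Feed the message to an external matcher on stdin and read its exit status."""
    result = subprocess.run(
        list(matcher_cmd),
        input=message_text,
        text=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:   # grep convention: at least one line matched
        return True
    if result.returncode == 1:   # grep convention: nothing matched
        return False
    raise RuntimeError(f"matcher failed with exit status {result.returncode}")
```

With grep installed, a call might look like is_spam(text, ["grep", "-qiE", "-f", "patterns.txt"]), where patterns.txt is a hypothetical file holding one pattern per line. This avoids writing a flag back into the temporary file.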
frazmi
Poco Enthusiast
 
Posts: 248
Joined: Tue Jul 27, 2004 1:27 am
Location: South Korea

Postby SFCurley » Mon Aug 23, 2004 11:07 am

I might also suggest trying out Poco's Bayesian filters. In pretty short order, they'll pin down pretty much anything that looks like spam, with impressive accuracy in my experience, using a learned "intelligence" that will be hard to match with even good regular expression matching.
SFCurley
 

Postby Mr_Palmer » Mon Aug 23, 2004 12:29 pm

I'm already using Bayesian filters, and they've made a big improvement.

But there's still a number of messages that go out of their way to fool even the Bayesian filters by adding a bunch of random words, or by having minimal text and then using a GIF and a web link to do the rest.

Grep patterns can pick out this sort of sneaky behavior.
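As a rough illustration of the "minimal text plus image and link" case, here is a Python sketch. It goes slightly beyond a single grep pattern because it also counts the visible words, and the ten-word limit is an arbitrary assumption, not a tuned value.

```python
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
LINK_TAG = re.compile(r"<a\b[^>]*\bhref\s*=", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def looks_like_image_spam(html_body, max_words=10):
    """Flag bodies that carry an image and a link but almost no visible text."""
    visible_text = ANY_TAG.sub(" ", html_body)   # crude tag stripping
    word_count = len(visible_text.split())
    has_image = IMG_TAG.search(html_body) is not None
    has_link = LINK_TAG.search(html_body) is not None
    return has_image and has_link and word_count <= max_words
```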
Mr_Palmer
Drop-in Visitor
 
Posts: 7
Joined: Sat Aug 21, 2004 1:51 am

Postby Mr_Palmer » Mon Aug 23, 2004 7:45 pm

I have used a batch file for scanning emails in the past. It did trap a large percentage of the rubbish, but the problem is that any match found on the list of patterns was considered spam, which can lead to too many false positives.

If grep were built into the word lists, you could assign a value to each pattern, which would make it much more powerful.
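The weighted approach could look something like the Python sketch below. The patterns, the weights, and the threshold are all made up for the example; the point is only that a single weak match no longer condemns a message on its own.

```python
import re

# Hypothetical (pattern, weight) pairs; a message is junk only when the
# weights of all matching patterns add up to the threshold or more.
WEIGHTED_PATTERNS = [
    (re.compile(r"v[i1!][a@]gra", re.IGNORECASE), 5),
    (re.compile(r"https?://\S+\.(?:info|biz)\b", re.IGNORECASE), 3),
    (re.compile(r"\bclick here\b", re.IGNORECASE), 1),
]
THRESHOLD = 5


def junk_score(text):
    """Sum the weights of every pattern that matches at least once."""
    return sum(weight for pattern, weight in WEIGHTED_PATTERNS if pattern.search(text))


def is_junk(text, threshold=THRESHOLD):
    return junk_score(text) >= threshold
```

A message that only says "click here" scores 1 and passes, while one that combines an obfuscated drug name with a .biz link scores 8 and is flagged.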
Mr_Palmer
Drop-in Visitor
 
Posts: 7
Joined: Sat Aug 21, 2004 1:51 am

