
Spam filter [PERL]

Name: Anonymous 2010-05-04 6:51

This is driving me insane! I've been trying to make a working spam filter for a long time, but I still can't accomplish this simple task!

my @SPAM; # Spam check
tie @SPAM, 'Tie::File', BOARD_SPAM, memory => 20000000, mode => O_RDONLY;
(tied @SPAM)->flock(LOCK_SH) if -e BOARD_SPAM;
for (@SPAM) {
    /^(.*)$/;
    Abort('Comment contains a blacklisted item') if $comment =~ /$1/gi;
}
untie @SPAM;


BOARD_SPAM points to the spam.txt file. It contains two items:
test.1
test;2


So whenever the comment contains either of those two lines anywhere, the script is supposed to abort with an error.

However, this does not happen!

What happens is this: if you have test.1 in the comment, the script does NOT abort, even though it should.

But if you have test;2 in the comment, it works perfectly and aborts. It only seems to filter the LAST line of spam.txt.

I do not understand why it does this. for (@SPAM) means it should go through every single line, not just the last line, right?!
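A guess worth testing, since nothing in the post says how spam.txt was saved: if it has Windows (CRLF) line endings and the server runs Unix, Tie::File's default record separator is "\n", so autochomp strips the "\n" but leaves a "\r" on every line except the unterminated last one. /^(.*)$/ then captures that "\r" too (. matches it), and "test.1\r" never appears in the comment. The loop does visit every line. The sketch below writes such a file to a temporary path (File::Temp stands in for the real BOARD_SPAM) and prints each record with the "\r" made visible.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use File::Temp 'tempfile';
use Tie::File;

# Assumed contents of spam.txt: CRLF line endings, no newline after the last line.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "test.1\r\ntest;2";
close $fh or die "close: $!";

my @SPAM;
tie @SPAM, 'Tie::File', $path, mode => O_RDONLY
    or die "Cannot tie $path: $!";
my @patterns = @SPAM;    # copy the records out before untying
untie @SPAM;

for my $pattern (@patterns) {
    (my $visible = $pattern) =~ s/\r/\\r/g;   # show a CR as a literal \r
    print "[$visible]\n";
}
```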

Name: Anonymous 2010-05-04 7:01

Why not just use JS?

Name: Anonymous 2010-05-04 7:12

if(open(my $SPAM, '<', BOARD_SPAM))
{ flock($SPAM, LOCK_SH);
  for(<$SPAM>)
  { Abort('Comment contains a blacklisted item') if $comment =~ /$_/gix; }
  close($SPAM); }

Name: Anonymous 2010-05-04 7:20

>>3
Does nothing. Still fails to filter the first line.

Name: Anonymous 2010-05-04 7:32

>>4
you obviously didn't try it.

Name: Anonymous 2010-05-04 7:35

>>4
FILTER MY ANUS

Name: Anonymous 2010-05-04 7:53

>>5
Yes I did. I've been trying so long to get it to work. I wouldn't pass on the opportunity to finally find a solution.

Name: Anonymous 2010-05-04 8:13

>>7
$ cat test.pl
#!/usr/bin/perl

use constant BOARD_SPAM => 'spam.txt';

sub Abort($) { die shift; }

my $comment = 'test.1';

if(open(my $SPAM, '<', BOARD_SPAM))
{ flock($SPAM, LOCK_SH);
  for(<$SPAM>)
  { Abort('Comment contains blacklisted item') if $comment =~ /$_/gix; }
  close($SPAM); }
$ cat spam.txt
test.1
test;2
$ perl test.pl
Comment contains blacklisted item at test.pl line 5, <$SPAM> line 2.
$

Name: Anonymous 2010-05-04 9:04

I tried this on my web server, but the filter still doesn't trigger... Do you think this has something to do with the server itself rather than the code?

Looking at your code, though, it doesn't do anything different from the one I posted, other than not using Tie::File.

Name: Anonymous 2010-05-04 9:20

>>9
it also leaves out the extra regex obfuscation and adds an x (the /x modifier) onto the end of that one match.
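That x is probably what matters. <$SPAM> hands back each line with its "\n" still attached, and under /x unescaped whitespace in the pattern is ignored, including whitespace that arrives through interpolation. A minimal sketch with the comment and line hard-coded (the /g from the thread is dropped here, since it does nothing useful in a plain true/false test):

```perl
use strict;
use warnings;

my $comment = 'test.1';
my $line    = "test.1\n";   # what <$SPAM> returns: the newline is still attached

# Without /x the pattern demands a newline after "test.1", so it fails.
my $plain = ($comment =~ /$line/i) ? 1 : 0;

# With /x the interpolated "\n" is treated as ignorable whitespace,
# so the same line now matches.
my $extended = ($comment =~ /$line/ix) ? 1 : 0;

print "without /x: $plain, with /x: $extended\n";
```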

Name: Anonymous 2010-05-04 9:46

for(email in inbox) { if(spam(email)) { move_to(email,spam_folder); } }

Name: Anonymous 2010-05-04 9:58

>>11
for(blacklisted items) { if(item found in comment) { abort("piss off!") } }

Name: Anonymous 2010-05-04 10:44

How do you know it's filtering only the last line if there are only two lines in the file?
Also, . is a special character: in a regex it matches any character except a newline.
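To illustrate with made-up strings: unquoted, the dot in test.1 matches any character, which also means it happily matches a real dot. Quoting it with \Q...\E makes it literal.

```perl
use strict;
use warnings;

my $item = 'test.1';

# As a regex, "." matches any character except newline...
my $loose = ('testX1' =~ /$item/) ? 1 : 0;       # "X" satisfies "."
my $exact = ('test.1' =~ /$item/) ? 1 : 0;       # so does a literal "."

# ...so quote the item when you mean a literal dot.
my $quoted = ('testX1' =~ /\Q$item\E/) ? 1 : 0;  # "\." only matches "."

print "$loose $exact $quoted\n";
```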

Name: Anonymous 2010-05-04 11:05

/^(.*?)\r?\n?$/ or next;
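If spam.txt has CRLF line endings (an assumption, but it would explain why only the unterminated last line works), the greediness of the capture decides whether the "\r" ends up in $1. A sketch comparing greedy and non-greedy captures on a hard-coded line, plus a plain substitution that sidesteps the question:

```perl
use strict;
use warnings;

my $line = "test.1\r";   # assumed: a CRLF line after Tie::File removed the "\n"

# Greedy: ".*" also swallows the "\r", and the optional "\r?" then matches nothing.
my ($greedy) = $line =~ /^(.*)\r?\n?$/;

# Non-greedy: ".*?" stops as soon as the rest of the pattern can match.
my ($lazy) = $line =~ /^(.*?)\r?\n?$/;

# Simpler still: strip any trailing CR/LF characters.
(my $stripped = $line) =~ s/[\r\n]+\z//;
```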

Name: Anonymous 2010-05-04 11:08

>>14 continued

Also, instead of posting on random forums about this, you should have tried to debug the problem yourself. A simple print "[$1] [$comment]\n" would have shown you exactly what's wrong with what you're doing.
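One caveat with that print: a raw "\r" sends the cursor back to the start of the line, so it can hide itself in a terminal. A sketch of a hypothetical show() helper that spells out CR and LF before printing (the patterns are hard-coded stand-ins for lines read from a CRLF spam.txt):

```perl
use strict;
use warnings;

# Hypothetical debugging helper: wraps a string in brackets and spells out
# CR and LF so they can't hide.
sub show {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    return "[$s]";
}

my $comment = 'test.1';
# Hard-coded stand-ins for lines read from a CRLF spam.txt.
for my $pattern ("test.1\r", "test;2") {
    print show($pattern), ' ', show($comment), "\n";
}
```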

And before some smart asses start accusing Perl of being too unpredictable: C's fgets behaves the same way. The last line it returns has no newline if the file doesn't end with one.
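A quick way to see this in Perl without creating a file, using an in-memory filehandle as a stand-in for spam.txt:

```perl
use strict;
use warnings;

# In-memory stand-in for spam.txt: the last line has no trailing newline.
open(my $fh, '<', \"test.1\ntest;2") or die "open: $!";
my @lines = <$fh>;
close $fh;

# Every line except the last keeps its "\n". chomp removes $/ where present,
# but a "\r" from a CRLF file would still survive it.
my @raw = @lines;
chomp @lines;
```

This is also why test;2 behaved: it was the only line with nothing trailing it.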

Name: Anonymous 2010-05-04 11:12

>>13

Thanks for your comment!

I figured it out. Apparently, you cannot use spaces in the spam.txt file. Instead of "blacklisted item" you have to use "blacklisted\sitem".

I don't understand why it works this way, but it does!
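A likely explanation, assuming the /$_/gix line from earlier in the thread is what's running: the x modifier tells Perl to ignore unescaped whitespace in the pattern, so a spam.txt line like "blacklisted item" is compiled as /blacklisteditem/. \s is an escape rather than literal whitespace, so /x leaves it alone. The comment text below is made up for illustration.

```perl
use strict;
use warnings;

my $comment = 'this is a blacklisted item';
my $entry   = 'blacklisted item';   # a spam.txt line containing a space

# Under /x the space in the pattern is ignored: it becomes /blacklisteditem/.
my $with_x = ($comment =~ /$entry/ix) ? 1 : 0;

# \s is not whitespace in the pattern source, so /x keeps it.
my $with_s = ($comment =~ /blacklisted\sitem/ix) ? 1 : 0;

# Dropping /x (and removing line endings yourself) lets the plain space work.
my $without_x = ($comment =~ /$entry/i) ? 1 : 0;

print "$with_x $with_s $without_x\n";
```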

Name: Anonymous 2010-05-04 11:15

>>16
Also, even though . is a special character in regex, it still filtered out test.1.

Which makes things even weirder. Space is not acceptable but . is?

Name: Anonymous 2010-05-04 11:39

>>16
Each line in the spam file is treated as an actual regex. You should begin the regex with \Q (optionally ending in \E), to 'quote' the search string (make special characters literal versions of themselves).
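A small illustration of what \Q...\E does, using a made-up entry that contains both a dot and a space. \Q$var\E inside the pattern is equivalent to interpolating quotemeta($var):

```perl
use strict;
use warnings;

my $item = 'test.1 x';   # made-up spam.txt entry with a dot and a space

# quotemeta() backslashes every non-word character, so "." and " " become literal.
my $quoted = quotemeta $item;

# \Q...\E in the pattern does the same thing when the pattern is interpolated.
my $re = qr/\Q$item\E/i;

my $hit  = ('TEST.1 X here' =~ $re) ? 1 : 0;   # literal, case-insensitive match
my $miss = ('testX1 x'      =~ $re) ? 1 : 0;   # the dot no longer matches "X"
print "quoted: $quoted, hit: $hit, miss: $miss\n";
```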

This example helps explain it better: http://perldoc.perl.org/perlfaq6.html#How-can-I-quote-a-variable-to-use-in-a-regex?

Name: Anonymous 2010-05-04 11:41

>>18
fuckin' URLs

Name: Anonymous 2010-05-04 11:45

>>11
I just realized what an innovative feature multi-part function names would be:
move(email)to(spam_folder);

Name: Anonymous 2010-05-04 11:50

>>18
But a simple space is not a regex special symbol. There's no apparent reason why I need to use \s instead.

And thanks for the \Q tip; that works too.

Name: Anonymous 2010-05-04 11:54

>>21
I spoke too soon! \Q doesn't work, either by using it in the spam.txt file or /\Q$_/
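Both failures have ordinary explanations. \Q is applied during the double-quote-style interpolation of the pattern in the script itself, so typing \Q into spam.txt just adds two characters of data. And /\Q$_/ on a line read with <$SPAM> quotes the trailing "\n" too, so the pattern then demands a newline after the text. A sketch of the second case, with the line hard-coded:

```perl
use strict;
use warnings;

my $comment = 'test.1';
my $line    = "test.1\n";   # as returned by <$SPAM>, newline included

# quotemeta also escapes the "\n", so the pattern requires a literal newline.
my $unchomped = ($comment =~ /\Q$line\E/) ? 1 : 0;

# Strip the line ending (LF or CRLF) first and \Q works as expected.
(my $clean = $line) =~ s/[\r\n]+\z//;
my $chomped = ($comment =~ /\Q$clean\E/) ? 1 : 0;

print "unchomped: $unchomped, chomped: $chomped\n";
```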

Name: Anonymous 2010-05-04 12:26

>>22
This is silly.  Why are you using regex matching if you don't want regex matching?

Name: Anonymous 2010-05-04 12:34

Seriously, don't use regexps if all you want is an exact match of any line of the file.
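A sketch of that approach with index() instead of a regex. find_blacklisted is a hypothetical helper, not something from the thread. It lowercases both sides to mimic /i, strips LF or CRLF endings, and skips blank lines, because index() with an empty needle returns 0 and would flag every comment.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first blacklisted entry found in $comment
# (case-insensitive plain substring match, no regexes), or undef if none.
sub find_blacklisted {
    my ($comment, @entries) = @_;
    my $haystack = lc $comment;
    for my $entry (@entries) {
        (my $needle = $entry) =~ s/[\r\n]+\z//;   # drop LF or CRLF line endings
        next if $needle eq '';                    # a blank line would match everything
        return $needle if index($haystack, lc $needle) >= 0;
    }
    return;
}
```

In the original script that would be something like Abort('Comment contains a blacklisted item') if defined find_blacklisted($comment, <$SPAM>); with $SPAM opened as in the earlier posts.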

Name: Anonymous 2010-05-04 13:10

>>20
Congratulations, you invented Smalltalk/Objective-C selectors.


Name: North Face Pink Ribbon 2012-09-27 21:29

Name: Anonymous 2012-09-27 21:39

>>26-30
wat

Name: Anonymous 2012-09-27 21:58

hacker jews hijacked this thread
