Killing comment spam in Textpattern

Jan 25 2005

Although it’s nowhere near the scale that the most popular blogs see, I’ve been having problems with comment spam – mostly of the texas-holdem, online casino and herbal viagra variety – so tonight I decided to do something about it.

Comments are parsed by the comment.php file under textpattern/publish, and the function that we are interested in is saveComment(), around line 277.

It already has some checks in there for empty name, email or comment fields, so it is easy to add a last check against a central list of blacklisted words – just add the following code after the comment_required check:


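// Load the blacklist: one word or phrase per line.
$spam = file("http://www.thewatchmakerproject.com/blacklist.txt");
if ($spam !== false) {
    foreach ($spam as $word) {
        $word = trim($word);
        // Skip blank lines, then do a case-insensitive substring match.
        if ($word !== '' && stristr($message, $word) !== false) {
            exit ( graf('You have used words that are in my spam blacklist - if you are making a legitimate comment, please go <a href="" onClick="history.go(-1)">'.gTxt('back').'</a> and try again.') );
        }
    }
}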

Using the file function, we create a temporary array containing all the words in our spam blacklist, then iterate through it checking to see if any of the words are present in the comment submitted. As soon as one is found, we use the existing Textpattern graf() function to show a message to the commenter and stop processing the comment.
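
The same check can also be pulled out into a small standalone function, which makes it easier to try away from Textpattern. This is only a sketch: the name contains_blacklisted() and the sample entries are made up for illustration and are not part of Textpattern.

```php
<?php
// Hypothetical helper, not part of Textpattern: returns the first
// blacklisted entry found in $message, or false if there is none.
function contains_blacklisted($message, $blacklist)
{
    foreach ($blacklist as $entry) {
        $entry = trim($entry);
        if ($entry === '') {
            continue; // ignore blank lines in the blacklist
        }
        // stripos() does a case-insensitive substring search
        if (stripos($message, $entry) !== false) {
            return $entry;
        }
    }
    return false;
}

// Sample entries, standing in for the lines returned by file()
$blacklist = array("texas-holdem\n", "online casino\n", "\n");

var_dump(contains_blacklisted('Play ONLINE CASINO games now', $blacklist));
var_dump(contains_blacklisted('Nice article, thanks!', $blacklist));
```

The first call returns the matching entry ("online casino") and the second returns false. Returning the matched entry rather than a plain true makes it easy to log which word tripped the filter.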

To add new blacklisted words, all we have to do is add a new line to the text file. Obviously if you use this on your own site, change the file reference to your own blacklist text file.
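
If your blacklist lives on the same server as your site, reading it from a local path avoids an HTTP request every time a comment is submitted (and does not depend on allow_url_fopen being enabled). A rough sketch, assuming a hypothetical load_blacklist() helper and a file path of your own choosing; neither comes from Textpattern:

```php
<?php
// Hypothetical helper: reads a one-entry-per-line blacklist from a local
// file and returns the non-empty entries as an array.
function load_blacklist($path)
{
    if (!is_readable($path)) {
        return array(); // a missing list should not block every comment
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array();
    }
    // Trim stray whitespace and drop entries that end up empty
    $entries = array_filter(array_map('trim', $lines), 'strlen');
    return array_values($entries);
}

// Demo with a temporary file standing in for blacklist.txt
$path = tempnam(sys_get_temp_dir(), 'bl');
file_put_contents($path, "texas-holdem\n\n  online casino  \n");
print_r(load_blacklist($path));
unlink($path);
```

The demo prints an array holding the two cleaned-up entries, "texas-holdem" and "online casino"; the blank line and the surrounding spaces are dropped.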

Filed under: Textpattern.



Comments

fathersGrave
2669 days ago
Should it be ‘stristr’ to be more effective (case-insensitive)?
#1
Matthew Pennell
2668 days ago
Good point, well spotted! Code changed.
#2
Stuart
2668 days ago
You’re getting too good with this stuff. I think this is the first piece of anti-spam coding I’ve seen for TXP. I’m assuming from your last paragraph that the txt file needs to be one word per line? And would it only do single words or could you have phrases in the list? You could use this method for a simple language moderation filter too I suspect.
#3
Matthew Pennell
2668 days ago
Yes – you can see in my blacklist text file how I’ve formatted it. It will handle phrases too (it should really be using preg_match, but I can’t get my head around regular expressions).

You’re right that it could also be used to filter out (or replace) unwanted/rude words as well.
#4
Stuart
2668 days ago
I know this is a silly question but have you mentioned this on the forum?
#5
Matthew Pennell
2667 days ago
Thread here.
#6
Stuart
2667 days ago
Buddy Bradley??
#7
Matthew Pennell
2667 days ago
Yup, that’s me. ;)
#8
Stuart
2667 days ago
I’m not asking!!
#9
Matthew Pennell
2667 days ago
More evidence of my entirely not-suitable-for-a-30-year-old comic book obsession ;)
#10
Stuart
2667 days ago
Mmmmm.
#11
Antispam
2305 days ago
I have noticed that your site is infested with spam robots.
If you are interested, we can offer to
rid your site of them and protect it against further spam entries.

Visit our site at
http://www.eclabs.de/c_antispam.php
#12
stefan
2300 days ago
Great system, I just implemented this as the textpattern-comment-spammers seem to have found my website. Preliminary testing tells me it works. Now I only need to fill my blacklist file (though I am thinking of porting it to a database-system which makes management of the blacklist easier).

Thanks for the great work.
#13