Killing comment spam in Textpattern
Although it’s nothing like the scale that the most popular blogs see, I’ve been having problems with comment spam (mostly of the texas-holdem, online casino and herbal viagra variety), so tonight I decided to do something about it.
Comments are parsed by the comment.php file under textpattern/publish, and the function that we are interested in is saveComment(), around line 277.
It already checks for an empty name, email or comment field, so it is easy to add one last check against a central list of blacklisted words. Just add the following code after the comment_required check:
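$spam = file("http://www.thewatchmakerproject.com/blacklist.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($spam !== false) { // file() returns false if the blacklist can't be read
    foreach ($spam as $word) {
        $word = trim($word); // strip spaces and stray \r characters
        // skip blank entries: in PHP 8+ an empty needle would match every comment
        if ($word !== '' && stristr($message, $word) !== false) {
            exit(graf('You have used words that are in my spam blacklist - if you are making a legitimate comment, please go <a href="" onClick="history.go(-1)">'.gTxt('back').'</a> and try again.'));
        }
    }
}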
Using PHP’s file() function, we read the spam blacklist into a temporary array (one entry per line), then loop through it, checking whether any entry appears in the submitted comment; stristr() makes the check case-insensitive. As soon as one is found, we use Textpattern’s existing graf() function to show a message to the commenter and exit, so the comment is never saved.
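The matching step can be pulled out into a small function and tried on its own, outside Textpattern. This is a minimal sketch rather than code from the post: `is_blacklisted()` is a hypothetical helper name, and the word list is passed in as an array instead of being fetched with file().

```php
<?php
// Hypothetical helper (not part of Textpattern): returns true if any
// blacklist entry appears anywhere in $message, ignoring case.
function is_blacklisted(string $message, array $blacklist): bool
{
    foreach ($blacklist as $entry) {
        $entry = trim($entry);          // drop stray spaces and "\r\n" line endings
        if ($entry === '') {
            continue;                   // blank lines must not match everything
        }
        if (stristr($message, $entry) !== false) {
            return true;                // stop at the first hit
        }
    }
    return false;
}

$blacklist = ["texas-holdem\n", "online casino\r\n", "\n", "herbal viagra"];
var_dump(is_blacklisted("Try our ONLINE CASINO today", $blacklist)); // bool(true)
var_dump(is_blacklisted("Nice article, thanks!", $blacklist));       // bool(false)
```

With a helper like this, the edit inside saveComment() shrinks to a single `if (is_blacklisted($message, $spam))` test around the existing exit() call.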
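To add a new blacklisted word or phrase, all we have to do is add a new line to the text file. Obviously, if you use this on your own site, change the file reference to your own blacklist text file. Fetching a URL with file() requires PHP’s allow_url_fopen setting to be enabled; a local file path works too and avoids an HTTP request for every comment.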
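Because each line of the text file is one entry, turning the file’s contents into a clean list is mostly a matter of splitting on line breaks and discarding blanks. A minimal sketch, assuming a plain-text file with one word or phrase per line; `parse_blacklist()` is a hypothetical name, and the de-duplication and lower-casing are extra tidying, not something the original code does.

```php
<?php
// Hypothetical helper (not Textpattern code): turn the raw text of a
// blacklist file, one word or phrase per line, into a clean array.
function parse_blacklist(string $text): array
{
    // Split on any line-ending style: \r\n (Windows), \n (Unix) or \r.
    $lines = preg_split('/\r\n|\n|\r/', $text);
    $entries = [];
    foreach ($lines as $line) {
        // Trim surrounding whitespace and lower-case the entry; the
        // stristr() check is case-insensitive, so "Casino" and "casino"
        // are the same entry.
        $line = strtolower(trim($line));
        if ($line !== '') {             // ignore blank lines
            $entries[] = $line;
        }
    }
    // Drop duplicates and renumber the keys from 0.
    return array_values(array_unique($entries));
}

// Example contents with mixed line endings, a blank line and a duplicate.
$raw = "texas-holdem\r\nonline casino\n\nherbal viagra\nOnline Casino\n";
print_r(parse_blacklist($raw)); // three entries: texas-holdem, online casino, herbal viagra
```

On a live site the text would come from something like `file_get_contents('/path/to/blacklist.txt')` (a placeholder path), which returns false when the file can’t be read, so check for that before parsing.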
Filed under: Textpattern.
Comments
- fathersGrave
- 2669 days ago
- Should it be ‘stristr’ to be more effective (case-insensitive)?
- #1
- Matthew Pennell
- 2668 days ago
- Good point, well spotted! Code changed.
- #2
- Stuart
- 2668 days ago
- You’re getting too good with this stuff. I think this is the first piece of anti-spam coding I’ve seen for TXP. I’m assuming from your last paragraph that the txt file needs to be one word per line? And would it only do single words or could you have phrases in the list? You could use this method for a simple language moderation filter too I suspect.
- #3
- Matthew Pennell
- 2668 days ago
- Yes, you can see how I’ve formatted it in my blacklist text file. It will handle phrases too (it should really be using preg_match, but I can’t really get my head around regular expressions).
You’re right that it could also be used to filter out (or replace) unwanted or rude words.
- #4
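Picking up the preg_match idea from the reply above: the stristr() check matches raw substrings, so a blacklist entry such as “cialis” also catches the innocent word “specialist”. The sketch below is one possible regex version, not code from this post; `blacklist_pattern()` and `mask_blacklisted()` are hypothetical names. It wraps the entries in word boundaries and also shows the “replace” idea from the same comment.

```php
<?php
// Sketch of the preg_match approach: match blacklist entries as whole
// words or phrases rather than as raw substrings.
function blacklist_pattern(array $entries): string
{
    $quoted = [];
    foreach ($entries as $entry) {
        $entry = trim($entry);
        if ($entry !== '') {
            // preg_quote() escapes regex metacharacters and the "/" delimiter.
            $quoted[] = preg_quote($entry, '/');
        }
    }
    if ($quoted === []) {
        return '/(?!)/'; // empty blacklist: a pattern that can never match
    }
    // \b anchors each alternative at word boundaries; the "i" flag ignores case.
    return '/\b(?:' . implode('|', $quoted) . ')\b/i';
}

// Replace every blacklisted word or phrase with asterisks instead of
// rejecting the whole comment (one asterisk per byte, so ASCII words
// keep their length).
function mask_blacklisted(string $message, string $pattern): string
{
    return preg_replace_callback($pattern, function (array $m): string {
        return str_repeat('*', strlen($m[0]));
    }, $message);
}

$pattern = blacklist_pattern(['cialis', 'online casino', 'texas-holdem']);

var_dump(stristr('Ask a specialist', 'cialis') !== false); // bool(true): substring false positive
var_dump(preg_match($pattern, 'Ask a specialist') === 1);  // bool(false)
var_dump(preg_match($pattern, 'Cheap CIALIS here') === 1); // bool(true)
echo mask_blacklisted('Visit our Online Casino now', $pattern), "\n";
```

The trade-off is that word boundaries make matching stricter: “casino” no longer matches “casinos”, so plurals need their own entries. The pattern also assumes each entry starts and ends with a letter or digit, because `\b` next to punctuation behaves differently.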
- Matthew Pennell
- 2667 days ago
- Thread here.
- #6
- Matthew Pennell
- 2667 days ago
- Yup, that’s me. ;)
- #8
- Matthew Pennell
- 2667 days ago
- More evidence of my entirely not-suitable-for-a-30-year-old comic book obsession ;)
- #10
- stefan
- 2300 days ago
- Great system. I just implemented this, as the Textpattern comment spammers seem to have found my website. Preliminary testing tells me it works. Now I only need to fill my blacklist file (though I am thinking of porting it to a database system, which would make managing the blacklist easier).
Thanks for the great work.
- #13