Another Approach at Comment Spam

Yet another WordPress experiment, and this time it’s all about fighting spam! We all know and love Akismet, but most of you will agree with me that some spammish comments still get through. So how do we get rid of them? We can’t block all comments with links, because they might be useful. We can’t block all comments containing spammish words either, because those words might be the topic of your blog.

What we can do though is get a clearer picture of who’s submitting the comment — human or robot? Captchas work, but we hate them, and our visitors hate them even more. E-mail verification works, but it’s a pain having to go back and forth only to write a “thank you” message. So I was thinking last week..

What do spam comments have in common? Well, most of the time they contain links, they have a junky e-mail address, sometimes a spammish name, and most of the time no avatar. But exploring all the spam that was stopped and the spam that came through on my blog, I also noticed that spam comments tend to fill in all fields and never miss one. They sometimes even subscribe to the comments by e-mail, which of course they will never receive.

So having that in mind, and looking from the spambot perspective, what do we see? We see a form with a bunch of fields and a submit button. We don’t see the visual side of it, we only see the HTML that’s behind it. So what’s the typical logic of a spam bot? Fill in all the fields and hit the submit button. There might be more intelligent bots out there, but it basically comes down to this.

Now, what if there was another field, a fake one? A field that you don’t have to fill in, but with a rather tempting name, say “website”, labeled “Website URL”. I don’t think any spambot would want to miss that, right? The secret sauce is that the field is wrapped in a parent element which is invisible. Can spambots render CSS and determine whether a field is visible or not? I doubt it, but they could. So give them an extra tempting field to fill in with their spammish URLs, and on the back end check whether the field was filled in or not.
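To make the idea concrete, here is a minimal sketch in Python rather than WordPress code. It is not the snippet I actually ran; the form markup, the `/comments` action, and the function names `render_comment_form` and `is_probably_spam` are all assumptions made up for illustration.

```python
from html import escape

# Hypothetical name for the decoy field; "website" is the tempting
# example used in the post.
HONEYPOT_FIELD = "website"


def render_comment_form() -> str:
    """Return comment form HTML with a decoy field inside a hidden wrapper."""
    return (
        '<form method="post" action="/comments">\n'
        '  <input type="text" name="author">\n'
        '  <input type="email" name="email">\n'
        '  <textarea name="comment"></textarea>\n'
        # Humans never see this wrapper, so they leave the decoy empty.
        '  <div style="display:none">\n'
        f'    <label>Website URL <input type="text" name="{escape(HONEYPOT_FIELD)}"></label>\n'
        '  </div>\n'
        '  <button type="submit">Post comment</button>\n'
        '</form>'
    )


def is_probably_spam(form: dict) -> bool:
    """Flag the submission when the decoy field carries a non-blank value."""
    return form.get(HONEYPOT_FIELD, "").strip() != ""
```

A browser still submits fields hidden with `display:none`, so a human's submission arrives with the decoy present but empty, and only a client that fills in every field trips the check.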

Thinking forward, this could be done with multiple fields, some with a default value perhaps, a checkbox maybe? Just make sure that your checking on the back end is correct. And once you encounter a comment that touched a field invisible to the users, it’s spam!
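Here is a rough Python sketch of what checking several decoys could look like. The field names (`website`, `company`, `subscribe_all`) and the default value `"n/a"` are hypothetical, and the sketch assumes the form data arrives as a plain dictionary of strings.

```python
# Hypothetical decoy text fields, mapped to the value a human submission
# would carry (humans never see the fields, so the default stays as-is).
TEXT_TRAPS = {
    "website": "",     # empty text field
    "company": "n/a",  # text field pre-filled with a default value
}

# Hypothetical decoy checkboxes. Browsers do not send an unchecked
# checkbox at all, so its mere presence means something ticked it.
CHECKBOX_TRAPS = {"subscribe_all"}


def touched_traps(form: dict) -> list:
    """Return the names of all decoy fields that were modified."""
    touched = [
        name
        for name, default in TEXT_TRAPS.items()
        if form.get(name, default).strip() != default
    ]
    touched += [name for name in sorted(CHECKBOX_TRAPS) if name in form]
    return touched
```

A comment would then be treated as spam whenever `touched_traps(form)` returns a non-empty list. Treating a missing text field as untouched is a design choice that errs on the side of letting real people through.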

I wrote a code snippet for WordPress which adds an extra, invisible website field with some simple checking upon comment submission, so let’s see how it goes. I’m planning to run this for a week alongside Akismet, and then perhaps a week with Akismet turned off. I’ll publish the results and share the code snippets if I get anything positive, otherwise.. Oh well ;)

Thanks for reading, and let me know if you have any further thoughts on this topic. Cheers!

Update: While the method described above worked to some extent, it still let quite a lot of spam comments through, so I decided to go the “close comments on entries older than 14 days” route, which seems to be working fine for the moment.
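WordPress has a Discussion setting that closes comments on old posts automatically, so no code is needed there. For anyone curious about the rule itself, here is a small Python sketch; the function name `comments_open` and the use of UTC timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 14-day window mentioned in the update.
COMMENT_WINDOW = timedelta(days=14)


def comments_open(published_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True while the entry is still within the comment window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - published_at <= COMMENT_WINDOW
```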

About the author

Konstantin Kovshenin

WordPress Core Contributor, ex-Automattician, public speaker and consultant, enjoying life in Moscow. I blog about tech, WordPress and DevOps.

25 comments

  • Just did something similar on a community site I'm running: hide with CSS the URL field, and if the comment contains a commenter's URL, nuke it. Works badass.

    • That's nice to hear Ozh, and even nicer that your comment came through (it was the first one published after I activated the whole thing). Have you done this in tandem with Akismet, or does it work all by itself for you?

      Spam-bot authors tend to update their software, so I think that the key here is to be different, perhaps unique. A spam-bot author won't bother upgrading his scripts to recognize hidden fields if it were used on just a couple of blogs, but if it went world-wide, they would, so there's really no guarantee… Just thinking out loud here ;)

  • In addition to using a form field designed to be left blank you could also use a method that checks how quickly the form has been filled in.

    Basically, a bot can fill in a form so much faster than a user, so if a short form (similar to this one) was completed and sent for processing in under 25 seconds (or however long you decide) you can be pretty sure it's a bot.

    Of course this can't ever be 100%, but used in conjunction with methods like the one you described I'm pretty sure you can catch most bots without being a nuisance to your users :)

    • Ryan, I like your thinking! How would you calculate the time, javascript? Or perhaps record the time when the page loads and compare to the time the comment got submitted? If we were allowed to do javascript and assumed bots can't do javascript then I'd also suggest to give an irrelevant action= field for the form and then set it to the correct one on page load. What do you think?

  • So essentially switch the action of the form with javascript? That could definitely work. That way you could also set up a method for storing 'spam only' data just in case a genuine user's message got caught. Then you could scan through at your leisure and never miss any user's data.

    I wouldn't use the timing method with javascript though. Chances of a bot spotting it are low but having something like that on the client side makes me uneasy – far too easy for people to play with.

    I would simply have a timestamp for when the form page was opened and another on the processing page. How you send the original timestamp across is completely up to the programmer but it would be nice if they incorporated a way to ignore the time it takes to load the processing page.

  • Oh also, when I'm using the blank form field method I personally use a randomly generated field name (sent via session to the processing script to check). That way the creators of the bot can't program it to ignore certain field names – as they never know what they will be :)

    • Thinking ahead, what if my HTML page didn't contain the comment form at all? And one was rendered using javascript that does an AJAX request for the form layout and then prints it on screen? ;)

      That way users won't be able to comment until the form is shown, and spambots wouldn't be able to comment at all (hopefully). And a short message asking the users to refresh the page if they don't see a comment form, to explain what's happening, though they probably wouldn't understand it anyway hehe ;)

      Random field name is a cool addition btw, thanks for suggesting!

      Cheers!

    • Yeah you could do that – but that risks alienating the people with javascript turned off (even if it is only a small percentage these days). I'll think of a way to solve that though if you give me a few mins :)

    • hey great blog!!! nice posts and ideas!
      u can write an input field with a value into ur html via javascript. if the field or its value is not transferred to the submit script, it is spam or a user without JS… maybe it works and you dont need an ajax request…

    • Thanks Huberto, I guess there's a very small amount of users with javascript turned off, but looking at Google, Facebook, Twitter and the other giants, they work well both with turned on and off js, so it might be risky. Too bad Google Analytics can't say js vs no-js visitors this month.. Probably because GA is javascript itself haha ;)

      Cheers!

    • hahaha, yes thats true :-)
      it is really just a small amount of users without JS… the most of them dont know that they can turn it off ;-)
      I measured it on a commercial website with a tracking pixel and JS tracking… i cant remember the exact result, but it was less than 2% of the visitors with deactivated JS.

      i am looking forward to ur result and i hope i can use it successfully too ;-)

    • Huberto, difficult to say… At first sight Akismet was doing better, but now I'm getting more spam, though I found another work around which I'll be writing about very soon ;)

  • Pretty nice trick for dealing with spam. I hate receiving tens of notification emails saying I have comments to manage, only to find they are all spam. I'm waiting for your experiment results.

  • Nice trick – but anyone implementing it should be aware that screenreaders for the visually impaired may also read your "hidden" fields: in fact, most of the CSS tricks for skip-links depend on just that. I don't know offhand of anything that would reliably trap bots without messing with disabled users, but it would be great if there were something…

    • Amy, you have a valid point there. My thinking is, since fields are labelled and labels are read out by screen readers, why not have your label say "please do not fill in the next field" or something? :)

    • My favorite method is to add a URL field just as described in this post — but instead of styling it with CSS, I wrap it in a comment. Screen readers should successfully ignore comments. In five+ years of using this approach, it is STILL stopping the spambots. So I guess they don't learn so fast after all :)

  • Nice idea. There is a WordPress plugin called NoSpamNX which does what you're describing. But the spammers are always up to date too. Perhaps the plugin must be adjusted so the spammers don't know about the change…

  • I use this method frequently, and let me tell you, it ABSOLUTELY WORKS. I've had a site go from getting 50+ spam comments a day to zero just by using one innocent little hidden form field. :)

    I wish it was my idea, but I read about it on some blog a few years back. Great advice!
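Two of the suggestions from the thread above, a randomly named decoy field kept in the session and a server-side check of how quickly the form came back, can be sketched together in Python. This is only an illustration under assumptions: the session is modelled as a plain dictionary, the function names `prepare_form` and `looks_like_bot` are made up, and the 25-second threshold is the figure floated in the comments, not a tested value.

```python
import secrets
import time
from typing import Optional

# Threshold suggested in the comments; adjust for the length of the form.
MIN_FILL_SECONDS = 25


def prepare_form(session: dict, now: Optional[float] = None) -> str:
    """Store a fresh decoy field name and the render time in the session.

    Returns the decoy name so the template can render the hidden field.
    """
    session["trap_field"] = "f_" + secrets.token_hex(8)
    session["rendered_at"] = time.time() if now is None else now
    return session["trap_field"]


def looks_like_bot(session: dict, form: dict, now: Optional[float] = None) -> bool:
    """Apply both checks on the processing side."""
    now = time.time() if now is None else now
    trap = session.get("trap_field")
    rendered_at = session.get("rendered_at")
    if trap is None or rendered_at is None:
        # The form was never served in this session.
        return True
    filled_trap = form.get(trap, "").strip() != ""
    too_fast = (now - rendered_at) < MIN_FILL_SECONDS
    return filled_trap or too_fast
```

Because both values live in the server-side session, nothing about the timing or the decoy name is exposed for the client to tamper with, which was the concern raised about doing this in javascript.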