October 17, 2004

Why hasn't Google stopped comment spam?

[Update: They did something! Google, MSN and Yahoo! now disregard links with nofollow in the rel attribute of a link. Example: <a href="http://www.example.com/" rel="nofollow">. Go grab a plugin for your blogging system and stop supporting comment spammers! Like the TiVo permalink thing, any relation between the deployed solution and my proposal is largely coincidental and likely detrimental to the solution. People far smarter and more eloquent than I came up with the rel solution before me and argued effectively for it. Thank you mysterious strangers!]

That's right, Google.

Comment spam is a problem for lots of people and there are (at least) two parties responsible for each piece of comment spam posted: the comment spammers and Google. Because of their positive image, very few people (with exceptions) look at Google when discussing comment spam. Plenty of people explain that comment spammers are trying to exploit PageRank, but no one complains that Google isn't patching an obvious vulnerability. If this were Microsoft there would be three duplicate posts about this a day until it was fixed.

Why am I focusing on Google? Comment spammers are trying to get links to their sites in order to boost their PageRank. Google doesn't offer a way to opt out of contributing to PageRank; they only offer a way to opt out of indexing altogether with robots.txt.

What can they do to stop it? Offer a way for a link not to contribute to PageRank. Use VoteLinks or something like it and I will personally write the Movable Type filter that adds rel="vote-abstain" to all links in comments.
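A minimal sketch of what such a filter might look like, written in Python for illustration rather than as an actual Movable Type plugin. The function name `add_rel` and the regex-based approach are assumptions, not anything Movable Type or VoteLinks provides; a real filter would probably use a proper HTML parser. It stamps every link in a comment with a single rel value, replacing any rel attribute the commenter supplied.

```python
import re

# Matches an opening <a ...> tag; \b keeps it from matching <abbr>, <address>, etc.
_ANCHOR = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches an existing rel attribute (quoted or unquoted) so it can be dropped.
_REL = re.compile(r"""\s+rel\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def add_rel(comment_html, rel_value):
    """Return comment_html with rel=rel_value set on every <a> tag.

    Hypothetical helper: any rel attribute already present is discarded,
    so a commenter cannot override the value chosen by the site owner.
    """
    def fix(match):
        attrs = _REL.sub("", match.group(1))
        return f'<a{attrs} rel="{rel_value}">'

    return _ANCHOR.sub(fix, comment_html)


if __name__ == "__main__":
    comment = 'Nice post! <a href="http://www.example.com/">my site</a>'
    print(add_rel(comment, "vote-abstain"))
```

The same function covers the value from the update at the top of this post: pass "nofollow" instead of "vote-abstain".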

One thing that VoteLinks doesn't address is notifying comment spammers that their asshattery is ineffective before they submit comments. It would be nice if there were a way for comment spammers to check an attribute, like in a <div> around the comments field, that would say "no links in here will contribute to your PageRank." Without that piece the comment spammers will continue their shotgun approach to reciprocal linking in hopes of finding still-vulnerable weblogs. I don't imagine that those vulnerable weblogs will ever go away, but I'm just trying to avoid having to clean up after comment spammers on my own site.

So Google: Don't be evil, clean up the mess you've created.

Posted by george at October 17, 2004 08:14 PM
Comments and TrackBacks

TrackBack URL: http://mt.gnerd.net/mt-gnerd-tb.cgi/334

And while we're at it, we might as well blame Henry Ford for the problem of drunk driving.

I like people being able to do a websearch on comments on my weblog posts, as I've found a lot of very useful information by searching on weblog and mailinglist comments. It should really be up to the various weblog implementors to build some sort of anti-spammer protection into their comment entries. After all, nothing prevents some big spamhouse from writing their own spider for finding comment forms (which is what they did before they got lazy and just used Google).

Google didn't create the spam problem, and IMO it's not their job to fix it either.

There are plenty of easy quick fixes to the spam problem, and why weblog comment engines still don't use something like that by default mystifies me.

Posted by: fluffy at October 23, 2004 01:55 PM

I think you may be misunderstanding my solution. I don't want to keep weblog comments and comment forms from being indexed in Google; that can be achieved now with a robots.txt. What I want to do is allow bloggers the ability to turn off PageRank for links in comments.

Comment spammers are not the same as email spammers. Email spammers are trying to get people to read their message. Comment spammers don't care if you click the link they provide; they are trying to get an increased PageRank in Google by getting a site to "vote" for their site by creating a link from a weblog to their site. Google uses PageRank to choose the order to display search results, and that can affect commerce sites by many orders of magnitude.

Comment spam is caused by Google's ranking algorithm which rewards people for being linked to, combined with weblogs' ability to allow people to post content on a site. If you want to stop comment spam, you have to change one of those two things. My solution changes Google so that spammers aren't rewarded for getting links in comments. The other solutions would involve raising the barrier to entry for posting comments, which reduces their value.

I have no idea what a spamhouse would gain from writing their own spider. They'd find a bunch of comments, but they wouldn't be able to affect PageRank with that information if posting comments didn't affect PageRank, which is the case under my solution.

As for the solution on your site, it would be trivial for spammers to write a spider that would load the form, get the hidden value and include that with the comment spam. It works well if there's only one site using it, but if a million Movable Type blogs start using it the spammers will incorporate it into their software fairly quickly.

Posted by: George at October 24, 2004 05:33 PM

I don't want to turn off pagerank for my comments pages, though, for the same reason that I don't want to stop spidering. If someone links to something in a legitimate discussion about a topic, shouldn't that link be worth something?

Stopping the spammers from posting is the only solution I'd like to entertain, since stopping the second-order effects for spammers also stops the second-order effects for legitimate comments.

Posted by: fluffy at October 25, 2004 10:04 AM

I don't know if a link from a comment is worth something. I try to avoid deleting comments, even ones that I disagree with, but at the same time Google ranks those links with the same authority as links I make. I don't have a stellar PageRank, but it's high enough that someone wants to trick me into endorsing their site.

One other possible solution would be to rewrite all the urls in a post as http://www.google.com/url?sa=D&q=URLencodedLink which is what Blogger does to prevent Google from PageRanking links, to prevent this very problem.
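As a rough sketch of that rewrite, assuming only the URL pattern quoted above (the function name `google_redirect` is made up for illustration, and nothing here checks whether Google still serves that endpoint):

```python
from urllib.parse import quote


def google_redirect(url):
    """Wrap url in the redirect pattern quoted above.

    Hypothetical helper: safe="" makes quote() percent-encode every
    reserved character, including ':' and '/', so the whole link fits
    in the q parameter.
    """
    return "http://www.google.com/url?sa=D&q=" + quote(url, safe="")


if __name__ == "__main__":
    print(google_redirect("http://example.com/foobar/"))
```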

Now I need to write that and figure out a way to advertise to comment spammers the fact that they will not benefit from comment spamming me.

Posted by: George at October 25, 2004 11:42 AM

Instead of the Google rewrite/redirect that Blogger does, it would be polite to provide that mechanism on the host serving up the content. An added benefit is the possibility of logging what paths people take away from your website.

Posted by: Gabe at October 25, 2004 12:32 PM

With the URL redirecting you suggest, you should also exclude 'url', or whatever your redirect script is called, via robots.txt to make sure the spiders don't follow the redirect. Problem is, the hard-core comment spammers usually post their comments with robots that would never check if such a redirect is in place. So you wouldn't give them PageRank, but you would not stop them from commenting either.

Posted by: Thomas Fruetel at October 31, 2004 10:21 AM
Why hasn't Google stopped comment spam?
Trackback from: Dave's Chalkboard at October 31, 2004 10:42 AM

A while ago I suggested a possible Google-side solution. The basic idea is that comment spammers are easily identifiable by a simple pattern, namely that they create an inhuman number of comments in a very short time period. It shouldn't be difficult for a search engine to parse out the dates associated with comment posts and limit the PageRank contributions at a threshold.
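A minimal sketch of that kind of check, assuming the search engine has already extracted the timestamps of all comments that link to one target site. The one-hour window and the limit of 20 comments are made-up numbers for illustration, not anything a search engine is known to use.

```python
from datetime import datetime, timedelta


def exceeds_rate(timestamps, window=timedelta(hours=1), max_comments=20):
    """Return True if more than max_comments fall inside any span of
    length `window`. Both thresholds are hypothetical defaults."""
    times = sorted(timestamps)
    start = 0
    for end in range(len(times)):
        # Slide the left edge forward until the span fits in the window.
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 > max_comments:
            return True
    return False


if __name__ == "__main__":
    base = datetime(2004, 10, 31)
    burst = [base + timedelta(seconds=i) for i in range(50)]
    steady = [base + timedelta(days=i) for i in range(50)]
    print(exceeds_rate(burst), exceeds_rate(steady))
```

Links from a burst that trips the check would stop contributing past the threshold, while a link posted in an ordinary discussion would still count.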

That way the basic propagation of pagerank would still work for actual comments, as fluffy suggests above.

If Google actually does something like that, they should be public about it in order to actually decrease comment spam (as opposed to "just" increasing the relevancy of their results by that means).

Posted by: Bernhard Seefeld at October 31, 2004 03:45 PM

You can easily opt out of PageRank: put the links behind a redirector, and hide that redirector using a robots.txt.

I.e., turn http://example.com/foobar/ into http://george.hotelling.net/redir/example.com/foobar/, and have the server rewrite that to a 301 redirect to http://example.com/foobar/. Then tweak your robots.txt to forbid Google from spidering http://george.hotelling.net/redir/ and presto.
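A rough sketch of the two URL mappings and the robots.txt rule, using the /redir/ prefix from the example above. The function names are made up, the sketch assumes plain http targets, and the robots.txt rule is written for all crawlers (User-agent: *) rather than Google alone, which is an assumption. The server side would still need to answer each /redir/ request with a 301 to whatever `redirect_target` returns.

```python
from urllib.parse import urlsplit

REDIR_PREFIX = "http://george.hotelling.net/redir/"

# Assumed rule: keep every crawler, not only Google, out of the redirector.
ROBOTS_TXT = "User-agent: *\nDisallow: /redir/\n"


def to_redirect_link(url):
    """Rewrite an outbound link so it points at the local redirector."""
    parts = urlsplit(url)
    rest = parts.netloc + parts.path
    if parts.query:
        rest += "?" + parts.query
    return REDIR_PREFIX + rest


def redirect_target(request_path):
    """Map a requested /redir/... path back to the URL to 301 to.

    Returns None for paths outside the redirector. Assumes the original
    link used http, since the scheme is dropped by to_redirect_link().
    """
    prefix = "/redir/"
    if not request_path.startswith(prefix):
        return None
    return "http://" + request_path[len(prefix):]


if __name__ == "__main__":
    link = to_redirect_link("http://example.com/foobar/")
    print(link)
    print(redirect_target("/redir/example.com/foobar/"))
```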

Posted by: Aristotle Pagaltzis at November 2, 2004 01:07 AM
Free Advertising
Excerpt: At least, that's what a lot of spammers think about blogs: a way to get their link count up. While this, in part, is Google's fault, bloggers cannot rely on Google to fix the problem. For MovableType users,...
Trackback from: A peek inside Jesse's Head at November 18, 2004 08:41 PM
Comment Spam and the IMD Blog
Excerpt: Sorry for the recent trouble commenting on the weblog! Let me explain what happened - We use Movable Type to manage the student/class/department weblogs. It's great software because it lets us give dozens of people access to edit and publish...
Trackback from: USC Interactive Media Division Weblog at November 22, 2004 01:00 PM
Comment Spam? How About An Ignore Tag? How About An Indexing Summit!
Excerpt: Bloggers seem increasingly upset at the comment spam they have to deal with, something driven primarily by those who seek higher search rankings by posting links to their sites into comment areas. To me, the solution seems simple. Why not...
Trackback from: Search Engine Watch Blog at January 5, 2005 05:57 AM
