
Apologies, Splogs, Scrapes, and Bruises
December 3, 2007

Posted by Dan Edelen in: Blogging, Technical


Feeling a bit woozy in the bits... Apologies to readers who tried to comment over the weekend. I’ve been dealing with an issue, and the fix I installed succeeded only in blocking every comment submitted to Cerulean Sanctum.

The problem concerns splogs: fake blogs that scrape RSS feed content looking for a particular buzzword (like cars, RSS, blogging, bathtubs, hair care, etc.). When a splog finds its target buzzword, it sucks out the portion of the RSS feed around that buzzword and posts it on the splog along with a URL back to the original post it scraped.
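For readers curious what that scraping step looks like, here is a minimal Python sketch that finds a buzzword in one feed item and cuts out the text around it, keeping the link back to the original post. The function name, the 40-character window, and the output format are assumptions for illustration, not taken from any actual splog software.

```python
import re
from typing import Optional


def scrape_excerpt(item_text: str, item_url: str, buzzword: str,
                   window: int = 40) -> Optional[str]:
    """Return a splog-style snippet: text around the buzzword plus the source URL.

    Returns None when the buzzword doesn't appear in the feed item.
    """
    match = re.search(re.escape(buzzword), item_text, flags=re.IGNORECASE)
    if match is None:
        return None
    # Keep `window` characters on each side of the match, clamped to the text bounds.
    start = max(0, match.start() - window)
    end = min(len(item_text), match.end() + window)
    return f"...{item_text[start:end]}... (source: {item_url})"


snippet = scrape_excerpt(
    "This weekend I finally regrouted the bathtubs and learned a lot about caulk.",
    "http://blog.example/2007/12/01/regrouting/",
    "bathtubs",
)
print(snippet)
```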

The splogger tries to game Google’s PageRank algorithm to rocket his splog to the top of search results. Some sploggers tie their splogs to AdSense money, but some don’t. For those who don’t, the only reason I can see for the splog to exist is to sell its domain name for more money one day, once that domain has a high Google PageRank.

I use WordPress for this blog. When another blog posts a link to one of my posts, WordPress automatically registers that link as a pingback. (A pingback is a sort of automatic trackback. All trackbacks and pingbacks appear as normal comments within the comment section of a WordPress blog.)
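Under the hood, a pingback is a single XML-RPC call defined by the Pingback 1.0 specification: the linking blog's software calls `pingback.ping` with the URL of its own post (the source) and the URL of the post it linked to (the target). The sketch below only builds that request body with Python's standard `xmlrpc.client` module. The two URLs are made-up placeholders, and a real sender would first discover the target blog's pingback endpoint (advertised through an `X-Pingback` header or a `<link rel="pingback">` tag) before posting to it.

```python
import xmlrpc.client


def build_pingback_request(source_uri: str, target_uri: str) -> str:
    # Pingback 1.0 defines one method, pingback.ping(sourceURI, targetURI):
    # sourceURI is the post that contains the link, targetURI is the post it links to.
    return xmlrpc.client.dumps((source_uri, target_uri), methodname="pingback.ping")


body = build_pingback_request(
    "http://linking-blog.example/2007/12/03/my-post/",
    "http://cited-blog.example/2007/12/03/apologies-splogs/",
)
print(body)
```

Nothing in that request says what kind of site the source is, which is why the receiving blog has no easy way to tell a real blog's pingback from a splog's.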

But as all pingbacks are created equal, WordPress can’t determine if the pingback is coming from a real blog or from a splog.

Unfortunately, WordPress has a horrible oversight in its handling of trackbacks and pingbacks: it treats them as equals, but they are not the same. A trackback has to be sent manually from a third-party blog to register in the comment section of the blog being referenced. Obviously, someone operating a splog can’t send manual trackbacks, or he’d spend all day pasting in trackback URLs. That defeats the purpose of the splog, which is to scrape thousands of feeds and automatically post portions of them along with a link back to each scraped source. The splogger can’t send thousands of manual trackbacks, so trackbacks are less of a problem. I almost never get any trackback spam.
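For comparison, a TrackBack ping (per the TrackBack specification) is an ordinary form-encoded HTTP POST sent to the target post's trackback URL. It carries the linking post's `url` (required) and, optionally, its `title`, `excerpt`, and `blog_name`. The sender has to know that trackback URL in advance, which is what made trackbacks a manual chore. This Python sketch only builds the POST body; the helper name and the sample values are made up.

```python
from urllib.parse import urlencode

# Content type used for TrackBack pings.
TRACKBACK_CONTENT_TYPE = "application/x-www-form-urlencoded; charset=utf-8"


def build_trackback_body(url: str, title: str = "", excerpt: str = "",
                         blog_name: str = "") -> str:
    # Only "url" is required by the spec; empty optional fields are left out.
    fields = {"url": url, "title": title, "excerpt": excerpt, "blog_name": blog_name}
    return urlencode({key: value for key, value in fields.items() if value})


body = build_trackback_body(
    url="http://linking-blog.example/2007/12/03/my-post/",
    title="My response",
    excerpt="Dan makes a good point about splogs...",
    blog_name="Linking Blog",
)
print(body)
```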

Pingbacks, because they occur automatically, are another beast altogether. And splogs exploit them mercilessly. In the last two months, I’ve seen splog pingbacks at Cerulean Sanctum increase tenfold.

WordPress’s weakness here is that I can’t just turn off pingbacks without turning off trackbacks. I don’t want to turn off trackbacks. Nor do I want to turn off legitimate pingbacks.
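The policy described here treats the two types differently. Below is a minimal Python sketch of that routing decision, assuming WordPress's convention of recording a comment's type as `pingback`, `trackback`, or an empty string for an ordinary comment. It is not a WordPress plugin (those are written in PHP); the class, the function, and the return values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncomingComment:
    # WordPress records "pingback", "trackback", or "" (an ordinary comment) here.
    comment_type: str
    author_url: str
    content: str


def route_comment(comment: IncomingComment) -> str:
    """Hypothetical routing: hold every pingback for a human; let the rest through.

    Trackbacks and reader comments are not penalized: they still face the normal
    spam filters afterward, but they don't wait in the moderation queue.
    """
    if comment.comment_type == "pingback":
        return "moderate"
    return "spam-filters"
```

Legitimate pingbacks still get published under this policy; they just wait in the queue until someone approves them.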

Now plugins do exist that will force trackbacks and pingbacks into moderation. However, most of them also penalize commenters. I don’t wish to do that, especially since those plugins tend to excessively penalize first-time commenters.

Several blogs I know use captchas. I hate captchas with a passion because I tend to tab-browse sites. I post a comment on another blog, hit submit, and I’m on to my next tab. With a captcha, another screen comes up, but I never see it because I’m already reading my next tab. I don’t have all day to wait for a captcha screen to load (and since most of them are served from third-party sites, those pages can load slowly). I also just plain hate captchas because they fool me, too! Then I have to wait for that slow page to reload, only for it to say, “Uh, Mr. Surfer Dude, you couldn’t tell that third letter was a j and not an i?”

In short, no captchas here.

An old WordPress plugin does exist that sends only pingbacks to moderation by default. But it doesn’t work with WordPress 2.x. And while my two spam blockers do an exceptional job of filtering spam, they’re not great at catching splog pingbacks, because they see the splog as a legitimate source of a pingback. The heuristics for telling splogs from real blogs just ain’t there yet.

What threw a wrench into the comments here the last few days was a plugin I installed. I’ve been discussing this issue on the WordPress forums, and a reader there posted a quickly whipped-up plugin to send all pingbacks to moderation automatically. Only it croaked the second anyone tried to comment here, inserting a null comment that blocked every comment after it. That went for reader comments, not just splog pingbacks.

So I’m back to square one. When this post goes live a minute after midnight on Monday, I’ll probably have a half dozen splog pingbacks to delete by the time I wake up. Such is the nature of the beast right now. I wish someone out there could jury-rig a pingback-moderation plugin that works with WordPress 2.x, Akismet, and Spam Karma 2. But as it stands, many WordPress bloggers are in this same boat.

:-(


10 Comments »

Comment by Dan Edelen
2007-12-03 10:20:15

Ironically, the first comment is a pingback, but at least this one’s legit!
Comment by David Riggins
2007-12-03 11:01:12

Amazing the amount of concept-specific language in use on this post. I am in awe. Truly. It’s almost…Christian.

Comment by Dave Block
2007-12-03 13:00:09

What a funny comment, David. Thanks!

Comment by salguod
2007-12-04 13:32:05

I never knew that pingbacks and trackbacks were different. Movable Type (which I use) doesn’t have pingbacks, only trackbacks. However, trackbacks can be auto-generated (so maybe they’re more like pingbacks?), and TB spam has been a huge problem for many MT blogs.

I have a nifty anti-spam plugin for MT called Ccode/Tcode. It creates a hidden form field that is required for a comment to go through. The value of the field, I think, is generated from the contents of the post itself, so it’s different for each post. The field is submitted only if you actually click the ‘post’ button, so auto-generated spam comments get blocked. It’s sort of like a hidden, automatic captcha.

I used to use a captcha and it was very effective, but I too find them annoying to deal with. Blogger’s is awful.

I’m not sure how it blocks trackback spam, but it does a phenomenal job.

Not that it helps you, unless you want to switch to MT. :D
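The hidden-field trick salguod describes can be sketched in a few lines of Python. This is not Ccode/Tcode's actual algorithm (he says he isn't sure how it computes the value); it assumes a per-post token derived from the post's ID and text plus a server-side secret, using the standard `hmac` module. The secret and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real site would keep this private.
SECRET_KEY = b"replace-with-a-private-server-secret"


def form_token(post_id: int, post_body: str) -> str:
    # Derive the hidden field's value from the post itself plus the secret,
    # so every post's comment form carries a different value.
    message = f"{post_id}:{post_body}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def accept_comment(post_id: int, post_body: str, submitted_token: str) -> bool:
    # Accept the comment only if the hidden field came back with the expected value.
    expected = form_token(post_id, post_body)
    return hmac.compare_digest(expected, submitted_token)
```

A token like this only stops bots that submit the comment form blindly. A bot that downloads the form first can copy the hidden value, so the technique works more like a speed bump than a lock.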

 
Comment by salguod
2007-12-04 13:34:15

Uh, so I made that comment above and, lo and behold, I got a captcha screen! Heh, I guess you gotta do what you gotta do.

BTW, there was also some database error gibberish. Unfortunately, I didn’t copy and paste it.

Comment by Dan Edelen
2007-12-04 15:12:29

Doug,

What??? I have no captchas installed on my system at all. Are you sure you didn’t have multiple tabs open and confused a captcha elsewhere with one from here?

Comment by salguod
2007-12-04 15:51:20

I don’t think so. The funny thing is the second comment didn’t give me one. Let’s see if this one does … :D
2007-12-04 21:51:29

Dan:

The splogging problem is not pingbacks and trackbacks.

It’s RSS.

Basically, the sploggers have a program that fetches your RSS feed, which gives them your last 10 articles. From those ten articles, they take the main URL of each blog post and, knowing how WordPress works, add “/trackback” to the end of that URL and send a trackback to it through WP’s pingback/trackback system.
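The commenter's point is that the trackback endpoint is predictable from the feed alone. Here is a minimal Python sketch of that, assuming an RSS 2.0 feed and WordPress "pretty" permalinks, where a post's trackback URL is its permalink followed by `trackback/`. The sample feed and blog URLs are made up.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up RSS 2.0 feed with two posts.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example Blog</title>
  <link>http://blog.example/</link>
  <item><title>First</title><link>http://blog.example/2007/12/03/first-post/</link></item>
  <item><title>Second</title><link>http://blog.example/2007/12/02/second-post</link></item>
</channel></rss>"""


def trackback_urls(feed_xml: str) -> List[str]:
    root = ET.fromstring(feed_xml)
    urls = []
    # Only item-level links are post permalinks; the channel <link> is the blog's home page.
    for link in root.findall("./channel/item/link"):
        permalink = (link.text or "").strip()
        if not permalink:
            continue
        if not permalink.endswith("/"):
            permalink += "/"
        # Assumes pretty permalinks: the trackback endpoint is <permalink>trackback/
        urls.append(permalink + "trackback/")
    return urls


print(trackback_urls(SAMPLE_FEED))
```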

I have been splogged lately; Akismet has caught many of the trackbacks and flagged them as spam. However, some do go to moderation, and I mark those as spam so Akismet can learn from them.

However, the worst one I have had is a splogger who somehow takes every fifth word of my copied post, changes it to a synonym of the original word, and claims the post as his own. From what I have read online, there is a program circulating in the splogger community that does this automatically and changes the copyright info.
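Swapping every fifth word still leaves most of a post intact, so a word-level similarity check can flag that kind of copy. The sketch below uses Python's `difflib.SequenceMatcher` over word lists. It is a hypothetical detector, not something the commenter describes using, and the 0.7 threshold is an arbitrary assumption.

```python
import difflib
import re
from typing import List


def word_list(text: str) -> List[str]:
    # Lowercase and split into words so punctuation and case don't matter.
    return re.findall(r"[a-z0-9']+", text.lower())


def spun_copy_score(original: str, suspect: str) -> float:
    """Similarity in [0, 1] between two texts, compared word by word."""
    matcher = difflib.SequenceMatcher(None, word_list(original), word_list(suspect))
    return matcher.ratio()


def looks_spun(original: str, suspect: str, threshold: float = 0.7) -> bool:
    # Hypothetical cutoff: replacing every fifth word leaves about 80% of the words matching.
    return spun_copy_score(original, suspect) >= threshold


original = "the quick brown fox jumps over the lazy dog today"
spun = "the quick brown fox leaps over the lazy dog now"  # words 5 and 10 swapped
print(spun_copy_score(original, spun))  # 0.8
```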

The problem I have is that some Christian sites syndicate me (with my permission), so I cannot use a plugin like Digital Fingerprint, which adds ‘garbage’ midway through your feed so you can Google your ‘garbage’ and see who is splogging you. Plus, WP also stops RSS syndication when you use the command in a post, so sometimes you may never see the fingerprint.
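The fingerprint idea is simple enough to sketch: add an unusual, blog-specific string to the feed copy of each post, then search the web for it. This Python sketch is hypothetical (the Digital Fingerprint plugin's own code isn't reproduced here); it derives the marker from a made-up secret and the blog URL with SHA-256. It also shows the commenter's problem: authorized syndication partners republish the feed, so their copies carry the marker too.

```python
import hashlib


def make_fingerprint(blog_url: str, secret: str) -> str:
    # A stable nonsense word that should not occur anywhere else on the web.
    digest = hashlib.sha256(f"{secret}|{blog_url}".encode("utf-8")).hexdigest()
    return "fp" + digest[:16]


def fingerprint_feed_item(item_html: str, mark: str) -> str:
    # Meant to be applied to the feed copy only, so the post on the blog stays clean.
    return f"{item_html}\n<p>{mark}</p>"


def copied_from_feed(page_text: str, mark: str) -> bool:
    # Any page containing the mark got its text from the feed:
    # a splog, or an authorized syndication partner.
    return mark in page_text


mark = make_fingerprint("http://blog.example/", "hypothetical-secret")
feed_copy = fingerprint_feed_item("<p>Today's post...</p>", mark)
print(copied_from_feed(feed_copy, mark))  # True
```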

 
Comment by Normandie
2007-12-12 20:15:09

Okay, folks, I’m feeling my age or something. I haven’t a clue what you’re talking about! Pingbacks? Trackbacks? Captchas?

I don’t “text message” (when did something like that become a verb?) either.

 