Lampreys of the Web
March 30, 2007
Posted by Dan Edelen in: Blogging, Technical Functions
Growing up in Ohio, I used to constantly hear about the lamprey problem in Lake Erie. Eel-like fish cruised the lake, ready to attach to some helpless walleye and suck its guts out. At least that was the image I had. When you’re a kid, having your guts sucked out plagues a lot of your dreams.
I hear that lampreys aren’t such a problem in Erie anymore, though game fish in Michigan, Huron, and Superior still suffer from the parasites.
Cerulean Sanctum’s had its share of lampreys of the Web lately. If you’re a blogger, I hope you read on.
My bandwidth’s gone through the roof in the last few months. A couple of weeks ago, I mentioned that the Yahooligans over at Yahoo! bombarded my site with multiple bots from multiple servers, sucking down several hundred MB of this blog a day. Yahoo was contrite, but its Inktomi Slurp bot still got a message in my robots.txt to stop lurking around like a shiftless teenage mallrat.
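For anyone wondering what that kind of robots.txt message looks like, here is a minimal sketch. It assumes a file that bans Yahoo’s crawler outright and asks every other bot to slow down; it is not the actual file from this blog. It uses Python’s standard urllib.robotparser to read the rules the way a well-behaved crawler would. “Slurp” is the user-agent token Yahoo’s crawler answers to; Crawl-delay is a nonstandard directive that some crawlers honor and others ignore, and nothing in robots.txt forces a rude bot to comply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Yahoo's Slurp out entirely, and ask every
# other crawler to wait 10 seconds between requests and skip the admin area.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /wp-admin/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    post = "/2007/03/30/lampreys-of-the-web/"
    for agent in ("Slurp", "msnbot", "Googlebot"):
        # can_fetch: may this bot request the post? crawl_delay: seconds, or None.
        print(agent, rp.can_fetch(agent, post), rp.crawl_delay(agent))
```

In this sketch, pass the short product token (“Slurp”, “msnbot”) to can_fetch() rather than the full browser-style user-agent string a bot sends with its requests.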
The second I got that cleared up, some cryptic bot started pounding the blog to the tune of nearly a thousand hits a day. I traced it back to Texas A&M’s computing department and dropped them an e-mail asking what gives. The reply said they were experimenting with a search engine, apparently of their own design, and they apologized. Still, I have something to say to those Aggie experimenters: Hook ‘Em Horns!
THEN the Communists got in on the thrashing: some unreachable Chinese Internet company started chewing up my bandwidth with its private-label “Made in China” search engine. I’m halfway tempted to ban their IPs altogether, but if there’s a way for some person in China to read Cerulean Sanctum and find Christ, how can I pull the plug? Cerulean Sanctum failed the Great Firewall of China test last time I checked, but if some content’s getting through, I guess I’ll just have to grin and bear it.
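If you do decide to ban a crawler by IP, the test boils down to checking whether the client’s address falls inside a blocked network range. Here is a minimal Python sketch using the standard ipaddress module. The two ranges are reserved documentation blocks (RFC 5737) standing in for real ones, since the post doesn’t name the company’s addresses. On a live site you would express the same ban in Apache’s access-control settings or a firewall rather than in Python.

```python
import ipaddress

# Hypothetical blocklist: reserved documentation ranges (RFC 5737) standing
# in for whatever networks an unwanted crawler actually uses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),  # covers .0 through .127 only
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.42", "198.51.100.200", "192.0.2.7"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Banning whole ranges is a blunt tool: it shuts out every reader on those networks along with the bot, which is exactly the trade-off described above.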
Now comes the big gun. In the last couple of weeks, Microsoft’s been hammering this site. The entire server footprint of Cerulean Sanctum (database, WordPress, images, and backups) is about 100MB. Unbelievably, Microsoft’s lamprey-like MSNBot has been sucking the innards out of this site to the tune of almost 600MB per day. Six times the total content! To the boys and girls in Redmond: Folks, I know you’re the Avis of search, but would you stop trying harder on my site? Keep it up, and I’ll send Bill the next bill from my ISP.
For all you fellow bloggers, do check your logs. My bandwidth usage has tripled since the start of the year. Much of that comes from these insane “my bot’s bigger than your bot” shenanigans. The rest of it comes from one other onerous source.
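Checking your logs is easier with a small script. Here is a minimal Python sketch, assuming your host writes Apache’s standard “combined” log format (many do, but check yours). It totals the bytes sent to each crawler by matching substrings in the User-Agent field. The bot list and the sample lines are made up for illustration, and user-agent strings can be faked, so treat the totals as a rough guide.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption; check your host's format):
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Lowercase substrings that identify the crawlers named in the post.
KNOWN_BOTS = {"msnbot": "MSNBot", "slurp": "Yahoo! Slurp", "googlebot": "Googlebot"}

# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [30/Mar/2007:10:00:00 -0500] "GET / HTTP/1.1" 200 5120 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [30/Mar/2007:10:00:01 -0500] "GET /feed/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.1 - - [30/Mar/2007:10:00:02 -0500] "GET /about/ HTTP/1.1" 304 - "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.3 - - [30/Mar/2007:10:00:03 -0500] "GET /image.jpg HTTP/1.1" 200 1000 "http://www.myspace.com/someone" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]


def classify(agent: str) -> str:
    """Map a User-Agent string to a known bot name, or 'other'."""
    lowered = agent.lower()
    for needle, name in KNOWN_BOTS.items():
        if needle in lowered:
            return name
    return "other"


def bytes_by_agent(lines) -> Counter:
    """Sum response bytes per bot across combined-format log lines."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines in some other format
        size = match.group("bytes")
        # Apache logs "-" when no body was sent (e.g. 304 Not Modified).
        totals[classify(match.group("agent"))] += 0 if size == "-" else int(size)
    return totals


if __name__ == "__main__":
    for name, total in bytes_by_agent(SAMPLE_LOG).most_common():
        print(f"{name}: {total / 1_000_000:.3f} MB")
```

To run it against a real log, pass an open file instead of the sample list, e.g. `bytes_by_agent(open("access.log"))`, where the file name is a placeholder for wherever your host keeps its logs.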
I put images in my posts long ago to spruce up the look. Not many bloggers put graphics in their posts back then, but today it’s more common. Until recently, I didn’t have much trouble with people hotlinking to those images. In the last few months, though, I’ve been getting more and more people who don’t understand that this is a huge Web no-no.
If you’re a novice blogger, NEVER insert an image link in your blog (or MySpace pages, the worst offenders) that pulls an image from someone else’s server. Download the picture yourself (as long as it’s not copyrighted and its owner thinks it’s okay) and host it on your own server. Nearly every image in my collection is public domain, which makes the hotlinking issue even more galling.
Leaving my images open to search was my choice. I hoped that someone searching for an image might stumble across Cerulean Sanctum and find a reason to stay. I consider this blog a ministry, so I try to keep it accessible. Still, I’m going to have to add some mod_rewrite rules in Apache to keep all the hoodlums out. One image I got from a free stock photo site became the darling of MySpace, and I’ve been getting several hundred hits a day from MySpace on that file alone. No more. I’ll close the rest of the holes later this weekend. I wish I didn’t have to do that. I’m no wizard with .htaccess, so this problem consumes a lot of my precious time and resources.
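The usual anti-hotlinking approach is a referer check: refuse an image request whose Referer header names some other site, but let requests with no Referer through, since direct visits, many search engine crawlers, and some privacy proxies send none. Here is that decision logic as a minimal Python sketch, not the Apache configuration itself. The hostnames in OWN_HOSTS and the list of image extensions are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: the hostnames the blog itself is served from.
OWN_HOSTS = {"ceruleansanctum.com", "www.ceruleansanctum.com"}
# Assumption: the file types worth protecting.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


def is_hotlink(request_path: str, referer: Optional[str]) -> bool:
    """Return True if an image request should be refused as a hotlink."""
    if not request_path.lower().endswith(IMAGE_EXTENSIONS):
        return False  # only image files are protected
    if not referer:
        return False  # no Referer: direct visit, crawler, or privacy proxy
    host = (urlsplit(referer).hostname or "").lower()
    return host not in OWN_HOSTS  # any other site embedding the image is refused


if __name__ == "__main__":
    checks = [
        ("/images/dove.jpg", "http://www.ceruleansanctum.com/2007/03/"),
        ("/images/dove.jpg", "http://www.myspace.com/someone"),
        ("/images/dove.jpg", None),
    ]
    for path, ref in checks:
        print(path, ref, "blocked" if is_hotlink(path, ref) else "allowed")
```

In Apache, the same test is typically written as RewriteCond lines on %{HTTP_REFERER} followed by a RewriteRule with the [F] flag, which answers 403 Forbidden. Because the browser controls the Referer header, this stops casual hotlinkers like those MySpace pages, not a determined leech.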
So again, check your logs, folks. You never know what lampreys are lurking in your own little pond.
Tags: Apache, Bandwidth, Blogging, Bot, Bots, China, Communists, Hotlinking, Hotlinks, Lamprey, Lampreys, Logs, Microsoft, MySpace, Parasite, Parasites, Robots.txt, Search, Technical, Texas-A&M, Yahoo!




BTW, if anyone out there is a real Apache guru, I need some serious help trying to do some redirects using regex patterns too sophisticated for me to figure out. Drop me a line at the e-mail at the top of the sidebar.
Thanx.
What pattern are you trying to match on? Can you give a sample?
Thanks for the public service announcement. Now if I just had some way of getting this article translated into common-man speak!
Just wanted to let you know that you have written some very thought-provoking articles the past few weeks. Being a cessationist, I have refrained from commenting because I want to observe, listen, and learn, not argue. I appreciate very much your passion for the Lord and His Church!
Don,
Bandwidth is the amount, in MB or GB, of data transferred from the server that hosts your site to those who request information from it. The more “hits” your site gets, the more data is transferred, and the more bandwidth you consume. Most Internet Service Providers set limits on bandwidth. Exceed your posted limit and you may get charged extra money.
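The numbers add up quickly. Here is a back-of-the-envelope sketch in Python using the post’s own figure of roughly 600MB a day from MSNBot alone. The 10GB monthly allowance and $1-per-GB overage price are made-up placeholders, so plug in your own host’s terms.

```python
# Treat 1 GB as 1,000 MB to keep the arithmetic simple; some hosts use 1,024.
MB_PER_GB = 1000


def monthly_transfer_mb(daily_mb: float, days: int = 30) -> float:
    """Total data transferred in a billing period at a steady daily rate."""
    return daily_mb * days


def overage_cost(used_mb: float, limit_mb: float, price_per_gb: float) -> float:
    """Extra charge for transfer beyond the plan's included limit."""
    over_mb = max(0.0, used_mb - limit_mb)
    return over_mb / MB_PER_GB * price_per_gb


if __name__ == "__main__":
    # From the post: MSNBot alone pulled roughly 600MB a day.
    msnbot_month = monthly_transfer_mb(600)
    print(f"MSNBot per month: {msnbot_month:,.0f} MB")
    # Hypothetical plan: 10 GB/month included, $1.00 per extra GB.
    print(f"Overage: ${overage_cost(msnbot_month, 10_000, 1.00):.2f}")
```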
Bots are automatic programs sent out by search engines to scan your site for content they can index. Googlebot, Inktomi Slurp, and MSNBot are the bots from Google, Yahoo!, and Microsoft, respectively. Others exist, and some of those cause problems and are up to no good. In theory, you should be able to ban them from your site.
I explained hotlinking in the post. Don’t hotlink to files and don’t allow others to increase your bandwidth by leeching off your files.
As for the posts here recently, I feel as if I broke the streak by posting this one today, but since I typically don’t post on Fridays, I thought a technical post wouldn’t hurt. Thanks for the kudos. What you’ve seen here lately is all God, so give Him the glory.
Blessings. Have a great weekend.
I’m new to your blog…linked over from my Pastor’s blog (pasturescott.org). I love what I’m reading here…learning a lot. Thank you for having the courage to step out and tell it like it is.
BTW, the fellowship I worship with every Sunday sounds much like yours, although no one has danced in the aisles yet. Someday soon I’m sure the Holy Ghost will have His way and our feet will become very happy!
Marie,
Thanks for being a reader! I pray you’re blessed by what I write here. It’s often incendiary, but I try not to just critique. I try to provide solutions, too.
Blessings!
And here I thought you’d be talking about me pitting you against Piper.
Travis,
You seem pretty Web savvy. You have any skills in Apache? I need help trying to send some commands via .htaccess.
Thanks.
I’d like to say yes, but I usually break my blog whenever I start tinkering under the .htaccess hood.
I found a decent way to prevent hotlinking (but still allow image indexing by specific search engines)… I’ll send you an e-mail with that portion of my .htaccess file. 
As a convicted and exposed lamprey (I don’t think I resemble the picture. I’m not that slim.)… I appreciate how you are dealing with this. Being a novice blogger (only a few months in service), I didn’t realize this was (A) a no-no or (B) a problem.
Thus I submit a passage of Scripture for consideration…
In Leviticus 5:4-6 it is written (Notice the “unaware” element)… “‘If a person thoughtlessly takes an oath to do anything, whether good or evil — in any matter one might carelessly swear about — even though he is unaware of it, in any case when he learns of it he will be guilty. 5 “‘When anyone is guilty in any of these ways, he must confess in what way he has sinned 6 and, as a penalty for the sin he has committed, he must bring to the Lord a female lamb or goat from the flock as a sin offering; and the priest shall make atonement for him for his sin. (NIV)
Obviously my personal situation has nothing to do with an oath, but it does have to do with honesty and integrity. AND since we’re no longer under the sacrificial system, I am trying to do the next best thing: ask for forgiveness from the person and the Lord… then CLEAN UP MY SITE. Fixing my site will take a little while, but God willing I’ll get it done soon.
Blessings!
Ron,
You’re exonerated!
Hook ‘Em Horns!
Saw ‘Em Off!
Gig ‘em!
-Ray (A&M grad ‘91)
Ray,
As an alumnus, tell your Aggie brothers and sisters to go easy on us bloggers with that new search engine (or whatever it is) they’re testing. A thousand hits a day is too many!
Mr. Dan,
You can shoot me bro. I linked one image on one of my posts from your site. I am going to go right now and correct that. As a graphic artist I should know better, and I was lazy.
Please please accept my sincerest apologies, and know that I will never do this again (to anyone)!! I understand your frustrations, and am in real need of your forgiveness.
Bill,
Grace, brother! It’s the legion of MySpacers who do this (and know about it) that really get me.
I checked my logs just the other day in response to your first post. Yahoo was the biggest bandwidth user among the bots, but no bigger than it’s been in the past year. Maybe those new bots haven’t found me yet.
I must confess that I likely have linked to images on other sites in the past (not yours, I don’t think). I just didn’t think about it. I don’t link anymore, I copy.
I know that my host has a way other than .htaccess to block external linking: an option in the control panel (they use cPanel). Perhaps there’s one in yours, too.
Right-click, download image to desktop, save, insert image. Amazing how many people can’t just add a couple of simple steps. Down with MySpace; it’s slowly becoming an evil empire in itself.
I enjoy your posts, even the techno-babble (I understand at least 80% of it)!
Blessings to you!
I try to avoid the technobabble, Aaron, but I can’t be spiritually incisive every single day!
Actually, my technobabble is meant to help others (even if that “other” is occasionally me). If I’m being whacked this hard by search engines and hotlinkers, other folks might be, too. And bandwidth costs money!
Set your image directory off limits to search engines. They don’t need to index royalty-free images a second time. Plus, this will cut down on the number of people who find your images with a text search.
Michael,
I leave the directory open in hopes that someone will be looking for an image, then actually visit the blog. While you might think this isn’t an effective means to attract readers, you’d be amazed at how many e-mails I get to the contrary.