Computers Internet Blog (welcome to my access.log)

Welcome! You can probably guess which three things this entry is about. If you already know about computers, the Internet and blogs, this is of course the first thing you’d feel a desperate need to Google for! Err…wait, the Magic 2^16 Ball is telling me you just found this phrase in your server logs and got curious just WTF it is. And why all these people are searching for it (in fact, so many that a nonzero number of them click through to your blog, which goes by a completely different name and may be only tangentially related to any of these topics, if at all), and how this hotly sought-after site has somehow managed to fly under the radar of every major search engine.


[Image: “Rude bot business cards”: computers internet blog in my server logs]

Many Web stats programs (including Analog, which I use) have a feature that parses the referrer field of hits from major search engines and lists the top search queries people used to find your site. Lately I’ve been seeing this phrase pop up in my blog’s stats semi-regularly, sometimes upwards of 25 hits a day. So tonight I got curious, pulled the logs and had a closer look.
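To see roughly how a stats package gets from a raw referrer field to a “top search queries” list, here is a minimal Python sketch. It is not how Analog actually does it; the function names and the host list are hypothetical, and it only understands Google’s q= parameter.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical: a real stats package knows many engines and their
# query parameters; this sketch only handles Google's "q=".
SEARCH_HOSTS = {"www.google.com", "google.com"}

def search_query(referrer):
    """Return the search phrase from a Google referrer URL, or None."""
    url = urlparse(referrer)
    if url.netloc not in SEARCH_HOSTS:
        return None
    # parse_qs decodes "+" as a space and returns a list per parameter
    terms = parse_qs(url.query).get("q")
    return terms[0].lower() if terms else None

def top_queries(referrers, n=10):
    """Tally search phrases across referrers, most common first."""
    counts = Counter(q for q in map(search_query, referrers) if q)
    return counts.most_common(n)

referrers = [
    "http://www.google.com/search?q=computers+internet+blog",
    "http://www.google.com/search?hl=en&q=computers+internet+blog",
    "http://tim.cexx.org/?p=365",
]
print(top_queries(referrers))  # [('computers internet blog', 2)]
```

Note that both Google URLs count as the same query here, which is exactly why the bot’s fake referrer ends up in the stats alongside any real searches.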


    64.22.110.34 - - [10/Oct/2007:05:34:16 -0700] "GET /?p=365 HTTP/1.1" 200 26307 "http://www.google.com/search?q=computers+internet+blog" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)"
    64.22.110.34 - - [10/Oct/2007:05:34:17 -0700] "POST /wp-comments-post.php HTTP/1.1" 200 84 "http://tim.cexx.org/?p=365" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)"
    [...]
    75.126.132.23 - - [10/Oct/2007:13:54:26 -0700] "GET /?page_id=342 HTTP/1.1" 200 36704 "http://www.google.com/search?q=computers+internet+blog" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)"
    75.126.132.23 - - [10/Oct/2007:13:54:27 -0700] "POST /wp-comments-post.php HTTP/1.1" 200 84 "http://tim.cexx.org/?page_id=342" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)"
    [...]

Notice a pattern? These blog-seekers must not be happy that this isn’t the page they were looking for, because they sure seem eager to post a comment. In fact, they dive immediately for wp-comments-post.php within a second of page load, a superhuman feat if you ask me. (As I write this, WordPress cheerfully reports that it took 0.855 wall seconds on my shared server to generate one of those pages.)
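The GET-then-POST-within-a-second pattern is easy enough to hunt for mechanically. Below is a rough Python sketch that parses Apache combined-format log lines and flags IPs that post a comment shortly after arriving from a Google search. The function name and the two-second window are my own assumptions (the log only records whole seconds, so “within a second” can show up as one second apart); tune to taste.

```python
import re
from datetime import datetime

# Apache "combined" log format: IP, identd, user, [time], "request",
# status, bytes, "referrer", "user agent".
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def suspicious_ips(lines, max_gap=2.0):
    """Return IPs that POST to wp-comments-post.php within max_gap seconds
    of a GET whose referrer is a Google search results page."""
    last_search_hit = {}  # ip -> time of that IP's latest search-referred GET
    flagged = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that aren't in combined format
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        ip = m["ip"]
        if m["method"] == "GET" and "google.com/search" in m["referrer"]:
            last_search_hit[ip] = when
        elif m["method"] == "POST" and m["path"] == "/wp-comments-post.php":
            seen = last_search_hit.get(ip)
            if seen is not None and (when - seen).total_seconds() <= max_gap:
                flagged.add(ip)
    return flagged

log = [
    '64.22.110.34 - - [10/Oct/2007:05:34:16 -0700] "GET /?p=365 HTTP/1.1" 200 26307 "http://www.google.com/search?q=computers+internet+blog" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)"',
    '64.22.110.34 - - [10/Oct/2007:05:34:17 -0700] "POST /wp-comments-post.php HTTP/1.1" 200 84 "http://tim.cexx.org/?p=365" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)"',
]
print(suspicious_ips(log))  # {'64.22.110.34'}
```

A human who reads the page, types a comment and hits Submit a few minutes later won’t trip this check; a script that posts within a second will.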

Okay, so it’s some stupid comment-spam script leaving its calling card. What to do, what to do? Well, mod_rewrite is installed by default on many servers, and can be used to give different users different pages, depending on such things as their HTTP_REFERER. I recommend the output of /dev/urandom, or better yet, a redirect to a hefty Microsoft Service Pack download (assuming the bots follow redirects).

(Obviously, I can’t implement it here now, because I’ve just posted about the ever-mysterious “internet blog” and drawn legitimate search traffic.)

mod_rewrite examples: Just add one of these to your .htaccess file (create it if necessary):


    # Plonk this stupid bot with a 403 Forbidden:

    RewriteEngine on
    RewriteCond %{HTTP_REFERER} search\?q=computers\+internet\+blog$
    RewriteRule (.*) - [F]

Or, if you’d rather have dickhead A go and waste the bandwidth of dickhead B…


    # Send this stupid bot off to download something huge elsewhere:

    RewriteEngine on
    RewriteCond %{HTTP_REFERER} search\?q=computers\+internet\+blog$
    RewriteRule (.*)$ http://www.example.com/rubbish.iso [R,L]

In both cases, the RewriteCond line checks for the suspiciously bare Google URL used by the spammer script. A real Google search URL will normally have other junk before the “q=…”, and the trailing $ means nothing may follow the query either, so real searchers slide through. The RewriteRule then sends the bot packing.
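If you want to check what the RewriteCond pattern actually catches before unleashing it on your server, you can try it out in Python. This is a sketch, assuming Python’s re module behaves like Apache’s PCRE for a pattern this simple (plain literals, escaped ? and +, and a trailing $); is_spam_referrer is a hypothetical helper name.

```python
import re

# The pattern from the RewriteCond lines above. mod_rewrite applies it as an
# unanchored search (only the trailing $ anchors it), so re.search mirrors that.
SPAM_REFERRER = re.compile(r"search\?q=computers\+internet\+blog$")

def is_spam_referrer(referrer):
    """True if this referrer would trip the RewriteCond above."""
    return SPAM_REFERRER.search(referrer) is not None

for ref in (
    "http://www.google.com/search?q=computers+internet+blog",          # True: the bot's bare query
    "http://www.google.com/search?hl=en&q=computers+internet+blog",    # False: other params first
    "http://www.google.com/search?q=computers+internet+blog&start=10", # False: params after
):
    print(is_spam_referrer(ref), ref)
```

The pattern is case-sensitive, so “Computers+Internet+Blog” would slip past; adding the [NC] flag to the RewriteCond makes the match case-insensitive.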

Have fun!

4 Responses to “Computers Internet Blog (welcome to my access.log)”

  1. Stephen says:

    Damn Robots!!! This should help. I’ll try it and report back.

  2. Stephen says:

    No rude robot postings this morning!!! Perhaps they took a night off. There are two reasons that the postings might have stopped: a) I implemented the fix indicated above, or b) I reported them to abuse@t35.com – which I found to be the referrer – after a careful search of the log looking for the “get” followed by the “post”. I’ll report back again tomorrow.

  3. […] In comes the phrase “computers internet blog”, and there is no way six people a day are coming to my page with that search. And I am right. A little Googling indicates that it is essentially a comment-posting bot. In my research I came across a note that a real Google referrer has much more stuff in it, which is missing from these Google referrers. From my logs, the offending referrers look like this: […]

  4. Alex Newell says:

    Thanks for this – I’ve been puzzling over this nonsense in my stats package for a while and asking myself why I’m getting such crazy unqualified traffic. I haven’t tried the mod_rewrite fix yet!

    :-)

    Alex
