{"id":629,"date":"2009-04-03T01:01:52","date_gmt":"2009-04-03T05:01:52","guid":{"rendered":"http:\/\/tim.cexx.org\/?p=629"},"modified":"2009-04-03T01:01:52","modified_gmt":"2009-04-03T05:01:52","slug":"also-an-experiment","status":"publish","type":"post","link":"https:\/\/tim.cexx.org\/?p=629","title":{"rendered":"(Also: An experiment&#8230;)"},"content":{"rendered":"<p>In the <a href=\"https:\/\/tim.cexx.org\/?p=623\">last post<\/a>, I made the unspeakable blargger mistake of linking to an article on a news site, which means in 7 days or so, instead of said article this link will return absolute crap and\/or a &#8220;Buy membership now!&#8221; nag screen. Trying to keep up with such link rot (if anyone bothered) is a problem that grows linearly with the number of posts\/articles written, until it reaches 100% of the blogger&#8217;s time and he\/she\/subject\/verb has to stop writing any more posts and become a forest ranger. I&#8217;ve ranted about this before, with some <a href=\"http:\/\/boards.cexx.org\/index.php?topic=5542.0\">possible solutions<\/a>, but as you may have guessed based on my project completion record to date, I didn&#8217;t get around to implementing any of them (I got maybe as far as writing a toy script that wget&#8217;s pages and stuffs the contents into a database record).<\/p>\n<p>So a little experiment: Instead of linking to the article directly, I linked to a carefully-constructed &#8220;I&#8217;m Feeling Lucky&#8221; Google query built from unique phrases in the article. 
The idea is that as the site shuffles stuff around \/ deletes content \/ recycles numeric links, rather than a 404*, the link should preferentially return a clean copy of the article from somewhere else on the Internet if it exists (syndicated copy, fulltext copy-paste into a blog\/slashdot post somewhere, etc.).<\/p>\n<p>Let&#8217;s see if it lasts any longer than a regular news-site link!<\/p>\n<p>(For anyone interested, the actual query is:<\/p>\n<p> http:\/\/www.google.com\/search?q=%22A+company&#8217;s+backroom+mass+of+servers+and+switches+is+cloudlike.+So+are+social-networking+sites+like+Facebook+Inc.%2C+or+the+act+of+buying+a+book+on+Amazon.+Some+clouds%2C+like+Google&#8217;s+email%22&#038;btnI=Lucky<\/p>\n<p>The &#8220;%22&#8221; at the beginning and end of the query string itself is the URL-safe encoding for a double-quotation mark (ASCII code 0x22), so that the quote marks in the query don&#8217;t conflict with the quote marks in the &lt;a href=&#8221;&#8230;&#8221;&gt; tag. To simulate a click of the &#8220;I&#8217;m Feeling Lucky&#8221; button, replace the button-type code that normally appears in the query (btnG=Search) with &#8220;btnI=Lucky&#8221;. Also note that apparently Google limits queries to a maximum of 32 words.)<\/p>\n<p>* Modern commercial sites <a href=\"https:\/\/tim.cexx.org\/?p=242\">seldom, if ever, actually return an HTTP 404<\/a> code when a document is not found, since software, including search-engine spiders, detects these and drops 404&#8217;d pages from its listings; 
it&#8217;s far more profitable to pretend the user\/bot has reached some kind of non-error document, swap in a generic landing page and stuff it full of keywords and advertising.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the last post, I made the unspeakable blargger mistake of linking to an article on a news site, which means in 7 days or so, instead of said article this link will return absolute crap and\/or a &#8220;Buy membership now!&#8221; nag screen. Trying to keep up with such link rot (if anyone bothered) is [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_FSMCFIC_featured_image_caption":"","_FSMCFIC_featured_image_nocaption":"","_FSMCFIC_featured_image_hide":"","iawp_total_views":1,"footnotes":""},"categories":[4],"tags":[108,107],"class_list":["post-629","post","type-post","status-publish","format-standard","hentry","category-geek","tag-108","tag-link-rot"],"_links":{"self":[{"href":"https:\/\/tim.cexx.org\/index.php?rest_route=\/wp\/v2\/posts\/629","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tim.cexx.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tim.cexx.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tim.cexx.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tim.cexx.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=629"}],"version-history":[{"count":3,"href":"https:\/\/tim.cexx.org\/index.php?rest_route=\/wp\/v2\/posts\/629\/revisions"}],"predecessor-version":[{"id":632,"href":"https:\/\/tim.cexx.org\/index.php?rest_route=\/wp\/v2\/posts\/629\/revisions\/632"}],"wp:attachment":[{"href":"https:\/\/tim.cexx.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=629"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tim.cexx.org\/index.php?rest_route=%2Fwp
%2Fv2%2Fcategories&post=629"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tim.cexx.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=629"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}