Archive for December 2008

Boxing Day Tweetable Tweets

Okay, so I lied. Some of these are from two days ago..

Stilgherrian (@stilgherrian): “Christmas ruined as Sarah Palin shoots Rudolph” http://is.gd/cYaU

Stilgherrian (@stilgherrian): Just discovered NewsBiscuit! “Children ‘getting over-excited’ about going to church on Christmas morning”: http://is.gd/d6pz

Harley Dennett (@harleyd): I just had to explain who ABC radio host Julie McCrossin was to an ABC reporter who rang seeking gay christian sources. Yay ABC cuts.

Mike Cannon-Brookes (@mcannonbrookes): RT @barconati Awesome post. Dan talks about the benefits of deploying Confluence enterprise wiki at the Powerhouse Museum http://tr.im/2l9b

Munging old URLs to match WordPress' expectations

One of the downsides of having spent years messing with my old Drupal blog is that I’ve ended up with a bunch of different permalink styles: to pick three posts at random, http://zhasper.com/zhasper/harry_potter_done, http://zhasper.com/2007/09/linkbloggery, http://zhasper.com/?p=631. Fortunately, I’m only running this blog to give myself a place to vent, so I don’t care about lost traffic. If I did care, this would be a problem.

I’m using the “Platinum SEO pack” plugin, which does a good job of handling URLs that don’t quite match the same schema that WordPress is using – for instance, if you visit http://zhasper.com/linkbloggery, it’ll figure out that you meant the second URL in the list above. Unfortunately, it’s not perfect – and my old blog had way too many variations for anything to cope with.

So, I’m going through and doing what I can to fix the low-hanging fruit. URLs in the second form, /YYYY/MM/title, already work fine. URLs in the first form need to have the /zhasper/ removed, and need all the _s turned into -s. I accomplish both of these through a bit of RewriteRule magic:

RewriteEngine On
RewriteBase /
RewriteRule zhasper/(.*) /$1 [R=301,L]
RewriteRule (.*)_(.*) $1-$2 [R=301,L]

This is quite definitely not the neatest way to achieve this. In the example above, it requires three excess round-trips between the server and the browser:

  • Browser requests /zhasper/harry_potter_done
  • Server sends a redirect to /harry_potter_done
  • Browser requests /harry_potter_done
  • Server sends a redirect to /harry_potter-done
  • Browser requests /harry_potter-done
  • Server sends a redirect to /harry-potter-done
  • Browser requests /harry-potter-done
  • Server sends a redirect to /2007/07/harry-potter-done/
  • Browser requests /2007/07/harry-potter-done/
  • Server sends actual content

The 301 in the RewriteRule means that the server tells the client that this is a permanent redirect – the content will never be at the old address, please update your bookmarks. This doesn’t make much difference to your browser – but crawlers such as Google should use this as a signal to update their index, and send any link-love directed at the old link to the new link.
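You can check this from the command line – requesting one of the old URLs should show something like this (output trimmed to the two interesting headers):

zhasper@bridgitte:~$ curl -I http://zhasper.com/zhasper/harry_potter_done
HTTP/1.1 301 Moved Permanently
Location: http://zhasper.com/harry_potter_done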

If you didn’t have the redirect at all, Google wouldn’t know that /zhasper/harry_potter_done and /2007/07/harry-potter-done were the same page – it would think that the latter was just a more-recently-seen page which mysteriously had similar content to the old page.

If you go with a temporary redirect instead (by using R on its own, or by stipulating [R=302]), Google won’t know to update its index: it will still come back later and check the old URL, just in case the page has moved back there.
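For comparison, the temporary version of the first rule above would look like this:

# R without an explicit status code defaults to a 302 temporary redirect
RewriteRule zhasper/(.*) /$1 [R,L]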

There are definitely better ways to achieve this – one possibility is sketched below, and further suggested enhancements are welcome 🙂
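For instance, mod_rewrite can do all the substitutions internally and only redirect once, at the end. Here’s a rough, untested sketch of that idea – the “fixed” environment-variable name is my own invention, and the underscore rules as written only cope with up to six underscores per request:

RewriteEngine On
RewriteBase /

# Strip the old /zhasper/ prefix internally, without redirecting yet
RewriteRule ^zhasper/(.*)$ $1 [E=fixed:yes]

# Turn underscores into hyphens internally, several at a time
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4 [E=fixed:yes]
RewriteRule ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3 [E=fixed:yes]
RewriteRule ^([^_]*)_(.*)$ $1-$2 [E=fixed:yes]

# If any rule above fired, send a single 301 to the cleaned-up URL
RewriteCond %{ENV:fixed} ^yes$
RewriteRule ^(.*)$ /$1 [R=301,L]

That should collapse the chain above into one redirect straight to /harry-potter-done, after which WordPress adds its usual hop to the canonical /2007/07/harry-potter-done/.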

Google Webmaster Tools: I don't understand them.

I’ve seen a few hits on my site to http://zhasper.com/user/, or pages underneath. This seems to be because there used to be content there, and Google’s cache hasn’t caught up (or hadn’t at the time, anyway – it seems to have mostly caught up now).

I don’t want this, so I went to the “Remove URLs” tool, under Tools in the Webmaster Console.

The page says:

Before you begin, you must make sure that Google and other search engines will not crawl the content you want to remove from our search results.

To do this, ensure that each page returns an HTTP status code of either 404 or 410, or use a robots.txt file or meta noindex tag to block crawlers from accessing your content.

Okay, so it needs to return a 404. Easy – there’s no content there anyway, it’s already returning a 404. Double-check:

zhasper@bridgitte:~$ wget http://zhasper.com/user/
--2008-12-25 17:03:31--  http://zhasper.com/user/
Resolving zhasper.com... 88.198.1.123
Connecting to zhasper.com|88.198.1.123|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2008-12-25 17:03:31 ERROR 404: Not Found.

Excellent. So, I request the whole directory to be removed from the index.

Some days later, I come back and check, and my request for removal has been denied. There’s a little question mark beside the word “denied” – obviously there are further details – so I click on it:

Your request has been denied because the webmaster of the site hasn’t applied the appropriate robots.txt file or meta tags to block us from indexing or archiving this page.

No shit – I didn’t put anything in robots.txt because it’s returning a 404, and your instructions say that’s all that’s needed.

Grrr.

I *think* that everything under /user/ has been removed (there’s certainly nothing in the index any more); it’s just /user itself that hasn’t been removed. I don’t understand this – /user gives a 404 also, and the content shown in the snippet is the old Drupal content.

(obdisc: this is a private blog, all opinions are my own and not those of my employer, who happens to be Google. There’s probably something obvious that I’m overlooking – hopefully I’ll have another blog post soon with an update on what that is)

Update, 5 minutes later: Duh. Read the next paragraph, idiot:

If you’re requesting removal of a full site or directory, you must use a robots.txt file to block crawlers from accessing this content.

I’m requesting removal of a full directory. So….
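For the record, blocking the directory takes two lines in robots.txt – which is what I should have put in place before filing the request:

User-agent: *
Disallow: /user/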

Story of the day: The voices in your head are real.

From the normally staid ABC news website comes this gem:

Paranoia is much more common in modern society than previously thought, says a British doctor, who warns it could lead to major problems in society.

Oh noes! Rampant paranoia! Is this what’s been making me think crazy thoughts lately? Our society is in danger! Quick people: we must be vigilant! Examine your own thoughts for any hint of paranoia, NOW!

Dr Daniel Freeman from the psychiatry institute of King’s College London says almost a quarter of the population experience regular paranoid thoughts,

One in four? Then it’s almost certain that I’m paranoid. Woe is me! Whatever could be causing this epidemic of paranoia?

driven by an avalanche of sensational stories in the media.

Oh. Right. Good to see that you’re helping there, doc!

ASA censorship update: Screengrabs!

Re censorship of flight details: Tim Bennet at Electron Soup was faster than me and got screengrabs before the details were censored. Go satisfy your curiosity at his blog.

Crikey! I got a half-mention!

Stilgherrian alerted me to the fact that I got a mention on Crikey today – or at least, yesterday’s post about ASA’s censorship of flight records did.

I’m flattered, but also slightly pissed. If you clicked on that link, you’d have been asked to provide your credentials as a paid-up member of Crikey – or at least, to take a 21-day free trial. I had to do the latter, in order to read what had been said. Hopefully if I’m ever mentioned again on Crikey it’ll be within the next few weeks – because after that my free trial will have expired, and I’d hate to have to pay for a membership just to see how I was being quoted. There are plenty of good reasons to pay for a membership, and I’ve been toying with the idea for a while – but that’s not what I’d want my primary reason to be.

So yes, I signed up for the trial and got to read the article. There’s a nice link back to my blog – except with a missing “http://”, so the link directs readers to http://www.crikey.com.au/Politics/zhasper.com/2008/12/censorship-of-flight-details/ and not to my blog. So, of course, I got… well, actually, I got 27 people hitting that page directly, no doubt through manually fixing the URL.

Actually, I should say that I got two half-mentions. I also had 61 visits from http://civilair.asn.au/. Ben Sandilands, the journalist who wrote the Crikey piece, seems to be active there as well (at least, I found a story from him just by skimming the front page) – I’m guessing the two are related. As with Crikey, I can’t see the content on this site without registering. Unlike Crikey, it’s not possible to register here – so I’m still in the dark about where the traffic came from.

So, overall, a good day for blogging. Apparently I’m not the only person interested in why ASA censored flight details – I just wish I could see what the other interested people are saying.

Unrelatedly, I caved and ordered X-Plane tonight. If I had a car, I’d be at the airport on one of the mounds right now, having spent the last half-hour watching the last few planes scurrying to get off the ground before curfew kicks in. I seem to be back in *that* phase.

Censorship of.. flight details?

A few days ago, a colleague pointed me at AirServices Australia’s new fancy flight tracker, which allows you to watch planes coming and going in the airspace around Sydney airport. There are plenty of things not to like – MS Virtual Earth ;), the nasty click-through EULA that you have to agree to before you even find out what the site provides…

But, that aside, it’s fairly cool. Planes, flying, around Sydney! Results from noise-level meters, so you can see just how noisy your new suburb is going to be. Even details about the planes – type of plane, altitude, flight numbers..

So today there was a tragic accident involving two planes with trainee pilots. SMH have a video online which shows the flight tracker, and shows the two planes involved colliding (and then one of them dropping off the radar – literally). According to the timestamps superimposed on the video, the crash happens just after 11:23am.

The site lets you see historical data: in the box on the lower-left, un-tick the “Show Current Flights” button, then use the controls to choose the day and time you’d like to look at. So it’s easy enough to go back to 11:20am and run through the next few minutes and see the crash for yourself.

Except… that it’s not. There are no planes in that area at that time. In fact, there’s no light aviation at all. Someone has excised all light aviation records between 11:00am and 11:59:16am. If you set the timer to start at 10:59, you see a whole bunch of planes:

[Screenshot: before]

suddenly disappear:

[Screenshot: after]

It’s not a subtle removal either, even if you ignore all the planes which freeze and then vanish from the graph. There’s a nice graph showing you the number of movements per hour for the day – spot what’s odd about today:

[Screenshots: movements per hour for 15-12, 16-12, 17-12 and 18-12]

I fail to understand this. I… just fail. I really don’t understand why this is considered sensitive, and why it’s been removed.

SMH: Not so clever with the counting

It seems that as well as firing all their journalists, SMH have forgotten how to do math.

[Screenshot: smh-counting-fail]

Well done!

Kitties is cute

What’s cuter than a kitty-cat giving you morning cuddles?

Two kitty-cats giving morning cuddles!

The picture below is momentous – it’s the first time both cats have consented to cuddle at the same time.

Burrito licks Linus' ear while I cuddle them both

Stephen Fry: Tweeting with class

From his tweetstream overnight (my time):

Well, sitting in JFK lounge. Flying to LA this morning. Sharp contrast. But, you know, it’s possible to love LA and NYC x
It’s possible to like the Beatles and the Stones. x
You can love Mozart and Wagner x
You can like Dickens and Austen x
You can like iPhones and BlackBerries x
Carbs and protein x
Bach and Mahler x
Coke and Pepsi. (Coke and Ecstasy for that matter …) x
But NOT Vista and OS X. Non posso. Unmöglich. x
Well praps I went a bit far with coke and pepsi. Should’ve tweeter “you can dislike both coke and pepsi” x