Munging old URLs to match WordPress' expectations

One of the downsides of having spent years messing with my old Drupal blog is that I’ve ended up with a bunch of different permalink styles: to pick three posts at random,,, Fortunately, I’m only running this blog to give myself a place to vent, so I don’t care about lost traffic. If I did care, this would be a problem.

I’m using the “Platinum SEO pack” plugin, which does a good job of handling URLs that don’t quite match the same schema that WordPress is using – for instance, if you visit, it’ll figure out that you meant the second URL in the list above. Unfortunately, it’s not perfect – and my old blog had way too many variations for anything to cope with.

So, I’m going through and doing what I can to fix the low-hanging fruit. URLs in the second form, /YYYY/MM/title, already work fine. URLs in the first form need to have the /zhasper/ removed, and need all the _s turned into -s. I accomplish both of these through a bit of RewriteRule magic:

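# Turn on mod_rewrite; relative substitutions are resolved against /
RewriteEngine On
RewriteBase /

# Drop the old zhasper/ prefix
RewriteRule zhasper/(.*) /$1 [R=301,L]

# Swap one underscore for a hyphen per redirect
RewriteRule (.*)_(.*) $1-$2 [R=301,L]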

This is quite definitely not the neatest way to achieve this. In the example above, it requires three excess round-trips between the server and the browser:

  • Browser requests /zhasper/harry_potter_done
  • Server sends a redirect to /harry_potter_done
  • Browser requests /harry_potter_done
  • Server sends a redirect to /harry_potter-done
  • Browser requests /harry_potter-done
  • Server sends a redirect to /harry-potter-done
  • Browser requests /harry-potter-done
  • Server sends a redirect to /2007/07/harry-potter-done/
  • Browser requests /2007/07/harry-potter-done/
  • Server sends actual content
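To see where the hops come from, here is a rough Python sketch that imitates the two RewriteRules and lists every URL the browser ends up requesting. It is only a stand-in for mod_rewrite, not how Apache is implemented: the names `RULES`, `next_redirect` and `redirect_chain` are made up for illustration, and it assumes the patterns are matched against the path without its leading slash. The last hop to /2007/07/harry-potter-done/ comes from WordPress rather than from these rules, so it is not modelled.

```python
import re

# Python stand-ins for the two RewriteRules (hypothetical names). Patterns are
# matched against the path without its leading slash.
RULES = [
    (re.compile(r"zhasper/(.*)"), r"\1"),    # drop the zhasper/ prefix
    (re.compile(r"(.*)_(.*)"), r"\1-\2"),    # swap one underscore for a hyphen
]


def next_redirect(path):
    """Return the redirect target for path, or None if no rule matches."""
    stripped = path.lstrip("/")
    for pattern, template in RULES:
        match = pattern.search(stripped)
        if match:
            # The first matching rule wins, mirroring the [L] flag.
            return "/" + match.expand(template)
    return None


def redirect_chain(path):
    """List every URL requested until no rule fires any more."""
    chain = [path]
    while (target := next_redirect(chain[-1])) is not None:
        chain.append(target)
    return chain


for url in redirect_chain("/zhasper/harry_potter_done"):
    print(url)
```

Because the first `(.*)` is greedy, each pass converts the last remaining underscore, which is why a title with two underscores costs two redirects on top of the prefix one.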

The 301 in the RewriteRule means that the server tells the client that this is a permanent redirect – the content will never be at the old address, please update your bookmarks. This doesn’t make much difference to your browser – but crawlers such as Google should use this as a signal to update their index, and send any link-love directed at the old link to the new link.

If you didn’t have the redirect at all, Google wouldn’t know that /zhasper/harry_potter_done and /2007/07/harry-potter-done were the same page – it would think that the latter was just a more-recently-seen page which mysteriously had similar content to the old page.

If you go with a temporary redirect (by just using R on its own, or by stipulating [R=302]), Google won’t know to update its index: it will still come back later and check the old URL, just in case the page has moved back there.
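The difference can be put into a toy model. This is only a sketch of the behaviour described above, not Google's actual algorithm: `url_to_index` is a hypothetical helper, and treating 308 as permanent and 303/307 as temporary is my own addition based on what those status codes mean.

```python
# Toy model of a crawler deciding which URL to keep in its index.
# Assumption: only permanent redirects move the indexed URL forward.
PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


def url_to_index(start_url, hops):
    """Return the URL a hypothetical crawler would keep.

    hops is a list of (status_code, location) pairs in the order received.
    """
    indexed = start_url
    for status, location in hops:
        if status in PERMANENT:
            indexed = location   # "update your bookmarks"
        else:
            break                # temporary: keep checking the old URL
    return indexed


hops = [(301, "/harry_potter_done"), (301, "/harry_potter-done"), (301, "/harry-potter-done")]
print(url_to_index("/zhasper/harry_potter_done", hops))
```

Swap the first 301 for a 302 and the function hands back the original URL, which is the "come back later and check" case.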

There are definitely better ways to achieve this – suggested enhancements are welcome 🙂
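One direction an enhancement could take is to work out the cleaned-up path in a single step, so the prefix and all the underscores cost one redirect instead of several. The snippet below is just that idea in Python, not a mod_rewrite config: `clean_path` is a hypothetical name, and unlike my unanchored rule it assumes the zhasper/ prefix only ever appears at the start of the path. Wiring something like this into Apache is left out.

```python
def clean_path(path):
    """Strip a leading /zhasper/ and hyphenate every underscore in one go."""
    prefix = "/zhasper/"
    if path.startswith(prefix):
        # Keep the leading slash, drop the rest of the prefix.
        path = "/" + path[len(prefix):]
    return path.replace("_", "-")


print(clean_path("/zhasper/harry_potter_done"))
```

A single 301 to that result would leave only WordPress's own hop to the /YYYY/MM/title form.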

Crikey! I got a half-mention!

Stilgherrian alerted me to the fact that I got a mention on Crikey today – or at least, yesterday’s post about ASA’s censorship of flight records did.

I’m flattered, but also slightly pissed. If you clicked on that link, you’d have been asked to provide your credentials as a paid-up member of Crikey – or at least, to take a 21-day free trial. I had to do the latter, in order to read what had been said. Hopefully if I’m ever mentioned again on Crikey it’ll be within the next few weeks – because after that my free trial will have expired, and I’d hate to have to pay for a membership just to see how I was being quoted. There are plenty of good reasons to pay for a membership, and I’ve been toying with the idea for a while – but that’s not the one I’d want as my primary reason.

So yes, I signed up for the trial and got to read the article. There’s a nice link back to my blog – except with a missing “http://”, so the link directs readers to and not to my blog. So, of course, I got… well, actually, I got 27 people hitting that page directly, no doubt through manually fixing the URL.

Actually, I should say that I got two half-mentions. I also had 61 visits from another site: Ben Sandilands, the journalist who wrote the Crikey piece, seems to be active there as well (at least: I found a story from him just by skimming the front page) – I’m guessing the two are related. As with Crikey, I can’t see the content on this site without registering. Unlike Crikey, it’s not possible to register here – so I’m still in the dark about where the traffic came from.

So, overall, a good day for blogging. Apparently I’m not the only person interested in why ASA censored flight details – I just wish I could see what the other interested people are saying.

Unrelatedly, I caved and ordered x-plane tonight. If I had a car, I’d be at the airport on one of the mounds right now, having spent the last half-hour watching the last few planes scurrying to get off the ground before curfew kicks in. I seem to be back in *that* phase.

WordPress plugins and pro tips: tell me about them

So. 2.7 is nice and shiny, yay.

Other plugins I’m using:

Akismet of course, for comment-spam (it’s already detected one, which Drupal’s anti-spam probably would have missed).

FD Feedburner Plugin, to redirect feeds off to Feedburner. I foolishly migrated from to yesterday, so now there are two redirects when you try to grab my feeds, but otherwise this works fine.

Friendfeed Comments, in the unlikely event that anyone comments on any of these posts over on FF.

Google Analyticator, for inserting of Google Analytics magic codez in all the right places (and not inserting when I’m logged in, so I don’t track myself)

Google XML Sitemaps, for better indexation by Google

MobilePress, for a funky iPhone interface

OpenID, for OpenID auth, commenting, and registration

Platinum SEO Pack, because it has magic which corrects for the fact that URLs to posts aren’t quite what they used to be.

Register Plus, so that you don’t have a boring registration experience.

Subscribe to Comments – for the cool “email me updates” option below the comment form

Use Google Libraries, to send a small part of the bandwidth for the site to someone else (and hopefully make page loads faster for you)

WP Super Cache, for caching of the site (and again, hopefully faster loading and diggproofing)

What else is cool that I should be using?

Curse those productive wordpress fiends.

Typical. The *day* after I stay up all night migrating from drupal to wordpress 2.6, they release wordpress 2.7 (Coltrane). How dare they give me shiny new goodness!

It’s very shiny indeed. It took me all of 5 minutes to upgrade (with only one false attempt; fortunately, “bzr revert -r 1” saved me).

Is every day of being a wordpress user like this?

Link-love for making the drupal to wordpress migration smooth

I’d just like to mention Alan and Laura Dove, who have a nice little walkthrough (complete with mashed up mysql script) for turning your drupal database into a wordpress database – at least, the posts and the comments.

Very handy, worked as advertised, yay.


I'm back!

Bet you never even noticed I was gone.

I’ve made a few minor changes to the site: switched to a new server, switched from debian to ubuntu, switched from drupal to wordpress. Nothing major.

If you’ve just had to plow through a flood of old entries re-appearing in gReader, my apologies. Please leave a comment below to let me know how upset you are!

Edit 11/12/08 02:05 – I now have the feed that should have been showing up on Planet Slug working at last, so any post I judge slugworthy will now show up there. More work to come soon – for one thing, have to get one of the friendfeed plugins working, to merge comments-here and comments-there.

Edit 11/12/08 02:32 – I’ve now fixed the URL that LJ has been trying to fetch from as well, which has probably been broken for quite some time. Assuming LJ follows the 302 to feedburner, the 15 or so people still stuck in that walled garden should see some updates as well. I still have no plans to follow the comments there – but your LJ is an openid thinger. Just stick your LJ address in the “Website” field of the comment box and you’ll be able to verify that it’s really you posting comments.

access.log shenanigans

In my continuing theme of writing about my own site..

I’ve just spotted some bizarre searches that found my site lately..

[06/Jun/2005:10:22:40 +1000] "GET /aggregator HTTP/1.1" 200 41935


[06/Jun/2005:11:41:29 +1000] "GET / HTTP/1.1" 200 7407


[06/Jun/2005:15:09:52 +1000] "GET /aggregator/categories/1 HTTP/1.0" 200 46103


[06/Jun/2005:15:10:14 +1000] "GET /aggregator/categories/1 HTTP/1.0" 200 14527


One of those makes sense – “Things that start with z”

The rest.. are just weird.

I’m extremely curious about what the person searching for “Things that start with z” was looking for though. Sounds like a Sesame Street writer trying to come up with ideas for next week’s show to me..