Munging old URLs to match WordPress' expectations

One of the downsides of having spent years messing with my old Drupal blog is that I’ve ended up with a bunch of different permalink styles: to pick three posts at random, http://zhasper.com/zhasper/harry_potter_done, http://zhasper.com/2007/09/linkbloggery, http://zhasper.com/?p=631. Fortunately, I’m only running this blog to give myself a place to vent, so I don’t care about lost traffic. If I did care, this would be a problem.

I’m using the “Platinum SEO pack” plugin, which does a good job of handling URLs that don’t quite match WordPress’s permalink schema. For instance, if you visit http://zhasper.com/linkbloggery, it’ll figure out that you meant the second URL in the list above. Unfortunately, it’s not perfect, and my old blog had far too many variations for anything to cope with.

So, I’m going through and fixing what I can of the low-hanging fruit. URLs in the second form, /YYYY/MM/title, already work fine. URLs in the first form need the /zhasper/ prefix removed and every underscore (_) turned into a hyphen (-). I accomplish both with a bit of RewriteRule magic:

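RewriteEngine On
RewriteBase /
# Drop the old Drupal /zhasper/ prefix: /zhasper/foo becomes /foo
RewriteRule zhasper/(.*) /$1 [R=301,L]
# Turn one underscore into a hyphen per request; the greedy (.*) means
# the last underscore in the path is replaced first
RewriteRule (.*)_(.*) $1-$2 [R=301,L]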
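To see why each request only fixes one thing at a time, here is a rough Python model of the two rules. It is a sketch, not how mod_rewrite works internally: it assumes per-directory (.htaccess-style) matching against the path without its leading slash, and the name `first_redirect` is made up for illustration.

```python
import re
from typing import Optional

# Python models of the two RewriteRules. Like the originals, the patterns
# are unanchored. As in mod_rewrite, the whole path is replaced by the
# substitution, not just the matched part.
RULES = [
    (re.compile(r"zhasper/(.*)"), r"\1"),   # RewriteRule zhasper/(.*) /$1
    (re.compile(r"(.*)_(.*)"), r"\1-\2"),   # RewriteRule (.*)_(.*) $1-$2
]


def first_redirect(path: str) -> Optional[str]:
    """Return the path the first matching rule redirects to, or None."""
    relative = path.lstrip("/")
    for pattern, replacement in RULES:
        match = pattern.search(relative)
        if match:
            # [L] stops processing, so only the first matching rule fires
            return "/" + match.expand(replacement)
    return None


print(first_redirect("/zhasper/harry_potter_done"))  # /harry_potter_done
print(first_redirect("/harry_potter_done"))          # /harry_potter-done
print(first_redirect("/harry-potter-done"))          # None
```

Under this model the unanchored underscore rule also fires for any other path containing an underscore: `first_redirect("/wp-content/uploads/my_photo.jpg")` gives `/wp-content/uploads/my-photo.jpg`, which matters if any uploaded files have underscores in their names.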

This is definitely not the neatest way to do it. For the first example URL, it takes four redirects, three more round-trips between the browser and the server than a single redirect would need:

  • Browser requests /zhasper/harry_potter_done
  • Server sends a redirect to /harry_potter_done
  • Browser requests /harry_potter_done
  • Server sends a redirect to /harry_potter-done
  • Browser requests /harry_potter-done
  • Server sends a redirect to /harry-potter-done
  • Browser requests /harry-potter-done
  • Server sends a redirect to /2007/07/harry-potter-done/
  • Browser requests /2007/07/harry-potter-done/
  • Server sends actual content
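The walk-through above can be reproduced with a small Python simulation that follows redirects the way a browser does. The two regexes model the RewriteRules; the `CANONICAL` dict is a hypothetical stand-in for WordPress and Platinum SEO guessing the real permalink (the real lookup goes through the WordPress database), and the function names are made up.

```python
import re
from typing import List, Optional

# The two RewriteRules, modelled with Python regexes (unanchored, matched
# against the path without its leading slash; [L] means first match wins).
RULES = [
    (re.compile(r"zhasper/(.*)"), r"\1"),
    (re.compile(r"(.*)_(.*)"), r"\1-\2"),
]

# Hypothetical: WordPress's final redirect to the dated permalink.
CANONICAL = {"/harry-potter-done": "/2007/07/harry-potter-done/"}


def redirect_target(path: str) -> Optional[str]:
    """Where the server redirects `path` to, or None if it serves content."""
    relative = path.lstrip("/")
    for pattern, replacement in RULES:
        match = pattern.search(relative)
        if match:
            return "/" + match.expand(replacement)
    return CANONICAL.get(path)


def follow(path: str) -> List[str]:
    """Follow redirects like a browser; return every URL requested."""
    requested = [path]
    while (target := redirect_target(requested[-1])) is not None:
        requested.append(target)
    return requested


chain = follow("/zhasper/harry_potter_done")
for url in chain:
    print(url)
print(len(chain) - 1, "redirects")
```

Five requests and four redirects, where one redirect straight to the final URL would do.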

The R=301 flag on each RewriteRule tells the client that this is a permanent redirect: the content will never be at the old address again, so please update your bookmarks. This doesn’t make much difference to your browser, but crawlers such as Google should treat it as a signal to update their index and pass any link-love aimed at the old URL on to the new one.

If you didn’t have the redirect at all, Google wouldn’t know that /zhasper/harry_potter_done and /2007/07/harry-potter-done were the same page – it would think that the latter was just a more-recently-seen page which mysteriously had similar content to the old page.

If you use a temporary redirect instead (by using R on its own, or by specifying [R=302]), Google won’t know to update its index: it will keep coming back to check the old URL, just in case the page has moved back there.
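The difference comes down to which status code the flag produces. As a small illustration, this hypothetical helper reads a RewriteRule flag list and reports the HTTP status it would send, assuming mod_rewrite’s documented default of 302 for a bare R. It only handles numeric codes, not every form the real flag accepts.

```python
from typing import Optional


def redirect_status(flags: str) -> Optional[int]:
    """HTTP status sent by a RewriteRule flag list such as "[R=301,L]".

    Returns None when there is no R flag, i.e. an internal rewrite rather
    than a redirect. Only numeric codes are handled in this sketch.
    """
    for flag in flags.strip("[]").split(","):
        name, _, value = flag.strip().partition("=")
        if name.upper() in ("R", "REDIRECT"):
            # A bare R is a temporary (302) redirect
            return int(value) if value else 302
    return None


for flags in ("[R=301,L]", "[R]", "[R=302]", "[L]"):
    print(flags, "->", redirect_status(flags))
```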

There are definitely better ways to achieve this – suggested enhancements are welcome 🙂
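One possible enhancement is to do all the cleanup on the server and send a single 301 straight to the final URL. The sketch below shows that logic in Python rather than as Apache config; in Apache it might be approached by rewriting internally and redirecting once at the end, but that config is not shown here. `CANONICAL` is again a hypothetical stand-in for WordPress’s permalink lookup, and unlike the original rule, this version only strips /zhasper/ at the start of the path.

```python
from typing import Dict, Optional, Tuple

# Hypothetical slug-to-permalink table; WordPress would consult its database.
CANONICAL: Dict[str, str] = {"/harry-potter-done": "/2007/07/harry-potter-done/"}


def clean_legacy_path(path: str) -> str:
    """Apply both fixes at once: drop a leading /zhasper/, hyphenate underscores."""
    prefix = "/zhasper/"
    if path.startswith(prefix):
        path = "/" + path[len(prefix):]
    # Replace every underscore in one go instead of one per round-trip
    return path.replace("_", "-")


def single_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) for one permanent redirect, or None if fine."""
    cleaned = clean_legacy_path(path)
    target = CANONICAL.get(cleaned, cleaned)
    if target == path:
        return None
    return (301, target)


print(single_redirect("/zhasper/harry_potter_done"))
print(single_redirect("/2007/07/harry-potter-done/"))
```

The browser now makes two requests instead of five: one for the old URL and one for the final permalink.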

2 Comments

  1. James Polley says:

    Some notes:

    – WordPress copes fine with extra characters on the end of the URL: a trailing /, -/ or -/?comments=true all get ignored.

    – WordPress does *not* cope if there’s a ?, !, (, or ) in the SLUG. As far as I can tell, WordPress would never use those characters itself, they’re leftovers from the Drupal migration.

    – I’ve had some success removing some of these with another rule, RewriteRule (.*)\"(.*) $1$2 [R=301,L], but for many of these characters (e.g. a - as the last character of the SLUG), WordPress just breaks altogether if the character is in the SLUG, so removing them by hand is the only option.

  2. James Polley says:

    The thing that’s really bugging me is the trailing-hyphen issue. If the trailing - is in the URL but not in the SLUG, WordPress neatly ignores it. If the trailing - is in the SLUG, it’s impossible to see the page, even when you enter the complete URL. All I’m doing to fix these is removing the - from the SLUG.
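Since rewrite rules can’t reach the stored SLUG, the cleanup the comments describe has to happen on the slugs themselves. A minimal Python sketch of that cleanup, assuming the problem characters are exactly the ones reported above (?, !, (, ), the double quote, and a trailing -); this is not an official WordPress list, `clean_slug` is a made-up name, and the example slugs are invented.

```python
import re

# Characters the comments report as Drupal leftovers that break a SLUG
# (taken from the comments, not from WordPress documentation).
BAD_SLUG_CHARS = re.compile(r'[?!()"]')


def clean_slug(slug: str) -> str:
    """Strip the problem characters, then any trailing hyphens."""
    slug = BAD_SLUG_CHARS.sub("", slug)
    # A trailing - in the SLUG makes the page unreachable, so remove it
    return slug.rstrip("-")


print(clean_slug("whats-this-then?-"))   # whats-this-then
print(clean_slug('the-"best"-post!'))    # the-best-post
```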
