Munging old URLs to match WordPress' expectations
One of the downsides of having spent years messing with my old Drupal blog is that I've ended up with a bunch of different permalink styles. To pick three posts at random: http://zhasper.com/zhasper/harry_potter_done, http://zhasper.com/2007/09/linkbloggery, and http://zhasper.com/?p=631. Fortunately, I'm only running this blog to give myself a place to vent, so I don't care about lost traffic. If I did care, this would be a problem.
I'm using the "Platinum SEO pack" plugin, which does a good job of handling URLs that don't quite match the schema WordPress is using. For instance, if you visit http://zhasper.com/linkbloggery, it'll figure out that you meant the second URL in the list above. Unfortunately, it's not perfect, and my old blog had way too many variations for anything to cope with.
So I'm going through and fixing what I can of the low-hanging fruit. URLs in the second form, /YYYY/MM/title, already work fine. URLs in the first form need the /zhasper/ prefix removed and every underscore (_) turned into a hyphen (-). I accomplish both of these through a bit of RewriteRule magic:
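RewriteEngine On
RewriteBase /
# Strip the old /zhasper/ prefix
RewriteRule ^zhasper/(.*)$ /$1 [R=301,L]
# Turn the last underscore into a hyphen (repeats on each new request),
# but leave real files, such as uploads with underscores in their names, alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)_(.*)$ $1-$2 [R=301,L]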
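To see what these two rules do to a single request, here's a rough Python emulation. It's a sketch under some assumptions, not how mod_rewrite is actually implemented: it assumes .htaccess (per-directory) semantics, where the leading slash is stripped before matching, the first matching rule wins because of the L flag, the substitution replaces the whole path, and RewriteBase / turns a relative result back into an absolute one. The names RULES and apply_rules are made up for illustration.

```python
import re

# Hypothetical emulation of the two RewriteRules, assuming .htaccess
# (per-directory) semantics with RewriteBase /.
RULES = [
    # First rule: drop the zhasper/ prefix, redirecting to an absolute path.
    (re.compile(r"zhasper/(.*)"), r"/\1"),
    # Second rule: greedy (.*) means only the LAST underscore is replaced.
    (re.compile(r"(.*)_(.*)"), r"\1-\2"),
]


def apply_rules(path):
    """Return the redirect target for one request, or None if no rule matches."""
    # In .htaccess context the leading "/" is stripped before matching.
    local = path[1:] if path.startswith("/") else path
    for pattern, substitution in RULES:
        match = pattern.search(local)
        if match:
            # The substitution replaces the whole path, not just the matched text.
            target = match.expand(substitution)
            # RewriteBase / makes a relative substitution absolute.
            return target if target.startswith("/") else "/" + target
    return None  # no rule matched, so these rules send no redirect


print(apply_rules("/harry_potter_done"))  # /harry_potter-done
```

Because the first (.*) is greedy, the second rule replaces the last underscore on each request, which is why the hyphens appear from right to left as the browser follows the redirects.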
This is quite definitely not the neatest way to achieve this. In the example above, it takes three more round-trips between the browser and the server than a single redirect straight to the final URL would need:
- Browser requests /zhasper/harry_potter_done
- Server sends a redirect to /harry_potter_done
- Browser requests /harry_potter_done
- Server sends a redirect to /harry_potter-done
- Browser requests /harry_potter-done
- Server sends a redirect to /harry-potter-done
- Browser requests /harry-potter-done
- Server sends a redirect to /2007/07/harry-potter-done/
- Browser requests /2007/07/harry-potter-done/
- Server sends actual content
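The extra hops exist because each rule fixes only one thing per request. For comparison, here's a minimal Python sketch of the mapping a single redirect would need to compute. The function name clean_legacy_path is hypothetical, and it only covers the /zhasper/ prefix and the underscores; the final jump to /2007/07/harry-potter-done/ would still come from WordPress. Treat it as an illustration of the target URL, not a drop-in replacement for the RewriteRules.

```python
import re


def clean_legacy_path(path):
    """Hypothetical one-shot cleanup: drop the old /zhasper/ prefix and turn
    every underscore into a hyphen, so one redirect reaches the final slug."""
    path = re.sub(r"^/zhasper/", "/", path)
    return path.replace("_", "-")


# One redirect instead of three to reach the WordPress-style slug.
print(clean_legacy_path("/zhasper/harry_potter_done"))  # /harry-potter-done
```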
The R=301 flag in the RewriteRule makes the server tell the client that this is a permanent redirect: the content will never be at the old address again, so please update your bookmarks. This doesn't make much difference to your browser, but crawlers such as Google should use it as a signal to update their index, and to pass any link-love directed at the old link on to the new one.
If you didn't have the redirect at all, Google wouldn't know that /zhasper/harry_potter_done and /2007/07/harry-potter-done were the same page. It would think that the latter was just a more recently seen page which mysteriously had similar content to the old page.
If you go with a temporary redirect (by using R on its own, or by stipulating [R=302]), Google won't know to update its index: it will keep coming back to check the old URL, just in case the page has moved back there.
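Under the hood, the difference between a permanent and a temporary redirect is just the status code sent back along with the Location header. A small Python sketch using the standard library's http.HTTPStatus; the function redirect_response is hypothetical, and a real server would send more headers (Content-Length, Date and so on) than this.

```python
from http import HTTPStatus


def redirect_response(location, permanent=True):
    """Build the status line and Location header of a redirect response.

    permanent=True mirrors [R=301]; permanent=False mirrors a bare R or [R=302].
    """
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    return f"HTTP/1.1 {status.value} {status.phrase}\r\nLocation: {location}\r\n\r\n"


print(redirect_response("/harry_potter_done").splitlines()[0])
# HTTP/1.1 301 Moved Permanently
```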
There are definitely better ways to achieve this - suggested enhancements are welcome :)