In order to move my blog to a free-as-in-freedom platform and support the great work that Joey (of git-annex fame) and Lars (of GTD for hackers fame) have put into their service, I decided to convert my Blogger blog to Ikiwiki and host it on Branchable.
Exporting posts and comments from Blogger
Thanks to Google letting people export their own data from their services, I was able to get a full dump (posts, comments and metadata) of my blog in Atom format.
To do this, go into "Settings | Other", then look under "Blog tools" for the "Export blog" link.
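As a rough illustration of what can be done with that export, here is a minimal sketch that splits the Atom dump into posts and comments using only the Python standard library. It is not the script I used. The function name `split_entries` and the sample feed are hypothetical, and it assumes that Blogger tags each entry with a category term ending in `kind#post` or `kind#comment`, which is worth checking against your own export.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def split_entries(atom_xml):
    """Return (posts, comments) as lists of dicts from an Atom export string."""
    root = ET.fromstring(atom_xml)
    posts, comments = [], []
    for entry in root.iter(ATOM + "entry"):
        # Assumption: the entry type is encoded in a <category term="..."/>
        kinds = [c.get("term", "") for c in entry.findall(ATOM + "category")]
        item = {
            "title": entry.findtext(ATOM + "title", default=""),
            "published": entry.findtext(ATOM + "published", default=""),
            "content": entry.findtext(ATOM + "content", default=""),
        }
        if any(k.endswith("kind#post") for k in kinds):
            posts.append(item)
        elif any(k.endswith("kind#comment") for k in kinds):
            comments.append(item)
    return posts, comments

# Tiny hand-written feed standing in for the real export file
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <category term="http://schemas.google.com/blogger/2008/kind#post"/>
    <title>Hello</title>
    <published>2012-12-01T10:00:00Z</published>
    <content type="html">&lt;p&gt;Hi&lt;/p&gt;</content>
  </entry>
  <entry>
    <category term="http://schemas.google.com/blogger/2008/kind#comment"/>
    <title>Nice post</title>
    <published>2012-12-02T08:00:00Z</published>
    <content type="html">Thanks!</content>
  </entry>
</feed>"""

posts, comments = split_entries(SAMPLE)
print(len(posts), "post(s),", len(comments), "comment(s)")
```

For a real export, the file contents would be read from disk and passed to `split_entries` in place of `SAMPLE`.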
Converting HTML posts to Markdown
Converting posts from HTML to Markdown involved a few steps:
- Converting the post content using a small conversion library to which I added a few hacks.
- Creating the file hierarchy that ikiwiki requires.
- Downloading images from Blogger and fixing their paths in the article text.
- Extracting comments and linking them to the right posts.
The Python script I wrote to do all of the above will hopefully be a good starting point for anybody wanting to migrate to Ikiwiki.
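To give an idea of what the image step involves, here is a minimal sketch that rewrites image URLs in a post's HTML to local paths and returns the list of files to download. It is an illustration under assumptions, not an excerpt from the script: the `localize_images` helper, the regular expression and the `<post-slug>/<filename>` layout are all hypothetical choices.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches the src attribute of an <img> tag that uses double quotes
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)

def localize_images(html, post_slug):
    """Rewrite image URLs to post-relative paths.

    Returns (new_html, downloads) where downloads is a list of
    (remote_url, local_path) pairs still to be fetched.
    """
    downloads = []

    def repl(match):
        url = match.group(2)
        filename = posixpath.basename(urlparse(url).path)
        # Assumed layout: images live in a directory named after the post
        local = "%s/%s" % (post_slug, filename)
        downloads.append((url, local))
        return match.group(1) + local + match.group(3)

    return IMG_SRC.sub(repl, html), downloads

html = '<p><img alt="x" src="http://1.bp.blogspot.com/-abc/s1600/shot.png"></p>'
new_html, downloads = localize_images(html, "my-post")
print(new_html)
print(downloads)
```

Each `(remote_url, local_path)` pair can then be fetched, for example with `urllib.request.urlretrieve`. Two images with the same file name in one post would collide in this sketch, so a real conversion would need to handle that case.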
Maintaining old URLs
To make sure I wouldn't break any existing links pointing to my blog on Blogger, I got the above Python script to output a list of Apache redirect rules. I then found out that I could simply email these rules to Joey and Lars to get them added to my blog.
My rules look like this:
    # Tagged feeds
    Redirect permanent /feeds/posts/default/-/debian http://feeding.cloud.geek.nz/tags/debian/index.rss
    Redirect permanent /search/label/debian http://feeding.cloud.geek.nz/tags/debian

    # Main feed (needs to come after the tagged feeds)
    Redirect permanent /feeds/posts/default http://feeding.cloud.geek.nz/index.rss

    # Articles
    Redirect permanent /2012/12/keeping-gmail-in-separate-browser.html http://feeding.cloud.geek.nz/posts/keeping-gmail-in-separate-browser/
    Redirect permanent /2012/11/prefetching-resources-to-prime-browser.html http://feeding.cloud.geek.nz/posts/prefetching-resources-to-prime-browser/
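Rules in this shape are easy to generate. The following is a minimal sketch of how that could be done, assuming a list of tags and a list of old post URLs are already available. The `build_rules` and helper names are hypothetical and this is not the code from my script. Tagged feeds are emitted before the main feed because `Redirect` matches on a URL prefix.

```python
import posixpath
from urllib.parse import urlparse

NEW_BASE = "http://feeding.cloud.geek.nz"

def tag_redirects(tag):
    """Redirects for a tag's feed and its label page."""
    return [
        "Redirect permanent /feeds/posts/default/-/%s %s/tags/%s/index.rss"
        % (tag, NEW_BASE, tag),
        "Redirect permanent /search/label/%s %s/tags/%s" % (tag, NEW_BASE, tag),
    ]

def article_redirect(old_url):
    """Map /YYYY/MM/slug.html on Blogger to /posts/slug/ on the new site."""
    old_path = urlparse(old_url).path
    slug = posixpath.splitext(posixpath.basename(old_path))[0]
    return "Redirect permanent %s %s/posts/%s/" % (old_path, NEW_BASE, slug)

def build_rules(tags, post_urls):
    rules = []
    for tag in tags:
        rules.extend(tag_redirects(tag))
    # The main feed must come after the tagged feeds (prefix matching)
    rules.append("Redirect permanent /feeds/posts/default %s/index.rss" % NEW_BASE)
    rules.extend(article_redirect(url) for url in post_urls)
    return rules

for rule in build_rules(
    ["debian"],
    ["http://feeding.cloud.geek.nz/2012/12/keeping-gmail-in-separate-browser.html"],
):
    print(rule)
```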
Since I am no longer using Google Analytics on my blog, I decided to take advantage of the access log download feature that Joey recently added to Branchable.
Every night, I download my blog's access log and then process it using awstats. Here is the cron job I use:
    #!/bin/bash
    BASEDIR=/home/francois/documents/branchable-logs
    LOGDIR=/var/log/feedingthecloud

    # Download the current access log
    LANG=C LC_PAPER= ssh -oIdentityFile=$BASEDIR/branchable-logbot email@example.com logdump > $LOGDIR/access.log
It uses a separate SSH key I added through the Branchable control panel and outputs to a file that gets overwritten every day.
Next, I installed the awstats Debian package, and configured it like this:
    $ cat /etc/awstats/awstats.conf.local
    SiteDomain=feedingthecloud.branchable.com
    LogType=W
    LogFormat=1
    LogFile="/var/log/feedingthecloud/access.log"
Even if you're not interested in analytics, I recommend you keep an eye on the 404 errors for a little while after the move. Doing so helped me catch a critical redirection I had forgotten.
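If you would rather not set up awstats just for this, a few lines of Python can list the most frequent 404s. This is a minimal sketch that assumes the access log uses the Apache combined format (which is what `LogFormat=1` suggests). The `count_404s` name, the regular expression and the sample lines are illustrative only.

```python
import re
from collections import Counter

# Captures the request path and the status code from a combined-format line
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (\S+)[^"]*" (\d{3}) ')

def count_404s(lines):
    """Return a Counter mapping each requested path to its number of 404s."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2) == "404":
            hits[match.group(1)] += 1
    return hits

# Made-up log lines for illustration
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Dec/2012:06:25:24 +0000] "GET /old.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [10/Dec/2012:06:26:01 +0000] "GET /old.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [10/Dec/2012:06:27:13 +0000] "GET /index.rss HTTP/1.1" 200 2048 "-" "Feedfetcher"',
]

for path, count in count_404s(SAMPLE_LOG).most_common(10):
    print(count, path)
```

On a real log, the lines would come from opening `/var/log/feedingthecloud/access.log` and iterating over the file object.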
Limiting Planet feeds
One of the most common things that happens right after someone migrates to a new blogging platform is the flooding of any aggregator that subscribes to their blog. The usual cause is the change in post identifiers.
Having always hosted my blog on a domain I own, all I needed to do to
move over to the new platform without an outage was to change my
I've kept the Blogger blog alive and listening on
ensure that clients using a broken DNS resolver (which caches records for longer than requested via the record's TTL) continue to see the old posts.