In order to move my blog to a free-as-in-freedom platform and support the great work that Joey (of git-annex fame) and Lars (of GTD for hackers fame) have put into their service, I decided to convert my Blogger blog to Ikiwiki and host it on Branchable.
While the Ikiwiki tips page points to some old instructions, they weren't particularly useful to me. Here are the steps I followed.
Exporting posts and comments from Blogger
Thanks to Google letting people export their own data from their services, I was able to get a full dump (posts, comments and metadata) of my blog in Atom format.
To do this, go into "Settings | Other" then look under "Blog tools" for the "Export blog" link.
Converting HTML posts to Markdown
Converting posts from HTML to Markdown involved a few steps:
- Converting the post content using a small conversion library to which I added a few hacks.
- Creating the file hierarchy that ikiwiki requires.
- Downloading images from Blogger and fixing their paths in the article text.
- Extracting comments and linking them to the right posts.
The Python script I wrote to do all of the above will hopefully be a good starting point for anybody wanting to migrate to Ikiwiki.
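As a rough illustration of the first step (this is not the script mentioned above), here is a minimal sketch that pulls the posts out of a Blogger Atom export using only the Python standard library. It assumes that Blogger tags each entry with a category whose term ends in `kind#post`. The function name `extract_posts` and the returned dictionary keys are made up for this example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: Blogger marks the type of each entry with a category
# element whose "term" attribute ends in "kind#post" for posts.
POST_KIND_SUFFIX = "kind#post"


def extract_posts(atom_xml):
    """Return a list of dicts (title, published, html), one per post entry."""
    root = ET.fromstring(atom_xml)
    posts = []
    for entry in root.iter(ATOM + "entry"):
        terms = [c.get("term", "") for c in entry.findall(ATOM + "category")]
        if not any(term.endswith(POST_KIND_SUFFIX) for term in terms):
            continue  # skip comments, settings and template entries
        posts.append({
            "title": entry.findtext(ATOM + "title", default=""),
            "published": entry.findtext(ATOM + "published", default=""),
            # The post body is HTML; it still needs converting to Markdown.
            "html": entry.findtext(ATOM + "content", default=""),
        })
    return posts
```

Each returned `html` value would then go through the HTML-to-Markdown conversion and be written into the directory layout that Ikiwiki expects. The sketch does not treat draft entries specially, so it may be worth checking for them in the export.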
Maintaining old URLs
To avoid breaking any existing links pointing to my blog on Blogger, I got the above Python script to output a list of Apache redirect rules. I then found out that I could simply email these rules to Joey and Lars to get them added to my blog.
My rules look like this:
# Tagged feeds
Redirect permanent /feeds/posts/default/-/debian /tags/debian/index.rss
Redirect permanent /search/label/debian /tags/debian
# Main feed (needs to come after the tagged feeds)
Redirect permanent /feeds/posts/default /index.rss
# Articles
Redirect permanent /2012/12/keeping-gmail-in-separate-browser.html /posts/keeping-gmail-in-separate-browser/
Redirect permanent /2012/11/prefetching-resources-to-prime-browser.html /posts/prefetching-resources-to-prime-browser/
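Rules like these are easy to generate. The following is a minimal sketch, not the actual code from my script, that assumes every Blogger permalink has the form `/YYYY/MM/slug.html` and that the new page lives at `/posts/slug/`. The helper names `redirect_rule` and `tag_rules` are hypothetical.

```python
import re

# Matches Blogger permalinks such as /2012/12/some-slug.html
BLOGGER_PATH = re.compile(r"^/\d{4}/\d{2}/(?P<slug>[^/]+)\.html$")


def redirect_rule(old_path, new_prefix="/posts/"):
    """Build the Apache rule sending one old post URL to its new location."""
    match = BLOGGER_PATH.match(old_path)
    if match is None:
        raise ValueError(f"not a Blogger post path: {old_path!r}")
    return f"Redirect permanent {old_path} {new_prefix}{match.group('slug')}/"


def tag_rules(tag):
    """Build the two rules (tagged feed and label page) for a single tag."""
    return [
        f"Redirect permanent /feeds/posts/default/-/{tag} /tags/{tag}/index.rss",
        f"Redirect permanent /search/label/{tag} /tags/{tag}",
    ]
```

When writing the rules out, the tagged-feed rules have to be emitted before the main feed rule: `Redirect` matches on a URL-path prefix, so `/feeds/posts/default` would otherwise also catch the tagged feed URLs.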
Collecting analytics
Since I am no longer using Google Analytics on my blog, I decided to take advantage of the access log download feature that Joey recently added to Branchable.
Every night, I download my blog's access log and then process it using awstats. Here is the cron job I use:
#!/bin/bash
BASEDIR=/home/francois/documents/branchable-logs
LOGDIR=/var/log/feedingthecloud
# Download the current access log
LANG=C LC_PAPER= ssh -oIdentityFile="$BASEDIR/branchable-logbot" b-feedingthecloud@feedingthecloud.branchable.com logdump > "$LOGDIR/access.log"
It uses a separate SSH key I added through the Branchable control panel and outputs to a file that gets overwritten every day.
Next, I installed the awstats Debian package, and configured it like this:
$ cat /etc/awstats/awstats.conf.local
SiteDomain=feedingthecloud.branchable.com
LogType=W
LogFormat=1
LogFile="/var/log/feedingthecloud/access.log"
Even if you're not interested in analytics, I recommend you keep an eye on the 404 errors for a little while after the move. This has helped me catch a critical redirection I had forgotten.
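If you only want the 404 errors and not a full awstats report, something like the following sketch would do. It assumes the downloaded log is in the Apache combined format (which is what `LogFormat=1` tells awstats to expect). The function name `count_404s` is made up for this example.

```python
import re
from collections import Counter

# Combined log format:
#   host ident user [time] "METHOD path protocol" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def count_404s(lines):
    """Count how many times each requested path returned a 404."""
    missing = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            missing[match.group("path")] += 1
    return missing
```

Passing it the open log file (for example `count_404s(open("/var/log/feedingthecloud/access.log"))`) and looking at `most_common(20)` on the result gives a quick list of the old URLs that still need a redirect.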
Limiting Planet feeds
One of the most common things that happen right after someone migrates to a new blogging platform is the flooding of any aggregator that subscribes to their blog, usually because the post identifiers have changed.
Unsurprisingly, Ikiwiki already had a few ways to avoid this problem. I chose to simply modify each tagged feed and limit them to the posts added after the move to Branchable.
Switching DNS
Having always hosted my blog on a domain I own, all I needed to do to move over to the new platform without an outage was to change my CNAME to point to feedingthecloud.branchable.com.
I've kept the Blogger blog alive and listening on feeding.cloud.geek.nz to ensure that clients using a broken DNS resolver (which caches records for longer than requested via the record's TTL) continue to see the old posts.
After replacing some variables in run.sh and blogger2ikiwiki.py:
Git diff shows:
Clearly my Python-fu is too low. What can I be missing?
Lissandro: it looks like the permalink that's in that particular post is not a valid URL.
You could try to find out why by adding a print statement just above line 233 in blogger2ikiwiki.py.
(a year later, but...)
The problem turned out to be drafts. I removed them from Blogger, then re-exported the data, and everything went just fine (bah, I still haven't finished, but I've got that part sorted out).
Thanks!