In order to move my blog to a free-as-in-freedom platform and support the great work that Joey (of git-annex fame) and Lars (of GTD for hackers fame) have put into their service, I decided to convert my Blogger blog to Ikiwiki and host it on Branchable.

While the Ikiwiki tips page points to some old instructions, they weren't particularly useful to me. Here are the steps I followed.

Exporting posts and comments from Blogger

Thanks to Google letting people export their own data from their services, I was able to get a full dump (posts, comments and metadata) of my blog in Atom format.

To do this, go into "Settings | Other" then look under "Blog tools" for the "Export blog" link.

Converting HTML posts to Markdown

Converting posts from HTML to Markdown involved a few steps:

  1. Converting the post content using a small conversion library to which I added a few hacks.
  2. Creating the file hierarchy that ikiwiki requires.
  3. Downloading images from Blogger and fixing their paths in the article text.
  4. Extracting comments and linking them to the right posts.

The Python script I wrote to do all of the above will hopefully be a good starting point for anybody wanting to migrate to Ikiwiki.
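
As a rough illustration of the file hierarchy step, here is a minimal sketch (written for Python 3, and not taken from the actual script) of how a post's permalink in the Atom export could be mapped to an Ikiwiki page path. It assumes that the permalink lives in a `<link rel="alternate">` element and that pages go under `posts/` with an `.mdwn` extension. The helper names `find_permalink`, `page_path` and `list_pages` are made up for this example. It ignores everything else the script has to deal with (content conversion, images, comments).

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

ATOM_NS = '{http://www.w3.org/2005/Atom}'


def find_permalink(entry):
    """Return the entry's public URL, or None if it has none."""
    for link in entry.findall(ATOM_NS + 'link'):
        if link.get('rel') == 'alternate':
            return link.get('href')
    return None


def page_path(permalink):
    """Map /2012/12/some-title.html to posts/some-title.mdwn (assumed layout)."""
    basename = posixpath.basename(urlparse(permalink).path)
    slug, _extension = posixpath.splitext(basename)
    return 'posts/' + slug + '.mdwn'


def list_pages(atom_xml):
    """Yield (permalink, page path) for every entry that has a permalink."""
    root = ET.fromstring(atom_xml)
    for entry in root.iter(ATOM_NS + 'entry'):
        permalink = find_permalink(entry)
        if permalink is None:
            continue  # no public URL: nothing to convert or redirect
        yield permalink, page_path(permalink)
```

Skipping entries without a permalink, instead of handing `None` to `urlparse`, keeps unpublished entries from aborting the whole conversion.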

Maintaining old URLs

To avoid breaking existing links to my blog on Blogger, I had the above Python script output a list of Apache redirect rules. It turned out that I could simply email these rules to Joey and Lars to have them added to my blog.

My rules look like this:

# Tagged feeds
Redirect permanent /feeds/posts/default/-/debian http://feeding.cloud.geek.nz/tags/debian/index.rss
Redirect permanent /search/label/debian http://feeding.cloud.geek.nz/tags/debian

# Main feed (needs to come after the tagged feeds)
Redirect permanent /feeds/posts/default http://feeding.cloud.geek.nz/index.rss

# Articles
Redirect permanent /2012/12/keeping-gmail-in-separate-browser.html http://feeding.cloud.geek.nz/posts/keeping-gmail-in-separate-browser/
Redirect permanent /2012/11/prefetching-resources-to-prime-browser.html http://feeding.cloud.geek.nz/posts/prefetching-resources-to-prime-browser/
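
The article rules follow a regular pattern, so generating them is mostly string manipulation. Here is a minimal sketch, assuming each post's old Blogger URL and new Ikiwiki page name are already known. `redirect_rule` is a hypothetical helper for illustration and not necessarily how the script does it.

```python
from urllib.parse import urlparse

# New home of the blog; must include the trailing slash
BLOG_URL = 'http://feeding.cloud.geek.nz/'


def redirect_rule(old_permalink, new_page):
    """Build one mod_alias rule sending an old Blogger path to its new page."""
    old_path = urlparse(old_permalink).path
    return 'Redirect permanent %s %s%s/' % (old_path, BLOG_URL, new_page)


print(redirect_rule(
    'http://feeding.cloud.geek.nz/2012/12/keeping-gmail-in-separate-browser.html',
    'posts/keeping-gmail-in-separate-browser'))
```

This prints the first of the article rules shown above. Since `Redirect` matches on path prefixes and the first matching rule wins, the order of the feed rules matters: the more specific tagged feed paths have to be listed before the main feed.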

Collecting analytics

Since I am no longer using Google Analytics on my blog, I decided to take advantage of the access log download feature that Joey recently added to Branchable.

Every night, I download my blog's access log and then process it using awstats. Here is the cron job I use:

#!/bin/bash

BASEDIR=/home/francois/documents/branchable-logs
LOGDIR=/var/log/feedingthecloud

# Download the current access log
LANG=C LC_PAPER= ssh -oIdentityFile="$BASEDIR/branchable-logbot" b-feedingthecloud@feedingthecloud.branchable.com logdump > "$LOGDIR/access.log"

It uses a separate SSH key I added through the Branchable control panel and outputs to a file that gets overwritten every day.

Next, I installed the awstats Debian package, and configured it like this:

$ cat /etc/awstats/awstats.conf.local
SiteDomain=feedingthecloud.branchable.com
LogType=W
LogFormat=1
LogFile="/var/log/feedingthecloud/access.log"

Even if you're not interested in analytics, I recommend you keep an eye on the 404 errors for a little while after the move. Doing so helped me catch a critical redirect I had forgotten.
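
For a quick look at the 404s without going through awstats, something along these lines could work. It is only a sketch: it assumes the log is in the Apache combined format (which is what `LogFormat=1` tells awstats to expect), and `count_404s` is a name invented for this example.

```python
import re
from collections import Counter

# Request line and status code of a common/combined log entry, e.g.
# ... "GET /some/path HTTP/1.1" 404 ...
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')


def count_404s(lines):
    """Count requests that ended in a 404, keyed by requested URL."""
    missing = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == '404':
            missing[match.group(1)] += 1
    return missing
```

Passing it the open log file, for example `count_404s(open('/var/log/feedingthecloud/access.log')).most_common(20)`, would list the most frequently requested missing URLs, which are the best candidates for an extra redirect rule.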

Limiting Planet feeds

One of the most common problems right after someone migrates to a new blogging platform is the flooding of any aggregator that subscribes to their blog, usually because the post identifiers have changed.

Unsurprisingly, Ikiwiki already had a few ways to avoid this problem. I chose to simply modify each tagged feed and limit it to the posts added after the move to Branchable.

Switching DNS

Since I have always hosted my blog on a domain I own, all I needed to do to move over to the new platform without an outage was change my CNAME to point to feedingthecloud.branchable.com.

I've kept the Blogger blog alive and listening on feeding.cloud.geek.nz to ensure that clients using a broken DNS resolver (which caches records for longer than requested via the record's TTL) continue to see the old posts.

AttributeError: 'NoneType' object has no attribute 'find'

After replacing some variables in run.sh and blogger2ikiwiki.py:

$ ./run.sh 
Traceback (most recent call last):
  File "../blogger2ikiwiki.py", line 365, in <module>
    (filename, post, permalink) = print_post(entry, tags)
  File "../blogger2ikiwiki.py", line 246, in print_post
    filename = extract_filename(permalink)
  File "../blogger2ikiwiki.py", line 233, in extract_filename
    components = urlparse(permalink)
  File "/usr/lib/python2.7/urlparse.py", line 140, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib/python2.7/urlparse.py", line 179, in urlsplit
    i = url.find(':')
AttributeError: 'NoneType' object has no attribute 'find'

Git diff shows:

diff --git a/blogger2ikiwiki.py b/blogger2ikiwiki.py
index 69cb1df..069dc2c 100755
--- a/blogger2ikiwiki.py
+++ b/blogger2ikiwiki.py
@@ -29,16 +29,16 @@ from html2text import html2text


 # Change this to point to the name of your Blogger export file
-ATOM_BACKUP_FILENAME = '../feedingthecloud.xml'
+ATOM_BACKUP_FILENAME = '/home/lisandro/tmp/blogger/pm.xml'

-LICENSE_LINK = '[Creative Commons Attribution-Share Alike 3.0 New Zealand License](http://creativecommons.org/licenses/by-sa/3.0/nz/)'
+LICENSE_LINK = '[Creative Commons Attribution-Share Alike 3.0 Unported](http://creativecommons.org/licenses/by-sa/3.0/)'

-AUTHOR_URL_REPLACEMENTS = {'http://www.blogger.com/profile/15799633745688818389': 'http://fmarier.org'}
+AUTHOR_URL_REPLACEMENTS = {'http://www.blogger.com/profile/09966442884730426878': 'http://perezmeyer.com.ar'}

 # Must include the trailing slash!
-BLOG_URL = 'http://feeding.cloud.geek.nz/'
+BLOG_URL = 'http://localhost/blog/'

-TAGGED_FEEDS = ['debian', 'mozilla', 'nzoss', 'ubuntu', 'postgres', 'sysadmin', 'django', 'python', 'nodejs']
+TAGGED_FEEDS = ['kde', 'qt']


 def get_author_name(entry):
diff --git a/run.sh b/run.sh
index a368639..737b8f1 100755
--- a/run.sh
+++ b/run.sh
@@ -2,8 +2,8 @@
 #
 # Quickly re-convert the blog and commit to a local test instance of ikiwiki

-SCRIPT_DIR=~/devel/remote/blogger2ikiwiki
-BLOG_DIR=~/ikiwiki/FeedingtheCloud
+SCRIPT_DIR=/home/lisandro/tmp/blogger/blogger2ikiwiki
+BLOG_DIR=/home/lisandro/blog

 cd $SCRIPT_DIR && rm -rf temp/
 mkdir -p temp/

Clearly my Python-fu is too low :-/ What could I be missing?

Comment by Lisandro Damián Nicanor
Found it

(a year later, but...)

The problem turned out to be drafts. I removed them from Blogger, then re-exported the data, and everything went just fine (bah, I still haven't finished, but I've got that part sorted out).

Thanks!

Comment by Lisandro Damián Nicanor