Keeping up with noisy blog aggregators using PlanetFilter

I follow a few blog aggregators (or "planets") and it's always a struggle to keep up with the number of posts that some of these get. The best strategy I have found so far is to filter them to remove the blogs I am not interested in, which is why I wrote PlanetFilter.

Other options

In my opinion, the first step in starting a new free software project should be to look for a reason not to do it :) So I started by looking for another approach and by asking people around me how they dealt with the firehoses that are Planet Debian and Planet Mozilla.

It seems like a lot of people choose to "randomly sample" planet feeds and only read a fraction of the posts that are sent through there. Personally, however, I find there are a lot of authors whose posts I never want to miss, so this option doesn't work for me.

A better option that other people have suggested is to skip the planet feeds entirely and instead subscribe to each author's feed separately, pruning them as you go. Unfortunately, this whitelist approach is high-maintenance since planets constantly add and remove feeds. I decided that I wanted to follow a blacklist approach instead.

PlanetFilter

PlanetFilter is a local application that you can configure to fetch your favorite planets and filter the posts you see.

If you get it via Debian or Ubuntu, it comes with a cronjob that looks at all configuration files in /etc/planetfilter.d/ and outputs filtered feeds in /var/cache/planetfilter/.

You can either:

  • add file:///var/cache/planetfilter/planetname.xml to your local feed reader
  • serve it locally (e.g. http://localhost/planetname.xml) using a webserver, or
  • host it on a server somewhere on the Internet.

The software will fetch new posts every hour and overwrite the local copy of each feed.

A basic configuration file looks like this:

[feed]
url = http://planet.debian.org/atom.xml

[blacklist]

Filters

There are currently two ways of filtering posts out. The main one is by author name:

[blacklist]
authors =
  Alice Jones
  John Doe

and the other one is by title:

[blacklist]
titles =
  This week in review
  Wednesday meeting for

In both cases, if a blog entry contains one of the blacklisted authors or titles, it will be discarded from the generated feed.
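
Both kinds of filters can presumably be combined in a single [blacklist] section. A sketch putting the two examples above together in one configuration file:

[feed]
url = http://planet.debian.org/atom.xml

[blacklist]
authors =
  Alice Jones
  John Doe
titles =
  This week in review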

Tor support

Since blog updates happen asynchronously in the background, they can work very well over Tor.

In order to set that up in the Debian version of planetfilter:

  1. Install the tor and polipo packages.
  2. Set the following in /etc/polipo/config:

     proxyAddress = "127.0.0.1"
     proxyPort = 8008
     allowedClients = 127.0.0.1
     allowedPorts = 1-65535
     proxyName = "localhost"
     cacheIsShared = false
     socksParentProxy = "localhost:9050"
     socksProxyType = socks5
     chunkHighMark = 67108864
     diskCacheRoot = ""
     localDocumentRoot = ""
     disableLocalInterface = true
     disableConfiguration = true
     dnsQueryIPv6 = no
     dnsUseGethostbyname = yes
     disableVia = true
     censoredHeaders = from,accept-language,x-pad,link
     censorReferer = maybe
    
  3. Tell planetfilter to use the polipo proxy by adding the following to /etc/default/planetfilter:

     export http_proxy="localhost:8008"
     export https_proxy="localhost:8008"
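
Once that's in place, a quick way to confirm that requests really do go through Tor (a sketch, assuming curl is installed; the page wording may change over time) is:

https_proxy="localhost:8008" curl -s https://check.torproject.org/ | grep -i congratulations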
    

Bugs and suggestions

The source code is available on repo.or.cz.

I've been using this for over a month and it's been working quite well for me. If you give it a go and run into any problems, please file a bug!

I'm also interested in any suggestions you may have.

Error while running "git gc"

If you see errors like these while trying to do garbage collection on a git repository:

$ git gc
warning: reflog of 'refs/heads/synced/master' references pruned commits
warning: reflog of 'refs/heads/annex/direct/master' references pruned commits
warning: reflog of 'refs/heads/git-annex' references pruned commits
warning: reflog of 'refs/heads/master' references pruned commits
warning: reflog of 'HEAD' references pruned commits
error: Could not read a4909371f8d5a38316e140c11a2d127d554373c7
fatal: Failed to traverse parents of commit 334b7d05087ed036c1a3979bc09bcbe9e3897226
error: failed to run repack

then the reflogs probably contain references to pruned or corrupt commits.

They can be purged by running this:

$ git reflog expire --all --stale-fix
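
after which garbage collection should complete cleanly:

$ git gc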

Thanks to Joey Hess for pointing me in the right direction while debugging a git-annex problem.

Upgrading Lenovo ThinkPad BIOS under Linux

The Lenovo support site offers downloadable BIOS updates that can be run either from Windows or from a bootable CD.

Here's how to convert the bootable CD ISO images under Linux in order to update the BIOS from a USB stick.

Checking the BIOS version

Before upgrading your BIOS, you may want to look up which version of the BIOS you are currently running. To do this, install the dmidecode package:

apt-get install dmidecode

then run:

dmidecode

or alternatively, look at the following file:

cat /sys/devices/virtual/dmi/id/bios_version

Updating the BIOS using a USB stick

To update without using a bootable CD, install the genisoimage package:

apt-get install genisoimage

then use geteltorito to convert the ISO you got from Lenovo:

geteltorito -o bios.img gluj19us.iso

Insert a USB stick you're willing to erase entirely and then copy the image onto it (replacing sdX with the correct device name, not partition name, for the USB stick):

dd if=bios.img of=/dev/sdX
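
Before unplugging the stick or rebooting, it's worth flushing the write buffers and, optionally, verifying the copy (sdX is the same hypothetical device as above):

sync
# optional: check that the start of the stick matches the image
cmp -n "$(stat -c %s bios.img)" bios.img /dev/sdX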

then restart and boot from the USB stick by pressing Enter, then F12 when you see the Lenovo logo.

Using unattended-upgrades on Rackspace's Debian and Ubuntu servers

I install the unattended-upgrades package on almost all of my Debian and Ubuntu servers in order to ensure that security updates are automatically applied. It works quite well except that I still need to log in manually to upgrade my Rackspace servers whenever a new rackspace-monitoring-agent is released because it comes from a separate repository that's not covered by unattended-upgrades.

It turns out that unattended-upgrades can be configured to automatically upgrade packages outside of the standard security repositories but it's not very well documented and the few relevant answers you can find online are still using the old whitelist syntax.

Initial setup

The first thing to do is to install the package if it's not already done:

apt-get install unattended-upgrades

and to answer yes to the automatic stable update question.

If you don't see the question (because its priority is below your debconf threshold -- change the threshold with dpkg-reconfigure debconf), you can always trigger the question manually:

dpkg-reconfigure -plow unattended-upgrades

Once you've got that installed, the configuration file you need to look at is /etc/apt/apt.conf.d/50unattended-upgrades.

Whitelist matching criteria

Looking at the unattended-upgrades source code, I found the list of things that can be used to match on in the whitelist:

  • origin (shortcut: o)
  • label (shortcut: l)
  • archive (shortcut: a)
  • suite (which is the same as archive)
  • component (shortcut: c)
  • site (no shortcut)

You can find the value for each of these fields in the appropriate *_Release file under /var/lib/apt/lists/.
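
For example, a quick way to see which of these fields each of your configured repositories provides (site itself comes from the filename, as noted below):

grep -E '^(Origin|Label|Suite|Codename|Components):' /var/lib/apt/lists/*_Release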

Note that the value of site is the hostname of the package repository, which is also present in the first part of these *_Release filenames (stable.packages.cloudmonitoring.rackspace.com in the example below).

In my case, I was looking at the following inside /var/lib/apt/lists/stable.packages.cloudmonitoring.rackspace.com_debian-wheezy-x86%5f64_dists_cloudmonitoring_Release:

Origin: Rackspace
Codename: cloudmonitoring
Date: Fri, 23 Jan 2015 18:58:49 UTC
Architectures: i386 amd64
Components: main
...

which means that, in addition to site, the only things I could match on were origin and component since there are no Suite or Label fields in the Release file.

This is the line I ended up adding to my /etc/apt/apt.conf.d/50unattended-upgrades:

 Unattended-Upgrade::Origins-Pattern {
         // Archive or Suite based matching:
         // Note that this will silently match a different release after
         // migration to the specified archive (e.g. testing becomes the
         // new stable).
 //      "o=Debian,a=stable";
 //      "o=Debian,a=stable-updates";
 //      "o=Debian,a=proposed-updates";
         "origin=Debian,archive=stable,label=Debian-Security";
         "origin=Debian,archive=oldstable,label=Debian-Security";
+        "origin=Rackspace,component=main";
 };

Testing

To ensure that the config is right and that unattended-upgrades will pick up rackspace-monitoring-agent the next time it runs, I used:

unattended-upgrade --dry-run --debug

which should output something like this:

Initial blacklisted packages: 
Starting unattended upgrades script
Allowed origins are: ['origin=Debian,archive=stable,label=Debian-Security', 'origin=Debian,archive=oldstable,label=Debian-Security', 'origin=Rackspace,component=main']
Checking: rackspace-monitoring-agent (["<Origin component:'main' archive:'' origin:'Rackspace' label:'' site:'stable.packages.cloudmonitoring.rackspace.com' isTrusted:True>"])
pkgs that look like they should be upgraded: rackspace-monitoring-agent
...
Option --dry-run given, *not* performing real actions
Packages that are upgraded: rackspace-monitoring-agent

Making sure that automatic updates are happening

In order to make sure that all of this is working and that updates are actually happening, I always install apticron on all of the servers I maintain. It runs once a day and emails me a list of packages that need to be updated and it keeps doing that until the system is fully up-to-date.

The only thing missing from this is getting a reminder whenever a package update (usually the kernel) requires a reboot to take effect. That's where the update-notifier-common package comes in.

Because that package will add a hook that will create the /var/run/reboot-required file whenever a kernel update has been installed, all you need to do is create a cronjob like this in /etc/cron.daily/reboot-required:

#!/bin/sh
cat /var/run/reboot-required 2> /dev/null || true

assuming of course that you are already receiving emails sent to the root user (if not, add the appropriate alias in /etc/aliases and run newaliases).
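
Don't forget to make the script executable, otherwise run-parts will silently skip it:

chmod +x /etc/cron.daily/reboot-required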

Making Firefox Hello work with NoScript and RequestPolicy

Firefox Hello is a new beta feature in Firefox 34 which gives users the ability to do plugin-free video-conferencing without leaving the browser (using WebRTC technology).

If you cannot get it to work after adding the Hello button to the toolbar, this post may help.

Preferences to check

There are a few preferences to check in about:config:

  • media.peerconnection.enabled should be true
  • network.websocket.enabled should be true
  • loop.enabled should be true
  • loop.throttled should be false
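
If you want to pin these values so they don't get lost, they can also be set from a user.js file in your Firefox profile directory (a sketch; the profile path varies between installations):

// user.js in your Firefox profile directory
user_pref("media.peerconnection.enabled", true);
user_pref("network.websocket.enabled", true);
user_pref("loop.enabled", true);
user_pref("loop.throttled", false);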

NoScript

If you use the popular NoScript add-on, you will need to whitelist the following hosts:

  • about:loopconversation
  • hello.firefox.com
  • loop.services.mozilla.com
  • opentok.com
  • tokbox.com

RequestPolicy

If you use the less popular but equally annoying RequestPolicy add-on, then you will need to whitelist the following origin to destination mappings:

  • about:loopconversation -> opentok.com
  • about:loopconversation -> tokbox.com
  • firefox.com -> mozilla.com
  • firefox.com -> opentok.com
  • firefox.com -> tokbox.com

If you find a more restrictive policy that works, please leave a comment!

Mercurial and Bitbucket workflow for Gecko development

While it sounds like I should really switch to a bookmark-based Mercurial workflow for my Gecko development, I figured that before I do that, I should document how I currently use patch queues and Bitbucket.

Starting work on a new bug

After creating a new bug in Bugzilla, I do the following:

  1. Create a new mozilla-central-mq-BUGNUMBER repo on Bitbucket using the web interface and put https://bugzilla.mozilla.org/show_bug.cgi?id=BUGNUMBER as the Website in the repository settings.
  2. Create a new patch queue: hg qqueue -c BUGNUMBER
  3. Initialize the patch queue: hg init --mq
  4. Make some changes.
  5. Create a new patch: hg qnew -Ue bugBUGNUMBER.patch
  6. Commit the patch to the mq repo: hg commit --mq -m "Initial version"
  7. Push the mq repo to Bitbucket: hg push --mq ssh://hg@bitbucket.org/fmarier/mozilla-central-mq-BUGNUMBER
  8. Make the above URL the default for pull/push by putting this in .hg/patches-BUGNUMBER/.hg/hgrc:

    [paths]
    default = https://bitbucket.org/fmarier/mozilla-central-mq-BUGNUMBER
    default-push = ssh://hg@bitbucket.org/fmarier/mozilla-central-mq-BUGNUMBER
    

Working on a bug

I like to preserve the history of the work I did on a patch. So once I've got some meaningful changes to commit to my patch queue repo, I do the following:

  1. Add the changes to the current patch: hg qref
  2. Check that everything looks fine: hg diff --mq
  3. Commit the changes to the mq repo: hg commit --mq
  4. Push the changes to Bitbucket: hg push --mq

Switching between bugs

Since I have one patch queue per bug, I can easily work on more than one bug at a time without having to clone the repository again and work from a different directory.

Here's how I switch between patch queues:

  1. Unapply the current queue's patches: hg qpop -a
  2. Switch to the new queue: hg qqueue BUGNUMBER
  3. Apply all of the new queue's patches: hg qpush -a
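
Since these are always the same three commands, they could be wrapped in a small shell helper (a hypothetical sketch, not part of my actual setup):

# usage: switch_queue BUGNUMBER
switch_queue() {
  hg qpop -a && hg qqueue "$1" && hg qpush -a
}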

Rebasing a patch queue

To rebase my patch onto the latest mozilla-central tip, I do the following:

  1. Unapply patches using hg qpop -a
  2. Update the branch: hg pull -u
  3. Reapply the first patch: hg qpush and resolve any conflicts
  4. Update the patch file in the queue: hg qref
  5. Repeat steps 3 and 4 for each patch.
  6. Commit the changes: hg commit --mq -m "Rebase patch"

Credits

Thanks to Thinker Lee for telling me about qqueue and Chris Pearce for explaining to me how he uses mq repos on Bitbucket.

Of course, feel free to leave a comment if I missed anything useful or if there's an easier way to do any of the above.

Hiding network disconnections using an IRC bouncer

A bouncer can be a useful tool if you rely on IRC for team communication and instant messaging. The most common use of such a server is to be permanently connected to IRC and to buffer messages while your client is disconnected.

However, that's not what got me interested in this tool. I'm not looking for another place where messages accumulate and wait to be processed later. I'm much happier if people email me when I'm not around.

Instead, I wanted to do to irssi what mosh did to ssh clients: transparently handle and hide temporary disconnections. Here's how I set everything up.

Server setup

The first step is to install znc:

apt-get install znc

Make sure you get the 1.0 series (in jessie or trusty, not wheezy or precise) since it has much better multi-network support.

Then, as a non-root user, generate a self-signed TLS certificate for it:

openssl req -x509 -sha256 -newkey rsa:2048 -keyout znc.pem -nodes -out znc.crt -days 365

and make sure you use something like irc.example.com as the subject name, that is, the hostname you will be connecting to from your IRC client.

Then install the certificate in the right place:

mkdir ~/.znc
mv znc.pem ~/.znc/
cat znc.crt >> ~/.znc/znc.pem

Once that's done, you're ready to create a config file for znc using the znc --makeconf command, again as the same non-root user:

  • create separate znc users if you have separate nicks on different networks
  • use your nickserv password as the server password for each network
  • enable ssl
  • say no to the chansaver and nickserv plugins

Finally, open the IRC port (tcp port 6697 by default) in your firewall:

iptables -A INPUT -p tcp --dport 6697 -j ACCEPT

Client setup (irssi)

On the client side, the official documentation covers a number of IRC clients, but the irssi page was quite sparse.

Here's what I used for the two networks I connect to (irc.oftc.net and irc.mozilla.org):

servers = (
  {
    address = "irc.example.com";
    chatnet = "OFTC";
    password = "fmarier/oftc:Passw0rd1!";
    port = "6697";
    use_ssl = "yes";
    ssl_verify = "yes";
    ssl_cafile = "~/.irssi/certs/znc.crt";
  },
  {
    address = "irc.example.com";
    chatnet = "Mozilla";
    password = "francois/mozilla:Passw0rd1!";
    port = "6697";
    use_ssl = "yes";
    ssl_verify = "yes";
    ssl_cafile = "~/.irssi/certs/znc.crt";
  }
);

Of course, you'll need to copy your znc.crt file from the server into ~/.irssi/certs/znc.crt.

Make sure that you're no longer authenticating with the nickserv from within irssi. That's znc's job now.

Wrapper scripts

So far, this is a pretty standard znc+irssi setup. What makes it work with my workflow is the wrapper script I wrote to enable znc before starting irssi and then prompt to turn it off after exiting:

#!/bin/bash
# Start znc on the server unless it's already running
ssh irc.example.com "pgrep znc || znc"
irssi
# Once irssi exits, offer to shut the bouncer down
read -p "Terminate the bouncer? [y/N] " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]
then
  ssh irc.example.com killall -sSIGINT znc
fi

Now, instead of typing irssi to start my IRC client, I use irc (the name I gave to this wrapper script).

If I'm exiting irssi before commuting or because I need to reboot for a kernel update, I keep the bouncer running. At the end of the day, I say yes to killing the bouncer. That way, I don't have a backlog to go through when I wake up the next day.

LXC setup on Debian jessie

Here's how to set up LXC-based "chroots" on Debian jessie. While this is documented on the Debian wiki, I had to tweak a few things to get the networking to work on my machine.

Start by installing (as root) the necessary packages:

apt-get install lxc libvirt-bin debootstrap

Network setup

I decided to use the default /etc/lxc/default.conf configuration (no change needed here):

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = virbr0
lxc.network.hwaddr = 00:FF:AA:xx:xx:xx
lxc.network.ipv4 = 0.0.0.0/24

but I had to make sure that the "guests" could connect to the outside world through the "host":

  1. Enable IPv4 forwarding by putting this in /etc/sysctl.conf:

    net.ipv4.ip_forward=1
    
  2. and then applying it using:

    sysctl -p
    
  3. Ensure that the network bridge is automatically started on boot:

    virsh -c lxc:/// net-start default
    virsh -c lxc:/// net-autostart default
    
  4. and that it's not blocked by the host firewall, by putting this in /etc/network/iptables.up.rules:

    -A INPUT -d 224.0.0.251 -s 192.168.122.1 -j ACCEPT
    -A INPUT -d 192.168.122.255 -s 192.168.122.1 -j ACCEPT
    -A INPUT -d 192.168.122.1 -s 192.168.122.0/24 -j ACCEPT
    
  5. and applying the rules using:

    iptables-apply
    

Creating a container

Creating a new container (in /var/lib/lxc/) is simple:

sudo MIRROR=http://http.debian.net/debian lxc-create -n sid64 -t debian -- -r sid -a amd64

You can start or stop it like this:

sudo lxc-start -n sid64 -d
sudo lxc-stop -n sid64

Connecting to a guest using ssh

The ssh server is configured to require pubkey-based authentication for root logins, so you'll first need to log in via the container's console. Restarting the container in the foreground (i.e. without -d) attaches it to your terminal:

sudo lxc-stop -n sid64
sudo lxc-start -n sid64

then install a text editor inside the container because the root image doesn't have one by default:

apt-get install vim

then paste your public key in /root/.ssh/authorized_keys.
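
For example, from the container console (a quick sketch; sshd is strict about these permissions):

mkdir -p /root/.ssh
chmod 700 /root/.ssh
vim /root/.ssh/authorized_keys    # paste your public key here
chmod 600 /root/.ssh/authorized_keys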

Then you can exit the console (using Ctrl+a q) and ssh into the container. You can find out what IP address the container received from DHCP by typing this command:

sudo lxc-ls --fancy

Fixing Perl locale errors

If you see a bunch of errors like these when you start your container:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "fr_CA.utf8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

then log into the container as root and use:

dpkg-reconfigure locales

to enable the same locales as the ones you have configured in the host.

Encrypted mailing list on Debian and Ubuntu

Running an encrypted mailing list is surprisingly tricky. One of the first challenges is that you need to decide what the threat model is. Are you worried about someone compromising the list server? One of the subscribers stealing the list of subscriber email addresses? You can't just "turn on encryption", you have to think about what you're trying to defend against.

I decided to use schleuder. Here's how I set it up.

Requirements

What I decided to create was a mailing list where people could subscribe and receive emails encrypted to them from the list itself. In order to post, they need to send an email encrypted to the list's public key and signed using the private key of a subscriber.

What the list then does is decrypt the email and re-encrypt it individually for each subscriber. This protects the emails while in transit, but is vulnerable to the list server itself being compromised since every list email transits through there at some point in plain text.

Installing the schleuder package

The first thing to know about installing schleuder on Debian or Ubuntu is that at the moment it unfortunately depends on ruby 1.8. This means that you can only install it on Debian wheezy or Ubuntu precise: trusty and jessie won't work (until schleuder is ported to a more recent version of ruby).

If you're running wheezy, you're fine, but if you're running precise, I recommend adding my ppa to your /etc/apt/sources.list to get a version of schleuder that actually lets you create a new list without throwing an error.

Then, simply install this package:

apt-get install schleuder

Postfix configuration

The next step is to configure your mail server (I use postfix) to handle the schleuder lists.

This may be obvious but if you're like me and you're repurposing a server which hasn't had to accept incoming emails, make sure that postfix is set to the following in /etc/postfix/main.cf:

inet_interfaces = all

Then follow the instructions from /usr/share/doc/schleuder/README.Debian and finally add the following line (thanks to the wiki instructions) to /etc/postfix/main.cf:

local_recipient_maps = proxy:unix:passwd.byname $alias_maps $transport_maps

Creating a new list

Once everything is set up, creating a new list is pretty easy. Simply run schleuder-newlist list@example.org and follow the instructions.

After creating your list, remember to update /etc/postfix/transports and run postmap /etc/postfix/transports.

Then you can test it by sending an email to LISTNAME-sendkey@example.org. You should receive the list's public key.

Adding list members

Once your list is created, the list admin is the only subscriber. To add more people, you can send an admin email to the list or follow these instructions to do it manually:

  1. Get the person's GPG key: gpg --recv-key KEYID
  2. Verify that the key is trusted: gpg --fingerprint KEYID
  3. Add the person to the list's /var/lib/schleuder/HOSTNAME/LISTNAME/members.conf:
    - email: francois@fmarier.org
      key_fingerprint: 8C470B2A0B31568E110D432516281F2E007C98D1
    
  4. Export the public key: gpg --export -a KEYID
  5. Paste the exported key into the list's keyring:

    sudo -u schleuder gpg --homedir /var/lib/schleuder/HOSTNAME/LISTNAME/ --import
    

Signing the list key

In order to give assurance to list members that the list key is legitimate, it should be signed by the list admin.

  1. Export the key:

    sudo -u schleuder gpg --homedir /var/lib/schleuder/HOSTNAME/LISTNAME/ -a --export LISTNAME@HOSTNAME > ~/LISTNAME.asc
    
  2. Copy the key onto your own machine using scp and import it: gpg --import < LISTNAME.asc
  3. Sign the key locally by editing it (gpg --edit-key KEYID) and using the sign command.
  4. Export your signature: gpg -a --export KEYID > LISTNAME.asc
  5. Copy the signed key back onto the server using scp.
  6. Import your signature into the schleuder public keyring:

    sudo -u schleuder gpg --homedir /var/lib/schleuder/HOSTNAME/LISTNAME/ --import < ~/LISTNAME.asc
    

After that, anybody requesting the list key will get your signature as well.

Outsourcing your webapp maintenance to Debian

Modern web applications are much more complicated than the simple Perl CGI scripts or PHP pages of the past. They usually start with a framework and include lots of external components both on the front-end and on the back-end.

Here's an example from the Node.js back-end of a real application:

$ npm list | wc -l
256

What if one of these 256 external components has a security vulnerability? How would you know, and what would you do if one of your direct dependencies had a hard-coded dependency on the vulnerable version? It's a real problem and of course one way to avoid this is to write everything yourself. But that's neither realistic nor desirable.

However, it's not a new problem. It was solved years ago by Linux distributions for C and C++ applications. For some reason though, this lesson has not propagated to the web, where the standard approach seems to be to "statically link everything".

What if we could build on the work done by Debian maintainers and the security team?

Case study - the Libravatar project

As a way of discussing a different approach to the problem of dependency management in web applications, let me describe the decisions made by the Libravatar project.

Description

Libravatar is a federated and free software alternative to the Gravatar profile photo hosting site.

From a developer's point of view, it's a fairly simple stack: Django, Apache and Postgres.

The service is split between the master node, where you create an account and upload your avatar, and a few mirrors, which serve the photos to third-party sites.

Like with Gravatar, sites wanting to display images don't have to worry about a complicated protocol. In a nutshell, all that a site needs to do is hash the user's email and add that hash to a base URL. Where the federation kicks in is that every email domain is able to specify a different base URL via an SRV record in DNS.
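
In shell terms, the client-side lookup boils down to something like this (a minimal sketch, assuming the hash is the MD5 of the trimmed, lowercased address, as with Gravatar; the example address is made up):

email="user@example.com"
hash=$(printf '%s' "$email" | tr '[:upper:]' '[:lower:]' | md5sum | cut -d ' ' -f 1)
echo "http://cdn.libravatar.org/avatar/$hash"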

For example, francois@debian.org hashes to 7cc352a2907216992f0f16d2af50b070 and so the full URL is:

http://cdn.libravatar.org/avatar/7cc352a2907216992f0f16d2af50b070

whereas francois@fmarier.org hashes to 0110e86fdb31486c22dd381326d99de9 and the full URL is:

http://fmarier.org/avatar/0110e86fdb31486c22dd381326d99de9

due to the presence of an SRV record on fmarier.org.
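
Such a record can be queried directly; a sketch, assuming the _avatars._tcp record name from the Libravatar API:

dig +short srv _avatars._tcp.fmarier.org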

Ground rules

The main rules that the project follows are to:

  1. only use Python libraries that are in Debian
  2. use the versions present in the latest stable release (including backports)

Deployment using packages

In addition to these rules around dependencies, we decided to treat the application as if it were going to be uploaded to Debian:

  • It includes an "upstream" Makefile which minifies CSS and JavaScript, gzips them, and compiles PO files (i.e. a "build" step).
  • The Makefile includes a test target which runs the unit tests and some lint checks (pylint, pyflakes and pep8).
  • Debian packages are produced to encode the dependencies in the standard way as well as to run various setup commands in maintainer scripts and install cron jobs.
  • The project runs its own package repository using reprepro to easily distribute these custom packages (see the example after this list).
  • In order to update the repository and the packages installed on servers that we control, we use fabric, which is basically a fancy way to run commands over ssh.
  • Mirrors can simply add our repository to their apt sources.list and upgrade Libravatar packages at the same time as their system packages.
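
To make the reprepro step above more concrete, publishing a new package looks roughly like this (the repository path, codename, package filename and URL are all hypothetical):

# add a freshly built package to the reprepro-managed repository
reprepro -b /srv/reprepro includedeb wheezy libravatar_1.0-1_all.deb

# mirrors then add a line like this to their sources.list:
# deb http://apt.example.org/debian wheezy main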

Results

Overall, this approach has been quite successful and Libravatar has been a very low-maintenance service to run.

The ground rules have however limited our choice of libraries. For example, to talk to our queuing system, we had to use the raw Python bindings to the C Gearman library instead of being able to use a nice pythonic library which wasn't in Debian squeeze at the time.

There is of course always the possibility of packaging a missing library for Debian and maintaining a backport of it until the next Debian release. This wouldn't be a lot of work considering the fact that responsible bundling of a library would normally force you to follow its releases closely and keep any dependencies up to date, so you may as well share the result of that effort. But in the end, it turns out that there is a lot of Python stuff already in Debian and we haven't had to package anything new yet.

Another thing that was somewhat scary, due to the number of packages that were going to get bumped to a new major version, was the upgrade from squeeze to wheezy. It turned out however that it was surprisingly easy to upgrade to wheezy's version of Django, Apache and Postgres. It may be a problem next time, but all that means is that you have to set a day aside every 2 years to bring everything up to date.

Problems

The main problem we ran into is that we optimized for sysadmins and unfortunately made it harder for new developers to set up their environment. That's not very good from the point of view of welcoming new contributors as there is quite a bit of friction in preparing and testing your first patch. That's why we're looking at encoding our setup instructions into a Vagrant script so that new contributors can get started quickly.

Another problem we faced is that because we use the Debian version of jQuery and minify our own JavaScript files in the build step of the Makefile, we were affected by the removal from that package of the minified version of jQuery. In our setup, there is no way to minify JavaScript files that are provided by other packages and so the only way to fix this would be to fork the package in our repository or (preferably) to work with the Debian maintainer and get it fixed globally in Debian.

One thing worth noting is that while the Django project is very good at issuing backwards-compatible fixes for security issues, sometimes there is no way around disabling broken features. In practice, this means that we cannot run unattended-upgrades on our main server in case something breaks. Instead, we make use of apticron to automatically receive email reminders for any outstanding package updates.

On that topic, it can occasionally take a while for security updates to be released in Debian, but this usually falls into one of two cases:

  1. You either notice because you're already tracking releases pretty well and therefore could help Debian with backporting of fixes and/or testing;
  2. or you don't notice because it has slipped through the cracks or there simply are too many potential things to keep track of, in which case the fact that it eventually gets fixed without your intervention is a huge improvement.

Finally, relying too much on Debian packaging does prevent Fedora users (a project that also makes use of Libravatar) from easily contributing mirrors. Though if we had a concrete offer, we would certainly look into creating the appropriate RPMs.

Is it realistic?

It turns out that I'm not the only one who thought about this approach, which has been named "debops". The same day that my talk was announced on the DebConf website, someone emailed me saying that he had instituted the exact same rules at his company, which operates a large Django-based web application in the US and Russia. It was pretty impressive to read about a real business coming to the same conclusions and using the same approach (i.e. system libraries, deployment packages) as Libravatar.

Regardless of this though, I think there is a class of applications that are particularly well-suited for the approach we've just described. If a web application is not your full-time job and you want to minimize the amount of work required to keep it running, then it's a good investment to restrict your options and leverage the work of the Debian community to simplify your maintenance burden.

The second criterion I would look at is framework maturity. Given the 2-3 year release cycle of stable distributions, this approach is more likely to work with a mature framework like Django. After all, you probably wouldn't compile Apache from source, but until recently building Node.js from source was the preferred option as it was changing so quickly.

While it goes against conventional wisdom, relying on system libraries is a sustainable approach you should at least consider in your next project. After all, there is a real cost in bundling and keeping up with external dependencies.

This blog post is based on a talk I gave at DebConf 14: slides, video.