Sample Python application using Libgearman

Gearman is a distributed job queue with bindings for several languages.

While Gearman has a nice Python implementation of the client and worker (python-gearman), I chose to use the libgearman bindings (python-libgearman) directly since they are already packaged for Debian (as python-gearman.libgearman).

Unfortunately, these bindings are not very well documented, so here's the sample application I wished I had seen before I started.

Using the command-line tools

Before diving into the Python bindings, you should make sure that you can get a quick application working on the command line (using the gearman-tools package).

Here's a very simple worker which returns verbatim the input it receives:

gearman -w -f myfunction cat

and here is the matching client:

gearman -f myfunction 'test'

You can have a look at the status of the queues on the server by connecting to gearmand via telnet (port 4730) and issuing the status command.
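For example, a telnet session while the worker above is registered might look something like this (the columns returned by status are the function name, the number of jobs in the queue, the number of running jobs, and the number of available workers; the response is terminated by a line containing a single dot):

telnet localhost 4730  
status  
myfunction	0	0	1  
.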

Using the Python libgearman bindings

Once your gearman setup is working (debugging is easier with the command-line tools), you can roll the gearman connection code into your application.

Here's a simple Python worker which returns what it receives:

#!/usr/bin/python  

from gearman import libgearman  

def work(job):  
    # Return the job's payload verbatim
    workload = job.get_workload()  
    return workload  

gm_worker = libgearman.Worker()  
gm_worker.add_server('localhost')  
gm_worker.add_function('myfunction', work)  

while True:  
    gm_worker.work()

and a matching client:

#!/usr/bin/python  

from gearman import libgearman  

gm_client = libgearman.Client()  
gm_client.add_server('localhost')  

result = gm_client.do('myfunction', 'test')  
print result

This should behave in exactly the same way as the command-line examples above.
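The two implementations are interchangeable. For example, assuming the Python worker above is saved as worker.py (a hypothetical filename), you can start it in the background and then query it using the command-line client:

python worker.py &  
gearman -f myfunction 'test'

which should print test back.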

Returning job errors

If you want to expose worker-side processing errors to the client, modify the worker like this:

#!/usr/bin/python  
  
from gearman import libgearman  
  
def work(job):  
    workload = job.get_workload()  
    if workload == 'fail':  
        # Signal to the client that the job failed
        job.send_fail()  
    return workload  
  
gm_worker = libgearman.Worker()  
gm_worker.add_server('localhost')  
gm_worker.add_function('myfunction', work)  
  
while True:  
    gm_worker.work()

and the client this way:

#!/usr/bin/python  
  
from gearman import libgearman  
  
gm_client = libgearman.Client()  
gm_client.add_server('localhost')  
  
result = gm_client.do('myfunction', 'fail')  
print result

License

The above source code is released under the following terms:

CC0 To the extent possible under law, Francois Marier has waived all copyright and related or neighboring rights to this sample libgearman Python application. This work is published from: New Zealand.

Serving pre-compressed files using Apache

The easiest way to compress the data that is being served to the visitors of your web application is to make use of mod_deflate. Once you have enabled that module and provided it with a suitable configuration, it will compress all relevant files on the fly as it serves them.

Given that I was already going to minify my Javascript and CSS files ahead of time (i.e. not using mod_pagespeed), I figured that there must be a way for me to serve gzipped files directly.

"Compiling" Static Files

I decided to treat my web application like a C program. After all, it starts as readable source code and ends up as an unreadable binary file.

So I created a Makefile to minify and compress all CSS and Javascript files using YUI Compressor and gzip:

all: build  

build:  
     find static/css -type f -name "[^.]*.css" -execdir yui-compressor -o {}.css {} \;  
     find static/js -type f -name "[^.]*.js"  -execdir yui-compressor -o {}.js {} \;  
     cd static/css && for f in *.css.css ; do gzip -c $$f > `basename $$f .css`.gz ; done  
     cd static/js && for f in *.js.js ; do gzip -c $$f > `basename $$f .js`.gz ; done  

clean:  
     find static/css -name "*.css.css" -delete  
     find static/js -name "*.js.js" -delete  
     find static/css -name "*.css.gz" -delete  
     find static/js -name "*.js.gz" -delete  
     find -name "*.pyc" -delete

This leaves the original files intact and adds minified .css.css and .js.js files as well as minified and compressed .css.gz and .js.gz files.
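For example, a hypothetical static/css/style.css source file ends up with two siblings after a build:

static/css/style.css.css (minified)  
static/css/style.css.gz (minified and gzipped)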

How browsers advertise gzip support

The nice thing about serving compressed content is that browsers which support receiving gzipped content (almost all of them nowadays) advertise it with the following HTTP header in their requests:

Accept-Encoding: gzip,deflate

(Incidentally, if you want to test what browsers without gzip support see, just browse to about:config in Firefox and clear the network.http.accept-encoding variable.)

Serving compressed files to clients

To serve different files to different browsers, all that's needed is to enable MultiViews in our Apache configuration (as suggested on the Apache mailing list):

<Directory /var/www/static/css>  
 AddEncoding gzip gz  
 ForceType text/css  
 Options +MultiViews  
 SetEnv force-no-vary  
 Header set Cache-Control "private"  
</Directory>  

<Directory /var/www/static/js>  
 AddEncoding gzip gz  
 ForceType text/javascript  
 Options +MultiViews  
 SetEnv force-no-vary  
 Header set Cache-Control "private"  
</Directory>

The ForceType directive is there to force the MIME type (as described in this solution) and to make sure that browsers (including Firefox) don't prompt to save the files to disk.

As for the SetEnv directive, it turns out that Internet Explorer will not cache most files served with a Vary header (which Apache adds here), so we must make sure it gets stripped out before the response goes out.

Finally, the Cache-Control header is set to private to prevent intermediate/transparent proxies from caching our CSS and Javascript files, while still allowing browsers to do so. If intermediate proxies were to cache compressed content, they might incorrectly serve it to clients without gzip support.
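One way to check that the negotiation works is to request one of your stylesheets with curl (the URL below is hypothetical) and look for a Content-Encoding: gzip response header, which should disappear if you repeat the request without the Accept-Encoding header:

curl -I -H 'Accept-Encoding: gzip,deflate' http://example.com/static/css/style.css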

Keeping a log of branch updates on a git server

Using a combination of bad luck and some of the more advanced git options, it is possible to mess up a centralised repository by accidentally pushing a branch and overwriting the existing branch pointer (or "head") on the server.

If you know where the head was pointing prior to that push, recovering it is a simple matter of running this on the server:

git update-ref refs/heads/branchname commit_id

However, if you don't know the previous commit ID, then you pretty much have to dig through the history using git log.

Enabling a server-side reflog

One way to make recovering from this much easier is to enable the reflog on the server, where it is disabled by default in bare repositories.

Simply add this to your git config file on the server:

[core]  
   logallrefupdates = true

and then whenever a head is updated, an entry will be added to the reflog.
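With the reflog in place, recovering from a bad push no longer requires guesswork. Assuming the bad push was the most recent update to that branch, you can list the previous values of the head and restore the last one:

git reflog show branchname  
git update-ref refs/heads/branchname branchname@{1}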