2010-08-24

Combining multiple commits into one using git rebase

git rebase provides a simple way of combining multiple commits into a single one. However, using rebase to squash an entire branch down to a single commit is not completely straightforward.

Squashing normal commits

Using the following repository:
$ git log --oneline
c172641 Fix second file
24f5ad2 Another file
97c9d7d Add first file
we can combine the last two commits (c172641 and 24f5ad2) by running an interactive rebase on top of the first commit:
$ git rebase -i 97c9d7d
and specify the following commands in the interactive rebase screen:
pick 24f5ad2 Another file
squash c172641 Fix second file
which will rewrite the history into this:
$ git log --oneline
1a9d5e4 Another file
97c9d7d Add first file
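If you would rather script this squash than edit the todo list by hand, you can supply the editors through environment variables. The sketch below was not part of the original workflow. It assumes git 1.7.8 or later (for GIT_SEQUENCE_EDITOR) and GNU sed, and it builds a throwaway repository, so the commit hashes will differ from the ones above.

```shell
#!/bin/sh
# Sketch: the squash above, done without opening the interactive editor.
# Assumes git >= 1.7.8 (GIT_SEQUENCE_EDITOR) and GNU sed ("sed -i").
set -e

# Throwaway repository with the same three commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > file1 && git add file1 && git commit -q -m "Add first file"
first=$(git rev-parse HEAD)
echo "second" > file2 && git add file2 && git commit -q -m "Another file"
echo "second, fixed" > file2 && git commit -q -a -m "Fix second file"

# Turn the second "pick" line of the todo list into "squash";
# GIT_EDITOR=true keeps the proposed combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sed -i '2s/^pick/squash/'" GIT_EDITOR=true \
    git rebase -q -i "$first"

git log --oneline
```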

Rebasing the initial commit

Trying to include the initial commit by rebasing onto its parent will return this error:
$ git rebase -i 97c9d7d^
fatal: Needed a single revision
Invalid base
and squashing the first commit listed in the interactive rebase screen:
$ git rebase -i 97c9d7d
squash 24f5ad2 Another file
squash c172641 Fix second file
will return this error:
Cannot 'squash' without a previous commit
So we need to use a different approach to deal with the initial commit.

Amending the initial commit

Here is an alternative to rebase which will work on commits that don't have a parent.

Taking the previously rebased branch:
$ git log --oneline
1a9d5e4 Another file
97c9d7d Add first file
we can rewind the branch to the initial commit:
$ git reset 97c9d7d
$ git log --oneline
97c9d7d Add first file
without losing any of the changes introduced in 1a9d5e4 (shown here as uncommitted changes):
$ git status
# On branch master
# Changed but not updated:
# (use "git add ..." to update what will be committed)
# (use "git checkout -- ..." to discard changes in working directory)
#
# modified: file1
#
# Untracked files:
# (use "git add ..." to include in what will be committed)
#
# file2
no changes added to commit (use "git add" and/or "git commit -a")
Then we can reopen commit 97c9d7d and add the changes present in the working directory:
$ git add .
$ git commit -a --amend -m "Initial version"
which will finally give us a fully squashed branch:
$ git log --oneline
fcb85fb Initial version

$ git status
# On branch master
nothing to commit (working directory clean)
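The whole rewind-and-amend sequence can also be run end to end on a scratch repository. This is a minimal sketch that assumes a linear history. Instead of a hard-coded hash, it finds the root commit with git rev-list --max-parents=0. (Later versions of git also accept git rebase -i --root, which lets the interactive rebase include the root commit.)

```shell
#!/bin/sh
# Sketch: squash a whole branch, root commit included, by rewinding to the
# root commit and amending it with everything that came after.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file1 && git add file1 && git commit -q -m "Add first file"
echo "two" >> file1 && echo "new" > file2
git add . && git commit -q -m "Another file"

# The root commit is the one without any parents.
root=$(git rev-list --max-parents=0 HEAD)

git reset -q "$root"    # mixed reset: history rewound, changes kept on disk
git add -A              # stage modified and untracked files alike
git commit -q --amend -m "Initial version"

git log --oneline
git status --short      # prints nothing: the working tree is clean
```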

2010-07-20

Cherry-picking a range of git commits

The cherry-pick command in git allows you to copy commits from one branch to another, one commit at a time. In order to copy more than one commit at once, you need a different approach.

Cherry-picking a single commit

Say we have the following repository composed of three branches (master, feature1 and stable):

$ git tree --all
* d9484311 (HEAD, master) Delete test file
* 4d4a0da8 Add a test file
| * 5753515c (stable) Add a license
| * 4b95278e Add readme file
|/
| * a37658bd (feature1) Add fourth file
| * a7785c10 Add lines to 3rd file
| * 7f545188 Add third file
| * 2bca593b Add line to second file
| * 0c13e436 Add second file
|/
* d3199755 Add a line
* b58d925c Initial commit

The "git tree" command is an alias I defined in my ~/.gitconfig:

[alias]
tree = log --oneline --decorate --graph
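You can also create the same alias from the command line instead of editing the file by hand. In this sketch, HOME points at a scratch directory so that nothing gets written to your real ~/.gitconfig. Remove that line to define the alias for real.

```shell
#!/bin/sh
# Sketch: define the "tree" alias with git config instead of editing
# ~/.gitconfig by hand.
set -e

# Scratch HOME so the sketch does not touch your real configuration.
export HOME=$(mktemp -d)

git config --global alias.tree "log --oneline --decorate --graph"

# Read the alias back to check what was stored.
git config --global --get alias.tree
```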

To copy the license file (commit 5753515c) to the master branch, we simply need to run:

$ git checkout master
$ git cherry-pick 5753515c
Finished one cherry-pick.
[master 08ff7d4] Add a license
1 files changed, 676 insertions(+), 0 deletions(-)
create mode 100644 COPYING

and the repository now looks like this:

$ git tree --all
* 08ff7d4a (HEAD, master) Add a license
* d9484311 Delete test file
* 4d4a0da8 Add a test file
| * 5753515c (stable) Add a license
| * 4b95278e Add readme file
|/
| * a37658bd (feature1) Add fourth file
| * a7785c10 Add lines to 3rd file
| * 7f545188 Add third file
| * 2bca593b Add line to second file
| * 0c13e436 Add second file
|/
* d3199755 Add a line
* b58d925c Initial commit

Cherry-picking a range of commits

In order to only take the third file (commits a7785c10 and 7f545188) from the feature1 branch and add it to the stable branch, I could cherry-pick each commit separately, but there is a faster way if you need to cherry-pick a large range of commits.

First of all, let's create a new branch which ends on the last commit we want to cherry-pick:

$ git branch tempbranch a7785c10
$ git tree --all
* 08ff7d4a (HEAD, master) Add a license
* d9484311 Delete test file
* 4d4a0da8 Add a test file
| * 5753515c (stable) Add a license
| * 4b95278e Add readme file
|/
| * a37658bd (feature1) Add fourth file
| * a7785c10 (tempbranch) Add lines to 3rd file
| * 7f545188 Add third file
| * 2bca593b Add line to second file
| * 0c13e436 Add second file
|/
* d3199755 Add a line
* b58d925c Initial commit

Now we'll rebase that temporary branch on top of the stable branch, replaying only the commits that come after 7f545188^ (that is, 7f545188 and a7785c10):

$ git rebase --onto stable 7f545188^ tempbranch
First, rewinding head to replay your work on top of it...
Applying: Add third file
Applying: Add lines to 3rd file
$ git tree --all
* ec488677 (HEAD, tempbranch) Add lines to 3rd file
* a85e5281 Add third file
* 5753515c (stable) Add a license
* 4b95278e Add readme file
| * 08ff7d4a (master) Add a license
| * d9484311 Delete test file
| * 4d4a0da8 Add a test file
|/
| * a37658bd (feature1) Add fourth file
| * a7785c10 Add lines to 3rd file
| * 7f545188 Add third file
| * 2bca593b Add line to second file
| * 0c13e436 Add second file
|/
* d3199755 Add a line
* b58d925c Initial commit

All that's left to do is to make stable point to the top commit of tempbranch and then delete the temporary branch:

$ git checkout stable
Switched to branch 'stable'
$ git reset --hard tempbranch
HEAD is now at ec48867 Add lines to 3rd file
$ git tree --all
* ec488677 (HEAD, tempbranch, stable) Add lines to 3rd file
* a85e5281 Add third file
* 5753515c Add a license
* 4b95278e Add readme file
| * 08ff7d4a (master) Add a license
| * d9484311 Delete test file
| * 4d4a0da8 Add a test file
|/
| * a37658bd (feature1) Add fourth file
| * a7785c10 Add lines to 3rd file
| * 7f545188 Add third file
| * 2bca593b Add line to second file
| * 0c13e436 Add second file
|/
* d3199755 Add a line
* b58d925c Initial commit
$ git branch -d tempbranch
Deleted branch tempbranch (was ec48867).

It would be nice to be able to do it without having to use a temporary branch, but it still beats cherry-picking everything manually.
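Here is the whole temporary-branch sequence as a script that runs on a small, made-up repository. The hashes will not match the ones above, so the script stores the first and last commits of the range in the variables third and lines. Everything else mirrors the commands above.

```shell
#!/bin/sh
# Sketch: copy a range of commits from feature1 onto stable using a
# temporary branch and "git rebase --onto".
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Made-up history: feature1 and stable both start from the initial commit.
echo "base" > file1 && git add file1 && git commit -q -m "Initial commit"
git branch stable
git checkout -q -b feature1
echo "2" > file2 && git add file2 && git commit -q -m "Add second file"
echo "3" > file3 && git add file3 && git commit -q -m "Add third file"
third=$(git rev-parse HEAD)
echo "more" >> file3 && git commit -q -a -m "Add lines to 3rd file"
lines=$(git rev-parse HEAD)
echo "4" > file4 && git add file4 && git commit -q -m "Add fourth file"
git checkout -q stable
echo "readme" > README && git add README && git commit -q -m "Add readme file"

# Temporary branch ending on the last wanted commit, replayed onto stable
# starting just after the parent of the first wanted commit.
git branch tempbranch "$lines"
git rebase -q --onto stable "$third^" tempbranch

# Move stable to the rebased commits and drop the temporary branch.
git checkout -q stable
git reset -q --hard tempbranch
git branch -d tempbranch > /dev/null

git log --oneline stable
```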

Another approach

Another way to achieve this is to use the format-patch command to output patches for the commits you want to copy, and then use the am command to apply them all to the target branch:

$ git format-patch 7f545188^..a7785c10
0001-Add-third-file.patch
0002-Add-lines-to-3rd-file.patch
$ git checkout stable
$ git am *.patch
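A variant of this approach, not used in the original, skips the intermediate .patch files by piping git format-patch --stdout straight into git am. The sketch below runs on a scratch repository shaped like the example above. The branch names and commits are made up.

```shell
#!/bin/sh
# Sketch: copy a range of commits with "format-patch --stdout | am".
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Made-up history: feature1 and stable both start from the initial commit.
echo "base" > file1 && git add file1 && git commit -q -m "Initial commit"
git branch stable
git checkout -q -b feature1
echo "2" > file2 && git add file2 && git commit -q -m "Add second file"
echo "3" > file3 && git add file3 && git commit -q -m "Add third file"
third=$(git rev-parse HEAD)
echo "more" >> file3 && git commit -q -a -m "Add lines to 3rd file"
lines=$(git rev-parse HEAD)
git checkout -q stable
echo "readme" > README && git add README && git commit -q -m "Add readme file"

# Already on stable (the target branch): stream the patches into git am.
git format-patch --stdout "$third^..$lines" | git am -q

git log --oneline
```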

Update: looking forward to git 1.7.2

According to a few people who were kind enough to point this out in the comments, version 1.7.2 of git, which is going to be released soon, will support this in cherry-pick:

$ git cherry-pick 7f545188^..a7785c10
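With a git that accepts ranges in cherry-pick, the copy becomes a single command. Note that A^..B includes A itself, because the range only excludes A^ and its ancestors. A plain A..B would skip the first commit. Here is a sketch on a scratch repository with made-up commits:

```shell
#!/bin/sh
# Sketch: copy a range of commits with a single cherry-pick (git >= 1.7.2).
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Made-up history: feature1 and stable both start from the initial commit.
echo "base" > file1 && git add file1 && git commit -q -m "Initial commit"
git branch stable
git checkout -q -b feature1
echo "2" > file2 && git add file2 && git commit -q -m "Add second file"
echo "3" > file3 && git add file3 && git commit -q -m "Add third file"
third=$(git rev-parse HEAD)
echo "more" >> file3 && git commit -q -a -m "Add lines to 3rd file"
lines=$(git rev-parse HEAD)
git checkout -q stable
echo "readme" > README && git add README && git commit -q -m "Add readme file"

# "$third^..$lines" means: commits reachable from $lines but not from the
# parent of $third, i.e. $third and $lines, applied oldest first.
git cherry-pick "$third^..$lines" > /dev/null

git log --oneline
```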

2010-07-09

Improving the performance of Request Tracker by reducing latency

Request Tracker is a really neat support tool, but one of the common complaints I heard from people using it during a previous project was that it was pretty slow.

There wasn't much we could do about the (overloaded) server it was running on, but I found that enabling mod_deflate really helped.

After watching this great video though, I was inspired to look into it a bit more, focussing this time on latency.

Description of tests

  • RT 3.6.7-5+lenny4
  • Running on a Debian Lenny vserver.
  • Server is on the LAN.
  • Firefox 3.6.6 client (with Firebug 1.5.0)

Also note that I was looking for the "best case" for each of the different configurations and so each screenshot was taken after reloading the homepage 10-20 times to maximize cache hits (thanks in large part to mod_expires).

(Is there a nice automated way of measuring average latency?)
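One possible answer is sketched below, but treat it as an idea rather than a tested method. curl can report its own timings through --write-out, so averaging time_starttransfer (the time until the first byte of the response arrived) over a number of requests gives a rough latency figure. The helper name and the default count are made up. This measures a single URL only, not the images, CSS and Javascript a browser would also fetch.

```shell
#!/bin/sh
# Sketch: mean time to first byte over several requests to one URL.
# Assumes curl is installed; avg_latency is a hypothetical helper name.
avg_latency() {
    url=$1
    count=${2:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        # time_starttransfer: seconds from the start of the request until
        # the first byte of the response was received.
        curl -s -o /dev/null -w '%{time_starttransfer}\n' "$url"
        i=$((i + 1))
    done | awk '{ total += $1 } END { if (NR > 0) printf "%.3f\n", total / NR }'
}
```

For example, avg_latency http://rt.example.com/rt/ 20 (a placeholder URL) would print the mean over 20 requests, in seconds. Because curl would not be logged in, this would time RT's login page unless you pass a session cookie with curl's -b option.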

Stock RT 3.6

Using the default apache2-modperl2 config file (as supplied by RT), here's what the homepage (logged in as root) looked like before I changed anything:



The purple section here indicates the time spent waiting for the server. This shows that the server (running Mason inside mod_perl) is doing quite a bit of processing, including a lot more work than you'd expect while serving static files. It's quite impressive to see how fast the images are being served (directly by Apache) in comparison with the Javascript and the CSS files (which go through Mason).

The reason why Javascript and CSS files have to be served by mod_perl is that they are in fact templates: they contain a few Mason variables which must be substituted before being served.

Looking into it further though, all of these replacements have to do with variables defined in RT_SiteConfig.pm (mostly the install path). Here's an example:
var path = "" ? "" : "/";
which gets turned into:
var path = "/rt" ? "/rt" : "/";
So as long as these paths don't change, there is no need to regenerate these files.

Static Javascript and CSS

This next diagram was produced after configuring Apache to serve all Javascript and CSS files directly from Apache:



The way I did that (without modifying any of the original files) was by saving the Javascript/CSS sent to the browser and using mod_rewrite rules to serve these files instead of the original templated ones:

# Serve static files directly
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/js/ahah.js$ /var/www/rt/ahah.js
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/js/cascaded.js$ /var/www/rt/cascaded.js
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/js/class.js$ /var/www/rt/class.js
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/js/combobox.js$ /var/www/rt/combobox.js
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/js/list.js$ /var/www/rt/list.js
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/js/titlebox-state.js$ /var/www/rt/titlebox-state.js
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/js/util.js$ /var/www/rt/util.js
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/css/3.5-default/main-squished.css$ /var/www/rt/main-squished.css
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/css/print.css$ /var/www/rt/print.css
RewriteRule ^/usr/share/request-tracker3.6/html/NoAuth/webrtfm.css$ /var/www/rt/webrtfm.css
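Writing one rule per file by hand gets tedious, so a small loop can generate them instead. This sketch assumes the same source and destination directories as the rules above, and that each saved file is named after the basename of the original. gen_rules is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print one RewriteRule per saved file, in the format used above.
SRC=/usr/share/request-tracker3.6/html/NoAuth
DEST=/var/www/rt

gen_rules() {
    echo "# Serve static files directly"
    for f in "$@"; do
        # Map the templated original to the saved copy with the same basename.
        echo "RewriteRule ^$SRC/$f\$ $DEST/$(basename "$f")"
    done
}

gen_rules js/ahah.js js/util.js css/3.5-default/main-squished.css css/print.css webrtfm.css
```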

Removing unnecessary images

Finally, one thing I noticed from this last graph is that the rounded corners in the theme require a number of small images. While these don't take a whole lot of bandwidth, they do require quite a bit of back and forth between the browser and the server.

So I replaced all of the "rounded corner" images in the main-squished.css file with the following CSS attributes:

-moz-border-radius-topleft: 8px;
-moz-border-radius-topright: 8px;
-moz-border-radius-bottomleft: 8px;
-moz-border-radius-bottomright: 8px;
-webkit-border-top-left-radius: 8px;
-webkit-border-top-right-radius: 8px;
-webkit-border-bottom-left-radius: 8px;
-webkit-border-bottom-right-radius: 8px;

(yes, Internet Explorer users probably don't get the rounded corners... oh well)

This eliminated a number of roundtrips and shaved off a few more milliseconds:



Merging CSS and Javascript files

By this stage, the pages are pretty snappy so there is not much to be gained anymore, but I figured I'd try to reduce the latency a bit more by combining all Javascript files into one (and doing the same for CSS files with the exception of print.css). This is what I got:



(Note that I also took the opportunity to minify both squished files to reduce the filesize.)

Not a huge improvement, and unfortunately I had to copy quite a few Mason templates from /usr/share/request-tracker3.6/html/ to /usr/local/share/request-tracker3.6/html/ and then replace all of the script tags with a single one in html/Elements/Header.
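The merging step itself can be as simple as concatenating the saved files in the order the original script tags loaded them. This is a minimal sketch: the merge_files name and any output file name are hypothetical, and minification would still be a separate step with a tool of your choice.

```shell
#!/bin/sh
# Sketch: concatenate several files into one (no minification).
# merge_files OUTPUT INPUT...
merge_files() {
    out=$1
    shift
    : > "$out"                  # start from an empty output file
    for f in "$@"; do
        cat "$f" >> "$out"
        # Guarantee a line break between files, in case one lacks a
        # trailing newline.
        printf '\n' >> "$out"
    done
}
```

For example, merge_files /var/www/rt/all-squished.js /var/www/rt/ahah.js /var/www/rt/class.js /var/www/rt/util.js. Order matters when one script depends on another.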

Other things to look into

I've stopped here, but there might be ways to further reduce the processing time on the server (hence the latency) by tuning mod_perl/Mason or Postgres. The RT wiki also has a few pointers.

Replacing Apache with Nginx (which means moving to FastCGI) was something I considered, but when I tried it out, it added about 100ms of extra latency.

Feel free to leave a comment if you've found something else that makes a big difference on your site.

2010-07-04

Querying deleted content in git

If you have removed a file (or part of a file) from git, it's not immediately obvious how to query its history. Here are two ways to deal with deleted content in git.

Commit history of a deleted file

If we take the following two files:
$ ls
file1 file2

and then decide to delete one of them:
$ git rm file2
rm 'file2'
$ git commit -m "Delete a file"
[deletefile 87fadb9] Delete a file
1 files changed, 0 insertions(+), 1 deletions(-)
delete mode 100644 file2

To see the commit history of that file, you can't do it the usual way:
$ git log file2
fatal: ambiguous argument 'file2': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions

Instead, you need to do this:
$ git log -- file2
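To try this on a scratch repository, the sketch below recreates the deletion. It also uses --diff-filter=D, which limits the output to the commit that actually deleted the file.

```shell
#!/bin/sh
# Sketch: query the history of a file that no longer exists.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "a" > file1 && echo "b" > file2
git add . && git commit -q -m "Add two files"
git rm -q file2 && git commit -q -m "Delete a file"

# "--" tells git that file2 is a path, even though it is gone from the tree.
git log --oneline -- file2

# Only the commit in which file2 was deleted.
git log --oneline --diff-filter=D -- file2
```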

Finding the commit that deleted a line

Finding the commit that deleted a line is slightly more complicated. Unfortunately, we can't really use git blame for that. All we can do with git blame is to find the last commit which contained the deleted line.

So if we add the following file:
$ cat file3
one
two
three
$ git add file3
$ git commit -a -m "Add a third file"
[master e62ace6] Add a third file
1 files changed, 3 insertions(+), 0 deletions(-)
create mode 100644 file3

and remove the second line:
$ cat file3
one
three
$ git commit -a -m "Remove a line"
[removeline f3eb691] Remove a line
1 files changed, 0 insertions(+), 1 deletions(-)

then we can use git blame --reverse to see the last revision that contained each line:
$ git blame --reverse HEAD^..HEAD file3
f3eb691d (Francois 2010-07-04 1) one
^e62ace6 (Francois 2010-07-04 2) two
f3eb691d (Francois 2010-07-04 3) three

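Finding the commit that deleted that line requires using git log to search for the text it contained. The -S option lists the commits that changed the number of occurrences of that string, which here means both the commit that removed the line and the one that added it: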
$ git log --oneline -S'two' file3
f3eb691 Remove a line
e62ace6 Add a third file
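You can replay both commands on a scratch repository built from the same three-line file:

```shell
#!/bin/sh
# Sketch: find where a deleted line was last seen and which commit removed it.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'one\ntwo\nthree\n' > file3
git add file3 && git commit -q -m "Add a third file"
printf 'one\nthree\n' > file3 && git commit -q -a -m "Remove a line"

# Last revision in HEAD^..HEAD that still contained each line.
git blame --reverse HEAD^..HEAD file3

# Commits that changed how many times "two" appears in file3
# (newest first: the removal, then the addition).
git log --oneline -S'two' -- file3
```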

2010-06-14

Handling email-based events safely

Mahara.org users have recently witnessed the unfortunate effects of a bug in the Mahara event handling: the Mahara cron job got stuck in an email loop and kept on sending the same forum post over and over again.

Here is what the affected code used to look like (edited for clarity):

db_begin();
$activities = get_records('activity_queue');
foreach ($activities as $activity) {
    handle_activity($activity);
}
delete_records('activity_queue');
db_commit();


One of the problems with this code was that if the handle_activity() function threw an exception, it would interfere with the processing of the entire queue. So it got fixed in the following way:

db_begin();
$activities = get_records('activity_queue');
foreach ($activities as $activity) {
    try {
        handle_activity($activity);
    }
    catch (MaharaException $e) {
        log_debug($e->getMessage());
    }
}
delete_records('activity_queue');
db_commit();


Much better. However, there was still a problem with the code: the whole queue processing is contained within a transaction. This means that should any of the SQL statements fail at any point, the exception would be caught but the SQL statements would all be rolled back at the end by the database.

Now the idea of a transaction is good: we want activity handling to either succeed entirely or be rolled back. But the fact that some activities cannot be handled should not interfere with other activities. So this has been fixed by moving the transaction to the inside of the loop:

$activities = get_records('activity_queue');
foreach ($activities as $activity) {
    db_begin();
    try {
        handle_activity($activity);
    }
    catch (MaharaException $e) {
        log_debug($e->getMessage());
    }
    db_commit();
}
delete_records('activity_queue');


So individual activities are allowed to fail and get rolled back, but they will not affect other activities. But there was still one remaining problem: what if we encounter an error we cannot catch? For example, what would happen if PHP were to segfault or run out of memory while the activity queue is being processed?

Well, in that case, it turns out that Mahara will never reach the delete_records() call and the activity queue will never be cleared, which means that on the next cron run all of the activities will be handled again, even the ones that were successfully handled already (i.e. emails will be sent over and over again).

The way we fixed this problem was by moving the delete_records() from the end of the function to the beginning of the loop:

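$activities = get_records('activity_queue');
foreach ($activities as $activity) {
    // Remove the activity from the queue before handling it, so that a
    // fatal error cannot cause it to be handled (and emailed) twice.
    delete_records('activity_queue', 'id', $activity->id);

    // Each activity gets its own transaction.
    db_begin();
    try {
        handle_activity($activity);
    }
    catch (MaharaException $e) {
        log_debug($e->getMessage());
    }
    db_commit();
}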


Each activity is removed from the queue before it is processed.

Unfortunately, there is a downside to this modification: should an activity handler fail for whatever reason, no further attempts will be made. This means that some notifications could be lost if an unexpected error occurs.

However, given that some of the activity handlers send emails out into the world and that it is not possible to "un-send" them, this is the only way we can guarantee that no duplicate emails will be sent. Of course, if you notice that certain notifications are lost because of a bug in Mahara, let us know and we'll fix it!
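The trade-off is easier to see in a toy version. The sketch below is not Mahara code. It uses a directory of files as a hypothetical queue, with handle_job standing in for the real activity handler, to show the same remove-then-handle ordering. A job whose handler fails is logged and dropped, and running the queue a second time sends nothing new.

```shell
#!/bin/sh
# Sketch: at-most-once queue processing. Each job is dequeued before it is
# handled, so a crash mid-handling drops it instead of repeating it later.
process_queue() {
    queue_dir=$1
    for job in "$queue_dir"/*; do
        [ -e "$job" ] || continue   # empty queue: the glob did not match
        payload=$(cat "$job")
        rm -f "$job"                # dequeue first
        if ! handle_job "$payload"; then
            echo "failed: $payload" >&2   # logged, never retried
        fi
    done
}

# Hypothetical handler: appends to a log instead of sending an email,
# and fails for any payload containing "bad".
handle_job() {
    case $1 in
        *bad*) return 1 ;;
    esac
    echo "$1" >> "${SENT_LOG:-/dev/null}"
}
```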