2009-08-04

Migrating Sahana from CVS to git

Here is how I migrated the Sahana CVS repository to git, as part of the work that the Sahana New Zealand Cluster is gearing up to do.

It was actually a lot simpler than I expected!

Simple import

In a newly created directory:
  1. Do the initial setup:
    cvs -d:pserver:anonymous@sahana.cvs.sourceforge.net:/cvsroot/sahana login
  2. Run the migration:
    git cvsimport -k -m -v -d:pserver:anonymous@sahana.cvs.sourceforge.net:/cvsroot/sahana sahana-phase2
  3. Wait several hours for it to finish!

Fixing the author names

The only thing I didn't like about the import above was that all of the commit authors were missing full names and email addresses.

So I grabbed the list of Sahana project members from SourceForge and created a mapping like this in a text file (username_map.txt):
    ajuonline=Ajay Kumar <ajuonline@users.sourceforge.net>
    akshits88=Akshit Sharma <akshits88@users.sourceforge.net>
    bhyde=Ben Hyde <bhyde@users.sourceforge.net>
Then I ran the import again with the -A parameter:
    git cvsimport -A username_map.txt -k -m -v -d:pserver:anonymous@sahana.cvs.sourceforge.net:/cvsroot/sahana sahana-phase2
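
Typing the mapping file by hand gets tedious for a long member list. Here is a minimal sketch of how it could be generated, assuming you have saved the member list as tab-separated "username, full name" lines. The function name make_author_map and that input format are my own assumptions for illustration, not something SourceForge exports directly.

```shell
# Hypothetical helper: read "username<TAB>Full Name" lines on stdin and
# print one "username=Full Name <username@users.sourceforge.net>" line each,
# which is the format git cvsimport -A expects.
make_author_map() {
    awk -F '\t' 'NF >= 2 { printf "%s=%s <%s@users.sourceforge.net>\n", $1, $2, $1 }'
}

# Example with a single member taken from the mapping above.
printf 'bhyde\tBen Hyde\n' | make_author_map
```

Redirecting the output to username_map.txt gives a file that can be passed to -A.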

Syncing new commits

To keep the git repository in sync with CVS and collect any new commits, simply run the same git cvsimport command again.

My next step will be to set up an automated sync script to keep this repository up to date with the SourceForge one.
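
As a rough idea of what that script might look like, here is a sketch that wraps the same git cvsimport command in a function. The function name sync_from_cvs and its two arguments are hypothetical, and I am assuming the author map is given as an absolute path, since the function changes directory before running the import.

```shell
#!/bin/sh
# Sketch of a sync helper. The CVS root and module are the ones used above;
# the function name and its arguments are hypothetical.
CVSROOT_URL=':pserver:anonymous@sahana.cvs.sourceforge.net:/cvsroot/sahana'
CVS_MODULE='sahana-phase2'

sync_from_cvs() {
    repo_dir=$1      # directory where the initial import was run
    author_map=$2    # absolute path to username_map.txt
    (
        # Run in a subshell so the caller's working directory is untouched.
        cd "$repo_dir" || exit 1
        git cvsimport -A "$author_map" -k -m -v "-d$CVSROOT_URL" "$CVS_MODULE"
    )
}
```

Calling sync_from_cvs with the repository directory and the map file from a cron job would then pick up new commits on a schedule.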

5 comments:

Anonymous said...

It was mentioned to me elsewhere that you can _really_ speed up the initial import if you fetch the real cvs repository (as in, the files in cvs.sf.net) via rsync and then sync your git repo with that copy, since cvsimport needs to fetch each commit one by one.
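
A sketch of that approach, assuming the SourceForge rsync module is laid out as cvsroot/sahana with CVSROOT and sahana-phase2 beneath it. The mirror directory, the target directory sahana-git and the function name are hypothetical.

```shell
# Hypothetical local mirror location for the raw CVS repository files.
MIRROR="$PWD/cvs-mirror"

import_from_mirror() {
    mkdir -p "$MIRROR" &&
    # Copy the CVS administrative directory and the module itself.
    rsync -av rsync://sahana.cvs.sourceforge.net/cvsroot/sahana/CVSROOT "$MIRROR/" &&
    rsync -av rsync://sahana.cvs.sourceforge.net/cvsroot/sahana/sahana-phase2 "$MIRROR/" &&
    # Import from the local copy instead of the pserver, into ./sahana-git.
    git cvsimport -k -m -v -d "$MIRROR" -C sahana-git sahana-phase2
}
```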

Frédéric Brière said...

In my experience, git-cvsimport is the easiest and fastest way to import a CVS repo, as long as you don't care about whether or not the end result is correct. You might want to look at the ISSUES section of the manpage; in particular, it often cannot deal with branches correctly. (To be fair, this is partially a cvsps issue.)

Me, I don't trust anything but cvs2git to import a CVS repo. I then use git-cvsimport to keep it up-to-date, but even then, I've seen it omit a couple of commits for no good reason, forcing me to re-import from scratch. (Of course, if you're moving away from CVS, you only need to get it right once.)

If you choose to keep your initial import, don't blindly trust git-cvsimport. Compare the tip of every branch and every tag with a fresh CVS checkout, and take a close look at the history (gitk helps a lot). By its nature, Git makes it easy to fix mistakes at the start, but they'll be very hard to fix later on.
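
One way to do the comparison, as a minimal sketch: check out the same branch with both tools and diff the two trees while skipping each tool's metadata directories. The function name compare_with_cvs is hypothetical. It also assumes the CVS checkout was made with -kk (to match an import done with -k) and -P (to prune empty directories, which Git does not track).

```shell
# Hypothetical helper: list files that differ between a Git working tree
# and a CVS checkout of the same branch or tag.
# Exits 0 when the trees match, non-zero otherwise.
compare_with_cvs() {
    git_tree=$1   # Git working tree with the branch or tag checked out
    cvs_tree=$2   # fresh CVS checkout of the same branch or tag
    diff -r -q -x CVS -x .git "$git_tree" "$cvs_tree"
}
```

Any output means the import and the CVS checkout disagree for that branch or tag.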

Good luck!

Frédéric Brière said...

BTW, cvs2git can be a bitch sometimes (though it was much worse when I first used it, compared to now), so I jotted down some notes the last time I had to import something. Here they are, tentatively adapted to your project name:

# Copy the CVS directory
rsync -av rsync://sahana.cvs.sourceforge.net/cvsroot/sahana/sahana-phase2 .
rsync -av rsync://sahana.cvs.sourceforge.net/cvsroot/sahana/CVSROOT .

# Checkout the latest cvs2svn (empty password, just press Enter)
svn co --username=guest http://cvs2svn.tigris.org/svn/cvs2svn/trunk cvs2svn-trunk
# Apply the attached patch to turn off keyword expansion (sigh)
patch -p0 -d cvs2svn-trunk < cvs2git.patch

# Dump the CVS history in a format ready for git-fast-import
cd cvs2svn-trunk
./cvs2git --blobfile ../cvs2git.blob --dumpfile ../cvs2git.dump \
--username '(no author)' --fallback-encoding utf-8 ../sahana-phase2

# Create and fill the Git repo
mkdir ../sahana
cd ../sahana
git init
cat ../cvs2git.{blob,dump} | git fast-import

mariuz said...

Thanks for the tips. I have now imported the full Firebird project history.

It seems that git cvsimport missed some commits a few weeks ago, so it is better to use rsync + cvs2git to reimport it in full.

breezeight said...

Mariuz: How do you track the CVS repository after the initial import with cvs2git?