Like any sensible software developer, I have a close relationship with revision control systems. In my previous job, I was an SCM Engineer (see Software configuration management), which meant I had an even closer relationship than most, since we were running the CVS servers and actively using them to track changes and deployments.
We all know, deep down, that revision control systems shouldn’t exist. This kind of thing should be inherent in the design of the operating system, through standard file and filesystem formats. The OLPC interface is making some headway towards that, but for the rest of us, it means using a revision control tool throughout the development process.
Unfortunately, even though a revision control tool is expected to be one of the most-used commands on your system, very few of them are particularly easy to use. Thus there’s a steep learning curve, and people become religious about their choice because they have invested significant time in learning it.
Just to spice the mix up, not only will people religiously defend their choice of revision control system, but they’ll do so while actively hating it.
In the beginning there was CVS, and we all thought that it was pretty good. It was based on the simpler RCS and shared a file format with it, but introduced control of directory trees and remote operation.
In reality, CVS wasn’t that good. Its command set could be strange and inconsistent (e.g. it’s not possible to diff between two dates on a branch); its support for branching assumed that every branch would be merged into the mainline, and only once; and nobody ever really knew how to create a new project in a repository (tip: cvs import is wrong).
But we all used it anyway, and we muddled through. It did have some good features: it was simple, fast and pretty reliable, and when it did break, you could usually fix the repository yourself. Most importantly of all, we understood how to drive it.
And so it was for many years, until Subversion (SVN) came along. Subversion was intended to be “a better CVS”; perhaps this goal should have made us suspicious at the time, since CVS was already a pretty good CVS by itself. Unfortunately, we hated CVS so much that we flocked to the new system in hope.
In hindsight, Subversion didn’t really improve on CVS much at all. In fact, arguably, the only real improvement was the addition of atomic commits (in CVS, each commit is per-file, so it’s manual labour to work out which change was made to two files at the same time).
(Its support for branching, tagging, copying, renaming, etc. was no better than CVS’s when done in the repository by hand.)
The cost of this single new feature was a much more complicated interface (with two separate commands), a backend that tended to break down weekly and a lethargic slowness to its operation.
Most people I know now justify their use of Subversion instead of CVS with “Subversion is maintained, CVS isn’t”, which is a somewhat self-fulfilling justification.
While the mass conversion to Subversion, and the ensuing disappointment and frustration, was going on, something new appeared on the horizon: Arch.
Arch was different: it broke one of the core assumptions of revision control, that of the repository as a cathedral. In CVS, and in Subversion like it, if somebody wants to modify your code (even on a branch), you need to give them access to your own repository. In some cases (especially with CVS), vast access control and permission structures would be in place to ensure proper behaviour.
With Arch, you don’t; all you need to give anyone is read access. Anybody can make their own branch by copying yours and committing to their own copy.
This model also necessitated fixing a long-standing problem that CVS had: Arch has repeatable (smart) merging. If you merge from a branch, you can merge again later, and again, and again.
Arch made this possible by giving each commit (changeset) a globally unique identifier, made from the branch’s own globally unique identifier and the changeset number within the branch.
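To make that bookkeeping concrete, here is a minimal sketch in Python, assuming a simplified model where a branch is just an ordered list of changeset ids and each id is the pair (branch id, changeset number). The `Branch` class and the branch names are hypothetical, and this is not meant to reflect how tla actually records merges; the point is only that globally unique ids let a branch tell which changesets it already has, so merging again is safe.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Toy Arch-style branch: a changeset id is (branch id, changeset number)."""
    branch_id: str                               # hypothetical globally unique branch name
    history: list = field(default_factory=list)  # changeset ids, in the order applied
    local_count: int = 0                         # changesets committed on this branch

    def commit(self) -> tuple:
        """Record a new local changeset and return its globally unique id."""
        self.local_count += 1
        changeset_id = (self.branch_id, self.local_count)
        self.history.append(changeset_id)
        return changeset_id

    def merge_from(self, other: "Branch") -> list:
        """Apply only the changesets from `other` that this branch does not have yet."""
        present = set(self.history)
        new = [cs for cs in other.history if cs not in present]
        self.history.extend(new)
        return new


mainline = Branch("alice--project--main")
feature = Branch("bob--project--feature")
mainline.commit()
feature.merge_from(mainline)         # bob starts from alice's work
feature.commit()
print(mainline.merge_from(feature))  # [('bob--project--feature', 1)]
feature.commit()
print(mainline.merge_from(feature))  # [('bob--project--feature', 2)]
print(mainline.merge_from(feature))  # []
```

The second and third merges only pick up what is genuinely new, which is exactly the “again, and again” property.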
Unfortunately, while this was a massive step in a new direction, Arch had an absolutely terrible user interface. Its command list was terrifying, with over 100 commands, many of which had multi-word names (tla set-tree-version). It exposed too many of its own innards, and expected you to learn them. It also forced baroque file naming semantics and strange policy on its users (thou shalt not commit without first running “make clean”).
Efforts were made to improve Arch’s user interface through projects such as baz, but they were doomed from the start.
We’ve since seen an explosion of new revision control systems: Monotone, Darcs, Git and Bazaar.
What’s especially interesting is the commonality between these systems. They are all “distributed” like Arch, though they also all discard the strange “unique branch identifier” convention and instead simply assign a unique identifier to each file or commit.
This means that they all support personal branches, and by necessity all support repeatable (smart) merging.
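One way to get identifiers that are unique without any central server handing them out is to derive them from content. The sketch below follows Git’s scheme for file contents (a SHA-1 over a short “blob” header plus the raw bytes); the other systems use their own schemes, so treat this as one illustration rather than how all of them work. The function name `git_blob_id` is hypothetical.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Content-derived id for a file's contents, following Git's blob scheme:
    SHA-1 over the header b"blob <size>" plus a NUL byte, then the raw bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# Anyone hashing the same bytes gets the same id, with no server to allocate ids.
print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```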
So how do they differ? What are their killer features and killer problems?
Monotone is all about repository integrity, ensuring that every commit is both authorised and intact. It pays for this with a severe lack of speed.
Darcs is based around a “theory of patches”: a branch is made up not of its history but of the collection of patches in it. Unfortunately this often breaks down, and darcs frequently gets stuck calculating even trivial and commonplace branch models.
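The central operation in that theory is commutation: swapping two adjacent patches so the later one can be applied first, adjusting line offsets as needed. The Python sketch below is heavily simplified and assumes insert-only hunks (real darcs hunks can also remove lines, and there are other patch types); `Insert` and `commute` are hypothetical names. A merge may need many such swaps, which gives some intuition for how the calculation can become expensive.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Insert:
    """Simplified hunk: insert `lines` before line index `pos` (0-based)."""
    pos: int
    lines: Tuple[str, ...]

    def apply(self, doc: list) -> list:
        return doc[:self.pos] + list(self.lines) + doc[self.pos:]


def commute(first: Insert, second: Insert) -> Optional[Tuple[Insert, Insert]]:
    """Given `first` followed by `second`, return (second', first') with the
    same combined effect, or None if `second` depends on `first`."""
    n_first, n_second = len(first.lines), len(second.lines)
    if second.pos <= first.pos:
        # second lands before first's block: first's position moves down
        return second, Insert(first.pos + n_second, first.lines)
    if second.pos >= first.pos + n_first:
        # second lands after first's block: second's position moves up
        return Insert(second.pos - n_first, second.lines), first
    # second was inserted inside the text first added, so it cannot go first
    return None


doc = ["x", "y", "z"]
a = Insert(1, ("A",))
b = Insert(3, ("B",))  # recorded after a was applied
b2, a2 = commute(a, b)
print(b.apply(a.apply(doc)) == a2.apply(b2.apply(doc)))     # True
print(commute(Insert(1, ("A1", "A2")), Insert(2, ("C",))))  # None
```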
Git is very strange to me; its killer feature appears to be the speed at which it can handle very large trees, but the interface is as insane as Arch’s was. It is heavily optimised for the “I only apply patches” development model, at the expense of ordinary development models (it shares an issue with Arch where calculating annotations on an individual file is an expensive operation).
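Git stores snapshots of whole trees rather than per-file histories, so annotating a file means reconstructing where each line came from by comparing successive versions. A rough Python sketch of that idea, assuming the file’s versions are already available as a list; the `annotate` function is hypothetical, and git’s own blame is far more sophisticated:

```python
import difflib


def annotate(versions):
    """Attribute each line of the newest version to the commit that introduced it.

    `versions` is a list of (commit_id, lines) pairs, oldest first. Every version
    is diffed against the one before it, so the work grows with the file's history.
    """
    if not versions:
        return []
    first_id, first_lines = versions[0]
    blame = [(first_id, line) for line in first_lines]
    for commit_id, lines in versions[1:]:
        previous = [line for _, line in blame]
        matcher = difflib.SequenceMatcher(a=previous, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                updated.extend(blame[i1:i2])  # unchanged lines keep their origin
            else:
                # replaced or inserted lines belong to this commit; deletions add nothing
                updated.extend((commit_id, line) for line in lines[j1:j2])
        blame = updated
    return blame


history = [
    ("c1", ["int main() {", "}"]),
    ("c2", ["int main() {", "  return 0;", "}"]),
    ("c3", ["int main(void) {", "  return 0;", "}"]),
]
for commit_id, line in annotate(history):
    print(commit_id, line)
```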
What about Bazaar? Its killer feature is that it is designed to work the way you do. The command set is relatively small, and each command works in the most obvious manner. It also supports plugins, so you can always implement your own workflow.
Of all the revision control systems, it’s the only one (that I’m aware of) that supports both distributed and centralised workflows (and lets you go distributed when you need to, e.g. when you’re on a plane).
Here are a few examples of how Bazaar’s command set works the way you do. To start managing some code in bzr:
$ cd myproject
$ bzr init
To add the files, copy in your usual .bzrignore file and just add everything:
$ cp ~/bzrignore .bzrignore
$ bzr add
added foo.c
added bar.c
Check the output for mistakenly added files, adjust .bzrignore and remove them with bzr rm.
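Deciding what bzr add picks up is a matter of matching each path against the ignore patterns. This Python sketch assumes the rule described in bzr’s help: a pattern without a slash is matched against the file’s base name, and a pattern containing a slash against the path from the branch root. It ignores any other pattern syntax bzr may support, and the `is_ignored` and `would_add` helpers are hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list) -> bool:
    """Rough approximation of bzr's ignore rule (an assumption, simplified):
    patterns without '/' match the base name, others match the whole path.
    Note that fnmatch's '*' also matches '/', which is a simplification."""
    name = posixpath.basename(path)
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


def would_add(paths: list, patterns: list) -> list:
    """Paths that a recursive add would pick up under these ignore patterns."""
    return [p for p in paths if not is_ignored(p, patterns)]


patterns = ["*.o", "*~", "build/*"]  # hypothetical contents of ~/bzrignore
paths = ["foo.c", "bar.c", "foo.o", "src/baz.c~", "build/out.bin"]
print(would_add(paths, patterns))    # ['foo.c', 'bar.c']
```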
A common operation is realising that the commit you’re about to make should really go on a new branch for now:
$ cd ..
$ cp -a myproject myproject-foo
$ cd myproject-foo
$ bzr commit
A copy of a Bazaar branch is a different branch; you can commit to it separately. There’s a bzr branch command for this too (which deals with issues such as bound branches, checkouts, etc.), but it’s nice to demonstrate that Bazaar does what you’d expect even when you don’t use its own commands.
Pulling changes from another branch (where you haven’t made any modifications yet) is easy:
$ bzr pull ../myproject
As is merging (when your branches have diverged):
$ bzr merge ../myproject
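Whether you need pull or merge comes down to whether the two histories have diverged: pull applies when your tip is already part of the other branch’s history, so your branch can simply be moved forward. Here is a minimal Python sketch over a toy revision graph, assuming each revision is just an id mapped to its list of parent ids; the ids and the `ancestors` and `pull_or_merge` helpers are hypothetical, not bzrlib’s API.

```python
def ancestors(graph: dict, tip: str) -> set:
    """Every revision reachable from `tip` (inclusive) via parent links."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(graph[rev])
    return seen


def pull_or_merge(graph: dict, local_tip: str, other_tip: str) -> str:
    """What bringing `other_tip` into a branch whose tip is `local_tip` needs."""
    if other_tip in ancestors(graph, local_tip):
        return "up to date"  # we already have everything the other branch has
    if local_tip in ancestors(graph, other_tip):
        return "pull"        # no local divergence: just move our tip forward
    return "merge"           # both sides have new commits: a merge is required


graph = {
    "r1": [],
    "r2": ["r1"],
    "r3": ["r2"],  # new commit in ../myproject
    "f1": ["r2"],  # local commit in myproject-foo
}
print(pull_or_merge(graph, "r2", "r3"))  # pull
print(pull_or_merge(graph, "f1", "r3"))  # merge
```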
One particularly nice feature is that after a merge, you see the merge as a single commit and it can be treated as such; but it also has the set of merged commits indented under it, and you can examine these as individual commits as well!
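That nested view can be derived from the revision graph: a merge commit’s first parent continues the branch’s own line of history, and anything reachable from its other parents but not from the first parent is what the merge brought in. A Python sketch of that nesting, assuming a toy dict mapping each revision id to its parents; the names are hypothetical and this illustrates the idea rather than bzrlib’s implementation.

```python
def ancestors(graph: dict, tip: str) -> set:
    """Every revision reachable from `tip` (inclusive) via parent links."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(graph[rev])
    return seen


def nested_log(graph: dict, tip: str, stop: frozenset = frozenset(), depth: int = 0):
    """Yield (depth, revision) newest first along the first-parent line from `tip`.

    Under each merge, the revisions it brought in (reachable from its extra
    parents but not from its first parent) are listed one level deeper.
    """
    rev = tip
    while rev is not None and rev not in stop:
        yield depth, rev
        parents = graph[rev]
        if len(parents) > 1:
            shown = stop | ancestors(graph, parents[0])
            for extra in parents[1:]:
                yield from nested_log(graph, extra, shown, depth + 1)
                shown = shown | ancestors(graph, extra)
        rev = parents[0] if parents else None


graph = {
    "1": [], "2": ["1"],
    "f1": ["2"], "f2": ["f1"],  # commits on a feature branch
    "3": ["2"],
    "4": ["3", "f2"],           # mainline merge of the feature branch
}
for depth, rev in nested_log(graph, "4"):
    print("    " * depth + rev)
```

Revision 4 is the single merge commit; f2 and f1 appear indented beneath it, and each can still be inspected on its own.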
What’s the downside of Bazaar? Well, it’s not the fastest system (but by no means the slowest). For small to medium-sized projects this is never an issue, though it may be for extremely large projects; fortunately, the developers are improving its performance all the time!
But that doesn’t matter; it is, honestly, the first revision control system that I don’t hate.