Executive summary
Debian should consider:
- less fine-grained software packaging for overall simplification of
the system and its development
- less flexibility at the lower levels of the software stack, again
for overall simplicity
- reducing friction in the development workflow, especially by making
it possible to branch and merge at the whole system level
Introduction
I've taken a job with Codethink, as part of a team to develop
a new embedded Linux system, called Baserock. Baserock is not a
distribution as such, but deals with many of the same issues.
I thought it might be interesting to write out my new thoughts
related to this, since they might be useful for proper
distributions to think about too.
I'll hasten to add that many of these thoughts are not originally
mine. I shamelessly adopt any idea that I think is good.
The core ideas for Baserock
come from Rob Taylor, Daniel Silverstone, and Paul Sherwood, my
colleagues and bosses at Codethink. At the recent GNOME Summit
in Montreal, I was greatly influenced by Colin Walters.
This is also not an advertisement for Baserock, but since much of
my current thinking comes from that project, I'll discuss things
in the context of Baserock.
Finally, I'm writing this to express my personal opinion and
thoughts. I'm not speaking for anyone else, least of all Codethink.
On the package abstraction
Baserock abandons the idea of packages for all individual programs.
In the early 1990s, when Linux was new, and the number of packages in
a distribution could be counted on two hands in binary, packages were
a great idea. It was feasible to know at least something about every
piece of software, and pick the exact set of software to install on
a machine.
It is still feasible to do so, but only in quite restricted circumstances.
For example, picking the packages to install a DNS server, or an NFS
server, or a mail server, by hand, without using meta packages or
tasks (in Debian terminology), is still quite possible. On embedded
devices, there's also usually only a handful of programs installed,
and the people doing the install can be expected to understand all
of them and decide which ones to install on each specific device.
However, those are the exceptions, and they're getting rarer. For
most people, manually picking software to install is much too tedious.
In Debian, we realised this many years ago, and developed
meta packages (whose only purpose is to depend on other packages) and
tasks (which solve the same problem, but differently). These make it possible
for a user to say "I want to have the GNOME desktop environment", and
not have to worry about finding every piece that belongs in GNOME
and installing each piece separately.
For much of the past decade, computers have had sufficient hard disk
space that it is no longer necessary to be quite so picky about what
to install. A new cheap laptop will now typically come with at least 250
gigabytes of disk space. An expensive one, with an SSD drive, will have
at least 128 gigabytes. A fairly complete desktop install uses less than
ten gigabytes, so there's rarely a need to pick and choose between
the various components.
From a usability point of view, choosing from a list of a dozen or
two options is much easier than from a list of thirty-five thousand
(the number of Debian packages as I write this).
This is one reason why Baserock won't be using the traditional
package abstraction. Instead, we'll collect programs into larger
collections, called strata, which form some kind of logical or
functional whole. So there'll be one stratum for the core userland
software: init, shell, libc, perhaps a bit more. There'll be another
for a development environment, one for a desktop environment, etc.
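To make this concrete, here is a purely illustrative sketch of how
picking a system could become picking strata rather than individual
packages; the file name, layout, and stratum names are invented for
this example, not an actual Baserock format.
    # Hypothetical system definition: a system is just a short list of
    # strata, not thousands of individual packages.
    mkdir -p systems
    printf '%s\n' core-userland build-essential editors > systems/devel-image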
Another, equally important, reason to move beyond packages is the set of
problems caused by the combinatorial explosion of packages and versions.
Colin Walters talks about this very well. When every system has a
fairly unique set of packages, and versions of them, it becomes much
harder to ensure that software works well for everyone, that upgrades
work well, and that any problems get solved. When the number of
possible package combinations is small, getting the interactions between
various packages right is easier, QA has a much easier time testing
all upgrade paths, and manual test coverage improves a lot when
everyone is testing the same versions.
Even debugging gets easier, when everyone can easily run the same
versions.
Grouping software into bigger collections does reduce flexibility
of what gets installed. In some cases this is important: very
constrained embedded devices, for example, still need to be very
picky about what software gets installed. However, even for them,
the price of flash storage is low enough that it might not matter
too much, anymore. The benefit of a simpler overall system may well
outweigh the benefit of fine-grained software choice.
Everything in git
In Baserock, we'll be building everything from source in git.
It will not be possible to build anything, unless the source is
committed. This will allow us to track, for each binary blob we
produce, the precise sources that were used to build it.
We will also try
to achieve something a bit more ambitious: anything that affects
any bit in the final system image can be traced to files committed
to git. This means also tracking all configuration settings for the
build, and the whole build environment, in git.
This is important for us so that we can reproduce an image used in
the field. When a customer is deploying a specific image, and needs
it to be changed, we want the new image to differ as little as possible
from the previous version. This
requires that we can re-create the original image, from source, bit for bit,
so that when we make the actual change, only the changes we need to
make affect the image.
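As a rough sketch of what that traceability could look like in
practice, assuming one git checkout per component under a sources/
directory (the layout and manifest file are invented for illustration):
    # Record the exact commit of every component that went into an
    # image, so the same image can be rebuilt from source later.
    for component in sources/*; do
        printf '%s %s\n' "$(basename "$component")" \
            "$(git -C "$component" rev-parse HEAD)"
    done > field-image-1.manifest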
We will make it easy to branch and merge not just individual
projects, but the whole system. This will make it easy to do
large changes to the system, such as transitioning to a new
version of GNOME, or the toolchain. Currently, in Debian, such
large changes need to be serialised, so that they do not affect
each other. It is easy, for example, for a GNOME transition to
be broken by a toolchain transition.
Branching and merging has long been considered the best available
solution for concurrent development within a project. With Baserock,
we want to have that for the whole system. Our build servers will
be able to build images for each branch, without
requiring massive hardware investment: any software that is shared
between branches only gets built once.
Launchpad PPAs and similar solutions provide many of the benefits
of branching and merging on the system level. However, they're much
more work than "git checkout -b gnome-3.4-transition". I believe
that the git based approach will make concurrent development much
more efficient. Ask me in a year if I was right.
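The workflow we are aiming for would look roughly like the following;
the repository name, the stratum file being edited, and the
build-system-image command are placeholders, not real Baserock tools:
    # Branch the whole system, make the change, and let the build
    # servers produce a bootable image for the branch.
    git clone git://git.example.org/system-definitions
    cd system-definitions
    git checkout -b gnome-3.4-transition
    $EDITOR strata/desktop                     # point the stratum at newer GNOME
    git commit -a -m "Start the GNOME 3.4 transition"
    git push origin gnome-3.4-transition
    build-system-image gnome-3.4-transition    # hypothetical build trigger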
Git, git, and only git
There are a lot of version control systems in active use. For the
sake of simplicity, we'll use only git. When an upstream project
uses something else, we'll import their code into git. Luckily, there
are tools for that. The import and updates to it will be
fully automatic, of course.
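For Subversion upstreams, for instance, the stock git-svn tool does
most of the work; something along these lines (with an invented URL),
run periodically, keeps the git mirror current:
    # One-time import of a Subversion upstream into git.
    git svn clone --stdlayout https://svn.example.org/someproject someproject
    # Later, pick up new upstream revisions, for example from cron:
    cd someproject && git svn fetch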
Git is not my favorite version control system, but it's clearly
the winner. Everything else will eventually fade away into
obscurity. Or that's what we think. If it turns out that we're wrong
about that, we'll switch to something else. However, we do not
intend to have to deal with more than one at a time. Life's too short
to use all possible tools at the same time.
Tracking upstreams closely
We will track upstream version control repositories, and we will have
an automatic mechanism for building our binaries directly from git. This
will, we hope, make it easy to follow upstream development closely, so
that when, say, GNOME developers make commits, we can
generate a new system image which includes those changes the same day,
if not within minutes, rather than waiting days, weeks, or months.
This kind of closeness is greatly enhanced by having everything in
version control. When upstream commits changes to their version control
system, we'll mirror them automatically, and this then triggers a
new system image build. When upstream makes changes that do not work,
we can easily create a branch from any earlier commit, and build images
off that branch.
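A minimal sketch of that mirror-and-trigger loop, assuming a bare
mirror per upstream project and a hypothetical build-image command on
the build server:
    # Keep a bare mirror of an upstream repository up to date, and
    # trigger an image build whenever its master branch moves.
    cd /srv/mirrors/someproject.git       # created with: git clone --mirror <url>
    old=$(git rev-parse refs/heads/master)
    git remote update --prune
    new=$(git rev-parse refs/heads/master)
    if [ "$old" != "$new" ]; then
        build-image --repo someproject --ref "$new"   # hypothetical
    fi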
This will, we hope, also make it simpler to make changes, and give them
back to upstream. Whenever we change anything, it'll be done in a branch,
and we'll have system images available to test the change. So not only
will upstream be able to easily get the change from our git repository,
they'll also be able to easily verify, on a running system, that the change
fixes the problem.
Automatic testing, maybe even test driven development
We will be automatically building system images from git commits
for Baserock. This will potentially result in a very large number
of images. We can't possibly test all of them manually, so we will
implement some kind of automatic testing. The details of that are
still under discussion.
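One possible shape for such a test run would be to boot each freshly
built image under qemu and drive a smoke test over ssh; the image name
and test script here are invented:
    # Boot the image without touching it (-snapshot) and run the tests.
    qemu-system-x86_64 -m 1024 -snapshot -nographic \
        -drive file=baserock-test.img,if=virtio \
        -netdev user,id=net0,hostfwd=tcp::2222-:22 \
        -device virtio-net-pci,netdev=net0 &
    sleep 60                                  # crude wait for the system to boot
    ssh -p 2222 tester@localhost ./run-smoke-tests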
I hope to be able to start adding some test driven development
to Baserock systems. In other words, when we are requested to
make changes to the system, I want the specification to be provided
as executable tests. This will probably be impossible in real
life, but I can hope.
I've talked about doing the same thing for Debian, but it's much
harder to push through such changes in an established, large project.
Solving the install, upgrade, and downgrade problems
All mainstream Linux distributions are based on packages, and they all,
pretty much, do installations and upgrades by unpacking packages onto
a running system, and then maybe running some scripts from the packages.
This works well for a completely idle system, but not so well on systems
that are in active use. Colin Walters again talks about this.
For installation of new software, the problem is that someone or something
may invoke it before it is fully configured by the package's maintainer
script. For example, a web application might unpack in such a way that
the web server notices it, and a web browser may request the web app
to run before it is configured to connect to the right database. Or a
GUI program may unpack a .desktop file before the executable or its
data files are unpacked, and a user may notice the program in their
menu and start it, resulting in an error.
Upgrades suffer from additional problems. Software that gets upgraded
may be running during the upgrade. Should the package manager replace
the software's data files with new versions, which may be in a format
that the old program does not understand? Or install new plugins that
will cause the old version of the program to segfault? If the package
manager does that, users may experience turbulence without having put
on their seat belts. If it doesn't do
that, it can't install the package, or it needs to wait, perhaps
for a very long time, for a safe time to do the upgrade.
These problems have usually been either ignored, or solved by using
package specific hacks. For example, plugins might be stored in a
directory that embeds the program's version number, ensuring that
the old version won't see the new plugins. Some people would like to
apply installs and upgrades only at shutdown or bootup, but that has
other problems.
None of the hacks solve the downgrade problem. The package managers
can replace a package with an older version, and often this works well.
However, in many cases, any package maintainer scripts won't be able
to deal with downgrades. For example, they might convert data files
to a new format or name or location upon upgrades, but won't try to
undo that if the package gets downgraded. Given the combinatorial
explosion of package versions, it's perhaps just as well that they
don't try.
For Baserock, we absolutely need to have downgrades. We need to be
able to go back to a previous version of the system if an upgrade
fails. Traditionally, this has been done by providing a "factory
reset", where the current version of the system gets replaced with
whatever version was installed in the factory. We want that, but we
also want to be able to choose other versions, not just the factory
one. If a device is running version X, and upgrades to X+1, but that
version turns out to be a dud, we want to be able to go back to X,
rather than all the way back to the factory version.
The approach we'll be taking with Baserock relies on btrfs and
subvolumes and snapshots. Each version of the system will be
installed in a separate subvolume, which gets cloned from the
previous version, using copy-on-write to conserve space. We'll
make the bootloader be able to choose a version of the system to
boot, and (waving hands here) add some logic to be able to automatically
revert to the previous version when necessary. We expect this
to work better and more reliably than the current package-based approach.
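A rough sketch of the idea, with invented subvolume paths and version
numbers; the real mechanism will need more bookkeeping and bootloader
integration:
    # Each system version lives in its own btrfs subvolume. Create the
    # next version as a copy-on-write clone of the current one:
    btrfs subvolume snapshot /systems/version-5 /systems/version-6
    # ...apply the upgrade inside /systems/version-6, point the
    # bootloader at it, and reboot. If version 6 turns out to be a dud,
    # boot version 5 again and throw the bad version away:
    btrfs subvolume delete /systems/version-6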
Making choices
Debian is pretty bad at making choices. Almost always, when faced
with a need to choose between alternative solutions for the same
problem, we choose all of them. For example, we support pretty much
every init implementation, various implementations of /bin/sh, and we
even have at least three entirely different kernels.
Sometimes this non-choice is a good thing. Our users may need features
that only one of the kernels supports, for example. And we certainly
need to be able to provide both mysql and postgresql, since various
software we want to provide to our users needs one and won't work with
the other.
At other times, the inability to choose causes trouble. Do we really
need to support more than one implementation of /bin/sh? By supporting
both dash and bash for that, we double the load on testing and QA,
and introduce yet another variable to deal with into any debugging
situation involving shell scripts.
Especially for core components of the system, it makes sense to limit
the flexibility of users to pick and choose. Combinatorial explosion
déjà vu. Every binary choice doubles the number of possible combinations
that need to be tested and supported and checked during debugging.
Flexibility begets complexity, complexity begets problems.
This is less of a problem at upper levels of the software stack.
At the very top level, it doesn't really matter if there are many
choices. A user can freely choose between vi and Emacs, and this does
not add complexity at the system level, since nothing else is affected
by that choice. However, if we were to add a choice between glibc, eglibc,
and uClibc for the system C library, then everything else in the
system needs to be tested three times rather than once.
Reducing the friction coefficient for system development
Currently, a Debian developer takes upstream code, adds packaging,
perhaps adds some patches (using one of several methods), builds
a binary package, tests it, uploads it, and waits for the build
daemons and the package archive and user-testers to report any
problems.
That's quite a number of steps to go through for the simple act
of adding a new program to Debian, or updating it to a new version.
Some of it can be automated, but there are still hoops to jump through.
Friction does not prevent you from getting stuff done, but the more
friction there is, the more energy you have to spend to get it done.
Friction slows down the crucial hack-build-test cycle of software
development, and that hurts productivity a lot. Every time a
developer has to jump through any hoops, or wait for anything,
he slows down.
It is, of course, not just a matter of the number of steps. Debian
requires a source package to be uploaded with the binary package.
Many, if not most, packages in Debian are maintained using version
control systems. Having to generate a source package and then wait
for it to be uploaded is unnecessary work. The build daemon could
get the source from version control directly. With signed commits,
this is as safe as uploading a tarball.
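As a sketch, and assuming a git new enough to verify commit signatures,
the build daemon could do something like this instead of waiting for an
uploaded source package:
    # Fetch the source straight from version control and refuse to
    # build unless the commit is signed by a trusted key.
    git clone git://git.example.org/hello
    cd hello
    git verify-commit HEAD || exit 1
    dpkg-buildpackage -us -uc             # build as usual, from the checkout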
The above examples are specific to maintaining a single package.
The friction that really hurts Debian is the friction of making
large-scale changes, or changes that affect many packages. I've
already mentioned the difficulty of making large transitions above.
Another case is making policy changes, and then implementing them.
An excellent example of that in Debian is the policy change to
use /usr/share/doc for documentation, instead of /usr/doc.
This took us many years to do. We are, I think, perhaps
a little better at such things now, but even so, it is something
that should not take more than a few days to implement, rather than
half a decade.
On the future of distributions
Occasionally, people say things like "distributions are not needed",
or that "distributions are an unnecessary buffer between upstream
developers and users". Some even claim that there should only be
one distribution. I disagree.
A common view of a Linux distribution is that it takes
some source provided by upstream, compiles that, adds an installer,
and gives all of that to the users. This view is too simplistic.
The important part of developing a distribution is choosing the
upstream projects and their versions wisely, and then integrating
them into a whole system that works well. The integration part is
particularly important. Many upstreams are not even aware of each
other, nor should they need to be, even if their software may need
to interact with each other. For example, not every developer of
HTTP servers should need to be aware of every web application, or
vice versa. (If they had to be, it'd be a combinatorial explosion
that'd ruin everything, again.)
Instead, someone needs to set a policy of how web apps and web
servers interface, what their common interface is, and what files
should be put where, for web apps to work out of the box, with
minimal fuss for the users. That's part of the integration work
that goes into a Linux distribution. For Debian, such decisions
are recorded in the Policy Manual and its various sub-policies.
Further, distributions provide quality assurance, particularly at
the system level. It's not realistic to expect most upstream projects to
do that. It's a whole different skillset and approach that is needed
to develop a system, rather than just a single component.
Distributions also provide user support, security support, longer
term support than many upstreams, and port software to a much wider
range of architectures and platforms than most upstreams actively
care about, have access to, or even know about.
In some cases, these are things that
can and should be done in collaboration with upstreams; if nothing
else, portability fixes should be given back to upstreams.
So I do think distributions have a bright future, but the way they're
working will need to change.