
Unfortunately I could not go on stage at the 31st Chaos Communication
Congress to present reproducible builds in Debian alongside Mike Perry
from the Tor Project and Seth Schoen from the Electronic Frontier
Foundation. I've tried to make up for it, though, and we have made
amazing progress.
Wiki reorganization
What was a massive and frightening wiki page now looks much more
welcoming:

Depending on what one is looking for, it should be much easier to find.
There's now a high-level status overview given on the landing page,
maintainers can
learn how to make their packages
reproducible,
enthusiasts can more easily find what can
help the
project,
and we have even started
writing some
history.
.buildinfo for all packages
New Year's Eve saw me hacking Perl to write dpkg-genbuildinfo.
Similar to
dpkg-genchanges, it's run by
dpkg-buildpackage to produce
.buildinfo control
files.
This is where the build environment and the hashes of source and binary
packages are recorded. This script, integrated with dpkg, replaces the
previous debhelper interim solution written by Niko Tyni.
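To illustrate the kind of information such a file records, here is a minimal Python sketch that computes a SHA-256 digest and size for each build artifact. It is not how dpkg-genbuildinfo (a Perl script) works internally; the function names and the "digest size name" line layout are assumptions made for the example.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large packages do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_artifacts(paths):
    """Build one 'digest size name' line per artifact, sorted by file name.

    The line layout is an assumption for this sketch, not the actual
    .buildinfo syntax.
    """
    lines = []
    for path in sorted(paths, key=os.path.basename):
        name = os.path.basename(path)
        lines.append("%s %d %s" % (sha256_of(path), os.path.getsize(path), name))
    return lines
```

Two builds that produce identical artifacts yield identical lines, which is what makes recorded hashes useful for later comparison.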
We used to fix mtimes in control.tar and data.tar using a specific
addition to debhelper named dh_fixmtimes. To better support the
ALWAYS_EXCLUDE environment variable and for pragmatic reasons, we
moved the process into dh_builddeb.
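The general idea of fixing mtimes in an archive can be sketched in Python with the standard tarfile module. This is only an illustration: the function name is hypothetical, and capping each mtime at a reference time is an assumed policy, not a description of what dh_builddeb actually does.

```python
import io
import tarfile


def clamp_mtimes(source, target, reference_time):
    """Copy an uncompressed tar archive from `source` to `target`,
    capping every member's mtime at `reference_time` (seconds since epoch).

    `source` and `target` are binary file objects.
    """
    with tarfile.open(fileobj=source, mode="r:") as src, \
            tarfile.open(fileobj=target, mode="w:",
                         format=tarfile.GNU_FORMAT) as dst:
        for member in src.getmembers():
            # Members older than the reference keep their original mtime.
            member.mtime = min(member.mtime, reference_time)
            # Only regular files carry data; directories and links do not.
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
```

With a fixed reference time, files touched during the build no longer leak the build time into the archive.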
Both changes were quickly pushed to our
continuous integration
platform. Before, only packages using
dh would create a
.buildinfo and
thus eventually be considered
reproducible. With these modifications,
many more packages got their chance, and it shows:

Yes, with our
experimental
toolchain
we are now at more than eighty percent! That's more than 17200 source
packages!
srebuild
Another big item on the todo-list was crossed off by Johannes Schauer.
srebuild
is a wrapper around
sbuild:
Given a .buildinfo file, it first finds a timestamp of Debian Sid from
snapshot.debian.org which contains the requested packages in their exact
versions. It then runs sbuild with the right architecture, as given by the
.buildinfo file, and the right base system to upgrade from, as given by
the version of the base-files package in the .buildinfo file. Using two
hooks, it installs the right package versions and verifies that the
installed packages are at the right version before the build starts.
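The verification step can be pictured with a small Python sketch. It is not srebuild's code: the function names are hypothetical, and it assumes the required packages are available as a comma-separated list in Debian's `package (= version)` dependency syntax, which may not match how a .buildinfo file stores them.

```python
import re

# Matches an exact-version dependency such as "base-files (= 8)".
_EXACT = re.compile(r"^\s*([a-z0-9][a-z0-9+.-]+)\s*\(=\s*([^)\s]+)\s*\)\s*$")


def parse_exact_versions(field):
    """Turn 'pkg (= ver), pkg2 (= ver2)' into {'pkg': 'ver', 'pkg2': 'ver2'}."""
    versions = {}
    for item in field.split(","):
        match = _EXACT.match(item)
        if match is None:
            raise ValueError("not an exact-version dependency: %r" % item)
        versions[match.group(1)] = match.group(2)
    return versions


def find_version_mismatches(expected, installed):
    """Return {package: (expected, installed_or_None)} for every package
    that is missing or installed at a different version."""
    mismatches = {}
    for package, version in sorted(expected.items()):
        actual = installed.get(package)
        if actual != version:
            mismatches[package] = (version, actual)
    return mismatches
```

An empty result means the build environment matches the recorded one; anything else should stop the rebuild before it starts.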
Understanding problems
Over 1700 packages have now been
reviewed to
understand why build results could not be reproduced on our experimental
platform. The variations between the two builds are currently limited
to time and file ordering, but this has still uncovered many problems.
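File ordering problems typically come from tools that process files in whatever order the filesystem returns them. As a generic illustration (not taken from any of the affected packages, and with a hypothetical function name), a Python build script can make its traversal order explicit:

```python
import os


def sorted_file_list(root):
    """Return the files below `root` as relative paths, in an order that
    does not depend on how the filesystem lists directory entries."""
    paths = []
    for directory, subdirs, files in os.walk(root):
        # Sorting in place makes os.walk descend in a stable order.
        subdirs.sort()
        for name in sorted(files):
            full_path = os.path.join(directory, name)
            paths.append(os.path.relpath(full_path, root))
    return paths
```

Feeding such a list to an archiver or a code generator removes one source of differences between two otherwise identical builds.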
There are still toolchain fixes to be made (more than 180 packages for
the
PHP
registry)
which can make many packages reproducible at once, but others like
C
pre-processor
macros
will require many individual changes.
debbindiff, the main tool used
to understand differences, has gained support for
.udeb, TrueType and
OpenType fonts, PNG and PDF files. It's less likely to crash on problems
with encoding or external tools. But most importantly for large packages,
it has been made a lot faster, thanks to Reiner Herrmann and Helmut
Grohne. Helmut has also been able to
spot cross-compilation
issues by using
debbindiff!
Targeting our efforts
It gives warm fuzzy feelings to hit the 80% mark, but it would be a bit
irrelevant if this did not concern packages that matter. Thankfully,
Holger worked on producing statistics for
more specific package
sets. Mattia
Rizzolo has also done great work to improve the scripts generating the
various pages visible on
reproducible.debian.net.
All
essential
and
build-essential
packages, except
gcc and
bash, are considered reproducible or have
patches ready. After some lengthy builds, I also managed to come up
with a
patch to make
linux build reproducibly.
Miscellaneous
After my initial attempt to modify
r-base to remove a timestamp
in R packages, Dirk Eddelbuettel discussed the issue with upstream and
came up with a better patch. The latter has already been
merged
upstream!
Dirk's solution is to allow timestamps to be set using an external
environment variable. This is also how I
modified
FontForge to make it possible to
reproduce
fonts.
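The pattern is simple enough to sketch in a few lines of Python. The variable name `EXAMPLE_BUILD_TIMESTAMP` and both function names are made up for this example; they are not the names used by the R or FontForge patches.

```python
import os
import time


def build_timestamp(environ=os.environ, variable="EXAMPLE_BUILD_TIMESTAMP"):
    """Return the timestamp to embed in build output.

    If the (hypothetical) environment variable is set, its value is used
    as seconds since the epoch; otherwise fall back to the current time.
    """
    value = environ.get(variable)
    if value is None:
        return int(time.time())
    return int(value)


def format_build_date(timestamp):
    """Format the timestamp in UTC so the local timezone cannot leak in."""
    return time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(timestamp))
```

When the variable is unset the tool behaves as before, so upstream users are unaffected, while a rebuild can pin the timestamp and get identical output.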
Identifiers generated by
xsltproc have also been an
issue. After reviewing my initial patch, Andrew Ayer came up with a
much nicer solution. Its potential performance implications need to be
evaluated before submission, though.
Chris West has been working on
packages built with
Maven
amongst other things.
PDFs generated by GhostScript, another painful source of trouble, are
being worked on by Peter De Wachter.
Holger got X.509 certificates signed by the CA cartel for
jenkins.debian.net and
reproducible.debian.net. No more scary
security messages now. Let's hope next year we will be able to get
certificates through
Let's Encrypt!
Let's make a difference together
As you can imagine with all that happened in the past weeks, the
#debian-reproducible IRC channel has been a cool place to hang out.
It's very energizing to get together and share contributions, exchange
tips and discuss the hardest points. Mandatory quote:
* h01ger is very happy to see again and again how this is a nice
learning circle...! i've learned a whole lot here too... in
just 3 months... and its going on...!
Reproducible builds are not going to change anything for most of our
users. They simply don't care how they get software on their computer.
But they care to get the right software without having to worry about
it. That's our responsibility, as developers. Enabling users to
trust their software is important, and it is a major contribution that
we, as Debian, can make to the wider free software movement. Once
Jessie is
released, we should make a collective effort to make reproducible builds
a highlight of our next release.