Lunar: 80%
Unfortunately I could not go on stage at the 31st Chaos Communication
Congress to present reproducible builds in Debian alongside Mike Perry
from the Tor Project and Seth Schoen from the Electronic Frontier
Foundation. I've tried to make up for it, though, and we have made
amazing progress.
Wiki reorganization
What was a massive and frightening wiki page now looks much more
welcoming. Depending on what one is looking for, it should be much
easier to find: there's now a high-level status overview on the landing
page, maintainers can learn how to make their packages reproducible,
enthusiasts can more easily find what can help the project, and we have
even started writing some history.
.buildinfo for all packages
New Year's Eve saw me hacking Perl to write dpkg-genbuildinfo. Similar
to dpkg-genchanges, it's run by dpkg-buildpackage to produce .buildinfo
control files. This is where the build environment and the hashes of
the source and binary packages are recorded. This script, integrated
with dpkg, replaces the previous debhelper interim solution written by
Niko Tyni.
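To give an idea of what recording package hashes involves, here is a minimal Python sketch that computes SHA-256 checksums for a set of files and lists them as checksum, size and file name, the way Debian .changes files do. It is not dpkg-genbuildinfo (which is written in Perl); the function name and the exact field layout are assumptions for illustration only.

```python
import hashlib
import os


def checksums_sha256(paths):
    """Return a 'Checksums-Sha256:' field for the given files.

    Illustrative sketch: the field name and layout mimic .changes files and
    are assumed here, not taken from dpkg-genbuildinfo itself.
    """
    lines = ["Checksums-Sha256:"]
    # Sort so the output does not depend on the order the paths were given in.
    for path in sorted(paths):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Hash in chunks so large binary packages need not fit in memory.
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        size = os.path.getsize(path)
        lines.append(f" {digest.hexdigest()} {size} {os.path.basename(path)}")
    return "\n".join(lines)
```

Comparing such a field between two independent builds is the simplest way to tell whether they produced bit-identical packages.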
We used to fix mtimes in control.tar and data.tar using a specific
addition to debhelper named dh_fixmtimes. To better support the
ALWAYS_EXCLUDE environment variable, and for pragmatic reasons, we moved
the process into dh_builddeb.
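The idea behind normalizing an archive can be sketched with Python's tarfile module: every member gets the same mtime and owner, and members are added in sorted order so that file system ordering does not leak into the result. This is not the code dh_builddeb uses; the fixed timestamp value and the helper names are assumptions for illustration.

```python
import os
import tarfile

# Hypothetical fixed timestamp (seconds since the Unix epoch) used for every
# member; which date a real tool should pick is not covered here.
FIXED_MTIME = 1420070400  # 2015-01-01 00:00:00 UTC


def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Overwrite metadata that varies from one build to the next."""
    info.mtime = FIXED_MTIME
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info


def deterministic_tar(src_dir: str, out_path: str) -> None:
    """Write src_dir to an uncompressed tar with sorted members and fixed metadata."""
    with tarfile.open(out_path, "w", format=tarfile.GNU_FORMAT) as tar:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # makes os.walk descend into subdirectories in a stable order
            for name in sorted(dirs + files):
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, src_dir)
                # recursive=False: ordering is controlled by the walk above.
                tar.add(path, arcname=arcname, recursive=False, filter=normalize)
```

Two trees with identical contents but different mtimes then produce byte-identical archives. The archive is left uncompressed on purpose, since gzip would add its own header timestamp.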
Both changes were quickly pushed to our continuous integration
platform. Before, only packages using dh would create a .buildinfo file
and thus could eventually be considered reproducible. With these
modifications, many more packages got their chance, and it shows:
Yes, with our experimental toolchain we are now at more than eighty
percent! That's more than 17200 source packages!
srebuild
Another big item on the todo-list was crossed off by Johannes Schauer.
srebuild is a wrapper around sbuild: given a .buildinfo file, it first
finds a timestamp of Debian Sid from snapshot.debian.org which contains
the requested packages in their exact versions. It then runs sbuild
with the right architecture, as given by the .buildinfo file, and the
right base system to upgrade from, as given by the version of the
base-files package in the .buildinfo file. Using two hooks, it will
install the right package versions and verify that the installed
packages are in the right version before the build starts.
Understanding problems
Over 1700 packages have now been reviewed to understand why build
results could not be reproduced on our experimental platform. The
variations between the two builds are currently limited to time and
file ordering, but this has still uncovered many problems. There are
still toolchain fixes to be made (more than 180 packages for the PHP
registry) which can make many packages reproducible at once, but
others, like C pre-processor macros, will require many individual
changes.
debbindiff, the main tool used to understand differences, has gained
support for .udeb, TrueType and OpenType fonts, PNG and PDF files. It's
less likely to crash on problems with encoding or external tools. But
most importantly for large packages, it has been made a lot faster,
thanks to Reiner Herrmann and Helmut Grohne. Helmut has also been able
to spot cross-compilation issues by using debbindiff!
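Back to srebuild for a moment: its first step, finding one snapshot.debian.org timestamp at which Sid contained every required package version, amounts to intersecting time intervals. The minimal Python sketch below assumes we already know, for each required package version, the interval during which it was in Sid; that data model and the function name are hypothetical, and nothing here queries snapshot.debian.org.

```python
from datetime import datetime


def common_snapshot(availability):
    """Return the latest moment at which every package version was available.

    `availability` maps (package, version) to a (first_seen, last_seen) pair
    of datetimes, a hypothetical model of when that exact version was in Sid.
    Returns None when no single snapshot contains all of them.
    """
    if not availability:
        return None
    latest_start = max(start for start, _ in availability.values())
    earliest_end = min(end for _, end in availability.values())
    # The intervals overlap only if the newest arrival predates the first removal.
    return earliest_end if latest_start <= earliest_end else None


if __name__ == "__main__":
    # Made-up packages and dates, for illustration only.
    needed = {
        ("libfoo", "1.2-3"): (datetime(2014, 12, 1), datetime(2015, 1, 10)),
        ("bar", "0.9-1"): (datetime(2014, 12, 20), datetime(2015, 2, 1)),
    }
    print(common_snapshot(needed))
```

Any timestamp between the latest arrival and the earliest removal would do; this sketch simply picks the latest one.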
Targeting our efforts
It gives warm fuzzy feelings to hit the 80% mark, but it would be
somewhat irrelevant if it did not include the packages that matter.
Thankfully, Holger worked on producing statistics for more specific
package sets. Mattia Rizzolo has also done great work improving the
scripts that generate the various pages visible on
reproducible.debian.net.
All essential and build-essential packages, except gcc and bash, are
considered reproducible or have patches ready. After some lengthy
builds, I also managed to come up with a patch to make linux build
reproducibly.
Miscellaneous
After my initial attempt to modify r-base to remove a timestamp in R
packages, Dirk Eddelbuettel discussed the issue with upstream and came
up with a better patch, which has already been merged upstream!
Dirk's solution is to allow timestamps to be set using an external
environment variable. This is also how I modified FontForge to make it
possible to reproduce fonts.
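Letting an external environment variable override an embedded timestamp takes only a few lines. The Python sketch below assumes a variable named SOURCE_DATE_EPOCH holding seconds since the Unix epoch; the post does not say which variable the R or FontForge changes read, so treat that name, like the function name, as an assumption.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(env=None):
    """Return the timestamp to embed in generated files.

    Uses the (assumed) SOURCE_DATE_EPOCH variable when it is set, so two
    builds given the same value embed the same date; otherwise uses now.
    """
    env = os.environ if env is None else env
    value = env.get("SOURCE_DATE_EPOCH")
    seconds = int(value) if value is not None else int(time.time())
    # Use UTC so the result does not depend on the build machine's time zone.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

For example, build_timestamp({"SOURCE_DATE_EPOCH": "1420070400"}).isoformat() gives '2015-01-01T00:00:00+00:00'. Converting to UTC matters as much as fixing the date, since otherwise two machines in different time zones would still embed different strings.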
Identifiers generated by xsltproc have also been an issue. After
reviewing my initial patch, Andrew Ayer came up with a much nicer
solution. Its potential performance implications need to be evaluated
before submission, though.
Chris West has been working on packages built with Maven, amongst
other things.
PDFs generated by Ghostscript, another painful source of trouble, are
being worked on by Peter De Wachter.
Holger got X.509 certificates signed by the CA cartel for
jenkins.debian.net and reproducible.debian.net. No more scary security
messages now. Let's hope next year we will be able to get certificates
through Let's Encrypt!
Let's make a difference together
As you can imagine with all that happened in the past weeks, the
#debian-reproducible IRC channel has been a cool place to hang out.
It's very energizing to get together and share contributions, exchange
tips and discuss the hardest points. Mandatory quote:
* h01ger is very happy to see again and again how this is a nice
learning circle...! i've learned a whole lot here too... in
just 3 months... and its going on...!
Reproducible builds are not going to change anything for most of our
users. They simply don't care how they get software on their computer.
But they do care about getting the right software without having to
worry about it. That's our responsibility as developers. Enabling users
to trust their software is important, and it is a major contribution
we, as Debian, can make to the wider free software movement. Once Jessie
is released, we should make a collective effort to make reproducible
builds a highlight of our next release.