- Geoffroy Berret (kaliko)
- Arnaud Ferraris (aferraris)
- Alec Leanas
- Christopher Michael Obbard
- Lance Lin
- Stefan Kropp
- Matteo Bini
- Tino Didriksen
Various efforts towards build verifiability have been made for C/C++-based systems, yet the techniques for Java-based systems are not systematic and are often specific to a particular build tool (e.g. Maven). In this study, we present a systematic approach towards build verifiability on Java-based systems.
We first define the problem, and then provide insight into the challenges of making real-world software build in a reproducible manner, that is, when every build generates bit-for-bit identical results. Through the experience of the Reproducible Builds project making the Debian Linux distribution reproducible, we also describe the affinity between reproducibility and quality assurance (QA).
SOURCE_DATE_EPOCH specification related to formats that cannot help embedding potentially timezone-specific timestamps. (Full thread index.)
206 to Debian unstable, as well as made the following changes to the code itself:
file(1)-related regression where Debian .changes files that contained non-ASCII text were not identified as such, therefore resulting in seemingly arbitrary packages not actually comparing the nested files themselves. The non-ASCII parts were typically in the Maintainer or in the changelog text.
BinwalkFile.recognizes.
binwalk, don't report that we are missing the Python rpm module!
diffoscope-minimal packages have the same version.
debian-devel mailing list after noticing that the binutils source package contained unreproducible logs in one of its binary packages. Vagrant expanded the discussion to one about all kinds of build metadata in packages and outlined a number of potential solutions that support reproducible builds and arbitrary metadata.
Vagrant also started a discussion on debian-devel after identifying a large number of packages that embed build paths via RPATH when building with CMake, including a list of packages (grouped by Debian maintainer) affected by this issue. Maintainers were requested to check whether their package still builds correctly when passing the -DCMAKE_BUILD_RPATH_USE_ORIGIN=ON directive.
On our mailing list this month, kpcyrd announced the release of rebuilderd-debian-buildinfo-crawler, a tool that parses the Packages.xz Debian package index file, attempts to discover the right .buildinfo file from buildinfos.debian.net and outputs it in a format that can be understood by rebuilderd. The tool, which is available on GitHub, solves a problem regarding correlating Debian version numbers with their builds.
bauen1 provided two patches for debian-cd, the software used to make Debian installer images. This involved passing mkfs.msdos(8) and avoided embedding timestamps into the gzipped Translations files. After some discussion, the patches in question were merged and will be included in debian-cd version 3.1.36.
Roland Clobus wrote another in-depth status update about the status of live Debian images, summarising the current situation that all major desktops build reproducibly with bullseye, bookworm and sid.
The python3.10 package was uploaded to Debian by doko, fixing an issue where .pyc files were not reproducible because the elements in frozenset data structures were not ordered reproducibly. This meant that creating a bit-for-bit reproducible Debian chroot which included .pyc files was not possible. As of writing, the only remaining unreproducible parts of a man-db, but Guillem Jover has a patch for update-alternatives which will likely be part of the next release of dpkg.
Elsewhere in Debian, 139 reviews of Debian packages were added, 29 were updated and 17 were removed this month, adding to our knowledge about identified issues. A large number of issue types have been updated too, including the addition of
contributors.sh Bash/shell script into a Python script.
giac (update the version with upstreamed date patch)
htcondor (use CMake timestamp)
readdir system call related)
librime-lua (sort filesystem ordering)
quimb (single-CPU build failure)
radare2 (Meson date/time-related issue)
SOURCE_DATE_EPOCH usage to be portable)
siproxd (date, with Sebastian Kemper + follow-up
xonsh (Address Space Layout Randomisation-related issue)
zip (toolchain issue related to filesystem ordering)
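Several of the patches listed above deal with the same two sources of nondeterminism: embedded timestamps and filesystem ordering. As a rough illustration only (this is not taken from any of the patches, and the function names are made up for the example), a Python build step might avoid both by sorting archive members and clamping every timestamp to SOURCE_DATE_EPOCH:

```python
import gzip
import io
import os
import tarfile


def source_date_epoch(default: int = 0) -> int:
    """Return SOURCE_DATE_EPOCH as an integer, or a fixed default if unset."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def build_archive(files: dict[str, bytes]) -> bytes:
    """Build a .tar.gz whose bytes depend only on the file names and contents.

    `files` maps member names to contents; this helper is hypothetical and
    exists purely for illustration.
    """
    mtime = source_date_epoch()
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort the names so the member order does not depend on insertion
        # order (or, for real directories, on readdir order).
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime            # clamp the timestamp
            info.uid = info.gid = 0       # do not leak the build user
            info.uname = info.gname = "root"
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(files[name]))
    out = io.BytesIO()
    # gzip stores a timestamp and optionally a file name in its header;
    # pin the former and leave the latter empty.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=mtime) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()
```

Calling build_archive() twice with the same contents, even if the dictionary was populated in a different order, should then yield identical bytes.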
openwrt.git repository the next day.
useradd warnings when building packages.
armhf architecture nodes to add a hint to where nodes named
virt-*.
man-db services.
All Debian suites from buster onwards ship the 3.22-26 release, although the maintainer just pushed a 3.22-27 release to fix a seven year old null pointer dereference, after this article was drafted. Procmail is also shipped in all major distributions: Fedora and its derivatives, Debian derivatives, Gentoo, Arch, FreeBSD, OpenBSD. We all seem to be ignoring this problem. The upstream website (http://procmail.org/) has been down since about 2015, according to Debian bug #805864, with no change since. In effect, every distribution is currently maintaining its fork of this dead program. Note that, after filing a bug to keep Debian from shipping procmail in a stable release again, I was told that the Debian maintainer is apparently in contact with the upstream. And, surprise! they still plan to release that fabled 3.23 release, which has now been in "pre-release" for all those twenty years. In fact, it turns out that 3.23 is considered released already, and that the procmail author actually pushed a 3.24 release, codenamed "Two decades of fixes". That amounts to 25 commits since 3.23pre, some of which address serious security issues, but none of which address fundamental issues with the code base.
    procmail (3.22-1) unstable; urgency=low

      * New upstream release, which uses the `standard' format for Maildir
        filenames and retries on name collision. It also contains some
        bug fixes from the 3.23pre snapshot dated 2001-09-13.
      * Removed `sendmail' from the Recommends field, since we already
        have `exim' (the default Debian MTA) and `mail-transport-agent'.
      * Removed suidmanager support. Conflicts: suidmanager (<< 0.50).
      * Added support for DEB_BUILD_OPTIONS in the source package.
      * README.Maildir: Do not use locking on the example recipe,
        since it's wrong to do so in this case.

     -- Santiago Vila <email@example.com>  Wed, 21 Nov 2001 09:40:20 +0100
root:mail in Debian. There's no debconf or pre-seed setting that can change this. There have been two bug reports against the Debian package to make this configurable (298058, 264011), but both were closed to say that, basically, you should use dpkg-statoverride to change the permissions on the binary. So if anything, you should immediately run this command on any host that you have procmail installed on:
    dpkg-statoverride --update --add root root 0755 /usr/bin/procmail
Note that this might break email delivery. It might also not work at all, thanks to usrmerge. Not sure. Yes, everything is on fire. This is fine. In my opinion, even assuming we keep procmail in Debian, that default should be reversed. It should be up to people installing procmail to assign it those dangerous permissions, after careful consideration of the risk involved. The last maintainer of procmail explicitly advised us (in that null pointer dereference bug) and other projects (e.g. OpenBSD) to stop shipping it, back in 2014. Quote:
Executive summary: delete the procmail port; the code is not safe and should not be used as a basis for any further work.
I just read some of the code again this morning, after the original author claimed that procmail was active again. It's still littered with bizarre macros like:
    #define bit_set(name,which,value) \
      (value?(name[bit_index(which)]|=bit_mask(which)):\
      (name[bit_index(which)]&=~bit_mask(which)))
... from regexp.c, line 66 (yes, that's a custom regex engine). Or this one:
    #define jj (aleps.au.sopc)
It uses insecure functions like
malloc() is thrown around
gotos like it's 1984 all over again. (To be fair, it has been feeling like 1984 a lot lately, but that's another matter entirely.) That null pointer deref bug? It's fixed upstream now, in this commit merged a few hours ago, which I presume might be in response to my request to remove procmail from Debian. So while that's nice, this is just the tip of the iceberg. I speculate that one could easily find an exploitable crash in procmail if only by running it through a fuzzer. But I don't need to speculate: procmail had, for years, serious security issues that could possibly lead to root privilege escalation, remotely exploitable if procmail is (as it's designed to be) exposed to the network. Maybe I'm overreacting. Maybe the procmail author will go through the code base and do a proper rewrite. But I don't think that's what is in the cards right now. What I expect will happen next is that people will start fuzzing procmail and throw an uncountable number of bug reports at it, which will get fixed in a trickle while never fixing the underlying, serious design flaws behind procmail.
procmail(1) itself are typically part of mail servers. For example, Dovecot has its own LDA which implements the standard Sieve language (RFC 5228). (Interestingly, Sieve was published as RFC 3028 in 2001, before procmail was formally abandoned.) Courier also has "maildrop" which has its own filtering mechanism, and there is fdm (2007) which is a fetchmail and procmail replacement. Update: there's also mailprocessing, which is not an LDA, but processes an existing folder. It was, however, specifically designed to replace complex Procmail rules. But procmail, of course, doesn't just ship procmail; that would just be too easy. It ships mailstat(1) which we could probably ignore because it only parses procmail log files. But more importantly, it also ships:
lockfile(1) - conditional semaphore-file creator
formail(1) - mail (re)formatter
lockfile(1) already has a somewhat acceptable replacement in the form of flock(1), part of util-linux (which is Essential, so installed on any normal Debian system). It might not be a direct drop-in replacement, but it should be close enough.
formail(1) is similar: the courier reformail(1) which is, presumably, a rewrite of formail. It's unclear if it's a drop-in replacement, but it should probably be possible to port uses of formail to it easily.
Update: the maildrop package ships a SUID root binary (two, even). So if you want only reformail(1), you might want to disable that with:
    dpkg-statoverride --update --add root root 0755 /usr/bin/lockmail.maildrop
    dpkg-statoverride --update --add root root 0755 /usr/bin/maildrop
It would be perhaps better to have reformail(1) as a separate package, see bug 1006903 for that discussion.
The real challenge is, of course, migrating those old .procmailrc recipes to Sieve (basically). I added a few examples in the appendix below. You might notice the Sieve examples are easier to read, which is a nice added bonus.
procmail installed everywhere, possibly because userdir-ldap was using it for lockfile until 2019. I sent a patch to fix that and scrambled to get rid of procmail everywhere. That took about a day. But many other sites are now in that situation, possibly not imagining they have this glaring security hole in their infrastructure.
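If you want to check a fleet for this kind of exposure, a small script can report which of the binaries discussed in this article carry setuid or setgid bits. This is only a sketch: the path list is my assumption about where Debian installs these programs, and the function names are invented for the example. It only reads permissions and changes nothing.

```python
import os
import stat

# Assumed install locations; adjust for your distribution.
CANDIDATES = [
    "/usr/bin/procmail",
    "/usr/bin/lockfile",
    "/usr/bin/maildrop",
    "/usr/bin/lockmail.maildrop",
]


def privileged_bits(mode: int) -> list[str]:
    """Return the names of the setuid/setgid bits set in a file mode."""
    bits = []
    if mode & stat.S_ISUID:
        bits.append("setuid")
    if mode & stat.S_ISGID:
        bits.append("setgid")
    return bits


def audit(paths=CANDIDATES) -> dict[str, tuple[list[str], int, int]]:
    """Map each existing path with privileged bits to (bits, uid, gid)."""
    findings = {}
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue  # not installed here, nothing to report
        bits = privileged_bits(st.st_mode)
        if bits:
            findings[path] = (bits, st.st_uid, st.st_gid)
    return findings


if __name__ == "__main__":
    for path, (bits, uid, gid) in audit().items():
        print(f"{path}: {'+'.join(bits)} (uid={uid}, gid={gid})")
```

An empty report means none of the listed paths exist with those bits set; it says nothing about other setuid programs on the system.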
firstname.lastname@example.org the folder foo. You might write something like this in procmail:
    MAILDIR=$HOME/Maildir/
    DEFAULT=$MAILDIR
    LOGFILE=$HOME/.procmail.log
    VERBOSE=off
    EXTENSION=$1 # Need to rename it - ?? does not like $1 nor 1

    :0
    * EXTENSION ?? [a-zA-Z0-9]+
    .$EXTENSION/
That, in sieve language, would be:
    require ["variables", "envelope", "fileinto", "subaddress"];

    ########################################################################
    # wildcard +extension
    # https://doc.dovecot.org/configuration_manual/sieve/examples/#plus-addressed-mail-filtering
    if envelope :matches :detail "to" "*" {
      # Save name in ${name} in all lowercase
      set :lower "name" "${1}";
      fileinto "${name}";
      stop;
    }
FreshPorts in it into the freshports folder, and mails from alternc.org mailing lists into the alternc folder:
    :0
    ## mailing list freshports
    * ^Subject.*FreshPorts.*
    .freshports/

    :0
    ## mailing list alternc
    * ^List-Post.*mailto:.*@alternc.org.*
    .alternc/
In Sieve:
    if header :contains "subject" "FreshPorts" {
        fileinto "freshports";
    } elsif header :contains "List-Id" "alternc.org" {
        fileinto "alternc";
    }
    :0
    * ^Subject: Cron
    * ^From: .*root@
    .rapports/
Would look something like this in Sieve:
    if header :comparator "i;octet" :contains "Subject" "Cron" {
        if header :regex :comparator "i;octet" "From" ".*root@" {
            fileinto "rapports";
        }
    }
Note that this is what the automated converter does (below). It's not very readable, but it works.
    if header :contains "Precedence" "bulk" {
        fileinto "bulk";
    }
    if exists "List-Id" {
        fileinto "lists";
    }
You can even pile up a bunch of options together to have one big rule with multiple patterns:
    if anyof (header :contains "from" "example.com",
              header :contains ["to", "cc"] "email@example.com") {
        fileinto "example";
    }
    if anyof (exists "X-Cron-Env",
              header :contains ["subject"] ["security run output",
                                            "monthly run output",
                                            "daily run output",
                                            "weekly run output",
                                            "Debian Package Updates",
                                            "Debian package update",
                                            "daily mail stats",
                                            "Anacron job",
                                            "nagios",
                                            "changes report",
                                            "run output",
                                            "[Systraq]",
                                            "Undelivered mail",
                                            "Postfix SMTP server: errors from",
                                            "backupninja",
                                            "DenyHosts report",
                                            "Debian security status",
                                            "apt-listchanges"
                                            ],
              header :contains "Auto-Submitted" "auto-generated",
              envelope :contains "from" ["nagios@", "logcheck@", "root@"]) {
        fileinto "rapports";
    }
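If you have many simple "header contains X, file into Y" recipes, you may not want to write each Sieve block by hand. The following is a minimal sketch of generating such rules from a table. It is not the automated converter mentioned above, the function names are invented here, and it only covers plain substring matches, not regular expressions or envelope tests.

```python
def sieve_quote(text: str) -> str:
    """Quote a string for Sieve: escape backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def sieve_rule(header: str, needle: str, folder: str, keyword: str = "if") -> str:
    """Render one `header :contains` test that files matches into `folder`."""
    return (
        f"{keyword} header :contains {sieve_quote(header)} {sieve_quote(needle)} {{\n"
        f"    fileinto {sieve_quote(folder)};\n"
        f"}}"
    )


def sieve_script(rules: list[tuple[str, str, str]]) -> str:
    """Chain the rules as if/elsif so the first matching rule wins."""
    parts = ['require ["fileinto"];', ""]
    for index, (header, needle, folder) in enumerate(rules):
        keyword = "if" if index == 0 else "elsif"
        parts.append(sieve_rule(header, needle, folder, keyword))
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(sieve_script([
        ("subject", "FreshPorts", "freshports"),
        ("List-Id", "alternc.org", "alternc"),
    ]))
```

Chaining with elsif mirrors the FreshPorts/alternc example: a message is filed by the first rule that matches and later rules are skipped.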
dgit clone sourcepackage gets you the source code, as a git tree, in ./sourcepackage. cd into it and dpkg-buildpackage -uc -b.
Do not use: "VCS" links on official Debian web pages like tracker.debian.org; "debcheckout"; searching Debian's gitlab (salsa.debian.org). These are good for Debian experts only.
If you use Debian's "official" source git repo links you can easily build a package without Debian's patches applied. This can even mean missing security patches. Or maybe it can't even be built in a normal way (or at all).
OMG WTF BBQ, why?
It's complicated. There is History.
Debian's "most-official" centralised source repository is still the Debian Archive, which is a system based on tarballs and patches. I invented the Debian source package format in 1992/3 and it has been souped up since, but it's still tarballs and patches. This system is, of course, obsolete, now that we have modern version control systems, especially git.
Maintainers of Debian packages have invented ways of using git anyway, of course. But this is not standardised. There is a bewildering array of approaches.
The most common approach is to maintain a git tree containing a pile of *.patch files, which are then often maintained using quilt. Yes, really, many Debian people are still using quilt, despite having git! There is machinery for converting this git tree containing a series of patches, to an "official" source package. If you don't use that machinery, and just build from git, nothing applies the patches.
This post was prompted by a conversation with a friend who had wanted to build a Debian package, and didn't know to use dgit. They had got the source from salsa via a link on tracker.d.o, and built .debs without Debian's patches. This is not a theoretical unsoundness, but a very real practical risk.
Future is not very bright
In 2013 at the Debconf in Vaumarcus, Joey Hess, myself, and others, came up with a plan to try to improve this which we thought would be deployable. (Previous attempts had failed.) Crucially, this transition plan does not force change onto any of Debian's many packaging teams, nor onto people doing cross-package maintenance work. I worked on this for quite a while, and at a technical level it is a resounding success.
Unfortunately there is a big limitation. At the current stage of the transition, to work at its best, this replacement scheme hopes that maintainers who update a package will use a new upload tool. The new tool fits into their existing Debian git packaging workflow and has some benefits, but it does make things more complicated rather than less (like any transition plan must, during the transitional phase). When maintainers don't use this new tool, the standardised git branch seen by users is a compatibility stub generated from the tarballs-and-patches. So it has the right contents, but useless history.
The next step is to allow a maintainer to update a package without dealing with tarballs-and-patches at all. This would be massively more convenient for the maintainer, so an easy sell. And of course it links the tarballs-and-patches to the git history in a proper machine-readable way.
We held a "git packaging requirements-gathering session" at the Curitiba Debconf in 2019. I think the DPL's intent was to try to get input into the git workflow design problem. The session was a great success: my existing design was able to meet nearly everyone's needs and wants. The room was obviously keen to see progress. The next stage was to deploy tag2upload. I spoke to various key people at the Debconf and afterwards in 2019 and the code has been basically ready since then.
Unfortunately, deployment of tag2upload is mired in politics. It was blocked by a key team because of unfounded security concerns; positive opinions from independent security experts within Debian were disregarded. Of course it is always hard to get a team to agree to something when it's part of a transition plan which treats their systems as an obsolete setup retained for compatibility.
Current status
If you don't know about Debian's git packaging practices (eg, you have no idea what "patches-unapplied packaging branch without .pc directory" means), and don't want to learn about them, you must use dgit to obtain the source of Debian packages. There is a lot more information and detailed instructions in dgit-user(7).
Hopefully either the maintainer did the best thing, or, if they didn't, you won't need to inspect the history.
If you are a Debian maintainer, you should use dgit push-source to do your uploads. This will make sure that users of dgit will see a reasonable git history.
edited 2021-09-15 14:48 Z to fix a typo
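For anyone scripting the recommended fetch-and-build, the two commands above can be wrapped as follows. This is just a sketch: the helper names are made up, it assumes dgit and dpkg-buildpackage are installed and on PATH, and it defaults to a dry run that only prints what it would execute.

```python
import shlex
import subprocess
from pathlib import Path


def dgit_build_commands(source_package: str, workdir: str = "."):
    """Return (argv, cwd) pairs: clone with dgit, then build unsigned binaries."""
    tree = str(Path(workdir) / source_package)
    return [
        (["dgit", "clone", source_package], workdir),
        (["dpkg-buildpackage", "-uc", "-b"], tree),
    ]


def run(commands, dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is set."""
    for argv, cwd in commands:
        print(f"(cd {shlex.quote(cwd)} && {shlex.join(argv)})")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run(dgit_build_commands("hello"))
```

Passing dry_run=False actually runs the clone and the build, and stops at the first command that fails.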
README.md) that are included in every package I maintain, and thus is part of the transitive closure of Debian main, but I'm not sure anyone will install it from there for any other purpose. But for once making something for someone else isn't the point. This is my quirky, individual way to maintain web sites that originated in an older era of the web and that I plan to keep up-to-date (I'm long overdue to figure out what they did to HTML after abandoning the XHTML approach) because it brings me joy to do things this way. In addition to adding the static site generator, this release also has the regular sorts of bug fixes and minor improvements: better formatting of software pages for software that's packaged for Debian, not assuming every package has a TODO file, and ignoring Autoconf 2.71 backup files when generating distribution tarballs. You can get the latest version of DocKnot from CPAN as App-DocKnot, or from its distribution page. I know I haven't yet updated my web tools page to reflect this move, or changed the URL in the footer of all of my pages. This transition will be a process over the next few months and will probably prompt several more minor releases.
tools.deps.alpha, a library for dependency graph resolution and classpath building, and the CLI tool clj, for REPL interaction. If time permitted, I was also to improve the quality of both new and existing Clojure packages, and the overall Debian Clojure packaging process. My mentor was Louis-Philippe Véronneau, and my co-mentor was Utkarsh Gupta.
clojure-debian-helper). The second reason for which we only currently have a suboptimal Clojure experience in Debian, and probably the root of the previous one, is that many core build tools and libraries for the language have simply not been packaged yet. My project aimed to attack that seemingly root cause.
As I said, another reason for me choosing this project is my own experience as the Co-founder and Leader of, probably, the first Free Software Community experience in my hometown of San Juan, Argentina. That interest in Free Software evolved into a first PhD attempt in what is now known as the field of Peer Production, a subject that has lived within me as a research interest during my day job at a University.
Being a Clojure fan, it felt only logical to combine all those interests somehow. And this project seemed like the ideal combination.
The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor. The FOSS world uses a wide variety of different build tools; given a git repository or tarball, it can be hard to figure out how to build and install a piece of software. Humans will generally know what build tool a project is using when they check out a project from git, or they can read the README. And even then, the answer may not always be straightforward to everybody. For automation, there is no obvious place to figure out how to build or install a project.
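To make the problem concrete, here is a deliberately naive sketch of the kind of guessing an automated tool has to do: look for well-known marker files in the tree and report which build systems they suggest. The marker table and names below are my own illustration and not how any particular tool implements detection.

```python
from pathlib import Path

# (marker file, build system label); illustrative and far from exhaustive.
MARKERS = [
    ("setup.py", "python-setuptools"),
    ("pyproject.toml", "python-pyproject"),
    ("Makefile.PL", "perl-makemaker"),
    ("wscript", "waf"),
    ("Cargo.toml", "cargo"),
    ("configure.ac", "autotools"),
    ("CMakeLists.txt", "cmake"),
    ("meson.build", "meson"),
    ("Makefile", "make"),
]


def detect_build_systems(tree: str) -> list[str]:
    """Return labels for every marker file found at the top of `tree`."""
    root = Path(tree)
    return [label for marker, label in MARKERS if (root / marker).is_file()]


if __name__ == "__main__":
    print(detect_build_systems("."))
```

Even this toy version shows why the answer is not always straightforward: a single tree can match several markers at once, and something still has to decide which one is the real entry point.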
    % git clone https://github.com/dulwich/dulwich
    % cd dulwich
    % ogni --schroot=unstable-amd64-sbuild dist
    Writing dulwich-0.20.21/setup.cfg
    creating dist
    Creating tar archive
    removing 'dulwich-0.20.21' (and everything under it)
    Found new tarball dulwich-0.20.21.tar.gz in /var/run/schroot/mount/unstable-amd64-sbuild-974d32d7-6f10-4e77-8622-b6a091857e85/build/tmpucazj7j7/package/dist.
    % wget https://download.samba.org/pub/ldb/ldb-2.3.0.tar.gz
    % tar xvfz ldb-2.3.0.tar.gz
    % cd ldb-2.3.0
    % ogni install --prefix=/tmp/ldb
    + install /tmp/ldb/include/ldb.h (from include/ldb.h)
    Waf: Leaving directory `/tmp/ldb-2.3.0/bin/default'
    'install' finished successfully (11.395s)
    % wget https://cpan.metacpan.org/authors/id/T/TO/TORU/XML-LibXML-LazyBuilder-0.08.tar.gz
    % tar xvfz XML-LibXML-LazyBuilder-0.08.tar.gz
    % cd XML-LibXML-LazyBuilder-0.08
    % ogni test
I wrote this blog post with Kaylea Champion and a version of this post was originally posted on the Community Data Science Collective blog.
Critical software we all rely on can silently crumble away beneath us. Unfortunately, we often don't find out software infrastructure is in poor condition until it is too late. Over the last year or so, I have been supporting Kaylea Champion on a project my group announced earlier to measure software underproduction, a term we use to describe software that is low in quality but high in importance.
Underproduction reflects an important type of risk in widely used free/libre open source software (FLOSS) because participants often choose their own projects and tasks. Because FLOSS contributors work as volunteers and choose what they work on, important projects aren't always the ones to which FLOSS developers devote the most attention. Even when developers want to work on important projects, relative neglect among important projects is often difficult for FLOSS contributors to see.
Given all this, what can we do to detect problems in FLOSS infrastructure before major failures occur? Kaylea Champion and I recently published a paper laying out our new method for measuring underproduction at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2021 that we believe provides one important answer to this question.
In the paper, we describe a general approach for detecting underproduced software infrastructure that consists of five steps: (1) identifying a body of digital infrastructure (like a code repository); (2) identifying a measure of quality (like the time it takes to fix bugs); (3) identifying a measure of importance (like install base); (4) specifying a hypothesized relationship linking quality and importance if quality and importance are in perfect alignment; and (5) quantifying deviation from this theoretical baseline to find relative underproduction.
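To give a feel for steps (4) and (5), here is a toy sketch that is much simpler than the statistical model in the paper. It assumes that, under perfect alignment, a package's rank by quality equals its rank by importance, and it treats the difference between the two ranks as the deviation. The function names and the numbers are invented for illustration.

```python
def ranks(scores: dict[str, float], reverse: bool = False) -> dict[str, int]:
    """Rank names by score; rank 1 is the smallest score, or largest if reverse."""
    ordered = sorted(scores, key=scores.get, reverse=reverse)
    return {name: position for position, name in enumerate(ordered, start=1)}


def relative_underproduction(packages: dict[str, tuple[float, int]]):
    """`packages` maps a name to (median days to fix a bug, number of installs).

    Returns (name, deviation) pairs sorted from most to least underproduced.
    """
    # Quality: faster bug resolution is better, so fewest days gets rank 1.
    quality = ranks({name: days for name, (days, _) in packages.items()})
    # Importance: more installs is more important, so most installs gets rank 1.
    importance = ranks(
        {name: installs for name, (_, installs) in packages.items()}, reverse=True
    )
    # Baseline assumption: quality rank == importance rank. A positive
    # deviation means quality lags behind importance.
    deviation = {name: quality[name] - importance[name] for name in packages}
    return sorted(deviation.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = {"a": (10, 1000), "b": (300, 900), "c": (20, 10)}
    for name, score in relative_underproduction(toy):
        print(name, score)
```

In the toy data, package "b" is widely installed but slow to get fixes, so it comes out on top; the estimates in the paper come from a more careful model than this rank subtraction.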
To show how our method works in practice, we applied the technique to an important collection of FLOSS infrastructure: 21,902 packages in the Debian GNU/Linux distribution. Although there are many ways to measure quality, we used a measure of how quickly Debian maintainers have historically dealt with 461,656 bugs that have been filed over the last three decades. To measure importance, we used data from Debian's Popularity Contest opt-in survey. After some statistical machinations that are documented in our paper, the result was an estimate of relative underproduction for the 21,902 packages in Debian we looked at.
One of our key findings is that underproduction is very common in Debian. By our estimates, at least 4,327 packages in Debian are underproduced. As you can see in the list of the most underproduced packages (again, as estimated using just one more measure), many of the most at-risk packages are associated with the desktop and windowing environments where there are many users but also many extremely tricky integration-related bugs.
We hope these results are useful to folks at Debian and the Debian QA team. We also hope that the basic method we've laid out is something that others will build off in other contexts and apply to other software repositories. In addition to the paper itself and the video of the conference presentation on YouTube by Kaylea, we've put all our code and data in an archival repository on Harvard Dataverse and we'd love to work with others interested in applying our approach in other software ecosystems.
For more details, check out the full paper which is available as a freely accessible preprint.
This project was supported by the Ford/Sloan Digital Infrastructure Initiative. Wm Salt Hale of the Community Data Science Collective and Debian Developers Paul Wise and Don Armstrong provided valuable assistance in accessing and interpreting Debian bug data. René Just generously provided insight and feedback on the manuscript.
Paper Citation: Kaylea Champion and Benjamin Mako Hill. 2021. Underproduction: An Approach for Measuring Risk in Open Source Software. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2021). IEEE.
Contact Kaylea Champion (firstname.lastname@example.org) with any questions or if you are interested in following up.