Search Results: "kpcyrd"

11 April 2024

Reproducible Builds: Reproducible Builds in March 2024

Welcome to the March 2024 report from the Reproducible Builds project! In our reports, we attempt to outline what we have been up to over the past month, as well as mentioning some of the important things happening more generally in software supply-chain security. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. Arch Linux minimal container userland now 100% reproducible
  2. Validating Debian s build infrastructure after the XZ backdoor
  3. Making Fedora Linux (more) reproducible
  4. Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management
  5. Software and source code identification with GNU Guix and reproducible builds
  6. Two new Rust-based tools for post-processing determinism
  7. Distribution work
  8. Mailing list highlights
  9. Website updates
  10. Delta chat clients now reproducible
  11. diffoscope updates
  12. Upstream patches
  13. Reproducibility testing framework

Arch Linux minimal container userland now 100% reproducible In remarkable news, Reproducible builds developer kpcyrd reported that that the Arch Linux minimal container userland is now 100% reproducible after work by developers dvzv and Foxboron on the one remaining package. This represents a real world , widely-used Linux distribution being reproducible. Their post, which kpcyrd suffixed with the question now what? , continues on to outline some potential next steps, including validating whether the container image itself could be reproduced bit-for-bit. The post, which was itself a followup for an Arch Linux update earlier in the month, generated a significant number of replies.

Validating Debian s build infrastructure after the XZ backdoor From our mailing list this month, Vagrant Cascadian wrote about being asked about trying to perform concrete reproducibility checks for recent Debian security updates, in an attempt to gain some confidence about Debian s build infrastructure given that they performed builds in environments running the high-profile XZ vulnerability. Vagrant reports (with some caveats):
So far, I have not found any reproducibility issues; everything I tested I was able to get to build bit-for-bit identical with what is in the Debian archive.
That is to say, reproducibility testing permitted Vagrant and Debian to claim with some confidence that builds performed when this vulnerable version of XZ was installed were not interfered with.

Making Fedora Linux (more) reproducible In March, Davide Cavalca gave a talk at the 2024 Southern California Linux Expo (aka SCALE 21x) about the ongoing effort to make the Fedora Linux distribution reproducible. Documented in more detail on Fedora s website, the talk touched on topics such as the specifics of implementing reproducible builds in Fedora, the challenges encountered, the current status and what s coming next. (YouTube video)

Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management Julien Malka published a brief but interesting paper in the HAL open archive on Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management:
Functional package managers (FPMs) and reproducible builds (R-B) are technologies and methodologies that are conceptually very different from the traditional software deployment model, and that have promising properties for software supply chain security. This thesis aims to evaluate the impact of FPMs and R-B on the security of the software supply chain and propose improvements to the FPM model to further improve trust in the open source supply chain. PDF
Julien s paper poses a number of research questions on how the model of distributions such as GNU Guix and NixOS can be leveraged to further improve the safety of the software supply chain , etc.

Software and source code identification with GNU Guix and reproducible builds In a long line of commendably detailed blog posts, Ludovic Court s, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier have together published two interesting posts on the GNU Guix blog this month. In early March, Ludovic Court s, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier wrote about software and source code identification and how that might be performed using Guix, rhetorically posing the questions: What does it take to identify software ? How can we tell what software is running on a machine to determine, for example, what security vulnerabilities might affect it? Later in the month, Ludovic Court s wrote a solo post describing adventures on the quest for long-term reproducible deployment. Ludovic s post touches on GNU Guix s aim to support time travel , the ability to reliably (and reproducibly) revert to an earlier point in time, employing the iconic image of Harold Lloyd hanging off the clock in Safety Last! (1925) to poetically illustrate both the slapstick nature of current modern technology and the gymnastics required to navigate hazards of our own making.

Two new Rust-based tools for post-processing determinism Zbigniew J drzejewski-Szmek announced add-determinism, a work-in-progress reimplementation of the Reproducible Builds project s own strip-nondeterminism tool in the Rust programming language, intended to be used as a post-processor in RPM-based distributions such as Fedora In addition, Yossi Kreinin published a blog post titled refix: fast, debuggable, reproducible builds that describes a tool that post-processes binaries in such a way that they are still debuggable with gdb, etc.. Yossi post details the motivation and techniques behind the (fast) performance of the tool.

Distribution work In Debian this month, since the testing framework no longer varies the build path, James Addison performed a bulk downgrade of the bug severity for issues filed with a level of normal to a new level of wishlist. In addition, 28 reviews of Debian packages were added, 38 were updated and 23 were removed this month adding to ever-growing knowledge about identified issues. As part of this effort, a number of issue types were updated, including Chris Lamb adding a new ocaml_include_directories toolchain issue [ ] and James Addison adding a new filesystem_order_in_java_jar_manifest_mf_include_resource issue [ ] and updating the random_uuid_in_notebooks_generated_by_nbsphinx to reference a relevant discussion thread [ ]. In addition, Roland Clobus posted his 24th status update of reproducible Debian ISO images. Roland highlights that the images for Debian unstable often cannot be generated due to changes in that distribution related to the 64-bit time_t transition. Lastly, Bernhard M. Wiedemann posted another monthly update for his reproducibility work in openSUSE.

Mailing list highlights Elsewhere on our mailing list this month:

Website updates There were made a number of improvements to our website this month, including:
  • Pol Dellaiera noticed the frequent need to correctly cite the website itself in academic work. To facilitate easier citation across multiple formats, Pol contributed a Citation File Format (CIF) file. As a result, an export in BibTeX format is now available in the Academic Publications section. Pol encourages community contributions to further refine the CITATION.cff file. Pol also added an substantial new section to the buy in page documenting the role of Software Bill of Materials (SBOMs) and ephemeral development environments. [ ][ ]
  • Bernhard M. Wiedemann added a new commandments page to the documentation [ ][ ] and fixed some incorrect YAML elsewhere on the site [ ].
  • Chris Lamb add three recent academic papers to the publications page of the website. [ ]
  • Mattia Rizzolo and Holger Levsen collaborated to add Infomaniak as a sponsor of amd64 virtual machines. [ ][ ][ ]
  • Roland Clobus updated the stable outputs page, dropping version numbers from Python documentation pages [ ] and noting that Python s set data structure is also affected by the PYTHONHASHSEED functionality. [ ]

Delta chat clients now reproducible Delta Chat, an open source messaging application that can work over email, announced this month that the Rust-based core library underlying Delta chat application is now reproducible.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 259, 260 and 261 to Debian and made the following additional changes:
  • New features:
    • Add support for the zipdetails tool from the Perl distribution. Thanks to Fay Stegerman and Larry Doolittle et al. for the pointer and thread about this tool. [ ]
  • Bug fixes:
    • Don t identify Redis database dumps as GNU R database files based simply on their filename. [ ]
    • Add a missing call to File.recognizes so we actually perform the filename check for GNU R data files. [ ]
    • Don t crash if we encounter an .rdb file without an equivalent .rdx file. (#1066991)
    • Correctly check for 7z being available and not lz4 when testing 7z. [ ]
    • Prevent a traceback when comparing a contentful .pyc file with an empty one. [ ]
  • Testsuite improvements:
    • Fix .epub tests after supporting the new zipdetails tool. [ ]
    • Don t use parenthesis within test skipping messages, as PyTest adds its own parenthesis. [ ]
    • Factor out Python version checking in test_zip.py. [ ]
    • Skip some Zip-related tests under Python 3.10.14, as a potential regression may have been backported to the 3.10.x series. [ ]
    • Actually test 7z support in the test_7z set of tests, not the lz4 functionality. (Closes: reproducible-builds/diffoscope#359). [ ]
In addition, Fay Stegerman updated diffoscope s monkey patch for supporting the unusual Mozilla ZIP file format after Python s zipfile module changed to detect potentially insecure overlapping entries within .zip files. (#362) Chris Lamb also updated the trydiffoscope command line client, dropping a build-dependency on the deprecated python3-distutils package to fix Debian bug #1065988 [ ], taking a moment to also refresh the packaging to the latest Debian standards [ ]. Finally, Vagrant Cascadian submitted an update for diffoscope version 260 in GNU Guix. [ ]

Upstream patches This month, we wrote a large number of patches, including: Bernhard M. Wiedemann used reproducibility-tooling to detect and fix packages that added changes in their %check section, thus failing when built with the --no-checks option. Only half of all openSUSE packages were tested so far, but a large number of bugs were filed, including ones against caddy, exiv2, gnome-disk-utility, grisbi, gsl, itinerary, kosmindoormap, libQuotient, med-tools, plasma6-disks, pspp, python-pypuppetdb, python-urlextract, rsync, vagrant-libvirt and xsimd. Similarly, Jean-Pierre De Jesus DIAZ employed reproducible builds techniques in order to test a proposed refactor of the ath9k-htc-firmware package. As the change produced bit-for-bit identical binaries to the previously shipped pre-built binaries:
I don t have the hardware to test this firmware, but the build produces the same hashes for the firmware so it s safe to say that the firmware should keep working.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In March, an enormous number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Sleep less after a so-called 404 package state has occurred. [ ]
    • Schedule package builds more often. [ ][ ]
    • Regenerate all our HTML indexes every hour, but only every 12h for the released suites. [ ]
    • Create and update unstable and experimental base systems on armhf again. [ ][ ]
    • Don t reschedule so many depwait packages due to the current size of the i386 architecture queue. [ ]
    • Redefine our scheduling thresholds and amounts. [ ]
    • Schedule untested packages with a higher priority, otherwise slow architectures cannot keep up with the experimental distribution growing. [ ]
    • Only create the stats_buildinfo.png graph once per day. [ ][ ]
    • Reproducible Debian dashboard: refactoring, update several more static stats only every 12h. [ ]
    • Document how to use systemctl with new systemd-based services. [ ]
    • Temporarily disable armhf and i386 continuous integration tests in order to get some stability back. [ ]
    • Use the deb.debian.org CDN everywhere. [ ]
    • Remove the rsyslog logging facility on bookworm systems. [ ]
    • Add zst to the list of packages which are false-positive diskspace issues. [ ]
    • Detect failures to bootstrap Debian base systems. [ ]
  • Arch Linux-related changes:
    • Temporarily disable builds because the pacman package manager is broken. [ ][ ]
    • Split reproducible_html_live_status and split the scheduling timing . [ ][ ][ ]
    • Improve handling when database is locked. [ ][ ]
  • Misc changes:
    • Show failed services that require manual cleanup. [ ][ ]
    • Integrate two new Infomaniak nodes. [ ][ ][ ][ ]
    • Improve IRC notifications for artifacts. [ ]
    • Run diffoscope in different systemd slices. [ ]
    • Run the node health check more often, as it can now repair some issues. [ ][ ]
    • Also include the string Bot in the userAgent for Git. (Re: #929013). [ ]
    • Document increased tmpfs size on our OUSL nodes. [ ]
    • Disable memory account for the reproducible_build service. [ ][ ]
    • Allow 10 times as many open files for the Jenkins service. [ ]
    • Set OOMPolicy=continue and OOMScoreAdjust=-1000 for both the Jenkins and the reproducible_build service. [ ]
Mattia Rizzolo also made the following changes:
  • Debian-related changes:
    • Define a systemd slice to group all relevant services. [ ][ ]
    • Add a bunch of quotes in scripts to assuage the shellcheck tool. [ ]
    • Add stats on how many packages have been built today so far. [ ]
    • Instruct systemd-run to handle diffoscope s exit codes specially. [ ]
    • Prefer the pgrep tool over grepping the output of ps. [ ]
    • Re-enable a couple of i386 and armhf architecture builders. [ ][ ]
    • Fix some stylistic issues flagged by the Python flake8 tool. [ ]
    • Cease scheduling Debian unstable and experimental on the armhf architecture due to the time_t transition. [ ]
    • Start a few more i386 & armhf workers. [ ][ ][ ]
    • Temporarly skip pbuilder updates in the unstable distribution, but only on the armhf architecture. [ ]
  • Other changes:
    • Perform some large-scale refactoring on how the systemd service operates. [ ][ ]
    • Move the list of workers into a separate file so it s accessible to a number of scripts. [ ]
    • Refactor the powercycle_x86_nodes.py script to use the new IONOS API and its new Python bindings. [ ]
    • Also fix nph-logwatch after the worker changes. [ ]
    • Do not install the stunnel tool anymore, it shouldn t be needed by anything anymore. [ ]
    • Move temporary directories related to Arch Linux into a single directory for clarity. [ ]
    • Update the arm64 architecture host keys. [ ]
    • Use a common Postfix configuration. [ ]
The following changes were also made by:
  • Jan-Benedict Glaw:
    • Initial work to clean up a messy NetBSD-related script. [ ][ ]
  • Roland Clobus:
    • Show the installer log if the installer fails to build. [ ]
    • Avoid the minus character (i.e. -) in a variable in order to allow for tags in openQA. [ ]
    • Update the schedule of Debian live image builds. [ ]
  • Vagrant Cascadian:
    • Maintenance on the virt* nodes is completed so bring them back online. [ ]
    • Use the fully qualified domain name in configuration. [ ]
Node maintenance was also performed by Holger Levsen, Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ][ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

7 February 2024

Reproducible Builds: Reproducible Builds in January 2024

Welcome to the January 2024 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. If you are interested in contributing to the project, please visit our Contribute page on our website.

How we executed a critical supply chain attack on PyTorch John Stawinski and Adnan Khan published a lengthy blog post detailing how they executed a supply-chain attack against PyTorch, a popular machine learning platform used by titans like Google, Meta, Boeing, and Lockheed Martin :
Our exploit path resulted in the ability to upload malicious PyTorch releases to GitHub, upload releases to [Amazon Web Services], potentially add code to the main repository branch, backdoor PyTorch dependencies the list goes on. In short, it was bad. Quite bad.
The attack pivoted on PyTorch s use of self-hosted runners as well as submitting a pull request to address a trivial typo in the project s README file to gain access to repository secrets and API keys that could subsequently be used for malicious purposes.

New Arch Linux forensic filesystem tool On our mailing list this month, long-time Reproducible Builds developer kpcyrd announced a new tool designed to forensically analyse Arch Linux filesystem images. Called archlinux-userland-fs-cmp, the tool is supposed to be used from a rescue image (any Linux) with an Arch install mounted to, [for example], /mnt. Crucially, however, at no point is any file from the mounted filesystem eval d or otherwise executed. Parsers are written in a memory safe language. More information about the tool can be found on their announcement message, as well as on the tool s homepage. A GIF of the tool in action is also available.

Issues with our SOURCE_DATE_EPOCH code? Chris Lamb started a thread on our mailing list summarising some potential problems with the source code snippet the Reproducible Builds project has been using to parse the SOURCE_DATE_EPOCH environment variable:
I m not 100% sure who originally wrote this code, but it was probably sometime in the ~2015 era, and it must be in a huge number of codebases by now. Anyway, Alejandro Colomar was working on the shadow security tool and pinged me regarding some potential issues with the code. You can see this conversation here.
Chris ends his message with a request that those with intimate or low-level knowledge of time_t, C types, overflows and the various parsing libraries in the C standard library (etc.) contribute with further info.

Distribution updates In Debian this month, Roland Clobus posted another detailed update of the status of reproducible ISO images on our mailing list. In particular, Roland helpfully summarised that all major desktops build reproducibly with bullseye, bookworm, trixie and sid provided they are built for a second time within the same DAK run (i.e. [within] 6 hours) . Additionally 7 of the 8 bookworm images from the official download link build reproducibly at any later time. In addition to this, three reviews of Debian packages were added, 17 were updated and 15 were removed this month adding to our knowledge about identified issues. Elsewhere, Bernhard posted another monthly update for his work elsewhere in openSUSE.

Community updates There were made a number of improvements to our website, including Bernhard M. Wiedemann fixing a number of typos of the term nondeterministic . [ ] and Jan Zerebecki adding a substantial and highly welcome section to our page about SOURCE_DATE_EPOCH to document its interaction with distribution rebuilds. [ ].
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 254 and 255 to Debian but focusing on triaging and/or merging code from other contributors. This included adding support for comparing eXtensible ARchive (.XAR/.PKG) files courtesy of Seth Michael Larson [ ][ ], as well considerable work from Vekhir in order to fix compatibility between various and subtle incompatible versions of the progressbar libraries in Python [ ][ ][ ][ ]. Thanks!

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In January, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Reduce the number of arm64 architecture workers from 24 to 16. [ ]
    • Use diffoscope from the Debian release being tested again. [ ]
    • Improve the handling when killing unwanted processes [ ][ ][ ] and be more verbose about it, too [ ].
    • Don t mark a job as failed if process marked as to-be-killed is already gone. [ ]
    • Display the architecture of builds that have been running for more than 48 hours. [ ]
    • Reboot arm64 nodes when they hit an OOM (out of memory) state. [ ]
  • Package rescheduling changes:
    • Reduce IRC notifications to 1 when rescheduling due to package status changes. [ ]
    • Correctly set SUDO_USER when rescheduling packages. [ ]
    • Automatically reschedule packages regressing to FTBFS (build failure) or FTBR (build success, but unreproducible). [ ]
  • OpenWrt-related changes:
    • Install the python3-dev and python3-pyelftools packages as they are now needed for the sunxi target. [ ][ ]
    • Also install the libpam0g-dev which is needed by some OpenWrt hardware targets. [ ]
  • Misc:
    • As it s January, set the real_year variable to 2024 [ ] and bump various copyright years as well [ ].
    • Fix a large (!) number of spelling mistakes in various scripts. [ ][ ][ ]
    • Prevent Squid and Systemd processes from being killed by the kernel s OOM killer. [ ]
    • Install the iptables tool everywhere, else our custom rc.local script fails. [ ]
    • Cleanup the /srv/workspace/pbuilder directory on boot. [ ]
    • Automatically restart Squid if it fails. [ ]
    • Limit the execution of chroot-installation jobs to a maximum of 4 concurrent runs. [ ][ ]
Significant amounts of node maintenance was performed by Holger Levsen (eg. [ ][ ][ ][ ][ ][ ][ ] etc.) and Vagrant Cascadian (eg. [ ][ ][ ][ ][ ][ ][ ][ ]). Indeed, Vagrant Cascadian handled an extended power outage for the network running the Debian armhf architecture test infrastructure. This provided the incentive to replace the UPS batteries and consolidate infrastructure to reduce future UPS load. [ ] Elsewhere in our infrastructure, however, Holger Levsen also adjusted the email configuration for @reproducible-builds.org to deal with a new SMTP email attack. [ ]

Upstream patches The Reproducible Builds project tries to detects, dissects and fix as many (currently) unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: Separate to this, Vagrant Cascadian followed up with the relevant maintainers when reproducibility fixes were not included in newly-uploaded versions of the mm-common package in Debian this was quickly fixed, however. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

6 December 2023

Reproducible Builds: Reproducible Builds in November 2023

Welcome to the November 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. As a rather rapid recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries (more).

Reproducible Builds Summit 2023 Between October 31st and November 2nd, we held our seventh Reproducible Builds Summit in Hamburg, Germany! Amazingly, the agenda and all notes from all sessions are all online many thanks to everyone who wrote notes from the sessions. As a followup on one idea, started at the summit, Alexander Couzens and Holger Levsen started work on a cache (or tailored front-end) for the snapshot.debian.org service. The general idea is that, when rebuilding Debian, you do not actually need the whole ~140TB of data from snapshot.debian.org; rather, only a very small subset of the packages are ever used for for building. It turns out, for amd64, arm64, armhf, i386, ppc64el, riscv64 and s390 for Debian trixie, unstable and experimental, this is only around 500GB ie. less than 1%. Although the new service not yet ready for usage, it has already provided a promising outlook in this regard. More information is available on https://rebuilder-snapshot.debian.net and we hope that this service becomes usable in the coming weeks. The adjacent picture shows a sticky note authored by Jan-Benedict Glaw at the summit in Hamburg, confirming Holger Levsen s theory that rebuilding all Debian packages needs a very small subset of packages, the text states that 69,200 packages (in Debian sid) list 24,850 packages in their .buildinfo files, in 8,0200 variations. This little piece of paper was the beginning of rebuilder-snapshot and is a direct outcome of the summit! The Reproducible Builds team would like to thank our event sponsors who include Mullvad VPN, openSUSE, Debian, Software Freedom Conservancy, Allotropia and Aspiration Tech.

Beyond Trusting FOSS presentation at SeaGL On November 4th, Vagrant Cascadian presented Beyond Trusting FOSS at SeaGL in Seattle, WA in the United States. Founded in 2013, SeaGL is a free, grassroots technical summit dedicated to spreading awareness and knowledge about free source software, hardware and culture. The summary of Vagrant s talk mentions that it will:
[ ] introduce the concepts of Reproducible Builds, including best practices for developing and releasing software, the tools available to help diagnose issues, and touch on progress towards solving decades-old deeply pervasive fundamental security issues Learn how to verify and demonstrate trust, rather than simply hoping everything is OK!
Germane to the contents of the talk, the slides for Vagrant s talk can be built reproducibly, resulting in a PDF with a SHA1 of cfde2f8a0b7e6ec9b85377eeac0661d728b70f34 when built on Debian bookworm and c21fab273232c550ce822c4b0d9988e6c49aa2c3 on Debian sid at the time of writing.

Human Factors in Software Supply Chain Security Marcel Fourn , Dominik Wermke, Sascha Fahl and Yasemin Acar have published an article in a Special Issue of the IEEE s Security & Privacy magazine. Entitled A Viewpoint on Human Factors in Software Supply Chain Security: A Research Agenda, the paper justifies the need for reproducible builds to reach developers and end-users specifically, and furthermore points out some under-researched topics that we have seen mentioned in interviews. An author pre-print of the article is available in PDF form.

Community updates On our mailing list this month:

openSUSE updates Bernhard M. Wiedemann has created a wiki page outlining an proposal to create a general-purpose Linux distribution which consists of 100% bit-reproducible packages albeit minus the embedded signature within RPM files. It would be based on openSUSE Tumbleweed or, if available, its Slowroll-variant. In addition, Bernhard posted another monthly update for his work elsewhere in openSUSE.

Ubuntu Launchpad now supports .buildinfo files Back in 2017, Steve Langasek filed a bug against Ubuntu s Launchpad code hosting platform to report that .changes files (artifacts of building Ubuntu and Debian packages) reference .buildinfo files that aren t actually exposed by Launchpad itself. This was causing issues when attempting to process .changes files with tools such as Lintian. However, it was noticed last month that, in early August of this year, Simon Quigley had resolved this issue, and .buildinfo files are now available from the Launchpad system.

PHP reproducibility updates There have been two updates from the PHP programming language this month. Firstly, the widely-deployed PHPUnit framework for the PHP programming language have recently released version 10.5.0, which introduces the inclusion of a composer.lock file, ensuring total reproducibility of the shipped binary file. Further details and the discussion that went into their particular implementation can be found on the associated GitHub pull request. In addition, the presentation Leveraging Nix in the PHP ecosystem has been given in late October at the PHP International Conference in Munich by Pol Dellaiera. While the video replay is not yet available, the (reproducible) presentation slides and speaker notes are available.

diffoscope changes diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes, including:
  • Improving DOS/MBR extraction by adding support for 7z. [ ]
  • Adding a missing RequiredToolNotFound import. [ ]
  • As a UI/UX improvement, try and avoid printing an extended traceback if diffoscope runs out of memory. [ ]
  • Mark diffoscope as stable on PyPI.org. [ ]
  • Uploading version 252 to Debian unstable. [ ]

Website updates A huge number of notes were added to our website that were taken at our recent Reproducible Builds Summit held between October 31st and November 2nd in Hamburg, Germany. In particular, a big thanks to Arnout Engelen, Bernhard M. Wiedemann, Daan De Meyer, Evangelos Ribeiro Tzaras, Holger Levsen and Orhun Parmaks z. In addition to this, a number of other changes were made, including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In October, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Track packages marked as Priority: important in a new package set. [ ][ ]
    • Stop scheduling packages that fail to build from source in bookworm [ ] and bullseye. [ ].
    • Add old releases dashboard link in web navigation. [ ]
    • Permit re-run of the pool_buildinfos script to be re-run for a specific year. [ ]
    • Grant jbglaw access to the osuosl4 node [ ][ ] along with lynxis [ ].
    • Increase RAM on the amd64 Ionos builders from 48 GiB to 64 GiB; thanks IONOS! [ ]
    • Move buster to archived suites. [ ][ ]
    • Reduce the number of arm64 architecture workers from 24 to 16 in order to improve stability [ ], reduce the workers for amd64 from 32 to 28 and, for i386, reduce from 12 down to 8 [ ].
    • Show the entire build history of each Debian package. [ ]
    • Stop scheduling already tested package/version combinations in Debian bookworm. [ ]
  • Snapshot service for rebuilders
    • Add an HTTP-based API endpoint. [ ][ ]
    • Add a Gunicorn instance to serve the HTTP API. [ ]
    • Add an NGINX config [ ][ ][ ][ ]
  • System-health:
    • Detect failures due to HTTP 503 Service Unavailable errors. [ ]
    • Detect failures to update package sets. [ ]
    • Detect unmet dependencies. (This usually occurs with builds of Debian live-build.) [ ]
  • Misc-related changes:
    • do install systemd-ommd on jenkins. [ ]
    • fix harmless typo in squid.conf for codethink04. [ ]
    • fixup: reproducible Debian: add gunicorn service to serve /api for rebuilder-snapshot.d.o. [ ]
    • Increase codethink04 s Squid cache_dir size setting to 16 GiB. [ ]
    • Don t install systemd-oomd as it unfortunately kills sshd [ ]
    • Use debootstrap from backports when commisioning nodes. [ ]
    • Add the live_build_debian_stretch_gnome, debsums-tests_buster and debsums-tests_buster jobs to the zombie list. [ ][ ]
    • Run jekyll build with the --watch argument when building the Reproducible Builds website. [ ]
    • Misc node maintenance. [ ][ ][ ]
Other changes were made as well, however, including Mattia Rizzolo fixing rc.local s Bash syntax so it can actually run [ ], commenting away some file cleanup code that is (potentially) deleting too much [ ] and fixing the html_brekages page for Debian package builds [ ]. Finally, diagnosed and submitted a patch to add a AddEncoding gzip .gz line to the tests.reproducible-builds.org Apache configuration so that Gzip files aren t re-compressed as Gzip which some clients can t deal with (as well as being a waste of time). [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

4 August 2023

Reproducible Builds: Reproducible Builds in July 2023

Welcome to the July 2023 report from the Reproducible Builds project. In our reports, we try to outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit the Contribute page on our website.
Marcel Fourn et al. presented at the IEEE Symposium on Security and Privacy in San Francisco, CA on The Importance and Challenges of Reproducible Builds for Software Supply Chain Security. As summarised in last month s report, the abstract of their paper begins:
The 2020 Solarwinds attack was a tipping point that caused a heightened awareness about the security of the software supply chain and in particular the large amount of trust placed in build systems. Reproducible Builds (R-Bs) provide a strong foundation to build defenses for arbitrary attacks against build systems by ensuring that given the same source code, build environment, and build instructions, bitwise-identical artifacts are created. (PDF)

Chris Lamb published an interview with Simon Butler, associate senior lecturer in the School of Informatics at the University of Sk vde, on the business adoption of Reproducible Builds. (This is actually the seventh instalment in a series featuring the projects, companies and individuals who support our project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project and David A. Wheeler.) Vagrant Cascadian presented Breaking the Chains of Trusting Trust at FOSSY 2023.
Rahul Bajaj has been working with Roland Clobus on merging an overview of environment variations to our website:
I have identified 16 root causes for unreproducible builds in my empirical study, which I have linked to the corresponding documentation. The initial MR right now contains information about 10 root causes. For each root cause, I have provided a definition, a notable instance, and a workaround. However, I have only found workarounds for 5 out of the 10 root causes listed in this merge request. In the upcoming commits, I plan to add an additional 6 root causes. I kindly request you review the text for any necessary refinements, modifications, or corrections. Additionally, I would appreciate the help with documentation for the solutions/workarounds for the remaining root causes: Archive Metadata, Build ID, File System Ordering, File Permissions, and Snippet Encoding. Your input on the identified root causes for unreproducible builds would be greatly appreciated. [ ]

Just a reminder that our upcoming Reproducible Builds Summit is set to take place from October 31st November 2nd 2023 in Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. If you re interested in joining us this year, please make sure to read the event page which has more details about the event and location.
There was more progress towards making the Go programming language ecosystem reproducible this month, including: In addition, kpcyrd posted to our mailing list to report that:
while packaging govulncheck for Arch Linux I noticed a checksum mismatch for a tar file I downloaded from go.googlesource.com. I used diffoscope to compare the .tar file I downloaded with the .tar file the build server downloaded, and noticed the timestamps are different.

In Debian, 20 reviews of Debian packages were added, 25 were updated and 25 were removed this month adding to our knowledge about identified issues. A number of issue types were updated, including marking ffile_prefix_map_passed_to_clang being fixed since Debian bullseye [ ] and adding a Debian bug tracker reference for the nondeterminism_added_by_pyqt5_pyrcc5 issue [ ]. In addition, Roland Clobus posted another detailed update of the status of reproducible Debian ISO images on our mailing list. In particular, Roland helpfully summarised that live images are looking good, and the number of (passing) automated tests is growing .
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.
F-Droid added 20 new reproducible apps in July, making 165 apps in total that are published with Reproducible Builds and using the upstream developer s signature. [ ]
The Sphinx documentation tool recently accepted a change to improve deterministic reproducibility of documentation. It s internal util.inspect.object_description attempts to sort collections, but this can fail. The change handles the failure case by using string-based object descriptions as a fallback deterministic sort ordering, as well as adding recursive object-description calls for list and tuple datatypes. As a result, documentation generated by Sphinx will be more likely to be automatically reproducible. Lastly in news, kpcyrd posted to our mailing list announcing a new repro-env tool:
My initial interest in reproducible builds was how do I distribute pre-compiled binaries on GitHub without people raising security concerns about them . I ve cycled back to this original problem about 5 years later and built a tool that is meant to address this. [ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
In diffoscope development this month, versions 244, 245 and 246 were uploaded to Debian unstable by Chris Lamb, who also made the following changes:
  • Don t include the file size in image metadata. It is, at best, distracting, and it is already in the directory metadata. [ ]
  • Add compatibility with libarchive-5. [ ]
  • Mark that the test_dex::test_javap_14_differences test requires the procyon tool. [ ]
  • Initial work on DOS/MBR extraction. [ ]
  • Move to using assert_diff in the .ico and .jpeg tests. [ ]
  • Temporarily mark some Android-related as XFAIL due to Debian bugs #1040941 & #1040916. [ ]
  • Fix the test skipped reason generation in the case of a version outside of the required range. [ ]
  • Update copyright years. [ ][ ]
  • Fix try.diffoscope.org. [ ]
In addition, Gianfranco Costamagna added support for LLVM version 16. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In July, a number of changes were made by Holger Levsen:
  • General changes:
    • Upgrade Jenkins host to Debian bookworm now that Debian 12.1 is out. [ ][ ][ ][ ]
    • djm: improve UX when rebooting a node fails. [ ]
    • djm: reduce wait time between rebooting nodes. [ ]
  • Debian-related changes:
    • Various refactoring of the Debian scheduler. [ ][ ][ ]
    • Make Debian live builds more robust with respect to salsa.debian.org returning HTTP 502 errors. [ ][ ]
    • Use the legacy SCP protocol instead of the SFTP protocol when transfering Debian live builds. [ ][ ]
    • Speed up a number of database queries thanks, Myon! [ ][ ][ ][ ][ ]
    • Split create_meta_pkg_sets job into two (for Debian unstable and Debian testing) to half the job runtime to approximately 90 minutes. [ ][ ]
    • Split scheduler job into four separate jobs, one for each tested architecture. [ ][ ]
    • Treat more PostgreSQL errors as serious (for some jobs). [ ]
    • Re-enable automatic database documentation now that postgresql_autodoc is back in Debian bookworm. [ ]
    • Remove various hardcoding of Debian release names. [ ]
    • Drop some i386 special casing. [ ]
  • Other distributions:
    • Speed up Alpine SQL queries. [ ]
    • Adjust CSS layout for Arch Linux pages to match 3 and not 4 repos being tested. [ ]
    • Drop the community Arch Linux repo as it has now been merged into the extra repo. [ ]
    • Speed up a number of Arch-related database queries. [ ]
    • Try harder to properly cleanup after building OpenWrt packages. [ ]
    • Drop all kfreebsd-related tests now that it s officially dead. [ ]
  • System health:
    • Always ignore some well-known harmless orphan processes. [ ][ ][ ]
    • Detect another case of job failure due to Jenkins shutdown. [ ]
    • Show all non co-installable package sets on the status page. [ ]
    • Warn that some specific reboot nodes are currently false-positives. [ ]
  • Node health checks:
    • Run system and node health checks for Jenkins less frequently. [ ]
    • Try to restart any failed dpkg-db-backup [ ] and munin-node services [ ].
In addition, Vagrant Cascadian updated the paths in our automated to tests to use the same paths used by the official Debian build servers. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 June 2023

Reproducible Builds: Reproducible Builds in May 2023

Welcome to the May 2023 report from the Reproducible Builds project In our reports, we outline the most important things that we have been up to over the past month. As always, if you are interested in contributing to the project, please visit our Contribute page on our website.


Holger Levsen gave a talk at the 2023 edition of the Debian Reunion Hamburg, a semi-informal meetup of Debian-related people in northern Germany. The slides are available online.
In April, Holger Levsen gave a talk at foss-north 2023 titled Reproducible Builds, the first ten years. Last month, however, Holger s talk was covered in a round-up of the conference on the Free Software Foundation Europe (FSFE) blog.
Pronnoy Goswami, Saksham Gupta, Zhiyuan Li, Na Meng and Daphne Yao from Virginia Tech published a paper investigating the Reproducibility of NPM Packages. The abstract includes:
When using open-source NPM packages, most developers download prebuilt packages on npmjs.com instead of building those packages from available source, and implicitly trust the downloaded packages. However, it is unknown whether the blindly trusted prebuilt NPM packages are reproducible (i.e., whether there is always a verifiable path from source code to any published NPM package). [ ] We downloaded versions/releases of 226 most popularly used NPM packages and then built each version with the available source on GitHub. Next, we applied a differencing tool to compare the versions we built against versions downloaded from NPM, and further inspected any reported difference.
The paper reports that among the 3,390 versions of the 226 packages, only 2,087 versions are reproducible, and furthermore that multiple factors contribute to the non-reproducibility including flexible versioning information in package.json file and the divergent behaviors between distinct versions of tools used in the build process. The paper concludes with insights for future verifiable build procedures. Unfortunately, a PDF is not available publically yet, but a Digital Object Identifier (DOI) is available on the paper s IEEE page.
Elsewhere in academia, Betul Gokkaya, Leonardo Aniello and Basel Halak of the School of Electronics and Computer Science at the University of Southampton published a new paper containing a broad overview of attacks and comprehensive risk assessment for software supply chain security. Their paper, titled Software supply chain: review of attacks, risk assessment strategies and security controls, analyses the most common software supply-chain attacks by providing the latest trend of analyzed attack, and identifies the security risks for open-source and third-party software supply chains. Furthermore, their study introduces unique security controls to mitigate analyzed cyber-attacks and risks by linking them with real-life security incidence and attacks . (arXiv.org, PDF)
NixOS is now tracking two new reports at reproducible.nixos.org. Aside from the collection of build-time dependencies of the minimal and Gnome installation ISOs, this page now also contains reports that are restricted to the artifacts that make it into the image. The minimal ISO is currently reproducible except for Python 3.10, which hopefully will be resolved with the coming update to Python version 3.11.
On our rb-general mailing list this month: David A. Wheeler started a thread noting that the OSSGadget project s oss-reproducible tool was measuring something related to but not the same as reproducible builds. Initially they had adopted the term semantically reproducible build term for what it measured, which they defined as being if its build results can be either recreated exactly (a bit for bit reproducible build), or if the differences between the release package and a rebuilt package are not expected to produce functional differences in normal cases. This generated a significant number of replies, and several were concerned that people might confuse what they were measuring with reproducible builds . After discussion, the OSSGadget developers decided to switch to the term semantically equivalent for what they measured in order to reduce the risk of confusion. Vagrant Cascadian (vagrantc) posted an update about GCC, binutils, and Debian s build-essential set with some progress, some hope, and I daresay, some fears . Lastly, kpcyrd asked a question about building a reproducible Linux kernel package for Arch Linux (answered by Arnout Engelen). In the same, thread David A. Wheeler pointed out that the Linux Kernel documentation has a chapter about Reproducible kernel builds now as well.
In Debian this month, nine reviews of Debian packages were added, 20 were updated and 6 were removed this month, all adding to our knowledge about identified issues. In addition, Vagrant Cascadian added a link to the source code causing various ecbuild issues. [ ]
The F-Droid project updated its Inclusion How-To with a new section explaining why it considers reproducible builds to be best practice and hopes developers will support the team s efforts to make as many (new) apps reproducible as it reasonably can.
In diffoscope development this month, version 242 was uploaded to Debian unstable by Chris Lamb who also made the following changes: In addition, Mattia Rizzolo documented how to (re)-produce a binary blob in the code [ ] and Vagrant Cascadian updated the version of diffoscope in GNU Guix to 242 [ ].
reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Holger Levsen uploaded versions 0.7.24 and 0.7.25 to Debian unstable which added support for Tox versions 3 and 4 with help from Vagrant Cascadian [ ][ ][ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: In addition, Jason A. Donenfeld filed a bug (now fixed in the latest alpha version) in the Android issue tracker to report that generateLocaleConfig in Android Gradle Plugin version 8.1.0 generates XML files using non-deterministic ordering, breaking reproducible builds. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In May, a number of changes were made by Holger Levsen:
  • Update the kernel configuration of arm64 nodes only put required modules in the initrd to save space in the /boot partition. [ ]
  • A huge number of changes to a new tool to document/track Jenkins node maintenance, including adding --fetch, --help, --no-future and --verbose options [ ][ ][ ][ ] as well as adding a suite of new actions, such as apt-upgrade, command, deploy-git, rmstamp, etc. [ ][ ][ ][ ] in addition a significant amount of refactoring [ ][ ][ ][ ].
  • Issue warnings if apt has updates to install. [ ]
  • Allow Jenkins to run apt get update in maintenance job. [ ]
  • Installed bind9-dnsutils on some Ubuntu 18.04 nodes. [ ][ ]
  • Fixed the Jenkins shell monitor to correctly deal with little-used directories. [ ]
  • Updated the node health check to warn when apt upgrades are available. [ ]
  • Performed some node maintenance. [ ]
In addition, Vagrant Cascadian added the nocheck, nopgo and nolto when building gcc-* and binutils packages [ ] as well as performed some node maintenance [ ][ ]. In addition, Roland Clobus updated the openQA configuration to specify longer timeouts and access to the developer mode [ ] and updated the URL used for reproducible Debian Live images [ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

6 May 2023

Reproducible Builds: Reproducible Builds in April 2023

Welcome to the April 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. And, as always, if you are interested in contributing to the project, please visit our Contribute page on our website.

General news Trisquel is a fully-free operating system building on the work of Ubuntu Linux. This month, Simon Josefsson published an article on his blog titled Trisquel is 42% Reproducible!. Simon wrote:
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
Simon wrote another blog post this month on a new tool to ensure that updates to Linux distribution archive metadata (eg. via apt-get update) will only use files that have been recorded in a globally immutable and tamper-resistant ledger. A similar solution exists for Arch Linux (called pacman-bintrans) which was announced in August 2021 where an archive of all issued signatures is publically accessible.
Joachim Breitner wrote an in-depth blog post on a bootstrap-capable GHC, the primary compiler for the Haskell programming language. As a quick background to what this is trying to solve, in order to generate a fully trustworthy compile chain, trustworthy root binaries are needed and a popular approach to address this problem is called bootstrappable builds where the core idea is to address previously-circular build dependencies by creating a new dependency path using simpler prerequisite versions of software. Joachim takes an somewhat recursive approach to the problem for Haskell, leading to the inadvertently humourous question: Can I turn all of GHC into one module, and compile that? Elsewhere in the world of bootstrapping, Janneke Nieuwenhuizen and Ludovic Court s wrote a blog post on the GNU Guix blog announcing The Full-Source Bootstrap, specifically:
[ ] the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run guix pull today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program something that had never been achieved, to our knowledge, since the birth of Unix.
More info about this change is available on the post itself, including:
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start. There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot, from two people six years ago there are now around 100 people in the #bootstrappable IRC channel.

Michael Ablassmeier created a script called pypidiff as they were looking for a way to track differences between packages published on PyPI. According to Micahel, pypidiff uses diffoscope to create reports on the published releases and automatically pushes them to a GitHub repository. This can be seen on the pypi-diff GitHub page (example).
Eleuther AI, a non-profit AI research group, recently unveiled Pythia, a collection of 16 Large Language Model (LLMs) trained on public data in the same order designed specifically to facilitate scientific research. According to a post on MarkTechPost:
Pythia is the only publicly available model suite that includes models that were trained on the same data in the same order [and] all the corresponding data and tools to download and replicate the exact training process are publicly released to facilitate further research.
These properties are intended to allow researchers to understand how gender bias (etc.) can affected by training data and model scale.
Back in February s report we reported on a series of changes to the Sphinx documentation generator that was initiated after attempts to get the alembic Debian package to build reproducibly. Although Chris Lamb was able to identify the source problem and provided a potential patch that might fix it, James Addison has taken the issue in hand, leading to a large amount of activity resulting in a proposed pull request that is waiting to be merged.
WireGuard is a popular Virtual Private Network (VPN) service that aims to be faster, simpler and leaner than other solutions to create secure connections between computing devices. According to a post on the WireGuard developer mailing list, the WireGuard Android app can now be built reproducibly so that its contents can be publicly verified. According to the post by Jason A. Donenfeld, the F-Droid project now does this verification by comparing their build of WireGuard to the build that the WireGuard project publishes. When they match, the new version becomes available. This is very positive news.
Author and public speaker, V. M. Brasseur published a sample chapter from her upcoming book on corporate open source strategy which is the topic of Software Bill of Materials (SBOM):
A software bill of materials (SBOM) is defined as a nested inventory for software, a list of ingredients that make up software components. When you receive a physical delivery of some sort, the bill of materials tells you what s inside the box. Similarly, when you use software created outside of your organisation, the SBOM tells you what s inside that software. The SBOM is a file that declares the software supply chain (SSC) for that specific piece of software. [ ]

Several distributions noticed recent versions of the Linux Kernel are no longer reproducible because the BPF Type Format (BTF) metadata is not generated in a deterministic way. This was discussed on the #reproducible-builds IRC channel, but no solution appears to be in sight for now.

Community news On our mailing list this month: Holger Levsen gave a talk at foss-north 2023 in Gothenburg, Sweden on the topic of Reproducible Builds, the first ten years. Lastly, there were a number of updates to our website, including:
  • Chris Lamb attempted a number of ways to try and fix literal : .lead appearing in the page [ ][ ][ ], made all the Back to who is involved links italics [ ], and corrected the syntax of the _data/sponsors.yml file [ ].
  • Holger Levsen added his recent talk [ ], added Simon Josefsson, Mike Perry and Seth Schoen to the contributors page [ ][ ][ ], reworked the People page a little [ ] [ ], as well as fixed spelling of Arch Linux [ ].
Lastly, Mattia Rizzolo moved some old sponsors to a former section [ ] and Simon Josefsson added Trisquel GNU/Linux. [ ]

Debian
  • Vagrant Cascadian reported on the Debian s build-essential package set, which was inspired by how close we are to making the Debian build-essential set reproducible and how important that set of packages are in general . Vagrant mentioned that: I have some progress, some hope, and I daresay, some fears . [ ]
  • Debian Developer Cyril Brulebois (kibi) filed a bug against snapshot.debian.org after they noticed that there are many missing dinstalls that is to say, the snapshot service is not capturing 100% of all of historical states of the Debian archive. This is relevant to reproducibility because without the availability historical versions, it is becomes impossible to repeat a build at a future date in order to correlate checksums. .
  • 20 reviews of Debian packages were added, 21 were updated and 5 were removed this month adding to our knowledge about identified issues. Chris Lamb added a new build_path_in_line_annotations_added_by_ruby_ragel toolchain issue. [ ]
  • Mattia Rizzolo announced that the data for the stretch archive on tests.reproducible-builds.org has been archived. This matches the archival of stretch within Debian itself. This is of some historical interest, as stretch was the first Debian release regularly tested by the Reproducible Builds project.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope development diffoscope version 241 was uploaded to Debian unstable by Chris Lamb. It included contributions already covered in previous months as well a change by Chris Lamb to add a missing raise statement that was accidentally dropped in a previous commit. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In April, a number of changes were made, including:
  • Holger Levsen:
    • Significant work on a new Documented Jenkins Maintenance (djm) script to support logged maintenance of nodes, etc. [ ][ ][ ][ ][ ][ ]
    • Add the new APT repo url for Jenkins itself with a new signing key. [ ][ ]
    • In the Jenkins shell monitor, allow 40 GiB of files for diffoscope for the Debian experimental distribution as Debian is frozen around the release at the moment. [ ]
    • Updated Arch Linux testing to cleanup leftover files left in /tmp/archlinux-ci/ after three days. [ ][ ][ ]
    • Mark a number of nodes hosted by Oregon State University Open Source Lab (OSUOSL) as online and offline. [ ][ ][ ]
    • Update the node health checks to detect failures to end schroot sessions. [ ]
    • Filter out another duplicate contributor from the contributor statistics. [ ]
  • Mattia Rizzolo:



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

28 March 2023

kpcyrd: Writing a Linux executable from scratch with x86_64-unknown-none and Rust

I recently mentioned on the internet I did work in this direction and a friend of mine asked me to write a blogpost on this. I didn t blog for a long time (keeping all the goodness for myself hehe), so here we go. To set the scene, let s assume we want to make an exectuable binary for x86_64 Linux that s supposed to be extremely portable. It should work on both Debian and Arch Linux. It should work on systems without glibc like Alpine Linux. It should even work in a FROM scratch Docker container. In a more serious setting you would statically link musl-libc with your Rust program, but today we re in a silly-goofy mood so we re going to try to make this work without a libc. And we re also going to use Rust for this, more specifically the stable release channel of Rust, so this blog post won t use any nightly-only features that might still change/break. If you re using a Rust 1.0 version that was recent at the time of writing or later (>= 1.68.0 according to my computer), you should be able to try this at home just fine . This tutorial assumes you have no prior programming experience in any programming language, but it s going to involve some x86_64 assembly. If you already know what a syscall is, you ll be just fine. If this is your first exposure to programming you might still be able to follow along, but it might be a wild ride. If you haven t already, install rustup (possibly also available in your package manager, who knows?)
# when asked, press enter to confirm default settings
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs   sh
This is going to install everything you need to use Rust on Linux (this tutorial assumes you re following along on Linux btw). Usually it s still using a system linker (by calling the cc binary, and errors out if none is present), but instead we re going to use rustup to install an additional target:
rustup target add x86_64-unknown-none
I don t know if/how this is made available by Linux distributions, so I recommend following along with rust installed from rustup. Anyway, we re creating a new project with cargo, this creates a new directory that we can then change into (you might ve done this before):
cargo new hack-the-planet
cd hack-the-planet
There s going to be a file named Cargo.toml, we don t need to make any changes there, but the one that was auto-generated for me at the time of writing looks like this:
[package]
name = "hack-the-planet"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
There s a second file named src/main.rs, it s going to contain some pre-generated hello world, but we re going to delete it and create a new, empty file:
rm src/main.rs
touch src/main.rs
Alrighty, leaving this file empty is not valid but we re going to walk through the individual steps so we re going to try to build with an empty file first. At this point I would like to credit this chapter of a fasterthanli.me series and a blogpost by Philipp Oppermann, this tutorial is merely an 2023 update and makes it work with stable Rust. Let s run the build:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error[E0463]: can't find crate for  std 
   
  = note: the  x86_64-unknown-none  target may not support the standard library
  = note:  std  is required by  hack_the_planet  because it does not declare  #![no_std] 
error[E0601]:  main  function not found in crate  hack_the_planet 
   
  = note: consider adding a  main  function to  src/main.rs 
Some errors have detailed explanations: E0463, E0601.
For more information about an error, try  rustc --explain E0463 .
error: could not compile  hack-the-planet  due to 2 previous errors
Since this doesn t use a libc (oh right, I forgot to mention this up to this point actually), this also means there s no std standard library. Usually the standard library of Rust still uses the system libc to do syscalls, but since we specify our libc as none this means std won t be available (use std::fs::rename won t work). There are still other functions we can use and import, for example there s core that s effectively a second standard library, but much smaller. To opt-out of the std standard library, we can put #![no_std] into src/main.rs:
#![no_std]
Running the build again:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error[E0601]:  main  function not found in crate  hack_the_planet 
 --> src/main.rs:1:11
   
1   #![no_std]
              ^ consider adding a  main  function to  src/main.rs 
For more information about this error, try  rustc --explain E0601 .
error: could not compile  hack-the-planet  due to previous error
Rust noticed we didn t define a main function and suggest we add one. This isn t what we want though so we ll politely decline and inform Rust we don t have a main and it shouldn t attempt to call it. We re adding #![no_main] to our file and src/main.rs now looks like this:
#![no_std]
#![no_main]
Running the build again:
$ cargo build
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error:  #[panic_handler]  function required, but not found
error: language item required, but not found:  eh_personality 
   
  = note: this can occur when a binary crate with  #![no_std]  is compiled for a target where  eh_personality  is defined in the standard library
  = help: you may be able to compile for a target that doesn't need  eh_personality , specify a target with  --target  or in  .cargo/config 
error: could not compile  hack-the-planet  due to 2 previous errors
Rust is asking us for a panic handler, basically I m going to jump to this address if something goes terribly wrong and execute whatever you put there . Eventually we would put some code there to just exit the program, but for now an infinitely loop will do. This is likely going to get stripped away anyway by the compiler if it notices our program has no code-branches leading to a panic and the code is unused. Our src/main.rs now looks like this:
#![no_std]
#![no_main]
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    loop  
 
Running the build again:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
    Finished release [optimized] target(s) in 0.16s
Neat, it worked! What happens if we run it?
$ target/x86_64-unknown-none/release/hack-the-planet
Segmentation fault (core dumped)
Oops. Let s try to disassemble it:
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Ok that looks pretty from scratch to me . The file contains no cpu instructions. Also note how our infinity loop is not present (as predicted).

Making a basic program and executing it Ok let s try to make a valid program that basically just cleanly exits. First let s try to add some cpu instructions and verify they re indeed getting executed. Lemme introduce, the INT 3 instruction in x86_64 assembly. In binary it s also known as the 0xCC opcode. It crashes our program in a slightly different way, so if the error message changes, we know it worked. The other tutorials use a #[naked] function for the entry point, but since this feature isn t stabilized at the time of writing we re going to use the global_asm! macro. Also don t worry, I m not going to introduce every assembly instruction individually. Our program now looks like this:
#![no_std]
#![no_main]
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    loop  
 
global_asm!  
    ".global _start",
    "_start:",
    "int 3"
 
Running the build again (ok basically from now on the build is always going to be expected to work unless I say otherwise):
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
    Finished release [optimized] target(s) in 0.11s
Let s try to disassemble the binary again:
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	cc                   	int3
And sure enough, there s a cc instruction that was identified as int3. Let s try to run this:
$ target/x86_64-unknown-none/release/hack-the-planet
Trace/breakpoint trap (core dumped)
The error message of the crash is now slightly different because it s hitting our breakpoint cpu instruction. Funfact btw, if you run this in strace you can see this isn t making any system calls (aka not talking to the kernel at all, it just crashes):
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x74f12430d1d8 /* 39 vars */) = 0
--- SIGTRAP  si_signo=SIGTRAP, si_code=SI_KERNEL, si_addr=NULL  ---
+++ killed by SIGTRAP (core dumped) +++
[1]    2796457 trace trap (core dumped)  strace -f ./hack-the-planet
Let s try to make a program that does a clean shutdown. To do this we inform the kernel with a system call that we may like to exit. We can get more info on this with man 2 exit and it defines exit like this:
[[noreturn]] void _exit(int status);
On Linux this syscall is actually called _exit and exit is implemented as a libc function, but we don t care about any of that today, it s going to do the job just fine. Also note how it takes a single argument of type int. In C-speak this means signed 32 bit , i32 in Rust. Next we need to figure out the syscall number of this syscall. These numbers are cpu architecture specific for some reason (idk, idc). We re looking these numbers up with ripgrep in /usr/include/asm/:
$ rg __NR_exit /usr/include/asm
/usr/include/asm/unistd_64.h
64:#define __NR_exit 60
235:#define __NR_exit_group 231
/usr/include/asm/unistd_x32.h
53:#define __NR_exit (__X32_SYSCALL_BIT + 60)
206:#define __NR_exit_group (__X32_SYSCALL_BIT + 231)
/usr/include/asm/unistd_32.h
5:#define __NR_exit 1
253:#define __NR_exit_group 252
Since we re on x86_64 the correct value is the one in unistd_64.h, 60. Also, on x86_64 the syscall number goes into the rax cpu register, the status argument goes in the rdi register. The return value of the syscall is going to be placed in the rax register after the syscall is done, but for exit the execution is never given back to us. Let s try to write 60 into the rax register and 69 into the rdi register. To copy into registers we re going to use the mov destination, source instruction to copy from source to destination. With these registers setup we can use the syscall cpu instruction to hand execution over to the kernel. Don t worry, there s only one more assembly instruction coming and for everything else we re going to use Rust. Our code now looks like this:
#![no_std]
#![no_main]
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    loop  
 
global_asm!  
    ".global _start",
    "_start:",
    "mov rax, 60",
    "mov rdi, 69",
    "syscall"
 
Build the binary, run it and print the exit code:
$ cargo build --release --target x86_64-unknown-none
$ target/x86_64-unknown-none/release/hack-the-planet; echo $?
69
Nice. Rust is quite literally putting these cpu instructions into the binary for us, nothing else.
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	48 c7 c0 3c 00 00 00 	mov    $0x3c,%rax
    1217:	48 c7 c7 45 00 00 00 	mov    $0x45,%rdi
    121e:	0f 05                	syscall
Running this with strace shows the program does exactly one thing.
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x70699fe8c908 /* 39 vars */) = 0
exit(69)                                = ?
+++ exited with 69 +++

Writing Rust Ok but even though cpu instructions can be fun at times, I d rather not deal with them most of the time (this might strike you as odd, considering this blog post). Instead let s try to define a function in Rust and call into that instead. We re going to define this function as unsafe (btw none of this is taking advantage of the safety guarantees by Rust in case it wasn t obvious. This tutorial is mostly going to stick to unsafe Rust, but for bigger projects you can attempt to reduce your usage of unsafe to opt back into normal safe Rust), it also declares the function with #[no_mangle] so the function name is preserved as main and we can call it from our global_asm entry point. Lastely, when our program is started it s going to get the stack address passed in one of the cpu registers, this value is expected to be passed to our function as an argument. Our function declares ! as return type, which means it never returns:
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    // TODO: this is missing
 
This won t compile yet, we need to add our assembly for the exit syscall back in.
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    asm!(
        "syscall",
        in("rax") 60,
        in("rdi") 0,
        options(noreturn)
    );
 
This time we re using the asm! macro, this is a slightly more declarative approach. We want to run the syscall cpu instruction with 60 in the rax register, and this time we want the rdi register to be zero, to indicate a successful exit. We also use options(noreturn) so Rust knows it should assume execution does not resume after this assembly is executed (the Linux kernel guarantees this). We modify our global_asm! entrypoint to call our new main function, and to copy the stack address from rsp into the register for the first argument rdi because it would otherwise get lost forever:
global_asm!  
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
 
Our full program now looks like this:
#![no_std]
#![no_main]
use core::arch::asm;
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    loop  
 
global_asm!  
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
 
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    asm!(
        "syscall",
        in("rax") 60,
        in("rdi") 0,
        options(noreturn)
    );
 
After building and disassembling this the Rust compiler is slowly starting to do work for us:
$ cargo build --release --target x86_64-unknown-none
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	48 89 e7             	mov    %rsp,%rdi
    1213:	e8 08 00 00 00       	call   1220 <main>
    1218:	cc                   	int3
    1219:	cc                   	int3
    121a:	cc                   	int3
    121b:	cc                   	int3
    121c:	cc                   	int3
    121d:	cc                   	int3
    121e:	cc                   	int3
    121f:	cc                   	int3
0000000000001220 <main>:
    1220:	50                   	push   %rax
    1221:	b8 3c 00 00 00       	mov    $0x3c,%eax
    1226:	31 ff                	xor    %edi,%edi
    1228:	0f 05                	syscall
    122a:	0f 0b                	ud2
The mov and syscall instructions are still the same, but it noticed it can XOR the rdi register with itself to set it to zero. It s using x86 assembly language (the 32 bit variant of x86_64, that also happens to work on x86_64) to do so, that s why the register is refered to as edi in the disassembly. You can also see it s inserting a bunch of 0xCC instructions (for alignment) and Rust puts the opcodes 0x0F 0x0B at the end of the function to force an invalid opcode exception so the program is guaranteed to crash in case the exit syscall doesn t do it. This code still executes as expected:
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x72dae7e5dc08 /* 39 vars */) = 0
exit(0)                                 = ?
+++ exited with 0 +++

Adding functions Ok we re getting closer but we aren t quite there yet. Let s try to write an exit function for our assembly that we can then call like a normal function. Remember that it takes a signed 32 bit integer that s supposed to go into rdi.
unsafe fn exit(status: i32) -> !  
    asm!(
        "syscall",
        in("rax") 60,
        in("rdi") status,
        options(noreturn)
    );
 
Actually, since this function doesn t take any raw pointers and any i32 is valid for this syscall we re going to remove the unsafe marker of this function. When doing this we still need to use unsafe within the function for our inline assembly.
fn exit(status: i32) -> !  
    unsafe  
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") status,
            options(noreturn)
        );
     
 
Let s call this function from our main, and also remove the infinity loop of the panic handler with a call to exit(1):
#![no_std]
#![no_main]
use core::arch::asm;
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    exit(1);
 
global_asm!  
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
 
fn exit(status: i32) -> !  
    unsafe  
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") status,
            options(noreturn)
        );
     
 
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    exit(0);
 
Running this still works, but interestingly the generated assembly didn t change at all:
$ cargo build --release --target x86_64-unknown-none
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	48 89 e7             	mov    %rsp,%rdi
    1213:	e8 08 00 00 00       	call   1220 <main>
    1218:	cc                   	int3
    1219:	cc                   	int3
    121a:	cc                   	int3
    121b:	cc                   	int3
    121c:	cc                   	int3
    121d:	cc                   	int3
    121e:	cc                   	int3
    121f:	cc                   	int3
0000000000001220 <main>:
    1220:	50                   	push   %rax
    1221:	b8 3c 00 00 00       	mov    $0x3c,%eax
    1226:	31 ff                	xor    %edi,%edi
    1228:	0f 05                	syscall
    122a:	0f 0b                	ud2
Rust noticed there s no need to make it a separate function at runtime and instead merged the instructions of the exit function directly into our main. It also noticed the 0 argument in exit(0) means rdi is supposed to be zero and uses the XOR optimization mentioned before. Since main is not calling any unsafe functions anymore we could mark it as safe too, but in the next few functions we re going to deal with file descriptors and raw pointers, so this is likely the only safe function we re going to write in this tutorial so let s just keep the unsafe marker.

Printing text Ok let s try to do a quick hello world, to do this we re going to call the write syscall. Looking it up with man 2 write:
ssize_t write(int fd, const void buf[.count], size_t count);
The write syscall takes 3 arguments and returns a signed size_t. In Rust this is called isize. In C size_t is an unsigned integer type that can hold any value of sizeof(...) for the given platform, ssize_t can only store half of that because it uses one of the bits to indicate an error has occured (the first s means signed, write returns -1 in case of an error). The arguments for write are:
  • the file descriptor to write to. stdout is located on file descriptor 1.
  • a pointer/address to some memory.
  • the number of bytes that should be written, starting at the given address.
Let s also lookup the syscall number of write:
% rg __NR_write /usr/include/asm
/usr/include/asm/unistd_64.h
5:#define __NR_write 1
24:#define __NR_writev 20
/usr/include/asm/unistd_32.h
8:#define __NR_write 4
150:#define __NR_writev 146
/usr/include/asm/unistd_x32.h
5:#define __NR_write (__X32_SYSCALL_BIT + 1)
323:#define __NR_writev (__X32_SYSCALL_BIT + 516)
The value we re looking for is 1. Let s write our write function (heh).
unsafe fn write(fd: i32, buf: *const u8, count: usize) -> isize  
    let r0;
    asm!(
        "syscall",
        inlateout("rax") 1 => r0,
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") count,
        lateout("rcx") _,
        lateout("r11") _,
        options(nostack, preserves_flags)
    );
    r0
 
Now that s a lot of stuff at once. Since this syscall is actually going to hand execution back to our program we need to let Rust know which cpu registers the syscall is writing to, so Rust doesn t attempt to use them to store data (that would be silently overwritten by the syscall). inlateout("raw") 1 => r0 means we re writing a value to the register and want the result back in variable r0. in("rdi") fd means we want to write the value of fd into the rdi register. lateout("rcx") _ means the Linux kernel may write to that register (so the previous value may get lost), but we don t want to store the value anywhere (the underscore acts as a dummy variable name). This doesn t compile just yet though
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error: incompatible types for asm inout argument
  --> src/main.rs:35:26
    
35           inlateout("rax") 1 => r0,
                              ^    ^^ type  isize 
                               
                              type  i32 
    
   = note: asm inout arguments must have the same type, unless they are both pointers or integers of the same size
error: could not compile  hack-the-planet  due to previous error
Rust has inferred the type of r0 is isize since that s what our function returns, but the type of the input value for the register was inferred to be i32. We re going to select a specific number type to fix this.
unsafe fn write(fd: i32, buf: *const u8, count: usize) -> isize  
    let r0;
    asm!(
        "syscall",
        inlateout("rax") 1isize => r0,
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") count,
        lateout("rcx") _,
        lateout("r11") _,
        options(nostack, preserves_flags)
    );
    r0
 
We can now call our new write function like this:
write(1, b"Hello world\n".as_ptr(), 12);
We need to set the number of bytes we want to write explicitly because there s no concept of null-byte termination in the write system call, it s quite literally write the next X bytes, starting from this address . Our program now looks like this:
#![no_std]
#![no_main]
use core::arch::asm;
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    exit(1);
 
global_asm!  
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
 
fn exit(status: i32) -> !  
    unsafe  
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") status,
            options(noreturn)
        );
     
 
unsafe fn write(fd: i32, buf: *const u8, count: usize) -> isize  
    let r0;
    asm!(
        "syscall",
        inlateout("rax") 1isize => r0,
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") count,
        lateout("rcx") _,
        lateout("r11") _,
        options(nostack, preserves_flags)
    );
    r0
 
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    write(1, b"Hello world\n".as_ptr(), 12);
    exit(0);
 
Let s try to build and disassemble it:
$ cargo build --release --target x86_64-unknown-none
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001220 <_start>:
    1220:	48 89 e7             	mov    %rsp,%rdi
    1223:	e8 08 00 00 00       	call   1230 <main>
    1228:	cc                   	int3
    1229:	cc                   	int3
    122a:	cc                   	int3
    122b:	cc                   	int3
    122c:	cc                   	int3
    122d:	cc                   	int3
    122e:	cc                   	int3
    122f:	cc                   	int3
0000000000001230 <main>:
    1230:	50                   	push   %rax
    1231:	48 8d 35 d5 ef ff ff 	lea    -0x102b(%rip),%rsi        # 20d <_start-0x1013>
    1238:	b8 01 00 00 00       	mov    $0x1,%eax
    123d:	ba 0c 00 00 00       	mov    $0xc,%edx
    1242:	bf 01 00 00 00       	mov    $0x1,%edi
    1247:	0f 05                	syscall
    1249:	b8 3c 00 00 00       	mov    $0x3c,%eax
    124e:	31 ff                	xor    %edi,%edi
    1250:	0f 05                	syscall
    1252:	0f 0b                	ud2
This time there are 2 syscalls, first write, then exit. For write it s setting up the 3 arguments in our cpu registers (rdi, rsi, rdx). The lea instruction subtracts 0x102b from the rip register (the instruction pointer) and places the result in the rsi register. This is effectively saying an address relative to wherever this code was loaded into memory . The instruction pointer is going to point directly behind the opcodes of the lea instruction, so 0x1238 - 0x102b = 0x20d. This address is also pointed out in the disassembly as a comment. We don t see the string in our disassembly but we can convert our 0x20d hex to 525 in decimal and use dd to read 12 bytes from that offset, and sure enough:
$ dd bs=1 skip=525 count=12 if=target/x86_64-unknown-none/release/hack-the-planet
Hello world
12+0 records in
12+0 records out
Execute our binary with strace also shows the new write syscall (and the bytes that are being written mixed up in the output).
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x74493abe64a8 /* 39 vars */) = 0
write(1, "Hello world\n", 12Hello world
)           = 12
exit(0)                                 = ?
+++ exited with 0 +++
After running strip on it to remove some symbols the binary is so small, if you open it in a text editor it fits on a screenshot:

11 November 2022

Reproducible Builds: Reproducible Builds in October 2022

Welcome to the Reproducible Builds report for October 2022! In these reports we attempt to outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Our in-person summit this year was held in the past few days in Venice, Italy. Activity and news from the summit will therefore be covered in next month s report!
A new article related to reproducible builds was recently published in the 2023 IEEE Symposium on Security and Privacy. Titled Taxonomy of Attacks on Open-Source Software Supply Chains and authored by Piergiorgio Ladisa, Henrik Plate, Matias Martinez and Olivier Barais, their paper:
[ ] proposes a general taxonomy for attacks on opensource supply chains, independent of specific programming languages or ecosystems, and covering all supply chain stages from code contributions to package distribution.
Taking the form of an attack tree, the paper covers 107 unique vectors linked to 94 real world supply-chain incidents which is then mapped to 33 mitigating safeguards including, of course, reproducible builds:
Reproducible Builds received a very high utility rating (5) from 10 participants (58.8%), but also a high-cost rating (4 or 5) from 12 (70.6%). One expert commented that a reproducible build like used by Solarwinds now, is a good measure against tampering with a single build system and another claimed this is going to be the single, biggest barrier .

It was noticed this month that Solarwinds published a whitepaper back in December 2021 in order to:
[ ] illustrate a concerning new reality for the software industry and illuminates the increasingly sophisticated threats made by outside nation-states to the supply chains and infrastructure on which we all rely.
The 12-month anniversary of the 2020 Solarwinds attack (which SolarWinds Worldwide LLC itself calls the SUNBURST attack) was, of course, the likely impetus for publication.
Whilst collaborating on making the Cyrus IMAP server reproducible, Ellie Timoney asked why the Reproducible Builds testing framework uses two remarkably distinctive build paths when attempting to flush out builds that vary on the absolute system path in which they were built. In the case of the Cyrus IMAP server, these happened to be: Asked why they vary in three different ways, Chris Lamb listed in detail the motivation behind to each difference.
On our mailing list this month:
The Reproducible Builds project is delighted to welcome openEuler to the Involved projects page [ ]. openEuler is Linux distribution developed by Huawei, a counterpart to it s more commercially-oriented EulerOS.

Debian Colin Watson wrote about his experience towards making the databases generated by the man-db UNIX manual page indexing tool:
One of the people working on [reproducible builds] noticed that man-db s database files were an obstacle to [reproducibility]: in particular, the exact contents of the database seemed to depend on the order in which files were scanned when building it. The reporter proposed solving this by processing files in sorted order, but I wasn t keen on that approach: firstly because it would mean we could no longer process files in an order that makes it more efficient to read them all from disk (still valuable on rotational disks), but mostly because the differences seemed to point to other bugs.
Colin goes on to describe his approach to solving the problem, including fixing various fits of internal caching, and he ends his post with None of this is particularly glamorous work, but it paid off .
Vagrant Cascadian announced on our mailing list another online sprint to help clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads). The first such sprint took place on September 22nd, but another was held on October 6th, and another small one on October 20th. This resulted in the following progress:
41 reviews of Debian packages were added, 62 were updated and 12 were removed this month adding to our knowledge about identified issues. A number of issue types were updated too. [1][ ]
Lastly, Luca Boccassi submitted a patch to debhelper, a set of tools used in the packaging of the majority of Debian packages. The patch addressed an issue in the dh_installsysusers utility so that the postinst post-installation script that debhelper generates the same data regardless of the underlying filesystem ordering.

Other distributions F-Droid is a community-run app store that provides free software applications for Android phones. This month, F-Droid changed their documentation and guidance to now explicitly encourage RB for new apps [ ][ ], and FC Stegerman created an extremely in-depth issue on GitLab concerning the APK signing block. You can read more about F-Droid s approach to reproducibility in our July 2022 interview with Hans-Christoph Steiner of the F-Droid Project. In openSUSE, Bernhard M. Wiedemann published his usual openSUSE monthly report.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 224 and 225 to Debian:
  • Add support for comparing the text content of HTML files using html2text. [ ]
  • Add support for detecting ordering-only differences in XML files. [ ]
  • Fix an issue with detecting ordering differences. [ ]
  • Use the capitalised version of Ordering consistently everywhere in output. [ ]
  • Add support for displaying font metadata using ttx(1) from the fonttools suite. [ ]
  • Testsuite improvements:
    • Temporarily allow the stable-po pipeline to fail in the CI. [ ]
    • Rename the order1.diff test fixture to json_expected_ordering_diff. [ ]
    • Tidy the JSON tests. [ ]
    • Use assert_diff over get_data and an manual assert within the XML tests. [ ]
    • Drop the ALLOWED_TEST_FILES test; it was mostly just annoying. [ ]
    • Tidy the tests/test_source.py file. [ ]
Chris Lamb also added a link to diffoscope s OpenBSD packaging on the diffoscope.org homepage [ ] and Mattia Rizzolo fix an test failure that was occurring under with LLVM 15 [ ].

Testing framework The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, the following changes were made by Holger Levsen:
  • Run the logparse tool to analyse results on the Debian Edu build logs. [ ]
  • Install btop(1) on all nodes running Debian. [ ]
  • Switch Arch Linux from using SHA1 to SHA256. [ ]
  • When checking Debian debstrap jobs, correctly log the tool usage. [ ]
  • Cleanup more task-related temporary directory names when testing Debian packages. [ ][ ]
  • Use the cdebootstrap-static binary for the 2nd runs of the cdebootstrap tests. [ ]
  • Drop a workaround when testing OpenWrt and coreboot as the issue in diffoscope has now been fixed. [ ]
  • Turn on an rm(1) warning into an info -level message. [ ]
  • Special case the osuosl168 node for running Debian bookworm already. [ ][ ]
  • Use the new non-free-firmware suite on the o168 node. [ ]
In addition, Mattia Rizzolo made the following changes:
  • Ensure that 2nd build has a merged /usr. [ ]
  • Only reconfigure the usrmerge package on Debian bookworm and above. [ ]
  • Fix bc(1) syntax in the computation of the percentage of unreproducible packages in the dashboard. [ ][ ][ ]
  • In the index_suite_ pages, order the package status to be the same order of the menu. [ ]
  • Pass the --distribution parameter to the pbuilder utility. [ ]
Finally, Roland Clobus continued his work on testing Live Debian images. In particular, he extended the maintenance script to warn when workspace directories cannot be deleted. [ ]
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

16 October 2022

kpcyrd: updlockfiles: Manage lockfiles in PKGBUILDs for upstreams that don't ship them

I ve released a new tool to manage lockfiles for Arch Linux packages that can t use a lockfile from the official upstream release. It integrates closely with other Arch Linux tooling like updpkgsums that s already used to pin the content of build inputs in PKGBUILD. To use this, the downstream lockfile becomes an additional source input in the source= array of our PKGBUILD (this is already the case for some packages).
source=("git+https://github.com/vimeo/psalm.git#commit=$ _commit "
        "composer.lock")
You would then add a new function named updlockfiles that can generate new lockfiles and copies them into $outdir, and a prepare function to copy the lockfile in the right place:
 prepare()  
   cd $ pkgname 
   cp ../composer.lock .
 
updlockfiles()  
  cd $ pkgname 
  rm -f composer.lock
  composer update
  cp composer.lock "$ outdir /"
 
To update the package to the latest (compatible) patch level simply run:
updlockfiles
This can also be used in case upstreams lockfile has vulnerable dependencies that you want to patch downstream. For more detailed instructions see the readme.

Thanks This work is currently crowd-funded on github sponsors. I d like to thank @SantiagoTorres, @repi and @rgacogne for their support in particular.

6 June 2022

Reproducible Builds: Reproducible Builds in May 2022

Welcome to the May 2022 report from the Reproducible Builds project. In our reports we outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Repfix paper Zhilei Ren, Shiwei Sun, Jifeng Xuan, Xiaochen Li, Zhide Zhou and He Jiang have published an academic paper titled Automated Patching for Unreproducible Builds:
[..] fixing unreproducible build issues poses a set of challenges [..], among which we consider the localization granularity and the historical knowledge utilization as the most significant ones. To tackle these challenges, we propose a novel approach [called] RepFix that combines tracing-based fine-grained localization with history-based patch generation mechanisms.
The paper (PDF, 3.5MB) uses the Debian mylvmbackup package as an example to show how RepFix can automatically generate patches to make software build reproducibly. As it happens, Reiner Herrmann submitted a patch for the mylvmbackup package which has remained unapplied by the Debian package maintainer for over seven years, thus this paper inadvertently underscores that achieving reproducible builds will require both technical and social solutions.

Python variables Johannes Schauer discovered a fascinating bug where simply naming your Python variable _m led to unreproducible .pyc files. In particular, the types module in Python 3.10 requires the following patch to make it reproducible:
--- a/Lib/types.py
+++ b/Lib/types.py
@@ -37,8 +37,8 @@ _ag = _ag()
 AsyncGeneratorType = type(_ag)
 
 class _C:
-    def _m(self): pass
-MethodType = type(_C()._m)
+    def _b(self): pass
+MethodType = type(_C()._b)
Simply renaming the dummy method from _m to _b was enough to workaround the problem. Johannes bug report first led to a number of improvements in diffoscope to aid in dissecting .pyc files, but upstream identified this as caused by an issue surrounding interned strings and is being tracked in CPython bug #78274.

New SPDX team to incorporate build metadata in Software Bill of Materials SPDX, the open standard for Software Bill of Materials (SBOM), is continuously developed by a number of teams and committees. However, SPDX has welcomed a new addition; a team dedicated to enhancing metadata about software builds, complementing reproducible builds in creating a more secure software supply chain. The SPDX Builds Team has been working throughout May to define the universal primitives shared by all build systems, including the who, what, where and how of builds:
  • Who: the identity of the person or organisation that controls the build infrastructure.
  • What: the inputs and outputs of a given build, combining metadata about the build s configuration with an SBOM describing source code and dependencies.
  • Where: the software packages making up the build system, from build orchestration tools such as Woodpecker CI and Tekton to language-specific tools.
  • How: the invocation of a build, linking metadata of a build to the identity of the person or automation tool that initiated it.
The SPDX Builds Team expects to have a usable data model by September, ready for inclusion in the SPDX 3.0 standard. The team welcomes new contributors, inviting those interested in joining to introduce themselves on the SPDX-Tech mailing list.

Talks at Debian Reunion Hamburg Some of the Reproducible Builds team (Holger Levsen, Mattia Rizzolo, Roland Clobus, Philip Rinn, etc.) met in real life at the Debian Reunion Hamburg (official homepage). There were several informal discussions amongst them, as well as two talks related to reproducible builds. First, Holger Levsen gave a talk on the status of Reproducible Builds for bullseye and bookworm and beyond (WebM, 210MB): Secondly, Roland Clobus gave a talk called Reproducible builds as applied to non-compiler output (WebM, 115MB):

Supply-chain security attacks This was another bumper month for supply-chain attacks in package repositories. Early in the month, Lance R. Vick noticed that the maintainer of the NPM foreach package let their personal email domain expire, so they bought it and now controls foreach on NPM and the 36,826 projects that depend on it . Shortly afterwards, Drew DeVault published a related blog post titled When will we learn? that offers a brief timeline of major incidents in this area and, not uncontroversially, suggests that the correct way to ship packages is with your distribution s package manager .

Bootstrapping Bootstrapping is a process for building software tools progressively from a primitive compiler tool and source language up to a full Linux development environment with GCC, etc. This is important given the amount of trust we put in existing compiler binaries. This month, a bootstrappable mini-kernel was announced. Called boot2now, it comprises a series of compilers in the form of bootable machine images.

Google s new Assured Open Source Software service Google Cloud (the division responsible for the Google Compute Engine) announced a new Assured Open Source Software service. Noting the considerable 650% year-over-year increase in cyberattacks aimed at open source suppliers, the new service claims to enable enterprise and public sector users of open source software to easily incorporate the same OSS packages that Google uses into their own developer workflows . The announcement goes on to enumerate that packages curated by the new service would be:
  • Regularly scanned, analyzed, and fuzz-tested for vulnerabilities.
  • Have corresponding enriched metadata incorporating Container/Artifact Analysis data.
  • Are built with Cloud Build including evidence of verifiable SLSA-compliance
  • Are verifiably signed by Google.
  • Are distributed from an Artifact Registry secured and protected by Google.
(Full announcement)

A retrospective on the Rust programming language Andrew bunnie Huang published a long blog post this month promising a critical retrospective on the Rust programming language. Amongst many acute observations about the evolution of the language s syntax (etc.), the post beings to critique the languages approach to supply chain security ( Rust Has A Limited View of Supply Chain Security ) and reproducibility ( You Can t Reproduce Someone Else s Rust Build ):
There s some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say almost anyone else because this fix would be OS-dependent, so we d be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.
(Full post)

Reproducible Builds IRC meeting The minutes and logs from our May 2022 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 28th June at 15:00 UTC on #reproducible-builds on the OFTC network.

A new tool to improve supply-chain security in Arch Linux kpcyrd published yet another interesting tool related to reproducibility. Writing about the tool in a recent blog post, kpcyrd mentions that although many PKGBUILDs provide authentication in the context of signed Git tags (i.e. the ability to verify the Git tag was signed by one of the two trusted keys ), they do not support pinning, ie. that upstream could create a new signed Git tag with an identical name, and arbitrarily change the source code without the [maintainer] noticing . Conversely, other PKGBUILDs support pinning but not authentication. The new tool, auth-tarball-from-git, fixes both problems, as nearly outlined in kpcyrd s original blog post.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 212, 213 and 214 to Debian unstable. Chris also made the following changes:
  • New features:
    • Add support for extracting vmlinuz Linux kernel images. [ ]
    • Support both python-argcomplete 1.x and 2.x. [ ]
    • Strip sticky etc. from x.deb: sticky Debian binary package [ ]. [ ]
    • Integrate test coverage with GitLab s concept of artifacts. [ ][ ][ ]
  • Bug fixes:
    • Don t mask differences in .zip or .jar central directory extra fields. [ ]
    • Don t show a binary comparison of .zip or .jar files if we have observed at least one nested difference. [ ]
  • Codebase improvements:
    • Substantially update comment for our calls to zipinfo and zipinfo -v. [ ]
    • Use assert_diff in test_zip over calling get_data with a separate assert. [ ]
    • Don t call re.compile and then call .sub on the result; just call re.sub directly. [ ]
    • Clarify the comment around the difference between --usage and --help. [ ]
  • Testsuite improvements:
    • Test --help and --usage. [ ]
    • Test that --help includes the file formats. [ ]
Vagrant Cascadian added an external tool reference xb-tool for GNU Guix [ ] as well as updated the diffoscope package in GNU Guix itself [ ][ ][ ].

Distribution work In Debian, 41 reviews of Debian packages were added, 85 were updated and 13 were removed this month adding to our knowledge about identified issues. A number of issue types have been updated, including adding a new nondeterministic_ordering_in_deprecated_items_collected_by_doxygen toolchain issue [ ] as well as ones for mono_mastersummary_xml_files_inherit_filesystem_ordering [ ], extended_attributes_in_jar_file_created_without_manifest [ ] and apxs_captures_build_path [ ]. Vagrant Cascadian performed a rough check of the reproducibility of core package sets in GNU Guix, and in openSUSE, Bernhard M. Wiedemann posted his usual monthly reproducible builds status report.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducible builds website Chris Lamb updated the main Reproducible Builds website and documentation in a number of small ways, but also prepared and published an interview with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix. [ ][ ][ ][ ] In addition, Tim Jones added a link to the Talos Linux project [ ] and billchenchina fixed a dead link [ ].

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Add support for detecting running kernels that require attention. [ ]
    • Temporarily configure a host to support performing Debian builds for packages that lack .buildinfo files. [ ]
    • Update generated webpages to clarify wishes for feedback. [ ]
    • Update copyright years on various scripts. [ ]
  • Mattia Rizzolo:
    • Provide a facility so that Debian Live image generation can copy a file remotely. [ ][ ][ ][ ]
  • Roland Clobus:
    • Add initial support for testing generated images with OpenQA. [ ]
And finally, as usual, node maintenance was also performed by Holger Levsen [ ][ ].

Misc news On our mailing list this month:

Contact If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

28 May 2022

kpcyrd: auth-tarball-from-git: Verifying tarballs with signed git tags

I noticed there s a common anti-pattern in some PKGBUILDs, the short scripts that are used to build Arch Linux packages. Specifically we re looking at the part that references the source code used when building a package:
source=("git+https://github.com/alacritty/alacritty.git#tag=v$ pkgver ?signed")
validpgpkeys=('4DAA67A9EA8B91FCC15B699C85CDAE3C164BA7B4'
              'A56EF308A9F1256C25ACA3807EA8F8B94622A6A9')
sha256sums=('SKIP')
This does: In contrast consider this PKGBUILD:
source=($pkgname-$pkgver.tar.gz::https://github.com/alacritty/alacritty/archive/refs/tags/v$pkgver.tar.gz)
sha256sums=('e48d4b10762c2707bb17fd8f89bd98f0dcccc450d223cade706fdd9cfaefb308')
Personally - if I had to decide between these two - I d prefer the later because I can always try to authenticate the pinned tarball later on, but it s impossible to know for sure which source code has been used if all I know is something that had a valid signature on it . This set could be infinitely large for all we know! But is there a way to get both? Consider this PKGBUILD:
makedepends=('auth-tarball-from-git')
source=($pkgname-$pkgver.tar.gz::https://github.com/alacritty/alacritty/archive/refs/tags/v$pkgver.tar.gz
        chrisduerr.pgp
        kchibisov.pgp)
sha256sums=('e48d4b10762c2707bb17fd8f89bd98f0dcccc450d223cade706fdd9cfaefb308'
            '19573dc0ba7a2f003377dc49986867f749235ecb45fe15eb923a74b2ab421d74'
            '5b866e6cb791c58cba2e7fc60f647588699b08abc2ad6b18ba82470f0fd3db3b')
prepare()  
  cd "$pkgname-$pkgver"
  auth-tarball-from-git --keyring ../chrisduerr.pgp --keyring ../kchibisov.pgp \
    --tag v$pkgver --prefix $pkgname-$pkgver \
    https://github.com/alacritty/alacritty.git ../$pkgname-$pkgver.tar.gz
 
In this case sha256sums= is the primary line of defense against tampering with build inputs and the git tag is only used to document authorship. For more infos on how this works you can have a look at the auth-tarball-from-git repo, there s also a section about attacks on signed git tags that you should probably know about.

Thanks This work is currently crowd-funded on github sponsors. I d like to thank @SantiagoTorres, @repi and @rgacogne for their support in particular.

5 March 2022

Reproducible Builds: Reproducible Builds in February 2022

Welcome to the February 2022 report from the Reproducible Builds project. In these reports, we try to round-up the important things we and others have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
Jiawen Xiong, Yong Shi, Boyuan Chen, Filipe R. Cogo and Zhen Ming Jiang have published a new paper titled Towards Build Verifiability for Java-based Systems (PDF). The abstract of the paper contains the following:
Various efforts towards build verifiability have been made to C/C++-based systems, yet the techniques for Java-based systems are not systematic and are often specific to a particular build tool (eg. Maven). In this study, we present a systematic approach towards build verifiability on Java-based systems.

GitBOM is a flexible scheme to track the source code used to generate build artifacts via Git-like unique identifiers. Although the project has been active for a while, the community around GitBOM has now started running weekly community meetings.
The paper Chris Lamb and Stefano Zacchiroli is now available in the March/April 2022 issue of IEEE Software. Titled Reproducible Builds: Increasing the Integrity of Software Supply Chains (PDF), the abstract of the paper contains the following:
We first define the problem, and then provide insight into the challenges of making real-world software build in a reproducible manner-this is, when every build generates bit-for-bit identical results. Through the experience of the Reproducible Builds project making the Debian Linux distribution reproducible, we also describe the affinity between reproducibility and quality assurance (QA).

In openSUSE, Bernhard M. Wiedemann posted his monthly reproducible builds status report.
On our mailing list this month, Thomas Schmitt started a thread around the SOURCE_DATE_EPOCH specification related to formats that cannot help embedding potentially timezone-specific timestamp. (Full thread index.)
The Yocto Project is pleased to report that it s core metadata (OpenEmbedded-Core) is now reproducible for all recipes (100% coverage) after issues with newer languages such as Golang were resolved. This was announced in their recent Year in Review publication. It is of particular interest for security updates so that systems can have specific components updated but reducing the risk of other unintended changes and making the sections of the system changing very clear for audit. The project is now also making heavy use of equivalence of build output to determine whether further items in builds need to be rebuilt or whether cached previously built items can be used. As mentioned in the article above, there are now public servers sharing this equivalence information. Reproducibility is key in making this possible and effective to reduce build times/costs/resource usage.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 203, 204, 205 and 206 to Debian unstable, as well as made the following changes to the code itself:
  • Bug fixes:
    • Fix a file(1)-related regression where Debian .changes files that contained non-ASCII text were not identified as such, therefore resulting in seemingly arbitrary packages not actually comparing the nested files themselves. The non-ASCII parts were typically in the Maintainer or in the changelog text. [ ][ ]
    • Fix a regression when comparing directories against non-directories. [ ][ ]
    • If we fail to scan using binwalk, return False from BinwalkFile.recognizes. [ ]
    • If we fail to import binwalk, don t report that we are missing the Python rpm module! [ ]
  • Testsuite improvements:
    • Add a test for recent file(1) issue regarding .changes files. [ ]
    • Use our assert_diff utility where we can within the test_directory.py set of tests. [ ]
    • Don t run our binwalk-related tests as root or fakeroot. The latest version of binwalk has some new security protection against this. [ ]
  • Codebase improvements:
    • Drop the _PATH suffix from module-level globals that are not paths. [ ]
    • Tidy some control flow in Difference._reverse_self. [ ]
    • Don t print a warning to the console regarding NT_GNU_BUILD_ID changes. [ ]
In addition, Mattia Rizzolo updated the Debian packaging to ensure that diffoscope and diffoscope-minimal packages have the same version. [ ]

Website updates There were quite a few changes to the Reproducible Builds website and documentation this month as well, including:
  • Chris Lamb:
    • Considerably rework the Who is involved? page. [ ][ ]
    • Move the contributors.sh Bash/shell script into a Python script. [ ][ ][ ]
  • Daniel Shahaf:
    • Try a different Markdown footnote content syntax to work around a rendering issue. [ ][ ][ ]
  • Holger Levsen:
    • Make a huge number of changes to the Who is involved? page, including pre-populating a large number of contributors who cannot be identified from the metadata of the website itself. [ ][ ][ ][ ][ ]
    • Improve linking to sponsors in sidebar navigation. [ ]
    • drop sponsors paragraph as the navigation is clearer now. [ ]
    • Add Mullvad VPN as a bronze-level sponsor . [ ][ ]
  • Vagrant Cascadian:

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. February s patches included the following:

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Daniel Golle:
    • Update the OpenWrt configuration to not depend on the host LLVM, adding lines to the .config seed to build LLVM for eBPF from source. [ ]
    • Preserve more OpenWrt-related build artifacts. [ ]
  • Holger Levsen:
  • Temporary use a different Git tree when building OpenWrt as our tests had been broken since September 2020. This was reverted after the patch in question was accepted by Paul Spooren into the canonical openwrt.git repository the next day.
    • Various improvements to debugging OpenWrt reproducibility. [ ][ ][ ][ ][ ]
    • Ignore useradd warnings when building packages. [ ]
    • Update the script to powercycle armhf architecture nodes to add a hint to where nodes named virt-*. [ ]
    • Update the node health check to also fix failed logrotate and man-db services. [ ]
  • Mattia Rizzolo:
    • Update the website job after contributors.sh script was rewritten in Python. [ ]
    • Make sure to set the DIFFOSCOPE environment variable when available. [ ]
  • Vagrant Cascadian:
    • Various updates to the diffoscope timeouts. [ ][ ][ ]
Node maintenance was also performed by Holger Levsen [ ] and Vagrant Cascadian [ ].

Finally If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 February 2022

Reproducible Builds: Reproducible Builds in January 2022

Welcome to the January 2022 report from the Reproducible Builds project. In our reports, we try outline the most important things that have been happening in the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
An interesting blog post was published by Paragon Initiative Enterprises about Gossamer, a proposal for securing the PHP software supply-chain. Utilising code-signing and third-party attestations, Gossamer aims to mitigate the risks within the notorious PHP world via publishing attestations to a transparency log. Their post, titled Solving Open Source Supply Chain Security for the PHP Ecosystem goes into some detail regarding the design, scope and implementation of the system.
This month, the Linux Foundation announced SupplyChainSecurityCon, a conference focused on exploring the security threats affecting the software supply chain, sharing best practices and mitigation tactics. The conference is part of the Linux Foundation s Open Source Summit North America and will take place June 21st 24th 2022, both virtually and in Austin, Texas.

Debian There was a significant progress made in the Debian Linux distribution this month, including:

Other distributions kpcyrd reported on Twitter about the release of version 0.2.0 of pacman-bintrans, an experiment with binary transparency for the Arch Linux package manager, pacman. This new version is now able to query rebuilderd to check if a package was independently reproduced.
In the world of openSUSE, however, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 199, 200, 201 and 202 to Debian unstable (that were later backported to Debian bullseye-backports by Mattia Rizzolo), as well as made the following changes to the code itself:
  • New features:
    • First attempt at incremental output support with a timeout. Now passing, for example, --timeout=60 will mean that diffoscope will not recurse into any sub-archives after 60 seconds total execution time has elapsed. Note that this is not a fixed/strict timeout due to implementation issues. [ ][ ]
    • Support both variants of odt2txt, including the one provided by the unoconv package. [ ]
  • Bug fixes:
    • Do not return with a UNIX exit code of 0 if we encounter with a file whose human-readable metadata matches literal file contents. [ ]
    • Don t fail if comparing a nonexistent file with a .pyc file (and add test). [ ][ ]
    • If the debian.deb822 module raises any exception on import, re-raise it as an ImportError. This should fix diffoscope on some Fedora systems. [ ]
    • Even if a Sphinx .inv inventory file is labelled The remainder of this file is compressed using zlib, it might not actually be. In this case, don t traceback and simply return the original content. [ ]
  • Documentation:
    • Improve documentation for the new --timeout option due to a few misconceptions. [ ]
    • Drop reference in the manual page claiming the ability to compare non-existent files on the command-line. (This has not been possible since version 32 which was released in September 2015). [ ]
    • Update X has been modified after NT_GNU_BUILD_ID has been applied messages to, for example, not duplicating the full filename in the diffoscope output. [ ]
  • Codebase improvements:
    • Tidy some control flow. [ ]
    • Correct a recompile typo. [ ]
In addition, Alyssa Ross fixed the comparison of CBFS names that contain spaces [ ], Sergei Trofimovich fixed whitespace for compatibility with version 21.12 of the Black source code reformatter [ ] and Zbigniew J drzejewski-Szmek fixed JSON detection with a new version of file [ ].

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Fr d ric Pierret (fepitre):
    • Add Debian bookworm to package set creation. [ ]
  • Holger Levsen:
    • Install the po4a package where appropriate, as it is needed for the Reproducible Builds website job [ ]. In addition, also run the i18n.sh and contributors.sh scripts [ ].
    • Correct some grammar in Debian live image build output. [ ]
    • Shell monitor improvements:
      • Only show the offline node section if there are offline nodes. [ ]
      • Colorise offline nodes. [ ]
      • Shrink screen usage. [ ][ ][ ]
    • Node health check improvements:
      • Detect if live package builds encounter incomplete snapshots. [ ][ ][ ]
      • Detect if a host is running with today s date (when it should be set artificially in the future). [ ]
    • Use the devscripts package from bullseye-backports on Debian nodes. [ ]
    • Use the Munin monitoring package bullseye-backports on Debian nodes too. [ ]
    • Update New Year handling, needed to be able to detect real and fake dates. [ ][ ]
    • Improve the error message of the script that powercycles the arm64 architecture nodes hosted by Codethink. [ ]
  • Mattia Rizzolo:
    • Use the new --timeout option added in diffoscope version 202. [ ]
  • Roland Clobus:
    • Update the build scripts now that the hooks for live builds are now maintained upstream in the live-build repository. [ ]
    • Show info lines in Jenkins when reproducible hooks have been active. [ ]
    • Use unique folders for the artifacts from each live Debian version. [ ]
  • Vagrant Cascadian:
    • Switch the Debian armhf architecture nodes to use new proxy. [ ]
    • Misc. node maintenance. [ ].

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. In January, we wrote a large number of such patches, including:

And finally If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 December 2021

Reproducible Builds: Reproducible Builds in November 2021

Welcome to the November 2021 report from the Reproducible Builds project. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is therefore to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to our project, please visit our Contribute page on our website.
On November 6th, Vagrant Cascadian presented at this year s edition of the SeaGL conference, giving a talk titled Debugging Reproducible Builds One Day at a Time:
I ll explore how I go about identifying issues to work on, learn more about the specific issues, recreate the problem locally, isolate the potential causes, dissect the problem into identifiable parts, and adapt the packaging and/or source code to fix the issues.
A video recording of the talk is available on archive.org.
Fedora Magazine published a post written by Zbigniew J drzejewski-Szmek about how to Use Diffoscope in packager workflows, specifically around ensuring that new versions of a package do not introduce breaking changes:
In the role of a packager, updating packages is a recurring task. For some projects, a packager is involved in upstream maintenance, or well written release notes make it easy to figure out what changed between the releases. This isn t always the case, for instance with some small project maintained by one or two people somewhere on GitHub, and it can be useful to verify what exactly changed. Diffoscope can help determine the changes between package releases. [ ]

kpcyrd announced the release of rebuilderd version 0.16.3 on our mailing list this month, adding support for builds to generate multiple artifacts at once.
Lastly, we held another IRC meeting on November 30th. As mentioned in previous reports, due to the global events throughout 2020 etc. there will be no in-person summit event this year.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 190, 191, 192, 193 and 194 to Debian:
  • New features:
    • Continue loading a .changes file even if the referenced files do not exist, but include a comment in the returned diff. [ ]
    • Log the reason if we cannot load a Debian .changes file. [ ]
  • Bug fixes:
    • Detect XML files as XML files if file(1) claims if they are XML files or if they are named .xml. (#999438)
    • Don t duplicate file lists at each directory level. (#989192)
    • Don t raise a traceback when comparing nested directories with non-directories. [ ]
    • Re-enable test_android_manifest. [ ]
    • Don t reject Debian .changes files if they contain non-printable characters. [ ]
  • Codebase improvements:
    • Avoid aliasing variables if we aren t going to use them. [ ]
    • Use isinstance over type. [ ]
    • Drop a number of unused imports. [ ]
    • Update a bunch of %-style string interpolations into f-strings or str.format. [ ]
    • When pretty-printing JSON, mark the difference as being reformatted, additionally avoiding including the full path. [ ]
    • Import itertools top-level module directly. [ ]
Chris Lamb also made an update to the command-line client to trydiffoscope, a web-based version of the diffoscope in-depth and content-aware diff utility, specifically only waiting for 2 minutes for try.diffoscope.org to respond in tests. (#998360) In addition Brandon Maier corrected an issue where parts of large diffs were missing from the output [ ], Zbigniew J drzejewski-Szmek fixed some logic in the assert_diff_startswith method [ ] and Mattia Rizzolo updated the packaging metadata to denote that we support both Python 3.9 and 3.10 [ ] as well as a number of warning-related changes[ ][ ]. Vagrant Cascadian also updated the diffoscope package in GNU Guix [ ][ ].

Distribution work In Debian, Roland Clobus updated the wiki page documenting Debian reproducible Live images to mention some new bug reports and also posted an in-depth status update to our mailing list. In addition, 90 reviews of Debian packages were added, 18 were updated and 23 were removed this month adding to our knowledge about identified issues. Chris Lamb identified a new toolchain issue, absolute_path_in_cmake_file_generated_by_meson.
Work has begun on classifying reproducibility issues in packages within the Arch Linux distribution. Similar to the analogous effort within Debian (outlined above), package information is listed in a human-readable packages.yml YAML file and a sibling README.md file shows how to classify packages too. Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE and Vagrant Cascadian updated a link on our website to link to the GNU Guix reproducibility testing overview [ ].

Software development The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: Elsewhere, in software development, Jonas Witschel updated strip-nondeterminism, our tool to remove specific non-deterministic results from a completed build so that it did not fail on JAR archives containing invalid members with a .jar extension [ ]. This change was later uploaded to Debian by Chris Lamb. reprotest is the Reproducible Build s project end-user tool to build the same source code twice in widely different environments and checking whether the binaries produced by the builds have any differences. This month, Mattia Rizzolo overhauled the Debian packaging [ ][ ][ ] and fixed a bug surrounding suffixes in the Debian package version [ ], whilst Stefano Rivera fixed an issue where the package tests were broken after the removal of diffoscope from the package s strict dependencies [ ].

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Document the progress in setting up snapshot.reproducible-builds.org. [ ]
    • Add the packages required for debian-snapshot. [ ]
    • Make the dstat package available on all Debian based systems. [ ]
    • Mark virt32b-armhf and virt64b-armhf as down. [ ]
  • Jochen Sprickerhof:
    • Add SSH authentication key and enable access to the osuosl168-amd64 node. [ ][ ]
  • Mattia Rizzolo:
    • Revert reproducible Debian: mark virt(32 64)b-armhf as down - restored. [ ]
  • Roland Clobus (Debian live image generation):
    • Rename sid internally to unstable until an issue in the snapshot system is resolved. [ ]
    • Extend testing to include Debian bookworm too.. [ ]
    • Automatically create the Jenkins view to display jobs related to building the Live images. [ ]
  • Vagrant Cascadian:
    • Add a Debian package set group for the packages and tools maintained by the Reproducible Builds maintainers themselves. [ ]


If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

6 November 2021

Reproducible Builds: Reproducible Builds in October 2021

Welcome to the October 2021 report from the Reproducible Builds project!
This month Samanta Navarro posted to the oss-security security mailing on a novel category of exploit in the .tar archive format, where a single .tar file contains different contents depending on the tar utility being used. Naturally, this has consequences for reproducible builds as Samanta goes onto reply:

Arch Linux uses libarchive (bsdtar) in its build environment. The default tar program installed is GNU tar. It is possible to create a source distribution which leads to different files seen by the build environment than compared to a careful reviewer and other Linux distributions.
Samanta notes that addressing the tar utilities themselves will not be a sufficient fix:
I have submitted bug reports and patches to some projects but eventually I had to conclude that the problem itself cannot be fixed by these implementations alone. The best choice for these tools would be to only allow archives which are fully compatible to standards but this in turn would render a lot of archives broken.
Reproducible builds, with its twin ideas of reaching consensus on the build outputs as well as precisely recording and describing the build environment, would help address this problem at a higher level.
Codethink announced that they had achieved ISO-26262 ASIL D Tool Certification, a way of determining specific safety standards for software. Codethink used open source tooling to achieve this, but they also leverage:
Reproducibility, repeatability and traceability of builds, drawing heavily on best-practices championed by the Reproducible Builds project.

Elsewhere on the internet, according to a comment on Hacker News, Microsoft are now comparing NPM Javascript packages with their original source repositories:
I got a PR in my repository a few days ago leading back to a team trying to make it easier for packages to be reproducible from source.

Lastly, Martin Monperrus started an interesting thread on our mailing list about Github, specifically that their autogenerated release tarballs are not deterministic . The thread generated a significant number of replies that are worth reading.

Events and presentations

Community news On our mailing list this month:
There were quite a few changes to the Reproducible Builds website and documentation this month as well, including Feng Chai updating some links on our publications page [ ] and marco updated our project metadata around the Bitcoin Core building guide [ ].
Lastly, we ran another productive meeting on IRC during October. A full set of notes from the meeting is available to view.

Distribution work Qubes was heavily featured in the latest edition of Linux Weekly News, and a significant section was dedicated to discussing reproducibility. For example, it was mentioned that the Qubes project has been working on incorporating reproducible builds into its continuous integration (CI) infrastructure . But the LWN article goes on to describe that:
The current goal is to be able to build the Qubes OS Debian templates solely from packages that can be built reproducibly. Templates in Qubes OS are VM images that can be used to start an application qube quickly based on the template. The qube will have read-only access to the root filesystem of the template, so that the same root filesystem can be shared with multiple application qubes. There are official templates for several variants of both Fedora and Debian, as well as community maintained templates for several other distributions.
You can view the whole article on LWN, and Fr d ric also published a lengthy summary about their work on reproducible builds in Qubes as well for those wishing to learn more.
In Debian this month, 133 reviews of Debian packages were added, 81 were updated and 24 were removed this month, adding to Debian s ever-growing knowledge about identified issues. A number of issues were categorised and added by Chris Lamb and Vagrant Cascadian too [ ][ ][ ]. In addition, work on alternative snapshot service has made progress by Fr d ric Pierret and Holger Levsen this month, including moving from the existing host (snapshot.notset.fr) to snapshot.reproducible-builds.org (more info) thanks to OSUOSL for the machine and hosting and Debian for the disks.
Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 186, 187, 188 and 189 to Debian
  • New features:
    • Add support for Python Sphinx inventory files (usually named objects.inv on-disk). [ ]
    • Add support for comparing .pyc files. Thanks to Sergei Trofimovich for the inspiration. [ ]
    • Try some alternative suffixes (e.g. .py) to support distributions that strip or retain them. [ ][ ]
  • Bug fixes:
    • Fix Python decompilation tests under Python 3.10+ [ ] and for Python 3.7 [ ].
    • Don t raise a traceback if we cannot unmarshal Python bytecode. This is in order to support Python 3.7 failing to load .pyc files generated with newer versions of Python. [ ]
    • Skip Python bytecode testing where we do not have an expected diff. [ ]
  • Codebase improvements:
    • Use our file_version_is_lt utility instead of accepting both versions of uImage expected diff. [ ]
    • Split out a custom call to assert_diff for a .startswith equivalent. [ ]
    • Use skipif instead of manual conditionals in some tests. [ ]
In addition, Jelle van der Waa added external tool references for Arch Linux for ocamlobjinfo, openssl and ffmpeg [ ][ ][ ] and added Arch Linux as a Continuous Integration (CI) test target. [ ] and Vagrant Cascadian updated the testsuite to skip Python bytecode comparisons when file(1) is older than 5.39. [ ] as well as added external tool references for the Guix distribution for dumppdf and ppudump. [ ][ ]. Vagrant Cascadian also updated the diffoscope package in GNU Guix [ ][ ]. Lastly, Guangyuan Yang updated the FreeBSD package name on the website [ ], Mattia Rizzolo made a change to override a new Lintian warning due to the new test files [ ], Roland Clobus added support to detect and log if the GNU_BUILD_ID field in an ELF binary been modified [ ], Sandro J ckel updated a number of helpful links on the website [ ] and Sergei Trofimovich made the uImage test output support file() version 5.41 [ ].

reprotest reprotest is the Reproducible Build s project end-user tool to build same source code twice in widely differing environments, checking the binaries produced by the builds for any differences. This month, reprotest version 0.7.18 was uploaded to Debian unstable by Holger Levsen, which also included a change by Holger to clarify that Python 3.9 is used nowadays [ ], but it also included two changes by Vasyl Gello to implement realistic CPU architecture shuffling [ ] and to log the selected variations when the verbosity is configured at a sufficiently high level [ ]. Finally, Vagrant Cascadian updated reprotest to version 0.7.18 in GNU Guix.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix unreproducible packages. We try to send all of our patches upstream where appropriate. We authored a large number of such patches this month, including:

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Debian-related changes:
      • Incorporate a fix from bremner into builtin-pho related to binary-NMUs. [ ]
      • Keep bullseye environments around longer, in an attempt to fix a Jenkins issue. [ ]
      • Improve the documentation of buildinfos.debian.net. [ ]
      • Improve documentation for the builtin-pho setup. [ ][ ]
    • OpenWrt-related changes:
      • Also use -j1 for better debugging. [ ]
      • Document that that Python 3.x is now used. [ ]
      • Enable further debugging for the toolchain build. [ ]
    • New snapshot.reproducible-builds.org service:
      • Actually add new node. [ ][ ]
      • Install xfsprogs on snapshot.reproducible-builds.org. [ ]
      • Create account for fpierret on new node. [ ]
      • Run node_health_check job on new node too. [ ]
  • Mattia Rizzolo:
    • Debian-related changes:
      • Handle schroot errors when invoking diffoscope instead of masking them. [ ][ ]
      • Declare and define some variables separately to avoid masking the subshell return code. [ ]
      • Fix variable name. [ ]
      • Improve log reporting. [ ]
      • Execute apt-get update with the -q argument to get more decent logs. [ ]
      • Set the Debian HTTP mirror and proxy for snapshot.reproducible-builds.org. [ ]
      • Install the libarchive-tools package (instead of bsdtar) when updating Jenkins nodes. [ ]
    • Be stricter about errors when starting the node agent [ ] and don t overwrite NODE_NAME so that we can expect Jenkins to properly set for us [ ].
    • Explicitly warn if the NODE_NAME is not a fully-qualified domain name (FQDN). [ ]
    • Document whether a node runs in the future. [ ]
    • Disable postgresql_autodoc as it not available in bullseye. [ ]
    • Don t be so eager when deleting schroot internals, call to schroot -e to terminate the schroots instead. [ ]
    • Only consider schroot underlays for deletion that are over a month old. [ ][ ]
    • Only try to unmount /proc if it s actually mounted. [ ]
    • Move the db_backup task to its own Jenkins job. [ ]
Lastly, Vasyl Gello added usage information to the reproducible_build.sh script [ ].

Contributing If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 September 2021

Reproducible Builds: Reproducible Builds in August 2021

Welcome to the latest report from the Reproducible Builds project. In this post, we round up the important things that happened in the world of reproducible builds in August 2021. As always, if you are interested in contributing to the project, please visit the Contribute page on our website.
There were a large number of talks related to reproducible builds at DebConf21 this year, the 21st annual conference of the Debian Linux distribution (full schedule):
PackagingCon (@PackagingCon) is new conference for developers of package management software as well as their related communities and stakeholders. The virtual event, which is scheduled to take place on the 9th and 10th November 2021, has a mission is to bring different ecosystems together: from Python s pip to Rust s cargo to Julia s Pkg, from Debian apt over Nix to conda and mamba, and from vcpkg to Spack we hope to have many different approaches to package management at the conference . A number of people from reproducible builds community are planning on attending this new conference, and some may even present. Tickets start at $20 USD.
As reported in our May report, the president of the United States signed an executive order outlining policies aimed to improve the cybersecurity in the US. The executive order comes after a number of highly-publicised security problems such as a ransomware attack that affected an oil pipeline between Texas and New York and the SolarWinds hack that affected a large number of US federal agencies. As a followup this month, however, a detailed fact sheet was released announcing a number large-scale initiatives and that will undoubtedly be related to software supply chain security and, as a result, reproducible builds.
Lastly, We ran another productive meeting on IRC in August (original announcement) which ran for just short of two hours. A full set of notes from the meeting is available.

Software development kpcyrd announced an interesting new project this month called I probably didn t backdoor this which is an attempt to be:
a practical attempt at shipping a program and having reasonably solid evidence there s probably no backdoor. All source code is annotated and there are instructions explaining how to use reproducible builds to rebuild the artifacts distributed in this repository from source. The idea is shifting the burden of proof from you need to prove there s a backdoor to we need to prove there s probably no backdoor . This repository is less about code (we re going to try to keep code at a minimum actually) and instead contains technical writing that explains why these controls are effective and how to verify them. You are very welcome to adopt the techniques used here in your projects. ( )
As the project s README goes on the mention: the techniques used to rebuild the binary artifacts are only possible because the builds for this project are reproducible . This was also announced on our mailing list this month in a thread titled i-probably-didnt-backdoor-this: Reproducible Builds for upstreams. kpcyrd also wrote a detailed blog post about the problems surrounding Linux distributions (such as Alpine and Arch Linux) that distribute compiled Python bytecode in the form of .pyc files generated during the build process.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made a number of changes, including releasing version 180), version 181) and version 182) as well as the following changes:
  • New features:
    • Add support for extracting the signing block from Android APKs. [ ]
    • If we specify a suffix for a temporary file or directory within the code, ensure it starts with an underscore (ie. _ ) to make the generated filenames more human-readable. [ ]
    • Don t include short GCC lines that differ on a single prefix byte either. These are distracting, not very useful and are simply the strings(1) command s idea of the build ID, which is displayed elsewhere in the diff. [ ][ ]
    • Don t include specific .debug-like lines in the ELF-related output, as it is invariably a duplicate of the debug ID that exists better in the readelf(1) differences for this file. [ ]
  • Bug fixes:
    • Add a special case to SquashFS image extraction to not fail if we aren t the superuser. [ ]
    • Only use java -jar /path/to/apksigner.jar if we have an apksigner.jar as newer versions of apksigner in Debian use a shell wrapper script which will be rejected if passed directly to the JVM. [ ]
    • Reduce the maximum line length for calculating Wagner-Fischer, improving the speed of output generation a lot. [ ]
    • Don t require apksigner in order to compare .apk files using apktool. [ ]
    • Update calls (and tests) for the new version of odt2txt. [ ]
  • Output improvements:
    • Mention in the output if the apksigner tool is missing. [ ]
    • Profile diffoscope.diff.linediff and specialize. [ ][ ]
  • Logging improvements:
    • Format debug-level messages related to ELF sections using the diffoscope.utils.format_class. [ ]
    • Print the size of generated reports in the logs (if possible). [ ]
    • Include profiling information in --debug output if --profile is not set. [ ]
  • Codebase improvements:
    • Clarify a comment about the HUGE_TOOLS Python dictionary. [ ]
    • We can pass -f to apktool to avoid creating a strangely-named subdirectory. [ ]
    • Drop an unused File import. [ ]
    • Update the supported & minimum version of Black. [ ]
    • We don t use the logging variable in a specific place, so alias it to an underscore (ie. _ ) instead. [ ]
    • Update some various copyright years. [ ]
    • Clarify a comment. [ ]
  • Test improvements:
    • Update a test to check specific contents of SquashFS listings, otherwise it fails depending on the test systems user ID to username passwd(5) mapping. [ ]
    • Assign seen and expected values to local variables to improve contextual information in failed tests. [ ]
    • Don t print an orphan newline when the source code formatting test passes. [ ]

In addition Santiago Torres Arias added support for Squashfs version 4.5 [ ] and Felix C. Stegerman suggested a number of small improvements to the output of the new APK signing block [ ]. Lastly, Chris Lamb uploaded python-libarchive-c version 3.1-1 to Debian experimental for the new 3.x branch python-libarchive-c is used by diffoscope.

Distribution work In Debian, 68 reviews of packages were added, 33 were updated and 10 were removed this month, adding to our knowledge about identified issues. Two new issue types have been identified too: nondeterministic_ordering_in_todo_items_collected_by_doxygen and kodi_package_captures_build_path_in_source_filename_hash. kpcyrd published another monthly report on their work on reproducible builds within the Alpine and Arch Linux distributions, specifically mentioning rebuilderd, one of the components powering reproducible.archlinux.org. The report also touches on binary transparency, an important component for supply chain security. The @GuixHPC account on Twitter posted an infographic on what fraction of GNU Guix packages are bit-for-bit reproducible: Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: Elsewhere, it was discovered that when supporting various new language features and APIs for Android apps, the resulting APK files that are generated now vary wildly from build to build (example diffoscope output). Happily, it appears that a patch has been committed to the relevant source tree. This was also discussed on our mailing list this month in a thread titled Android desugaring and reproducible builds started by Marcus Hoffmann.

Website and documentation There were quite a few changes to the Reproducible Builds website and documentation this month, including:
  • Felix C. Stegerman:
    • Update the website self-build process to not use the buster-backports suite now that Debian Bullseye is the stable release. [ ]
  • Holger Levsen:
    • Add a new page documenting various package rebuilder solutions. [ ]
    • Add some historical talks and slides from DebConf20. [ ][ ]
    • Various improvements to the history page. [ ][ ][ ]
    • Rename the Comparison protocol documentation category to Verification . [ ]
    • Update links to F-Droid documentation. [ ]
  • Ian Muchina:
    • Increase the font size of titles and de-emphasize event details on the talk page. [ ]
    • Rename the README file to README.md to improve the user experience when browsing the Git repository in a web browser. [ ]
  • Mattia Rizzolo:
    • Drop a position:fixed CSS statement that is negatively affecting with some width settings. [ ]
    • Fix the sizing of the elements inside the side navigation bar. [ ]
    • Show gold level sponsors and above in the sidebar. [ ]
    • Updated the documentation within reprotest to mention how ldconfig conflicts with the kernel variation. [ ]
  • Roland Clobus:
    • Added a ticket number for the issue with the live Cinnamon image and diffoscope. [ ]

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Debian-related changes:
      • Make a large number of changes to support the new Debian bookworm release, including adding it to the dashboard [ ], start scheduling tests [ ], adding suitable Apache redirects [ ] etc. [ ][ ][ ][ ][ ]
      • Make the first build use LANG=C.UTF-8 to match the official Debian build servers. [ ]
      • Only test Debian Live images once a week. [ ]
      • Upgrade all nodes to use Debian Bullseye [ ] [ ]
      • Update README documentation for the Debian Bullseye release. [ ]
    • Other changes:
      • Only include rsync output if the $DEBUG variable is enabled. [ ]
      • Don t try to install mock, a tool used to build Fedora packages some time ago. [ ]
      • Drop an unused function. [ ]
      • Various documentation improvements. [ ][ ]
      • Improve the node health check to detect zombie jobs. [ ]
  • Jessica Clarke (FreeBSD-related changes):
    • Update the location and branch name for the main FreeBSD Git repository. [ ]
    • Correctly ignore the source tarball when comparing build results. [ ]
    • Drop an outdated version number from the documentation. [ ]
  • Mattia Rizzolo:
    • Block F-Droid jobs from running whilst the setup is running. [ ]
    • Enable debugging for the rsync job related to Debian Live images. [ ]
    • Pass BUILD_TAG and BUILD_URL environment for the Debian Live jobs. [ ]
    • Refactor the master_wrapper script to use a Bash array for the parameters. [ ]
    • Prefer YAML s safe_load() function over the unsafe variant. [ ]
    • Use the correct variable in the Apache config to match possible existing files on disk. [ ]
    • Stop issuing HTTP 301 redirects for things that not actually permanent. [ ]
  • Roland Clobus (Debian live image generation):
    • Increase the diffoscope timeout from 120 to 240 minutes; the Cinnamon image should now be able to finish. [ ]
    • Use the new snapshot service. [ ]
    • Make a number of improvements to artifact handling, such as moving the artifacts to the Jenkins host [ ] and correctly cleaning them up at the right time. [ ][ ][ ]
    • Where possible, link to the Jenkins build URL that created the artifacts. [ ][ ]
    • Only allow only one job to run at the same time. [ ]
  • Vagrant Cascadian:
    • Temporarily disable armhf nodes for DebConf21. [ ][ ]

Lastly, if you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can get in touch with us via:

5 October 2020

Reproducible Builds: Reproducible Builds in September 2020

Welcome to the September 2020 report from the Reproducible Builds project. In our monthly reports, we attempt to summarise the things that we have been up to over the past month, but if you are interested in contributing to the project, please visit our main website. This month, the Reproducible Builds project was pleased to announce a donation from Amateur Radio Digital Communications (ARDC) in support of its goals. ARDC s contribution will propel the Reproducible Builds project s efforts in ensuring the future health, security and sustainability of our increasingly digital society. Amateur Radio Digital Communications (ARDC) is a non-profit which was formed to further research and experimentation with digital communications using radio, with a goal of advancing the state of the art of amateur radio and to educate radio operators in these techniques. You can view the full announcement as well as more information about ARDC on their website.
In August s report, we announced that Jennifer Helsby (redshiftzero) launched a new reproduciblewheels.com website to address the lack of reproducibility of Python wheels . This month, Kushal Das posted a brief follow-up to provide an update on reproducible sources as well. The Threema privacy and security-oriented messaging application announced that within the next months , their apps will become fully open source, supporting reproducible builds :
This is to say that anyone will be able to independently review Threema s security and verify that the published source code corresponds to the downloaded app.
You can view the full announcement on Threema s website.

Events Sadly, due to the unprecedented events in 2020, there will be no in-person Reproducible Builds event this year. However, the Reproducible Builds project intends to resume meeting regularly on IRC, starting on Monday, October 12th at 18:00 UTC (full announcement). The cadence of these meetings will probably be every two weeks, although this will be discussed and decided on at the first meeting. (An editable agenda is available.) On 18th September, Bernhard M. Wiedemann gave a presentation in German titled Wie reproducible builds Software sicherer machen ( How reproducible builds make software more secure ) at the Internet Security Digital Days 2020 conference. (View video.) On Saturday 10th October, Morten Linderud will give a talk at Arch Conf Online 2020 on The State of Reproducible Builds in the Arch Linux distribution:
The previous year has seen great progress in Arch Linux to get reproducible builds in the hands of the users and developers. In this talk we will explore the current tooling that allows users to reproduce packages, the rebuilder software that has been written to check packages and the current issues in this space.
During the Reproducible Builds summit in Marrakesh, GNU Guix, NixOS and Debian were able to produce a bit-for-bit identical binary when building GNU Mes, despite using three different major versions of GCC. Since the summit, additional work resulted in a bit-for-bit identical Mes binary using tcc and this month, a fuller update was posted by the individuals involved.

Development work In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update.

Debian Chris Lamb uploaded a number of Debian packages to address reproducibility issues that he had previously provided patches for, including cfingerd (#831021), grap (#870573), splint (#924003) & schroot (#902804) Last month, an issue was identified where a large number of Debian .buildinfo build certificates had been tainted on the official Debian build servers, as these environments had files underneath the /usr/local/sbin directory to prevent the execution of system services during package builds. However, this month, Aurelien Jarno and Wouter Verhelst fixed this issue in varying ways, resulting in a special policy-rcd-declarative-deny-all package. Building on Chris Lamb s previous work on reproducible builds for Debian .ISO images, Roland Clobus announced his work in progress on making the Debian Live images reproducible. [ ] Lucas Nussbaum performed an archive-wide rebuild of packages to test enabling the reproducible=+fixfilepath Debian build flag by default. Enabling the fixfilepath feature will likely fix reproducibility issues in an estimated 500-700 packages. The test revealed only 33 packages (out of 30,000 in the archive) that fail to build with fixfilepath. Many of those will be fixed when the default LLVM/Clang version is upgraded. 79 reviews of Debian packages were added, 23 were updated and 17 were removed this month adding to our knowledge about identified issues. Chris Lamb added and categorised a number of new issue types, including packages that captures their build path via quicktest.h and absolute build directories in documentation generated by Doxygen , etc. Lastly, Lukas Puehringer s uploaded a new version of the in-toto to Debian which was sponsored by Holger Levsen. [ ]

diffoscope diffoscope is our in-depth and content-aware diff utility that can not only locate and diagnose reproducibility issues, it provides human-readable diffs of all kinds too. In September, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions 159 and 160 to Debian:
  • New features:
    • Show ordering differences only in strings(1) output by applying the ordering check to all differences across the codebase. [ ]
  • Bug fixes:
    • Mark some PGP tests that they require pgpdump, and check that the associated binary is actually installed before attempting to run it. (#969753)
    • Don t raise exceptions when cleaning up after guestfs cleanup failure. [ ]
    • Ensure we check FALLBACK_FILE_EXTENSION_SUFFIX, otherwise we run pgpdump against all files that are recognised by file(1) as data. [ ]
  • Codebase improvements:
    • Add some documentation for the EXTERNAL_TOOLS dictionary. [ ]
    • Abstract out a variable we use a couple of times. [ ]
  • diffoscope.org website improvements:
    • Make the (long) demonstration GIF less prominent on the page. [ ]
In addition, Paul Spooren added support for automatically deploying Docker images. [ ]

Website and documentation This month, a number of updates to the main Reproducible Builds website and related documentation. Chris Lamb made the following changes: In addition, Holger Levsen re-added the documentation link to the top-level navigation [ ] and documented that the jekyll-polyglot package is required [ ]. Lastly, diffoscope.org and reproducible-builds.org were transferred to Software Freedom Conservancy. Many thanks to Brett Smith from Conservancy, J r my Bobbio (lunar) and Holger Levsen for their help with transferring and to Mattia Rizzolo for initiating this.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of these patches, including: Bernhard M. Wiedemann also reported issues in git2-rs, pyftpdlib, python-nbclient, python-pyzmq & python-sidpy.

Testing framework The Reproducible Builds project operates a Jenkins-based testing framework to power tests.reproducible-builds.org. This month, Holger Levsen made the following changes:
  • Debian:
    • Shorten the subject of nodes have gone offline notification emails. [ ]
    • Also track bugs that have been usertagged with usrmerge. [ ]
    • Drop abort-related codepaths as that functionality has been removed from Jenkins. [ ]
    • Update the frequency we update base images and status pages. [ ][ ][ ][ ]
  • Status summary view page:
    • Add support for monitoring systemctl status [ ] and the number of diffoscope processes [ ].
    • Show the total number of nodes [ ] and colourise critical disk space situations [ ].
    • Improve the visuals with respect to vertical space. [ ][ ]
  • Debian rebuilder prototype:
    • Resume building random packages again [ ] and update the frequency that packages are rebuilt. [ ][ ]
    • Use --no-respect-build-path parameter until sbuild 0.81 is available. [ ]
    • Treat the inability to locate some packages as a debrebuild problem, and not as a issue with the rebuilder itself. [ ]
  • Arch Linux:
    • Update various components to be compatible with Arch Linux s move to the xz compression format. [ ][ ][ ]
    • Allow scheduling of old packages to catch up on the backlog. [ ][ ][ ]
    • Improve formatting on the summary page. [ ][ ]
    • Update HTML pages once every hour, not every 30 minutes. [ ]
    • Use the Ubuntu (!) GPG keyserver to validate packages. [ ]
  • System health checks:
    • Highlight important bad conditions in colour. [ ][ ]
    • Add support for detecting more problems, including Jenkins shutdown issues [ ], failure to upgrade Arch Linux packages [ ], kernels with wrong permissions [ ], etc.
  • Misc:
    • Delete old schroot sessions after 2 days, not 3. [ ]
    • Use sudo to cleanup diffoscope schroot sessions. [ ]
In addition, stefan0xC fixed a query for unknown results in the handling of Arch Linux packages [ ] and Mattia Rizzolo updated the template that notifies maintainers by email of their newly-unreproducible packages to ensure that it did not get caught in junk/spam folders [ ]. Finally, build node maintenance was performed by Holger Levsen [ ][ ][ ][ ], Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

8 August 2020

Reproducible Builds: Reproducible Builds in July 2020

Welcome to the July 2020 report from the Reproducible Builds project. In these monthly reports, we round-up the things that we have been up to over the past month. As a brief refresher, the motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced from the original free software source code to the pre-compiled binaries we install on our systems. (If you re interested in contributing to the project, please visit our main website.)

General news At the upcoming DebConf20 conference (now being held online), Holger Levsen will present a talk on Thursday 27th August about Reproducing Bullseye in practice , focusing on independently verifying that the binaries distributed from ftp.debian.org were made from their claimed sources. Tavis Ormandy published a blog post making the provocative claim that You don t need reproducible builds , asserting elsewhere that the many attacks that have been extensively reported in our previous reports are fantasy threat models . A number of rebuttals have been made, including one from long-time contributor Reproducible Builds contributor Bernhard Wiedemann. On our mailing list this month, Debian Developer Graham Inggs posted to our list asking for ideas why the openorienteering-mapper Debian package was failing to build on the Reproducible Builds testing framework. Chris Lamb remarked from the build logs that the package may be missing a build dependency, although Graham then used our own diffoscope tool to show that the resulting package remains unchanged with or without it. Later, Nico Tyni noticed that the build failure may be due to the relationship between the FILE C preprocessor macro and the -ffile-prefix-map GCC flag. An issue in Zephyr, a small-footprint kernel designed for use on resource-constrained systems, around .a library files not being reproducible was closed after it was noticed that a key part of their toolchain was updated that now calls --enable-deterministic-archives by default. Reproducible Builds developer kpcyrd commented on a pull request against the libsodium cryptographic library wrapper for Rust, arguing against the testing of CPU features at compile-time. He noted that:
I ve accidentally shipped broken updates to users in the past because the build system was feature-tested and the final binary assumed the instructions would be present without further runtime checks
David Kleuker also asked a question on our mailing list about using SOURCE_DATE_EPOCH with the install(1) tool from GNU coreutils. When comparing two installed packages he noticed that the filesystem birth times differed between them. Chris Lamb replied, realising that this was actually a consequence of using an outdated version of diffoscope and that a fix was in diffoscope version 146 released in May 2020. Later in July, John Scott posted asking for clarification regarding on the Javascript files on our website to add metadata for LibreJS, the browser extension that blocks non-free Javascript scripts from executing. Chris Lamb investigated the issue and realised that we could drop a number of unused Javascript files [ ][ ][ ] and added unminified versions of Bootstrap and jQuery [ ].

Development work

Website On our website this month, Chris Lamb updated the main Reproducible Builds website and documentation to drop a number of unused Javascript files [ ][ ][ ] and added unminified versions of Bootstrap and jQuery [ ]. He also fixed a number of broken URLs [ ][ ]. Gonzalo Bulnes Guilpain made a large number of grammatical improvements [ ][ ][ ][ ][ ] as well as some misspellings, case and whitespace changes too [ ][ ][ ]. Lastly, Holger Levsen updated the README file [ ], marked the Alpine Linux continuous integration tests as currently disabled [ ] and linked the Arch Linux Reproducible Status page from our projects page [ ].

diffoscope diffoscope is our in-depth and content-aware diff utility that can not only locate and diagnose reproducibility issues, it provides human-readable diffs of all kinds. In July, Chris Lamb made the following changes to diffoscope, including releasing versions 150, 151, 152, 153 & 154:
  • New features:
    • Add support for flash-optimised F2FS filesystems. (#207)
    • Don t require zipnote(1) to determine differences in a .zip file as we can use libarchive. [ ]
    • Allow --profile as a synonym for --profile=-, ie. write profiling data to standard output. [ ]
    • Increase the minimum length of the output of strings(1) to eight characters to avoid unnecessary diff noise. [ ]
    • Drop some legacy argument styles: --exclude-directory-metadata and --no-exclude-directory-metadata have been replaced with --exclude-directory-metadata= yes,no . [ ]
  • Bug fixes:
    • Pass the absolute path when extracting members from SquashFS images as we run the command with working directory in a temporary directory. (#189)
    • Correct adding a comment when we cannot extract a filesystem due to missing libguestfs module. [ ]
    • Don t crash when listing entries in archives if they don t have a listed size such as hardlinks in ISO images. (#188)
  • Output improvements:
    • Strip off the file offset prefix from xxd(1) and show bytes in groups of 4. [ ]
    • Don t emit javap not found in path if it is available in the path but it did not result in an actual difference. [ ]
    • Fix ... not available in path messages when looking for Java decompilers that used the Python class name instead of the command. [ ]
  • Logging improvements:
    • Add a bit more debugging info when launching libguestfs. [ ]
    • Reduce the --debug log noise by truncating the has_some_content messages. [ ]
    • Fix the compare_files log message when the file does not have a literal name. [ ]
  • Codebase improvements:
    • Rewrite and rename exit_if_paths_do_not_exist to not check files multiple times. [ ][ ]
    • Add an add_comment helper method; don t mess with our internal list directly. [ ]
    • Replace some simple usages of str.format with Python f-strings [ ] and make it easier to navigate to the main.py entry point [ ].
    • In the RData comparator, always explicitly return None in the failure case as we return a non-None value in the success one. [ ]
    • Tidy some imports [ ][ ][ ] and don t alias a variable when we do not use it. [ ]
    • Clarify the use of a separate NullChanges quasi-file to represent missing data in the Debian package comparator [ ] and clarify use of a null diff in order to remember an exit code. [ ]
  • Other changes:
    • Profile the launch of libguestfs filesystems. [ ]
    • Clarify and correct our contributing info. [ ][ ][ ][ ][ ][ ]
Jean-Romain Garnier also made the following changes:
  • Allow passing a file with a list of arguments via diffoscope @args.txt. (!62)
  • Improve the output of side-by-side diffs by detecting added lines better. (!64)
  • Remove offsets before instructions in objdump [ ][ ] and remove raw instructions from ELF tests [ ].

Other tools strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. In July, Chris Lamb ensured that we did not install the internal handler documentation generated from Perl POD documents [ ] and fixed a trivial typo [ ]. Marc Herbert added a --verbose-level warning when the Archive::Cpio Perl module is missing. (!6) reprotest is our end-user tool to build same source code twice in widely differing environments and then checks the binaries produced by each build for any differences. This month, Vagrant Cascadian made a number of changes to support diffoscope version 153 which had removed the (deprecated) --exclude-directory-metadata and --no-exclude-directory-metadata command-line arguments, and updated the testing configuration to also test under Python version 3.8 [ ].

Distributions

Debian In June 2020, Timo R hling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of hundreds of packages that use the CMake build system. This month however, Niels Thykier uploaded debhelper version 13.2 that passes the -DCMAKE_SKIP_RPATH=ON and -DBUILD_RPATH_USE_ORIGIN=ON arguments to CMake when using the (currently-experimental) Debhelper compatibility level 14. According to Niels, this change:
should fix some reproducibility issues, but may cause breakage if packages run binaries directly from the build directory.
34 reviews of Debian packages were added, 14 were updated and 20 were removed this month adding to our knowledge about identified issues. Chris Lamb added and categorised the nondeterministic_order_of_debhelper_snippets_added_by_dh_fortran_mod [ ] and gem2deb_install_mkmf_log [ ] toolchain issues. Lastly, Holger Levsen filed two more wishlist bugs against the debrebuild Debian package rebuilder tool [ ][ ].

openSUSE In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update. Bernhard also published the results of performing 12,235 verification builds of packages from openSUSE Leap version 15.2 and, as a result, created three pull requests against the openSUSE Build Result Compare Script [ ][ ][ ].

Other distributions In Arch Linux, there was a mass rebuild of old packages in an attempt to make them reproducible. This was performed because building with a previous release of the pacman package manager caused file ordering and size calculation issues when using the btrfs filesystem. A system was also implemented for Arch Linux packagers to receive notifications if/when their package becomes unreproducible, and packagers now have access to a dashboard where they can all see all their unreproducible packages (more info). Paul Spooren sent two versions of a patch for the OpenWrt embedded distribution for adding a build system revision to the packages manifest so that all external feeds can be rebuilt and verified. [ ][ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of these patches, including: Vagrant Cascadian also reported two issues, the first regarding a regression in u-boot boot loader reproducibility for a particular target [ ] and a non-deterministic segmentation fault in the guile-ssh test suite [ ]. Lastly, Jelle van der Waa filed a bug against the MeiliSearch search API to report that it embeds the current build date.

Testing framework We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, Holger Levsen made the following changes:
  • Debian-related changes:
    • Tweak the rescheduling of various architecture and suite combinations. [ ][ ]
    • Fix links for 404 and not for us icons. (#959363)
    • Further work on a rebuilder prototype, for example correctly processing the sbuild exit code. [ ][ ]
    • Update the sudo configuration file to allow the node health job to work correctly. [ ]
    • Add php-horde packages back to the pkg-php-pear package set for the bullseye distribution. [ ]
    • Update the version of debrebuild. [ ]
  • System health check development:
    • Add checks for broken SSH [ ], logrotate [ ], pbuilder [ ], NetBSD [ ], unkillable processes [ ], unresponsive nodes [ ][ ][ ][ ], proxy connection failures [ ], too many installed kernels [ ], etc.
    • Automatically fix some failed systemd units. [ ]
    • Add notes explaining all the issues that hosts are experiencing [ ] and handle zipped job log files correctly [ ].
    • Separate nodes which have been automatically marked as down [ ] and show status icons for jobs with issues [ ].
  • Misc:
    • Disable all Alpine Linux jobs until they are or Alpine is fixed. [ ]
    • Perform some general upkeep of build nodes hosted by OSUOSL. [ ][ ][ ][ ]
In addition, Mattia Rizzolo updated the init_node script to suggest using sudo instead of explicit logout and logins [ ][ ] and the usual build node maintenance was performed by Holger Levsen [ ][ ][ ][ ][ ][ ], Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ][ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

4 June 2020

Reproducible Builds: Reproducible Builds in May 2020

Welcome to the May 2020 report from the Reproducible Builds project. One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. Nonetheless, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes. In these reports we outline the most important things that we and the rest of the community have been up to over the past month.

News The Corona-Warn app that helps trace infection chains of SARS-CoV-2/COVID-19 in Germany had a feature request filed against it that it build reproducibly. A number of academics from Cornell University have published a paper titled Backstabber s Knife Collection which reviews various open source software supply chain attacks:
Recent years saw a number of supply chain attacks that leverage the increasing use of open source during software development, which is facilitated by dependency managers that automatically resolve, download and install hundreds of open source packages throughout the software life cycle.
In related news, the LineageOS Android distribution announced that a hacker had access to the infrastructure of their servers after exploiting an unpatched vulnerability. Marcin Jachymiak of the Sia decentralised cloud storage platform posted on their blog that their siac and siad utilities can now be built reproducibly:
This means that anyone can recreate the same binaries produced from our official release process. Now anyone can verify that the release binaries were created using the source code we say they were created from. No single person or computer needs to be trusted when producing the binaries now, which greatly reduces the attack surface for Sia users.
Synchronicity is a distributed build system for Rust build artifacts which have been published to crates.io. The goal of Synchronicity is to provide a distributed binary transparency system which is independent of any central operator. The Comparison of Linux distributions article on Wikipedia now features a Reproducible Builds column indicating whether distributions approach and progress towards achieving reproducible builds.

Distribution work In Debian this month: In Alpine Linux, an issue was filed and closed regarding the reproducibility of .apk packages. Allan McRae of the ArchLinux project posted their third Reproducible builds progress report to the arch-dev-public mailing list which includes the following call for help:
We also need help to investigate and fix the packages that fail to reproduce that we have not investigated as of yet.
In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update.

Software development

diffoscope Chris Lamb made the changes listed below to diffoscope, our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. He also prepared and uploaded versions 142, 143, 144, 145 and 146 to Debian, PyPI, etc.
  • Comparison improvements:
    • Improve fuzzy matching of JSON files as file now supports recognising JSON data. (#106)
    • Refactor .changes and .buildinfo handling to show all details (including the GnuPG header and footer components) even when referenced files are not present. (#122)
    • Use our BuildinfoFile comparator (etc.) regardless of whether the associated files (such as the orig.tar.gz and the .deb) are present. [ ]
    • Include GnuPG signature data when comparing .buildinfo, .changes, etc. [ ]
    • Add support for printing Android APK signatures via apksigner(1). (#121)
    • Identify iOS App Zip archive data as .zip files. (#116)
    • Add support for Apple Xcode .mobilepovision files. (#113)
  • Bug fixes:
    • Don t print a traceback if we pass a single, missing argument to diffoscope (eg. a JSON diff to re-load). [ ]
    • Correct differences typo in the ApkFile handler. (#127)
  • Output improvements:
    • Never emit the same id="foo" anchor reference twice in the HTML output, otherwise identically-named parts will not be able to linked to via a #foo anchor. (#120)
    • Never emit an empty id anchor either; it is not possible to link to #. [ ]
    • Don t pretty-print the output when using the --json presenter; it will usually be too complicated to be readable by the human anyway. [ ]
    • Use the SHA256 over MD5 hash when generating page names for the HTML directory-style presenter. (#124)
  • Reporting improvements:
    • Clarify the message when we truncate the number of lines to standard error [ ] and reduce the number of maximum lines printed to 25 as usually the error is obvious by then [ ].
    • Print the amount of free space that we have available in our temporary directory as a debugging message. [ ]
    • Clarify Command [ ] failed with exit code messages to remove duplicate exited with exit but also to note that diffoscope is interpreting this as an error. [ ]
    • Don t leak the full path of the temporary directory in Command [ ] exited with 1 messages. (#126)
    • Clarify the warning message when we cannot import the debian Python module. [ ]
    • Don t repeat stderr from if both commands emit the same output. [ ]
    • Clarify that an external command emits for both files, otherwise it can look like we are repeating itself when, in reality, it is being run twice. [ ]
  • Testsuite improvements:
    • Prevent apksigner test failures due to lack of binfmt_misc, eg. on Salsa CI and elsewhere. [ ]
    • Drop .travis.yml as we use Salsa instead. [ ]
  • Dockerfile improvements:
    • Add a .dockerignore file to whitelist files we actually need in our container. (#105)
    • Use ARG instead of ENV when setting up the DEBIAN_FRONTEND environment variable at runtime. (#103)
    • Run as a non-root user in container. (#102)
    • Install/remove the build-essential during build so we can install the recommended packages from Git. [ ]
  • Codebase improvements:
    • Bump the officially required version of Python from 3.5 to 3.6. (#117)
    • Drop the (default) shell=False keyword argument to subprocess.Popen so that the potentially-unsafe shell=True is more obvious. [ ]
    • Perform string normalisation in Black [ ] and include the Black output in the assertion failure too [ ].
    • Inline MissingFile s special handling of deb822 to prevent leaking through abstract layers. [ ][ ]
    • Allow a bare try/except block when cleaning up temporary files with respect to the flake8 quality assurance tool. [ ]
    • Rename in_dsc_path to dsc_in_same_dir to clarify the use of this variable. [ ]
    • Abstract out the duplicated parts of the debian_fallback class [ ] and add descriptions for the file types. [ ]
    • Various commenting and internal documentation improvements. [ ][ ]
    • Rename the Openssl command class to OpenSSLPKCS7 to accommodate other command names with this prefix. [ ]
  • Misc:
    • Rename the --debugger command-line argument to --pdb. [ ]
    • Normalise filesystem stat(2) birth times (ie. st_birthtime) in the same way we do with the stat(1) command s Access: and Change: times to fix a nondeterministic build failure in GNU Guix. (#74)
    • Ignore case when ordering our file format descriptions. [ ]
    • Drop, add and tidy various module imports. [ ][ ][ ][ ]
In addition:
  • Jean-Romain Garnier fixed a general issue where, for example, LibarchiveMember s has_same_content method was called regardless of the underlying type of file. [ ]
  • Daniel Fullmer fixed an issue where some filesystems could only be mounted read-only. (!49)
  • Emanuel Bronshtein provided a patch to prevent a build of the Docker image containing parts of the build s. (#123)
  • Mattia Rizzolo added an entry to debian/py3dist-overrides to ensure the rpm-python module is used in package dependencies (#89) and moved to using the new execute_after_* and execute_before_* Debhelper rules [ ].

Chris Lamb also performed a huge overhaul of diffoscope s website:
  • Add a completely new design. [ ][ ]
  • Dynamically generate our contributor list [ ] and supported file formats [ ] from the main Git repository.
  • Add a separate, canonical page for every new release. [ ][ ][ ]
  • Generate a latest release section and display that with the corresponding date on the homepage. [ ]
  • Add an RSS feed of our releases [ ][ ][ ][ ][ ] and add to Planet Debian [ ].
  • Use Jekyll s absolute_url and relative_url where possible [ ][ ] and move a number of configuration variables to _config.yml [ ][ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Other tools Elsewhere in our tooling: strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. In May, Chris Lamb uploaded version 1.8.1-1 to Debian unstable and Bernhard M. Wiedemann fixed an off-by-one error when parsing PNG image modification times. (#16) In disorderfs, our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues, Chris Lamb replaced the term dirents in place of directory entries in human-readable output/log messages [ ] and used the astyle source code formatter with the default settings to the main disorderfs.cpp source file [ ]. Holger Levsen bumped the debhelper-compat level to 13 in disorderfs [ ] and reprotest [ ], and for the GNU Guix distribution Vagrant Cascadian updated the versions of disorderfs to version 0.5.10 [ ] and diffoscope to version 145 [ ].

Project documentation & website
  • Carl Dong:
  • Chris Lamb:
    • Rename the Who page to Projects . [ ]
    • Ensure that Jekyll enters the _docs subdirectory to find the _docs/index.md file after an internal move. (#27)
    • Wrap ltmain.sh etc. in preformatted quotes. [ ]
    • Wrap the SOURCE_DATE_EPOCH Python examples onto more lines to prevent visual overflow on the page. [ ]
    • Correct a preferred spelling error. [ ]
  • Holger Levsen:
    • Sort our Academic publications page by publication year [ ] and add Trusting Trust and Fully Countering Trusting Trust through Diverse Double-Compiling [ ].
  • Juri Dispan:

Testing framework We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org that, amongst many other tasks, tracks the status of our reproducibility efforts as well as identifies any regressions that have been introduced. Holger Levsen made the following changes:
  • System health status:
    • Improve page description. [ ]
    • Add more weight to proxy failures. [ ]
    • More verbose debug/failure messages. [ ][ ][ ]
    • Work around strangeness in the Bash shell let VARIABLE=0 exits with an error. [ ]
  • Debian:
    • Fail loudly if there are more than three .buildinfo files with the same name. [ ]
    • Fix a typo which prevented /usr merge variation on Debian unstable. [ ]
    • Temporarily ignore PHP s horde](https://www.horde.org/) packages in Debian bullseye. [ ]
    • Document how to reboot all nodes in parallel, working around molly-guard. [ ]
  • Further work on a Debian package rebuilder:
    • Workaround and document various issues in the debrebuild script. [ ][ ][ ][ ]
    • Improve output in the case of errors. [ ][ ][ ][ ]
    • Improve documentation and future goals [ ][ ][ ][ ], in particular documentiing two real world tests case for an impossible to recreate build environment [ ].
    • Find the right source package to rebuild. [ ]
    • Increase the frequency we run the script. [ ][ ][ ][ ]
    • Improve downloading and selection of the sources to build. [ ][ ][ ]
    • Improve version string handling.. [ ]
    • Handle build failures better. [ ]. [ ]. [ ]
    • Also consider architecture all .buildinfo files. [ ][ ]
In addition:
  • kpcyrd, for Alpine Linux, updated the alpine_schroot.sh script now that a patch for abuild had been released upstream. [ ]
  • Alexander Couzens of the OpenWrt project renamed the brcm47xx target to bcm47xx. [ ]
  • Mattia Rizzolo fixed the printing of the build environment during the second build [ ][ ][ ] and made a number of improvements to the script that deploys Jenkins across our infrastructure [ ][ ][ ].
Lastly, Vagrant Cascadian clarified in the documentation that you need to be user jenkins to run the blacklist command [ ] and the usual build node maintenance was performed was performed by Holger Levsen [ ][ ][ ], Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ].

Mailing list: There were a number of discussions on our mailing list this month: Paul Spooren started a thread titled Reproducible Builds Verification Format which reopens the discussion around a schema for sharing the results from distributed rebuilders:
To make the results accessible, storable and create tools around them, they should all follow the same schema, a reproducible builds verification format. The format tries to be as generic as possible to cover all open source projects offering precompiled source code. It stores the rebuilder results of what is reproducible and what not.
Hans-Christoph Steiner of the Guardian Project also continued his previous discussion regarding making our website translatable. Lastly, Leo Wandersleb posted a detailed request for feedback on a question of supply chain security and other issues of software review; Leo is the founder of the Wallet Scrutiny project which aims to prove the security of Android Bitcoin Wallets:
Do you own your Bitcoins or do you trust that your app allows you to use your coins while they are actually controlled by them ? Do you have a backup? Do they have a copy they didn t tell you about? Did anybody check the wallet for deliberate backdoors or vulnerabilities? Could anybody check the wallet for those?
Elsewhere, Leo had posted instructions on his attempts to reproduce the binaries for the BlueWallet Bitcoin wallet for iOS and Android platforms.


If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

6 May 2020

Reproducible Builds: Reproducible Builds in April 2020

Welcome to the April 2020 report from the Reproducible Builds project. In our regular reports we outline the most important things that we and the rest of the community have been up to over the past month. What are reproducible builds? One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. But whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.

News It was discovered that more than 725 malicious packages were downloaded thousands of times from RubyGems, the official channel for distributing code for the Ruby programming language. Attackers used a variation of typosquatting and replaced hyphens and underscores (for example, uploading a malevolent atlas-client in place of atlas_client) that executed a script that intercepted Bitcoin payments. (Ars Technica report) Bernhard M. Wiedemann launched ismypackagereproducibleyet.org, a service that takes a package name as input and displays whether the package is reproducible in a number of distributions. For example, it can quickly show the status of Perl as being reproducible on openSUSE but not in Debian. Bernhard also improved the documentation of his unreproducible package to add some example patches for hash issues. [ ]. There was a post on Chaos Computer Club s website listing Ten requirements for the evaluation of Contact Tracing apps in relation to the SARS-CoV-2 epidemic. In particular:
4. Transparency and verifiability: The complete source code for the app and infrastructure must be freely available without access restrictions to allow audits by all interested parties. Reproducible build techniques must be used to ensure that users can verify that the app they download has been built from the audited source code.
Elsewhere, Nicolas Boulenguez wrote a patch for the Ada programming language component of the GCC compiler to skip -f.*-prefix-map options when writing Ada Library Information files. Amongst other properties, these .ali files embed the compiler flags used at the time of the build which results in the absolute build path being recorded via -ffile-prefix-map, -fdebug-prefix-map, etc. In the Arch Linux project, kpcyrd reported that they held their first rebuilder workshop . The session was held on IRC and participants were provided a document with instructions on how to install and use Arch s repro tool. The meeting resulted in multiple people with no prior experience of Reproducible Builds validate their first package. Later in the month he also announced that it was now possible to run independent rebuilders under Arch in a hands-off, everything just works solution to distributed package verification. Mathias Lang submitted a pull request against dmd, the canonical compiler for the D programming languageto add support for our SOURCE_DATE_EPOCH environment variable as well the other C preprocessor tokens such __DATE__, __TIME__ and __TIMESTAMP__ which was subsequently merged. SOURCE_DATE_EPOCH defines a distribution-agnostic standard for build toolchains to consume and emit timestamps in situations where they are deemed to be necessary. [ ] The Telegram instant-messaging platform announced that they had updated to version 5.1.1 continuing their claim that they are reproducible according to their full instructions and therefore verifying that its original source code is exactly the same code that is used to build the versions available on the Apple App Store and Google Play distribution platforms respectfully. Lastly, Herv Boutemy reported that 97% of the current development versions of various Maven packages appear to have a reproducible build. [ ]

Distribution work In Debian this month, 89 reviews of Debian packages were added, 21 were updated and 33 were removed this month adding to our knowledge about identified issues. Many issue types were noticed, categorised and updated by Chris Lamb, including: In addition, Holger Levsen filed a feature request against debrebuild, a tool for rebuilding a Debian package given a .buildinfo file, proposing to add --standalone or --one-shot-mode functionality.
In openSUSE, Bernhard M. Wiedemann made the following changes: In Arch Linux, a rebuilder instance has been setup at reproducible.archlinux.org that is rebuilding Arch s [core] repository directly. The first rebuild has led to approximately 90% packages reproducible contrasting with 94% on the Reproducible Build s project own ArchLinux status page on tests.reproducible-builds.org that continiously builds packages and does not verify Arch Linux packages. More information may be found on the corresponding wiki page and the underlying decisions were explained on our mailing list.

Software development

diffoscope Chris Lamb made the following changes to diffoscope, the Reproducible Builds project s in-depth and content-aware diff utility that can locate and diagnose reproducibility issues (including preparing and uploading versions 139, 140, 141, 142 and 143 to Debian which were subsequently uploaded to the backports repository):
  • Comparison improvements:
    • Dalvik .dex files can also serve as APK containers so restrict the narrower identification of .dex files to files ending with this extension and widen the identification of APK files to when file(1) discovers a Dalvik file. (#28)
    • Add support for Hierarchical Data Format (HD5) files. (#95)
    • Add support for .p7c and .p7b certificates. (#94)
    • Strip paths from the output of zipinfo(1) warnings. (#97)
    • Don t uselessly include the JSON similarity percentage if it is 0.0% . [ ]
    • Render multi-line difference comments in a way to show indentation. (#101)
  • Testsuite improvements:
    • Add pdftotext as a requirement to run the PDF test_metadata text. (#99)
    • apktool 2.5.0 changed the handling of output of XML schemas so update and restrict the corresponding test to match. (#96)
    • Explicitly list python3-h5py in debian/tests/control.in to ensure that we have this module installed during a test run to generate the fixtures in these tests. [ ]
    • Correct parsing of ./setup.py test --pytest-args arguments. [ ]
  • Misc:
    • Capitalise Ordering differences only in text comparison comments. [ ]
    • Improve documentation of FILE_TYPE_HEADER_PREFIX and FALLBACK_FILE_TYPE_HEADER_PREFIX to highlight that only the first 16 bytes are used. [ ]
Michael Osipov created a well-researched merge request to return diffoscope to using zipinfo directly instead of piping input via /dev/stdin in order to ensure portability to the BSD operating system [ ]. In addition, Ben Hutchings documented how --exclude arguments are matched against filenames [ ] and Jelle van der Waa updated the LLVM test fixture difference for LLVM version 10 [ ] as well as adding a reference to the name of the h5dump tool in Arch Linux [ ]. Lastly, Mattia Rizzolo also fixed in incorrect build dependency [ ] and Vagrant Cascadian enabled diffoscope to locate the openssl and h5dump packages on GNU Guix [ ][ ], and updated diffoscope in GNU Guix to version 141 [ ] and 143 [ ].

strip-nondeterminism strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. In April, Chris Lamb made the following changes:
  • Add deprecation plans to all handlers documenting how or if they could be disabled and eventually removed, etc. (#3)
  • Normalise *.sym files as Java archives. (#15)
  • Add support for custom .zip filename filtering and exclude two patterns of files generated by Maven projects in fork mode. (#13)

disorderfs disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues. This month, Chris Lamb fixed a long-standing issue by not drop UNIX groups in FUSE multi-user mode when we are not root (#1) and uploaded version 0.5.9-1 to Debian unstable. Vagrant Cascadian subsequently refreshed disorderfs in GNU Guix to version 0.5.9 [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: In addition, Bernhard informed the following projects that their packages are not reproducible:
  • acoular (report unknown non-determinism)
  • cri-o (report a date issue)
  • gnutls (report certtool being unable to extend certificates beyond 2049)
  • gnutls (report copyright year variation)
  • libxslt (report a bug about non-deterministic output from data corruption)
  • python-astropy (report a future build failure in 2021)

Project documentation This month, Chris Lamb made a large number of changes to our website and documentation in the following categories:
  • Community engagement improvements:
    • Update instructions to register for Salsa on our Contribute page now that the signup process has been overhauled. [ ]
    • Make it clearer that joining the rb-general mailing list is probably a first step for contributors to take. [ ]
    • Make our full contact information easier to find in the footer (#19) and improve text layout using bullets to separate sections [ ].
  • Accessibility:
    • To improve accessibility, make all links underlined. (#12)
    • Use an enhanced foreground/background contrast ratio of 7.04:1. (#11)
  • General improvements:
  • Internals:
    • Move to using jekyll-redirect-from over manual redirect pages [ ][ ] and add a redirect from /docs/buildinfo/ to /docs/recording/. (#23)
    • Limit the website self-check to not scan generated files [ ] and remove the old layout checker now that I have migrated all them [ ].
    • Move the news archive under the /news/ namespace [ ] and improve formatting of archived news links [ ].
    • Various improvements to the draft template generation. [ ][ ][ ][ ]
In addition, Holger Levsen clarified exactly which month we ceased to do weekly reports [ ] and Mattia Rizzolo adjusted the title style of an event page [ ]. Marcus Hoffman also started a discussion on our website s issue tracker asking for clarification on embedded signatures and Chris Lamb subsequently replied and asked Marcus to go ahead and propose a concrete change.

Testing framework We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org that, amongst many other tasks, tracks the status of our reproducibility efforts as well as identifies any regressions that have been introduced.
  • Chris Lamb:
    • Print the build environment prior to executing a build. [ ]
    • Drop a misleading disorderfs-debug prefix in log output when we change non-disorderfs things in the file and, as it happens, do not run disorderfs at all. [ ]
    • The CSS for the package report pages added a margin to all <a> HTML elements under <li> ones, which was causing a comma/bullet spacing issue. [ ]
    • Tidy the copy in the project links sidebar. [ ]
  • Holger Levsen:
    • General:
    • Debian:
      • Reduce scheduling frequency of the buster distribution on the arm64 architecture, etc.. [ ][ ]
      • Show builds per day on a per-architecture basis for the last year on the Debian dashboard. [ ]
      • Drop the Subgraph OS package set as development halted in 2017 or 2018. [ ]
      • Update debrebuild to version from the latest version of devscripts. [ ][ ]
      • Add or improve various parts of the documentation. [ ][ ][ ]
    • Work on a Debian rebuilder:
      • Integrate sbuild. [ ][ ][ ][ ][ ]
      • Select a random .buildinfo file and attempt to build and compare the result. [ ][ ][ ][ ]
      • Improve output and related output formatting. [ ][ ][ ][ ][ ]
      • Outline next steps for the development of the tool. [ ][ ][ ]
      • Various refactoring and code improvements. [ ][ ][ ]
Lastly, Mattia Rizzolo fixed some log parsing code regarding potentially-harmless warnings from package installation [ ][ ] and the usual build node maintenance was performed by Holger Levsen [ ][ ][ ] and Mattia Rizzolo [ ][ ][ ].

Misc news On our mailing list this month, Santiago Torres asked whether we were still publishing releases of our tools to our website and Chris Lamb replied that this was not the case and fixed the issue. Later in the month Santiago also reported that the signature for the disorderfs package did not pass its GPG verification which was also fixed by Chris Lamb. Hans-Christoph Steiner of the Guardian Project asked whether there would be interest in making our website translatable which resulted in a WIP merge request being filed against the website and a discussion on how to track translation updates.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month s report was written by Bernhard M. Wiedemann, Chris Lamb, Daniel Shahaf, Holger Levsen, Jelle van der Waa, kpcyrd, Mattia Rizzolo and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

Next.