Search Results: "uwe"

14 January 2024

Uwe Kleine-K nig: PGP Keysigning on FOSDEM'24

I'm going to FOSDEM'24. Assuming to meet Debian and Kernel folks there, this should be a good opportunity to do PGP keysigning. If you also go there and you're interested in keysigning: Send me your key via email to fosdem24-keysigning@kleine-koenig.org. I'll collect the keys, create a paper list for a keysigning party and send it back to you in the week before FOSDEM. The list will only be made available to other participants. Then maybe wear a "keysigning" badge (or a crepe tape with that caption) that allows us to identify others on the list. If there are not too many who are interested, I guess that should work fine. (If this idea becomes too successful, I'd close the list after the first 100 participants.) Update: Several people let me know there is another keysigning announced on Ludovic Hirlimann's blog. So you might want to bring some more paper slips with your fingerprint and show up there, too.

16 November 2023

Dimitri John Ledkov: Ubuntu 23.10 significantly reduces the installed kernel footprint


Photo by Pixabay
Ubuntu systems typically have up to 3 kernels installed, before they are auto-removed by apt on classic installs. Historically the installation was optimized for metered download size only. However, kernel size growth and usage no longer warrant such optimizations. During the 23.10 Mantic Minatour cycle, I led a coordinated effort across multiple teams to implement lots of optimizations that together achieved unprecedented install footprint improvements.

Given a typical install of 3 generic kernel ABIs in the default configuration on a regular-sized VM (2 CPU cores 8GB of RAM) the following metrics are achieved in Ubuntu 23.10 versus Ubuntu 22.04 LTS:

  • 2x less disk space used (1,417MB vs 2,940MB, including initrd)

  • 3x less peak RAM usage for the initrd boot (68MB vs 204MB)

  • 0.5x increase in download size (949MB vs 600MB)

  • 2.5x faster initrd generation (4.5s vs 11.3s)

  • approximately the same total time (103s vs 98s, hardware dependent)


For minimal cloud images that do not install either linux-firmware or modules extra the numbers are:

  • 1.3x less disk space used (548MB vs 742MB)

  • 2.2x less peak RAM usage for initrd boot (27MB vs 62MB)

  • 0.4x increase in download size (207MB vs 146MB)


Hopefully, the compromise of download size, relative to the disk space & initrd savings is a win for the majority of platforms and use cases. For users on extremely expensive and metered connections, the likely best saving is to receive air-gapped updates or skip updates.

This was achieved by precompressing kernel modules & firmware files with the maximum level of Zstd compression at package build time; making actual .deb files uncompressed; assembling the initrd using split cpio archives - uncompressed for the pre-compressed files, whilst compressing only the userspace portions of the initrd; enabling in-kernel module decompression support with matching kmod; fixing bugs in all of the above, and landing all of these things in time for the feature freeze. Whilst leveraging the experience and some of the design choices implementations we have already been shipping on Ubuntu Core. Some of these changes are backported to Jammy, but only enough to support smooth upgrades to Mantic and later. Complete gains are only possible to experience on Mantic and later.

The discovered bugs in kernel module loading code likely affect systems that use LoadPin LSM with kernel space module uncompression as used on ChromeOS systems. Hopefully, Kees Cook or other ChromeOS developers pick up the kernel fixes from the stable trees. Or you know, just use Ubuntu kernels as they do get fixes and features like these first.

The team that designed and delivered these changes is large: Benjamin Drung, Andrea Righi, Juerg Haefliger, Julian Andres Klode, Steve Langasek, Michael Hudson-Doyle, Robert Kratky, Adrien Nader, Tim Gardner, Roxana Nicolescu - and myself Dimitri John Ledkov ensuring the most optimal solution is implemented, everything lands on time, and even implementing portions of the final solution.

Hi, It's me, I am a Staff Engineer at Canonical and we are hiring https://canonical.com/careers.

Lots of additional technical details and benchmarks on a huge range of diverse hardware and architectures, and bikeshedding all the things below:

For questions and comments please post to Kernel section on Ubuntu Discourse.



10 September 2023

Freexian Collaborators: Debian Contributions: /usr-merge updates, Salsa CI progress, DebConf23 lead-up, and more! (by Utkarsh Gupta)

Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

/usr-merge work, by Helmut Grohne, et al. Given that we now have consensus on moving forward by moving aliased files from / to /usr, we will also run into the problems that the file move moratorium was meant to prevent. The way forward is detecting them early and applying workarounds on a per-package basis. Said detection is now automated using the Debian Usr Merge Analysis Tool. As problems are reported to the bug tracking system, they are connected to the reports if properly usertagged. Bugs and patches for problem categories DEP17-P2 and DEP17-P6 have been filed. After consensus has been reached on the bootstrapping matters, debootstrap has been changed to swap the initial unpack and merging to avoid unpack errors due to pre-existing links. This is a precondition for having base-files install the aliasing symbolic links eventually. It was identified that the root filesystem used by the Debian installer is still unmerged and a change has been proposed. debhelper was changed to recognize systemd units installed to /usr. A discussion with the CTTE and release team on repealing the moratorium has been initiated.

Salsa CI work, by Santiago Ruano Rinc n August was a busy month in the Salsa CI world. Santiago reviewed and merged a bunch of MRs that have improved the project in different aspects: The aptly job got two MRs from Philip Hands. With the first one, the aptly now can export a couple of variables in a dotenv file, and with the second, it can include packages from multiple artifact directories. These MRs bring the base to improve how to test reverse dependencies with Salsa CI. Santiago is working on documenting this. As a result of the mass bug filing done in August, Salsa CI now includes a job to test how a package builds twice in a row. Thanks to the MRs of Sebastiaan Couwenberg and Johannes Schauer Marin Rodrigues. Last but not least, Santiago helped Johannes Schauer Marin Rodrigues to complete the support for arm64-only pipelines.

DebConf23 lead-up, by Stefano Rivera Stefano wears a few hats in the DebConf organization and in the lead up to the conference in mid-September, they ve all been quite busy. As one of the treasurers of DebConf 23, there has been a final budget update, and quite a few payments to coordinate from Debian s Trusted Organizations. We try to close the books from the previous conference at the next one, so a push was made to get DebConf 22 account statements out of TOs and record them in the conference ledger. As a website developer, we had a number of registration-related tasks, emailing attendees and trying to estimate numbers for food and accommodation. As a conference committee member, the job was mostly taking calls and helping the local team to make decisions on urgent issues. For example, getting conference visas issued to attendees required getting political approval from the Indian government. We only discovered the full process for this too late to clear some complex cases, so this required some hard calls on skipping some countries from the application list, allowing everyone else to get visas in time. Unfortunate, but necessary.

Miscellaneous contributions
  • Rapha l Hertzog updated gnome-shell-extension-hamster to a new upstream git snapshot that is compatible with GNOME Shell 44 that was recently uploaded to Debian unstable/testing. This extension makes it easy to start/stop tracking time with Hamster Time Tracker. Very handy for consultants like us who are billing their work per hour.
  • Rapha l also updated zim to the latest upstream release (0.74.2). This is a desktop wiki that can be very useful as a note-taking tool to build your own personal knowledge base or even to manage your personal todo lists.
  • Utkarsh reviewed and sponsored some uploads from mentors.debian.net.
  • Utkarsh helped the local team and the bursary team with some more DebConf activities and helped finalize the data.
  • Thorsten tried to update package hplip. Unfortunately upstream added some new compressed files that need to appear uncompressed in the package. Even though this sounded like an easy task, which seemed to be already implemented in the current debian/rules, the new type of files broke this implementation and made the package no longer buildable. The problem has been solved and the upload will happen soon.
  • Helmut sent 7 patches for cross build failures. Since dpkg-buildflags now defaults to issue arm64-specific compiler flags, more care is needed to distinguish between build architecture flags and host architecture flags than previously.
  • Stefano pushed the final bit of the tox 4 transition over the line in Debian, allowing dh-python and tox 4 to migrate to testing. We got caught up in a few unusual bugs in tox and the way we run it in Debian package building (which had to change with tox 4). This resulted in a couple of patches upstream.
  • Stefano visited Haifa, Israel, to see the proposed DebConf 24 venue and meet with the local team. While the venue isn t committed yet, we have high hopes for it.

4 August 2023

Reproducible Builds: Reproducible Builds in July 2023

Welcome to the July 2023 report from the Reproducible Builds project. In our reports, we try to outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit the Contribute page on our website.
Marcel Fourn et al. presented at the IEEE Symposium on Security and Privacy in San Francisco, CA on The Importance and Challenges of Reproducible Builds for Software Supply Chain Security. As summarised in last month s report, the abstract of their paper begins:
The 2020 Solarwinds attack was a tipping point that caused a heightened awareness about the security of the software supply chain and in particular the large amount of trust placed in build systems. Reproducible Builds (R-Bs) provide a strong foundation to build defenses for arbitrary attacks against build systems by ensuring that given the same source code, build environment, and build instructions, bitwise-identical artifacts are created. (PDF)

Chris Lamb published an interview with Simon Butler, associate senior lecturer in the School of Informatics at the University of Sk vde, on the business adoption of Reproducible Builds. (This is actually the seventh instalment in a series featuring the projects, companies and individuals who support our project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project and David A. Wheeler.) Vagrant Cascadian presented Breaking the Chains of Trusting Trust at FOSSY 2023.
Rahul Bajaj has been working with Roland Clobus on merging an overview of environment variations to our website:
I have identified 16 root causes for unreproducible builds in my empirical study, which I have linked to the corresponding documentation. The initial MR right now contains information about 10 root causes. For each root cause, I have provided a definition, a notable instance, and a workaround. However, I have only found workarounds for 5 out of the 10 root causes listed in this merge request. In the upcoming commits, I plan to add an additional 6 root causes. I kindly request you review the text for any necessary refinements, modifications, or corrections. Additionally, I would appreciate the help with documentation for the solutions/workarounds for the remaining root causes: Archive Metadata, Build ID, File System Ordering, File Permissions, and Snippet Encoding. Your input on the identified root causes for unreproducible builds would be greatly appreciated. [ ]

Just a reminder that our upcoming Reproducible Builds Summit is set to take place from October 31st November 2nd 2023 in Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. If you re interested in joining us this year, please make sure to read the event page which has more details about the event and location.
There was more progress towards making the Go programming language ecosystem reproducible this month, including: In addition, kpcyrd posted to our mailing list to report that:
while packaging govulncheck for Arch Linux I noticed a checksum mismatch for a tar file I downloaded from go.googlesource.com. I used diffoscope to compare the .tar file I downloaded with the .tar file the build server downloaded, and noticed the timestamps are different.

In Debian, 20 reviews of Debian packages were added, 25 were updated and 25 were removed this month adding to our knowledge about identified issues. A number of issue types were updated, including marking ffile_prefix_map_passed_to_clang being fixed since Debian bullseye [ ] and adding a Debian bug tracker reference for the nondeterminism_added_by_pyqt5_pyrcc5 issue [ ]. In addition, Roland Clobus posted another detailed update of the status of reproducible Debian ISO images on our mailing list. In particular, Roland helpfully summarised that live images are looking good, and the number of (passing) automated tests is growing .
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.
F-Droid added 20 new reproducible apps in July, making 165 apps in total that are published with Reproducible Builds and using the upstream developer s signature. [ ]
The Sphinx documentation tool recently accepted a change to improve deterministic reproducibility of documentation. It s internal util.inspect.object_description attempts to sort collections, but this can fail. The change handles the failure case by using string-based object descriptions as a fallback deterministic sort ordering, as well as adding recursive object-description calls for list and tuple datatypes. As a result, documentation generated by Sphinx will be more likely to be automatically reproducible. Lastly in news, kpcyrd posted to our mailing list announcing a new repro-env tool:
My initial interest in reproducible builds was how do I distribute pre-compiled binaries on GitHub without people raising security concerns about them . I ve cycled back to this original problem about 5 years later and built a tool that is meant to address this. [ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
In diffoscope development this month, versions 244, 245 and 246 were uploaded to Debian unstable by Chris Lamb, who also made the following changes:
  • Don t include the file size in image metadata. It is, at best, distracting, and it is already in the directory metadata. [ ]
  • Add compatibility with libarchive-5. [ ]
  • Mark that the test_dex::test_javap_14_differences test requires the procyon tool. [ ]
  • Initial work on DOS/MBR extraction. [ ]
  • Move to using assert_diff in the .ico and .jpeg tests. [ ]
  • Temporarily mark some Android-related as XFAIL due to Debian bugs #1040941 & #1040916. [ ]
  • Fix the test skipped reason generation in the case of a version outside of the required range. [ ]
  • Update copyright years. [ ][ ]
  • Fix try.diffoscope.org. [ ]
In addition, Gianfranco Costamagna added support for LLVM version 16. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In July, a number of changes were made by Holger Levsen:
  • General changes:
    • Upgrade Jenkins host to Debian bookworm now that Debian 12.1 is out. [ ][ ][ ][ ]
    • djm: improve UX when rebooting a node fails. [ ]
    • djm: reduce wait time between rebooting nodes. [ ]
  • Debian-related changes:
    • Various refactoring of the Debian scheduler. [ ][ ][ ]
    • Make Debian live builds more robust with respect to salsa.debian.org returning HTTP 502 errors. [ ][ ]
    • Use the legacy SCP protocol instead of the SFTP protocol when transfering Debian live builds. [ ][ ]
    • Speed up a number of database queries thanks, Myon! [ ][ ][ ][ ][ ]
    • Split create_meta_pkg_sets job into two (for Debian unstable and Debian testing) to half the job runtime to approximately 90 minutes. [ ][ ]
    • Split scheduler job into four separate jobs, one for each tested architecture. [ ][ ]
    • Treat more PostgreSQL errors as serious (for some jobs). [ ]
    • Re-enable automatic database documentation now that postgresql_autodoc is back in Debian bookworm. [ ]
    • Remove various hardcoding of Debian release names. [ ]
    • Drop some i386 special casing. [ ]
  • Other distributions:
    • Speed up Alpine SQL queries. [ ]
    • Adjust CSS layout for Arch Linux pages to match 3 and not 4 repos being tested. [ ]
    • Drop the community Arch Linux repo as it has now been merged into the extra repo. [ ]
    • Speed up a number of Arch-related database queries. [ ]
    • Try harder to properly cleanup after building OpenWrt packages. [ ]
    • Drop all kfreebsd-related tests now that it s officially dead. [ ]
  • System health:
    • Always ignore some well-known harmless orphan processes. [ ][ ][ ]
    • Detect another case of job failure due to Jenkins shutdown. [ ]
    • Show all non co-installable package sets on the status page. [ ]
    • Warn that some specific reboot nodes are currently false-positives. [ ]
  • Node health checks:
    • Run system and node health checks for Jenkins less frequently. [ ]
    • Try to restart any failed dpkg-db-backup [ ] and munin-node services [ ].
In addition, Vagrant Cascadian updated the paths in our automated to tests to use the same paths used by the official Debian build servers. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

1 August 2023

Reproducible Builds: Supporter spotlight: Simon Butler on business adoption of Reproducible Builds

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the seventh instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project and David A. Wheeler. Today, however, we will be talking with Simon Butler, an associate senior lecturer in the School of Informatics at the University of Sk vde, where he undertakes research in software engineering that focuses on IoT and open source software, and contributes to the teaching of computer science to undergraduates.

Chris: For those who have not heard of it before, can you tell us more about the School of Informatics at Sk vde University? Simon: Certainly, but I may be a little long-winded. Sk vde is a city in the area between the two large lakes in southern Sweden. The city is a busy place. Sk vde is home to the regional hospital, some of Volvo s manufacturing facilities, two regiments of the Swedish defence force, a lot of businesses in the Swedish computer games industry, other tech companies and more. The University of Sk vde is relatively small. Sweden s large land area and low population density mean that regional centres such as Sk vde are important and local universities support businesses by training new staff and supporting innovation. The School of Informatics has two divisions. One focuses on teaching and researching computer games. The other division encompasses a wider range of teaching and research, including computer science, web development, computer security, network administration, data science and so on.
Chris: You recently had a open-access paper published in Software Quality Journal. Could you tell us a little bit more about it and perhaps briefly summarise its key findings? Simon: The paper is one output of a collaborative research project with six Swedish businesses that use open source software. There are two parts to the paper. The first consists of an analysis of what the group of businesses in the project know about Reproducible Builds (R-Bs), their experiences with R-Bs and their perception of the value of R-Bs to the businesses. The second part is an interview study with business practitioners and others with experience and expertise in R-Bs. We set out to try to understand the extent to which software-intensive businesses were aware of R-Bs, the technical and business reasons they were or were not using R-Bs and to document the business and technical use cases for R-Bs. The key findings were that businesses are aware of R-Bs, and some are using R-Bs as part of their day-to-day development process. Some of the uses for R-Bs we found were not previously documented. We also found that businesses understood the value R-Bs have as part of engineering and software quality processes. They are also aware of the costs of implementing R-Bs and that R-Bs are an intangible value proposition - in other words, businesses can add value through process improvement by using R-Bs. But, that, currently at least, R-Bs are not a selling point for software or products.
Chris: You performed a large number of interviews in order to prepare your paper. What was the most surprising response to you? Simon: Most surprising is a good question. Everybody I spoke to brought something new to my understanding of R-Bs, and many responses surprised me. The interviewees that surprised me most were I01 and I02 (interviews were anonymised and interviewees were assigned numeric identities). I02 described the sceptical perspective that there is a viable, pragmatic alternative to R-Bs - verifiable builds - which I was aware of before undertaking the research. The company had developed a sufficiently robust system for their needs and worked well. With a large archive of software used in production, they couldn t justify the cost of retrofitting a different solution that might only offer small advantages over the existing system. Doesn t really sound too surprising, but the interview was one of the first I did on this topic, and I was very focused on the value of, and need for, trust in a system that motivated the R-B. The solution used by the company requires trust, but they seem to have established sufficient trust for their needs by securing their build systems to the extent that they are more or less tamper-proof. The other big surprise for me was I01 s use of R-Bs to support the verification of system configuration in a system with multiple embedded components at boot time. It s such an obvious application of R-Bs, and exactly the kind of response I hoped to get from interviewees. However, it is another instance of a solution where trust is only one factor. In the first instance, the developer is using R-Bs to establish trust in the toolchain. There is also the second application that the developer can use a set of R-Bs to establish that deployed system consists of compatible components. While this might not sound too significant, there appear to be some important potential applications. One that came to mind immediately is a problem with firmware updates on nodes in IoT systems where the node needs to update quickly with limited downtime and without failure. The node also needs to be able to roll back any update proposed by a server if there are conflicts with the current configuration or if any tests on the node fail. Perhaps the chances of failure could be reduced, if a node can instead negotiate with a server to determine a safe path to migrate from its current configuration to a working configuration with the upgraded components the central system requires? Another potential application appears to be in the configuration management of AI systems, where decisions need to be explainable. A means of specifying validated configurations of training data, models and deployed systems might, perhaps, be leveraged to prevent invalid or broken configurations from being deployed in production.
Chris: One of your findings was that reproducible builds were perceived to be good engineering practice . To what extent do you believe cultural forces affect the adoption or rejection of a given technology or practice? Simon: To a large extent. People s decisions are informed by cultural norms, and business decisions are made by people acting collectively. Of course, decision-making, including assessments of risk and usefulness, is mediated by individual positions on the continuum from conformity to non-conformity, as well as individual and in-group norms. Whether a business will consider a given technology for adoption will depend on cultural forces. The decision to adopt may well be made on the grounds of cost and benefits.
Chris: Another conclusion implied by your research is that businesses are often dealing with software deployment lifespans (eg. 20+ years) that differ from widely from those of the typical hobbyist programmer. To what degree do you think this temporal mismatch is a problem for both groups? Simon: This is a fascinating question. Long-term software maintenance is a requirement in some industries because of the working lifespans of the products and legal requirements to maintain the products for a fixed period. For some other industries, it is less of a problem. Consequently, I would tend to divide developers into those who have been exposed to long-term maintenance problems and those who have not. Although, more professional than hobbyist developers will have been exposed to the problem. Nonetheless, there are areas, such as music software, where there are also long-term maintenance challenges for data formats and software.
Chris: Based on your research, what would you say are the biggest blockers for the adoption of reproducible builds within business ? And, based on this, would you have any advice or recommendations for the broader reproducible builds ecosystem? Simon: From the research, the main blocker appears to be cost. Not an absolute cost, but there is an overhead to introducing R-Bs. Businesses (and thus business managers) need to understand the business case for R-Bs. Making decision-makers in businesses aware of R-Bs and that they are valuable will take time. Advocacy at multiple levels appears to be the way forward and this is being done. I would recommend being persistent while being patient and to keep talking about reproducible builds. The work done in Linux distributions raises awareness of R-Bs amongst developers. Guix, NixOS and Software Heritage are all providing practical solutions and getting attention - I ve been seeing progressively more mentions of all three during the last couple of years. Increased awareness amongst developers should lead to more interest within companies. There is also research money being assigned to supply chain security and R-B s. The CHAINS project at KTH in Stockholm is one example of a strategic research project. There may be others that I m not aware of. The policy-level advocacy is slowly getting results in some countries, and where CISA leads, others may follow.
Chris: Was there a particular reason you alighted on the question of the adoption of reproducible builds in business? Do you think there s any truth behind the shopworn stereotype of hacker types neglecting the resources that business might be able to offer? Simon: Much of the motivation for the research came from the contrast between the visibility of R-Bs in open source projects and the relative invisibility of R-Bs in industry. Where companies are known to be using R-Bs (e.g. Google, etc.) there is no fuss, no hype. They were not selling R-Bs as a solution; instead the documentation is very matter-of-fact that R-Bs are part of a customer-facing process in their cloud solutions. An obvious question for me was that if some people use R-B s in software development, why doesn t everybody? There are limits to the tooling for some programming languages that mean R-Bs are difficult or impossible. But where creating an R-B is practical, why are they not used more widely? So, to your second question. There is another factor, which seems to be more about a lack of communication rather than neglecting opportunities. Businesses may not always be willing to discuss their development processes and innovations. Though I do think the increasing number of conferences (big and small) for software practitioners is helping to facilitate more communication and greater exchange of ideas.
Chris: Has your personal view of reproducible builds changed since before you embarked on writing this paper? Simon: Absolutely! In the early stages of the research, I was interested in questions of trust and how R-Bs were applied to resolve build and supply chain security problems. As the research developed, however, I started to see there were benefits to the use of R-Bs that were less obvious and that, in some cases, an R-B can have more than a single application.
Chris: Finally, do you have any plans to do future research touching on reproducible builds? Simon: Yes, definitely. There are a set of problems that interest me. One already mentioned is the use of reproducible builds with AI systems. Interpretable or explainable AI (XAI) is a necessity, and I think that R-Bs can be used to support traceability in the configuration and testing of both deployed systems and systems used during model training and evaluation. I would also like to return to a problem discussed briefly in the article, which is to develop a deeper understanding of the elements involved in the application of R-Bs that can be used to support reasoning about existing and potential applications of R-Bs. For example, R-Bs can be used to establish trust for different groups of individuals at different times, say, between remote developers prior to the release of software and by users after release. One question is whether when an R-B is used might be a significant factor. Another group of questions concerns the ways in which trust (of some sort) propagates among users of an R-B. There is an example in the paper of a company that rebuilds Debian reproducibly for security reasons and is then able to collaborate on software projects where software is built reproducibly with other companies that use public distributions of Debian.
Chris: Many thanks for this interview, Simon. If someone wanted to get in touch or learn more about you and your colleagues at the School of Informatics, where might they go? Thank you for the opportunity. It has been a pleasure to reflect a little more widely on the research! Personally, you can find out about my work on my official homepage and on my personal site. The software systems research group (SSRG) has a website, and the University of Sk vde s English language pages are also available. Chris: Many thanks for this interview, Simon!


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

29 June 2023

C.J. Collier: Converting a windows install to a libvirt VM

Reduce the size of your c: partition to the smallest it can be and then turn off windows with the understanding that you will never boot this system on the iron ever again.
Boot into a netinst installer image (no GUI). hold alt and press left arrow a few times until you get to a prompt to press enter. Press enter. In this example /dev/sda is your windows disk which contains the c: partition
and /dev/disk/by-id/usb0 is the USB-3 attached SATA controller that you have your SSD attached to (please find an example attached). This SSD should be equal to or larger than the windows disk for best compatability. A photo of a USB-3 attached SATA controller To find the literal path names of your detected drives you can run fdisk -l. Pay attention to the names of the partitions and the sizes of the drives to help determine which is which. Once you have a shell in the netinst installer, you should maybe be able to run a command like the following. This will duplicate the disk located at if (in file) to the disk located at of (out file) while showing progress as the status.
dd if=/dev/sda of=/dev/disk/by-id/usb0 status=progress
If you confirm that dd is available on the netinst image and the previous command runs successfully, test that your windows partition is visible in the new disk s partition table. The start block of the windows partition on each should match, as should the partition size.
fdisk -l /dev/disk/by-id/usb0
fdisk -l /dev/sda
If the output from the first is the same as the output from the second, then you are probably safe to proceed. Once you confirm that you have made and tested a full copy of the blocks from your windows drive saved on your usb disk, nuke your windows partition table from orbit.
dd if=/dev/zero of=/dev/sda bs=1M count=42
You can press alt-f1 to return to the Debian installer now. Follow the instructions to install Debian. Don t forget to remove all attached USB drives. Once you install Debian, press ctrl-alt-f3 to get a root shell. Add your user to the sudoers group:
# adduser cjac sudoers
log out
# exit
log in as your user and confirm that you have sudo
$ sudo ls
Don t forget to read the spider man advice enter your password you ll need to install virt-manager. I think this should help:
$ sudo apt-get install virt-manager libvirt-daemon-driver-qemu qemu-system-x86
insert the USB drive. You can now create a qcow2 file for your virtual machine.
$ sudo qemu-img convert -O qcow2 \
/dev/disk/by-id/usb0 \
/var/lib/libvirt/images/windows.qcow2
I personally create a volume group called /dev/vg00 for the stuff I want to run raw and instead of converting to qcow2 like all of the other users do, I instead write it to a new logical volume.
sudo lvcreate /dev/vg00 -n windows -L 42G # or however large your drive was
sudo dd if=/dev/disk/by-id/usb0 of=/dev/vg00/windows status=progress
Now that you ve got the qcow2 file created, press alt-left until you return to your GDM session. The apt-get install command above installed virt-manager, so log in to your system if you haven t already and open up gnome-terminal by pressing the windows key or moving your mouse/gesture to the top left of your screen. Type in gnome-terminal and either press enter or click/tap on the icon. I like to run this full screen so that I feel like I m in a space ship. If you like to feel like you re in a spaceship, too, press F11. You can start virt-manager from this shell or you can press the windows key and type in virt-manager and press enter. You ll want the shell to run commands such as virsh console windows or virsh list When virt-manager starts, right click on QEMU/KVM and select New.
In the New VM window, select Import existing disk image
When prompted for the path to the image, use the one we created with sudo qemu-img convert above.
Select the version of Windows you want.
Select memory and CPUs to allocate to the VM.
Tick the Customize configuration before install box
If you re prompted to enable the default network, do so now.
The default hardware layout should probably suffice. Get it as close to the underlying hardware as it is convenient to do. But Windows is pretty lenient these days about virtualizing licensed windows instances so long as they re not running in more than one place at a time. Good luck! Leave comments if you have questions.

6 May 2023

Reproducible Builds: Reproducible Builds in April 2023

Welcome to the April 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. And, as always, if you are interested in contributing to the project, please visit our Contribute page on our website.

General news Trisquel is a fully-free operating system building on the work of Ubuntu Linux. This month, Simon Josefsson published an article on his blog titled Trisquel is 42% Reproducible!. Simon wrote:
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
Simon wrote another blog post this month on a new tool to ensure that updates to Linux distribution archive metadata (eg. via apt-get update) will only use files that have been recorded in a globally immutable and tamper-resistant ledger. A similar solution exists for Arch Linux (called pacman-bintrans) which was announced in August 2021 where an archive of all issued signatures is publically accessible.
Joachim Breitner wrote an in-depth blog post on a bootstrap-capable GHC, the primary compiler for the Haskell programming language. As a quick background to what this is trying to solve, in order to generate a fully trustworthy compile chain, trustworthy root binaries are needed and a popular approach to address this problem is called bootstrappable builds where the core idea is to address previously-circular build dependencies by creating a new dependency path using simpler prerequisite versions of software. Joachim takes an somewhat recursive approach to the problem for Haskell, leading to the inadvertently humourous question: Can I turn all of GHC into one module, and compile that? Elsewhere in the world of bootstrapping, Janneke Nieuwenhuizen and Ludovic Court s wrote a blog post on the GNU Guix blog announcing The Full-Source Bootstrap, specifically:
[ ] the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run guix pull today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program something that had never been achieved, to our knowledge, since the birth of Unix.
More info about this change is available on the post itself, including:
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start. There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot, from two people six years ago there are now around 100 people in the #bootstrappable IRC channel.

Michael Ablassmeier created a script called pypidiff as they were looking for a way to track differences between packages published on PyPI. According to Micahel, pypidiff uses diffoscope to create reports on the published releases and automatically pushes them to a GitHub repository. This can be seen on the pypi-diff GitHub page (example).
Eleuther AI, a non-profit AI research group, recently unveiled Pythia, a collection of 16 Large Language Model (LLMs) trained on public data in the same order designed specifically to facilitate scientific research. According to a post on MarkTechPost:
Pythia is the only publicly available model suite that includes models that were trained on the same data in the same order [and] all the corresponding data and tools to download and replicate the exact training process are publicly released to facilitate further research.
These properties are intended to allow researchers to understand how gender bias (etc.) can affected by training data and model scale.
Back in February s report we reported on a series of changes to the Sphinx documentation generator that was initiated after attempts to get the alembic Debian package to build reproducibly. Although Chris Lamb was able to identify the source problem and provided a potential patch that might fix it, James Addison has taken the issue in hand, leading to a large amount of activity resulting in a proposed pull request that is waiting to be merged.
WireGuard is a popular Virtual Private Network (VPN) service that aims to be faster, simpler and leaner than other solutions to create secure connections between computing devices. According to a post on the WireGuard developer mailing list, the WireGuard Android app can now be built reproducibly so that its contents can be publicly verified. According to the post by Jason A. Donenfeld, the F-Droid project now does this verification by comparing their build of WireGuard to the build that the WireGuard project publishes. When they match, the new version becomes available. This is very positive news.
Author and public speaker, V. M. Brasseur published a sample chapter from her upcoming book on corporate open source strategy which is the topic of Software Bill of Materials (SBOM):
A software bill of materials (SBOM) is defined as a nested inventory for software, a list of ingredients that make up software components. When you receive a physical delivery of some sort, the bill of materials tells you what s inside the box. Similarly, when you use software created outside of your organisation, the SBOM tells you what s inside that software. The SBOM is a file that declares the software supply chain (SSC) for that specific piece of software. [ ]

Several distributions noticed recent versions of the Linux Kernel are no longer reproducible because the BPF Type Format (BTF) metadata is not generated in a deterministic way. This was discussed on the #reproducible-builds IRC channel, but no solution appears to be in sight for now.

Community news On our mailing list this month: Holger Levsen gave a talk at foss-north 2023 in Gothenburg, Sweden on the topic of Reproducible Builds, the first ten years. Lastly, there were a number of updates to our website, including:
  • Chris Lamb attempted a number of ways to try and fix literal : .lead appearing in the page [ ][ ][ ], made all the Back to who is involved links italics [ ], and corrected the syntax of the _data/sponsors.yml file [ ].
  • Holger Levsen added his recent talk [ ], added Simon Josefsson, Mike Perry and Seth Schoen to the contributors page [ ][ ][ ], reworked the People page a little [ ] [ ], as well as fixed spelling of Arch Linux [ ].
Lastly, Mattia Rizzolo moved some old sponsors to a former section [ ] and Simon Josefsson added Trisquel GNU/Linux. [ ]

Debian
  • Vagrant Cascadian reported on the Debian s build-essential package set, which was inspired by how close we are to making the Debian build-essential set reproducible and how important that set of packages are in general . Vagrant mentioned that: I have some progress, some hope, and I daresay, some fears . [ ]
  • Debian Developer Cyril Brulebois (kibi) filed a bug against snapshot.debian.org after they noticed that there are many missing dinstalls that is to say, the snapshot service is not capturing 100% of all of historical states of the Debian archive. This is relevant to reproducibility because without the availability historical versions, it is becomes impossible to repeat a build at a future date in order to correlate checksums. .
  • 20 reviews of Debian packages were added, 21 were updated and 5 were removed this month adding to our knowledge about identified issues. Chris Lamb added a new build_path_in_line_annotations_added_by_ruby_ragel toolchain issue. [ ]
  • Mattia Rizzolo announced that the data for the stretch archive on tests.reproducible-builds.org has been archived. This matches the archival of stretch within Debian itself. This is of some historical interest, as stretch was the first Debian release regularly tested by the Reproducible Builds project.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope development diffoscope version 241 was uploaded to Debian unstable by Chris Lamb. It included contributions already covered in previous months as well a change by Chris Lamb to add a missing raise statement that was accidentally dropped in a previous commit. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In April, a number of changes were made, including:
  • Holger Levsen:
    • Significant work on a new Documented Jenkins Maintenance (djm) script to support logged maintenance of nodes, etc. [ ][ ][ ][ ][ ][ ]
    • Add the new APT repo url for Jenkins itself with a new signing key. [ ][ ]
    • In the Jenkins shell monitor, allow 40 GiB of files for diffoscope for the Debian experimental distribution as Debian is frozen around the release at the moment. [ ]
    • Updated Arch Linux testing to cleanup leftover files left in /tmp/archlinux-ci/ after three days. [ ][ ][ ]
    • Mark a number of nodes hosted by Oregon State University Open Source Lab (OSUOSL) as online and offline. [ ][ ][ ]
    • Update the node health checks to detect failures to end schroot sessions. [ ]
    • Filter out another duplicate contributor from the contributor statistics. [ ]
  • Mattia Rizzolo:



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

22 January 2023

Dirk Eddelbuettel: Rcpp 1.0.10 on CRAN: Regular Update

rcpp logo The Rcpp team is thrilled to announce the newest release 1.0.10 of the Rcpp package which is hitting CRAN now and will go to Debian shortly. Windows and macOS builds should appear at CRAN in the next few days, as will builds in different Linux distribution and of course at r2u. The release was prepared a few days ago, but given the widespread use at CRAN it took a few days to be processed. As always, our sincere thanks to the CRAN maintainers Uwe Ligges and Kurt Hornik. This release continues with the six-months cycle started with release 1.0.5 in July 2020. As a reminder, we do of course make interim snapshot dev or rc releases available via the Rcpp drat repo and strongly encourage their use and testing I run my systems with these versions which tend to work just as well, and are also fully tested against all reverse-dependencies. Rcpp has become the most popular way of enhancing R with C or C++ code. Right now, around 2623 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 252 in BioConductor. On CRAN, 13.7% of all packages depend (directly) on CRAN, and 58.7% of all compiled packages do. From the cloud mirror of CRAN (which is but a subset of all CRAN downloads), Rcpp has been downloaded 67.1 million times. This release is incremental as usual, preserving existing capabilities faithfully while smoothing our corners and / or extending slightly. Of particular note is the now fully-enabled use of the unwind protection making some operations a little faster by default; special thanks to I aki for spearheading this. Kevin and I also polished a few other bugs off as detailed below. The full list of details follows.

Changes in Rcpp release version 1.0.10 (2023-01-12)
  • Changes in Rcpp API:
    • Unwind protection is enabled by default (I aki in #1225). It can be disabled by defining RCPP_NO_UNWIND_PROTECT before including Rcpp.h. RCPP_USE_UNWIND_PROTECT is not checked anymore and has no effect. The associated plugin unwindProtect is therefore deprecated and will be removed in a future release.
    • The 'finalize' method for Rcpp Modules is now eagerly materialized, fixing an issue where errors can occur when Module finalizers are run (Kevin in #1231 closing #1230).
    • Zero-row data.frame objects can receive push_back or push_front (Dirk in #1233 fixing #1232).
    • One remaining sprintf has been replaced by snprintf (Dirk and Kevin in #1236 and #1237).
    • Several conversion warnings found by clang++ have been addressed (Dirk in #1240 and #1241).
  • Changes in Rcpp Attributes:
    • The C++20, C++2b (experimental) and C++23 standards now have plugin support like the other C++ standards (Dirk in #1228).
    • The source path for attributes received one more protection from spaces (Dirk in #1235 addressing #1234).
  • Changes in Rcpp Deployment:
    • Several GitHub Actions have been updated.

Thanks to my CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bugs reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under rcpp tag at StackOverflow which also allows searching among the (currently) 2932 previous questions. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

7 January 2023

Reproducible Builds: Reproducible Builds in December 2022

Welcome to the December 2022 report from the Reproducible Builds project.
We are extremely pleased to announce that the dates for the Reproducible Builds Summit in 2023 have been announced in 2022 already: We plan to spend three days continuing to the grow of the Reproducible Builds effort. As in previous events, the exact content of the meeting will be shaped by the participants. And, as mentioned in Holger Levsen s post to our mailing list, the dates have been booked and confirmed with the venue, so if you are considering attending, please reserve these dates in your calendar today.
R my Gr nblatt, an associate professor in the T l com Sud-Paris engineering school wrote up his pain points of using Nix and NixOS. Although some of the points do not touch on reproducible builds, R my touches on problems he has encountered with the different kinds of reproducibility that these distributions appear to promise including configuration files affecting the behaviour of systems, the fragility of upstream sources as well as the conventional idea of binary reproducibility.
Morten Linderud reported that he is quietly optimistic that if Go programming language resolves all of its issues with reproducible builds (tracking issue) then the Go binaries distributed from Google and by Arch Linux may be bit-for-bit identical. It s just a bit early to sorta figure out what roadblocks there are. [But] Go bootstraps itself every build, so in theory I think it should be possible.
On December 15th, Holger Levsen published an in-depth interview he performed with David A. Wheeler on supply-chain security and reproducible builds, but it also touches on the biggest challenges in computing as well. This is part of a larger series of posts featuring the projects, companies and individuals who support the Reproducible Builds project. Other instalments include an article featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation as well as a recent ones about ARDC, the Google Open Source Security Team (GOSST), Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix and Hans-Christoph Steiner of the F-Droid project.
A number of changes were made to the Reproducible Builds website and documentation this month, including FC Stegerman adding an F-Droid/apksigcopier example to our embedded signatures page [ ], Holger Levsen making a large number of changes related to the 2022 summit in Venice as well as 2023 s summit in Hamburg [ ][ ][ ][ ] and Simon Butler updated our publications page [ ][ ].
On our mailing list this month, James Addison asked a question about whether there has been any effort to trace the files used by a build system in order to identify the corresponding build-dependency packages. [ ] In addition, Bernhard M. Wiedemann then posed a thought-provoking question asking How to talk to skeptics? , which was occasioned by a colleague who had published a blog post in May 2021 skeptical of reproducible builds. The thread generated a number of replies.

Android news obfusk (FC Stegerman) performed a thought-provoking review of tools designed to determine the difference between two different .apk files shipped by a number of free-software instant messenger applications. These scripts are often necessary in the Android/APK ecosystem due to these files containing embedded signatures so the conventional bit-for-bit comparison cannot be used. After detailing a litany of issues with these tools, they come to the conclusion that:
It s quite possible these messengers actually have reproducible builds, but the verification scripts they use don t actually allow us to verify whether they do.
This reflects the consensus view within the Reproducible Builds project: pursuing a situation in language or package ecosystems where binaries are bit-for-bit identical (over requiring a bespoke ecosystem-specific tool) is not a luxury demanded by purist engineers, but rather the only practical way to demonstrate reproducibility. obfusk also announced the first release of their own set of tools on our mailing list. Related to this, obfusk also posted to an issue filed against Mastodon regarding the difficulties of creating bit-by-bit identical APKs, especially with respect to copying v2/v3 APK signatures created by different tools; they also reported that some APK ordering differences were not caused by building on macOS after all, but by using Android Studio [ ] and that F-Droid added 16 more apps published with Reproducible Builds in December.

Debian As mentioned in last months report, Vagrant Cascadian has been organising a series of online sprints in order to clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads). During December, meetings were held on the 1st, 8th, 15th, 22nd and 29th, resulting in a large number of uploads and bugs being addressed: The next sprint is due to take place this coming Tuesday, January 10th at 16:00 UTC.

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, the following changes were made by Holger Levsen:
  • The osuosl167 machine is no longer a openqa-worker node anymore. [ ][ ]
  • Detect problems with APT repository signatures [ ] and update a repository signing key [ ].
  • reproducible Debian builtin-pho: improve job output. [ ]
  • Only install the foot-terminfo package on Debian systems. [ ]
In addition, Mattia Rizzolo added support for the version of diffoscope in Debian stretch which doesn t support the --timeout flag. [ ][ ]

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions 228, 229 and 230 to Debian:
  • Fix compatibility with file(1) version 5.43, with thanks to Christoph Biedl. [ ]
  • Skip the test_html.py::test_diff test if html2text is not installed. (#1026034)
  • Update copyright years. [ ]
In addition, Jelle van der Waa added support for Berkeley DB version 6. [ ] Orthogonal to this, Holger Levsen bumped the Debian Standards-Version on all of our packages, including diffoscope [ ], strip-nondeterminism [ ], disorderfs [ ] and reprotest [ ].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can get in touch with us via:

15 December 2022

Reproducible Builds: Supporter spotlight: David A. Wheeler on supply chain security

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the sixth instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation as well as a recent ones about ARDC, the Google Open Source Security Team (GOSST), Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix and Hans-Christoph Steiner of the F-Droid project. Today, however, we will be talking with David A. Wheeler, the Director of Open Source Supply Chain Security at the Linux Foundation.

Holger Levsen: Welcome, David, thanks for taking the time to talk with us today. First, could you briefly tell me about yourself? David: Sure! I m David A. Wheeler and I work for the Linux Foundation as the Director of Open Source Supply Chain Security. That just means that my job is to help open source software projects improve their security, including its development, build, distribution, and incorporation in larger works, all the way out to its eventual use by end-users. In my copious free time I also teach at George Mason University (GMU); in particular, I teach a graduate course on how to design and implement secure software. My background is technical. I have a Bachelor s in Electronics Engineering, a Master s in Computer Science and a PhD in Information Technology. My PhD dissertation is connected to reproducible builds. My PhD dissertation was on countering the Trusting Trust attack, an attack that subverts fundamental build system tools such as compilers. The attack was discovered by Karger & Schell in the 1970s, and later demonstrated & popularized by Ken Thompson. In my dissertation on trusting trust I showed that a process called Diverse Double-Compiling (DDC) could detect trusting trust attacks. That process is a specialized kind of reproducible build specifically designed to detect trusting trust style attacks. In addition, countering the trusting trust attack primarily becomes more important only when reproducible builds become more common. Reproducible builds enable detection of build-time subversions. Most attackers wouldn t bother with a trusting trust attack if they could just directly use a build-time subversion of the software they actually want to subvert.
Holger: Thanks for taking the time to introduce yourself to us. What do you think are the biggest challenges today in computing? There are many big challenges in computing today. For example:
Holger: Do you think reproducible builds are an important part in secure computing today already? David: Yes, but first let s put things in context. Today, when attackers exploit software vulnerabilities, they re primarily exploiting unintentional vulnerabilities that were created by the software developers. There are a lot of efforts to counter this: We re just starting to get better at this, which is good. However, attackers always try to attack the easiest target. As our deployed software has started to be hardened against attack, attackers have dramatically increased their attacks on the software supply chain (Sonatype found in 2022 that there s been a 742% increase year-over-year). The software supply chain hasn t historically gotten much attention, making it the easy target. There are simple supply chain attacks with simple solutions: Unfortunately, attackers know there are other lines of attack. One of the most dangerous is subverted build systems, as demonstrated by the subversion of SolarWinds Orion system. In a subverted build system, developers can review the software source code all day and see no problem, because there is no problem there. Instead, the process to convert source code into the code people run, called the build system , is subverted by an attacker. One solution for countering subverted build systems is to make the build systems harder to attack. That s a good thing to do, but you can never be confident that it was good enough . How can you be sure it s not subverted, if there s no way to know? A stronger defense against subverted build systems is the idea of verified reproducible builds. A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts. A build is verified if multiple different parties verify that they get the same result for that situation. When you have a verified reproducible build, either all the parties colluded (and you could always double-check it yourself), or the build process isn t subverted. There is one last turtle: What if the build system tools or machines are subverted themselves? This is not a common attack today, but it s important to know if we can address them when the time comes. The good news is that we can address this. For some situations reproducible builds can also counter such attacks. If there s a loop (that is, a compiler is used to generate itself), that s called the trusting trust attack, and that is more challenging. Thankfully, the trusting trust attack has been known about for decades and there are known solutions. The diverse double-compiling (DDC) process that I explained in my PhD dissertation, as well as the bootstrappable builds process, can both counter trusting trust attacks in the software space. So there is no reason to lose hope: there is a bottom turtle , as it were.
Holger: Thankfully, this has all slowly started to change and supply chain issues are now widely discussed, as evident by efforts like Securing the Software Supply Chain: Recommended Practices Guide for Developers which you shared on our mailing list. In there, Reproducible Builds are mentioned as recommended advanced practice, which is both pretty cool (we ve come a long way!), but to me it also sounds like this will take another decade until it s become standard normal procedure. Do you agree on that timeline? David: I don t think there will be any particular timeframe. Different projects and ecosystems will move at different speeds. I wouldn t be surprised if it took a decade or so for them to become relatively common there are good reasons for that. Today the most common kinds of attacks based on software vulnerabilities still involve unintentional vulnerabilities in operational systems. Attackers are starting to apply supply chain attacks, but the top such attacks today are typosquatting (creating packages with similar names) and dependency confusion) (convincing projects to download packages from the wrong repositories). Reproducible builds don t counter those kinds of attacks, they counter subverted builds. It s important to eventually have verified reproducible builds, but understandably other issues are currently getting prioritized first. That said, reproducible builds are important long term. Many people are working on countering unintentional vulnerabilities and the most common kinds of supply chain attacks. As these other threats are countered, attackers will increasingly target build systems. Attackers always go for the weakest link. We will eventually need verified reproducible builds in many situations, and it ll take a while to get build systems able to widely perform reproducible builds, so we need to start that work now. That s true for anything where you know you ll need it but it will take a long time to get ready you need to start now.
Holger: What are your suggestions to accelerate adoption? David: Reproducible builds need to be: I think there s a snowball effect. Once many projects packages are reproducible, it will be easier to convince other projects to make their packages reproducible. I also think there should be some prioritization. If a package is in wide use (e.g., part of minimum set of packages for a widely-used Linux distribution or framework), its reproducibility should be a special focus. If a package is vital for supporting some societally important critical infrastructure (e.g., running dams), it should also be considered important. You can then work on the ones that are less important over time.
Holger: How is the Best Practices Badge going? How many projects are participating and how many are missing? David: It s going very well. You can see some automatically-generated statistics, showing we have over 5,000 projects, adding more than 1/day on average. We have more than 900 projects that have earned at least the passing badge level.
Holger: How many of the projects participating in the Best Practices badge engaging with reproducible builds? David: As of this writing there are 168 projects that report meeting the reproducible builds criterion. That s a relatively small percentage of projects. However, note that this criterion (labelled build_reproducible) is only required for the gold badge. It s not required for the passing or silver level badge. Currently we ve been strategically focused on getting projects to at least earn a passing badge, and less on earning silver or gold badges. We would love for all projects to get earn a silver or gold badge, of course, but our theory is that projects that can t even earn a passing badge present the most risk to their users. That said, there are some projects we especially want to see implementing higher badge levels. Those include projects that are very widely used, so that vulnerabilities in them can impact many systems. Examples of such projects include the Linux kernel and curl. In addition, some projects are used within systems where it s important to society that they not have serious security vulnerabilities. Examples include projects used by chemical manufacturers, financial systems and weapons. We definitely encourage any of those kinds of projects to earn higher badge levels.
Holger: Many thanks for this interview, David, and for all of your work at the Linux Foundation and elsewhere!




For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

11 November 2022

Reproducible Builds: Reproducible Builds in October 2022

Welcome to the Reproducible Builds report for October 2022! In these reports we attempt to outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Our in-person summit this year was held in the past few days in Venice, Italy. Activity and news from the summit will therefore be covered in next month s report!
A new article related to reproducible builds was recently published in the 2023 IEEE Symposium on Security and Privacy. Titled Taxonomy of Attacks on Open-Source Software Supply Chains and authored by Piergiorgio Ladisa, Henrik Plate, Matias Martinez and Olivier Barais, their paper:
[ ] proposes a general taxonomy for attacks on opensource supply chains, independent of specific programming languages or ecosystems, and covering all supply chain stages from code contributions to package distribution.
Taking the form of an attack tree, the paper covers 107 unique vectors linked to 94 real world supply-chain incidents which is then mapped to 33 mitigating safeguards including, of course, reproducible builds:
Reproducible Builds received a very high utility rating (5) from 10 participants (58.8%), but also a high-cost rating (4 or 5) from 12 (70.6%). One expert commented that a reproducible build like used by Solarwinds now, is a good measure against tampering with a single build system and another claimed this is going to be the single, biggest barrier .

It was noticed this month that Solarwinds published a whitepaper back in December 2021 in order to:
[ ] illustrate a concerning new reality for the software industry and illuminates the increasingly sophisticated threats made by outside nation-states to the supply chains and infrastructure on which we all rely.
The 12-month anniversary of the 2020 Solarwinds attack (which SolarWinds Worldwide LLC itself calls the SUNBURST attack) was, of course, the likely impetus for publication.
Whilst collaborating on making the Cyrus IMAP server reproducible, Ellie Timoney asked why the Reproducible Builds testing framework uses two remarkably distinctive build paths when attempting to flush out builds that vary on the absolute system path in which they were built. In the case of the Cyrus IMAP server, these happened to be: Asked why they vary in three different ways, Chris Lamb listed in detail the motivation behind to each difference.
On our mailing list this month:
The Reproducible Builds project is delighted to welcome openEuler to the Involved projects page [ ]. openEuler is Linux distribution developed by Huawei, a counterpart to it s more commercially-oriented EulerOS.

Debian Colin Watson wrote about his experience towards making the databases generated by the man-db UNIX manual page indexing tool:
One of the people working on [reproducible builds] noticed that man-db s database files were an obstacle to [reproducibility]: in particular, the exact contents of the database seemed to depend on the order in which files were scanned when building it. The reporter proposed solving this by processing files in sorted order, but I wasn t keen on that approach: firstly because it would mean we could no longer process files in an order that makes it more efficient to read them all from disk (still valuable on rotational disks), but mostly because the differences seemed to point to other bugs.
Colin goes on to describe his approach to solving the problem, including fixing various fits of internal caching, and he ends his post with None of this is particularly glamorous work, but it paid off .
Vagrant Cascadian announced on our mailing list another online sprint to help clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads). The first such sprint took place on September 22nd, but another was held on October 6th, and another small one on October 20th. This resulted in the following progress:
41 reviews of Debian packages were added, 62 were updated and 12 were removed this month adding to our knowledge about identified issues. A number of issue types were updated too. [1][ ]
Lastly, Luca Boccassi submitted a patch to debhelper, a set of tools used in the packaging of the majority of Debian packages. The patch addressed an issue in the dh_installsysusers utility so that the postinst post-installation script that debhelper generates the same data regardless of the underlying filesystem ordering.

Other distributions F-Droid is a community-run app store that provides free software applications for Android phones. This month, F-Droid changed their documentation and guidance to now explicitly encourage RB for new apps [ ][ ], and FC Stegerman created an extremely in-depth issue on GitLab concerning the APK signing block. You can read more about F-Droid s approach to reproducibility in our July 2022 interview with Hans-Christoph Steiner of the F-Droid Project. In openSUSE, Bernhard M. Wiedemann published his usual openSUSE monthly report.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 224 and 225 to Debian:
  • Add support for comparing the text content of HTML files using html2text. [ ]
  • Add support for detecting ordering-only differences in XML files. [ ]
  • Fix an issue with detecting ordering differences. [ ]
  • Use the capitalised version of Ordering consistently everywhere in output. [ ]
  • Add support for displaying font metadata using ttx(1) from the fonttools suite. [ ]
  • Testsuite improvements:
    • Temporarily allow the stable-po pipeline to fail in the CI. [ ]
    • Rename the order1.diff test fixture to json_expected_ordering_diff. [ ]
    • Tidy the JSON tests. [ ]
    • Use assert_diff over get_data and an manual assert within the XML tests. [ ]
    • Drop the ALLOWED_TEST_FILES test; it was mostly just annoying. [ ]
    • Tidy the tests/test_source.py file. [ ]
Chris Lamb also added a link to diffoscope s OpenBSD packaging on the diffoscope.org homepage [ ] and Mattia Rizzolo fix an test failure that was occurring under with LLVM 15 [ ].

Testing framework The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, the following changes were made by Holger Levsen:
  • Run the logparse tool to analyse results on the Debian Edu build logs. [ ]
  • Install btop(1) on all nodes running Debian. [ ]
  • Switch Arch Linux from using SHA1 to SHA256. [ ]
  • When checking Debian debstrap jobs, correctly log the tool usage. [ ]
  • Cleanup more task-related temporary directory names when testing Debian packages. [ ][ ]
  • Use the cdebootstrap-static binary for the 2nd runs of the cdebootstrap tests. [ ]
  • Drop a workaround when testing OpenWrt and coreboot as the issue in diffoscope has now been fixed. [ ]
  • Turn on an rm(1) warning into an info -level message. [ ]
  • Special case the osuosl168 node for running Debian bookworm already. [ ][ ]
  • Use the new non-free-firmware suite on the o168 node. [ ]
In addition, Mattia Rizzolo made the following changes:
  • Ensure that 2nd build has a merged /usr. [ ]
  • Only reconfigure the usrmerge package on Debian bookworm and above. [ ]
  • Fix bc(1) syntax in the computation of the percentage of unreproducible packages in the dashboard. [ ][ ][ ]
  • In the index_suite_ pages, order the package status to be the same order of the menu. [ ]
  • Pass the --distribution parameter to the pbuilder utility. [ ]
Finally, Roland Clobus continued his work on testing Live Debian images. In particular, he extended the maintenance script to warn when workspace directories cannot be deleted. [ ]
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

24 June 2022

Reproducible Builds: Supporter spotlight: Hans-Christoph Steiner of the F-Droid project

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the fifth instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation as well as a recent ones about ARDC, the Google Open Source Security Team (GOSST) and Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix. Today, however, we will be talking with Hans-Christoph Steiner from the F-Droid project.

Chris: Welcome, Hans! Could you briefly tell me about yourself? Hans: Sure. I spend most of my time trying to private communications software usable by everyone, designing interactive software with a focus on human perceptual capabilities and building networks with free software. I ve been involved in Debian since approximately 2008 and became an official Debian developer in 2011. In the little time left over from that, I sometimes compose music with computers from my home in Austria.
Chris: For those who have not heard of it before, what is the F-Droid project? Hans: F-Droid is a community-run app store that provides free software applications for Android phones. First and foremost our goal is to represent our users. In particular, we review all of the apps that we distribute, and these reviews not only check for whether something is definitely free software or not, we additionally look for ethical problems with applications as well issues that we call Anti-Features . Since the project began in 2010, F-Droid now offers almost 4,000 free-software applications. F-Droid is also a app store kit as well, providing all the tools that are needed to operate an free app store of your own. It also includes complete build and release tools for managing the process of turning app source code into published builds.
Chris: How exactly does F-Droid differ from the Google Play store? Why might someone use F-Droid over Google Play? Hans: One key difference to the Google Play Store is that F-Droid does not ship proprietary software by default. All apps shipped from f-droid.org are built from source on our own builders. This is partly because F-Droid is backed by the free software community; that is, people who have engaged in the free software community long before Android was conceived, and, in particular, share many if not all of its values. Using F-Droid will therefore feel very familiar to anyone familiar with a modern Linux distribution.
Chris: How do you describe reproducibility from the F-Droid point of view? Hans: All centralised software repositories are extremely tempting targets for exploitation by malicious third parties, and the kinds of personal or otherwise sensitive data on our phones only increase this risk. In F-Droid s case, not only could the software we distribute be theoretically replaced with nefarious copies, the build infrastructure that generates that software could be compromised as well. F-Droid having reproducible builds is extremely important as it allows us to verify that our build systems have not been compromised and distributing malware to our users. In particular, if an independent build infrastructure can produce precisely the same results from a build, then we can be reasonably sure that they haven t been compromised. Technically-minded users can also validate their builds on their own systems too, further increasing trust in our build infrastructure. (This can be performed using fdroid verify command.) Our signature & trust scheme means that F-Droid can verify that an app is 100% free software whilst still using the developer s original .APK file. More details about this may be found in our reproducibility documentation and on the page about our Verification Server.
Chris: How do you see F-Droid fitting into the rest of the modern security ecosystem? Hans: Whilst F-Droid inherits all of the social benefits of free software, F-Droid takes special care to respect your privacy as well we don t attempt to track your device in any way. In particular, you don t need an account to use the F-Droid client, and F-Droid doesn t send any device-identifying information to our servers other than its own version number. What is more, we mark all apps in our repository that track you, and users can choose to hide any apps that has been tagged with a specific Anti-Feature in the F-Droid settings. Furthermore, any personal data you decide to give us (such as your email address when registering for a forum account) goes no further than us as well, and we pledge that it will never be used for anything beyond merely maintaining your account.
Chris: What would fully reproducible mean to F-Droid? What it would look like if reproducibility was a solved problem ? Or, rather, what would be your ultimate reproducibility goals? Hans: In an ideal world, every application submitted to F-Droid would not only build reproducibly, but would come with a cryptographically-signed signature from the developer as well. Then we would only distribute an compiled application after a build had received a number of matching signatures from multiple, independent third parties. This would mean that our users were not placing their trust solely in software developers machines, and wouldn t be trusting our centralised build servers as well.
Chris: What are the biggest blockers to reaching this state? Are there any key steps or milestones to get there? Hans: Time is probably the main constraint to reaching this goal. Not only do we need system administrators on an ongoing basis but we also need to incorporate reproducibly checks into our Continuous Integration (CI) system. We are always looking for new developers to join our effort, as well as learning about how to better bring them up to speed. Separate to this, we often experience blockers with reproducibility-related bugs in the Android build tooling. Luckily, upstreams do ultimately address these issues, but in some cases this has taken three or four years to reach end-users and developers. Unfortunately, upstream is not chiefly concerned with the security aspects of reproducibility builds; they care more about how it can minimise and optimise download size and speed.
Chris: Are you tracking any statistics about reproducibility in F-Droid over time? If so, can you share them? Does F-Droid track statistics about its own usage? Hans: To underline a topic touched on above, F-Droid is dedicated to preserving the privacy of its users; we therefore do not record usage statistics. This is, of course, in contrast to other application stores. However, we are in a position to track whether packages in F-Droid are reproducible or not. As in: what proportion of APKs in F-Droid have been independently verified over time? Unfortunately, due to time constraints, we do not yet automatically publish charts for this. We do publish some raw data that is related, though, and we naturally welcome contributions of visualizations based on any and all of our data. The All our APIs page on our wiki is a good place to start for someone wanting to contribute, everything about reproducible F-Droid apps is available in JSON format, what s missing are apps or dashboards making use of the available raw data.
Chris: Many thanks for this interview, Hans, and for all of your work on F-Droid and elsewhere. If someone wanted to get in touch or learn more about the project, where might someone go? Hans: The best way to find out about F-Droid is, of course, the main F-Droid homepage, but we are also on Twitter @fdroidorg. We have many avenues to participate and to learn more! We have an About page on our website and a thriving forum. We also have part of our documentation specifically dedicated to reproducible builds.


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

6 June 2022

Reproducible Builds: Reproducible Builds in May 2022

Welcome to the May 2022 report from the Reproducible Builds project. In our reports we outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Repfix paper Zhilei Ren, Shiwei Sun, Jifeng Xuan, Xiaochen Li, Zhide Zhou and He Jiang have published an academic paper titled Automated Patching for Unreproducible Builds:
[..] fixing unreproducible build issues poses a set of challenges [..], among which we consider the localization granularity and the historical knowledge utilization as the most significant ones. To tackle these challenges, we propose a novel approach [called] RepFix that combines tracing-based fine-grained localization with history-based patch generation mechanisms.
The paper (PDF, 3.5MB) uses the Debian mylvmbackup package as an example to show how RepFix can automatically generate patches to make software build reproducibly. As it happens, Reiner Herrmann submitted a patch for the mylvmbackup package which has remained unapplied by the Debian package maintainer for over seven years, thus this paper inadvertently underscores that achieving reproducible builds will require both technical and social solutions.

Python variables Johannes Schauer discovered a fascinating bug where simply naming your Python variable _m led to unreproducible .pyc files. In particular, the types module in Python 3.10 requires the following patch to make it reproducible:
--- a/Lib/types.py
+++ b/Lib/types.py
@@ -37,8 +37,8 @@ _ag = _ag()
 AsyncGeneratorType = type(_ag)
 
 class _C:
-    def _m(self): pass
-MethodType = type(_C()._m)
+    def _b(self): pass
+MethodType = type(_C()._b)
Simply renaming the dummy method from _m to _b was enough to workaround the problem. Johannes bug report first led to a number of improvements in diffoscope to aid in dissecting .pyc files, but upstream identified this as caused by an issue surrounding interned strings and is being tracked in CPython bug #78274.

New SPDX team to incorporate build metadata in Software Bill of Materials SPDX, the open standard for Software Bill of Materials (SBOM), is continuously developed by a number of teams and committees. However, SPDX has welcomed a new addition; a team dedicated to enhancing metadata about software builds, complementing reproducible builds in creating a more secure software supply chain. The SPDX Builds Team has been working throughout May to define the universal primitives shared by all build systems, including the who, what, where and how of builds:
  • Who: the identity of the person or organisation that controls the build infrastructure.
  • What: the inputs and outputs of a given build, combining metadata about the build s configuration with an SBOM describing source code and dependencies.
  • Where: the software packages making up the build system, from build orchestration tools such as Woodpecker CI and Tekton to language-specific tools.
  • How: the invocation of a build, linking metadata of a build to the identity of the person or automation tool that initiated it.
The SPDX Builds Team expects to have a usable data model by September, ready for inclusion in the SPDX 3.0 standard. The team welcomes new contributors, inviting those interested in joining to introduce themselves on the SPDX-Tech mailing list.

Talks at Debian Reunion Hamburg Some of the Reproducible Builds team (Holger Levsen, Mattia Rizzolo, Roland Clobus, Philip Rinn, etc.) met in real life at the Debian Reunion Hamburg (official homepage). There were several informal discussions amongst them, as well as two talks related to reproducible builds. First, Holger Levsen gave a talk on the status of Reproducible Builds for bullseye and bookworm and beyond (WebM, 210MB): Secondly, Roland Clobus gave a talk called Reproducible builds as applied to non-compiler output (WebM, 115MB):

Supply-chain security attacks This was another bumper month for supply-chain attacks in package repositories. Early in the month, Lance R. Vick noticed that the maintainer of the NPM foreach package let their personal email domain expire, so they bought it and now controls foreach on NPM and the 36,826 projects that depend on it . Shortly afterwards, Drew DeVault published a related blog post titled When will we learn? that offers a brief timeline of major incidents in this area and, not uncontroversially, suggests that the correct way to ship packages is with your distribution s package manager .

Bootstrapping Bootstrapping is a process for building software tools progressively from a primitive compiler tool and source language up to a full Linux development environment with GCC, etc. This is important given the amount of trust we put in existing compiler binaries. This month, a bootstrappable mini-kernel was announced. Called boot2now, it comprises a series of compilers in the form of bootable machine images.

Google s new Assured Open Source Software service Google Cloud (the division responsible for the Google Compute Engine) announced a new Assured Open Source Software service. Noting the considerable 650% year-over-year increase in cyberattacks aimed at open source suppliers, the new service claims to enable enterprise and public sector users of open source software to easily incorporate the same OSS packages that Google uses into their own developer workflows . The announcement goes on to enumerate that packages curated by the new service would be:
  • Regularly scanned, analyzed, and fuzz-tested for vulnerabilities.
  • Have corresponding enriched metadata incorporating Container/Artifact Analysis data.
  • Are built with Cloud Build including evidence of verifiable SLSA-compliance
  • Are verifiably signed by Google.
  • Are distributed from an Artifact Registry secured and protected by Google.
(Full announcement)

A retrospective on the Rust programming language Andrew bunnie Huang published a long blog post this month promising a critical retrospective on the Rust programming language. Amongst many acute observations about the evolution of the language s syntax (etc.), the post beings to critique the languages approach to supply chain security ( Rust Has A Limited View of Supply Chain Security ) and reproducibility ( You Can t Reproduce Someone Else s Rust Build ):
There s some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say almost anyone else because this fix would be OS-dependent, so we d be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.
(Full post)

Reproducible Builds IRC meeting The minutes and logs from our May 2022 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 28th June at 15:00 UTC on #reproducible-builds on the OFTC network.

A new tool to improve supply-chain security in Arch Linux kpcyrd published yet another interesting tool related to reproducibility. Writing about the tool in a recent blog post, kpcyrd mentions that although many PKGBUILDs provide authentication in the context of signed Git tags (i.e. the ability to verify the Git tag was signed by one of the two trusted keys ), they do not support pinning, ie. that upstream could create a new signed Git tag with an identical name, and arbitrarily change the source code without the [maintainer] noticing . Conversely, other PKGBUILDs support pinning but not authentication. The new tool, auth-tarball-from-git, fixes both problems, as nearly outlined in kpcyrd s original blog post.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 212, 213 and 214 to Debian unstable. Chris also made the following changes:
  • New features:
    • Add support for extracting vmlinuz Linux kernel images. [ ]
    • Support both python-argcomplete 1.x and 2.x. [ ]
    • Strip sticky etc. from x.deb: sticky Debian binary package [ ]. [ ]
    • Integrate test coverage with GitLab s concept of artifacts. [ ][ ][ ]
  • Bug fixes:
    • Don t mask differences in .zip or .jar central directory extra fields. [ ]
    • Don t show a binary comparison of .zip or .jar files if we have observed at least one nested difference. [ ]
  • Codebase improvements:
    • Substantially update comment for our calls to zipinfo and zipinfo -v. [ ]
    • Use assert_diff in test_zip over calling get_data with a separate assert. [ ]
    • Don t call re.compile and then call .sub on the result; just call re.sub directly. [ ]
    • Clarify the comment around the difference between --usage and --help. [ ]
  • Testsuite improvements:
    • Test --help and --usage. [ ]
    • Test that --help includes the file formats. [ ]
Vagrant Cascadian added an external tool reference xb-tool for GNU Guix [ ] as well as updated the diffoscope package in GNU Guix itself [ ][ ][ ].

Distribution work In Debian, 41 reviews of Debian packages were added, 85 were updated and 13 were removed this month adding to our knowledge about identified issues. A number of issue types have been updated, including adding a new nondeterministic_ordering_in_deprecated_items_collected_by_doxygen toolchain issue [ ] as well as ones for mono_mastersummary_xml_files_inherit_filesystem_ordering [ ], extended_attributes_in_jar_file_created_without_manifest [ ] and apxs_captures_build_path [ ]. Vagrant Cascadian performed a rough check of the reproducibility of core package sets in GNU Guix, and in openSUSE, Bernhard M. Wiedemann posted his usual monthly reproducible builds status report.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducible builds website Chris Lamb updated the main Reproducible Builds website and documentation in a number of small ways, but also prepared and published an interview with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix. [ ][ ][ ][ ] In addition, Tim Jones added a link to the Talos Linux project [ ] and billchenchina fixed a dead link [ ].

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Add support for detecting running kernels that require attention. [ ]
    • Temporarily configure a host to support performing Debian builds for packages that lack .buildinfo files. [ ]
    • Update generated webpages to clarify wishes for feedback. [ ]
    • Update copyright years on various scripts. [ ]
  • Mattia Rizzolo:
    • Provide a facility so that Debian Live image generation can copy a file remotely. [ ][ ][ ][ ]
  • Roland Clobus:
    • Add initial support for testing generated images with OpenQA. [ ]
And finally, as usual, node maintenance was also performed by Holger Levsen [ ][ ].

Misc news On our mailing list this month:

Contact If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

18 May 2022

Reproducible Builds: Supporter spotlight: Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the fourth instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation as well as a recent ones about ARDC and the Google Open Source Security Team (GOSST). Today, however, we will be talking with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix.
Chris Lamb: Hi Jan, thanks for taking the time to talk with us today. First, could you briefly tell me about yourself? Jan: Thanks for the chat; it s been a while! Well, I ve always been trying to find something new and interesting that is just asking to be created but is mostly being overlooked. That s how I came to work on GNU Guix and create GNU Mes to address the bootstrapping problem that we have in free software. It s also why I have been working on releasing Dezyne, a programming language and set of tools to specify and formally verify concurrent software systems as free software. Briefly summarised, compilers are often written in the language they are compiling. This creates a chicken-and-egg problem which leads users and distributors to rely on opaque, pre-built binaries of those compilers that they use to build newer versions of the compiler. To gain trust in our computing platforms, we need to be able to tell how each part was produced from source, and opaque binaries are a threat to user security and user freedom since they are not auditable. The goal of bootstrappability (and the bootstrappable.org project in particular) is to minimise the amount of these bootstrap binaries. Anyway, after studying Physics at Eindhoven University of Technology (TU/e), I worked for digicash.com, a startup trying to create a digital and anonymous payment system sadly, however, a traditional account-based system won. Separate to this, as there was no software (either free or proprietary) to automatically create beautiful music notation, together with Han-Wen Nienhuys, I created GNU LilyPond. Ten years ago, I took the initiative to co-found a democratic school in Eindhoven based on the principles of sociocracy. And last Christmas I finally went vegan, after being mostly vegetarian for about 20 years!
Chris: For those who have not heard of it before, what is GNU Guix? What are the key differences between Guix and other Linux distributions? Jan: GNU Guix is both a package manager and a full-fledged GNU/Linux distribution. In both forms, it provides state-of-the-art package management features such as transactional upgrades and package roll-backs, hermetical-sealed build environments, unprivileged package management as well as per-user profiles. One obvious difference is that Guix forgoes the usual Filesystem Hierarchy Standard (ie. /usr, /lib, etc.), but there are other significant differences too, such as Guix being scriptable using Guile/Scheme, as well as Guix s dedication and focus on free software.
Chris: How does GNU Guix relate to GNU Mes? Or, rather, what problem is Mes attempting to solve? Jan: GNU Mes was created to address the security concerns that arise from bootstrapping an operating system such as Guix. Even if this process entirely involves free software (i.e. the source code is, at least, available), this commonly uses large and unauditable binary blobs. Mes is a Scheme interpreter written in a simple subset of C and a C compiler written in Scheme, and it comes with a small, bootstrappable C library. Twice, the Mes bootstrap has halved the size of opaque binaries that were needed to bootstrap GNU Guix. These reductions were achieved by first replacing GNU Binutils, GNU GCC and the GNU C Library with Mes, and then replacing Unix utilities such as awk, bash, coreutils, grep sed, etc., by Gash and Gash-Utils. The final goal of Mes is to help create a full-source bootstrap for any interested UNIX-like operating system.
Chris: What is the current status of Mes? Jan: Mes supports all that is needed from R5RS and GNU Guile to run MesCC with Nyacc, the C parser written for Guile, for 32-bit x86 and ARM. The next step for Mes would be more compatible with Guile, e.g., have guile-module support and support running Gash and Gash Utils. In working to create a full-source bootstrap, I have disregarded the kernel and Guix build system for now, but otherwise, all packages should be built from source, and obviously, no binary blobs should go in. We still need a Guile binary to execute some scripts, and it will take at least another one to two years to remove that binary. I m using the 80/20 approach, cutting corners initially to get something working and useful early. Another metric would be how many architectures we have. We are quite a way with ARM, tinycc now works, but there are still problems with GCC and Glibc. RISC-V is coming, too, which could be another metric. Someone has looked into picking up NixOS this summer. How many distros do anything about reproducibility or bootstrappability? The bootstrappability community is so small that we don t need metrics, sadly. The number of bytes of binary seed is a nice metric, but running the whole thing on a full-fledged Linux system is tough to put into a metric. Also, it is worth noting that I m developing on a modern Intel machine (ie. a platform with a management engine), that s another key component that doesn t have metrics.
Chris: From your perspective as a Mes/Guix user and developer, what does reproducibility mean to you? Are there any related projects? Jan: From my perspective, I m more into the problem of bootstrapping, and reproducibility is a prerequisite for bootstrappability. Reproducibility clearly takes a lot of effort to achieve, however. It s relatively easy to install some Linux distribution and be happy, but if you look at communities that really care about security, they are investing in reproducibility and other ways of improving the security of their supply chain. Projects I believe are complementary to Guix and Mes include NixOS, Debian and on the hardware side the RISC-V platform shares many of our core principles and goals.
Chris: Well, what are these principles and goals? Jan: Reproducibility and bootstrappability often feel like the next step in the frontier of free software. If you have all the sources and you can t reproduce a binary, that just doesn t feel right anymore. We should start to desire (and demand) transparent, elegant and auditable software stacks. To a certain extent, that s always been a low-level intent since the beginning of free software, but something clearly got lost along the way. On the other hand, if you look at the NPM or Rust ecosystems, we see a world where people directly install binaries. As they are not as supportive of copyleft as the rest of the free software community, you can see that movement and people in our area are doing more as a response to that so that what we have continues to remain free, and to prevent us from falling asleep and waking up in a couple of years and see, for example, Rust in the Linux kernel and (more importantly) we require big binary blobs to use our systems. It s an excellent time to advance right now, so we should get a foothold in and ensure we don t lose any more.
Chris: What would be your ultimate reproducibility goal? And what would the key steps or milestones be to reach that? Jan: The ultimate goal would be to have a system built with open hardware, with all software on it fully bootstrapped from its source. This bootstrap path should be easy to replicate and straightforward to inspect and audit. All fully reproducible, of course! In essence, we would have solved the supply chain security problem. Our biggest challenge is ignorance. There is much unawareness about the importance of what we are doing. As it is rather technical and doesn t really affect everyday computer use, that is not surprising. This unawareness can be a great force driving us in the opposite direction. Think of Rust being allowed in the Linux kernel, or Python being required to build a recent GNU C library (glibc). Also, the fact that companies like Google/Apple still want to play us vs them , not willing to to support GPL software. Not ready yet to truly support user freedom. Take the infamous log4j bug everyone is using open source these days, but nobody wants to take responsibility and help develop or nurture the community. Not ecosystem , as that s how it s being approached right now: live and let live/die: see what happens without taking any responsibility. We are growing and we are strong and we can do a lot but if we have to work against those powers, it can become problematic. So, let s spread our great message and get more people involved!
Chris: What has been your biggest win? Jan: From a technical point of view, the full-source bootstrap has have been our biggest win. A talk by Carl Dong at the 2019 Breaking Bitcoin conference stated that connecting Jeremiah Orian s Stage0 project to Mes would be the holy grail of bootstrapping, and we recently managed to achieve just that: in other words, starting from hex0, 357-byte binary, we can now build the entire Guix system. This past year we have not made significant visible progress, however, as our funding was unfortunately not there. The Stage0 project has advanced in RISC-V. A month ago, though, I secured NLnet funding for another year, and thanks to NLnet, Ekaitz Zarraga and Timothy Sample will work on GNU Mes and the Guix bootstrap as well. Separate to this, the bootstrappable community has grown a lot from two people it was six years ago: there are now currently over 100 people in the #bootstrappable IRC channel, for example. The enlarged community is possibly an even more important win going forward.
Chris: How realistic is a 100% bootstrappable toolchain? And from someone who has been working in this area for a while, is solving Trusting Trust) actually feasible in reality? Jan: Two answers: Yes and no, it really depends on your definition. One great thing is that the whole Stage0 project can also run on the Knight virtual machine, a hardware platform that was designed, I think, in the 1970s. I believe we can and must do better than we are doing today, and that there s a lot of value in it. The core issue is not the trust; we can probably all trust each other. On the other hand, we don t want to trust each other or even ourselves. I am not, personally, going to inspect my RISC-V laptop, and other people create the hardware and do probably not want to inspect the software. The answer comes back to being conscientious and doing what is right. Inserting GCC as a binary blob is not right. I think we can do better, and that s what I d like to do. The security angle is interesting, but I don t like the paranoid part of that; I like the beauty of what we are creating together and stepwise improving on that.
Chris: Thanks for taking the time to talk to us today. If someone wanted to get in touch or learn more about GNU Guix or Mes, where might someone go? Jan: Sure! First, check out: I m also on Twitter (@janneke_gnu) and on octodon.social (@janneke@octodon.social).
Chris: Thanks for taking the time to talk to us today. Jan: No problem. :)


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

1 April 2022

Russell Coker: Converting to UEFI

When I got my HP ML110 Gen9 working as a workstation I initially was under the impression that boot wasn t supported on NVMe and booted it from USB. I found USB booting with legacy boot to be unreliable so decided to try EFI booting and noticed that the NVMe devices were boot candidates with UEFI. Making one of them bootable was more complex than expected because no-one seems to have documented such things. So here s my documentation, it s not great but this method has worked once for me. Before starting major partitioning work it s best to run parted -l and save the output to a file, that can allow you to recreate partitions if you corrupt them. One thing I m doing on systems I manage is putting @reboot /usr/sbin/parted -l > /root/parted.log in the root crontab, then when the system is backed up the backup server gets any recent changes to partitioning (I don t backup /var/log on all my systems). Firstly run parted on the device to create the EFI and /boot partitions, note that if you want to copy and paste from this you must do so one line at a time, a block paste seemed to confuse parted.
mklabel gpt
mkpart EFI fat32 1 99
mkpart boot ext3 99 300
toggle 1 boot
toggle 1 esp
p
# Model: CT1000P1SSD8 (nvme)
# Disk /dev/nvme1n1: 1000GB
# Sector size (logical/physical): 512B/512B
# Partition Table: gpt
# Disk Flags: 
#
# Number  Start   End     Size    File system  Name  Flags
#  1      1049kB  98.6MB  97.5MB  fat32        EFI   boot, esp
#  2      98.6MB  300MB   201MB   ext3         boot
q
Here are the commands needed to create the filesystems and install the necessary files. This is almost to the stage of being scriptable. Some minor changes need to be made to convert from NVMe device names to SATA/SAS but nothing serious.
mkfs.vfat /dev/nvme1n1p1
mkfs.ext3 -N 1000 /dev/nvme1n1p2
file -s /dev/nvme1n1p2   sed -e s/^.*UUID/UUID/ -e "s/ .*$/ \/boot ext3 noatime 0 1/" >> /etc/fstab
file -s /dev/nvme1n1p1   tr "[a-f]" "[A-F]"  sed -e s/^.*numBEr.0x/UUID=/ -e "s/, .*$/ \/boot\/efi vfat umask=0077 0 1/" >> /etc/fstab
# edit /etc/fstab to put a hyphen between the 2 groups of 4 chars for the VFAT filesystem UUID
mount /boot
mkdir -p /boot/efi /boot/grub
mount /boot/efi
mkdir -p /boot/efi/EFI/debian
apt install efibootmgr shim-unsigned grub-efi-amd64
cp /usr/lib/shim/* /usr/lib/grub/x86_64-efi/monolithic/grubx64.efi /boot/efi/EFI/debian
file -s /dev/nvme1n1p2   sed -e "s/^.*UUID=/search.fs_uuid /" -e "s/ .needs.*$/ root hd0,gpt2/" > /boot/efi/EFI/debian/grub.cfg
echo "set prefix=(\$root)'/boot/grub'" >> /boot/efi/EFI/debian/grub.cfg
echo "configfile \$prefix/grub.cfg" >> /boot/efi/EFI/debian/grub.cfg
grub-install
update-grub
If someone would like to make a script that can handle the different partition names of regular SCSI/SATA disks, NVMe, CCISS, etc then that would be great. It would be good to have a script in Debian that creates the partitions and sets up the EFI files. If you want to have a second bootable device then the following commands will copy a GPT partition table and give it new UUIDs, make very certain that $DISKB is the one you want to be wiped and refer to my previous mention of parted -l . Also note that parted has a rescue command which works very well.
sgdisk /dev/$DISKA -R /dev/$DISKB 
sgdisk -G /dev/$DISKB
To backup a GPT partition table run a command like this. Note that if sgdisk is told to backup a MBR partitioned disk it will say Found invalid GPT and valid MBR; converting MBR to GPT forma which is probably a viable way of converting MBR format to GPT.
sgdisk -b sda.bak /dev/sda

14 April 2021

Dirk Eddelbuettel: RcppArmadillo 0.10.4.0.0 on CRAN: New Upstream Plus

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to a Matlab. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 852 other packages on CRAN. This new release brings us the just release Armadillo 10.4.0. Upstream moves at a speed that is a little faster than the cadence CRAN likes. We release RcppArmadillo 0.10.2.2.0 on March 9; and upstream 10.3.0 came out shortly thereafter. We aim to accomodate CRAN with (roughly) monthly (or less frequent) releases) so by the time we were ready 10.4.0 had just come out. As it turns, the full testing had a benefit. Among the (currently) 852 CRAN packages using RcppArmadillo, two were failing tests. This is due to a subtle, but important point. Early on we realized that it would be beneficial if the standard R control over random-number creation and seeding affected Armadillo too, which Conrad accomodated kindly with an optional RNG interface which RcppArmadillo supplies. With recent changes he made, the R side saw normally-distributed draws (via the Armadillo interface) changed, which lead to the two changes. All hail unit tests. So I mentioned this to Conrad, and with the usual Chicago-Brisbane time difference late my evening a fix was in my inbox. The CRAN upload was then halted as I had missed that due to other changes he had made random draws from a Gamma would now call std::rand() which CRAN flags. Another email to Brisbane, another late (one-line) fix back and all was good. We still encountered one package with an error but flagged this as internal to that package s setup, so Uwe let RcppArmadillo onto CRAN, I contacted that package s maintainer who was very receptive and a change should be forthcoming. So with all that we have 0.10.4.0.0 on CRAN giving us Armadillo 10.4.0. The full set of changes follows. As Armadillo 10.3.0 was not uploaded to CRAN, its changes are included too.

Changes in RcppArmadillo version 0.10.4.0.0 (2021-04-12)
  • Upgraded to Armadillo release 10.4.0 (Pressure Cooker)
    • faster handling of triangular matrices by log_det()
    • added log_det_sympd() for log determinant of symmetric positive matrices
    • added ARMA_WARN_LEVEL configuration option, to control the degree of emitted warning messages
    • reduced the default degree of warning messages, so that failed decompositions, failed saving/loading, etc, no longer emit warnings
  • Apply one upstream corrections for arma::randn draws when using alternative (here R) generator, and arma::randg.

Changes in RcppArmadillo version 0.10.3.0.0 (2021-03-10)
  • Upgraded to Armadillo release 10.3 (Sunrise Chaos)
    • faster handling of symmetric positive definite matrices by pinv()
    • expanded .save() / .load() for dense matrices to handle coord_ascii format
    • for out of bounds access, element accessors now throw the more nuanced std::out_of_range exception, instead of only std::logic_error
    • improved quality of random numbers

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

6 July 2020

Dirk Eddelbuettel: Rcpp 1.0.5: Several Updates

rcpp logo Right on the heels of the news of 2000 CRAN packages using Rcpp (and also hitting 12.5 of CRAN package, or one in eight), we are happy to announce release 1.0.5 of Rcpp. Since the ten-year anniversary and the 1.0.0 release release in November 2018, we have been sticking to a four-month release cycle. The last release has, however, left us with a particularly bad taste due to some rather peculiar interactions with a very small (but ever so vocal) portion of the user base. So going forward, we will change two things. First off, we reiterate that we have already made rolling releases. Each minor snapshot of the main git branch gets a point releases. Between release 1.0.4 and this 1.0.5 release, there were in fact twelve of those. Each and every one of these was made available via the drat repo, and we will continue to do so going forward. Releases to CRAN, however, are real work. If they then end up with as much nonsense as the last release 1.0.4, we think it is appropriate to slow things down some more so we intend to now switch to a six-months cycle. As mentioned, interim releases are always just one install.packages() call with a properly set repos argument away. Rcpp has become the most popular way of enhancing R with C or C++ code. As of today, 2002 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 203 in BioConductor. And per the (partial) logs of CRAN downloads, we are running steady at around one millions downloads per month. This release features again a number of different pull requests by different contributors covering the full range of API improvements, attributes enhancements, changes to Sugar and helper functions, extended documentation as well as continuous integration deplayment. See the list below for details.

Changes in Rcpp patch release version 1.0.5 (2020-07-01)
  • Changes in Rcpp API:
    • The exception handler code in #1043 was updated to ensure proper include behavior (Kevin in #1047 fixing #1046).
    • A missing Rcpp_list6 definition was added to support R 3.3.* builds (Davis Vaughan in #1049 fixing #1048).
    • Missing Rcpp_list 2,3,4,5 definition were added to the Rcpp namespace (Dirk in #1054 fixing #1053).
    • A further updated corrected the header include and provided a missing else branch (Mattias Ellert in #1055).
    • Two more assignments are protected with Rcpp::Shield (Dirk in #1059).
    • One call to abs is now properly namespaced with std:: (Uwe Korn in #1069).
    • String object memory preservation was corrected/simplified (Kevin in #1082).
  • Changes in Rcpp Attributes:
    • Empty strings are not passed to R CMD SHLIB which was seen with R 4.0.0 on Windows (Kevin in #1062 fixing #1061).
    • The short_file_name() helper function is safer with respect to temporaries (Kevin in #1067 fixing #1066, and #1071 fixing #1070).
  • Changes in Rcpp Sugar:
    • Two sample() objects are now standard vectors and not R_alloc created (Dirk in #1075 fixing #1074).
  • Changes in Rcpp support functions:
    • Rcpp.package.skeleton() adjusts for a (documented) change in R 4.0.0 (Dirk in #1088 fixing #1087).
  • Changes in Rcpp Documentation:
    • The pdf file of the earlier introduction is again typeset with bibliographic information (Dirk).
    • A new vignette describing how to package C++ libraries has been added (Dirk in #1078 fixing #1077).
  • Changes in Rcpp Deployment:
    • Travis CI unit tests now run a matrix over the versions of R also tested at CRAN (rel/dev/oldrel/oldoldrel), and coverage runs in parallel for a net speed-up (Dirk in #1056 and #1057).
    • The exceptions test is now partially skipped on Solaris as it already is on Windows (Dirk in #1065).
    • The default CI runner was upgraded to R 4.0.0 (Dirk).
    • The CI matrix spans R 3.5, 3.6, r-release and r-devel (Dirk).

Thanks to CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bugs reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under rcpp tag at StackOverflow which also allows searching among the (currently) 2455 previous questions. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Reproducible Builds: Reproducible Builds in June 2020

Welcome to the June 2020 report from the Reproducible Builds project. In these reports we outline the most important things that we and the rest of the community have been up to over the past month.

What are reproducible builds? One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. But whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.

News The GitHub Security Lab published a long article on the discovery of a piece of malware designed to backdoor open source projects that used the build process and its resulting artifacts to spread itself. In the course of their analysis and investigation, the GitHub team uncovered 26 open source projects that were backdoored by this malware and were actively serving malicious code. (Full article) Carl Dong from Chaincode Labs uploaded a presentation on Bitcoin Build System Security and reproducible builds to YouTube: The app intended to trace infection chains of Covid-19 in Switzerland published information on how to perform a reproducible build. The Reproducible Builds project has received funding in the past from the Open Technology Fund (OTF) to reach specific technical goals, as well as to enable the project to meet in-person at our summits. The OTF has actually also assisted countless other organisations that promote transparent, civil society as well as those that provide tools to circumvent censorship and repressive surveillance. However, the OTF has now been threatened with closure. (More info) It was noticed that Reproducible Builds was mentioned in the book End-user Computer Security by Mark Fernandes (published by WikiBooks) in the section titled Detection of malware in software. Lastly, reproducible builds and other ideas around software supply chain were mentioned in a recent episode of the Ubuntu Podcast in a wider discussion about the Snap and application stores (at approx 16:00).

Distribution work In the ArchLinux distribution, a goal to remove .doctrees from installed files was created via Arch s TODO list mechanism. These .doctree files are caches generated by the Sphinx documentation generator when developing documentation so that Sphinx does not have to reparse all input files across runs. They should not be packaged, especially as they lead to the package being unreproducible as their pickled format contains unreproducible data. Jelle van der Waa and Eli Schwartz submitted various upstream patches to fix projects that install these by default. Dimitry Andric was able to determine why the reproducibility status of FreeBSD s base.txz depended on the number of CPU cores, attributing it to an optimisation made to the Clang C compiler [ ]. After further detailed discussion on the FreeBSD bug it was possible to get the binaries reproducible again [ ]. For the GNU Guix operating system, Vagrant Cascadian started a thread about collecting reproducibility metrics and Jan janneke Nieuwenhuizen posted that they had further reduced their bootstrap seed to 25% which is intended to reduce the amount of code to be audited to avoid potential compiler backdoors. In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update as well as made the following changes within the distribution itself:

Debian Holger Levsen filed three bugs (#961857, #961858 & #961859) against the reproducible-check tool that reports on the reproducible status of installed packages on a running Debian system. They were subsequently all fixed by Chris Lamb [ ][ ][ ]. Timo R hling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of 100s of packages that use the CMake build system which led to a number of tests and next steps. [ ] Chris Lamb contributed to a conversation regarding the nondeterministic execution of order of Debian maintainer scripts that results in the arbitrary allocation of UNIX group IDs, referencing the Tails operating system s approach this [ ]. Vagrant Cascadian also added to a discussion regarding verification formats for reproducible builds. 47 reviews of Debian packages were added, 37 were updated and 69 were removed this month adding to our knowledge about identified issues. Chris Lamb identified and classified a new uids_gids_in_tarballs_generated_by_cmake_kde_package_app_templates issue [ ] and updated the paths_vary_due_to_usrmerge as deterministic issue, and Vagrant Cascadian updated the cmake_rpath_contains_build_path and gcc_captures_build_path issues. [ ][ ][ ]. Lastly, Debian Developer Bill Allombert started a mailing list thread regarding setting the -fdebug-prefix-map command-line argument via an environment variable and Holger Levsen also filed three bugs against the debrebuild Debian package rebuilder tool (#961861, #961862 & #961864).

Development On our website this month, Arnout Engelen added a link to our Mastodon account [ ] and moved the SOURCE_DATE_EPOCH git log example to another section [ ]. Chris Lamb also limited the number of news posts to avoid showing items from (for example) 2017 [ ]. strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. This month, Mattia Rizzolo bumped the debhelper compatibility level to 13 [ ] and adjusted a related dependency to avoid potential circular dependency [ ].

Upstream work The Reproducible Builds project attempts to fix unreproducible packages and we try to to send all of our patches upstream. This month, we wrote a large number of such patches including: Bernhard M. Wiedemann also filed reports for frr (build fails on single-processor machines), ghc-yesod-static/git-annex (a filesystem ordering issue) and ooRexx (ASLR-related issue).

diffoscope diffoscope is our in-depth diff-on-steroids utility which helps us diagnose reproducibility issues in packages. It does not define reproducibility, but rather provides a helpful and human-readable guidance for packages that are not reproducible, rather than relying essentially-useless binary diffs. This month, Chris Lamb uploaded versions 147, 148 and 149 to Debian and made the following changes:
  • New features:
    • Add output from strings(1) to ELF binaries. (#148)
    • Dump PE32+ executables (such as EFI applications) using objdump(1). (#181)
    • Add support for Zsh shell completion. (#158)
  • Bug fixes:
    • Prevent a traceback when comparing PDF documents that did not contain metadata (ie. a PDF /Info stanza). (#150)
    • Fix compatibility with jsondiff version 1.2.0. (#159)
    • Fix an issue in GnuPG keybox file handling that left filenames in the diff. [ ]
    • Correct detection of JSON files due to missing call to File.recognizes that checks candidates against file(1). [ ]
  • Output improvements:
    • Use the CSS word-break property over manually adding U+200B zero-width spaces as these were making copy-pasting cumbersome. (!53)
    • Downgrade the tlsh warning message to an info level warning. (#29)
  • Logging improvements:
  • Testsuite improvements:
    • Update tests for file(1) version 5.39. (#179)
    • Drop accidentally-duplicated copy of the --diff-mask tests. [ ]
    • Don t mask an existing test. [ ]
  • Codebase improvements:
    • Replace obscure references to WF with Wagner-Fischer for clarity. [ ]
    • Use a semantic AbstractMissingType type instead of remembering to check for both types of missing files. [ ]
    • Add a comment regarding potential security issue in the .changes, .dsc and .buildinfo comparators. [ ]
    • Drop a large number of unused imports. [ ][ ][ ][ ][ ]
    • Make many code sections more Pythonic. [ ][ ][ ][ ]
    • Prevent some variable aliasing issues. [ ][ ][ ]
    • Use some tactical f-strings to tidy up code [ ][ ] and remove explicit u"unicode" strings [ ].
    • Refactor a large number of routines for clarity. [ ][ ][ ][ ]
trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb also corrected the location for the celerybeat scheduler to ensure that the clean/tidy tasks are actually called which had caused an accidental resource exhaustion. (#12) In addition Jean-Romain Garnier made the following changes:
  • Fix the --new-file option when comparing directories by merging DirectoryContainer.compare and Container.compare. (#180)
  • Allow user to mask/filter diff output via --diff-mask=REGEX. (!51)
  • Make child pages open in new window in the --html-dir presenter format. [ ]
  • Improve the diffs in the --html-dir format. [ ][ ]
Lastly, Daniel Fullmer fixed the Coreboot filesystem comparator [ ] and Mattia Rizzolo prevented warnings from the tlsh fuzzy-matching library during tests [ ] and tweaked the build system to remove an unwanted .build directory [ ]. For the GNU Guix distribution Vagrant Cascadian updated the version of diffoscope to version 147 [ ] and later 148 [ ].

Testing framework We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. Amongst many other tasks, this tracks the status of our reproducibility efforts across many distributions as well as identifies any regressions that have been introduced. This month, Holger Levsen made the following changes:
  • Debian-related changes:
    • Prevent bogus failure emails from rsync2buildinfos.debian.net every night. [ ]
    • Merge a fix from David Bremner s database of .buildinfo files to include a fix regarding comparing source vs. binary package versions. [ ]
    • Only run the Debian package rebuilder job twice per day. [ ]
    • Increase bullseye scheduling. [ ]
  • System health status page:
    • Add a note displaying whether a node needs to be rebooted for a kernel upgrade. [ ]
    • Fix sorting order of failed jobs. [ ]
    • Expand footer to link to the related Jenkins job. [ ]
    • Add archlinux_html_pages, openwrt_rebuilder_today and openwrt_rebuilder_future to known broken jobs. [ ]
    • Add HTML <meta> header to refresh the page every 5 minutes. [ ]
    • Count the number of ignored jobs [ ], ignore permanently known broken jobs [ ] and jobs on known offline nodes [ ].
    • Only consider the known offline status from Git. [ ]
    • Various output improvements. [ ][ ]
  • Tools:
    • Switch URLs for the Grml Live Linux and PureOS package sets. [ ][ ]
    • Don t try to build a disorderfs Debian source package. [ ][ ][ ]
    • Stop building diffoscope as we are moving this to Salsa. [ ][ ]
    • Merge several is diffoscope up-to-date on every platform? test jobs into one [ ] and fail less noisily if the version in Debian cannot be determined [ ].
In addition: Marcus Hoffmann was added as a maintainer of the F-Droid reproducible checking components [ ], Jelle van der Waa updated the is diffoscope up-to-date in every platform check for Arch Linux and diffoscope [ ], Mattia Rizzolo backed up a copy of a remove script run on the Codethink-hosted jump server [ ] and Vagrant Cascadian temporarily disabled the fixfilepath on bullseye, to get better data about the ftbfs_due_to_f-file-prefix-map categorised issue. Lastly, the usual build node maintenance was performed by Holger Levsen [ ][ ], Mattia Rizzolo [ ] and Vagrant Cascadian [ ][ ][ ][ ][ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month s report was written by Bernhard M. Wiedemann, Chris Lamb, Eli Schwartz, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

30 June 2020

Russ Allbery: Review: The Fifth Risk

Review: The Fifth Risk, by Michael Lewis
Publisher: W.W. Norton
Copyright: 2018
Printing: 2019
ISBN: 0-393-35745-7
Format: Kindle
Pages: 254
The Fifth Risk starts with the presidential transition. Max Stier, the first person profiled by Lewis in this book, is the founder of the Partnership for Public Service. That foundation helped push through laws to provide more resources and structure for the transition of the United States executive branch from one president to the next. The goal was to fight wasted effort, unnecessary churn, and pointless disruption in the face of each administration's skepticism about everyone who worked for the previous administration.
"It's Groundhog Day," said Max. "The new people come in and think that the previous administration and the civil service are lazy or stupid. Then they actually get to know the place they are managing. And when they leave, they say, 'This was a really hard job, and those are the best people I've ever worked with.' This happens over and over and over."
By 2016, Stier saw vast improvements, despite his frustration with other actions of the Obama administration. He believed their transition briefings were one of the best courses ever produced on how the federal government works. Then that transition process ran into Donald Trump. Or, to be more accurate, that transition did not run into Donald Trump, because neither he nor anyone who worked for him were there. We'll never know how good the transition information was because no one ever listened to or read it. Meetings were never scheduled. No one showed up. This book is not truly about the presidential transition, though, despite its presence as a continuing theme. The Fifth Risk is, at its heart, an examination of government work, the people who do it, why it matters, and why you should care about it. It's a study of the surprising and misunderstood responsibilities of the departments of the United States federal government. And it's a series of profiles of the people who choose this work as a career, not in the upper offices of political appointees, but deep in the civil service, attempting to keep that system running. I will warn now that I am far too happy that this book exists to be entirely objective about it. The United States desperately needs basic education about the government at all levels, but particularly the federal civil service. The public impression of government employees is skewed heavily towards the small number of public-facing positions and towards paperwork frustrations, over which the agency usually has no control because they have been sabotaged by Congress (mostly by Republicans, although the Democrats get involved occasionally). Mental images of who works for the government are weirdly selective. The Coast Guard could say "I'm from the government and I'm here to help" every day, to the immense gratitude of the people they rescue, but Reagan was still able to use that as a cheap applause line in his attack on government programs. Other countries have more functional and realistic social attitudes towards their government workers. The United States is trapped in a politically-fueled cycle of contempt and ignorance. It has to stop. And one way to help stop it is someone with Michael Lewis's story-telling skills writing a different narrative. The Fifth Risk is divided into a prologue about presidential transitions, three main parts, and an afterword (added in current editions) about a remarkable government worker whom you likely otherwise would never hear about. Each of the main parts talks about a different federal department: the Department of Energy, the Department of Agriculture, and the Department of Commerce. In keeping with the theme of the book, the people Lewis profiles do not do what you might expect from the names of those departments. Lewis's title comes from his discussion with John MacWilliams, a former Goldman Sachs banker who quit the industry in search of more personally meaningful work and became the chief risk officer for the Department of Energy. Lewis asks him for the top five risks he sees, and if you know that the DOE is responsible for safeguarding nuclear weapons, you will be able to guess several of them: nuclear weapons accidents, North Korea, and Iran. If you work in computer security, you may share his worry about the safety of the electrical grid. But his fifth risk was project management. Can the government follow through on long-term hazardous waste safety and cleanup projects, despite constant political turnover? Can it attract new scientists to the work of nuclear non-proliferation before everyone with the needed skills retires? Can it continue to lay the groundwork with basic science for innovation that we'll need in twenty or fifty years? This is what the Department of Energy is trying to do. Lewis's profiles of other departments are similarly illuminating. The Department of Agriculture is responsible for food stamps, the most effective anti-poverty program in the United States with the possible exception of Social Security. The section on the Department of Commerce is about weather forecasting, specifically about NOAA (the National Oceanic and Atmospheric Administration). If you didn't know that all of the raw data and many of the forecasts you get from weather apps and web sites are the work of government employees, and that AccuWeather has lobbied Congress persistently for years to prohibit the NOAA from making their weather forecasts public so that AccuWeather can charge you more for data your taxes already paid for, you should read this book. The story of American contempt for government work is partly about ignorance, but it's also partly about corporations who claim all of the credit while selling taxpayer-funded resources back to you at absurd markups. The afterword I'll leave for you to read for yourself, but it's the story of Art Allen, a government employee you likely have never heard of but whose work for the Coast Guard has saved more lives than we are able to measure. I found it deeply moving. If you, like I, are a regular reader of long-form journalism and watch for new Michael Lewis essays in particular, you've probably already read long sections of this book. By the time I sat down with it, I think I'd read about a third in other forms on-line. But the profiles that I had already read were so good that I was happy to read them again, and the additional stories and elaboration around previously published material was more than worth the cost and time investment in the full book.
It was never obvious to me that anyone would want to read what had interested me about the United States government. Doug Stumpf, my magazine editor for the past decade, persuaded me that, at this strange moment in American history, others might share my enthusiasm.
I'll join Michael Lewis in thanking Doug Stumpf. The Fifth Risk is not a proposal for how to fix government, or politics, or polarization. It's not even truly a book about the Trump presidency or about the transition. Lewis's goal is more basic: The United States government is full of hard-working people who are doing good and important work. They have effectively no public relations department. Achievements that would result in internal and external press releases in corporations, not to mention bonuses and promotions, go unnoticed and uncelebrated. If you are a United States citizen, this is your government and it does important work that you should care about. It deserves the respect of understanding and thoughtful engagement, both from the citizenry and from the politicians we elect. Rating: 10 out of 10

12 May 2020

Daniel Silverstone: The Lars, Mark, and Daniel Club

Last night, Lars, Mark, and I discussed Jeremy Kun's The communicative value of using Git well post. While a lot of our discussion was spawned by the article, we did go off-piste a little, and I hope that my notes below will enlighten you all as to a bit of how we see revision control these days. It was remarkably pleasant to read an article where the comments section wasn't a cesspool of horror, so if this posting encourages you to go and read the article, don't stop when you reach the bottom -- the comments are good and useful too.
This was a fairly non-contentious article for us though each of us had points we wished to bring up and chat about it turned into a very convivial chat. We saw the main thrust of the article as being about using the metadata of revision control to communicate intent, process, and decision making. We agreed that it must be possible to do so effectively with Mercurial (thus deciding that the mention of it was simply a bit of clickbait / red herring) and indeed Mark figured that he was doing at least some of this kind of thing way back with CVS. We all discussed how knowing the fundamentals of Git's data model improved our ability to work wih the tool. Lars and I mentioned how jarring it has originally been to come to Git from revision control systems such as Bazaar (bzr) but how over time we came to appreciate Git for what it is. For Mark this was less painful because he came to Git early enough that there was little more than the fundamental data model, without much of the porcelain which now exists. One point which we all, though Mark in particular, felt was worth considering was that of distinguishing between published and unpublished changes. The article touches on it a little, but one of the benefits of the workflow which Jeremy espouses is that of using the revision control system as an integral part of the review pipeline. This is perhaps done very well with Git based workflows, but can be done with other VCSs. With respect to the points Jeremy makes regarding making commits which are good for reviewing, we had a general agreement that things were good and sensible, to a point, but that some things were missed out on. For example, I raised that commit messages often need to be more thorough than one-liners, but Jeremy's examples (perhaps through expedience for the article?) were all pretty trite one-liners which perhaps would have been entirely obvious from the commit content. Jeremy makes the point that large changes are hard to review, and Lars poined out that Greg Wilson did research in this area, and at least one article mentions 200 lines as a maximum size of a reviewable segment. I had a brief warble at this point about how reviewing needs to be able to consider the whole of the intended change (i.e. a diff from base to tip) not just individual commits, which is also missing from Jeremy's article, but that such a review does not need to necessarily be thorough and detailed since the commit-by-commit review remains necessary. I use that high level diff as a way to get a feel for the full shape of the intended change, a look at the end-game if you will, before I read the story of how someone got to it. As an aside at this point, I talked about how Jeremy included a 'style fixes' commit in his example, but I loathe seeing such things and would much rather it was either not in the series because it's unrelated to it; or else the style fixes were folded into the commits they were related to. We discussed how style rules, as well as commit-bisectability, and other rules which may exist for a codebase, the adherence to which would form part of the checks that a code reviewer may perform, are there to be held to when they help the project, and to be broken when they are in the way of good communication between humans. In this, Lars talked about how revision control histories provide high level valuable communication between developers. Communication between humans is fraught with error and the rules are not always clear on what will work and what won't, since this depends on the communicators, the context, etc. However whatever communication rules are in place should be followed. We often say that it takes two people to communicate information, but when you're writing commit messages or arranging your commit history, the second party is often nebulous "other" and so the code reviewer fulfils that role to concretise it for the purpose of communication. At this point, I wondered a little about what value there might be (if any) in maintaining the metachanges (pull request info, mailing list discussions, etc) for historical context of decision making. Mark suggested that this is useful for design decisions etc but not for the style/correctness discussions which often form a large section of review feedback. Maybe some of the metachange tracking is done automatically by the review causing the augmentation of the changes (e.g. by comments, or inclusion of design documentation changes) to explain why changes are made. We discussed how the "rebase always vs. rebase never" feeling flip-flopped in us for many years until, as an example, what finally won Lars over was that he wants the history of the project to tell the story, in the git commits, of how the software has changed and evolved in an intentional manner. Lars said that he doesn't care about the meanderings, but rather about a clear story which can be followed and understood. I described this as the switch from "the revision history is about what I did to achieve the goal" to being more "the revision history is how I would hope someone else would have done this". Mark further refined that to "The revision history of a project tells the story of how the project, as a whole, chose to perform its sequence of evolution." We discussed how project history must necessarily then contain issue tracking, mailing list discussions, wikis, etc. There are exist free software projects where part of their history is forever lost because, for example, the project moved from Sourceforge to Github, but made no effort (or was unable) to migrate issues or somesuch. Linkages between changes and the issues they relate to can easily be broken, though at least with mailing lists you can often rewrite URLs if you have something consistent like a Message-Id. We talked about how cover notes, pull request messages, etc. can thus also be lost to some extent. Is this an argument to always use merges whose message bodies contain those details, rather than always fast-forwarding? Or is it a reason to encapsulate all those discussions into git objects which can be forever associated with the changes in the DAG? We then diverted into discussion of CI, testing every commit, and the benefits and limitations of automated testing vs. manual testing; though I think that's a little too off-piste for even this summary. We also talked about how commit message audiences include software perhaps, with the recent movement toward conventional commits and how, with respect to commit messages for machine readability, it can become very complex/tricky to craft good commit messages once there are multiple disparate audiences. For projects the size of the Linux kernel this kind of thing would be nearly impossible, but for smaller projects, perhaps there's value. Finally, we all agreed that we liked the quote at the end of the article, and so I'd like to close out by repeating it for you all... Hal Abelson famously said:
Programs must be written for people to read, and only incidentally for machines to execute.
Jeremy agrees, as do we, and extends that to the metacommit information as well.

Next.