Search Results: "beh"

26 July 2025

Matthew Palmer: Object deserialization attacks using Ruby's Oj JSON parser

tl;dr: there is an attack in the wild which is triggering dangerous-but-seemingly-intended behaviour in the Oj JSON parser when used in the default and recommended manner, which can lead to everyone s favourite kind of security problem: object deserialization bugs! If you have the oj gem anywhere in your Gemfile.lock, the quickest mitigation is to make sure you have Oj.default_options = mode: :strict somewhere, and that no library is overwriting that setting to something else.

Prologue As a sensible sysadmin, all the sites I run send me a notification if any unhandled exception gets raised. Mostly, what I get sent is error-handling corner cases I missed, but now and then things get more interesting. In this case, it was a PG::UndefinedColumn exception, which looked something like this:
PG::UndefinedColumn: ERROR:  column "xyzzydeadbeef" does not exist
This is weird on two fronts: firstly, this application has been running for a while, and if there was a schema problem, I d expect it to have made itself apparent long before now. And secondly, while I don t profess to perfection in my programming, I m usually better at naming my database columns than that. Something is definitely hinky here, so let s jump into the mystery mobile!

The column name is coming from outside the building! The exception notifications I get sent include a whole lot of information about the request that caused the exception, including the request body. In this case, the request body was JSON, and looked like this:
 "name":":xyzzydeadbeef", ... 
The leading colon looks an awful lot like the syntax for a Ruby symbol, but it s in a JSON string. Surely there s no way a JSON parser would be turning that into a symbol, right? Right?!? Immediately, I thought that that possibly was what was happening, because I use Sequel for my SQL database access needs, and Sequel treats symbols as database column names. It seemed like too much of a coincidence that a vaguely symbol-shaped string was being sent in, and the exact same name was showing up as a column name. But how the flying fudgepickles was a JSON string being turned into a Ruby symbol, anyway? Enter Oj.

Oj? I barely know aj A long, long time ago, the standard Ruby JSON library had a reputation for being slow. Thus did many competitors flourish, claiming more features and better performance. Strong amongst the contenders was oj (for Optimized JSON ), touted as The fastest JSON parser and object serializer . Given the history, it s not surprising that people who wanted the best possible performance turned to Oj, leading to it being found in a great many projects, often as a sub-dependency of a dependency of a dependency (which is how it ended up in my project). You might have noticed in Oj s description that, in addition to claiming fastest , it also describes itself as an object serializer . Anyone who has kept an eye on the security bug landscape will recall that object deserialization is a rich vein of vulnerabilities to mine. Libraries that do object deserialization, especially ones with a history that goes back to before the vulnerability class was well-understood, are likely to be trouble magnets. And thus, it turns out to be with Oj. By default, Oj will happily turn any string that starts with a colon into a symbol:

>> require "oj"
>> Oj.load(' "name":":xyzzydeadbeef","username":"bob","answer":42 ')
=>  "name"=>:xyzzydeadbeef, "username"=>"bob", "answer"=>42 

How that gets exploited is only limited by the creativity of an attacker. Which I ll talk about more shortly but first, a word from my rant cortex.

Insecure By Default is a Cancer While the object of my ire today is Oj and its fast-and-loose approach to deserialization, it is just one example of a pervasive problem in software: insecurity by default. Whether it s a database listening on 0.0.0.0 with no password as soon as its installed, or a library whose default behaviour is to permit arbitrary code execution, it all contributes to a software ecosystem that is an appalling security nightmare. When a user (in this case, a developer who wants to parse JSON) comes across a new piece of software, they have by definition no idea what they re doing with that software. They re going to use the defaults, and follow the most easily-available documentation, to achieve their goal. It is unrealistic to assume that a new user of a piece of software is going to do things the right way , unless that right way is the only way, or at least the by-far-the-easiest way. Conversely, the developer(s) of the software is/are the domain experts. They have knowledge of the problem domain, through their exploration while building the software, and unrivalled expertise in the codebase. Given this disparity in knowledge, it is tantamount to malpractice for the experts the developer(s) to off-load the responsibility for the safe and secure use of the software to the party that has the least knowledge of how to do that (the new user). To apply this general principle to the specific case, take the Using section of the Oj README. The example code there calls Oj.load, with no indication that this code will, in fact, parse specially-crafted JSON documents into Ruby objects. The brand-user user of the library, no doubt being under pressure to Get Things Done, is almost certainly going to look at this Using example, get the apparent result they were after (a parsed JSON document), and call it a day. It is unlikely that a brand-new user will, for instance, scroll down to the Further Reading section, find the second last (of ten) listed documents, Security.md , and carefully peruse it. If they do, they ll find an oblique suggestion that parsing untrusted input is never a good idea . While that s true, it s also rather unhelpful, because I d wager that by far the majority of JSON parsed in the world is untrusted , in one way or another, given the predominance of JSON as a format for serializing data passing over the Internet. This guidance is roughly akin to putting a label on a car s airbags that driving at speed can be hazardous to your health : true, but unhelpful under the circumstances. The solution is for default behaviours to be secure, and any deviation from that default that has the potential to degrade security must, at the very least, be clearly labelled as such. For example, the Oj.load function should be named Oj.unsafe_load, and the Oj.load function should behave as the Oj.safe_load function does presently. By naming the unsafe function as explicitly unsafe, developers (and reviewers) have at least a fighting chance of recognising they re doing something risky. We put warning labels on just about everything in the real world; the same should be true of dangerous function calls. OK, rant over. Back to the story.

But how is this exploitable? So far, I ve hopefully made it clear that Oj does some Weird Stuff with parsing certain JSON strings. It caused an unhandled exception in a web application I run, which isn t cool, but apart from bombing me with exception notifications, what s the harm? For starters, let s look at our original example: when presented with a symbol, Sequel will interpret that as a column name, rather than a string value. Thus, if our save an update to the user code looked like this:

# request_body has the JSON representation of the form being submitted
body = Oj.load(request_body)
DB[:users].where(id: user_id).update(name: body["name"])

In normal operation, this will issue an SQL query along the lines of UPDATE users SET name='Jaime' WHERE id=42. If the name given is Jaime O Dowd , all is still good, because Sequel quotes string values, etc etc. All s well so far. But, imagine there is a column in the users table that normally users cannot read, perhaps admin_notes. Or perhaps an attacker has gotten temporary access to an account, and wants to dump the user s password hash for offline cracking. So, they send an update claiming that their name is :admin_notes (or :password_hash). In JSON, that ll look like "name":":admin_notes" , and Oj.load will happily turn that into a Ruby object of "name"=>:admin_notes . When run through the above update the user code fragment, it ll produce the SQL UPDATE users SET name=admin_notes WHERE id=42. In other words, it ll copy the contents of the admin_notes column into the name column which the attacker can then read out just by refreshing their profile page.

But Wait, There s More! That an attacker can read other fields in the same table isn t great, but that s barely scratching the surface. Remember before I said that Oj does object serialization ? That means that, in general, you can create arbitrary Ruby objects from JSON. Since objects contain code, it s entirely possible to trigger arbitrary code execution by instantiating an appropriate Ruby object. I m not going to go into details about how to do this, because it s not really my area of expertise, and many others have covered it in detail. But rest assured, if an attacker can feed input of their choosing into a default call to Oj.load, they ve been handed remote code execution on a platter.

Mitigations As Oj s object deserialization is intended and documented behaviour, don t expect a future release to make any of this any safer. Instead, we need to mitigate the risks. Here are my recommended steps:
  1. Look in your Gemfile.lock (or SBOM, if that s your thing) to see if the oj gem is anywhere in your codebase. Remember that even if you don t use it directly, it s popular enough that it is used in a lot of places. If you find it in your transitive dependency tree anywhere, there s a chance you re vulnerable, limited only by the ingenuity of attackers to feed crafted JSON into a deeply-hidden Oj.load call.
  2. If you depend on oj directly and use it in your project, consider not doing that. The json gem is acceptably fast, and JSON.parse won t create arbitrary Ruby objects.
  3. If you really, really need to squeeze the last erg of performance out of your JSON parsing, and decide to use oj to do so, find all calls to Oj.load in your code and switch them to call Oj.safe_load.
  4. It is a really, really bad idea to ever use Oj to deserialize JSON into objects, as it lacks the safety features needed to mitigate the worst of the risks of doing so (for example, restricting which classes can be instantiated, as is provided by the permitted_classes argument to Psych.load). I d make it a priority to move away from using Oj for that, and switch to something somewhat safer (such as the aforementioned Psych). At the very least, audit and comment heavily to minimise the risk of user-provided input sneaking into those calls somehow, and pass mode: :object as the second argument to Oj.load, to make it explicit that you are opting-in to this far more dangerous behaviour only when it s absolutely necessary.
  5. To secure any unsafe uses of Oj.load in your dependencies, consider setting the default Oj parsing mode to :strict, by putting Oj.default_options = mode: :strict somewhere in your initialization code (and make sure no dependencies are setting it to something else later!). There is a small chance that this change of default might break something, if a dependency is using Oj to deliberately create Ruby objects from JSON, but the overwhelming likelihood is that Oj s just being used to parse ordinary JSON, and these calls are just RCE vulnerabilities waiting to give you a bad time.

Is Your Bacon Saved? If I ve helped you identify and fix potential RCE vulnerabilities in your software, or even just opened your eyes to the risks of object deserialization, please help me out by buying me a refreshing beverage. I would really appreciate any support you can give. Alternately, if you d like my help in fixing these (and many other) sorts of problems, I m looking for work, so email me.

20 July 2025

Michael Prokop: What to expect from Debian/trixie #newintrixie

Trixie Banner, Copyright 2024 Elise Couper Update on 2025-07-28: added note about Debian 13/trixie support for OpenVox (thanks, Ben Ford!) Debian v13 with codename trixie is scheduled to be published as new stable release on 9th of August 2025. I was the driving force at several of my customers to be well prepared for the upcoming stable release (my efforts for trixie started in August 2024). On the one hand, to make sure packages we care about are available and actually make it into the release. On the other hand, to ensure there are no severe issues that make it into the release and to get proper and working upgrades. So far everything is looking pretty well and working fine, the efforts seemed to have payed off. :) As usual with major upgrades, there are some things to be aware of, and hereby I m starting my public notes on trixie that might be worth for other folks. My focus is primarily on server systems and looking at things from a sysadmin perspective. Further readings As usual start at the official Debian release notes, make sure to especially go through What s new in Debian 13 + issues to be aware of for trixie (strongly recommended read!). Package versions As a starting point, let s look at some selected packages and their versions in bookworm vs. trixie as of 2025-07-20 (mainly having amd64 in mind):
Package bookworm/v12 trixie/v13
ansible 2.14.3 2.19.0
apache 2.4.62 2.4.64
apt 2.6.1 3.0.3
bash 5.2.15 5.2.37
ceph 16.2.11 18.2.7
docker 20.10.24 26.1.5
dovecot 2.3.19 2.4.1
dpkg 1.21.22 1.22.21
emacs 28.2 30.1
gcc 12.2.0 14.2.0
git 2.39.5 2.47.2
golang 1.19 1.24
libc 2.36 2.41
linux kernel 6.1 6.12
llvm 14.0 19.0
lxc 5.0.2 6.0.4
mariadb 10.11 11.8
nginx 1.22.1 1.26.3
nodejs 18.13 20.19
openjdk 17.0 21.0
openssh 9.2p1 10.0p1
openssl 3.0 3.5
perl 5.36.0 5.40.1
php 8.2+93 8.4+96
podman 4.3.1 5.4.2
postfix 3.7.11 3.10.3
postgres 15 17
puppet 7.23.0 8.10.0
python3 3.11.2 3.13.5
qemu/kvm 7.2 10.0
rsync 3.2.7 3.4.1
ruby 3.1 3.3
rust 1.63.0 1.85.0
samba 4.17.12 4.22.3
systemd 252.36 257.7-1
unattended-upgrades 2.9.1 2.12
util-linux 2.38.1 2.41
vagrant 2.3.4 2.3.7
vim 9.0.1378 9.1.1230
zsh 5.9 5.9
Misc unsorted apt The new apt version 3.0 brings several new features, including: systemd systemd got upgraded from v252.36-1~deb12u1 to 257.7-1 and there are lots of changes. Be aware that systemd v257 has a new net.naming_scheme, v257 being PCI slot number is now read from firmware_node/sun sysfs file. The naming scheme based on devicetree aliases was extended to support aliases for individual interfaces of controllers with multiple ports. This might affect you, see e.g. #1092176 and #1107187, the Debian Wiki provides further useful information. There are new systemd tools available: The tools provided by systemd gained several new options: Debian s systemd ships new binary packages: Linux Kernel The trixie release ships a Linux kernel based on latest longterm version 6.12. As usual there are lots of changes in the kernel area, including better hardware support, and this might warrant a separate blog entry. To highlight some changes with Debian trixie: See Kernelnewbies.org for further changes between kernel versions. Configuration management For puppet users, Debian provides the puppet-agent (v8.10.0), puppetserver (v8.7.0) and puppetdb (v8.4.1) packages. Puppet s upstream does not provide packages for trixie, yet. Given how long it took them for Debian bookworm, and with their recent Plans for Open Source Puppet in 2025, it s unclear when (and whether at all) we might get something. As a result of upstream behavior, also the OpenVox project evolved, and they already provide Debian 13/trixie support (https://apt.voxpupuli.org/openvox8-release-debian13.deb). FYI: the AIO puppet-agent package for bookworm (v7.34.0-1bookworm) so far works fine for me on Debian/trixie. Be aware that due to the apt-key removal you need a recent version of the puppetlabs-apt for usage with trixie. The puppetlabs-ntp module isn t yet ready for trixie (regarding ntp/ntpsec), if you should depend on that. ansible is available and made it with version 2.19 into trixie. Prometheus stack Prometheus server was updated from v2.42.0 to v2.53, and all the exporters that got shipped with bookworm are still around (in more recent versions of course). Trixie gained some new exporters: Virtualization docker (v26.1.5), ganeti (v3.1.0), libvirt (v11.3.0, be aware of significant changes to libvirt packaging), lxc (v6.0.4), podman (v5.4.2), openstack (see openstack-team on Salsa), qemu/kvm (v10.0.2), xen (v4.20.0) are all still around. Proxmox already announced their PVE 9.0 BETA, being based on trixie and providing 6.14.8-1 kernel, QEMU 10.0.2, LXC 6.0.4, OpenZFS 2.3.3. Vagrant is available in version 2.3.7, but Vagrant upstream does not provide packages for trixie yet. Given that HashiCorp adopted the BSL, the future of vagrant in Debian is unclear. If you re relying on VirtualBox, be aware that upstream doesn t provide packages for trixie, yet. VirtualBox is available from Debian/unstable (version 7.1.12-dfsg-1 as of 2025-07-20), but not shipped with stable release since quite some time (due to lack of cooperation from upstream on security support for older releases, see #794466). Be aware that starting with Linux kernel 6.12, KVM initializes virtualization on module loading by default. This prevents VirtualBox VMs from starting. In order to avoid this, either add kvm.enable_virt_at_load=0 parameter into kernel command line or unload the corresponding kvm_intel / kvm_amd module. If you want to use Vagrant with VirtualBox on trixie, be aware that Debian s vagrant package as present in trixie doesn t support the VirtualBox package version 7.1 as present in Debian/unstable (manually patching vagrant s meta.rb and rebuilding the package without Breaks: virtualbox (>= 7.1) is known to be working). util-linux The are plenty of new options available in the tools provided by util-linux: Now no longer present in util-linux as of trixie: The following binaries got moved from util-linux to the util-linux-extra package: And the util-linux-extra package also provides new tools: OpenSSH OpenSSH was updated from v9.2p1 to 10.0p1-5, so if you re interested in all the changes, check out the release notes between those versions (9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9 + 10.0). Let s highlight some notable behavior changes in Debian: There are some notable new features: Thanks to everyone involved in the release, looking forward to trixie + and happy upgrading!
Let s continue with working towards Debian/forky. :)

17 July 2025

Arnaud Rebillout: Acquire-By-Hash for APT packages repositories, and the lack of it in Kali Linux

This is a lenghty blog post. It features a long introduction that explains how apt update acquires various files from a package repository, what is Acquire-By-Hash, and how it all works for Kali Linux: a Debian-based distro that doesn't support Acquire-By-Hash, and which is distributed via a network of mirrors and a redirector. In a second part, I explore some "Hash Sum Mismatch" errors that we can hit with Kali Linux, errors that would not happen if only Acquire-By-Hash was supported. If anything, this blog post supports the case for adding Acquire-By-Hash support in reprepro, as requested at https://bugs.debian.org/820660. All of this could have just remained some personal notes for myself, but I got carried away and turned it into a blog post, dunno why... Hopefully others will find it interesting, but you really need to like troubleshooting stories, packed with details, and poorly written at that. You've been warned! Introducing Acquire-By-Hash Acquire-By-Hash is a feature of APT package repositories, that might or might not be supported by your favorite Debian-based distribution. A repository that supports it says so, in the Release file, by setting the field Acquire-By-Hash: yes. It's easy to check. Debian and Ubuntu both support it:
$ wget -qO- http://deb.debian.org/debian/dists/sid/Release   grep -i ^Acquire-By-Hash:
Acquire-By-Hash: yes
$ wget -qO- http://archive.ubuntu.com/ubuntu/dists/devel/Release   grep -i ^Acquire-By-Hash:
Acquire-By-Hash: yes
What about other Debian derivatives?
$ wget -qO- http://http.kali.org/kali/dists/kali-rolling/Release   grep -i ^Acquire-By-Hash:   echo not supported
not supported
$ wget -qO- https://archive.raspberrypi.com/debian/dists/trixie/Release   grep -i ^Acquire-By-Hash:   echo not supported
not supported
$ wget -qO- http://packages.linuxmint.com/dists/faye/Release   grep -i ^Acquire-By-Hash:   echo not supported
not supported
$ wget -qO- https://apt.pop-os.org/release/dists/noble/Release   grep -i ^Acquire-By-Hash:   echo not supported
not supported
Huhu, Acquire-By-Hash is not ubiquitous. But wait, what is Acquire-By-Hash to start with? To answer that, we have to take a step back and cover some basics first. The HTTP requests performed by 'apt update' What happens when one runs apt update? APT first requests the Release file from the repository(ies) configured in the APT sources. This file is a starting point, it contains a list of other files (sometimes called "Index files") that are available in the repository, along with their hashes. After fetching the Release file, APT proceeds to request those Index files. To give you an idea, there are many kinds of Index files, among which: There's an excellent Wiki page that details the structure of a Debian package repository, it's there: https://wiki.debian.org/DebianRepository/Format. Note that APT doesn't necessarily download ALL of those Index files. For simplicity, we'll limit ourselves to the minimal scenario, where apt update downloads only the Packages files. Let's try to make it more visual: here's a representation of a apt update transaction, assuming that all the components of the repository are enabled:
apt update -> Release -> Packages (main/amd64)
                      -> Packages (contrib/amd64)
                      -> Packages (non-free/amd64)
                      -> Packages (non-free-firmware/amd64)
Meaning that, in a first step, APT downloads the Release file, reads its content, and then in a second step it downloads the Index files in parallel. You can actually see that happen with a command such as apt -q -o Debug::Acquire::http=true update 2>&1 grep ^GET. For Kali Linux you'll see something pretty similar to what I described above. Try it!
$ podman run --rm kali-rolling apt -q -o Debug::Acquire::http=true update 2>&1   grep ^GET
GET /kali/dists/kali-rolling/InRelease HTTP/1.1    # <- returns a redirect, that is why the file is requested twice
GET /kali/dists/kali-rolling/InRelease HTTP/1.1
GET /kali/dists/kali-rolling/non-free/binary-amd64/Packages.gz HTTP/1.1
GET /kali/dists/kali-rolling/main/binary-amd64/Packages.gz HTTP/1.1
GET /kali/dists/kali-rolling/non-free-firmware/binary-amd64/Packages.gz HTTP/1.1
GET /kali/dists/kali-rolling/contrib/binary-amd64/Packages.gz HTTP/1.1
However, and it's now becoming interesting, for Debian or Ubuntu you won't see the same kind of URLs:
$ podman run --rm debian:sid apt -q -o Debug::Acquire::http=true update 2>&1   grep ^GET
GET /debian/dists/sid/InRelease HTTP/1.1
GET /debian/dists/sid/main/binary-amd64/by-hash/SHA256/22709f0ce67e5e0a33a6e6e64d96a83805903a3376e042c83d64886bb555a9c3 HTTP/1.1
APT doesn't download a file named Packages, instead it fetches a file named after a hash. Why? This is due to the field Acquire-By-Hash: yes that is present in the Debian's Release file. What does Acquire-By-Hash mean for 'apt update' The idea with Acquire-By-Hash is that the Index files are named after their hash on the repository, so if the MD5 sum of main/binary-amd64/Packages is 77b2c1539f816832e2d762adb20a2bb1, then the file will be stored at main/binary-amd64/by-hash/MD5Sum/77b2c1539f816832e2d762adb20a2bb1. The path main/binary-amd64/Packages still exists (it's the "Canonical Location" of this particular Index file), but APT won't use it, instead it downloads the file located in the by-hash/ directory. Why does it matter? This has to do with repository updates, and allowing the package repository to be updated atomically, without interruption of service, and without risk of failure client-side. It's important to understand that the Release file and the Index files are part of a whole, a set of files that go altogether, given that Index files are validated by their hash (as listed in the Release file) after download by APT. If those files are simply named "Release" and "Packages", it means they are not immutable: when the repository is updated, all of those files are updated "in place". And it causes problems. A typical failure mode for the client, during a repository update, is that: 1) APT requests the Release file, then 2) the repository is updated and finally 3) APT requests the Packages files, but their checksum don't match, causing apt update to fail. There are variations of this error, but you get the idea: updating a set of files "in place" is problematic. The Acquire-By-Hash mechanism was introduced exactly to solve this problem: now the Index files have a unique, immutable name. When the repository is updated, at first new Index files are added in the by-hash/ directory, and only after the Release file is updated. Old Index files in by-hash/ are retained for a while, so there's a grace period during which both the old and the new Release files are valid and working: the Index files that they refer to are available in the repo. As a result: no interruption of service, no failure client-side during repository updates. This is explained in more details at https://www.chiark.greenend.org.uk/~cjwatson/blog/no-more-hash-sum-mismatch-errors.html, which is the blog post from Colin Watson that came out at the time Acquire-By-Hash was introduced in... 2016. This is still an excellent read in 2025. So you might be wondering why I'm rambling about a problem that's been solved 10 years ago, but then as I've shown in the introduction, the problem is not solved for everyone. Support for Acquire-By-Hash server side is not for granted, and unfortunately it never landed in reprepro, as one can see at https://bugs.debian.org/820660. reprepro is a popular tool for creating APT package repositories. In particular, at Kali Linux we use reprepro, and that's why there's no Acquire-By-Hash: yes in the Kali Release file. As one can guess, it leads to subtle issues during those moments when the repository is updated. However... we're not ready to talk about that yet! There's still another topic that we need to cover: this window of time during which a repository is being updated, and during which apt update might fail. The window for Hash Sum Mismatches, and the APT trick that saves the day Pay attention! In this section, we're now talking about packages repositories that do NOT support Acquire-By-Hash, such as the Kali Linux repository. As I've said above, it's only when the repository is being updated that there is a "Hash Sum Mismatch Window", ie. a moment when apt update might fail for some unlucky clients, due to invalid Index files. Surely, it's a very very short window of time, right? I mean, it can't take that long to update files on a server, especially when you know that a repository is usually updated via rsync, and rsync goes to great length to update files the most atomically as it can (with the option --delay=updates). So if apt update fails for me, I've been very unlucky, but I can just retry in a few seconds and it should be fixed, isn't it? The answer is: it's not that simple. So far I pictured the "package repository" as a single server, for simplicity. But it's not always what it is. For Kali Linux, by default users have http.kali.org configured in their APT sources, and it is a redirector, ie. a web server that redirects requests to mirrors that are nearby the client. Some context that matters for what comes next: the Kali repository is synced with ~70 mirrors all around the world, 4 times a day. What happens if your apt update requests are redirected to 2 mirrors close-by, and one was just synced, while the other is still syncing (or even worse, failed to sync entirely)? You'll get a mix of old and new Index files. Hash Sum Mismatch! As you can see, with this setup the "Hash Sum Mismatch Window" becomes much longer than a few seconds: as long as nearby mirrors are syncing the window is opened. You could have a fast and a slow mirror next to you, and they can be out of sync with each other for several minutes every time the repository is updated, for example. For Kali Linux in particular, there's a "detail" in our network of mirrors that, as a side-effect, almost guarantees that this window lasts several minutes at least. This is because the pool of mirrors includes kali.download which is in fact the Cloudflare CDN, and from the redirector point of view, it's seen as a "super mirror" that is present in every country. So when APT fires a bunch of requests against http.kali.org, it's likely that some of them will be redirected to the Kali CDN, and others will be redirected to a mirror nearby you. So far so good, but there's another point of detail to be aware of: the Kali CDN is synced first, before the other mirrors. Another thing: usually the mirrors that are the farthest from the Tier-0 mirror are the longest to sync. Packing all of that together: if you live somewhere in Asia, it's not uncommon for your "Hash Sum Mismatch Window" to be as long as 30 minutes, between the moment the Kali CDN is synced, and the moment your nearby mirrors catch up and are finally in sync as well. Having said all of that, and assuming you're still reading (anyone here?), you might be wondering... Does that mean that apt update is broken 4 times a day, for around 30 minutes, for every Kali user out there? How can they bear with that? Answer is: no, of course not, it's not broken like that. It works despite all of that, and this is thanks to yet another detail that we didn't go into yet. This detail lies in APT itself. APT is in fact "redirector aware", in a sense. When it fetches a Release file, and if ever the request is redirected, it then fires the subsequent requests against the server where it was initially redirected. So you are guaranteed that the Release file and the Index files are retrieved from the same mirror! Which brings back our "Hash Sum Mismatch Window" to the window for a single server, ie. something like a few seconds at worst, hopefully. And that's what makes it work for Kali, literally. Without this trick, everything would fall apart. For reference, this feature was implemented in APT back in... 2016! A busy year it seems! Here's the link to the commit: use the same redirection mirror for all index files. To finish, a dump from the console. You can see this behaviour play out easily, again with APT debugging turned on. Below we can see that only the first request hits the Kali redirector:
$ podman run --rm kali-rolling apt -q -o Debug::Acquire::http=true update 2>&1   grep -e ^Answer -e ^HTTP
Answer for: http://http.kali.org/kali/dists/kali-rolling/InRelease
HTTP/1.1 302 Found
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/InRelease
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/non-free-firmware/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/contrib/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/main/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/non-free/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Interlude Believe it or not, we're done with the introduction! At this point, we have a good understanding of what apt update does (in terms of HTTP requests), we know that Release files and Index files are part of a whole, and we know that a repository can be updated atomically thanks to the Acquire-By-Hash feature, so that users don't experience interruption of service or failures of any sort, even with a rolling repository that is updated several times a day, like Debian sid. We've also learnt that, despite the fact that Acquire-By-Hash landed almost 10 years ago, some distributions like Kali Linux are still doing without it... and yet it works! But the reason why it works is more complicated to grasp, especially when you add a network of mirrors and a redirector to the picture. Moreover, it doesn't work as flawlessly as with the Acquire-By-Hash feature: we still expect some short (seconds at worst) "Hash Sum Mismatch Windows" for those unlucky users that run apt update at the wrong moment. This was a long intro, but that really sets the stage for what comes next: the edge cases. Some situations in which we can hit some Hash Sum Mismatch errors with Kali. Error cases that I've collected and investigated over the time... If anything, it supports the case that Acquire-By-Hash is really something that should be implemented in reprepro. More on that in the conclusion, but for now, let's look at those edge cases. Edge Case 1: the caching proxy If you put a caching proxy (such as approx, my APT caching proxy of choice) between yourself and the actual packages repository, then obviously it's the caching proxy that performs the HTTP requests, and therefore APT will never know about the redirections returned by the server, if any. So the APT trick of downloading all the Index files from the same server in case of redirect doesn't work anymore. It was rather easy to confirm that by building a Kali package during a mirror sync, and watch if fail at the "Update chroot" step:
$ sudo rm /var/cache/approx/kali/dists/ -fr
$ gbp buildpackage --git-builder=sbuild
+------------------------------------------------------------------------------+
  Update chroot                                Wed, 11 Jun 2025 10:33:32 +0000  
+------------------------------------------------------------------------------+
Get:1 http://http.kali.org/kali kali-dev InRelease [41.4 kB]
Get:2 http://http.kali.org/kali kali-dev/contrib Sources [81.6 kB]
Get:3 http://http.kali.org/kali kali-dev/main Sources [17.3 MB]
Get:4 http://http.kali.org/kali kali-dev/non-free Sources [122 kB]
Get:5 http://http.kali.org/kali kali-dev/non-free-firmware Sources [8297 B]
Get:6 http://http.kali.org/kali kali-dev/non-free amd64 Packages [197 kB]
Get:7 http://http.kali.org/kali kali-dev/non-free-firmware amd64 Packages [10.6 kB]
Get:8 http://http.kali.org/kali kali-dev/contrib amd64 Packages [120 kB]
Get:9 http://http.kali.org/kali kali-dev/main amd64 Packages [21.0 MB]
Err:9 http://http.kali.org/kali kali-dev/main amd64 Packages
  File has unexpected size (20984689 != 20984861). Mirror sync in progress? [IP: ::1 9999]
  Hashes of expected file:
   - Filesize:20984861 [weak]
   - SHA256:6cbbee5838849ffb24a800bdcd1477e2f4adf5838a844f3838b8b66b7493879e
   - SHA1:a5c7e557a506013bd0cf938ab575fc084ed57dba [weak]
   - MD5Sum:1433ce57419414ffb348fca14ca1b00f [weak]
  Release file created at: Wed, 11 Jun 2025 07:15:10 +0000
Fetched 17.9 MB in 9s (1893 kB/s)
Reading package lists...
E: Failed to fetch http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz  File has unexpected size (20984689 != 20984861). Mirror sync in progress? [IP: ::1 9999]
   Hashes of expected file:
    - Filesize:20984861 [weak]
    - SHA256:6cbbee5838849ffb24a800bdcd1477e2f4adf5838a844f3838b8b66b7493879e
    - SHA1:a5c7e557a506013bd0cf938ab575fc084ed57dba [weak]
    - MD5Sum:1433ce57419414ffb348fca14ca1b00f [weak]
   Release file created at: Wed, 11 Jun 2025 07:15:10 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.
E: apt-get update failed
The obvious workaround is to NOT use the redirector in the approx configuration. Either use a mirror close by, or the Kali CDN:
$ grep kali /etc/approx/approx.conf 
#kali http://http.kali.org/kali <- do not use the redirector!
kali  http://kali.download/kali
Edge Case 2: debootstrap struggles What if one tries to debootstrap Kali while mirrors are being synced? It can give you some ugly logs, but it might not be fatal:
$ sudo debootstrap kali-dev kali-dev http://http.kali.org/kali
[...]
I: Target architecture can be executed
I: Retrieving InRelease 
I: Checking Release signature
I: Valid Release signature (key id 827C8569F2518CC677FECA1AED65462EC8D5E4C5)
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
I: Resolving dependencies of required packages...
I: Resolving dependencies of base packages...
I: Checking component main on http://http.kali.org/kali...
I: Retrieving adduser 3.152
[...]
To understand this one, we have to go and look at the debootstrap source code. How does debootstrap fetch the Release file and the Index files? It uses wget, and it retries up to 10 times in case of failure. It's not as sophisticated as APT: it doesn't detect when the Release file is served via a redirect. As a consequence, what happens above can be explained as such:
  1. debootstrap requests the Release file, gets redirected to a mirror, and retrieves it from there
  2. then it requests the Packages file, gets redirected to another mirror that is not in sync with the first one, and retrieves it from there
  3. validation fails, since the checksum is not as expected
  4. try again and again
Since debootstrap retries up to 10 times, at some point it's lucky enough to get redirected to the same mirror as the one from where it got its Release file from, and this time it gets the right Packages file, with the expected checksum. So ultimately it succeeds. Edge Case 3: post-debootstrap failure I like this one, because it gets us to yet another detail that we didn't talk about yet. So, what happens after we successfully debootstraped Kali? We have only the main component enabled, and only the Index file for this component have been retrieved. It looks like that:
$ sudo debootstrap kali-dev kali-dev http://http.kali.org/kali
[...]
I: Base system installed successfully.
$ cat kali-dev/etc/apt/sources.list
deb http://http.kali.org/kali kali-dev main
$ ls -l kali-dev/var/lib/apt/lists/
total 80468
-rw-r--r-- 1 root root    41445 Jun 19 07:02 http.kali.org_kali_dists_kali-dev_InRelease
-rw-r--r-- 1 root root 82299122 Jun 19 07:01 http.kali.org_kali_dists_kali-dev_main_binary-amd64_Packages
-rw-r--r-- 1 root root    40562 Jun 19 11:54 http.kali.org_kali_dists_kali-dev_Release
-rw-r--r-- 1 root root      833 Jun 19 11:54 http.kali.org_kali_dists_kali-dev_Release.gpg
drwxr-xr-x 2 root root     4096 Jun 19 11:54 partial
So far so good. Next step would be to complete the sources.list with other components, then run apt update: APT will download the missing Index files. But if you're unlucky, that might fail:
$ sudo sed -i 's/main$/main contrib non-free non-free-firmware/' kali-dev/etc/apt/sources.list
$ cat kali-dev/etc/apt/sources.list
deb http://http.kali.org/kali kali-dev main contrib non-free non-free-firmware
$ sudo chroot kali-dev apt update
Hit:1 http://http.kali.org/kali kali-dev InRelease
Get:2 http://kali.download/kali kali-dev/contrib amd64 Packages [121 kB]
Get:4 http://mirror.sg.gs/kali kali-dev/non-free-firmware amd64 Packages [10.6 kB]
Get:3 http://mirror.freedif.org/kali kali-dev/non-free amd64 Packages [198 kB]
Err:3 http://mirror.freedif.org/kali kali-dev/non-free amd64 Packages
  File has unexpected size (10442 != 10584). Mirror sync in progress? [IP: 66.96.199.63 80]
  Hashes of expected file:
   - Filesize:10584 [weak]
   - SHA256:71a83d895f3488d8ebf63ccd3216923a7196f06f088461f8770cee3645376abb
   - SHA1:c4ff126b151f5150d6a8464bc6ed3c768627a197 [weak]
   - MD5Sum:a49f46a85febb275346c51ba0aa8c110 [weak]
  Release file created at: Fri, 23 May 2025 06:48:41 +0000
Fetched 336 kB in 4s (77.5 kB/s)  
Reading package lists... Done
E: Failed to fetch http://mirror.freedif.org/kali/dists/kali-dev/non-free/binary-amd64/Packages.gz  File has unexpected size (10442 != 10584). Mirror sync in progress? [IP: 66.96.199.63 80]
   Hashes of expected file:
    - Filesize:10584 [weak]
    - SHA256:71a83d895f3488d8ebf63ccd3216923a7196f06f088461f8770cee3645376abb
    - SHA1:c4ff126b151f5150d6a8464bc6ed3c768627a197 [weak]
    - MD5Sum:a49f46a85febb275346c51ba0aa8c110 [weak]
   Release file created at: Fri, 23 May 2025 06:48:41 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.
What happened here? Again, we need APT debugging options to have a hint:
$ sudo chroot kali-dev apt -q -o Debug::Acquire::http=true update 2>&1   grep -e ^Answer -e ^HTTP
Answer for: http://http.kali.org/kali/dists/kali-dev/InRelease
HTTP/1.1 304 Not Modified
Answer for: http://http.kali.org/kali/dists/kali-dev/contrib/binary-amd64/Packages.gz
HTTP/1.1 302 Found
Answer for: http://http.kali.org/kali/dists/kali-dev/non-free/binary-amd64/Packages.gz
HTTP/1.1 302 Found
Answer for: http://http.kali.org/kali/dists/kali-dev/non-free-firmware/binary-amd64/Packages.gz
HTTP/1.1 302 Found
Answer for: http://kali.download/kali/dists/kali-dev/contrib/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.sg.gs/kali/dists/kali-dev/non-free-firmware/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-dev/non-free/binary-amd64/Packages.gz
HTTP/1.1 200 OK
As we can see above, for the Release file we get a 304 (aka. "Not Modified") from the redirector. Why is that? This is due to If-Modified-Since also known as RFC-7232. APT supports this feature when it retrieves the Release file, it basically says to the server "Give me the Release file, but only if it's newer than what I already have". If the file on the server is not newer than that, it answers with a 304, which basically says to the client "You have the latest version already". So APT doesn't get a new Release file, it uses the Release file that is already present locally in /var/lib/apt/lists/, and then it proceeeds to download the missing Index files. And as we can see above: it then hits the redirector for each requests, and might be redirected to different mirrors for each Index file. So the important bit here is: the APT "trick" of downloading all the Index files from the same mirror only works if the Release file is served via a redirect. If it's not, like in this case, then APT hits the redirector for each files it needs to download, and it's subject to the "Hash Sum Mismatch" error again. In practice, for the casual user running apt update every now and then, it's not an issue. If they have the latest Release file, no extra requests are done, because they also have the latest Index files, from a previous apt update transaction. So APT doesn't re-download those Index files. The only reason why they'd have the latest Release file, and would miss some Index files, would be that they added new components to their APT sources, like we just did above. Not so common, and then they'd need to run apt update at a unlucky moment. I don't think many users are affected in practice. Note that this issue is rather new for Kali Linux. The redirector running on http.kali.org is mirrorbits, and support for If-Modified-Since just landed in the latest release, version 0.6. This feature was added by no one else than me, a great example of the expression "shooting oneself in the foot". An obvious workaround here is to empty /var/lib/apt/lists/ in the chroot after debootstrap completed. Or we could disable support for If-Modified-Since entirely for Kali's instance of mirrorbits. Summary and Conclusion The Hash Sum Mismatch failures above are caused by a combination of things: At the same time: All in all, it seems that all those issues would go away if only Acquire-By-Hash was supported in the Kali packages repository. Now is not a bad moment to try to land this feature in reprepro. After development halted in 2019, there's now a new upstream, and patches are being merged again. But it won't be easy: reprepro is a C codebase of around 50k lines of code, and it will take time and effort for the newcomer to get acquainted with the codebase, to the point of being able to implement a significant feature like this one. As an alternative, aptly is another popular tool to manage APT package repositories. And it seems to support Acquire-By-Hash already. Another alternative: I was told that debusine has (experimental) support for package repositories, and that Acquire-By-Hash is supported as well. Options are on the table, and I hope that Kali will eventually get support for Acquire-By-Hash, one way or another. To finish, due credits: this blog post exists thanks to my employer OffSec. Thanks for reading!

15 July 2025

Valhalla's Things: Federated instant messaging, 100% debianized

Posted on July 15, 2025
Tags: madeof:bits, topic:xmpp, topic:debian
This is an approximation of what I told at my talk Federated instant messaging, 100% debianized at DebConf 25, for people who prefer reading text. There will also be a video recording, as soon as it s ready :) at the link above. Communicating is a basic human need, and today some kind of computer-mediated communication is a requirement for most people, especially those in this room. With everything that is happening, it s now more important than ever that these means of communication aren t controlled by entities that can t be trusted, whether because they can stop providing the service at any given time or worse because they are going to abuse it in order to extract more profit. If only there was a well established chat system based on some standard developed in an open way, with all of the features one expects from a chat system but federated so that one can choose between many different and independent providers, or even self-hosting. But wait, it does exist! I m not talking about IRC, I m talking about XMPP! While it has been around since the last millennium, it has not remained still, with hundred of XMPP Extension Protocols, or XEPs that have been developed to add all of the features that nobody in 1999 imagined we could need in Instant Messaging today, and more, such as IoT devices or even social networks. There is a myth that this makes XMPP a mess of incompatible software, but there is an XEP for that: XEP-0479: XMPP Compliance Suites 2023, which is a list of XEPs that needs to be supported by Instant Messaging servers and clients, including mobile ones, and all of the recommended ones will mostly just work. These include conversations.im on android, dino on linux, which also works pretty nicely on linux phones, gajim for a more fully featured option that includes the kitchen sink, profanity for text interface fanatics like me, and I ve heard that monal works decently enough on the iThings. One thing that sets XMPP apart from other federated protocols, is that it has already gone through the phase where everybody was on one very big server, which then cut out federation, and we ve learned from the experience. These days there are still a few places that cater to newcomers, like https://account.conversations.im/, https://snikket.org/ (which also includes tools to make it easier to host your own instance) and https://quicksy.im/, but most people are actually on servers of a manageable size. My strong recommendation is for community hosting: not just self-hosting for yourself, but finding a community you feel part of and trust, and share a server with them, whether managed by volunteers from the community itself, or by a paid provider. If you are a Debian Developer, you already have one: you can go to https://db.debian.org/ , select Change rtc password to set your own password, wait an hour or so and you re good to go, as described at the bottom of https://wiki.debian.org/Teams/DebianSocial. A few years ago it had remained a bit behind, but these days it s managed by an active team, and if you re missing some features, or just want to know what s happening with it, you can join their BoF on Friday afternoon (and also thank them for their work). But for most people in this room, I d also recommend finding a friend or two who can help as a backup, and run a server for your own families or community: as a certified lazy person who doesn t like doing sysadmin jobs, I can guarantee it s perfectly feasible, about in the same range of difficulty as running your own web server for a static site. The two most popular servers for this, prosody and ejabberd, are well maintained in Debian, and these days there isn t a lot more to do than installing them, telling them your hostname, setting up a few DNS entries, and then you mostly need to keep the machine updated and very little else. After that, it s just applying system security updates, upgrading everything every couple years (some configuration updates may be needed, but nothing major) and maybe helping some non-technical users, if you are hosting your non-technical friends (the kind who would need support on any other platform).
Question time (including IRC questions) included which server would be recommended for very few users (I use prosody and I m very happy with it, but I believe ejabberd works also just fine), then somebody reminded me that I had forgotten to mention https://www.chatons.org/ , which lists free, ethical and decentralized services, including xmpp ones. I was also asked a comparison with matrix, which does cover a very similar target as XMPP, but I am quite biased against it, and I d prefer to talk well of my favourite platform than badly of its competitor.

12 July 2025

Christian Kastner: Easy dynamic dispatch using GLIBC Hardware Capabilities

TL;DR With GLIBC 2.33+, you can build a shared library multiple times targeting various optimization levels, and the dynamic linker/loader will pick the highest version supported by the current CPU. For example, with the layout below, on a Ryzen 9 5900X, x86-64-v3/libfoo0.so would be loaded:
/usr/lib/glibc-hwcaps/x86-64-v4/libfoo0.so
/usr/lib/glibc-hwcaps/x86-64-v3/libfoo0.so
/usr/lib/glibc-hwcaps/x86-64-v2/libfoo0.so
/usr/lib/libfoo0.so
Longer Version GLIBC Hardware Capabilities or "hwcaps" are an easy, almost trivial way to add a simple form of dynamic dispatch to any amd64 or POWER build, provided that either the build target or the compiler's optimizations can make use of certain CPU extensions. Mo Zhou pointed me towards this when I was faced with the challenge of creating a performant Debian package for ggml, the tensor library behind llama.cpp and whisper.cpp.
The Challenge A performant yet universally loadable library needs to make use of some form of dynamic dispatch to leverage the most effective SIMD extensions available on any given CPU it may run on. Last January, when I first started with the packaging of ggml for Debian, ggml did have support for this through its GGML_CPU_ALL_VARIANTS=ON option, but this was limited to amd64. This meant that on all the other architectures that Debian supports, I would need to target some ancient baseline, thus effectively crippling the package there.
Dynamic Dispatch using hwcaps hwcaps were introduced in GLIBC 2.33 and replace the (now) Legacy Hardware Capabilities, which were removed in 2.37. The way hwcaps work is delightfully simple: the dynamic linker/loader will look for a shared library not just in the standard library paths, but also in subdirectories thereof of the form hwcaps/<level>, starting with the highest <level> that the current CPU supports. The levels are predefined. I'm using the amd64 levels below. For ggml, this meant that I simply could build the library in multiple passes, each time targeting a different <level>, and install the result in the corresponding subdirectory, which resulted in the following layout (reduced to libggml.so for brevity):
/usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/x86-64-v4/libggml.so
/usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/x86-64-v3/libggml.so
/usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/x86-64-v2/libggml.so
/usr/lib/x86_64-linux-gnu/ggml/libggml.so
In practice, this means that on a CPU supporting AVX512, the linker/loader would load x86-64-v4/libggml.so if it existed, and otherwise continue to look for the other levels, all the way down to the lowest one. On a CPU which supported only SSE4.2, the lookup process would be the same, ending with picking x86-64-v2/libggml.so. With QEMU, all of this was quickly verified. Note that the lowest-level library, targeting x86-64-v1, is not installed to a subdirectory, but to the path where the library would normally have been installed. This has the nice property that on systems not using GLIBC, and thus not having hwcaps available, package installation will still result in a loadable library, albeit the version with the worst performance. And a careful observer might have noticed that in the example above, the library is installed to a private ggml/ directory, so this mechanism also works when using RUNPATH or LD_LIBRARY_PATH. As mentioned above, Debian's ggml package will soon switch to GGML_CPU_ALL_VARIANTS=ON, but this was still quite the useful feature to discover.

Reproducible Builds: Reproducible Builds in June 2025

Welcome to the 6th report from the Reproducible Builds project in 2025. Our monthly reports outline what we ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website. In this report:
  1. Reproducible Builds at FOSSY 2025
  2. Distribution work
  3. diffoscope
  4. OSS Rebuild updates
  5. Website updates
  6. Upstream patches
  7. Reproducibility testing framework

Reproducible Builds at FOSSY 2025 On Saturday 2nd August, Vagrant Cascadian and Chris Lamb will be presenting at this year s FOSSY 2025. Their talk, titled Never Mind the Checkboxes, Here s Reproducible Builds!, is being introduced as follows:
There are numerous policy compliance and regulatory processes being developed that target software development but do they solve actual problems? Does it improve the quality of software? Do Software Bill of Materials (SBOMs) actually give you the information necessary to verify how a given software artifact was built? What is the goal of all these compliance checklists anyways or more importantly, what should the goals be? If a software object is signed, who should be trusted to sign it, and can they be trusted forever?
The talk will introduce the audience to Reproducible Builds as a set of best practices which allow users and developers to verify that software artifacts were built from the source code, but also allows auditing for license compliance, providing security benefits, and removes the need to trust arbitrary software vendors. Hosted by the Software Freedom Conservancy and taking place in Portland, Oregon, USA, FOSSY aims to be a community-focused event: Whether you are a long time contributing member of a free software project, a recent graduate of a coding bootcamp or university, or just have an interest in the possibilities that free and open source software bring, FOSSY will have something for you . More information on the event is available on the FOSSY 2025 website, including the full programme schedule. Vagrant and Chris will also be staffing a table this year, where they will be available to answer any questions about Reproducible Builds and discuss collaborations with other projects.

Distribution work In Debian this month:
  • Holger Levsen has discovered that it is now possible to bootstrap a minimal Debian trixie using 100% reproducible packages. This result can itself be reproduced, using the debian-repro-status tool and mmdebstrap s support for hooks:
      $ mmdebstrap --variant=apt --include=debian-repro-status \
           --chrooted-customize-hook=debian-repro-status \
           trixie /dev/null 2>&1   grep "Your system has"
       INFO  debian-repro-status > Your system has 100.00% been reproduced.
    
  • On our mailing list this month, Helmut Grohne wrote an extensive message raising an issue related to Uploads with conflicting buildinfo filenames:
    Having several .buildinfo files for the same architecture is something that we plausibly want to have eventually. Imagine running two sets of buildds and assembling a single upload containing buildinfo files from both buildds in the same upload. In a similar vein, as a developer I may want to supply several .buildinfo files with my source upload (e.g. for multiple architectures). Doing any of this is incompatible with current incoming processing and with reprepro.
  • 5 reviews of Debian packages were added, 4 were updated and 8 were removed this month adding to our ever-growing knowledge about identified issues.

In GNU Guix, Timothee Mathieu reported that a long-standing issue with reproducibility of shell containers across different host operating systems has been solved. In their message, Timothee mentions:
I discovered that pytorch (and maybe other dependencies) has a reproducibility problem of order 1e-5 when on AVX512 compared to AVX2. I first tried to solve the problem by disabling AVX512 at the level of pytorch, but it did not work. The dev of pytorch said that it may be because some components dispatch computation to MKL-DNN, I tried to disable AVX512 on MKL, and still the results were not reproducible, I also tried to deactivate in openmpi without success. I finally concluded that there was a problem with AVX512 somewhere in the dependencies graph but I gave up identifying where, as this seems very complicated.

The IzzyOnDroid Android APK repository made more progress in June. Not only have they just passed 48% reproducibility coverage, Ben started making their reproducible builds more visible, by offering rbtlog shields, a kind of badge that has been quickly picked up by many developers who are proud to present their applications reproducibility status.
Lastly, in openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 298, 299 and 300 to Debian:
  • Add python3-defusedxml to the Build-Depends in order to include it in the Docker image. [ ]
  • Handle the RPM format s HEADERSIGNATURES and HEADERIMMUTABLE as a special-case to avoid unnecessarily large diffs. Thanks to Daniel Duan for the report and suggestion. [ ][ ]
  • Update copyright years. [ ]
In addition, @puer-robustus fixed a regression introduced in an earlier commit which resulted in some differences being lost. [ ][ ] Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 299 [ ][ ] and 300 [ ][ ].

OSS Rebuild updates OSS Rebuild has added a new network analyzer that provides transparent HTTP(S) interception during builds, capturing all network traffic to monitor external dependencies and identify suspicious behavior, even in unmodified maintainer-controlled build processes. The text-based user interface now features automated failure clustering that can group similar rebuild failures and provides natural language failure summaries, making it easier to identify and understand patterns across large numbers of build failures. OSS Rebuild has also improved the local development experience with a unified interface for build execution strategies, allowing for more extensible environment setup for build execution. The team also designed a new website and logo.

Website updates Once again, there were a number of improvements made to our website this month including:
  • Arnaud Brousseau added Stage , a new Linux distribution, to our Tools page.
  • Chris Lamb improved the docker instructions on the diffoscope website. [ ]


Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In June, however, a number of changes were made by Holger Levsen, including:
  • reproduce.debian.net-related:
    • Installed and deployed rebuilderd version 0.24 from Debian unstable in order to make use of the new compression feature added by Jarl Gullberg for the database. This resulted in massive decrease of the SQLite databases:
      • 79G 2.8G (all)
      • 84G 3.2G (amd64)
      • 75G 2.9G (arm64)
      • 45G 2.1G (armel)
      • 48G 2.2G (armhf)
      • 73G 2.8G (i386)
      • 72G 2.7G (ppc64el)
      • 45G 2.1G (riscv64)
      for a combined saving from 521G 20.8G. This naturally reduces the requirements to run an independent rebuilderd instance and will permit us to add more Debian suites as well.
    • During migration to the latest version of rebuilderd, make sure several services are not started. [ ]
    • Actually run rebuilderd from /usr/bin. [ ]
    • Raise temperatures for NVME devices on some riscv64 nodes that should be ignored. [ ][ ]
    • Use a 64KB kernel page size on the ppc64el architecture (see #1106757). [ ]
    • Improve ordering of some failed to reproduce statistics. [ ]
    • Detect a number of potential causes of build failures within the statistics. [ ][ ]
    • Add support for manually scheduling for the any architecture. [ ]
  • Misc:
    • Update the Codethink nodes as there are now many kernels installed. [ ][ ]
    • Install linux-sysctl-defaults on Debian trixie systems as we need ping functionality. [ ]
    • Limit the fs.nr_open kernel turnable. [ ]
    • Stop submitting results to deprecated buildinfo.debian.net service. [ ][ ]
In addition, Jochen Sprickerhof greatly improved the statistics and the logging functionality, including adopting to the new database format of rebuilderd version 0.24.0 [ ] and temporarily increasing maximum log size in order to debug a nettlesome build [ ]. Jochen also dropped the CPUSchedulingPolicy=idle systemd flag on the workers. [ ]

Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 July 2025

Bits from Debian: Bits from the DPL

Dear Debian community, This is bits from the DPL for June. The Challenge of Mentoring Newcomers In June there was an extended discussion about the ongoing challenges around mentoring newcomers in Debian. As many of you know, this is a topic I ve cared about deeply--long before becoming DPL. In my view, the issue isn t just a matter of lacking tools or needing to try harder to attract contributors. Anyone who followed the discussion will likely agree that it s more complex than that. I sometimes wonder whether Debian s success contributes to the problem. From the outside, things may appear to just work , which can lead to the impression: Debian is doing fine without me--they clearly have everything under control. But that overlooks how much volunteer effort it takes to keep the project running smoothly. We should make it clearer that help is always needed--not only in packaging, but also in writing technical documentation, designing web pages, reaching out to upstreams about license issues, finding sponsors, or organising events. (Speaking from experience, I would have appreciated help in patiently explaining Free Software benefits to upstream authors.) Sometimes we think too narrowly about what newcomers can do, and also about which tasks could be offloaded from overcommitted contributors. In fact, one of the most valuable things a newcomer can contribute is better documentation. Those of us who ve been around for years may be too used to how things work--or make assumptions about what others already know. A person who just joined the project is often in the best position to document what s confusing, what s missing, and what they wish they had known sooner. In that sense, the recent "random new contributor s experience" posts might be a useful starting point for further reflection. I think we can learn a lot from positive user stories, like this recent experience of a newcomer adopting the courier package. I'm absolutely convinced that those who just found their way into Debian have valuable perspectives--and that we stand to learn the most from listening to them. We should also take seriously what Russ Allbery noted in the discussion: "This says bad things about the project's sustainability and I think everyone knows that." Volunteers move on--that s normal and expected. But it makes it all the more important that we put effort into keeping Debian's contributor base at least stable, if not growing. Project-wide LLM budget for helping people Lucas Nussbaum has volunteered to handle the paperwork and submit a request on Debian s behalf to LLM providers, aiming to secure project-wide access for Debian Developers. If successful, every DD will be free to use this access--or not--according to their own preferences. Kind regards Andreas.

4 July 2025

Sahil Dhiman: Secondary Authoritative Name Server Options for Self-Hosted Domains

In the past few months, I have moved authoritative name servers (NS) of two of my domains (sahilister.net and sahil.rocks) in house using PowerDNS. Subdomains of sahilister.net see roughly 320,000 hits/day across my IN and DE mirror nodes, so adding secondary name servers with good availability (in addition to my own) servers was one of my first priorities. I explored the following options for my secondary NS, which also didn t cost me anything:

1984 Hosting

Hurriance Electric

Afraid.org

Puck

NS-Global

Asking friends Two of my friends and fellow mirror hosts have their own authoritative name server setup, Shrirang (ie albony) and Luke. Shirang gave me another POP in IN and through Luke (who does have an insane amount of in-house NS, see dig ns jing.rocks +short), I added a JP POP. If we know each other, I would be glad to host a secondary NS for you in (IN and/or DE locations).

Some notes
  • Adding a third-party secondary is putting trust that the third party would serve your zone right.
  • Hurricane Electric and 1984 hosting provide multiple NS. One can use some or all of them. Ideally, you can get away with just using your own with full set from any of these two. Play around with adding and removing secondaries, which gives you the best results. . Using everyone is anyhow overkill, unless you have specific reasons for it.
  • Moving NS in-house isn t that hard. Though, be prepared to get it wrong a few times (and some more). I have already faced partial outages because:
    • Recursive resolvers (RR) in the wild behave in a weird way and cache the wrong NS response for longer time than in TTL.
    • NS expiry took more than time. 2 out of 3 of my Netim s NS (my domain registrar) had stopped serving my domain, while RRs in the wild hadn t picked up my new in-house NS. I couldn t really do anything about it, though.
    • Dot is pretty important at the end.
    • With HE.net, I forgot to delegate my domain on their panel and just added in my NS set, thinking I ve already done so (which I did but for another domain), leading to a lame server situation.
  • In terms of serving traffic, there s no distinction between primary and secondary NS. RR don t really care who they re asking the query to. So one can have hidden primary too.
  • I initially thought of adding periodic RIPE Atlas measurements from the global set but thought against it as I already host a termux mirror, which brings in thousands of queries from around the world leading to a diverse set of RRs querying my domain already.
  • In most cases, query resolution time would increase with out of zone NS servers (which most likely would be in external secondary). 1 query vs. 2 queries. Pay close attention to ADDITIONAL SECTION Shrirang s case followed by mine:
$ dig ns albony.in
; <<>> DiG 9.18.36 <<>> ns albony.in
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60525
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 9
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;albony.in.			IN	NS
;; ANSWER SECTION:
albony.in.		1049	IN	NS	ns3.albony.in.
albony.in.		1049	IN	NS	ns4.albony.in.
albony.in.		1049	IN	NS	ns2.albony.in.
albony.in.		1049	IN	NS	ns1.albony.in.
;; ADDITIONAL SECTION:
ns3.albony.in.		1049	IN	AAAA	2a14:3f87:f002:7::a
ns1.albony.in.		1049	IN	A	82.180.145.196
ns2.albony.in.		1049	IN	AAAA	2403:44c0:1:4::2
ns4.albony.in.		1049	IN	A	45.64.190.62
ns2.albony.in.		1049	IN	A	103.77.111.150
ns1.albony.in.		1049	IN	AAAA	2400:d321:2191:8363::1
ns3.albony.in.		1049	IN	A	45.90.187.14
ns4.albony.in.		1049	IN	AAAA	2402:c4c0:1:10::2
;; Query time: 29 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Jul 04 07:57:01 IST 2025
;; MSG SIZE  rcvd: 286
vs mine
$ dig ns sahil.rocks
; <<>> DiG 9.18.36 <<>> ns sahil.rocks
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64497
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;sahil.rocks.			IN	NS
;; ANSWER SECTION:
sahil.rocks.		6385	IN	NS	ns5.he.net.
sahil.rocks.		6385	IN	NS	puck.nether.net.
sahil.rocks.		6385	IN	NS	colin.sahilister.net.
sahil.rocks.		6385	IN	NS	marvin.sahilister.net.
sahil.rocks.		6385	IN	NS	ns2.afraid.org.
sahil.rocks.		6385	IN	NS	ns4.he.net.
sahil.rocks.		6385	IN	NS	ns2.albony.in.
sahil.rocks.		6385	IN	NS	ns3.jing.rocks.
sahil.rocks.		6385	IN	NS	ns0.1984.is.
sahil.rocks.		6385	IN	NS	ns1.1984.is.
sahil.rocks.		6385	IN	NS	ns-global.kjsl.com.
;; Query time: 24 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Jul 04 07:57:20 IST 2025
;; MSG SIZE  rcvd: 313
  • Theoretically speaking, a small increase/decrease in resolution would occur based on the chosen TLD and the popularity of the TLD in query originators area (already cached vs. fresh recursion).
  • One can get away with having only 3 NS (or be like Google and have 4 anycast NS or like Amazon and have 8 or like Verisign and make it 13 :P).
  • Nowhere it s written, your NS needs not to be called dns* or ns1, ns2 etc. Get creative with naming NS; be deceptive with the naming :D.
  • A good understanding of RR behavior can help engineer a good authoritative NS system.

Further reading

3 July 2025

Sergio Cipriano: Disable sleep on lid close

Disable sleep on lid close I am using an old laptop in my homelab, but I want to do everything from my personal computer, with ssh. The default behavior in Debian is to suspend when the laptop lid is closed, but it's easy to change that, just edit
/etc/systemd/logind.conf
and change the line
#HandleLidSwitch=suspend
to
HandleLidSwitch=ignore
then
$ sudo systemctl restart systemd-logind
That's it.

2 July 2025

Dirk Eddelbuettel: RcppArmadillo 14.6.0-1 on CRAN: New Upstream Minor Release

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1241 other packages on CRAN, downloaded 40.4 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 634 times according to Google Scholar. Conrad released a minor version 4.6.0 yesterday which offers new accessors for non-finite values. And despite being in Beautiful British Columbia on vacation, I had wrapped up two rounds of reverse dependency checks preparing his 4.6.0 release, and shipped this to CRAN this morning where it passed with flying colours and no human intervention even with over 1200 reverse dependencies. The changes since the last CRAN release are summarised below.

Changes in RcppArmadillo version 14.6.0-1 (2025-07-02)
  • Upgraded to Armadillo release 14.6.0 (Caffe Mocha)
    • Added balance() to transform matrices so that column and row norms are roughly the same
    • Added omit_nan() and omit_nonfinite() to extract elements while omitting NaN and non-finite values
    • Added find_nonnan() for finding indices of non-NaN elements
    • Added standalone replace() function
  • The fastLm() help page now mentions that options to solve() can control its behavior.

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

Dirk Eddelbuettel: Rcpp 1.1.0 on CRAN: C++11 now Minimum, Regular Semi-Annual Update

rcpp logo With a friendly Canadian hand wave from vacation in Beautiful British Columbia, and speaking on behalf of the Rcpp Core Team, I am excited to shared that the (regularly scheduled bi-annual) update to Rcpp just brought version 1.1.0 to CRAN. Debian builds haven been prepared and uploaded, Windows and macOS builds should appear at CRAN in the next few days, as will builds in different Linux distribution and of course r2u should catch up tomorrow as well. The key highlight of this release is the switch to C++11 as minimum standard. R itself did so in release 4.0.0 more than half a decade ago; if someone is really tied to an older version of R and an equally old compiler then using an older Rcpp with it has to be acceptable. Our own tests (using continuous integration at GitHub) still go back all the way to R 3.5.* and work fine (with a new-enough compiler). In the previous release post, we commented that we had only reverse dependency (falsely) come up in the tests by CRAN, this time there was none among the well over 3000 packages using Rcpp at CRAN. Which really is quite amazing, and possibly also a testament to our rigorous continued testing of our development and snapshot releases on the key branch. This release continues with the six-months January-July cycle started with release 1.0.5 in July 2020. As just mentioned, we do of course make interim snapshot dev or rc releases available. While we not longer regularly update the Rcpp drat repo, the r-universe page and repo now really fill this role admirably (and with many more builds besides just source). We continue to strongly encourage their use and testing I run my systems with these versions which tend to work just as well, and are of course also fully tested against all reverse-dependencies. Rcpp has long established itself as the most popular way of enhancing R with C or C++ code. Right now, 3038 packages on CRAN depend on Rcpp for making analytical code go faster and further. On CRAN, 13.6% of all packages depend (directly) on Rcpp, and 61.3% of all compiled packages do. From the cloud mirror of CRAN (which is but a subset of all CRAN downloads), Rcpp has been downloaded 100.8 million times. The two published papers (also included in the package as preprint vignettes) have, respectively, 2023 (JSS, 2011) and 380 (TAS, 2018) citations, while the the book (Springer useR!, 2013) has another 695. As mentioned, this release switches to C++11 as the minimum standard. The diffstat display in the CRANberries comparison to the previous release shows how several (generated) sources files with C++98 boilerplate have now been removed; we also flattened a number of if/else sections we no longer need to cater to older compilers (see below for details). We also managed more accommodation for the demands of tighter use of the C API of R by removing DATAPTR and CLOENV use. A number of other changes are detailed below. The full list below details all changes, their respective PRs and, if applicable, issue tickets. Big thanks from all of us to all contributors!

Changes in Rcpp release version 1.1.0 (2025-07-01)
  • Changes in Rcpp API:
    • C++11 is now the required minimal C++ standard
    • The std::string_view type is now covered by wrap() (Lev Kandel in #1356 as discussed in #1357)
    • A last remaining DATAPTR use has been converted to DATAPTR_RO (Dirk in #1359)
    • Under R 4.5.0 or later, R_ClosureEnv is used instead of CLOENV (Dirk in #1361 fixing #1360)
    • Use of lsInternal switched to lsInternal3 (Dirk in #1362)
    • Removed compiler detection macro in a header cleanup setting C++11 as the minunum (Dirk in #1364 closing #1363)
    • Variadic templates are now used onconditionally given C++11 (Dirk in #1367 closing #1366)
    • Remove RCPP_USING_CXX11 as a #define as C++11 is now a given (Dirk in #1369)
    • Additional cleanup for __cplusplus checks (I aki in #1371 fixing #1370)
    • Unordered set construction no longer needs a macro for the pre-C++11 case (I aki in #1372)
    • Lambdas are supported in a Rcpp Sugar functions (I aki in #1373)
    • The Date(time)Vector classes now have default ctor (Dirk in #1385 closing #1384)
    • Fixed an issue where Rcpp::Language would duplicate its arguments (Kevin in #1388, fixing #1386)
  • Changes in Rcpp Attributes:
    • The C++26 standard now has plugin support (Dirk in #1381 closing #1380)
  • Changes in Rcpp Documentation:
    • Several typos were correct in the NEWS file (Ben Bolker in #1354)
    • The Rcpp Libraries vignette mentions PACKAGE_types.h to declare types used in RcppExports.cpp (Dirk in #1355)
    • The vignettes bibliography file was updated to current package versions, and now uses doi references (Dirk in #1389)
  • Changes in Rcpp Deployment:
    • Rcpp.package.skeleton() creates URL and BugReports if given a GitHub username (Dirk in #1358)
    • R 4.4.* has been added to the CI matrix (Dirk in #1376)
    • Tests involving NA propagation are skipped under linux-arm64 as they are under macos-arm (Dirk in #1379 closing #1378)

Thanks to my CRANberries, you can also look at a diff to the previous release Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bugs reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues).

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

1 July 2025

Guido G nther: Free Software Activities June 2025

Another short status update of what happened on my side last month. Phosh 0.48.0 is out with nice improvements, phosh.mobi e.V. is alive, helped a bit to get cellbroadcastd out, osk bugfixes and some more: See below for details on the above and more: phosh phoc phosh-mobile-settings stevia (formerly phosh-osk-stub) phosh-osk-data phosh-vala-plugins pfs xdg-desktop-portal-phosh xdg-desktop-portal phosh-debs meta-phosh feedbackd feedbackd-device-themes gmobile GNOME clocks Debian Mobian ModemManager libmbim libqmi mobile-broadband-provider-info Cellbroadcastd iio-sensor-proxy twenty-twenty-hugo gotosocial Reviews This is not code by me but reviews on other peoples code. The list is (as usual) slightly incomplete. Thanks for the contributions! Help Development If you want to support my work see donations. Comments? Join the Fediverse thread

30 June 2025

Otto Kek l inen: Corporate best practices for upstream open source contributions

Featured image of post Corporate best practices for upstream open source contributions
This post is based on presentation given at the Validos annual members meeting on June 25th, 2025.
When I started getting into Linux and open source over 25 years ago, the majority of the software development in this area was done by academics and hobbyists. The number of companies participating in open source has since exploded in parallel with the growth of mobile and cloud software, the majority of which is built on top of open source. For example, Android powers most mobile phones today and is based on Linux. Almost all software used to operate large cloud provider data centers, such as AWS or Google, is either open source or made in-house by the cloud provider. Pretty much all companies, regardless of the industry, have been using open source software at least to some extent for years. However, the degree to which they collaborate with the upstream origins of the software varies. I encourage all companies in a technical industry to start contributing upstream. There are many benefits to having a good relationship with your upstream open source software vendors, both for the short term and especially for the long term. Moreover, with the rollout of CRA in EU in 2025-2027, the law will require software companies to contribute security fixes upstream to the open source projects their products use. To ensure the process is well managed, business-aligned and legally compliant, there are a few do s and don t do s that are important to be aware of.

Maintain your SBOMs For every piece of software, regardless of whether the code was done in-house, from an open source project, or a combination of these, every company needs to produce a Software Bill of Materials (SBOM). The SBOMs provide a standardized and interoperable way to track what software and which versions are used where, what software licenses apply, who holds the copyright of which component, which security fixes have been applied and so forth. A catalog of SBOMs, or equivalent, forms the backbone of software supply-chain management in corporations.

Identify your strategic upstream vendors The SBOMs are likely to reveal that for any piece of non-trivial software, there are hundreds or thousands of upstream open source projects in use. Few organizations have resources to contribute to all of their upstreams. If your organization is just starting to organize upstream contribution activities, identify the key projects that have the largest impact on your business and prioritize forming a relationship with them first. Organizations with a mature contribution process will be collaborating with tens or hundreds of upstreams.

Appoint an internal coordinator and champions Having a written policy on how to contribute upstream will help ensure a consistent process and avoid common pitfalls. However, a written policy alone does not automatically translate into a well-running process. It is highly recommended to appoint at least one internal coordinator who is knowledgeable about how open source communities work, how software licensing and patents work, and is senior enough to have a good sense of what business priorities to optimize for. In small organizations it can be a single person, while larger organizations typically have a full Open Source Programs Office. This coordinator should oversee the contribution process, track all contributions made across the organization, and further optimize the process by working with stakeholders across the business, including legal experts, business owners and CTOs. The marketing and recruiting folks should also be involved, as upstream contributions will have a reputation-building aspect as well, which can be enhanced with systematic tracking and publishing of activities. Additionally, at least in the beginning, the organization should also appoint key staff members as open source champions. Implementing a new process always includes some obstacles and occasional setbacks, which may discourage employees from putting in the extra effort to reap the full long-term benefits for the company. Having named champions will empower them to make the first few contributions themselves, setting a good example and encouraging and mentoring others to contribute upstream as well.

Avoid excessive approvals To maintain a high quality bar, it is always good to have all outgoing submissions reviewed by at least one or two people. Two or three pairs of eyeballs are significantly more likely to catch issues that might slip by someone working alone. The review also slows down the process by a day or two, which gives the author time to sleep on it , which usually helps to ensure the final submission is well-thought-out by the author. Do not require more than one or two reviewers. The marginal utility goes quickly to zero beyond a few reviewers, and at around four or five people the effect becomes negative, as the weight of each approval decreases and the reviewers begin to take less personal responsibility. Having too many people in the loop also makes each feedback round slow and expensive, to the extent that the author will hesitate to make updates and ask for re-reviews due to the costs involved. If the organization experiences setbacks due to mistakes slipping through the review process, do not respond by adding more reviewers, as it will just grind the contribution process to a halt. If there are quality concerns, invest in training for engineers, CI systems and perhaps an internal certification program for those making public upstream code submissions. A typical software engineer is more likely to seriously try to become proficient at their job and put effort into a one-off certification exam and then make multiple high-quality contributions, than it is for a low-skilled engineer to improve and even want to continue doing more upstream contributions if they are burdened by heavy review processes every time they try to submit an upstream contribution.

Don t expect upstream to accept all code contributions Sure, identifying the root cause of and fixing a tricky bug or writing a new feature requires significant effort. While an open source project will certainly appreciate the effort invested, it doesn t mean it will always welcome all contributions with open arms. Occasionally, the project won t agree that the code is correct or the feature is useful, and some contributions are bound to be rejected. You can minimize the chance of experiencing rejections by having a solid internal review process that includes assessing how the upstream community is likely to understand the proposal. Sometimes how things are communicated is more important than how they are coded. Polishing inline comments and git commit messages help ensure high-quality communication, along with a commitment to respond quickly to review feedback and conducting regular follow-ups until a contribution is finalized and accepted.

Start small to grow expertise and reputation In addition to keeping the open source contribution policy lean and nimble, it is also good to start practical contributions with small issues. Don t aim to contribute massive features until you have a track record of being able to make multiple small contributions. Keep in mind that not all open source projects are equal. Each has its own culture, written and unwritten rules, development process, documented requirements (which may be outdated) and more. Starting with a tiny contribution, even just a typo fix, is a good way to validate how code submissions, reviews and approvals work in a particular project. Once you have staff who have successfully landed smaller contributions, you can start planning larger proposals. The exact same proposal might be unsuccessful when proposed by a new person, and successful when proposed by a person who already has a reputation for prior high-quality work.

Embrace all and any publicity you get Some companies have concerns about their employees working in the open. Indeed, every email and code patch an employee submits, and all related discussions become public. This may initially sound scary, but is actually a potential source of good publicity. Employees need to be trained on how to conduct themselves publicly, and the discussions about code should contain only information strictly related to the code, without any references to actual production environments or other sensitive information. In the long run most employees contributing have a positive impact and the company should reap the benefits of positive publicity. If there are quality issues or employee judgment issues, hiding the activity or forcing employees to contribute with pseudonyms is not a proper solution. Instead, the problems should be addressed at the root, and bad behavior addressed rather than tolerated. When people are working publicly, there tends to also be some degree of additional pride involved, which motivates people to try their best. Contributions need to be public for the sponsoring corporation to later be able to claim copyright or licenses. Considering that thousands of companies participate in open source every day, the prevalence of bad publicity is quite low, and the benefits far exceed the risks.

Scratch your own itch When choosing what to contribute, select things that benefit your own company. This is not purely about being selfish - often people working on resolving a problem they suffer from are the same people with the best expertise of what the problem is and what kind of solution is optimal. Also, the issues that are most pressing to your company are more likely to be universally useful to solve than any random bug or feature request in the upstream project s issue tracker.

Remember there are many ways to help upstream While submitting code is often considered the primary way to contribute, please keep in mind there are also other highly impactful ways to contribute. Submitting high-quality bug reports will help developers quickly identify and prioritize issues to fix. Providing good research, benchmarks, statistics or feedback helps guide development and the project make better design decisions. Documentation, translations, organizing events and providing marketing support can help increase adoption and strengthen long-term viability for the project. In some of the largest open source projects there are already far more pending contributions than the core maintainers can process. Therefore, developers who contribute code should also get into the habit of contributing reviews. As Linus law states, given enough eyeballs, all bugs are shallow. Reviewing other contributors submissions will help improve quality, and also alleviate the pressure on core maintainers who are the only ones providing feedback. Reviewing code submitted by others is also a great learning opportunity for the reviewer. The reviewer does not need to be better than the submitter - any feedback is useful; merely posting review feedback is not the same thing as making an approval decision. Many projects are also happy to accept monetary support and sponsorships. Some offer specific perks in return. By human nature, the largest sponsors always get their voice heard in important decisions, as no open source project wants to take actions that scare away major financial contributors.

Starting is the hardest part Long-term success in open source comes from a positive feedback loop of an ever-increasing number of users and collaborators. As seen in the examples of countless corporations contributing open source, the benefits are concrete, and the process usually runs well after the initial ramp-up and organizational learning phase has passed. In open source ecosystems, contributing upstream should be as natural as paying vendors in any business. If you are using open source and not contributing at all, you likely have latent business risks without realizing it. You don t want to wake up one morning to learn that your top talent left because they were forbidden from participating in open source for the company s benefit, or that you were fined due to CRA violations and mismanagement in sharing security fixes with the correct parties. The faster you start with the process, the less likely those risks will materialize.

24 June 2025

Evgeni Golov: Using LXCFS together with Podman

JP was puzzled that using podman run --memory=2G would not result in the 2G limit being visible inside the container. While we were able to identify this as a visualization problem tools like free(1) only look at /proc/meminfo and that is not virtualized inside a container, you'd have to look at /sys/fs/cgroup/memory.max and friends instead I couldn't leave it at that. And then I remembered there is actually something that can provide a virtual (cgroup-aware) /proc for containers: LXCFS! But does it work with Podman?! I always used it with LXC, but there is technically no reason why it wouldn't work with a different container solution cgroups are cgroups after all. As we all know: there is only one way to find out! Take a fresh Debian 12 VM, install podman and verify things behave as expected:
user@debian12:~$ podman run -ti --rm --memory=2G centos:stream9
bash-5.1# grep MemTotal /proc/meminfo
MemTotal:        6067396 kB
bash-5.1# cat /sys/fs/cgroup/memory.max
2147483648
And after installing (and starting) lxcfs, we can use the virtual /proc/meminfo it generates by bind-mounting it into the container (LXC does that part automatically for us):
user@debian12:~$ podman run -ti --rm --memory=2G --mount=type=bind,source=/var/lib/lxcfs/proc/meminfo,destination=/proc/meminfo centos:stream9
bash-5.1# grep MemTotal /proc/meminfo
MemTotal:        2097152 kB
bash-5.1# cat /sys/fs/cgroup/memory.max
2147483648
The same of course works with all the other proc entries lxcfs provides (cpuinfo, diskstats, loadavg, meminfo, slabinfo, stat, swaps, and uptime here), just bind-mount them. And yes, free(1) now works too!
bash-5.1# free -m
               total        used        free      shared  buff/cache   available
Mem:            2048           3        1976           0          67        2044
Swap:              0           0           0
Just don't blindly mount the whole /var/lib/lxcfs/proc over the container's /proc. It did work (as in: "bash and free didn't crash") for me, but with /proc/$PID etc missing, I bet things will go south pretty quickly.

22 June 2025

Iustin Pop: Coding, as we knew it, has forever changed

Back when I was terribly na ve When I was younger, and definitely na ve, I was so looking forward to AI, which will help us write lots of good, reliable code faster. Well, principally me, not thinking what impact it will have industry-wide. Other more general concerns, like societal issues, role of humans in the future and so on were totally not on my radar. At the same time, I didn t expect this will actually happen. Even years later, things didn t change dramatically. Even the first release of ChatGPT a few years back didn t click for me, as the limitations were still significant.

Hints of serious change The first hint of the change, for me, was when a few months ago (yes, behind the curve), I asked ChatGPT to re-explain a concept to me, and it just wrote a lot of words, but without a clear explanation. On a whim, I asked Grok then recently launched, I think to do the same. And for the first time, the explanation clicked and I felt I could have a conversation with it. Of course, now I forgot again that theoretical CS concept, but the first step was done: I can ask an LLM to explain something, and it will, and I can have a back and forth logical discussion, even if on some theoretical concept. Additionally, I learned that not all LLMs are the same, and that means there s real competition and that leap frogging is possible. Another topic on which I tried to adopt early and failed to get mileage out of it, was GitHub Copilot (in VSC). I tried, it helped, but didn t feel any speed-up at all. Then more recently, in May, I asked Grok what s the state of the art in AI-assisted coding. It said either Claude in a browser tab, or in VSC via continue.dev extension. The continue.dev extension/tooling is a bit of a strange/interesting thing. It seems to want to be a middle-man between the user and actual LLM services, i.e. you pay a subscription to continue.dev, not to Anthropic itself, and they manage the keys/APIs, for whatever backend LLMs you want to use. The integration with Visual Studio Code is very nice, but I don t know if long-term their business model will make sense. Well, not my problem.

Claude: reverse engineering my old code and teaching new concepts So I installed the latter and subscribed, thinking 20 CHF for a month is good for testing. I skipped the tutorial model/assistant, created a new one from scratch, just enabled Claude 3.7 Sonnet, and started using it. And then, my mind was blown-not just by the LLM, but by the ecosystem. As said, I ve used GitHub copilot before, but it didn t seem effective. I don t know if a threshold has been reached, or Claude (3.7 at that time) is just better than ChatGPT. I didn t use the AI to write (non-trivial) code for me, at most boilerplate snippets. But I used it both as partner for discussion - I want to do x, what do you think, A or B? , and as a teacher, especially for fronted topics, which I m not familiar with. Since May, in mostly fragmented sessions, I ve achieved more than in the last two years. Migration from old school JS to ECMA modules, a webpacker (reducing bundle size by 50%), replacing an old Javascript library with hand written code using modern APIs, implementing the zoom feature together with all of keyboard, mouse, touchpad and touchscreen support, simplifying layout from manually computed to automatic layout, and finding a bug in webkit for which it also wrote a cool minimal test (cool, as in, way better than I d have ever, ever written, because for me it didn t matter that much). And more. Could I have done all this? Yes, definitely, nothing was especially tricky here. But hours and hours of reading MDN, scouring Stack Overflow and Reddit, and lots of trial and error. So doable, but much more toily. This, to me, feels like cheating. 20 CHF per month to make me 3x more productive is free money well, except that I don t make money on my code which is written basically for myself. However, I don t get stuck anymore searching hours in the web for guidance, I ask my question, and I get at least direction if not answer, and I m finished way earlier. I can now actually juggle more hobbies, in the same amount of time, if my personal code takes less time or differently said, if I m more efficient at it. Not all is roses, of course. Once, it did write code with such an endearing error that it made me laugh. It was so blatantly obvious that you shouldn t keep other state in the array that holds pointer status because that confuses the calculation of how many pointers are down , probably to itself too if I d have asked. But I didn t, since it felt a bit embarassing to point out such a dumb mistake. Yes, I m anthropomorphising again, because this is the easiest way to deal with things. In general, it does an OK-to-good-to-sometimes-awesome job, and the best thing is that it summarises documentation and all of Reddit and Stack Overflow. And gives links to those. Now, I have no idea yet what this means for the job of a software engineer. If on open source code, my own code, it makes me 3x faster reverse engineering my code from 10 years ago is no small feat for working on large codebases, it should do at least the same, if not more. As an example of how open-ended the assistance can be, at one point, I started implementing a new feature threading a new attribute to a large number of call points. This is not complex at all, just add a new field to a Haskell record, and modifying everything to take it into account, populate it, merge it when merging the data structures, etc. The code is not complex, tending toward boilerplate a bit, and I was wondering on a few possible choices for implementation, so, with just a few lines of code written that were not even compiling, I asked I want to add a new feature, should I do A or B if I want it to behave like this , and the answer was something along the lines of I see you want to add the specific feature I was working on, but the implementation is incomplete, you still need to to X, Y and Z . My mind was blown at this point, as I thought, if the code doesn t compile, surely the computer won t be able to parse it, but this is not a program, this is an LLM, so of course it could read it kind of as a human would. Again, the code complexity is not great, but the fact that it was able to read a half-written patch, understand what I was working towards, and reason about, was mind-blowing, and scary. Like always.

Non-code writing Now, after all this, while writing a recent blog post, I thought this is going to be public anyway, so let me ask Claude what it thinks about it. And I was very surprised, again: gone was all the pain of rereading three times my post to catch typos (easy) or phrasing structure issues. It gave me very clearly points, and helped me cut 30-40% of the total time. So not only coding, but word smithing too is changed. If I were an author, I d be delighted (and scared). Here is the overall reply it gave me:
  • Spelling and grammar fixes, all of them on point except one mistake (I claimed I didn t capitalize one word, but I did). To the level of a good grammar checker.
  • Flow Suggestions, which was way beyond normal spelling and grammar. It felt like a teacher telling me to do better in my writing, i.e. nitpicking on things that actually were true even if they d still work. I.e. lousy phrase structure, still understandable, but lousy nevertheless.
  • Other notes: an overall summary. This was mostly just praising my post . I wish LLMs were not so focused on praise the user .
So yeah, this speeds me up to about 2x on writing blog posts, too. It definitely feels not fair.

Wither the future? After all this, I m a bit flabbergasted. Gone are the 2000 s with code without unittests, gone are the 2010 s without CI/CD, and now, mid-2020 s, gone is the lone programmer that scours the internet to learn new things, alone? What this all means for our skills in software development, I have no idea, except I know things have irreversibly changed (a butlerian jihad aside). Do I learn better with a dedicated tutor even if I don t fight with the problem for so long? Or is struggling in finding good docs the main method of learning? I don t know yet. I feel like I understand the topics I m discussing with the AI, but who knows in reality what it will mean long term in terms of stickiness of learning. For the better, or for worse, things have changed. After all the advances over the last five centuries in mechanical sciences, it has now come to some aspects of the intellectual work. Maybe this is the answer to the ever-growing complexity of tech stacks? I.e. a return of the lone programmer that builds things end-to-end, but with AI taming the complexity added in the last 25 years? I can dream, of course, but this also means that the industry overall will increase in complexity even more, because large companies tend to do that, so maybe a net effect of not much One thing I did learn so far is that my expectation that AI (at this level) will only help junior/beginner people, i.e. it would flatten the skills band, is not true. I think AI can speed up at least the middle band, likely the middle top band, I don t know about the 10x programmers (I m not one of them). So, my question about AI now is how to best use it, not to lament how all my learning (90% self learning, to be clear) is obsolete. No, it isn t. AI helps me start and finish one migration (that I delayed for ages), then start the second, in the same day. At the end of this a bit rambling reflection on the past month and a half, I still have many questions about AI and humanity. But one has been answered: yes, AI , quotes or no quotes, already has changed this field (producing software), and we ve not seen the end of it, for sure.

21 June 2025

Ravi Dwivedi: Getting Brunei visa

In December 2024, my friend Badri and I were planning a trip to Southeast Asia. At this point, we were planning to visit Singapore, Malaysia and Vietnam. My Singapore visa had already been approved, and Malaysia was visa-free for us. For Vietnam, we had to apply for an e-visa online. We considered adding Brunei to our itinerary. I saw some videos of the Brunei visa process and got the impression that we needed to go to the Brunei embassy in Kuching, Malaysia in person. However, when I happened to search for Brunei on Organic Maps1, I stumbled upon the Brunei Embassy in Delhi. It seemed to be somewhere in Hauz Khas. As I was going to Delhi to collect my Singapore visa the next day, I figured I d also visit the Brunei Embassy to get information about the visa process. The next day I went to the location displayed by Organic Maps. It was next to the embassy of Madagascar, and a sign on the road divider confirmed that I was at the right place. That said, it actually looked like someone s apartment. I entered and asked for directions to the Brunei embassy, but the people inside did not seem to understand my query. After some back and forth, I realized that the embassy wasn t there. I now searched for the Brunei embassy on the Internet, and this time I got an address in Vasant Vihar. It seemed like the embassy had been moved from Hauz Khas to Vasant Vihar. Going by the timings mentioned on the web page, the embassy was closing in an hour. I took a Metro from Hauz Khas to Vasant Vihar. After deboarding at the Vasant Vihar metro station, I took an auto to reach the embassy. The address listed on the webpage got me into the correct block. However, the embassy was still nowhere to be seen. I asked around, but security guards in that area pointed me to the Burundi embassy instead. After some more looking around, I did end up finding the embassy. I spoke to the security guards at the gate and told them that I would like to know the visa process. They dialled a number and asked that person to tell me the visa process. I spoke to a lady on the phone. She listed the documents required for the visa process and mentioned that the timings for visa application were from 9 o clock to 11 o clock in the morning. She also informed me that the visa fees was 1000. I also asked about the process Badri, who lives far away in Tamil Nadu and cannot report at the embassy physically. She told me that I can submit a visa application on his behalf, along with an authorization letter. Having found the embassy in Delhi was a huge relief. The other plan - going to Kuching, Malaysia - was a bit uncertain, and we didn t know how much time it would take. Getting our passport submitted at an embassy in a foreign country was also not ideal. A few days later, Badri sent me all the documents required for his visa. I went to the embassy and submitted both the applications. The lady who collected our visa submissions asked me for our flight reservations from Delhi to Brunei, whereas ours were (keeping with our itinerary) from Kuala Lampur. She said that she might contact me later if it was required. For reference, here is the list of documents we submitted - I then asked about the procedure to collect the passports and visa results. Usually, embassies will tell you that they will contact you when they have decided on your applications. However, here I was informed that if they don t contact me within 5 days, I can come and collect our passports and visa result between 13:30-14:30 hours on the fifth day. That was strange :) I did visit the embassy to collect our visa results on the fifth day. However, the lady scolded me for not bringing the receipt she gave me. I was afraid that I might have to go all the way back home and bring the receipt to get our passports. The travel date was close, and it would take some time for Badri to receive his passport via courier as well. Fortunately, she gave me our passports (with the visa attached) and asked me to share a scanned copy of the receipt via email after I get home. We were elated that our visas were approved. Now we could focus on booking our flights. If you are going to Brunei, remember to fill their arrival card from the website within 48 hours of your arrival! Thanks to Badri and Contrapunctus for reviewing the draft before publishing the article.

  1. Nowadays, I prefer using Comaps instead of Organic Maps and recommend you do the same. Organic Maps had some issues with its governance and the community issues weren t being addressed.

20 June 2025

Matthew Garrett: My a11y journey

23 years ago I was in a bad place. I'd quit my first attempt at a PhD for various reasons that were, with hindsight, bad, and I was suddenly entirely aimless. I lucked into picking up a sysadmin role back at TCM where I'd spent a summer a year before, but that's not really what I wanted in my life. And then Hanna mentioned that her PhD supervisor was looking for someone familiar with Linux to work on making Dasher, one of the group's research projects, more usable on Linux. I jumped.

The timing was fortuitous. Sun were pumping money and developer effort into accessibility support, and the Inference Group had just received a grant from the Gatsy Foundation that involved working with the ACE Centre to provide additional accessibility support. And I was suddenly hacking on code that was largely ignored by most developers, supporting use cases that were irrelevant to most developers. Being in a relatively green field space sounds refreshing, until you realise that you're catering to actual humans who are potentially going to rely on your software to be able to communicate. That's somewhat focusing.

This was, uh, something of an on the job learning experience. I had to catch up with a lot of new technologies very quickly, but that wasn't the hard bit - what was difficult was realising I had to cater to people who were dealing with use cases that I had no experience of whatsoever. Dasher was extended to allow text entry into applications without needing to cut and paste. We added support for introspection of the current applications UI so menus could be exposed via the Dasher interface, allowing people to fly through menu hierarchies and pop open file dialogs. Text-to-speech was incorporated so people could rapidly enter sentences and have them spoke out loud.

But what sticks with me isn't the tech, or even the opportunities it gave me to meet other people working on the Linux desktop and forge friendships that still exist. It was the cases where I had the opportunity to work with people who could use Dasher as a tool to increase their ability to communicate with the outside world, whose lives were transformed for the better because of what we'd produced. Watching someone use your code and realising that you could write a three line patch that had a significant impact on the speed they could talk to other people is an incomparable experience. It's been decades and in many ways that was the most impact I've ever had as a developer.

I left after a year to work on fruitflies and get my PhD, and my career since then hasn't involved a lot of accessibility work. But it's stuck with me - every improvement in that space is something that has a direct impact on the quality of life of more people than you expect, but is also something that goes almost unrecognised. The people working on accessibility are heroes. They're making all the technology everyone else produces available to people who would otherwise be blocked from it. They deserve recognition, and they deserve a lot more support than they have.

But when we deal with technology, we deal with transitions. A lot of the Linux accessibility support depended on X11 behaviour that is now widely regarded as a set of misfeatures. It's not actually good to be able to inject arbitrary input into an arbitrary window, and it's not good to be able to arbitrarily scrape out its contents. X11 never had a model to permit this for accessibility tooling while blocking it for other code. Wayland does, but suffers from the surrounding infrastructure not being well developed yet. We're seeing that happen now, though - Gnome has been performing a great deal of work in this respect, and KDE is picking that up as well. There isn't a full correspondence between X11-based Linux accessibility support and Wayland, but for many users the Wayland accessibility infrastructure is already better than with X11.

That's going to continue improving, and it'll improve faster with broader support. We've somehow ended up with the bizarre politicisation of Wayland as being some sort of woke thing while X11 represents the Roman Empire or some such bullshit, but the reality is that there is no story for improving accessibility support under X11 and sticking to X11 is going to end up reducing the accessibility of a platform.

When you read anything about Linux accessibility, ask yourself whether you're reading something written by either a user of the accessibility features, or a developer of them. If they're neither, ask yourself why they actually care and what they're doing to make the future better.

comment count unavailable comments

19 June 2025

Debian Outreach Team: GSoC 2025 Introduction: Make Debian for Raspberry Pi Build Again

Hello everyone! I am Kurva Prashanth, Interested in the lower level working of system software, CPUs/SoCs and Hardware design. I was introduced to Open Hardware and Embedded Linux while studying electronics and embedded systems as part of robotics coursework. Initially, I did not pay much attention to it and quickly moved on. However, a short talk on Liberating SBCs using Debian by Yuvraj at MiniDebConf India, 2021 caught my interest. The talk focused on Open Hardware platforms such as Olimex and BeagleBone Black, as well as the Debian distributions tailored for these ARM-based single-board computers has intrigued me to delve deeper into the realm of Open Hardware and Embedded Linux. These days I m trying to improve my abilities to contribute to Debian and Linux Kernel development. Before finding out about the Google Summer of Code project, I had already started my journey with Debian. I extensively used Debian system build tools(debootstrap, sbuild, deb-build-pkg, qemu-debootstrap) for Building Debian Image for Bela Cape a real-time OS for music making to achieve extremely fast audio and sensor processing times. In 2023, I had the opportunity to attend DebConf23 in Kochi, India - thanks to Nilesh Patra (@nilesh) and I met Hector Oron (@zumbi) over dinner at DebConf23 and It was nice talking about his contributions/work at Debian on armhf port and Debian System Administration that conversation got me interested in knowing more about Debian ARM, Installer and I found it fascinating that EmDebian was once a external project bringing Debian to embedded systems and now, Debian itself can be run on many embedded systems. And, also during DebCamp I got Introduced to PGP/GPG keys and the web of trust by Carlos Henrique Lima Melara (@charles) I learned how to use and generate GPG keys. After DebConf23 I tried debian packaging and I miserably failed to get sponsorship for a python library I packaged. I came across the Debian project for this year s Google Summer of Code and found the project titled Make Debian for Raspberry Pi Build Again quite interesting to me and applied. Gladly, on May 8th, I received an acceptance e-mail from GSoC. I got excited that I ll spend the summer working on something that I like doing. I am thrilled to be part of this project and I am super excited for the summer of 25. I m looking forward to work on what I most like, new connections and learning opportunities. So, let me talk a bit more about my project. I will be working on to Make Debian for Raspberry Pi SBC s under the guidance of Gunnar Wolf (@gwolf). In this post, I will describe the project I will be working on.

Why make Debian for Raspberry Pi build again? There is an available set of images for running Debian in Raspberry Pi computers (all models below the 5 series)! However, the maintainer severely lacking time to take care for them; called for help for somebody to adopt them, but have not been successful. The image generation scripts might have bitrotted a bit, but it is mostly all done. And there is a lot of interest and use still in having the images freshly generated and decently tested! This GSoC project is about getting the [https://raspi.debian.net/ Raspberry Pi Debian images] site working reliably, daily-built images become automatic again and ideally making it easily deployable to be run in project machines and migrating exsisting hosting infrastructure to Debian.

How much it differ from Debian build process? While the goal is to stay as close as possible to the Debian build process, Raspberry Pi boards require some necessary platform-specific changes primarily in the early boot sequence and firmware handling. Unlike typical Debian systems, Raspberry Pi boards depend on a non-standard bootloader and use non-free firmware (raspi-firmware), Introducing some hardware-specific differences in the initialization process. These differences are largely confined to the early boot and hardware initialization stages. Once the system boots, the userspace remains closely aligned with a typical Debian install, using Debian packages. The current modifications are required due to non-free firmware. However, several areas merit review: but there are a few parts that might be worth changing.
  1. Boot flow: Transitioning to a U-Boot based boot process (as used in Debian installer images for many other SBCs) would reduce divergence and better align with Debian Installer.
  2. Current scripts/workarounds: Some existing hacks may now be redundant with recent upstream support and could be removed.
  3. Board-specific images: Shift to architecture-specific base images with runtime detection could simplify builds and reduce duplication.
Debian already already building SD card images for a wide range of SBCs (e.g., BeagleBone, BananaPi, OLinuXino, Cubieboard, etc.) installer-arm64/images/u-boot and installer-armhf/images/u-boot, a similar approach for Raspberry Pi could improve maintainability and consistency with Debian s broader SBC support.

Quoted from Mail Discussion Thread with Mentor (Gunnar Wolf)
"One direction we wanted to explore was whether we should still be building one image per family, or whether we could instead switch to one image per architecture (armel, armhf, arm64). There were some details to iron out as RPi3 and RPi4 were quite different, but I think it will be similar to the differences between the RPi 0 and 1, which are handled at first-boot time. To understand what differs between families, take a look at Cyril Brulebois generate-recipe (in the repo), which is a great improvement over the ugly mess I had before he contributed it"
In this project, I intend to to build one image per architecture (armel, armhf, arm64) rather than continuing with the current model of building one image per board. This change simplifies image management, reduces redundancy, and leverages dynamic configuration at boot time to support all supported boards within each architecture. By using U-Boot and flash-kernel, we can detect the board type and configure kernel parameters, DTBs, and firmware during the first boot, reducing duplication across images and simplifying the maintenance burden and we can also generalize image creation while still supporting board-specific behavior at runtime. This method aligns with existing practices in the DebianInstaller team and aligns with Debian s long-term maintainability goals and better leverages upstream capabilities, ensuring a consistent and scalable boot experience. To streamline and standardize the process of building bootable Debian images for Raspberry Pi devices, I proposed a new workflow that leverages U-Boot and flash-kernel Debian packages. This provides a clean, maintainable, and reproducible way to generate images for armel, armhf and arm64 boards. The workflow is vmdb2, a lightweight, declarative tool designed to automate the creation of disk images. A typical vmdb2 recipe defines the disk layout, base system installation (via debootstrap), architecture-specific packages, and any custom post-install hooks and the image should includes U-Boot (the u-boot-rpi package), flash-kernel, and a suitable Debian kernel package like linux-image-arm64 or linux-image-armmp. U-Boot serves as the platform s bootloader and is responsible for loading the kernel and initramfs. Unlike Raspberry Pi s non-free firmware/proprietary bootloader, U-Boot provides an open and scriptable interface, allowing us to follow a more standard Debian boot process. It can be configured to boot using either an extlinux.conf or a boot.scr script generated automatically by flash-kernel. The role of flash-kernel is to bridge Debian s kernel installation system with the specifics of embedded bootloaders like U-Boot. When installed, it automatically copies the kernel image, initrd, and device tree blobs (DTBs) to the /boot partition. It also generates the necessary boot.scr script if the board configuration demands it. To work correctly, flash-kernel requires that the target machine be identified via /etc/flash-kernel/machine, which must correspond to an entry in its internal machine database.\ Once the vmdb2 build is complete, the resulting image will contain a fully configured bootable system with all necessary boot components correctly installed. The image can be flashed to an SD card and used to boot on the intended device without additional manual configuration. Because all key packages (U-Boot, kernel, flash-kernel) are managed through Debian s package system, kernel updates and boot script regeneration are handled automatically during system upgrades.

Current Workflow: Builds one Image per family The current vmdb2 recipe uses the Raspberry Pi GPU bootloader provided via the raspi-firmware package. This is the traditional boot process followed by Raspberry Pi OS, and it s tightly coupled with firmware files like bootcode.bin, start.elf, and fixup.dat. These files are installed to /boot/firmware, which is mounted from a FAT32 partition labeled RASPIFIRM. The device tree files (*.dtb) are manually copied from /usr/lib/linux-image-*-arm64/broadcom/ into this partition. The kernel is installed via the linux-image-arm64 package, and the boot arguments are injected by modifying /boot/firmware/cmdline.txt using sed commands. Booting depends on the root partition being labeled RASPIROOT, referenced through that file. There is no bootloader like UEFI-based or U-Boot involved the Raspberry Pi firmware directly loads the kernel, which is standard for Raspberry Pi boards.
- apt: install
  packages:
    ...
    - raspi-firmware  
The boot partition contents and kernel boot setup are tightly controlled via scripting in the recipe. Limitations of Current Workflow: While this setup works, it is
  1. Proprietary and Raspberry Pi specific It relies on the closed-source GPU bootloader the raspi-firmware package, which is tightly coupled to specific Raspberry Pi models.
  2. Manual DTB handling Device tree files are manually copied and hardcoded, making upgrades or board-specific changes error-prone.
  3. Not easily extendable to future Raspberry Pi boards Any change in bootloader behavior (as seen in the Raspberry Pi 5, which introduces a more flexible firmware boot process) would require significant rework.
  4. No UEFI-based/U-Boot The current method bypasses the standard bootloader layers, making it inconsistent with other Debian ARM platforms and harder to maintain long-term.
As Raspberry Pi firmware and boot processes evolve, especially with the introduction of Pi 5 and potentially Pi 6, maintaining compatibility will require more flexibility - something best delivered by adopting U-Boot and flash-kernel.

New Workflow: Building Architecture-Specific Images with vmdb2, U-Boot, flash-kernel, and Debian Kernel This workflow outlines an improved approach to generating bootable Debian images architecture specific, using vmdb2, U-Boot, flash-kernel, and Debian kernels and also to move away from Raspberry Pi s proprietary bootloader to a fully open-source boot process which improves maintainability, consistency, and cross-board support.

New Method: Shift to U-Boot + flash-kernel U-Boot (via Debian su-boot-rpi package) and flash-kernel bring the image building process closer to how Debian officially boots ARM devices. flash-kernel integrates with the system s initramfs and kernel packages to install bootloaders, prepare boot.scr or extlinux.conf, and copy kernel/initrd/DTBs to /boot in a format that U-Boot expects. U-Boot will be used as a second-stage bootloader, loaded by the Raspberry Pi s built-in firmware. Once U-Boot is in place, it will read standard boot scripts ( boot.scr) generated by flash-kernel, providing a Debian-compatible and board-flexible solution. Extending YAML spec for vmdb2 build with U-Boot and flash-kernel To improve an existing vmdb2 YAML spec(https://salsa.debian.org/raspi-team/image-specs/raspi_master.yaml), to integrate U-Boot, flash-kernel, and the architecture-specific Debian kernel into the image build process. By incorporating u-boot-rpi and flash-kernel from Debian packages, alongside the standard initramfs-tools, we align the image closer to Debian best practices while supporting both armhf and arm64 architectures. Below are key additions and adjustments needed in a vmdb2 YAML spec to support the workflow: Install U-Boot, flash-kernel, initramfs-tools and the architecture-specific Debian kernel.
- apt: install
  packages:
    - u-boot-rpi
    - flash-kernel
    - initramfs-tools
    - linux-image-arm64 # or linux-image-armmp for armhf 
  tag: tag-root
Replace linux-image-arm64 with the correct kernel package for specific target architecture. These packages should be added under the tag-root section in YAML spec for vmdb2 build recipe. This ensures that the necessary bootloader, kernel, and initramfs tools are included and properly configured in the image. Configure Raspberry Pi firmware to Load U-Boot Install the U-Boot binary as kernel.img in /boot/firmware we can also download and build U-Boot from source, but Debian provides tested binaries.
- shell:  
    cp /usr/lib/u-boot/rpi_4/u-boot.bin $ ROOT? /boot/firmware/kernel.img
    echo "enable_uart=1" >> $ ROOT? /boot/firmware/config.txt
  root-fs: tag-root
This makes the RPi firmware load u-boot.bin instead of the Linux kernel directly. Set Up flash-kernel for Debian-style Boot flash-kernel integrates with initramfs-tools and writes boot config suitable for U-Boot. We need to make sure /etc/flash-kernel/db contains an entry for board (most Raspberry Pi boards already supported in Bookworm). Set up /etc/flash-kernel.conf with:
- create-file: /etc/flash-kernel.conf
  contents:  
    MACHINE="Raspberry Pi 4"
    BOOTPART="/dev/disk/by-label/RASPIFIRM"
    ROOTPART="/dev/disk/by-label/RASPIROOT"
  unless: rootfs_unpacked
This allows flash-kernel to write an extlinux.conf or boot.scr into /boot/firmware. Clean up Proprietary/Non-Free Firmware Bootflow Remove the direct kernel loading flow:
- shell:  
    rm -f $ ROOT? /boot/firmware/vmlinuz*
    rm -f $ ROOT? /boot/firmware/initrd.img*
    rm -f $ ROOT? /boot/firmware/cmdline.txt
  root-fs: tag-root
Let U-Boot and flash-kernel manage kernel/initrd and boot parameters instead. Boot Flow After This Change
[SoC ROM] -> [start.elf] -> [U-Boot] -> [boot.scr] -> [Linux Kernel]
  1. This still depends on the Raspberry Pi firmware to start, but it only loads U-Boot, not Linux kernel.
  2. U-Boot gives you more flexibility (e.g., networking, boot menus, signed boot).
  3. Using flash-kernel ensures kernel updates are handled the Debian Installer way.
  4. Test with a serial console (enable_uart=1) in case HDMI doesn t show early boot logs.
Advantage of New Workflow
  1. Replaces the proprietary Raspberry Pi bootloader with upstream U-Boot.
  2. Debian-native tooling Uses flash-kernel and initramfs-tools to manage boot configuration.
  3. Consistent across boards Works for both armhf and arm64, unifying the image build process.
  4. Easier to support new boards Like the Raspberry Pi 5 and future models.
This transition will standardize a bit image-building process, making it aligned with upstream Debian Installer workflows.

vmdb2 configuration for arm64 using u-boot and flash-kernel NOTE: This is a baseline example and may require tuning.
# Raspberry Pi arm64 image using U-Boot and flash-kernel
steps:
  # ... (existing mkimg, partitions, mount, debootstrap, etc.) ...
  # Install U-Boot, flash-kernel, initramfs-tools and architecture specific kernel
  - apt: install
    packages:
      - u-boot-rpi
      - flash-kernel
      - initramfs-tools
      - linux - image - arm64 # or linux - image - armmp for armhf
    tag: tag-root
  # Install U-Boot binary as kernel.img in firmware partition
  - shell:  
      cp /usr/lib/u-boot/rpi_arm64 /u-boot.bin $ ROOT? /boot/firmware/kernel.img
      echo "enable_uart=1" >> $ ROOT? /boot/firmware/config.txt
    root-fs: tag-root
  # Configure flash-kernel for Raspberry Pi
  - create-file: /etc/flash-kernel.conf
    contents:  
      MACHINE="Generic Raspberry Pi ARM64"
      BOOTPART="/dev/disk/by-label/RASPIFIRM"
      ROOTPART="/dev/disk/by-label/RASPIROOT"
    unless: rootfs_unpacked
  # Remove direct kernel boot files from Raspberry Pi firmware
  - shell:  
      rm -f $ ROOT? /boot/firmware/vmlinuz*
      rm -f $ ROOT? /boot/firmware/initrd.img*
      rm -f $ ROOT? /boot/firmware/cmdline.txt
    root-fs: tag-root
  # flash-kernel will manage boot scripts and extlinux.conf
  # Rest of image build continues...

Required Changes to Support Raspberry Pi Boards in Debian (flash-kernel + U-Boot)

Overview of Required Changes
Component Required Task
Debian U-Boot Package Add build target for rpi_arm64 in u-boot-rpi. Optionally deprecate legacy 32-bit targets.
Debian flash-kernel Package Add or verify entries in db/all.db for Pi 4, Pi 5, Zero 2W, CM4. Ensure boot script generation works via bootscr.uboot-generic.
Debian Kernel Ensure DTBs are installed at /usr/lib/linux-image-<version>/ and available for flash-kernel to reference.

flash-kernel

Already Supported Boards in flash-kernel Debian Package https://sources.debian.org/src/flash-kernel/3.109/db/all.db/#L1700
Model Arch DTB-Id
Raspberry Pi 1 A/B/B+, Rev2 armel bcm2835-*
Raspberry Pi CM1 armel bcm2835-rpi-cm1-io1.dtb
Raspberry Pi Zero/Zero W armel bcm2835-rpi-zero*.dtb
Raspberry Pi 2B armhf bcm2836-rpi-2-b.dtb
Raspberry Pi 3B/3B+ arm64 bcm2837-*
Raspberry Pi CM3 arm64 bcm2837-rpi-cm3-io3.dtb
Raspberry Pi 400 arm64 bcm2711-rpi-400.dtb

uboot

Already Supported Boards in Debian U-Boot Package https://salsa.debian.org/installer-team/flash-kernel/-/blob/master/db/all.db

arm64 Model Arch Upstream Defconfig Debian Target - - - Raspberry Pi 3B arm64 rpi_3_defconfig rpi_3 Raspberry Pi 4B arm64 rpi_4_defconfig rpi_4 Raspberry Pi 3B/3B+/CM3/CM3+/4B/CM4/400/5B/Zero 2W arm64 rpi_arm64_defconfig rpi_arm64
armhf Model Arch Upstream Defconfig Debian Target - - - Raspberry Pi 2 armhf rpi_2_defconfig rpi_2 Raspberry Pi 3B (32-bit) armhf rpi_3_32b_defconfig rpi_3_32b Raspberry Pi 4B (32-bit) armhf rpi_4_32b_defconfig rpi_4_32b
armel Model Arch Upstream Defconfig Debian Target - - - Raspberry Pi armel rpi_defconfig rpi Raspberry Pi 1/Zero armel rpi_0_w rpi_0_w These boards are already defined in debian/rules under the u-boot-rpi source package and generates usable U-Boot binaries for corresponding Raspberry Pi models.

To-Do: Add Missing Board Support to U-Boot and flash-kernel in Debian Several Raspberry Pi models are missing from the Debian U-Boot and flash-kernel packages, even though upstream support and DTBs exist in the Debian kernel but are missing entries in the flash-kernel database to enable support for bootloader installation and initrd handling.

Boards Not Yet Supported in flash-kernel Debian Package
Model Arch DTB-Id
Raspberry Pi 3A+ (32 & 64 bit) armhf, arm64 bcm2837-rpi-3-a-plus.dtb
Raspberry Pi 4B (32 & 64 bit) armhf, arm64 bcm2711-rpi-4-b.dtb
Raspberry Pi CM4 arm64 bcm2711-rpi-cm4-io.dtb
Raspberry Pi CM 4S arm64 -
Raspberry Zero 2 W arm64 bcm2710-rpi-zero-2-w.dtb
Raspberry Pi 5 arm64 bcm2712-rpi-5-b.dtb
Raspberry Pi CM5 arm64 -
Raspberry Pi 500 arm64 -

Boards Not Yet Supported in Debian U-Boot Package
Model Arch Upstream defconfig(s)
Raspberry Pi 3A+/3B+ arm64 -, rpi_3_b_plus_defconfig
Raspberry Pi CM 4S arm64 -
Raspberry Pi 5 arm64 -
Raspberry Pi CM5 arm64 -
Raspberry Pi 500 arm64 -

So, what next? During the Community Bonding Period, I got hands-on with workflow improvements, set up test environments, and began reviewing Raspberry Pi support in Debian s U-Boot and flash-kernel and these are the logs of the project, where I provide weekly reports on the work done. You can check here: Community Bonding Period logs. My next steps include submitting patches to the u-boot and flash-kernel packages to ensure all missing Raspberry Pi entries are built and shipped. And, also to confirm the kernel DTB installation paths and make sure the necessary files are included for all Raspberry Pi variants. Finally, plan to validate changes with test builds on Raspberry Pi hardware. In parallel, I m organizing my tasks and setting up my environment to contribute more effectively. It s been exciting to explore how things work under the hood and to prepare for a summer of learning and contributing to this great community.

18 June 2025

Sergio Durigan Junior: GCC, glibc, stack unwinding and relocations A war story

I ve been meaning to write a post about this bug for a while, so here it is (before I forget the details!). First, I d like to thank a few people: I ll probably forget some details because it s been more than a week (and life at $DAYJOB moves fast), but we ll see.

The background story Wolfi OS takes security seriously, and one of the things we have is a package which sets the hardening compiler flags for C/C++ according to the best practices recommended by OpenSSF. At the time of this writing, these flags are (in GCC s spec file parlance):
*self_spec:
+ % !O:% !O1:% !O2:% !O3:% !O0:% !Os:% !0fast:% !0g:% !0z:-O2  -fhardened -Wno-error=hardened -Wno-hardened % !fdelete-null-pointer-checks:-fno-delete-null-pointer-checks  -fno-strict-overflow -fno-strict-aliasing % !fomit-frame-pointer:-fno-omit-frame-pointer  -mno-omit-leaf-frame-pointer
*link:
+ --as-needed -O1 --sort-common -z noexecstack -z relro -z now
The important part for our bug is the usage of -z now and -fno-strict-aliasing. As I was saying, these flags are set for almost every build, but sometimes things don t work as they should and we need to disable them. Unfortunately, one of these problematic cases has been glibc. There was an attempt to enable hardening while building glibc, but that introduced a strange breakage to several of our packages and had to be reverted. Things stayed pretty much the same until a few weeks ago, when I started working on one of my roadmap items: figure out why hardening glibc wasn t working, and get it to work as much as possible.

Reproducing the bug I started off by trying to reproduce the problem. It s important to mention this because I often see young engineers forgetting to check if the problem is even valid anymore. I don t blame them; the anxiety to get the bug fixed can be really blinding. Fortunately, I already had one simple test to trigger the failure. All I had to do was install the py3-matplotlib package and then invoke:
$ python3 -c 'import matplotlib'
This would result in an abortion with a coredump. I followed the steps above, and readily saw the problem manifesting again. OK, first step is done; I wasn t getting out easily from this one.

Initial debug The next step is to actually try to debug the failure. In an ideal world you get lucky and are able to spot what s wrong after just a few minutes. Or even better: you also can devise a patch to fix the bug and contribute it to upstream. I installed GDB, and then ran the py3-matplotlib command inside it. When the abortion happened, I issued a backtrace command inside GDB to see where exactly things had gone wrong. I got a stack trace similar to the following:
#0  0x00007c43afe9972c in __pthread_kill_implementation () from /lib/libc.so.6
#1  0x00007c43afe3d8be in raise () from /lib/libc.so.6
#2  0x00007c43afe2531f in abort () from /lib/libc.so.6
#3  0x00007c43af84f79d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4  0x00007c43af86d4d8 in _Unwind_RaiseException () from /usr/lib/libgcc_s.so.1
#5  0x00007c43acac9014 in __cxxabiv1::__cxa_throw (obj=0x5b7d7f52fab0, tinfo=0x7c429b6fd218 <typeinfo for pybind11::attribute_error>, dest=0x7c429b5f7f70 <pybind11::reference_cast_error::~reference_cast_error() [clone .lto_priv.0]>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:93
#6  0x00007c429b5ec3a7 in ft2font__getattr__(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [clone .lto_priv.0] [clone .cold] () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#7  0x00007c429b62f086 in pybind11::cpp_function::initialize<pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, pybind11::name, pybind11::scope, pybind11::sibling>(pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&):: lambda(pybind11::detail::function_call&)#1 ::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0] ()
   from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#8  0x00007c429b603886 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
...
Huh. Initially this didn t provide me with much information. There was something strange seeing the abort function being called right after _Unwind_RaiseException, but at the time I didn t pay much attention to it. OK, time to expand our horizons a little. Remember when I said that several of our packages would crash with a hardened glibc? I decided to look for another problematic package so that I could make it crash and get its stack trace. My thinking here is that maybe if I can compare both traces, something will come up. I happened to find an old discussion where Dann Frazier mentioned that Emacs was also crashing for him. He and I share the Emacs passion, and I totally agreed with him when he said that Emacs crashing is priority -1! (I m paraphrasing). I installed Emacs, ran it, and voil : the crash happened again. OK, that was good. When I ran Emacs inside GDB and asked for a backtrace, here s what I got:
#0  0x00007eede329972c in __pthread_kill_implementation () from /lib/libc.so.6
#1  0x00007eede323d8be in raise () from /lib/libc.so.6
#2  0x00007eede322531f in abort () from /lib/libc.so.6
#3  0x00007eede262879d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4  0x00007eede2646e7c in _Unwind_Backtrace () from /usr/lib/libgcc_s.so.1
#5  0x00007eede3327b11 in backtrace () from /lib/libc.so.6
#6  0x000059535963a8a1 in emacs_backtrace ()
#7  0x000059535956499a in main ()
Ah, this backtrace is much simpler to follow. Nice. Hmmm. Now the crash is happening inside _Unwind_Backtrace. A pattern emerges! This must have something to do with stack unwinding (or so I thought keep reading to discover the whole truth). You see, the backtrace function (yes, it s a function) and C++ s exception handling mechanism use similar techniques to do their jobs, and it pretty much boils down to unwinding frames from the stack. I looked into Emacs source code, specifically the emacs_backtrace function, but could not find anything strange over there. This bug was probably not going to be an easy fix

The quest for a minimal reproducer Being able to easily reproduce the bug is awesome and really helps with debugging, but even better is being able to have a minimal reproducer for the problem. You see, py3-matplotlib is a huge package and pulls in a bunch of extra dependencies, so it s not easy to ask other people to just install this big package plus these other dependencies, and then run this command , especially if we have to file an upstream bug and talk to people who may not even run the distribution we re using. So I set up to try and come up with a smaller recipe to reproduce the issue, ideally something that s not tied to a specific package from the distribution. Having all the information gathered from the initial debug session, especially the Emacs backtrace, I thought that I could write a very simple program that just invoked the backtrace function from glibc in order to trigger the code path that leads to _Unwind_Backtrace. Here s what I wrote:
#include <execinfo.h>

int
main(int argc, char *argv[])
 
  void *a[4096];
  backtrace (a, 100);
  return 0;
 
After compiling it, I determined that yes, the problem did happen with this small program as well. There was only a small nuisance: the manifestation of the bug was not deterministic, so I had to execute the program a few times until it crashed. But that s much better than what I had before, and a small price to pay. Having a minimal reproducer pretty much allows us to switch our focus to what really matters. I wouldn t need to dive into Emacs or Python s source code anymore. At the time, I was sure this was a glibc bug. But then something else happened.

GCC 15 I had to stop my investigation efforts because something more important came up: it was time to upload GCC 15 to Wolfi. I spent a couple of weeks working on this (it involved rebuilding the whole archive, filing hundreds of FTBFS bugs, patching some programs, etc.), and by the end of it the transition went smooth. When the GCC 15 upload was finally done, I switched my focus back to the glibc hardening problem. The first thing I did was to yes, reproduce the bug again. It had been a few weeks since I had touched the package, after all. So I built a hardened glibc with the latest GCC and the bug did not happen anymore! Fortunately, the very first thing I thought was this must be GCC , so I rebuilt the hardened glibc with GCC 14, and the bug was there again. Huh, unexpected but very interesting.

Diving into glibc and libgcc At this point, I was ready to start some serious debugging. And then I got a message on Signal. It was one of those moments where two minds think alike: Gabriel decided to check how I was doing, and I was thinking about him because this involved glibc, and Gabriel contributed to the project for many years. I explained what I was doing, and he promptly offered to help. Yes, there are more people who love low level debugging! We spent several hours going through disassembles of certain functions (because we didn t have any debug information in the beginning), trying to make sense of what we were seeing. There was some heavy GDB involved; unfortunately I completely lost the session s history because it was done inside a container running inside an ephemeral VM. But we learned a lot. For example:
  • It was hard to actually understand the full stack trace leading to uw_init_context_1[cold]. _Unwind_Backtrace obviously didn t call it (it called uw_init_context_1, but what was that [cold] doing?). We had to investigate the disassemble of uw_init_context_1 in order to determined where uw_init_context_1[cold] was being called.
  • The [cold] suffix is a GCC function attribute that can be used to tell the compiler that the function is unlikely to be reached. When I read that, my mind immediately jumped to this must be an assertion , so I went to the source code and found the spot.
  • We were able to determine that the return code of uw_frame_state_for was 5, which means _URC_END_OF_STACK. That s why the assertion was triggering.
After finding these facts without debug information, I decided to bite the bullet and recompiled GCC 14 with -O0 -g3, so that we could debug what uw_frame_state_for was doing. After banging our heads a bit more, we found that fde is NULL at this excerpt:
// ...
  fde = _Unwind_Find_FDE (context->ra + _Unwind_IsSignalFrame (context) - 1,
                          &context->bases);
  if (fde == NULL)
     
#ifdef MD_FALLBACK_FRAME_STATE_FOR
      /* Couldn't find frame unwind info for this function.  Try a
         target-specific fallback mechanism.  This will necessarily
         not provide a personality routine or LSDA.  */
      return MD_FALLBACK_FRAME_STATE_FOR (context, fs);
#else
      return _URC_END_OF_STACK;
#endif
     
// ...
We re debugging on amd64, which means that MD_FALLBACK_FRAME_STATE_FOR is defined and therefore is called. But that s not really important for our case here, because we had established before that _Unwind_Find_FDE would never return NULL when using a non-hardened glibc (or a glibc compiled with GCC 15). So we decided to look into what _Unwind_Find_FDE did. The function is complex because it deals with .eh_frame , but we were able to pinpoint the exact location where find_fde_tail (one of the functions called by _Unwind_Find_FDE) is returning NULL:
if (pc < table[0].initial_loc + data_base)
  return NULL;
We looked at the addresses of pc and table[0].initial_loc + data_base, and found that the former fell within libgcc s text section, which the latter fell within /lib/ld-linux-x86-64.so.2 text. At this point, we were already too tired to continue. I decided to keep looking at the problem later and see if I could get any further.

Bisecting GCC The next day, I woke up determined to find what changed in GCC 15 that caused the bug to disappear. Unless you know GCC s internals like they are your own home (which I definitely don t), the best way to do that is to git bisect the commits between GCC 14 and 15. I spent a few days running the bisect. It took me more time than I d have liked to find the right range of commits to pass git bisect (because of how branches and tags are done in GCC s repository), and I also had to write some helper scripts that:
  • Modified the gcc.yaml package definition to make it build with the commit being bisected.
  • Built glibc using the GCC that was just built.
  • Ran tests inside a docker container (with the recently built glibc installed) to determine whether the bug was present.
At the end, I had a commit to point to:
commit 99b1daae18c095d6c94d32efb77442838e11cbfb
Author: Richard Biener <rguenther@suse.de>
Date:   Fri May 3 14:04:41 2024 +0200
    tree-optimization/114589 - remove profile based sink heuristics
Makes sense, right?! No? Well, it didn t for me either. Even after reading what was changed in the code and the upstream bug fixed by the commit, I was still clueless as to why this change fixed the problem (I say fixed because it may very well be an unintended consequence of the change, and some other problem might have been introduced).

Upstream takes over After obtaining the commit that possibly fixed the bug, while talking to Dann and explaining what I did, he suggested that I should file an upstream bug and check with them. Great idea, of course. I filed the following upstream bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120653 It s a bit long, very dense and complex, but ultimately upstream was able to find the real problem and have a patch accepted in just two days. Nothing like knowing the code base. The initial bug became: https://sourceware.org/bugzilla/show_bug.cgi?id=33088 In the end, the problem was indeed in how the linker defines __ehdr_start, which, according to the code (from elf/dl-support.c):
if (_dl_phdr == NULL)
   
    /* Starting from binutils-2.23, the linker will define the
       magic symbol __ehdr_start to point to our own ELF header
       if it is visible in a segment that also includes the phdrs.
       So we can set up _dl_phdr and _dl_phnum even without any
       information from auxv.  */


    extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
    assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    _dl_phdr = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    _dl_phnum = __ehdr_start.e_phnum;
   
But the following definition is the problematic one (from elf/rtld.c):
extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
This symbol (along with its counterpart, __ehdr_end) was being run-time relocated when it shouldn t be. The fix that was pushed added optimization barriers to prevent the compiler from doing the relocations. I don t claim to fully understand what was done here, and Jakub s analysis is a thing to behold, but in the end I was able to confirm that the patch fixed the bug. And in the end, it was indeed a glibc bug.

Conclusion This was an awesome bug to investigate. It s one of those that deserve a blog post, even though some of the final details of the fix flew over my head. I d like to start blogging more about these sort of bugs, because I ve encountered my fair share of them throughout my career. And it was great being able to do some debugging with another person, exchange ideas, learn things together, and ultimately share that deep satisfaction when we find why a crash is happening. I have at least one more bug in my TODO list to write about (another one with glibc, but this time I was able to get to the end of it and come up with a patch). Stay tunned. P.S.: After having published the post I realized that I forgot to explain why the -z now and -fno-strict-aliasing flags were important. -z now is the flag that I determined to be the root cause of the breakage. If I compiled glibc with every hardening flag except -z now, everything worked. So initially I thought that the problem had to do with how ld.so was resolving symbols at runtime. As it turns out, this ended up being more a symptom than the real cause of the bug. As for -fno-strict-aliasing, a Gentoo developer who commented on the GCC bug above mentioned that this OpenSSF bug had a good point against using this flag for hardening. I still have to do a deep dive on what was discussed in the issue, but this is certainly something to take into consideration. There s this very good write-up about strict aliasing in general if you re interested in understanding it better.

17 June 2025

Evgeni Golov: Arguing with an AI or how Evgeni tried to use CodeRabbit

Everybody is trying out AI assistants these days, so I figured I'd jump on that train and see how fast it derails. I went with CodeRabbit because I've seen it on YouTube ads work, I guess. I am trying to answer the following questions: To reduce the amount of output and not to confuse contributors, CodeRabbit was configured to only do reviews on demand. What follows is a rather unscientific evaluation of CodeRabbit based on PRs in two Foreman-related repositories, looking at the summaries CodeRabbit posted as well as the comments/suggestions it had about the code. Ansible 2.19 support PR: theforeman/foreman-ansible-modules#1848 summary posted The summary CodeRabbit posted is technically correct.
This update introduces several changes across CI configuration, Ansible roles, plugins, and test playbooks. It expands CI test coverage to a new Ansible version, adjusts YAML key types in test variables, refines conditional logic in Ansible tasks, adds new default variables, and improves clarity and consistency in playbook task definitions and debug output.
Yeah, it does all of that, all right. But it kinda misses the point that the addition here is "Ansible 2.19 support", which starts with adding it to the CI matrix and then adjusting the code to actually work with that version. Also, the changes are not for "clarity" or "consistency", they are fixing bugs in the code that the older Ansible versions accepted, but the new one is more strict about. Then it adds a table with the changed files and what changed in there. To me, as the author, it felt redundant, and IMHO doesn't add any clarity to understand the changes. (And yes, same "clarity" vs bugfix mistake here, but that makes sense as it apparently miss-identified the change reason) And then the sequence diagrams They probably help if you have a dedicated change to a library or a library consumer, but for this PR it's just noise, especially as it only covers two of the changes (addition of 2.19 to the test matrix and a change to the inventory plugin), completely ignoring other important parts. Overall verdict: noise, don't need this. comments posted CodeRabbit also posted 4 comments/suggestions to the changes. Guard against undefined result.task IMHO a valid suggestion, even if on the picky side as I am not sure how to make it undefined here. I ended up implementing it, even if with slightly different (and IMHO better readable) syntax. Inconsistent pipeline in when for composite CV versions That one was funny! The original complaint was that the when condition used slightly different data manipulation than the data that was passed when the condition was true. The code was supposed to do "clean up the data, but only if there are any items left after removing the first 5, as we always want to keep 5 items". And I do agree with the analysis that it's badly maintainable code. But the suggested fix was to re-use the data in the variable we later use for performing the cleanup. While this is (to my surprise!) valid Ansible syntax, it didn't make the code much more readable as you need to go and look at the variable definition. The better suggestion then came from Ewoud: to compare the length of the data with the number we want to keep. Humans, so smart! But Ansible is not Ewoud's native turf, so he asked whether there is a more elegant way to count how much data we have than to use list count in Jinja (the data comes from a Python generator, so needs to be converted to a list first). And the AI helpfully suggested to use count instead! However, count is just an alias for length in Jinja, so it behaves identically and needs a list. Luckily the AI quickly apologized for being wrong after being pointed at the Jinja source and didn't try to waste my time any further. Wouldn't I have known about the count alias, we'd have committed that suggestion and let CI fail before reverting again. Apply the same fix for non-composite CV versions The very same complaint was posted a few lines later, as the logic there is very similar just slightly different data to be filtered and cleaned up. Interestingly, here the suggestion also was to use the variable. But there is no variable with the data! The text actually says one need to "define" it, yet the "committable suggestion" doesn't contain that part. Interestingly, when asked where it sees the "inconsistency" in that hunk, it said the inconsistency is with the composite case above. That however is nonsense, as while we want to keep the same number of composite and non-composite CV versions, the data used in the task is different it even gets consumed by a totally different playbook so there can't be any real consistency between the branches. I ended up applying the same logic as suggested by Ewoud above. As that refactoring was possible in a consistent way. Ensure consistent naming for Oracle Linux subscription defaults One of the changes in Ansible 2.19 is that Ansible fails when there are undefined variables, even if they are only undefined for cases where they are unused. CodeRabbit complains that the names of the defaults I added are inconsistent. And that is technically correct. But those names are already used in other places in the code, so I'd have to refactor more to make it work properly. Once being pointed at the fact that the variables already exist, the AI is as usual quick to apologize, yay. add new parameters to the repository module PR: theforeman/foreman-ansible-modules#1860 summary posted Again, the summary is technically correct
The repository module was updated to support additional parameters for repository synchronization and authentication. New options were added for ansible collections, ostree, Python packages, and yum repositories, including authentication tokens, filtering controls, and version retention settings. All changes were limited to module documentation and argument specification.
But it doesn't add anything you'd not get from looking at the diff, especially as it contains a large documentation chunk explaining those parameters. No sequence diagram this time. That's a good thing! Overall verdict: noise (even if the amount is small), don't need this. comments posted CodeRabbit generated two comments for this PR. Interestingly, none of them overlapped with the issues ansible-lint and friends found. get rid of the FIXMEs Yepp, that's fair add validation for the new parameters Yepp, I forgot these (not intentionally!). The diff it suggests is nonsense, as it doesn't take into account the existing Ansible and Yum validations, but it clearly has read them as the style etc of the new ones matches. It also managed to group the parameters correctly by repository type, so it's something.
 if module.foreman_params['content_type'] != 'ansible_collection':
     invalid_list = [key for key in ['ansible_collection_requirements'] if key in module.foreman_params]
     if invalid_list:
         module.fail_json(msg="( 0 ) can only be used with content_type 'ansible_collection'".format(",".join(invalid_list)))
+
+# Validate ansible_collection specific parameters
+if module.foreman_params['content_type'] != 'ansible_collection':
+    invalid_list = [key for key in ['ansible_collection_auth_token', 'ansible_collection_auth_url'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="( 0 ) can only be used with content_type 'ansible_collection'".format(",".join(invalid_list)))
+
+# Validate ostree specific parameters
+if module.foreman_params['content_type'] != 'ostree':
+    invalid_list = [key for key in ['depth', 'exclude_refs', 'include_refs'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="( 0 ) can only be used with content_type 'ostree'".format(",".join(invalid_list)))
+
+# Validate python package specific parameters
+if module.foreman_params['content_type'] != 'python':
+    invalid_list = [key for key in ['excludes', 'includes', 'package_types', 'keep_latest_packages'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="( 0 ) can only be used with content_type 'python'".format(",".join(invalid_list)))
+
+# Validate yum specific parameter
+if module.foreman_params['content_type'] != 'yum' and 'upstream_authentication_token' in module.foreman_params:
+    module.fail_json(msg="upstream_authentication_token can only be used with content_type 'yum'")
Interestingly, it also said "Note: If 'python' is not a valid content_type, please adjust the validation accordingly." which is quite a hint at a bug in itself. The module currently does not even allow to create content_type=python repositories. That should have been more prominent, as it's a BUG! parameter persistence in obsah PR: theforeman/obsah#72 summary posted Mostly correct. It did miss-interpret the change to a test playbook as an actual "behavior" change: "Introduced new playbook variables for database configuration" there is no database configuration in this repository, just the test playbook using the same metadata as a consumer of the library. Later on it does say "Playbook metadata and test fixtures", so unclear whether this is a miss-interpretation or just badly summarized. As long as you also look at the diff, it won't confuse you, but if you're using the summary as the sole source of information (bad!) it would. This time the sequence diagram is actually useful, yay. Again, not 100% accurate: it's missing the fact that saving the parameters is hidden behind an "if enabled" flag something it did represent correctly for loading them. Overall verdict: not really useful, don't need this. comments posted Here I was a bit surprised, especially as the nitpicks were useful! Persist-path should respect per-user state locations (nitpick) My original code used os.environ.get('OBSAH_PERSIST_PATH', '/var/lib/obsah/parameters.yaml') for the location of the persistence file. CodeRabbit correctly pointed out that this won't work for non-root users and one should respect XDG_STATE_HOME. Ewoud did point that out in his own review, so I am not sure whether CodeRabbit came up with this on its own, or also took the human comments into account. The suggested code seems fine too just doesn't use /var/lib/obsah at all anymore. This might be a good idea for the generic library we're working on here, and then be overridden to a static /var/lib path in a consumer (which always runs as root). In the end I did not implement it, but mostly because I was lazy and was sure we'd override it anyway. Positional parameters are silently excluded from persistence (nitpick) The library allows you to generate both positional (foo without --) and non-positional (--foo) parameters, but the code I wrote would only ever persist non-positional parameters. This was intentional, but there is no documentation of the intent in a comment which the rabbit thought would be worth pointing out. It's a fair nitpick and I ended up adding a comment. Enforce FQDN validation for database_host The library has a way to perform type checking on passed parameters, and one of the supported types is "FQDN" so a fully qualified domain name, with dots and stuff. The test playbook I added has a database_host variable, but I didn't bother adding a type to it, as I don't really need any type checking here. While using "FQDN" might be a bit too strict here technically a working database connection can also use a non-qualified name or an IP address, I was positively surprised by this suggestion. It shows that the rest of the repository was taken into context when preparing the suggestion. reset_args() can raise AttributeError when a key is absent This is a correct finding, the code is not written in a way that would survive if it tries to reset things that are not set. However, that's only true for the case where users pass in --reset-<parameter> without ever having set parameter before. The complaint about the part where the parameter is part of the persisted set but not in the parsed args is wrong as parsed args inherit from the persisted set. The suggested code is not well readable, so I ended up fixing it slightly differently. Persisted values bypass argparse type validation When persisting, I just yaml.safe_dump the parsed parameters, which means the YAML will contain native types like integers. The argparse documentation warns that the type checking argparse does only applies to strings and is skipped if you pass anything else (via default values). While correct, it doesn't really hurt here as the persisting only happens after the values were type-checked. So there is not really a reason to type-check them again. Well, unless the type changes, anyway. Not sure what I'll do with this comment. consider using contextlib.suppress This was added when I asked CodeRabbit for a re-review after pushing some changes. Interestingly, the PR already contained try: except: pass code before, and it did not flag that. Also, the code suggestion contained import contextlib in the middle of the code, instead in the head of the file. Who would do that?! But the comment as such was valid, so I fixed it in all places it is applicable, not only the one the rabbit found. workaround to ensure LCE and CV are always sent together PR: theforeman/foreman-ansible-modules#1867 summary posted
A workaround was added to the _update_entity method in the ForemanAnsibleModule class to ensure that when updating a host, both content_view_id and lifecycle_environment_id are always included together in the update payload. This prevents partial updates that could cause inconsistencies.
Partial updates are not a thing. The workaround is purely for the fact that Katello expects both parameters to be sent, even if only one of them needs an actual update. No diagram, good. Overall verdict: misleading summaries are bad! comments posted Given a small patch, there was only one comment. Implementation looks correct, but consider adding error handling for robustness. This reads correct on the first glance. More error handling is always better, right? But if you dig into the argumentation, you see it's wrong. Either: The AI accepted defeat once I asked it to analyze things in more detail, but why did I have to ask in the first place?! Summary Well, idk, really. Did the AI find things that humans did not find (or didn't bother to mention)? Yes. It's debatable whether these were useful (see e.g. the database_host example), but I tend to be in the "better to nitpick/suggest more and dismiss than oversee" team, so IMHO a positive win. Did the AI output help the humans with the review (useful summary etc)? In my opinion it did not. The summaries were either "lots of words, no real value" or plain wrong. The sequence diagrams were not useful either. Luckily all of that can be turned off in the settings, which is what I'd do if I'd continue using it. Did the AI output help the humans with the code (useful suggestions etc)? While the actual patches it posted were "meh" at best, there were useful findings that resulted in improvements to the code. Was the AI output misleading? Absolutely! The whole Jinja discussion would have been easier without the AI "help". Same applies for the "error handling" in the workaround PR. Was the AI output distracting? The output is certainly a lot, so yes I think it can be distracting. As mentioned, I think dropping the summaries can make the experience less distracting. What does all that mean? I will disable the summaries for the repositories, but will leave the @coderabbitai review trigger active if someone wants an AI-assisted review. This won't be something that I'll force on our contributors and maintainers, but they surely can use it if they want. But I don't think I'll be using this myself on a regular basis. Yes, it can be made "usable". But so can be vim ;-) Also, I'd prefer to have a junior human asking all the questions and making bad suggestions, so they can learn from it, and not some planet burning machine.

Next.

Previous.