Search Results: "hec"

15 October 2024

Lukas M rdian: Waiting for a Linux system to be online

Designed by Freepik

What is an online system? Networking is a complex topic, and there is lots of confusion around the definition of an online system. Sometimes the boot process gets delayed up to two minutes, because the system still waits for one or more network interfaces to be ready. Systemd provides the network-online.target that other service units can rely on, if they are deemed to require network connectivity. But what does online actually mean in this context, is a link-local IP address enough, do we need a routable gateway and how about DNS name resolution? The requirements for an online network interface depend very much on the services using an interface. For some services it might be good enough to reach their local network segment (e.g. to announce Zeroconf services), while others need to reach domain names (e.g. to mount a NFS share) or reach the global internet to run a web server. On the other hand, the implementation of network-online.target varies, depending on which networking daemon is in use, e.g. systemd-networkd-wait-online.service or NetworkManager-wait-online.service. For Ubuntu, we created a specification that describes what we as a distro expect an online system to be. Having a definition in place, we are able to tackle the network-online-ordering issues that got reported over the years and can work out solutions to avoid delayed boot times on Ubuntu systems. In essence, we want systems to reach the following networking state to be considered online:
  1. Do not wait for optional interfaces to receive network configuration
  2. Have IPv6 and/or IPv4 link-local addresses on every network interface
  3. Have at least one interface with a globally routable connection
  4. Have functional domain name resolution on any routable interface

A common implementation NetworkManager and systemd-networkd are two very common networking daemons used on modern Linux systems. But they originate from different contexts and therefore show different behaviours in certain scenarios, such as wait-online. Luckily, on Ubuntu we already have Netplan as a unification layer on top of those networking daemons, that allows for common network configuration, and can also be used to tweak the wait-online logic. With the recent release of Netplan v1.1 we introduced initial functionality to tweak the behaviour of the systemd-networkd-wait-online.service, as used on Ubuntu Server systems. When Netplan is used to drive the systemd-networkd backend, it will emit an override configuration file in /run/systemd/system/systemd-networkd-wait-online.service.d/10-netplan.conf, listing the specific non-optional interfaces that should receive link-local IP configuration. In parallel to that, it defines a list of network interfaces that Netplan detected to be potential global connections, and waits for any of those interfaces to reach a globally routable state. Such override config file might look like this:
[Unit]
ConditionPathIsSymbolicLink=/run/systemd/generator/network-online.target.wants/systemd-networkd-wait-online.service

[Service]
ExecStart=
ExecStart=/lib/systemd/systemd-networkd-wait-online -i eth99.43:carrier -i lo:carrier -i eth99.42:carrier -i eth99.44:degraded -i bond0:degraded
ExecStart=/lib/systemd/systemd-networkd-wait-online --any -o routable -i eth99.43 -i eth99.45 -i bond0
In addition to the new features implemented in Netplan, we reached out to upstream systemd, proposing an enhancement to the systemd-networkd-wait-online service, integrating it with systemd-resolved to check for the availability of DNS name resolution. Once this is implemented upstream, we re able to fully control the systemd-networkd backend on Ubuntu Server systems, to behave consistently and according to the definition of an online system that was lined out above.

Future work The story doesn t end there, because Ubuntu Desktop systems are using NetworkManager as their networking backend. This daemon provides its very own nm-online utility, utilized by the NetworkManager-wait-online systemd service. It implements a much higher-level approach, looking at the networking daemon in general instead of the individual network interfaces. By default, it considers a system to be online once every autoconnect profile got activated (or failed to activate), meaning that either a IPv4 or IPv6 address got assigned. There are considerable enhancements to be implemented to this tool, for it to be controllable in a fine-granular way similar to systemd-networkd-wait-online, so that it can be instructed to wait for specific networking states on selected interfaces.

A note of caution Making a service depend on network-online.target is considered an antipattern in most cases. This is because networking on Linux systems is very dynamic and the systemd target can only ever reflect the networking state at a single point in time. It cannot guarantee this state to be remained over the uptime of your system and has the potentially to delay the boot process considerably. Cables can be unplugged, wireless connectivity can drop, or remote routers can go down at any time, affecting the connectivity state of your local system. Therefore, instead of wondering what to do about network.target, please just fix your program to be friendly to dynamically changing network configuration. [source].

Iustin Pop: Optical media lifetime - one data point

Way back (more than 10 years ago) when I was doing DVD-based backups, I knew that normal DVDs/Blu-Rays are no long-term archival solutions, and that if I was real about doing optical media backups, I need to switch to M-Disc. I actually bought a (small stack) of M-Disc Blu-Rays, but never used them. I then switched to other backups solutions, and forgot about the whole topic. Until, this week, while sorting stuff, I happened upon a set of DVD backups from a range of years, and was very curious whether they are still readable after many years. And, to my surprise, there were no surprises! Went backward in time, and: I also found stack of dual-layer DVD+R from 2012-2014, some for sure Verbatim, and some unmarked (they were intended to be printed on), but likely Verbatim as well. All worked just fine. Just that, even at ~8GiB per disk, backing up raw photo files took way too many disks, even in 2014 . At this point I was happy that all 12+ DVDs I found, ranging from 10 to 14 years, are all good. Then I found a batch of 3 CDs! Here the results were mixed: I think the takeaway is that for all explicitly selected media - TDK, JVC and Verbatim - they hold for 10-20 years. Valid reads from summer 2003 is mind boggling for me, for (IIRC) organic media - not sure about the TDK metallic substrate. And when you just pick whatever ( Creation ), well, the results are mixed. Note that in all this, it was about CDs and DVDs. I have no idea how Blu-Rays behave, since I don t think I ever wrote a Blu-Ray. In any case, surprising to me, and makes me rethink a bit my backup options. Sizes from 25 to 100GB Blu-Rays are reasonable for most critical data. And they re WORM, as opposed to most LTO media, which is re-writable (and to some small extent, prone to accidental wiping). Now, I should check those M-Disks to see if they can still be written to, after 10 years

10 October 2024

Freexian Collaborators: Debian Contributions: Packaging Pydantic v2, Reworking of glib2.0 for cross bootstrap, Python archive rebuilds and more! (by Anupa Ann Joseph)

Debian Contributions: 2024-09 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Pydantic v2, by Colin Watson Pydantic is a useful library for validating data in Python using type hints: Freexian uses it in a number of projects, including Debusine. Its Debian packaging had been stalled at 1.10.17 in testing for some time, partly due to needing to make sure everything else could cope with the breaking changes introduced in 2.x, but mostly due to needing to sort out packaging of its new Rust dependencies. Several other people (notably Alexandre Detiste, Andreas Tille, Drew Parsons, and Timo R hling) had made some good progress on this, but nobody had quite got it over the line and it seemed a bit stuck. Colin upgraded a few Rust libraries to new upstream versions, packaged rust-jiter, and chased various failures in other packages. This eventually allowed getting current versions of both pydantic-core and pydantic into testing. It should now be much easier for us to stay up to date routinely.

Reworking of glib2.0 for cross bootstrap, by Helmut Grohne Simon McVittie (not affiliated with Freexian) earlier restructured the libglib2.0-dev such that it would absorb more functionality and in particular provide tools for working with .gir files. Those tools practically require being run for their host architecture (practically this means running under qemu-user) which is at odds with the requirements of architecture cross bootstrap. The qemu requirement was expressed in package dependencies and also made people unhappy attempting to use libglib2.0-dev for i386 on amd64 without resorting to qemu. The use of qemu in architecture bootstrap is particularly problematic as it tends to not be ready at the time bootstrapping is needed. As a result, Simon proposed and implemented the introduction of a libgio-2.0-dev package providing a subset of libglib2.0-dev that does not require qemu. Packages should continue to use libglib2.0-dev in their Build-Depends unless involved in architecture bootstrap. Helmut reviewed and tested the implementation and integrated the necessary changes into rebootstrap. He also prepared a patch for libverto to use the new package and proposed adding forward compatibility to glib2.0. Helmut continued working on adding cross-exe-wrapper to architecture-properties and implemented autopkgtests later improved by Simon. The cross-exe-wrapper package now provides a generic mechanism to a program on a different architecture by using qemu when needed only. For instance, a dependency on cross-exe-wrapper:i386 provides a i686-linux-gnu-cross-exe-wrapper program that can be used to wrap an ELF executable for the i386 architecture. When installed on amd64 or i386 it will skip installing or running qemu, but for other architectures qemu will be used automatically. This facility can be used to support cross building with targeted use of qemu in cases where running host code is unavoidable as is the case for GObject introspection. This concludes the joint work with Simon and Niels Thykier on glib2.0 and architecture-properties resolving known architecture bootstrap regressions arising from the glib2.0 refactoring earlier this year.

Analyzing binary package metadata, by Helmut Grohne As Guillem Jover (not affiliated with Freexian) continues to work on adding metadata tracking to dpkg, the question arises how this affects existing packages. The dedup.debian.net infrastructure provides an easy playground to answer such questions, so Helmut gathered file metadata from all binary packages in unstable and performed an explorative analysis. Some results include: Guillem also performed a cursory analysis and reported other problem categories such as mismatching directory permissions for directories installed by multiple packages and thus gained a better understanding of what consistency checks dpkg can enforce.

Python archive rebuilds, by Stefano Rivera Last month Stefano started to write some tooling to do large-scale rebuilds in debusine, starting with finding packages that had already started to fail to build from source (FTBFS) due to the removal of setup.py test. This month, Stefano did some more rebuilds, starting with experimental versions of dh-python. During the Python 3.12 transition, we had added a dependency on python3-setuptools to dh-python, to ease the transition. Python 3.12 removed distutils from the stdlib, but many packages were expecting it to still be available. Setuptools contains a version of distutils, and dh-python was a convenient place to depend on setuptools for most package builds. This dependency was never meant to be permanent. A rebuild without it resulted in mass-filing about 340 bugs (and around 80 more by mistake). A new feature in Python 3.12, was to have unittest s test runner exit with a non-zero return code, if no tests were run. We added this feature, to be able to detect tests that are not being discovered, by mistake. We are ignoring this failure, as we wouldn t want to suddenly cause hundreds of packages to fail to build, if they have no tests. Stefano did a rebuild to see how many packages were affected, and found that around 1000 were. The Debian Python community has not come to a conclusion on how to move forward with this. As soon as Python 3.13 release candidate 2 was available, Stefano did a rebuild of the Python packages in the archive against it. This was a more complex rebuild than the others, as it had to be done in stages. Many packages need other Python packages at build time, typically to run tests. So transitions like this involve some manual bootstrapping, followed by several rounds of builds. Not all packages could be tested, as not all their dependencies support 3.13 yet. The result was around 100 bugs in packages that need work to support Python 3.13. Many other packages will need additional work to properly support Python 3.13, but being able to build (and run tests) is an important first step.

Miscellaneous contributions
  • Carles prepared the update of python-pyaarlo package to a new upstream release.
  • Carles worked on updating python-ring-doorbell to a new upstream release. Unfinished, pending to package a new dependency python3-firebase-messaging RFP #1082958 and its dependency python3-http-ece RFP #1083020.
  • Carles improved po-debconf-manager. Main new feature is that it can open Salsa merge requests. Aiming for a lightning talk in MiniDebConf Toulouse (November) to be functional end to end and get feedback from the wider public for this proof of concept.
  • Carles helped one translator to use po-debconf-manager (added compatibility for bullseye, fixed other issues) and reviewed 17 package templates.
  • Colin upgraded the OpenSSH packaging to 9.9p1.
  • Colin upgraded the various YubiHSM packages to new upstream versions, enabled more tests, fixed yubihsm-shell build failures on some 32-bit architectures, made yubihsm-shell build reproducibly, and fixed yubihsm-connector to apply udev rules to existing devices when the package is installed. As usual, bookworm-backports is up to date with all these changes.
  • Colin fixed quite a bit of fallout from setuptools 72.0.0 removing setup.py test, backported a large upstream patch set to make buildbot work with SQLAlchemy 2.0, and upgraded 25 other Python packages to new upstream versions.
  • Enrico worked with Jakob Haufe to get him up to speed for managing sso.debian.org
  • Rapha l did remove spam entries in the list of teams on tracker.debian.org (see #1080446), and he applied a few external contributions, fixing a rendering issue and replacing the DDPO link with a more useful alternative. He also gave feedback on a couple of merge requests that required more work. As part of the analysis of the underlying problem, he suggested to the ftpmasters (via #1083068) to auto-reject packages having the too-many-contacts lintian error, and he raised the severity of #1076048 to serious to actually have that 4 year old bug fixed.
  • Rapha l uploaded zim and hamster-time-tracker to fix issues with Python 3.12 getting rid of setuptools. He also uploaded a new gnome-shell-extension-hamster to cope with the upcoming transition to GNOME 47.
  • Helmut sent seven patches and sponsored one upload for cross build failures.
  • Helmut uploaded a Nagios/Icinga plugin check-smart-attributes for monitoring the health of physical disks.
  • Helmut collaborated on sbuild reviewing and improving a MR for refactoring the unshare backend.
  • Helmut sent a patch fixing coinstallability of gcc-defaults.
  • Helmut continued to monitor the evolution of the /usr-move. With more and more key packages such as libvirt or fuse3 fixed. We re moving into the boring long-tail of the transition.
  • Helmut proposed updating the meson buildsystem in debhelper to use env2mfile.
  • Helmut continued to update patches maintained in rebootstrap. Due to the work on glib2.0 above, rebootstrap moves a lot further, but still fails for any architecture.
  • Santiago reviewed some Merge Request in Salsa CI, such as: !478, proposed by Otto to extend the information about how to use additional runners in the pipeline and !518, proposed by Ahmed to add support for Ubuntu images, that will help to test how some debian packages, including the complex MariaDB are built on Ubuntu. Santiago also prepared !545, which will make the reprotest job more consistent with the result seen on reproducible-builds.
  • Santiago worked on different tasks related to DebConf 25. Especially he drafted the fundraising brochure (which is almost ready).
  • Thorsten Alteholz uploaded package libcupsfilter to fix the autopkgtest and a dependency problem of this package. After package splix was abandoned by upstream and OpenPrinting.org adopted its maintenance, Thorsten uploaded their first release.
  • Anupa published posts on the Debian Administrators group in LinkedIn and moderated the group, one of the tasks of the Debian Publicity Team.
  • Anupa helped organize DebUtsav 2024. It had over 100 attendees with hand-on sessions on making initial contributions to Linux Kernel, Debian packaging, submitting documentation to Debian wiki and assisting Debian Installations.

9 October 2024

Ben Hutchings: FOSS activity in September 2024

8 October 2024

Debian Brasil: Testing feed in English

Testing the feed in English and check If it's going to Debian Planet. Sorry the noise :-)

7 October 2024

Reproducible Builds: Reproducible Builds in September 2024

Welcome to the September 2024 report from the Reproducible Builds project! Our reports attempt to outline what we ve been up to over the past month, highlighting news items from elsewhere in tech where they are related. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. New binsider tool to analyse ELF binaries
  2. Unreproducibility of GHC Haskell compiler 95% fixed
  3. Mailing list summary
  4. Towards a 100% bit-for-bit reproducible OS
  5. Two new reproducibility-related academic papers
  6. Distribution work
  7. diffoscope
  8. Other software development
  9. Android toolchain core count issue reported
  10. New Gradle plugin for reproducibility
  11. Website updates
  12. Upstream patches
  13. Reproducibility testing framework

New binsider tool to analyse ELF binaries Reproducible Builds developer Orhun Parmaks z has announced a fantastic new tool to analyse the contents of ELF binaries. According to the project s README page:
Binsider can perform static and dynamic analysis, inspect strings, examine linked libraries, and perform hexdumps, all within a user-friendly terminal user interface!
More information about Binsider s features and how it works can be found within Binsider s documentation pages.

Unreproducibility of GHC Haskell compiler 95% fixed A seven-year-old bug about the nondeterminism of object code generated by the Glasgow Haskell Compiler (GHC) received a recent update, consisting of Rodrigo Mesquita noting that the issue is:
95% fixed by [merge request] !12680 when -fobject-determinism is enabled. [ ]
The linked merge request has since been merged, and Rodrigo goes on to say that:
After that patch is merged, there are some rarer bugs in both interface file determinism (eg. #25170) and in object determinism (eg. #25269) that need to be taken care of, but the great majority of the work needed to get there should have been merged already. When merged, I think we should close this one in favour of the more specific determinism issues like the two linked above.

Mailing list summary On our mailing list this month:
  • Fay Stegerman let everyone know that she started a thread on the Fediverse about the problems caused by unreproducible zlib/deflate compression in .zip and .apk files and later followed up with the results of her subsequent investigation.
  • Long-time developer kpcyrd wrote that there has been a recent public discussion on the Arch Linux GitLab [instance] about the challenges and possible opportunities for making the Linux kernel package reproducible , all relating to the CONFIG_MODULE_SIG flag. [ ]
  • Bernhard M. Wiedemann followed-up to an in-person conversation at our recent Hamburg 2024 summit on the potential presence for Reproducible Builds in recognised standards. [ ]
  • Fay Stegerman also wrote about her worry about the possible repercussions for RB tooling of Debian migrating from zlib to zlib-ng as reproducibility requires identical compressed data streams. [ ]
  • Martin Monperrus wrote the list announcing the latest release of maven-lockfile that is designed aid building Maven projects with integrity . [ ]
  • Lastly, Bernhard M. Wiedemann wrote about potential role of reproducible builds in combatting silent data corruption, as detailed in a recent Tweet and scholarly paper on faulty CPU cores. [ ]

Towards a 100% bit-for-bit reproducible OS Bernhard M. Wiedemann began writing on journey towards a 100% bit-for-bit reproducible operating system on the openSUSE wiki:
This is a report of Part 1 of my journey: building 100% bit-reproducible packages for every package that makes up [openSUSE s] minimalVM image. This target was chosen as the smallest useful result/artifact. The larger package-sets get, the more disk-space and build-power is required to build/verify all of them.
This work was sponsored by NLnet s NGI Zero fund.

Distribution work In Debian this month, 14 reviews of Debian packages were added, 12 were updated and 20 were removed, all adding to our knowledge about identified issues. A number of issue types were updated as well. [ ][ ] In addition, Holger opened 4 bugs against the debrebuild component of the devscripts suite of tools. In particular:
  • #1081047: Fails to download .dsc file.
  • #1081048: Does not work with a proxy.
  • #1081050: Fails to create a debrebuild.tar.
  • #1081839: Fails with E: mmdebstrap failed to run error.
Last month, an issue was filed to update the Salsa CI pipeline (used by 1,000s of Debian packages) to no longer test for reproducibility with reprotest s build_path variation. Holger Levsen provided a rationale for this change in the issue, which has already been made to the tests being performed by tests.reproducible-builds.org. This month, this issue was closed by Santiago R. R., nicely explaining that build path variation is no longer the default, and, if desired, how developers may enable it again. In openSUSE news, Bernhard M. Wiedemann published another report for that distribution.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading version 278 to Debian:
  • New features:
    • Add a helpful contextual message to the output if comparing Debian .orig tarballs within .dsc files without the ability to fuzzy-match away the leading directory. [ ]
  • Bug fixes:
    • Drop removal of calculated os.path.basename from GNU readelf output. [ ]
    • Correctly invert X% similar value and do not emit 100% similar . [ ]
  • Misc:
    • Temporarily remove procyon-decompiler from Build-Depends as it was removed from testing (via #1057532). (#1082636)
    • Update copyright years. [ ]
For trydiffoscope, the command-line client for the web-based version of diffoscope, Chris Lamb also:
  • Added an explicit python3-setuptools dependency. (#1080825)
  • Bumped the Standards-Version to 4.7.0. [ ]

Other software development disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into system calls to reliably flush out reproducibility issues. This month, version 0.5.11-4 was uploaded to Debian unstable by Holger Levsen making the following changes:
  • Replace build-dependency on the obsolete pkg-config package with one on pkgconf, following a Lintian check. [ ]
  • Bump Standards-Version field to 4.7.0, with no related changes needed. [ ]

In addition, reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, version 0.7.28 was uploaded to Debian unstable by Holger Levsen including a change by Jelle van der Waa to move away from the pipes Python module to shlex, as the former will be removed in Python version 3.13 [ ].

Android toolchain core count issue reported Fay Stegerman reported an issue with the Android toolchain where a part of the build system generates a different classes.dex file (and thus a different .apk) depending on the number of cores available during the build, thereby breaking Reproducible Builds:
We ve rebuilt [tag v3.6.1] multiple times (each time in a fresh container): with 2, 4, 6, 8, and 16 cores available, respectively:
  • With 2 and 4 cores we always get an unsigned APK with SHA-256 14763d682c9286ef .
  • With 6, 8, and 16 cores we get an unsigned APK with SHA-256 35324ba4c492760 instead.

New Gradle plugin for reproducibility A new plugin for the Gradle build tool for Java has been released. This easily-enabled plugin results in:
reproducibility settings [being] applied to some of Gradle s built-in tasks that should really be the default. Compatible with Java 8 and Gradle 8.3 or later.

Website updates There were a rather substantial number of improvements made to our website this month, including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In September, a number of changes were made by Holger Levsen, including:
  • Debian-related changes:
    • Upgrade the osuosl4 node to Debian trixie in anticipation of running debrebuild and rebuilderd there. [ ][ ][ ]
    • Temporarily mark the osuosl4 node as offline due to ongoing xfs_repair filesystem maintenance. [ ][ ]
    • Do not warn about (very old) broken nodes. [ ]
    • Add the risc64 architecture to the multiarch version skew tests for Debian trixie and sid. [ ][ ][ ]
    • Mark the virt 32,64 b nodes as down. [ ]
  • Misc changes:
    • Add support for powercycling OpenStack instances. [ ]
    • Update the fail2ban to ban hosts for 4 weeks in total [ ][ ] and take care to never ban our own Jenkins instance. [ ]
In addition, Vagrant Cascadian recorded a disk failure for the virt32b and virt64b nodes [ ], performed some maintenance of the cbxi4a node [ ][ ] and marked most armhf architecture systems as being back online.

Finally, If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

6 October 2024

Bits from Debian: Bits from the DPL

Dear Debian community, this are my bits from DPL for September. New lintian maintainer I'm pleased to welcome Louis-Philippe V ronneau as a new Lintian maintainer. He humorously acknowledged his new role, stating, "Apparently I'm a Lintian maintainer now". I remain confident that we can, and should, continue modernizing our policy checker, and I see this as one important step toward that goal. SPDX name / license tools There was a discussion about deprecating the unique names for DEP-5 and migrating to fully compliant SPDX names. Simon McVittie wrote: "Perhaps our Debian-specific names are better, but the relevant question is whether they are sufficiently better to outweigh the benefit of sharing effort and specifications with the rest of the world (and I don't think they are)." Also Charles Plessy sees the value of deprecating the Debian ones and align on SPDX. The thread on debian-devel list contains several practical hints for writing debian/copyright files. proposal: Hybrid network stack for Trixie There was a very long discussion on debian-devel list about the network stack on Trixie that started in July and was continued in end of August / beginning of September. The discussion was also covered on LWN. It continued in a "proposal: Hybrid network stack for Trixie" by Lukas M rdian. Contacting teams I continued reaching out to teams in September. One common pattern I've noticed is that most teams lack a clear strategy for attracting new contributors. Here's an example snippet from one of my outreach emails, which is representative of the typical approach: Q: Do you have some strategy to gather new contributors for your team? A: No. Q: Can I do anything for you? A: Everything that can help to have more than 3 guys :-D Well, only the first answer, "No," is typical. To help the JavaScript team, I'd like to invite anyone with JavaScript experience to join the team's mailing list and offer to learn and contribute. While I've only built a JavaScript package once, I know this team has developed excellent tools that are widely adopted by others. It's an active and efficient team, making it a great starting point for those looking to get involved in Debian. You might also want to check out the "Little tutorial for JS-Team beginners". Given the lack of a strategy to actively recruit new contributors--a common theme in the responses I've received--I recommend reviewing my talk from DebConf23 about teams. The Debian Med team would have struggled significantly in my absence (I've paused almost all work with the team since becoming DPL) if I hadn't consistently focused on bringing in new members. I'm genuinely proud of how the team has managed to keep up with the workload (thank you, Debian Med team!). Of course, onboarding newcomers takes time, and there's no guarantee of long-term success, but if you don't make the effort, you'll never find out. OS underpaid The Register, in its article titled "Open Source Maintainers Underpaid, Swamped by Security, Going Gray", summarizes the 2024 State of the Open Source Maintainer Report. I find this to be an interesting read, both in general and in connection with the challenges mentioned in the previous paragraph about finding new team members. Kind regards Andreas.

1 October 2024

Guido G nther: Free Software Activities September 2024

Another short status update of what happened on my side last month. Besides the usual amount of housekeeping last month was a lot about getting old issues resolved by finishing some stale merge requests and work in pogress MRs. I also pushed out the Phosh 0.42.0 Release phosh phoc phosh-mobile-settings libphosh-rs phosh-osk-stub phosh-wallpapers meta-phosh Debian ModemManager Calls bluez gnome-text-editor feedbackd Chatty libcall-ui glib wlr-protocols git-buildpackage iio-sensor-proxy Fotema Help Development If you want to support my work see donations. This includes a list of hardware we want to improve support for. Thanks a lot to all current and past donors.

29 September 2024

Vasudev Kamath: Signing the systemd-boot on Upgrade Using Dpkg Triggers

In my previous post on enabling SecureBoot, I mentioned that one pending improvement was signing the systemd-boot EFI binary with my keys on every upgrade. In this post, we'll explore the implementation of this process using dpkg triggers. For an excellent introduction to dpkg triggers, refer to this archived blog post. The source code mentioned in that post can be downloaded from alioth archive. From /usr/share/doc/dpkg/spec/triggers.txt, triggers are described as follows:
A dpkg trigger is a facility that allows events caused by one package but of interest to another package to be recorded and aggregated, and processed later by the interested package. This feature simplifies various registration and system-update tasks and reduces duplication of processing.
To implement this, we create a custom package with a single script that signs the systemd-boot EFI binary using our key. The script is as simple as:
#!/bin/bash
set -e
echo "Signing the new systemd-bootx64.efi"
sbsign --key /etc/secureboot/db.key --cert /etc/secureboot/db.crt \
       /usr/lib/systemd/boot/efi/systemd-bootx64.efi
echo "Invoking bootctl install to copy stuff"
bootctl install
Invoking bootctl install is optional if we have enabled systemd-boot-update.service, which will update the signed bootloader on the next boot. We need to have a triggers file under the debian/ folder of the package, which declares its interest in modifications to the path /usr/lib/systemd/boot/efi/systemd-bootx64.efi. The trigger file looks like this:
# trigger 1 interest on systemd-bootx64.efi
interest-noawait /usr/lib/systemd/boot/efi/systemd-bootx64.efi
You can read about various directives and their meanings that can be used in the triggers file in the deb-triggers man page. Once we build and install the package, this request is added to /var/lib/dpkg/triggers/File. See the screenshot below after installation of our package: installed trigger To test the functionality, I performed a re-installation of the systemd-boot-efi package, which provides the EFI binary for systemd-boot, using the following command:
sudo apt install --reinstall systemd-boot-efi
During installation, you can see the debug message being printed in the screenshot below: systemd-boot-signer triggered To test the systemd-boot-update.service, I commented out the bootctl install line from the above script, performed a reinstallation, and restarted the systemd-boot-update.service. Checking the log, I saw the following:
Sep 29 13:42:51 chamunda systemd[1]: Stopping systemd-boot-update.service - Automatic Boot Loader Update...
Sep 29 13:42:51 chamunda systemd[1]: Starting systemd-boot-update.service - Automatic Boot Loader Update...
Sep 29 13:42:51 chamunda bootctl[1801516]: Skipping "/efi/EFI/systemd/systemd-bootx64.efi", same boot loader version in place already.
Sep 29 13:42:51 chamunda bootctl[1801516]: Skipping "/efi/EFI/BOOT/BOOTX64.EFI", same boot loader version in place already.
Sep 29 13:42:51 chamunda bootctl[1801516]: Skipping "/efi/EFI/BOOT/BOOTX64.EFI", same boot loader version in place already.
Sep 29 13:42:51 chamunda systemd[1]: Finished systemd-boot-update.service - Automatic Boot Loader Update.
Sep 29 13:43:37 chamunda systemd[1]: systemd-boot-update.service: Deactivated successfully.
Sep 29 13:43:37 chamunda systemd[1]: Stopped systemd-boot-update.service - Automatic Boot Loader Update.
Sep 29 13:43:37 chamunda systemd[1]: Stopping systemd-boot-update.service - Automatic Boot Loader Update...
Indeed, the service attempted to copy the bootloader but did not do so because there was no actual update to the binary; it was just a reinstallation trigger. The complete code for this package can be found here. With this post the entire series on using UKI to Secureboot with Debian comes to an end. Happy hacking!.

Reproducible Builds: Supporter spotlight: Kees Cook on Linux kernel security

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the eighth installment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project, David A. Wheeler and Simon Butler. Today, however, we will be talking with Kees Cook, founder of the Kernel Self-Protection Project.

Vagrant Cascadian: Could you tell me a bit about yourself? What sort of things do you work on? Kees Cook: I m a Free Software junkie living in Portland, Oregon, USA. I have been focusing on the upstream Linux kernel s protection of itself. There is a lot of support that the kernel provides userspace to defend itself, but when I first started focusing on this there was not as much attention given to the kernel protecting itself. As userspace got more hardened the kernel itself became a bigger target. Almost 9 years ago I formally announced the Kernel Self-Protection Project because the work necessary was way more than my time and expertise could do alone. So I just try to get people to help as much as possible; people who understand the ARM architecture, people who understand the memory management subsystem to help, people who understand how to make the kernel less buggy.
Vagrant: Could you describe the path that lead you to working on this sort of thing? Kees: I have always been interested in security through the aspect of exploitable flaws. I always thought it was like a magic trick to make a computer do something that it was very much not designed to do and seeing how easy it is to subvert bugs. I wanted to improve that fragility. In 2006, I started working at Canonical on Ubuntu and was mainly focusing on bringing Debian and Ubuntu up to what was the state of the art for Fedora and Gentoo s security hardening efforts. Both had really pioneered a lot of userspace hardening with compiler flags and ELF stuff and many other things for hardened binaries. On the whole, Debian had not really paid attention to it. Debian s packaging building process at the time was sort of a chaotic free-for-all as there wasn t centralized build methodology for defining things. Luckily that did slowly change over the years. In Ubuntu we had the opportunity to apply top down build rules for hardening all the packages. In 2011 Chrome OS was following along and took advantage of a bunch of the security hardening work as they were based on ebuild out of Gentoo and when they looked for someone to help out they reached out to me. We recognized the Linux kernel was pretty much the weakest link in the Chrome OS security posture and I joined them to help solve that. Their userspace was pretty well handled but the kernel had a lot of weaknesses, so focusing on hardening was the next place to go. When I compared notes with other users of the Linux kernel within Google there were a number of common concerns and desires. Chrome OS already had an upstream first requirement, so I tried to consolidate the concerns and solve them upstream. It was challenging to land anything in other kernel team repos at Google, as they (correctly) wanted to minimize their delta from upstream, so I needed to work on any major improvements entirely in upstream and had a lot of support from Google to do that. As such, my focus shifted further from working directly on Chrome OS into being entirely upstream and being more of a consultant to internal teams, helping with integration or sometimes backporting. Since the volume of needed work was so gigantic I needed to find ways to inspire other developers (both inside and outside of Google) to help. Once I had a budget I tried to get folks paid (or hired) to work on these areas when it wasn t already their job.
Vagrant: So my understanding of some of your recent work is basically defining undefined behavior in the language or compiler? Kees: I ve found the term undefined behavior to have a really strict meaning within the compiler community, so I have tried to redefine my goal as eliminating unexpected behavior or ambiguous language constructs . At the end of the day ambiguity leads to bugs, and bugs lead to exploitable security flaws. I ve been taking a four-pronged approach: supporting the work people are doing to get rid of ambiguity, identify new areas where ambiguity needs to be removed, actually removing that ambiguity from the C language, and then dealing with any needed refactoring in the Linux kernel source to adapt to the new constraints. None of this is particularly novel; people have recognized how dangerous some of these language constructs are for decades and decades but I think it is a combination of hard problems and a lot of refactoring that nobody has the interest/resources to do. So, we have been incrementally going after the lowest hanging fruit. One clear example in recent years was the elimination of C s implicit fall-through in switch statements. The language would just fall through between adjacent cases if a break (or other code flow directive) wasn t present. But this is ambiguous: is the code meant to fall-through, or did the author just forget a break statement? By defining the [[fallthrough]] statement, and requiring its use in Linux, all switch statements now have explicit code flow, and the entire class of bugs disappeared. During our refactoring we actually found that 1 in 10 added [[fallthrough]] statements were actually missing break statements. This was an extraordinarily common bug! So getting rid of that ambiguity is where we have been. Another area I ve been spending a bit of time on lately is looking at how defensive security work has challenges associated with metrics. How do you measure your defensive security impact? You can t say because we installed locks on the doors, 20% fewer break-ins have happened. Much of our signal is always secondary or retrospective, which is frustrating: This class of flaw was used X much over the last decade so, and if we have eliminated that class of flaw and will never see it again, what is the impact? Is the impact infinity? Attackers will just move to the next easiest thing. But it means that exploitation gets incrementally more difficult. As attack surfaces are reduced, the expense of exploitation goes up.
Vagrant: So it is hard to identify how effective this is how bad would it be if people just gave up? Kees: I think it would be pretty bad, because as we have seen, using secondary factors, the work we have done in the industry at large, not just the Linux kernel, has had an impact. What we, Microsoft, Apple, and everyone else is doing for their respective software ecosystems, has shown that the price of functional exploits in the black market has gone up. Especially for really egregious stuff like a zero-click remote code execution. If those were cheap then obviously we are not doing something right, and it becomes clear that it s trivial for anyone to attack the infrastructure that our lives depend on. But thankfully we have seen over the last two decades that prices for exploits keep going up and up into millions of dollars. I think it is important to keep working on that because, as a central piece of modern computer infrastructure, the Linux kernel has a giant target painted on it. If we give up, we have to accept that our computers are not doing what they were designed to do, which I can t accept. The safety of my grandparents shouldn t be any different from the safety of journalists, and political activists, and anyone else who might be the target of attacks. We need to be able to trust our devices otherwise why use them at all?
Vagrant: What has been your biggest success in recent years? Kees: I think with all these things I am not the only actor. Almost everything that we have been successful at has been because of a lot of people s work, and one of the big ones that has been coordinated across the ecosystem and across compilers was initializing stack variables to 0 by default. This feature was added in Clang, GCC, and MSVC across the board even though there were a lot of fears about forking the C language. The worry was that developers would come to depend on zero-initialized stack variables, but this hasn t been the case because we still warn about uninitialized variables when the compiler can figure that out. So you still still get the warnings at compile time but now you can count on the contents of your stack at run-time and we drop an entire class of uninitialized variable flaws. While the exploitation of this class has mostly been around memory content exposure, it has also been used for control flow attacks. So that was politically and technically a large challenge: convincing people it was necessary, showing its utility, and implementing it in a way that everyone would be happy with, resulting in the elimination of a large and persistent class of flaws in C.
Vagrant: In a world where things are generally Reproducible do you see ways in which that might affect your work? Kees: One of the questions I frequently get is, What version of the Linux kernel has feature $foo? If I know how things are built, I can answer with just a version number. In a Reproducible Builds scenario I can count on the compiler version, compiler flags, kernel configuration, etc. all those things are known, so I can actually answer definitively that a certain feature exists. So that is an area where Reproducible Builds affects me most directly. Indirectly, it is just being able to trust the binaries you are running are going to behave the same for the same build environment is critical for sane testing.
Vagrant: Have you used diffoscope? Kees: I have! One subset of tree-wide refactoring that we do when getting rid of ambiguous language usage in the kernel is when we have to make source level changes to satisfy some new compiler requirement but where the binary output is not expected to change at all. It is mostly about getting the compiler to understand what is happening, what is intended in the cases where the old ambiguity does actually match the new unambiguous description of what is intended. The binary shouldn t change. We have used diffoscope to compare the before and after binaries to confirm that yep, there is no change in binary .
Vagrant: You cannot just use checksums for that? Kees: For the most part, we need to only compare the text segments. We try to hold as much stable as we can, following the Reproducible Builds documentation for the kernel, but there are macros in the kernel that are sensitive to source line numbers and as a result those will change the layout of the data segment (and sometimes the text segment too). With diffoscope there s flexibility where I can exclude or include different comparisons. Sometimes I just go look at what diffoscope is doing and do that manually, because I can tweak that a little harder, but diffoscope is definitely the default. Diffoscope is awesome!
Vagrant: Where has reproducible builds affected you? Kees: One of the notable wins of reproducible builds lately was dealing with the fallout of the XZ backdoor and just being able to ask the question is my build environment running the expected code? and to be able to compare the output generated from one install that never had a vulnerable XZ and one that did have a vulnerable XZ and compare the results of what you get. That was important for kernel builds because the XZ threat actor was working to expand their influence and capabilities to include Linux kernel builds, but they didn t finish their work before they were noticed. I think what happened with Debian proving the build infrastructure was not affected is an important example of how people would have needed to verify the kernel builds too.
Vagrant: What do you want to see for the near or distant future in security work? Kees: For reproducible builds in the kernel, in the work that has been going on in the ClangBuiltLinux project, one of the driving forces of code and usability quality has been the continuous integration work. As soon as something breaks, on the kernel side, the Clang side, or something in between the two, we get a fast signal and can chase it and fix the bugs quickly. I would like to see someone with funding to maintain a reproducible kernel build CI. There have been places where there are certain architecture configurations or certain build configuration where we lose reproducibility and right now we have sort of a standard open source development feedback loop where those things get fixed but the time in between introduction and fix can be large. Getting a CI for reproducible kernels would give us the opportunity to shorten that time.
Vagrant: Well, thanks for that! Any last closing thoughts? Kees: I am a big fan of reproducible builds, thank you for all your work. The world is a safer place because of it.
Vagrant: Likewise for your work!


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

26 September 2024

Vasudev Kamath: Disabling Lockdown Mode with Secure Boot on Distro Kernel

In my previous post, I mentioned that Lockdown mode is activated when Secure Boot is enabled. One way to override this was to use a self-compiled upstream kernel. However, sometimes we might want to use the distribution kernel itself. This post explains how to disable lockdown mode while keeping Secure Boot enabled with a distribution kernel.
Understanding Secure Boot Detection To begin, we need to understand how the kernel detects if Secure Boot is enabled. This is done by the efi_get_secureboot function, as shown in the image below: Secure Boot status check
Disabling Kernel Lockdown The kernel code uses the value of MokSBStateRT to identify the Secure Boot state, assuming that Secure Boot can only be enabled via shim. This assumption holds true when using the Microsoft certificate for signature validation (as Microsoft currently only signs shim). However, if we're using our own keys, we don't need shim and can sign the bootloader ourselves. In this case, the Secure Boot state of the system doesn't need to be tied to the MokSBStateRT variable. To disable kernel lockdown, we need to set the UEFI runtime variable MokSBStateRT. This essentially tricks the kernel into thinking Secure Boot is disabled when it's actually enabled. This is achieved using a UEFI initializing driver. The code for this was written by an anonymous colleague who also assisted me with various configuration guidance for setting up UKI and Secure Boot on my system. The code is available here.
Implementation Detailed instructions for compiling and deploying the code are provided in the repository, so I won't repeat them here.
Results I've tested this method with the default distribution kernel on my Debian unstable system, and it successfully disables lockdown while maintaining Secure Boot integrity. See the screenshot below for confirmation: Distribution kernel lockdown disabled

Melissa Wen: Reflections on 2024 Linux Display Next Hackfest

Hey everyone! The 2024 Linux Display Next hackfest concluded in May, and its outcomes continue to shape the Linux Display stack. Igalia hosted this year s event in A Coru a, Spain, bringing together leading experts in the field. Samuel Iglesias and I organized this year s edition and this blog post summarizes the experience and its fruits. One of the highlights of this year s hackfest was the wide range of backgrounds represented by our 40 participants (both on-site and remotely). Developers and experts from various companies and open-source projects came together to advance the Linux Display ecosystem. You can find the list of participants here. The event covered a broad spectrum of topics affecting the development of Linux projects, user experiences, and the future of display technologies on Linux. From cutting-edge topics to long-term discussions, you can check the event agenda here.

Organization Highlights The hackfest was marked by in-depth discussions and knowledge sharing among Linux contributors, making everyone inspired, informed, and connected to the community. Building on feedback from the previous year, we refined the unconference format to enhance participant preparation and engagement. Structured Agenda and Timeboxes: Each session had a defined scope, time limit (1h20 or 2h10), and began with an introductory talk on the topic.
  • Participant-Led Discussions: We pre-selected in-person participants to lead discussions, allowing them to prepare introductions, resources, and scope.
  • Transparent Scheduling: The schedule was shared in advance as GitHub issues, encouraging participants to review and prepare for sessions of interest.
Engaging Sessions: The hackfest featured a variety of topics, including presentations and discussions on how participants were addressing specific subjects within their companies.
  • No Breakout Rooms, No Overlaps: All participants chose to attend all sessions, eliminating the need for separate breakout rooms. We also adapted run-time schedule to keep everybody involved in the same topics.
  • Real-time Updates: We provided notifications and updates through dedicated emails and the event matrix room.
Strengthening Community Connections: The hackfest offered ample opportunities for networking among attendees.
  • Social Events: Igalia sponsored coffee breaks, lunches, and a dinner at a local restaurant.
  • Museum Visit: Participants enjoyed a sponsored visit to the Museum of Estrela Galicia Beer (MEGA).

Fruitful Discussions and Follow-up The structured agenda and breaks allowed us to cover multiple topics during the hackfest. These discussions have led to new display feature development and improvements, as evidenced by patches, merge requests, and implementations in project repositories and mailing lists. With the KMS color management API taking shape, we discussed refinements and best approaches to cover the variety of color pipeline from different hardware-vendors. We are also investigating techniques for a performant SDR<->HDR content reproduction and reducing latency and power consumption when using the color blocks of the hardware.

Color Management/HDR Color Management and HDR continued to be the hottest topic of the hackfest. We had three sessions dedicated to discuss Color and HDR across Linux Display stack layers.

Color/HDR (Kernel-Level) Harry Wentland (AMD) led this session. Here, kernel Developers shared the Color Management pipeline of AMD, Intel and NVidia. We counted with diagrams and explanations from HW-vendors developers that discussed differences, constraints and paths to fit them into the KMS generic color management properties such as advertising modeset needs, IN\_FORMAT, segmented LUTs, interpolation types, etc. Developers from Qualcomm and ARM also added information regarding their hardware. Upstream work related to this session:

Color/HDR (Compositor-Level) Sebastian Wick (RedHat) led this session. It started with Sebastian s presentation covering Wayland color protocols and compositor implementation. Also, an explanation of APIs provided by Wayland and how they can be used to achieve better color management for applications and discussions around ICC profiles and color representation metadata. There was also an intensive Q&A about LittleCMS with Marti Maria. Upstream work related to this session:

Color/HDR (Use Cases and Testing) Christopher Cameron (Google) and Melissa Wen (Igalia) led this session. In contrast to the other sessions, here we focused less on implementation and more on brainstorming and reflections of real-world SDR and HDR transformations (use and validation) and gainmaps. Christopher gave a nice presentation explaining HDR gainmap images and how we should think of HDR. This presentation and Q&A were important to put participants at the same page of how to transition between SDR and HDR and somehow emulating HDR. We also discussed on the usage of a kernel background color property. Finally, we discussed a bit about Chamelium and the future of VKMS (future work and maintainership).

Power Savings vs Color/Latency Mario Limonciello (AMD) led this session. Mario gave an introductory presentation about AMD ABM (adaptive backlight management) that is similar to Intel DPST. After some discussions, we agreed on exposing a kernel property for power saving policy. This work was already merged on kernel and the userspace support is under development. Upstream work related to this session:

Strategy for video and gaming use-cases Leo Li (AMD) led this session. Miguel Casas (Google) started this session with a presentation of Overlays in Chrome/OS Video, explaining the main goal of power saving by switching off GPU for accelerated compositing and the challenges of different colorspace/HDR for video on Linux. Then Leo Li presented different strategies for video and gaming and we discussed the userspace need of more detailed feedback mechanisms to understand failures when offloading. Also, creating a debugFS interface came up as a tool for debugging and analysis.

Real-time scheduling and async KMS API Xaver Hugl (KDE/BlueSystems) led this session. Compositor developers have exposed some issues with doing real-time scheduling and async page flips. One is that the Kernel limits the lifetime of realtime threads and if a modeset takes too long, the thread will be killed and thus the compositor as well. Also, simple page flips take longer than expected and drivers should optimize them. Another issue is the lack of feedback to compositors about hardware programming time and commit deadlines (the lastest possible time to commit). This is difficult to predict from drivers, since it varies greatly with the type of properties. For example, color management updates take much longer. In this regard, we discusssed implementing a hw_done callback to timestamp when the hardware programming of the last atomic commit is complete. Also an API to pre-program color pipeline in a kind of A/B scheme. It may not be supported by all drivers, but might be useful in different ways.

VRR/Frame Limit, Display Mux, Display Control, and more and beer We also had sessions to discuss a new KMS API to mitigate headaches on VRR and Frame Limit as different brightness level at different refresh rates, abrupt changes of refresh rates, low frame rate compensation (LFC) and precise timing in VRR more. On Display Control we discussed features missing in the current KMS interface for HDR mode, atomic backlight settings, source-based tone mapping, etc. We also discussed the need of a place where compositor developers can post TODOs to be developed by KMS people. The Content-adaptive Scaling and Sharpening session focused on sharpening and scaling filters. In the Display Mux session, we discussed proposals to expose the capability of dynamic mux switching display signal between discrete and integrated GPUs. In the last session of the 2024 Display Next Hackfest, participants representing different compositors summarized current and future work and built a Linux Display wish list , which includes: improvements to VTTY and HDR switching, better dmabuf API for multi-GPU support, definition of tone mapping, blending and scaling sematics, and wayland protocols for advertising to clients which colorspaces are supported. We closed this session with a status update on feature development by compositors, including but not limited to: plane offloading (from libcamera to output) / HDR video offloading (dma-heaps) / plane-based scrolling for web pages, color management / HDR / ICC profiles support, addressing issues such as flickering when color primaries don t match, etc. After three days of intensive discussions, all in-person participants went to a guided tour at the Museum of Extrela Galicia beer (MEGA), pouring and tasting the most famous local beer.

Feedback and Future Directions Participants provided valuable feedback on the hackfest, including suggestions for future improvements.
  • Schedule and Break-time Setup: Having a pre-defined agenda and schedule provided a better balance between long discussions and mental refreshments, preventing the fatigue caused by endless discussions.
  • Action Points: Some participants recommended explicitly asking for action points at the end of each session and assigning people to follow-up tasks.
  • Remote Participation: Remote attendees appreciated the inclusive setup and opportunities to actively participate in discussions.
  • Technical Challenges: There were bandwidth and video streaming issues during some sessions due to the large number of participants.

Thank you for joining the 2024 Display Next Hackfest We can t help but thank the 40 participants, who engaged in-person or virtually on relevant discussions, for a collaborative evolution of the Linux display stack and for building an insightful agenda. A big thank you to the leaders and presenters of the nine sessions: Christopher Cameron (Google), Harry Wentland (AMD), Leo Li (AMD), Mario Limoncello (AMD), Sebastian Wick (RedHat) and Xaver Hugl (KDE/BlueSystems) for the effort in preparing the sessions, explaining the topic and guiding discussions. My acknowledge to the others in-person participants that made such an effort to travel to A Coru a: Alex Goins (NVIDIA), David Turner (Raspberry Pi), Georges Stavracas (Igalia), Joan Torres (SUSE), Liviu Dudau (Arm), Louis Chauvet (Bootlin), Robert Mader (Collabora), Tian Mengge (GravityXR), Victor Jaquez (Igalia) and Victoria Brekenfeld (System76). It was and awesome opportunity to meet you and chat face-to-face. Finally, thanks virtual participants who couldn t make it in person but organized their days to actively participate in each discussion, adding different perspectives and valuable inputs even remotely: Abhinav Kumar (Qualcomm), Chaitanya Borah (Intel), Christopher Braga (Qualcomm), Dor Askayo (Red Hat), Jiri Koten (RedHat), Jonas dahl (Red Hat), Leandro Ribeiro (Collabora), Marti Maria (Little CMS), Marijn Suijten, Mario Kleiner, Martin Stransky (Red Hat), Michel D nzer (Red Hat), Miguel Casas-Sanchez (Google), Mitulkumar Golani (Intel), Naveen Kumar (Intel), Niels De Graef (Red Hat), Pekka Paalanen (Collabora), Pichika Uday Kiran (AMD), Shashank Sharma (AMD), Sriharsha PV (AMD), Simon Ser, Uma Shankar (Intel) and Vikas Korjani (AMD). We look forward to another successful Display Next hackfest, continuing to drive innovation and improvement in the Linux display ecosystem!

25 September 2024

Melissa Wen: Reflections on 2024 Linux Display Next Hackfest

Hey everyone! The 2024 Linux Display Next hackfest concluded in May, and its outcomes continue to shape the Linux Display stack. Igalia hosted this year s event in A Coru a, Spain, bringing together leading experts in the field. Samuel Iglesias and I organized this year s edition and this blog post summarizes the experience and its fruits. One of the highlights of this year s hackfest was the wide range of backgrounds represented by our 40 participants (both on-site and remotely). Developers and experts from various companies and open-source projects came together to advance the Linux Display ecosystem. You can find the list of participants here. The event covered a broad spectrum of topics affecting the development of Linux projects, user experiences, and the future of display technologies on Linux. From cutting-edge topics to long-term discussions, you can check the event agenda here.

Organization Highlights The hackfest was marked by in-depth discussions and knowledge sharing among Linux contributors, making everyone inspired, informed, and connected to the community. Building on feedback from the previous year, we refined the unconference format to enhance participant preparation and engagement. Structured Agenda and Timeboxes: Each session had a defined scope, time limit (1h20 or 2h10), and began with an introductory talk on the topic.
  • Participant-Led Discussions: We pre-selected in-person participants to lead discussions, allowing them to prepare introductions, resources, and scope.
  • Transparent Scheduling: The schedule was shared in advance as GitHub issues, encouraging participants to review and prepare for sessions of interest.
Engaging Sessions: The hackfest featured a variety of topics, including presentations and discussions on how participants were addressing specific subjects within their companies.
  • No Breakout Rooms, No Overlaps: All participants chose to attend all sessions, eliminating the need for separate breakout rooms. We also adapted run-time schedule to keep everybody involved in the same topics.
  • Real-time Updates: We provided notifications and updates through dedicated emails and the event matrix room.
Strengthening Community Connections: The hackfest offered ample opportunities for networking among attendees.
  • Social Events: Igalia sponsored coffee breaks, lunches, and a dinner at a local restaurant.
  • Museum Visit: Participants enjoyed a sponsored visit to the Museum of Estrela Galicia Beer (MEGA).

Fruitful Discussions and Follow-up The structured agenda and breaks allowed us to cover multiple topics during the hackfest. These discussions have led to new display feature development and improvements, as evidenced by patches, merge requests, and implementations in project repositories and mailing lists. With the KMS color management API taking shape, we discussed refinements and best approaches to cover the variety of color pipeline from different hardware-vendors. We are also investigating techniques for a performant SDR<->HDR content reproduction and reducing latency and power consumption when using the color blocks of the hardware.

Color Management/HDR Color Management and HDR continued to be the hottest topic of the hackfest. We had three sessions dedicated to discuss Color and HDR across Linux Display stack layers.

Color/HDR (Kernel-Level) Harry Wentland (AMD) led this session. Here, kernel Developers shared the Color Management pipeline of AMD, Intel and NVidia. We counted with diagrams and explanations from HW-vendors developers that discussed differences, constraints and paths to fit them into the KMS generic color management properties such as advertising modeset needs, IN\_FORMAT, segmented LUTs, interpolation types, etc. Developers from Qualcomm and ARM also added information regarding their hardware. Upstream work related to this session:

Color/HDR (Compositor-Level) Sebastian Wick (RedHat) led this session. It started with Sebastian s presentation covering Wayland color protocols and compositor implementation. Also, an explanation of APIs provided by Wayland and how they can be used to achieve better color management for applications and discussions around ICC profiles and color representation metadata. There was also an intensive Q&A about LittleCMS with Marti Maria. Upstream work related to this session:

Color/HDR (Use Cases and Testing) Christopher Cameron (Google) and Melissa Wen (Igalia) led this session. In contrast to the other sessions, here we focused less on implementation and more on brainstorming and reflections of real-world SDR and HDR transformations (use and validation) and gainmaps. Christopher gave a nice presentation explaining HDR gainmap images and how we should think of HDR. This presentation and Q&A were important to put participants at the same page of how to transition between SDR and HDR and somehow emulating HDR. We also discussed on the usage of a kernel background color property. Finally, we discussed a bit about Chamelium and the future of VKMS (future work and maintainership).

Power Savings vs Color/Latency Mario Limonciello (AMD) led this session. Mario gave an introductory presentation about AMD ABM (adaptive backlight management) that is similar to Intel DPST. After some discussions, we agreed on exposing a kernel property for power saving policy. This work was already merged on kernel and the userspace support is under development. Upstream work related to this session:

Strategy for video and gaming use-cases Leo Li (AMD) led this session. Miguel Casas (Google) started this session with a presentation of Overlays in Chrome/OS Video, explaining the main goal of power saving by switching off GPU for accelerated compositing and the challenges of different colorspace/HDR for video on Linux. Then Leo Li presented different strategies for video and gaming and we discussed the userspace need of more detailed feedback mechanisms to understand failures when offloading. Also, creating a debugFS interface came up as a tool for debugging and analysis.

Real-time scheduling and async KMS API Xaver Hugl (KDE/BlueSystems) led this session. Compositor developers have exposed some issues with doing real-time scheduling and async page flips. One is that the Kernel limits the lifetime of realtime threads and if a modeset takes too long, the thread will be killed and thus the compositor as well. Also, simple page flips take longer than expected and drivers should optimize them. Another issue is the lack of feedback to compositors about hardware programming time and commit deadlines (the lastest possible time to commit). This is difficult to predict from drivers, since it varies greatly with the type of properties. For example, color management updates take much longer. In this regard, we discusssed implementing a hw_done callback to timestamp when the hardware programming of the last atomic commit is complete. Also an API to pre-program color pipeline in a kind of A/B scheme. It may not be supported by all drivers, but might be useful in different ways.

VRR/Frame Limit, Display Mux, Display Control, and more and beer We also had sessions to discuss a new KMS API to mitigate headaches on VRR and Frame Limit as different brightness level at different refresh rates, abrupt changes of refresh rates, low frame rate compensation (LFC) and precise timing in VRR more. On Display Control we discussed features missing in the current KMS interface for HDR mode, atomic backlight settings, source-based tone mapping, etc. We also discussed the need of a place where compositor developers can post TODOs to be developed by KMS people. The Content-adaptive Scaling and Sharpening session focused on sharpening and scaling filters. In the Display Mux session, we discussed proposals to expose the capability of dynamic mux switching display signal between discrete and integrated GPUs. In the last session of the 2024 Display Next Hackfest, participants representing different compositors summarized current and future work and built a Linux Display wish list , which includes: improvements to VTTY and HDR switching, better dmabuf API for multi-GPU support, definition of tone mapping, blending and scaling sematics, and wayland protocols for advertising to clients which colorspaces are supported. We closed this session with a status update on feature development by compositors, including but not limited to: plane offloading (from libcamera to output) / HDR video offloading (dma-heaps) / plane-based scrolling for web pages, color management / HDR / ICC profiles support, addressing issues such as flickering when color primaries don t match, etc. After three days of intensive discussions, all in-person participants went to a guided tour at the Museum of Extrela Galicia beer (MEGA), pouring and tasting the most famous local beer.

Feedback and Future Directions Participants provided valuable feedback on the hackfest, including suggestions for future improvements.
  • Schedule and Break-time Setup: Having a pre-defined agenda and schedule provided a better balance between long discussions and mental refreshments, preventing the fatigue caused by endless discussions.
  • Action Points: Some participants recommended explicitly asking for action points at the end of each session and assigning people to follow-up tasks.
  • Remote Participation: Remote attendees appreciated the inclusive setup and opportunities to actively participate in discussions.
  • Technical Challenges: There were bandwidth and video streaming issues during some sessions due to the large number of participants.

Thank you for joining the 2024 Display Next Hackfest We can t help but thank the 40 participants, who engaged in-person or virtually on relevant discussions, for a collaborative evolution of the Linux display stack and for building an insightful agenda. A big thank you to the leaders and presenters of the nine sessions: Christopher Cameron (Google), Harry Wentland (AMD), Leo Li (AMD), Mario Limoncello (AMD), Sebastian Wick (RedHat) and Xaver Hugl (KDE/BlueSystems) for the effort in preparing the sessions, explaining the topic and guiding discussions. My acknowledge to the others in-person participants that made such an effort to travel to A Coru a: Alex Goins (NVIDIA), David Turner (Raspberry Pi), Georges Stavracas (Igalia), Joan Torres (SUSE), Liviu Dudau (Arm), Louis Chauvet (Bootlin), Robert Mader (Collabora), Tian Mengge (GravityXR), Victor Jaquez (Igalia) and Victoria Brekenfeld (System76). It was and awesome opportunity to meet you and chat face-to-face. Finally, thanks virtual participants who couldn t make it in person but organized their days to actively participate in each discussion, adding different perspectives and valuable inputs even remotely: Abhinav Kumar (Qualcomm), Chaitanya Borah (Intel), Christopher Braga (Qualcomm), Dor Askayo, Jiri Koten (RedHat), Jonas dahl (Red Hat), Leandro Ribeiro (Collabora), Marti Maria (Little CMS), Marijn Suijten, Mario Kleiner, Martin Stransky (Red Hat), Michel D nzer (Red Hat), Miguel Casas-Sanchez (Google), Mitulkumar Golani (Intel), Naveen Kumar (Intel), Niels De Graef (Red Hat), Pekka Paalanen (Collabora), Pichika Uday Kiran (AMD), Shashank Sharma (AMD), Sriharsha PV (AMD), Simon Ser, Uma Shankar (Intel) and Vikas Korjani (AMD). We look forward to another successful Display Next hackfest, continuing to drive innovation and improvement in the Linux display ecosystem!

17 September 2024

Dirk Eddelbuettel: nanotime 0.3.10 on CRAN: Update

A minor update 0.3.10 for our nanotime package is now on CRAN. nanotime relies on the RcppCCTZ package (as well as the RcppDate package for additional C++ operations) and offers efficient high(er) resolution time parsing and formatting up to nanosecond resolution, using the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. This release updates one S4 methods to very recent changes in r-devel for which CRAN had reached out. This concerns the setdiff() method when applied to two nanotime objects. As it only affected R 4.5.0, due next April, if rebuilt in the last two or so weeks it will not have been visible to that many users, if any. In any event, it now works again for that setup too, and should be going forward. We also retired one demo function from the very early days, apparently it relied on ggplot2 features that have since moved on. If someone would like to help out and resurrect the demo, please get in touch. We also cleaned out some no longer used tests, and updated DESCRIPTION to what is required now. The NEWS snippet below has the full details.

Changes in version 0.3.10 (2024-09-16)
  • Retire several checks for Solaris in test suite (Dirk in #130)
  • Switch to Authors@R in DESCRIPTION as now required by CRAN
  • Accommodate R-devel change for setdiff (Dirk in #133 fixing #132)
  • No longer ship defunction ggplot2 demo (Dirk fixing #131)

Thanks to my CRANberries, there is a diffstat report for this release. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository and all documentation is provided at the nanotime documentation site. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

11 September 2024

Jamie McClelland: MariaDB mystery

I keep getting an error in our backup logs:
Sep 11 05:08:03 Warning: mysqldump: Error 2013: Lost connection to server during query when dumping table  1C4Uonkwhe_options  at row: 1402
Sep 11 05:08:03 Warning: Failed to dump mysql databases ic_wp
It s a WordPress database having trouble dumping the options table. The error log has a corresponding message:
Sep 11 13:50:11 mysql007 mariadbd[580]: 2024-09-11 13:50:11 69577 [Warning] Aborted connection 69577 to db: 'ic_wp' user: 'root' host: 'localhost' (Got an error writing communication packets)
The Internet is full of suggestions, almost all of which either focus on the network connection between the client and the server or the FEDERATED plugin. We aren t using the federated plugin and this error happens when conneting via the socket. Check it out - what is better than a consistently reproducible problem! It happens if I try to select all the values in the table:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options' ic_wp > /dev/null
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
It happens when I specifiy one specific offset:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
It happens if I specify the field name explicitly:
root@mysql007:~# mysql --protocol=socket -e 'select option_id,option_name,option_value,autoload from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
It doesn t happen if I specify the key field:
root@mysql007:~# mysql --protocol=socket -e 'select option_id from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
+-----------+
  option_id  
+-----------+
   16296351  
+-----------+
root@mysql007:~#
It does happen if I specify the value field:
root@mysql007:~# mysql --protocol=socket -e 'select option_value from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
It doesn t happen if I query the specific row by key field:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
+-----------+----------------------+--------------+----------+
  option_id   option_name            option_value   autoload  
+-----------+----------------------+--------------+----------+
   16296351   z_taxonomy_image8905                  yes       
+-----------+----------------------+--------------+----------+
root@mysql007:~#
Hm. Surely there is some funky non-printing character in that option_value right?
root@mysql007:~# mysql --protocol=socket -e 'select CHAR_LENGTH(option_value) from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
+---------------------------+
  CHAR_LENGTH(option_value)  
+---------------------------+
                          0  
+---------------------------+
root@mysql007:~# mysql --protocol=socket -e 'select HEX(option_value) from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
+-------------------+
  HEX(option_value)  
+-------------------+
                     
+-------------------+
root@mysql007:~#
Resetting the value to an empty value doesn t make a difference:
root@mysql007:~# mysql --protocol=socket -e 'update 1C4Uonkwhe_options set option_value = "" where option_id = 16296351' ic_wp
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options' ic_wp > /dev/null
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
Deleting the row in question causes the error to specify a new offset:
root@mysql007:~# mysql --protocol=socket -e 'delete from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options' ic_wp > /dev/null
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~# mysqldump ic_wp > /dev/null
mysqldump: Error 2013: Lost connection to server during query when dumping table  1C4Uonkwhe_options  at row: 1401
root@mysql007:~#
If I put the record I deleted back in, we return to the old offset:
root@mysql007:~# mysql --protocol=socket -e 'insert into 1C4Uonkwhe_options VALUES(16296351,"z_taxonomy_image8905","","yes");' ic_wp 
root@mysql007:~# mysqldump ic_wp > /dev/null
mysqldump: Error 2013: Lost connection to server during query when dumping table  1C4Uonkwhe_options  at row: 1402
root@mysql007:~#
I m losing my little mind. Let s get drastic and create a whole new table, copy over the data delicately working around the deadly offset:
oot@mysql007:~# mysql --protocol=socket -e 'create table 1C4Uonkwhe_new_options like 1C4Uonkwhe_options;' ic_wp 
root@mysql007:~# mysql --protocol=socket -e 'insert into 1C4Uonkwhe_new_options select * from 1C4Uonkwhe_options limit 1402 offset 0;' ic_wp 
--- There is only 33 more records, not sure how to specify unlimited limit but 100 does the trick.
root@mysql007:~# mysql --protocol=socket -e 'insert into 1C4Uonkwhe_new_options select * from 1C4Uonkwhe_options limit 100 offset 1403;' ic_wp 
Now let s make sure all is working properly:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_new_options' ic_wp >/dev/null;
Now let s examine which row we are missing:
root@mysql007:~# mysql --protocol=socket -e 'select option_id from 1C4Uonkwhe_options where option_id not in (select option_id from 1C4Uonkwhe_new_options) ;' ic_wp 
+-----------+
  option_id  
+-----------+
   18405297  
+-----------+
root@mysql007:~#
Wait, what? I was expecting option_id 16296351. Oh, now we are getting somewhere. And I see my mistake: when using offsets, you need to use ORDER BY or you won t get consistent results.
root@mysql007:~# mysql --protocol=socket -e 'select option_id from 1C4Uonkwhe_options order by option_id limit 1 offset 1402' ic_wp ;
+-----------+
  option_id  
+-----------+
   18405297  
+-----------+
root@mysql007:~#
Now that I have the correct row what is in it:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options where option_id = 18405297' ic_wp ;
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
Well, that makes a lot more sense. Let s start over with examining the value:
root@mysql007:~# mysql --protocol=socket -e 'select CHAR_LENGTH(option_value) from 1C4Uonkwhe_options where option_id = 18405297' ic_wp ;
+---------------------------+
  CHAR_LENGTH(option_value)  
+---------------------------+
                   50814767  
+---------------------------+
root@mysql007:~#
Wow, that s a lot of characters. If it were a book, it would be 35,000 pages long (I just discovered this site). It s a LONGTEXT field so it should be able to handle it. But now I have a better idea of what could be going wrong. The name of the option is rewrite_rules so it seems like something is going wrong with the generation of that option. I imagine there is some tweak I can make to allow MariaDB to cough up the value (read_buffer_size? tmp_table_size?). But I ll start with checking in with the database owner because I don t think 35,000 pages of rewrite rules is appropriate for any site.

10 September 2024

Freexian Collaborators: Debian Contributions: Python 3 patches, OpenSSH GSS-API split, rebootstrap, salsa CI, etc. (by Anupa Ann Joseph)

Debian Contributions: 2024-08 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Debian Python 3 patch review, by Stefano Rivera Last month, at DebConf, Stefano reviewed the current patch set of Debian s cPython packages with Matthias Klose, the primary maintainer until now. As a result of that review, Stefano re-reviewed the patchset, updating descriptions, etc. A few patches were able to be dropped, and a few others were forwarded upstream. One finds all sorts of skeletons doing reviews like this. One of the patches had been inactive (fortunately, because it was buggy) since the day it was applied, 13 years ago. One is a cleanup that probably only fixes a bug on HPUX, and is a result of copying code from xfree86 into Python 25 years ago. It was fixed in xfree86 a year later. Others support just Debian-specific functionality and probably never seemed worth forwarding. Or good cleanup that only really applies to Debian. A trivial new patch would allow Debian to multiarch co-install Python stable ABI dynamic extensions (like we can with regular dynamic extensions). Performance concerns are stalling it in review, at the moment.

DebConf 24 Organization, by Stefano Rivera Stefano helped organize DebConf 24, which concluded in early August. The event is run by a large entirely volunteer team. The work involved in making this happen is far too varied to describe here. While Freexian provides funding for 20% of collaborator time to spend on Debian-related work, it only covers a small fraction of contributions to time-intensive tasks like this. Since the end of the event, Stefano has been doing some work on the conference finances, and initiated the reimbursement process for travel bursaries.

Archive rebuilds on Debusine, by Stefano Rivera The recent setuptools 73 upload to Debian unstable removed the test subcommand, breaking many packages that were using python3 setup.py test in their Debian packaging. Stefano did a partial archive-rebuild using debusine.debian.net to find the regressions and file bugs. Debusine will be a powerful tool to do QA work like this for Debian in the future, but it doesn t have all the features needed to coordinate rebuild-testing, yet. They are planned to be fleshed out in the next year. In the meantime, Debusine has the building blocks to work through a queue of package building tasks and store the results, it just needs to be driven from outside the system. So, Stefano started working on a set of tools using the Debusine client API to perform archive rebuilds, found and tagged existing bugs, and filed many more.

OpenSSH GSS-API split, by Colin Watson Colin landed the first stage of the planned split of GSS-API authentication and key exchange support in Debian s OpenSSH packaging. In order to allow for smooth upgrades, the second stage will have to wait until after the Debian 13 (trixie) release; but once that s done, as upstream puts it, this substantially reduces the amount of pre-authentication attack surface exposed on your users sshd by default .

OpenSSL vs. cryptography, by Colin Watson Colin facilitated a discussion between Debian s OpenSSL team and the upstream maintainers of Python cryptography about a new incompatibility between Debian s OpenSSL packaging and cryptography s handling of OpenSSL s legacy provider, which was causing a number of build and test failures. While the issue remains open, the Debian OpenSSL maintainers have effectively reverted the change now, so it s no longer a pressing problem.

/usr-move, by Helmut Grohne There are less than 40 source packages left to move files to /usr, so what we re left with is the long tail of the transition. Rather than fix all of them, Helmut started a discussion on removing packages from unstable and filed a first batch. As libvirt is being restructured in experimental, we re handling the fallout in collaboration with its maintainer Andrea Bolognani. Since base-files validates the aliasing symlinks before upgrading, it was discovered that systemd has its own ideas with no solution as of yet. Helmut also proposed that dash checks for ineffective diversions of /bin/sh and that lintian warns about aliased files.

rebootstrap by Helmut Grohne Bootstrapping Debian for a new or existing CPU architecture still is a quite manual process. The rebootstrap project attempts to automate part of the early stage, but it still is very sensitive to changes in unstable. We had a number of fairly intrusive changes this year already. August included a little more fallout from the earlier gcc-for-host work where the C++ include search path would end up being wrong in the generated cross toolchain. A number of packages such as util-linux (twice), libxml2, libcap-ng or systemd had their stage profiles broken. e2fsprogs gained a cycle with libarchive-dev due to having gained support for creating an ext4 filesystem from a tar archive. The restructuring of glib2.0 remains an unsolved problem for now, but libxt and cdebconf should be buildable without glib2.0.

Salsa CI, by Santiago Ruano Rinc n Santiago completed the initial RISC-V support (!523) in the Salsa CI s pipeline. The main work started in July, but it was required to take into account some comments in the review (thanks to Ahmed!) and some final details in [!534]. riscv64 is the most recently supported port in Debian, which will be part of trixie. As its name suggests, the new build-riscv64 job makes it possible to test that a package successfully builds in the riscv64 architecture. The RISC-V runner (salsaci riscv64 runner 01) runs in a couple of machines generously provided by lab.rvperf.org. Debian Developers interested in running this job in their projects should enable the runner (salsaci riscv64 runner 01) in Settings / CI / Runners, and follow the instructions available at https://salsa.debian.org/salsa-ci-team/pipeline/#build-job-on-risc-v. Santiago also took part in discussions about how to optimize the build jobs and reviewed !537 to make the build-source job to only satisfy the Build-Depends and Build-Conflicts fields by Andrea Pappacoda. Thanks a lot to him!

Miscellaneous contributions
  • Stefano submitted patches for BeautifulSoup to support the latest soupsieve and lxml.
  • Stefano uploaded pypy3 7.3.17, upgrading the cPython compatibility from 3.9 to 3.10. Then ran into a GCC-14-related regression, which had to be ignored for now as it s proving hard to fix.
  • Colin released libpipeline 1.5.8 and man-db 2.13.0; the latter included foundations allowing adding an autopkgtest for man-db.
  • Colin upgraded 19 Python packages to new upstream versions (fixing 5 CVEs), fixed several other build failures, fixed a Python 3.12 compatibility issue in zope.security, and made python-nacl build reproducibly.
  • Colin tracked down test failures in python-asyncssh and Ruby resulting from certain odd /etc/hosts configurations.
  • Carles upgraded the packages python-ring-doorbell and simplemonitor to new upstream versions.
  • Carles started discussions and implementation of a tool (still in early days) named po-debconf-manager : a way for translators and reviewers to collaborate using git as a backend instead of mailing list; and submit the translations using salsa MR. More information next month.
  • Carles (dog-fooding po-debconf-manager ) reviewed debconf templates translated by a collaborator.
  • Carles reviewed and submitted the translation of apt .
  • Helmut sent 19 patches for improving cross building.
  • Helmut implemented the cross-exe-wrapper proposed by Simon McVittie for use with glib2.0.
  • Helmut detailed what it takes to make Perl s ExtUtils::PkgConfig suitable for cross building.
  • Helmut made the deletion of the root password work in debvm in all situations and implemented a test case using expect.
  • Anupa attended Debian Publicity team meeting and is moderating and posting on Debian Administrators LinkedIn group.
  • Thorsten uploaded package gutenprint to fix a FTBFS with gcc14 and package ipp-usb to fix a /usr-merge issue.
  • Santiago updated bzip2 to fix a long-standing bug that requested to include a pkg-config file. An important impact of this change is that it makes it possible to use Rust bindings for libbz2 by Sequoia, an implementation of OpenPGP.

9 September 2024

Ben Hutchings: FOSS activity in July 2024

8 September 2024

Thorsten Alteholz: My Debian Activities in August 2024

FTP master This month I accepted 441 and rejected 15 packages. The overall number of packages that got accepted was 442. I am ashamed of some occurrences that happened this month and I apologize for this. Unfortunately I have no idea how to prevent this in the future without becoming a solo entertainer. Debian LTS This was my hundred-twenty-second month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. Unfortunately Bullseye was not handed over to LTS in August. So I only prepared new packages of asterisk, libvirt and tinyproxy and will upload them next month. Last but not least I did a week of FD this month. Debian ELTS This month was the seventy-third ELTS month. During my allocated time I uploaded or worked on: I also did a week of FD. Debian Printing This month I uploaded This work is generously funded by Freexian! Debian Astro This month I uploaded a new upstream or bugfix version of: Debian Mobcom The following packages have been prepared by the GSoC student Nathan: It was so much fun working with Nathan. Unfortunately GSoC is over now, but Nathan will continue working in Debian and become a Debian Maintainer. misc This month I uploaded new upstream or bugfix versions of: I also filed an RM bug against meep-openmpi. As Adrian made me ware, this package is no longer needed.

Jacob Adams: Linux's Bedtime Routine

How does Linux move from an awake machine to a hibernating one? How does it then manage to restore all state? These questions led me to read way too much C in trying to figure out how this particular hardware/software boundary is navigated. This investigation will be split into a few parts, with the first one going from invocation of hibernation to synchronizing all filesystems to disk. This article has been written using Linux version 6.9.9, the source of which can be found in many places, but can be navigated easily through the Bootlin Elixir Cross-Referencer: https://elixir.bootlin.com/linux/v6.9.9/source Each code snippet will begin with a link to the above giving the file path and the line number of the beginning of the snippet.

A Starting Point for Investigation: /sys/power/state and /sys/power/disk These two system files exist to allow debugging of hibernation, and thus control the exact state used directly. Writing specific values to the state file controls the exact sleep mode used and disk controls the specific hibernation mode1. This is extremely handy as an entry point to understand how these systems work, since we can just follow what happens when they are written to.

Show and Store Functions These two files are defined using the power_attr macro: kernel/power/power.h:80
#define power_attr(_name) \
static struct kobj_attribute _name##_attr =     \
    .attr   =               \
        .name = __stringify(_name), \
        .mode = 0644,           \
     ,                  \
    .show   = _name##_show,         \
    .store  = _name##_store,        \
 
show is called on reads and store on writes. state_show is a little boring for our purposes, as it just prints all the available sleep states. kernel/power/main.c:657
/*
 * state - control system sleep states.
 *
 * show() returns available sleep state labels, which may be "mem", "standby",
 * "freeze" and "disk" (hibernation).
 * See Documentation/admin-guide/pm/sleep-states.rst for a description of
 * what they mean.
 *
 * store() accepts one of those strings, translates it into the proper
 * enumerated value, and initiates a suspend transition.
 */
static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
			  char *buf)
 
	char *s = buf;
#ifdef CONFIG_SUSPEND
	suspend_state_t i;
	for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
		if (pm_states[i])
			s += sprintf(s,"%s ", pm_states[i]);
#endif
	if (hibernation_available())
		s += sprintf(s, "disk ");
	if (s != buf)
		/* convert the last space to a newline */
		*(s-1) = '\n';
	return (s - buf);
 
state_store, however, provides our entry point. If the string disk is written to the state file, it calls hibernate(). This is our entry point. kernel/power/main.c:715
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
			   const char *buf, size_t n)
 
	suspend_state_t state;
	int error;
	error = pm_autosleep_lock();
	if (error)
		return error;
	if (pm_autosleep_state() > PM_SUSPEND_ON)  
		error = -EBUSY;
		goto out;
	 
	state = decode_state(buf, n);
	if (state < PM_SUSPEND_MAX)  
		if (state == PM_SUSPEND_MEM)
			state = mem_sleep_current;
		error = pm_suspend(state);
	  else if (state == PM_SUSPEND_MAX)  
		error = hibernate();
	  else  
		error = -EINVAL;
	 
 out:
	pm_autosleep_unlock();
	return error ? error : n;
 
kernel/power/main.c:688
static suspend_state_t decode_state(const char *buf, size_t n)
 
#ifdef CONFIG_SUSPEND
	suspend_state_t state;
#endif
	char *p;
	int len;
	p = memchr(buf, '\n', n);
	len = p ? p - buf : n;
	/* Check hibernation first. */
	if (len == 4 && str_has_prefix(buf, "disk"))
		return PM_SUSPEND_MAX;
#ifdef CONFIG_SUSPEND
	for (state = PM_SUSPEND_MIN; state < PM_SUSPEND_MAX; state++)  
		const char *label = pm_states[state];
		if (label && len == strlen(label) && !strncmp(buf, label, len))
			return state;
	 
#endif
	return PM_SUSPEND_ON;
 
Could we have figured this out just via function names? Sure, but this way we know for sure that nothing else is happening before this function is called.

Autosleep Our first detour is into the autosleep system. When checking the state above, you may notice that the kernel grabs the pm_autosleep_lock before checking the current state. autosleep is a mechanism originally from Android that sends the entire system to either suspend or hibernate whenever it is not actively working on anything. This is not enabled for most desktop configurations, since it s primarily for mobile systems and inverts the standard suspend and hibernate interactions. This system is implemented as a workqueue2 that checks the current number of wakeup events, processes and drivers that need to run3, and if there aren t any, then the system is put into the autosleep state, typically suspend. However, it could be hibernate if configured that way via /sys/power/autosleep in a similar manner to using /sys/power/state to manually enable hibernation. kernel/power/main.c:841
static ssize_t autosleep_store(struct kobject *kobj,
			       struct kobj_attribute *attr,
			       const char *buf, size_t n)
 
	suspend_state_t state = decode_state(buf, n);
	int error;
	if (state == PM_SUSPEND_ON
	    && strcmp(buf, "off") && strcmp(buf, "off\n"))
		return -EINVAL;
	if (state == PM_SUSPEND_MEM)
		state = mem_sleep_current;
	error = pm_autosleep_set_state(state);
	return error ? error : n;
 
power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
kernel/power/autosleep.c:24
static DEFINE_MUTEX(autosleep_lock);
static struct wakeup_source *autosleep_ws;
static void try_to_suspend(struct work_struct *work)
 
	unsigned int initial_count, final_count;
	if (!pm_get_wakeup_count(&initial_count, true))
		goto out;
	mutex_lock(&autosleep_lock);
	if (!pm_save_wakeup_count(initial_count)  
		system_state != SYSTEM_RUNNING)  
		mutex_unlock(&autosleep_lock);
		goto out;
	 
	if (autosleep_state == PM_SUSPEND_ON)  
		mutex_unlock(&autosleep_lock);
		return;
	 
	if (autosleep_state >= PM_SUSPEND_MAX)
		hibernate();
	else
		pm_suspend(autosleep_state);
	mutex_unlock(&autosleep_lock);
	if (!pm_get_wakeup_count(&final_count, false))
		goto out;
	/*
	 * If the wakeup occurred for an unknown reason, wait to prevent the
	 * system from trying to suspend and waking up in a tight loop.
	 */
	if (final_count == initial_count)
		schedule_timeout_uninterruptible(HZ / 2);
 out:
	queue_up_suspend_work();
 
static DECLARE_WORK(suspend_work, try_to_suspend);
void queue_up_suspend_work(void)
 
	if (autosleep_state > PM_SUSPEND_ON)
		queue_work(autosleep_wq, &suspend_work);
 

The Steps of Hibernation

Hibernation Kernel Config It s important to note that most of the hibernate-specific functions below do nothing unless you ve defined CONFIG_HIBERNATION in your Kconfig4. As an example, hibernate itself is defined as the following if CONFIG_HIBERNATE is not set. include/linux/suspend.h:407
static inline int hibernate(void)   return -ENOSYS;  

Check if Hibernation is Available We begin by confirming that we actually can perform hibernation, via the hibernation_available function. kernel/power/hibernate.c:742
if (!hibernation_available())  
	pm_pr_dbg("Hibernation not available.\n");
	return -EPERM;
 
kernel/power/hibernate.c:92
bool hibernation_available(void)
 
	return nohibernate == 0 &&
		!security_locked_down(LOCKDOWN_HIBERNATION) &&
		!secretmem_active() && !cxl_mem_active();
 
nohibernate is controlled by the kernel command line, it s set via either nohibernate or hibernate=no. security_locked_down is a hook for Linux Security Modules to prevent hibernation. This is used to prevent hibernating to an unencrypted storage device, as specified in the manual page kernel_lockdown(7). Interestingly, either level of lockdown, integrity or confidentiality, locks down hibernation because with the ability to hibernate you can extract bascially anything from memory and even reboot into a modified kernel image. secretmem_active checks whether there is any active use of memfd_secret, and if so it prevents hibernation. memfd_secret returns a file descriptor that can be mapped into a process but is specifically unmapped from the kernel s memory space. Hibernating with memory that not even the kernel is supposed to access would expose that memory to whoever could access the hibernation image. This particular feature of secret memory was apparently controversial, though not as controversial as performance concerns around fragmentation when unmapping kernel memory (which did not end up being a real problem). cxl_mem_active just checks whether any CXL memory is active. A full explanation is provided in the commit introducing this check but there s also a shortened explanation from cxl_mem_probe that sets the relevant flag when initializing a CXL memory device. drivers/cxl/mem.c:186
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
* preserves contents over suspend, and there is no simple way
* to arrange for the suspend image to avoid CXL memory which
* would setup a circular dependency between PCI resume and save
* state restoration.

Check Compression The next check is for whether compression support is enabled, and if so whether the requested algorithm is enabled. kernel/power/hibernate.c:747
/*
 * Query for the compression algorithm support if compression is enabled.
 */
if (!nocompress)  
	strscpy(hib_comp_algo, hibernate_compressor, sizeof(hib_comp_algo));
	if (crypto_has_comp(hib_comp_algo, 0, 0) != 1)  
		pr_err("%s compression is not available\n", hib_comp_algo);
		return -EOPNOTSUPP;
	 
 
The nocompress flag is set via the hibernate command line parameter, setting hibernate=nocompress. If compression is enabled, then hibernate_compressor is copied to hib_comp_algo. This synchronizes the current requested compression setting (hibernate_compressor) with the current compression setting (hib_comp_algo). Both values are character arrays of size CRYPTO_MAX_ALG_NAME (128 in this kernel). kernel/power/hibernate.c:50
static char hibernate_compressor[CRYPTO_MAX_ALG_NAME] = CONFIG_HIBERNATION_DEF_COMP;
/*
 * Compression/decompression algorithm to be used while saving/loading
 * image to/from disk. This would later be used in 'kernel/power/swap.c'
 * to allocate comp streams.
 */
char hib_comp_algo[CRYPTO_MAX_ALG_NAME];
hibernate_compressor defaults to lzo if that algorithm is enabled, otherwise to lz4 if enabled5. It can be overwritten using the hibernate.compressor setting to either lzo or lz4. kernel/power/Kconfig:95
choice
	prompt "Default compressor"
	default HIBERNATION_COMP_LZO
	depends on HIBERNATION
config HIBERNATION_COMP_LZO
	bool "lzo"
	depends on CRYPTO_LZO
config HIBERNATION_COMP_LZ4
	bool "lz4"
	depends on CRYPTO_LZ4
endchoice
config HIBERNATION_DEF_COMP
	string
	default "lzo" if HIBERNATION_COMP_LZO
	default "lz4" if HIBERNATION_COMP_LZ4
	help
	  Default compressor to be used for hibernation.
kernel/power/hibernate.c:1425
static const char * const comp_alg_enabled[] =  
#if IS_ENABLED(CONFIG_CRYPTO_LZO)
	COMPRESSION_ALGO_LZO,
#endif
#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
	COMPRESSION_ALGO_LZ4,
#endif
 ;
static int hibernate_compressor_param_set(const char *compressor,
		const struct kernel_param *kp)
 
	unsigned int sleep_flags;
	int index, ret;
	sleep_flags = lock_system_sleep();
	index = sysfs_match_string(comp_alg_enabled, compressor);
	if (index >= 0)  
		ret = param_set_copystring(comp_alg_enabled[index], kp);
		if (!ret)
			strscpy(hib_comp_algo, comp_alg_enabled[index],
				sizeof(hib_comp_algo));
	  else  
		ret = index;
	 
	unlock_system_sleep(sleep_flags);
	if (ret)
		pr_debug("Cannot set specified compressor %s\n",
			 compressor);
	return ret;
 
static const struct kernel_param_ops hibernate_compressor_param_ops =  
	.set    = hibernate_compressor_param_set,
	.get    = param_get_string,
 ;
static struct kparam_string hibernate_compressor_param_string =  
	.maxlen = sizeof(hibernate_compressor),
	.string = hibernate_compressor,
 ;
We then check whether the requested algorithm is supported via crypto_has_comp. If not, we bail out of the whole operation with EOPNOTSUPP. As part of crypto_has_comp we perform any needed initialization of the algorithm, loading kernel modules and running initialization code as needed6.

Grab Locks The next step is to grab the sleep and hibernation locks via lock_system_sleep and hibernate_acquire. kernel/power/hibernate.c:758
sleep_flags = lock_system_sleep();
/* The snapshot device should not be opened while we're running */
if (!hibernate_acquire())  
	error = -EBUSY;
	goto Unlock;
 
First, lock_system_sleep marks the current thread as not freezable, which will be important later7. It then grabs the system_transistion_mutex, which locks taking snapshots or modifying how they are taken, resuming from a hibernation image, entering any suspend state, or rebooting.

The GFP Mask The kernel also issues a warning if the gfp mask is changed via either pm_restore_gfp_mask or pm_restrict_gfp_mask without holding the system_transistion_mutex. GFP flags tell the kernel how it is permitted to handle a request for memory. include/linux/gfp_types.h:12
 * GFP flags are commonly used throughout Linux to indicate how memory
 * should be allocated.  The GFP acronym stands for get_free_pages(),
 * the underlying memory allocation function.  Not every GFP flag is
 * supported by every function which may allocate memory.
In the case of hibernation specifically we care about the IO and FS flags, which are reclaim operators, ways the system is permitted to attempt to free up memory in order to satisfy a specific request for memory. include/linux/gfp_types.h:176
 * Reclaim modifiers
 * -----------------
 * Please note that all the following flags are only applicable to sleepable
 * allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them).
 *
 * %__GFP_IO can start physical IO.
 *
 * %__GFP_FS can call down to the low-level FS. Clearing the flag avoids the
 * allocator recursing into the filesystem which might already be holding
 * locks.
gfp_allowed_mask sets which flags are permitted to be set at the current time. As the comment below outlines, preventing these flags from being set avoids situations where the kernel needs to do I/O to allocate memory (e.g. read/writing swap8) but the devices it needs to read/write to/from are not currently available. kernel/power/main.c:24
/*
 * The following functions are used by the suspend/hibernate code to temporarily
 * change gfp_allowed_mask in order to avoid using I/O during memory allocations
 * while devices are suspended.  To avoid races with the suspend/hibernate code,
 * they should always be called with system_transition_mutex held
 * (gfp_allowed_mask also should only be modified with system_transition_mutex
 * held, unless the suspend/hibernate code is guaranteed not to run in parallel
 * with that modification).
 */
static gfp_t saved_gfp_mask;
void pm_restore_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	if (saved_gfp_mask)  
		gfp_allowed_mask = saved_gfp_mask;
		saved_gfp_mask = 0;
	 
 
void pm_restrict_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	WARN_ON(saved_gfp_mask);
	saved_gfp_mask = gfp_allowed_mask;
	gfp_allowed_mask &= ~(__GFP_IO   __GFP_FS);
 

Sleep Flags After grabbing the system_transition_mutex the kernel then returns and captures the previous state of the threads flags in sleep_flags. This is used later to remove PF_NOFREEZE if it wasn t previously set on the current thread. kernel/power/main.c:52
unsigned int lock_system_sleep(void)
 
	unsigned int flags = current->flags;
	current->flags  = PF_NOFREEZE;
	mutex_lock(&system_transition_mutex);
	return flags;
 
EXPORT_SYMBOL_GPL(lock_system_sleep);
include/linux/sched.h:1633
#define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
Then we grab the hibernate-specific semaphore to ensure no one can open a snapshot or resume from it while we perform hibernation. Additionally this lock is used to prevent hibernate_quiet_exec, which is used by the nvdimm driver to active its firmware with all processes and devices frozen, ensuring it is the only thing running at that time9. kernel/power/hibernate.c:82
bool hibernate_acquire(void)
 
	return atomic_add_unless(&hibernate_atomic, -1, 0);
 

Prepare Console The kernel next calls pm_prepare_console. This function only does anything if CONFIG_VT_CONSOLE_SLEEP has been set. This prepares the virtual terminal for a suspend state, switching away to a console used only for the suspend state if needed. kernel/power/console.c:130
void pm_prepare_console(void)
 
	if (!pm_vt_switch())
		return;
	orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
	if (orig_fgconsole < 0)
		return;
	orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
	return;
 
The first thing is to check whether we actually need to switch the VT kernel/power/console.c:94
/*
 * There are three cases when a VT switch on suspend/resume are required:
 *   1) no driver has indicated a requirement one way or another, so preserve
 *      the old behavior
 *   2) console suspend is disabled, we want to see debug messages across
 *      suspend/resume
 *   3) any registered driver indicates it needs a VT switch
 *
 * If none of these conditions is present, meaning we have at least one driver
 * that doesn't need the switch, and none that do, we can avoid it to make
 * resume look a little prettier (and suspend too, but that's usually hidden,
 * e.g. when closing the lid on a laptop).
 */
static bool pm_vt_switch(void)
 
	struct pm_vt_switch *entry;
	bool ret = true;
	mutex_lock(&vt_switch_mutex);
	if (list_empty(&pm_vt_switch_list))
		goto out;
	if (!console_suspend_enabled)
		goto out;
	list_for_each_entry(entry, &pm_vt_switch_list, head)  
		if (entry->required)
			goto out;
	 
	ret = false;
out:
	mutex_unlock(&vt_switch_mutex);
	return ret;
 
There is an explanation of the conditions under which a switch is performed in the comment above the function, but we ll also walk through the steps here. Firstly we grab the vt_switch_mutex to ensure nothing will modify the list while we re looking at it. We then examine the pm_vt_switch_list. This list is used to indicate the drivers that require a switch during suspend. They register this requirement, or the lack thereof, via pm_vt_switch_required. kernel/power/console.c:31
/**
 * pm_vt_switch_required - indicate VT switch at suspend requirements
 * @dev: device
 * @required: if true, caller needs VT switch at suspend/resume time
 *
 * The different console drivers may or may not require VT switches across
 * suspend/resume, depending on how they handle restoring video state and
 * what may be running.
 *
 * Drivers can indicate support for switchless suspend/resume, which can
 * save time and flicker, by using this routine and passing 'false' as
 * the argument.  If any loaded driver needs VT switching, or the
 * no_console_suspend argument has been passed on the command line, VT
 * switches will occur.
 */
void pm_vt_switch_required(struct device *dev, bool required)
Next, we check console_suspend_enabled. This is set to false by the kernel parameter no_console_suspend, but defaults to true. Finally, if there are any entries in the pm_vt_switch_list, then we check to see if any of them require a VT switch. Only if none of these conditions apply, then we return false. If a VT switch is in fact required, then we move first the currently active virtual terminal/console10 (vt_move_to_console) and then the current location of kernel messages (vt_kmsg_redirect) to the SUSPEND_CONSOLE. The SUSPEND_CONSOLE is the last entry in the list of possible consoles, and appears to just be a black hole to throw away messages. kernel/power/console.c:16
#define SUSPEND_CONSOLE	(MAX_NR_CONSOLES-1)
Interestingly, these are separate functions because you can use TIOCL_SETKMSGREDIRECT (an ioctl11) to send kernel messages to a specific virtual terminal, but by default its the same as the currently active console. The locations of the previously active console and the previous kernel messages location are stored in orig_fgconsole and orig_kmsg, to restore the state of the console and kernel messages after the machine wakes up again. Interestingly, this means orig_fgconsole also ends up storing any errors, so has to be checked to ensure it s not less than zero before we try to do anything with the kernel messages on both suspend and resume. drivers/tty/vt/vt_ioctl.c:1268
/* Perform a kernel triggered VT switch for suspend/resume */
static int disable_vt_switch;
int vt_move_to_console(unsigned int vt, int alloc)
 
	int prev;
	console_lock();
	/* Graphics mode - up to X */
	if (disable_vt_switch)  
		console_unlock();
		return 0;
	 
	prev = fg_console;
	if (alloc && vc_allocate(vt))  
		/* we can't have a free VC for now. Too bad,
		 * we don't want to mess the screen for now. */
		console_unlock();
		return -ENOSPC;
	 
	if (set_console(vt))  
		/*
		 * We're unable to switch to the SUSPEND_CONSOLE.
		 * Let the calling function know so it can decide
		 * what to do.
		 */
		console_unlock();
		return -EIO;
	 
	console_unlock();
	if (vt_waitactive(vt + 1))  
		pr_debug("Suspend: Can't switch VCs.");
		return -EINTR;
	 
	return prev;
 
Unlike most other locking functions we ve seen so far, console_lock needs to be careful to ensure nothing else is panicking and needs to dump to the console before grabbing the semaphore for the console and setting a couple flags.

Panics Panics are tracked via an atomic integer set to the id of the processor currently panicking. kernel/printk/printk.c:2649
/**
 * console_lock - block the console subsystem from printing
 *
 * Acquires a lock which guarantees that no consoles will
 * be in or enter their write() callback.
 *
 * Can sleep, returns nothing.
 */
void console_lock(void)
 
	might_sleep();
	/* On panic, the console_lock must be left to the panic cpu. */
	while (other_cpu_in_panic())
		msleep(1000);
	down_console_sem();
	console_locked = 1;
	console_may_schedule = 1;
 
EXPORT_SYMBOL(console_lock);
kernel/printk/printk.c:362
/*
 * Return true if a panic is in progress on a remote CPU.
 *
 * On true, the local CPU should immediately release any printing resources
 * that may be needed by the panic CPU.
 */
bool other_cpu_in_panic(void)
 
	return (panic_in_progress() && !this_cpu_in_panic());
 
kernel/printk/printk.c:345
static bool panic_in_progress(void)
 
	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
 
kernel/printk/printk.c:350
/* Return true if a panic is in progress on the current CPU. */
bool this_cpu_in_panic(void)
 
	/*
	 * We can use raw_smp_processor_id() here because it is impossible for
	 * the task to be migrated to the panic_cpu, or away from it. If
	 * panic_cpu has already been set, and we're not currently executing on
	 * that CPU, then we never will be.
	 */
	return unlikely(atomic_read(&panic_cpu) == raw_smp_processor_id());
 
console_locked is a debug value, used to indicate that the lock should be held, and our first indication that this whole virtual terminal system is more complex than might initially be expected. kernel/printk/printk.c:373
/*
 * This is used for debugging the mess that is the VT code by
 * keeping track if we have the console semaphore held. It's
 * definitely not the perfect debug tool (we don't know if _WE_
 * hold it and are racing, but it helps tracking those weird code
 * paths in the console code where we end up in places I want
 * locked without the console semaphore held).
 */
static int console_locked;
console_may_schedule is used to see if we are permitted to sleep and schedule other work while we hold this lock. As we ll see later, the virtual terminal subsystem is not re-entrant, so there s all sorts of hacks in here to ensure we don t leave important code sections that can t be safely resumed.

Disable VT Switch As the comment below lays out, when another program is handling graphical display anyway, there s no need to do any of this, so the kernel provides a switch to turn the whole thing off. Interestingly, this appears to only be used by three drivers, so the specific hardware support required must not be particularly common.
drivers/gpu/drm/omapdrm/dss
drivers/video/fbdev/geode
drivers/video/fbdev/omap2
drivers/tty/vt/vt_ioctl.c:1308
/*
 * Normally during a suspend, we allocate a new console and switch to it.
 * When we resume, we switch back to the original console.  This switch
 * can be slow, so on systems where the framebuffer can handle restoration
 * of video registers anyways, there's little point in doing the console
 * switch.  This function allows you to disable it by passing it '0'.
 */
void pm_set_vt_switch(int do_switch)
 
	console_lock();
	disable_vt_switch = !do_switch;
	console_unlock();
 
EXPORT_SYMBOL(pm_set_vt_switch);
The rest of the vt_switch_console function is pretty normal, however, simply allocating space if needed to create the requested virtual terminal and then setting the current virtual terminal via set_console.

Virtual Terminal Set Console With set_console, we begin (as if we haven t been already) to enter the madness that is the virtual terminal subsystem. As mentioned previously, modifications to its state must be made very carefully, as other stuff happening at the same time could create complete messes. All this to say, calling set_console does not actually perform any work to change the state of the current console. Instead it indicates what changes it wants and then schedules that work. drivers/tty/vt/vt.c:3153
int set_console(int nr)
 
	struct vc_data *vc = vc_cons[fg_console].d;
	if (!vc_cons_allocated(nr)   vt_dont_switch  
		(vc->vt_mode.mode == VT_AUTO && vc->vc_mode == KD_GRAPHICS))  
		/*
		 * Console switch will fail in console_callback() or
		 * change_console() so there is no point scheduling
		 * the callback
		 *
		 * Existing set_console() users don't check the return
		 * value so this shouldn't break anything
		 */
		return -EINVAL;
	 
	want_console = nr;
	schedule_console_callback();
	return 0;
 
The check for vc->vc_mode == KD_GRAPHICS is where most end-user graphical desktops will bail out of this change, as they re in graphics mode and don t need to switch away to the suspend console. vt_dont_switch is a flag used by the ioctls11 VT_LOCKSWITCH and VT_UNLOCKSWITCH to prevent the system from switching virtual terminal devices when the user has explicitly locked it. VT_AUTO is a flag indicating that automatic virtual terminal switching is enabled12, and thus deliberate switching to a suspend terminal is not required. However, if you do run your machine from a virtual terminal, then we indicate to the system that we want to change to the requested virtual terminal via the want_console variable and schedule a callback via schedule_console_callback. drivers/tty/vt/vt.c:315
void schedule_console_callback(void)
 
	schedule_work(&console_work);
 
console_work is a workqueue2 that will execute the given task asynchronously.

Console Callback drivers/tty/vt/vt.c:3109
/*
 * This is the console switching callback.
 *
 * Doing console switching in a process context allows
 * us to do the switches asynchronously (needed when we want
 * to switch due to a keyboard interrupt).  Synchronization
 * with other console code and prevention of re-entrancy is
 * ensured with console_lock.
 */
static void console_callback(struct work_struct *ignored)
 
	console_lock();
	if (want_console >= 0)  
		if (want_console != fg_console &&
		    vc_cons_allocated(want_console))  
			hide_cursor(vc_cons[fg_console].d);
			change_console(vc_cons[want_console].d);
			/* we only changed when the console had already
			   been allocated - a new console is not created
			   in an interrupt routine */
		 
		want_console = -1;
	 
...
console_callback first looks to see if there is a console change wanted via want_console and then changes to it if it s not the current console and has been allocated already. We do first remove any cursor state with hide_cursor. drivers/tty/vt/vt.c:841
static void hide_cursor(struct vc_data *vc)
 
	if (vc_is_sel(vc))
		clear_selection();
	vc->vc_sw->con_cursor(vc, false);
	hide_softcursor(vc);
 
A full dive into the tty driver is a task for another time, but this should give a general sense of how this system interacts with hibernation.

Notify Power Management Call Chain kernel/power/hibernate.c:767
pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION)
This will call a chain of power management callbacks, passing first PM_HIBERNATION_PREPARE and then PM_POST_HIBERNATION on startup or on error with another callback. kernel/power/main.c:98
int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
 
	int ret;
	ret = blocking_notifier_call_chain_robust(&pm_chain_head, val_up, val_down, NULL);
	return notifier_to_errno(ret);
 
The power management notifier is a blocking notifier chain, which means it has the following properties. include/linux/notifier.h:23
 *	Blocking notifier chains: Chain callbacks run in process context.
 *		Callouts are allowed to block.
The callback chain is a linked list with each entry containing a priority and a function to call. The function technically takes in a data value, but it is always NULL for the power management chain. include/linux/notifier.h:49
struct notifier_block;
typedef	int (*notifier_fn_t)(struct notifier_block *nb,
			unsigned long action, void *data);
struct notifier_block  
	notifier_fn_t notifier_call;
	struct notifier_block __rcu *next;
	int priority;
 ;
The head of the linked list is protected by a read-write semaphore. include/linux/notifier.h:65
struct blocking_notifier_head  
	struct rw_semaphore rwsem;
	struct notifier_block __rcu *head;
 ;
Because it is prioritized, appending to the list requires walking it until an item with lower13 priority is found to insert the current item before. kernel/notifier.c:252
/*
 *	Blocking notifier chain routines.  All access to the chain is
 *	synchronized by an rwsem.
 */
static int __blocking_notifier_chain_register(struct blocking_notifier_head *nh,
					      struct notifier_block *n,
					      bool unique_priority)
 
	int ret;
	/*
	 * This code gets used during boot-up, when task switching is
	 * not yet working and interrupts must remain disabled.  At
	 * such times we must not call down_write().
	 */
	if (unlikely(system_state == SYSTEM_BOOTING))
		return notifier_chain_register(&nh->head, n, unique_priority);
	down_write(&nh->rwsem);
	ret = notifier_chain_register(&nh->head, n, unique_priority);
	up_write(&nh->rwsem);
	return ret;
 
kernel/notifier.c:20
/*
 *	Notifier chain core routines.  The exported routines below
 *	are layered on top of these, with appropriate locking added.
 */
static int notifier_chain_register(struct notifier_block **nl,
				   struct notifier_block *n,
				   bool unique_priority)
 
	while ((*nl) != NULL)  
		if (unlikely((*nl) == n))  
			WARN(1, "notifier callback %ps already registered",
			     n->notifier_call);
			return -EEXIST;
		 
		if (n->priority > (*nl)->priority)
			break;
		if (n->priority == (*nl)->priority && unique_priority)
			return -EBUSY;
		nl = &((*nl)->next);
	 
	n->next = *nl;
	rcu_assign_pointer(*nl, n);
	trace_notifier_register((void *)n->notifier_call);
	return 0;
 
Each callback can return one of a series of options. include/linux/notifier.h:18
#define NOTIFY_DONE		0x0000		/* Don't care */
#define NOTIFY_OK		0x0001		/* Suits me */
#define NOTIFY_STOP_MASK	0x8000		/* Don't call further */
#define NOTIFY_BAD		(NOTIFY_STOP_MASK 0x0002)
						/* Bad/Veto action */
When notifying the chain, if a function returns STOP or BAD then the previous parts of the chain are called again with PM_POST_HIBERNATION14 and an error is returned. kernel/notifier.c:107
/**
 * notifier_call_chain_robust - Inform the registered notifiers about an event
 *                              and rollback on error.
 * @nl:		Pointer to head of the blocking notifier chain
 * @val_up:	Value passed unmodified to the notifier function
 * @val_down:	Value passed unmodified to the notifier function when recovering
 *              from an error on @val_up
 * @v:		Pointer passed unmodified to the notifier function
 *
 * NOTE:	It is important the @nl chain doesn't change between the two
 *		invocations of notifier_call_chain() such that we visit the
 *		exact same notifier callbacks; this rules out any RCU usage.
 *
 * Return:	the return value of the @val_up call.
 */
static int notifier_call_chain_robust(struct notifier_block **nl,
				     unsigned long val_up, unsigned long val_down,
				     void *v)
 
	int ret, nr = 0;
	ret = notifier_call_chain(nl, val_up, v, -1, &nr);
	if (ret & NOTIFY_STOP_MASK)
		notifier_call_chain(nl, val_down, v, nr-1, NULL);
	return ret;
 
Each of these callbacks tends to be quite driver-specific, so we ll cease discussion of this here.

Sync Filesystems The next step is to ensure all filesystems have been synchronized to disk. This is performed via a simple helper function that times how long the full synchronize operation, ksys_sync takes. kernel/power/main.c:69
void ksys_sync_helper(void)
 
	ktime_t start;
	long elapsed_msecs;
	start = ktime_get();
	ksys_sync();
	elapsed_msecs = ktime_to_ms(ktime_sub(ktime_get(), start));
	pr_info("Filesystems sync: %ld.%03ld seconds\n",
		elapsed_msecs / MSEC_PER_SEC, elapsed_msecs % MSEC_PER_SEC);
 
EXPORT_SYMBOL_GPL(ksys_sync_helper);
ksys_sync wakes and instructs a set of flusher threads to write out every filesystem, first their inodes15, then the full filesystem, and then finally all block devices, to ensure all pages are written out to disk. fs/sync.c:87
/*
 * Sync everything. We start by waking flusher threads so that most of
 * writeback runs on all devices in parallel. Then we sync all inodes reliably
 * which effectively also waits for all flusher threads to finish doing
 * writeback. At this point all data is on disk so metadata should be stable
 * and we tell filesystems to sync their metadata via ->sync_fs() calls.
 * Finally, we writeout all block devices because some filesystems (e.g. ext2)
 * just write metadata (such as inodes or bitmaps) to block device page cache
 * and do not sync it on their own in ->sync_fs().
 */
void ksys_sync(void)
 
	int nowait = 0, wait = 1;
	wakeup_flusher_threads(WB_REASON_SYNC);
	iterate_supers(sync_inodes_one_sb, NULL);
	iterate_supers(sync_fs_one_sb, &nowait);
	iterate_supers(sync_fs_one_sb, &wait);
	sync_bdevs(false);
	sync_bdevs(true);
	if (unlikely(laptop_mode))
		laptop_sync_completion();
 
It follows an interesting pattern of using iterate_supers to run both sync_inodes_one_sb and then sync_fs_one_sb on each known filesystem16. It also calls both sync_fs_one_sb and sync_bdevs twice, first without waiting for any operations to complete and then again waiting for completion17. When laptop_mode is enabled the system runs additional filesystem synchronization operations after the specified delay without any writes. mm/page-writeback.c:111
/*
 * Flag that puts the machine in "laptop mode". Doubles as a timeout in jiffies:
 * a full sync is triggered after this time elapses without any disk activity.
 */
int laptop_mode;
EXPORT_SYMBOL(laptop_mode);
However, when running a filesystem synchronization operation, the system will add an additional timer to schedule more writes after the laptop_mode delay. We don t want the state of the system to change at all while performing hibernation, so we cancel those timers. mm/page-writeback.c:2198
/*
 * We're in laptop mode and we've just synced. The sync's writes will have
 * caused another writeback to be scheduled by laptop_io_completion.
 * Nothing needs to be written back anymore, so we unschedule the writeback.
 */
void laptop_sync_completion(void)
 
	struct backing_dev_info *bdi;
	rcu_read_lock();
	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
		del_timer(&bdi->laptop_mode_wb_timer);
	rcu_read_unlock();
 
As a side note, the ksys_sync function is simply called when the system call sync is used. fs/sync.c:111
SYSCALL_DEFINE0(sync)
 
	ksys_sync();
	return 0;
 

The End of Preparation With that the system has finished preparations for hibernation. This is a somewhat arbitrary cutoff, but next the system will begin a full freeze of userspace to then dump memory out to an image and finally to perform hibernation. All this will be covered in future articles!
  1. Hibernation modes are outside of scope for this article, see the previous article for a high-level description of the different types of hibernation.
  2. Workqueues are a mechanism for running asynchronous tasks. A full description of them is a task for another time, but the kernel documentation on them is available here: https://www.kernel.org/doc/html/v6.9/core-api/workqueue.html 2
  3. This is a bit of an oversimplification, but since this isn t the main focus of this article this description has been kept to a higher level.
  4. Kconfig is Linux s build configuration system that sets many different macros to enable/disable various features.
  5. Kconfig defaults to the first default found
  6. Including checking whether the algorithm is larval? Which appears to indicate that it requires additional setup, but is an interesting choice of name for such a state.
  7. Specifically when we get to process freezing, which we ll get to in the next article in this series.
  8. Swap space is outside the scope of this article, but in short it is a buffer on disk that the kernel uses to store memory not current in use to free up space for other things. See Swap Management for more details.
  9. The code for this is lengthy and tangential, thus it has not been included here. If you re curious about the details of this, see kernel/power/hibernate.c:858 for the details of hibernate_quiet_exec, and drivers/nvdimm/core.c:451 for how it is used in nvdimm.
  10. Annoyingly this code appears to use the terms console and virtual terminal interchangeably.
  11. ioctls are special device-specific I/O operations that permit performing actions outside of the standard file interactions of read/write/seek/etc. 2
  12. I m not entirely clear on how this flag works, this subsystem is particularly complex.
  13. In this case a higher number is higher priority.
  14. Or whatever the caller passes as val_down, but in this case we re specifically looking at how this is used in hibernation.
  15. An inode refers to a particular file or directory within the filesystem. See Wikipedia for more details.
  16. Each active filesystem is registed with the kernel through a structure known as a superblock, which contains references to all the inodes contained within the filesystem, as well as function pointers to perform the various required operations, like sync.
  17. I m including minimal code in this section, as I m not looking to deep dive into the filesystem code at this time.

4 September 2024

Reproducible Builds: Reproducible Builds in August 2024

Welcome to the August 2024 report from the Reproducible Builds project! Our reports attempt to outline what we ve been up to over the past month, highlighting news items from elsewhere in tech where they are related. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. LWN: The history, status, and plans for reproducible builds
  2. Intermediate Autotools build artifacts removed from PostgreSQL distribution tarballs
  3. Distribution news
  4. Mailing list news
  5. diffoscope
  6. Website updates
  7. Upstream patches
  8. Reproducibility testing framework

LWN: The history, status, and plans for reproducible builds The free software newspaper of record, Linux Weekly News, published an in-depth article based on Holger Levsen s talk, Reproducible Builds: The First Eleven Years which was presented at the recent DebConf24 conference in Busan, South Korea. Titled The history, status, and plans for reproducible builds and written by Jake Edge, LWN s article not only summarises Holger s talk and clarifies its message but it links to external information as well. Holger s original talk can also be watched on the DebConf24 webpage (direct .webm link and his HTML slides are available also). There are also a significant number of comments on LWN s page as well. Holger Levsen also headed a scheduled discussion session at DebConf24 on Preserving *other* build artifacts addressing a topic where a number of Debian packages are (or would like to) produce results that are neither the .deb files, the build logs nor the logs of CI tests. This is an issue for reproducible builds as this 4th type of build artifact are typically shipped within the binary .deb packages, and are invariably non-deterministic; thus making the .deb files unreproducible. (A direct .webm link and HTML slides are available).

Intermediate Autotools build artifacts removed from PostgreSQL distribution tarballs Peter Eisentraut wrote a detailed blog post on the subject of The new PostgreSQL 17 make dist . Like many projects, the PostgreSQL database has previously pre-built parts of its GNU Autotools build system: the reason for this is a mix of convenience and traditional practice . Peter astutely notes that this arrangement in the build system is quite tricky as:
You need to carefully maintain the different states of clean source code , partially built source code , and fully built source code , and the commands to transition between them.
However, Peter goes on to mention that:
a lot more attention is nowadays paid to the software supply chain. There are security and legal reasons for this. When users install software, they want to know where it came from, and they want to be sure that they got the right thing, not some fake version or some version of dubious legal provenance.
And cites the XZ Utils backdoor as a reason to care about transparent and reproducible ways of distributing and communicating a source tarball and provenance. Because of this, intermediate build artifacts are now henceforth essentially disallowed from PostgreSQL distribution tarballs.

Distribution news In Debian this month, 30 reviews of Debian packages were added, 17 were updated and 10 were removed this month adding to our knowledge about identified issues. One issue type was added by Chris Lamb, too. [ ] In addition, an issue was filed to update the Salsa CI pipeline (used by 1,000s of Debian packages) to no longer test for reproducibility with reprotest s build_path variation. Holger Levsen provided a rationale for this change in the issue, which has already been made to the tests being performed by tests.reproducible-builds.org.
In Arch Linux this month, Jelle van der Waa published a short blog post on the topic of Investigating creating reproducible images with mkosi, motivated by the desire to make it possible for anyone to re-recreate the official Arch cloud image bit-by-bit identical on their own machine as per [the] reproducible builds definition. In addition, Jelle filed a patch for pacman, the Arch Linux package manager, to respect the SOURCE_DATE_EPOCH environment variable when installing a package.
In openSUSE news, Bernhard M. Wiedemann published another report for that distribution.
In Android news, the IzzyOnDroid project added 49 new rebuilder recipes and now features 256 total reproducible applications representing 21% of the total offerings in the repository. IzzyOnDroid is an F-Droid style repository for Android apps[:] applications in this repository are official binaries built by the original application developers, taken from their resp. repositories (mostly GitHub).

Mailing list news From our mailing list this month:
  • Bernhard M. Wiedemann posted a brief message to the list with some helpful information regarding nondeterminism within Rust binaries, positing the use of the codegen-units = 16 default and resulting in a bug being filed in the Rust issue tracker. [ ]
  • Bernhard also wrote to the list, following up to a thread in November 2023, on attempts to make the LibreOffice suite of office applications build reproducibly. In the thread from this month, Bernhard could announce that the four patches previously mentioned have landed in LibreOffice upstream.
  • Fay Stegerman linked the mailing list to a thread she made on the Signal issue tracker regarding whether device-specific binaries [can] ever be considered meaningfully reproducible . In particular: the whole part about allow[ing] multiple third parties to come to a consensus on a correct result breaks down completely when correct is device-specific and not something everyone can agree on. [ ]
  • Developer kpcyrd posted an update for source code indexing project, whatsrc.org. Announcing that it now importing packages from live-bootstrap ( a usable Linux system [that is] created with only human-auditable, and wherever possible, human-written, source code ) into its database of provenance data.
  • Lastly, Mechtilde Stehmann posted an update to an earlier thread about how Java builds are not reproducible on the armhf architecture, enquiring how they might gain temporary access to such a machine in order to perform some deeper testing. [ ]

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb released versions 274, 275, 276 and 277, uploaded these to Debian, and made the following changes as well:
  • New features:
    • Strip ANSI escapes usually colour codes from the output of the Procyon Java decompiler. [ ]
    • Factor out a method for stripping ANSI escapes. [ ]
    • Append output from dumppdf(1) in more cases, avoiding situations where we fallback to a binary diff. [ ]
    • Add support for versions of Perl s IO::Compress::Zip version 2.212. [ ]
  • Bug fixes:
    • Also catch RuntimeError exceptions when importing the PyPDF library so that it, or, crucially, its transitive dependencies, cannot not cause diffoscope to traceback at runtime and build time. [ ]
    • Do not call marshal.load( ) of precompiled Python bytecode as it, alas, inherently unsafe. Replace for now with a brief summary of the code section of .pyc. [ ][ ]
    • Don t include excessive debug output when calling dumppdf(1). [ ]
  • Testsuite-related changes:
    • Don t bother to check version number in test_python.py: the fixture for this test is fixed. [ ][ ]
    • Update test_zip text fixtures and definitions to support new changes to the Perl IO::Compress library. [ ]
In addition, Mattia Rizzolo updated the available architectures for a number of test dependencies [ ] and Sergei Trofimovich fixed an issue to avoid diffoscope crashing when hashing directory symlinks [ ] and Vagrant Cascadian proposed GNU Guix updates for diffoscope versions 275 and 276 and 277.

Website updates There were a rather substantial number of improvements made to our website this month, including:
  • Alba Herrerias:
    • Substantially extend the guidance on the Contribute page. [ ]
  • Chris Lamb:
    • Set the future: true configuration value so we render all files and documents in the website, regardless of whether they have a date property in the future. After all, we don t re-generate the website on a timer, and have other ways of making unpublished, draft posts. [ ][ ]
  • Fay Stegerman:
  • hulkoba:
  • kpcyrd:
  • Mattia Rizzolo:
  • Pol Dellaiera:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In August, a number of changes were made by Holger Levsen, including:
  • Temporarily install the openssl-provider-legacy package for the Debian unstable environments for running diffoscope due to Debian bug #1078944. [ ][ ][ ][ ]
  • Mark Debian armhf architecture nodes as being down due to proxy down. [ ][ ]
  • Detect proxy failures. [ ][ ][ ]
  • Run the index-buildinfo for the builtin-pho script with the -q switch. [ ]
  • Disable all Arch Linux reproducible jobs. [ ]
In addition, Mattia Rizzolo updated the website configuration to install the ruby-jekyll-sitemap package as it is now used in the website [ ], Roland Clobus updated the script to build Debian live images to treat openQA issues as warnings [ ], and Vagrant Cascadian marked the cbxi4b node as down [ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

Next.