Search Results: "cts"

7 October 2024

Reproducible Builds: Reproducible Builds in September 2024

Welcome to the September 2024 report from the Reproducible Builds project! Our reports attempt to outline what we ve been up to over the past month, highlighting news items from elsewhere in tech where they are related. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. New binsider tool to analyse ELF binaries
  2. Unreproducibility of GHC Haskell compiler 95% fixed
  3. Mailing list summary
  4. Towards a 100% bit-for-bit reproducible OS
  5. Two new reproducibility-related academic papers
  6. Distribution work
  7. diffoscope
  8. Other software development
  9. Android toolchain core count issue reported
  10. New Gradle plugin for reproducibility
  11. Website updates
  12. Upstream patches
  13. Reproducibility testing framework

New binsider tool to analyse ELF binaries Reproducible Builds developer Orhun Parmaks z has announced a fantastic new tool to analyse the contents of ELF binaries. According to the project s README page:
Binsider can perform static and dynamic analysis, inspect strings, examine linked libraries, and perform hexdumps, all within a user-friendly terminal user interface!
More information about Binsider s features and how it works can be found within Binsider s documentation pages.

Unreproducibility of GHC Haskell compiler 95% fixed A seven-year-old bug about the nondeterminism of object code generated by the Glasgow Haskell Compiler (GHC) received a recent update, consisting of Rodrigo Mesquita noting that the issue is:
95% fixed by [merge request] !12680 when -fobject-determinism is enabled. [ ]
The linked merge request has since been merged, and Rodrigo goes on to say that:
After that patch is merged, there are some rarer bugs in both interface file determinism (eg. #25170) and in object determinism (eg. #25269) that need to be taken care of, but the great majority of the work needed to get there should have been merged already. When merged, I think we should close this one in favour of the more specific determinism issues like the two linked above.

Mailing list summary On our mailing list this month:
  • Fay Stegerman let everyone know that she started a thread on the Fediverse about the problems caused by unreproducible zlib/deflate compression in .zip and .apk files and later followed up with the results of her subsequent investigation.
  • Long-time developer kpcyrd wrote that there has been a recent public discussion on the Arch Linux GitLab [instance] about the challenges and possible opportunities for making the Linux kernel package reproducible , all relating to the CONFIG_MODULE_SIG flag. [ ]
  • Bernhard M. Wiedemann followed-up to an in-person conversation at our recent Hamburg 2024 summit on the potential presence for Reproducible Builds in recognised standards. [ ]
  • Fay Stegerman also wrote about her worry about the possible repercussions for RB tooling of Debian migrating from zlib to zlib-ng as reproducibility requires identical compressed data streams. [ ]
  • Martin Monperrus wrote the list announcing the latest release of maven-lockfile that is designed aid building Maven projects with integrity . [ ]
  • Lastly, Bernhard M. Wiedemann wrote about potential role of reproducible builds in combatting silent data corruption, as detailed in a recent Tweet and scholarly paper on faulty CPU cores. [ ]

Towards a 100% bit-for-bit reproducible OS Bernhard M. Wiedemann began writing on journey towards a 100% bit-for-bit reproducible operating system on the openSUSE wiki:
This is a report of Part 1 of my journey: building 100% bit-reproducible packages for every package that makes up [openSUSE s] minimalVM image. This target was chosen as the smallest useful result/artifact. The larger package-sets get, the more disk-space and build-power is required to build/verify all of them.
This work was sponsored by NLnet s NGI Zero fund.

Distribution work In Debian this month, 14 reviews of Debian packages were added, 12 were updated and 20 were removed, all adding to our knowledge about identified issues. A number of issue types were updated as well. [ ][ ] In addition, Holger opened 4 bugs against the debrebuild component of the devscripts suite of tools. In particular:
  • #1081047: Fails to download .dsc file.
  • #1081048: Does not work with a proxy.
  • #1081050: Fails to create a debrebuild.tar.
  • #1081839: Fails with E: mmdebstrap failed to run error.
Last month, an issue was filed to update the Salsa CI pipeline (used by 1,000s of Debian packages) to no longer test for reproducibility with reprotest s build_path variation. Holger Levsen provided a rationale for this change in the issue, which has already been made to the tests being performed by tests.reproducible-builds.org. This month, this issue was closed by Santiago R. R., nicely explaining that build path variation is no longer the default, and, if desired, how developers may enable it again. In openSUSE news, Bernhard M. Wiedemann published another report for that distribution.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading version 278 to Debian:
  • New features:
    • Add a helpful contextual message to the output if comparing Debian .orig tarballs within .dsc files without the ability to fuzzy-match away the leading directory. [ ]
  • Bug fixes:
    • Drop removal of calculated os.path.basename from GNU readelf output. [ ]
    • Correctly invert X% similar value and do not emit 100% similar . [ ]
  • Misc:
    • Temporarily remove procyon-decompiler from Build-Depends as it was removed from testing (via #1057532). (#1082636)
    • Update copyright years. [ ]
For trydiffoscope, the command-line client for the web-based version of diffoscope, Chris Lamb also:
  • Added an explicit python3-setuptools dependency. (#1080825)
  • Bumped the Standards-Version to 4.7.0. [ ]

Other software development disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into system calls to reliably flush out reproducibility issues. This month, version 0.5.11-4 was uploaded to Debian unstable by Holger Levsen making the following changes:
  • Replace build-dependency on the obsolete pkg-config package with one on pkgconf, following a Lintian check. [ ]
  • Bump Standards-Version field to 4.7.0, with no related changes needed. [ ]

In addition, reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, version 0.7.28 was uploaded to Debian unstable by Holger Levsen including a change by Jelle van der Waa to move away from the pipes Python module to shlex, as the former will be removed in Python version 3.13 [ ].

Android toolchain core count issue reported Fay Stegerman reported an issue with the Android toolchain where a part of the build system generates a different classes.dex file (and thus a different .apk) depending on the number of cores available during the build, thereby breaking Reproducible Builds:
We ve rebuilt [tag v3.6.1] multiple times (each time in a fresh container): with 2, 4, 6, 8, and 16 cores available, respectively:
  • With 2 and 4 cores we always get an unsigned APK with SHA-256 14763d682c9286ef .
  • With 6, 8, and 16 cores we get an unsigned APK with SHA-256 35324ba4c492760 instead.

New Gradle plugin for reproducibility A new plugin for the Gradle build tool for Java has been released. This easily-enabled plugin results in:
reproducibility settings [being] applied to some of Gradle s built-in tasks that should really be the default. Compatible with Java 8 and Gradle 8.3 or later.

Website updates There were a rather substantial number of improvements made to our website this month, including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In September, a number of changes were made by Holger Levsen, including:
  • Debian-related changes:
    • Upgrade the osuosl4 node to Debian trixie in anticipation of running debrebuild and rebuilderd there. [ ][ ][ ]
    • Temporarily mark the osuosl4 node as offline due to ongoing xfs_repair filesystem maintenance. [ ][ ]
    • Do not warn about (very old) broken nodes. [ ]
    • Add the risc64 architecture to the multiarch version skew tests for Debian trixie and sid. [ ][ ][ ]
    • Mark the virt 32,64 b nodes as down. [ ]
  • Misc changes:
    • Add support for powercycling OpenStack instances. [ ]
    • Update the fail2ban to ban hosts for 4 weeks in total [ ][ ] and take care to never ban our own Jenkins instance. [ ]
In addition, Vagrant Cascadian recorded a disk failure for the virt32b and virt64b nodes [ ], performed some maintenance of the cbxi4a node [ ][ ] and marked most armhf architecture systems as being back online.

Finally, If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

4 October 2024

Bits from Debian: Debian welcomes Freexian as our newest partner!

Freexian logo We are excited to announce and welcome Freexian into Debian Partners. Freexian specializes in Free Software with a particular focus on Debian GNU/Linux. Freexian can assist with consulting, training, technical support, packaging, or software development on projects involving use or development of Free software. All of Freexian's employees and partners are well-known contributors in the Free Software community, a choice that is integral to Freexian's business model. About the Debian Partners Program The Debian Partners Program was created to recognize companies and organizations that help and provide continuous support to the project with services, finances, equipment, vendor support, and a slew of other technical and non-technical services. Partners provide critical assistance, help, and support which has advanced and continues to further our work in providing the 'Universal Operating System' to the world. Thank you Freexian!

3 October 2024

Mike Gabriel: Creating (a) new frontend(s) for Polis

After (quite) a summer break, here comes the 4th article of the 5-episode blog post series on Polis, written by Guido Berh rster, member of staff at my company Fre(i)e Software GmbH. Have fun with the read on Guido's work on Polis,
Mike
Table of Contents of the Blog Post Series
  1. Introduction
  2. Initial evaluation and adaptation
  3. Issues extending Polis and adjusting our goals
  4. Creating (a) new frontend(s) for Polis (this article)
  5. Current status and roadmap
4. Creating (a) new frontend(s) for Polis Why a new frontend was needed... Our initial experiences of working with Polis, the effort required to implement more invasive changes and the desire of iterating changes more rapidly ultimately lead to the decision to create a new foundation for frontend development that would be independent of but compatible with the upstream project. Our primary objective was thus not to develop another frontend but rather to make frontend development more flexible and to facilitate experimentation and rapid prototyping of different frontends by providing abstraction layers and building blocks. This also implied developing a corresponding backend since the Polis backend is tightly coupled to the frontend and is neither intended to be used by third-party projects nor supporting cross-domain requests due to the expectation of being embedded as an iframe on third-party websites. The long-term plan for achieving our objectives is to provide three abstraction layers for building frontends: The Particiapp Project Under the umbrella of the Particiapp project we have so far developed two new components: Both the participation frontend and backend are fully compatible and require an existing Polis installation and can be run alongside the upstream frontend. More specifically, the administration frontend and common backend are required to administrate conversations and send out notifications and the statistics processing server is required for processing the voting results. Particiapi server For the backend the Python language and the Flask framework were chosen as a technological basis mainly due to developer mindshare, a large community and ecosystem and the smaller dependency chain and maintenance overhead compared to Node.js/npm. Instead of integrating specific identity providers we adopted the OpenID Connect standard as an abstraction layer for authentication which allows delegating authentication either to a self-hosted identity provider or a large number of existing external identity providers. Particiapp Example Frontend The experimental example frontend serves both as a test bed for the client library and as a tool for better understanding the needs of frontend designers. It also features a completely redesigned user interface and results visualization in line with our goals. Branded variants are currently used for evaluation and testing by the stakeholders. In order to simplify evaluation, development, testing and deployment a Docker Compose configuration is made available which contains all necessary components for running Polis with our experimental example frontend. In addition, a development environment is provided which includes a preconfigured OpenID Connect identity provider (KeyCloak), SMTP-Server with web interface (MailDev), and a database frontend (PgAdmin). The new frontend can also be tested using our public demo server.

1 October 2024

Colin Watson: Free software activity in September 2024

Almost all of my Debian contributions this month were sponsored by Freexian. You can also support my work directly via Liberapay. Pydantic My main Debian project for the month turned out to be getting Pydantic back into a good state in Debian testing. I ve used Pydantic quite a bit in various projects, most recently in Debusine, so I have an interest in making sure it works well in Debian. However, it had been stalled on 1.10.17 for quite a while due to the complexities of getting 2.x packaged. This was partly making sure everything else could cope with the transition, but in practice mostly sorting out packaging of its new Rust dependencies. Several other people (notably Alexandre Detiste, Andreas Tille, Drew Parsons, and Timo R hling) had made some good progress on this, but nobody had quite got it over the line and it seemed a bit stuck. Learning Rust is on my to-do list, but merely not knowing a language hasn t stopped me before. So I learned how the Debian Rust team s packaging works, upgraded a few packages to new upstream versions (including rust-half and upstream rust-idna test fixes), and packaged rust-jiter. After a lot of waiting around for various things and chasing some failures in other packages I was eventually able to get current versions of both pydantic-core and pydantic into testing. I m looking forward to being able to drop our clunky v1 compatibility code once debusine can rely on running on trixie! OpenSSH I upgraded the Debian packaging to OpenSSH 9.9p1. YubiHSM I upgraded python-yubihsm, yubihsm-connector, and yubihsm-shell to new upstream versions. I noticed that I could enable some tests in python-yubihsm and yubihsm-shell; I d previously thought the whole test suite required a real YubiHSM device, but when I looked closer it turned out that this was only true for some tests. I fixed yubihsm-shell build failures on some 32-bit architectures (upstream PRs #431, #432), and also made it build reproducibly. Thanks to Helmut Grohne, I fixed yubihsm-connector to apply udev rules to existing devices when the package is installed. As usual, bookworm-backports is up to date with all these changes. Python team setuptools 72.0.0 removed the venerable setup.py test command. This caused some fallout in Debian, some of which was quite non-obvious as packaging helpers sometimes fell back to different ways of running test suites that didn t quite work. I fixed django-guardian, manuel, python-autopage, python-flask-seeder, python-pgpdump, python-potr, python-precis-i18n, python-stopit, serpent, straight.plugin, supervisor, and zope.i18nmessageid. As usual for new language versions, the addition of Python 3.13 caused some problems. I fixed psycopg2, python-time-machine, and python-traits. I fixed build/autopkgtest failures in keymapper, python-django-test-migrations, python-rosettasciio, routes, transmissionrpc, and twisted. buildbot was in a bit of a mess due to being incompatible with SQLAlchemy 2.0. Fortunately by the time I got to it upstream had committed a workable set of patches, and the main difficulty was figuring out what to cherry-pick since they haven t made a new upstream release with all of that yet. I figured this out and got us up to 4.0.3. Adrian Bunk asked whether python-zipp should be removed from trixie. I spent some time investigating this and concluded that the answer was no, but looking into it was an interesting exercise anyway. On the other hand, I looked into flask-appbuilder, concluded that it should be removed, and filed a removal request. I upgraded some embedded CSS files in nbconvert. I upgraded importlib-resources, ipywidgets, jsonpickle, pydantic-settings, pylint (fixing a test failure), python-aiohttp-session, python-apptools, python-asyncssh, python-django-celery-beat, python-django-rules, python-limits, python-multidict, python-persistent, python-pkginfo, python-rt, python-spur, python-zipp, stravalib, transmissionrpc, vulture, zodbpickle, zope.exceptions (adopting it), zope.i18nmessageid, zope.proxy, and zope.security to new upstream versions. debmirror The experimental and *-proposed-updates suites used to not have Contents-* files, and a long time ago debmirror was changed to just skip those files in those suites. They were added to the Debian archive some time ago, but debmirror carried on skipping them anyway. Once I realized what was going on, I removed these unnecessary special cases (#819925, #1080168).

30 September 2024

Bits from Debian: New Debian Developers and Maintainers (July and August 2024)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

29 September 2024

Dirk Eddelbuettel: RApiSerialize 0.1.4 on CRAN: Added C++ Namespace

A new minor release 0.1.5 of RApiSerialize arrived on CRAN today. The RApiSerialize package is used by both my RcppRedis as well as by Travers excellent qs package. This release adds an optional C++ namespace, available when the API header file is included in a C++ source file. And as one often does, the release also brings a few small updates to different aspects of the packaging.

Changes in version 0.1.4 (2024-09-28)
  • Add C++ namespace in API header (Dirk in #9 closing #8)
  • Several packaging updates: switched to Authors@R, README.md badge updates, added .editorconfig and cleanup

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More details are at the RApiSerialize page; code, issue tickets etc at the GitHub repositoryrapiserializerepo. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Reproducible Builds: Supporter spotlight: Kees Cook on Linux kernel security

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the eighth installment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project, David A. Wheeler and Simon Butler. Today, however, we will be talking with Kees Cook, founder of the Kernel Self-Protection Project.

Vagrant Cascadian: Could you tell me a bit about yourself? What sort of things do you work on? Kees Cook: I m a Free Software junkie living in Portland, Oregon, USA. I have been focusing on the upstream Linux kernel s protection of itself. There is a lot of support that the kernel provides userspace to defend itself, but when I first started focusing on this there was not as much attention given to the kernel protecting itself. As userspace got more hardened the kernel itself became a bigger target. Almost 9 years ago I formally announced the Kernel Self-Protection Project because the work necessary was way more than my time and expertise could do alone. So I just try to get people to help as much as possible; people who understand the ARM architecture, people who understand the memory management subsystem to help, people who understand how to make the kernel less buggy.
Vagrant: Could you describe the path that lead you to working on this sort of thing? Kees: I have always been interested in security through the aspect of exploitable flaws. I always thought it was like a magic trick to make a computer do something that it was very much not designed to do and seeing how easy it is to subvert bugs. I wanted to improve that fragility. In 2006, I started working at Canonical on Ubuntu and was mainly focusing on bringing Debian and Ubuntu up to what was the state of the art for Fedora and Gentoo s security hardening efforts. Both had really pioneered a lot of userspace hardening with compiler flags and ELF stuff and many other things for hardened binaries. On the whole, Debian had not really paid attention to it. Debian s packaging building process at the time was sort of a chaotic free-for-all as there wasn t centralized build methodology for defining things. Luckily that did slowly change over the years. In Ubuntu we had the opportunity to apply top down build rules for hardening all the packages. In 2011 Chrome OS was following along and took advantage of a bunch of the security hardening work as they were based on ebuild out of Gentoo and when they looked for someone to help out they reached out to me. We recognized the Linux kernel was pretty much the weakest link in the Chrome OS security posture and I joined them to help solve that. Their userspace was pretty well handled but the kernel had a lot of weaknesses, so focusing on hardening was the next place to go. When I compared notes with other users of the Linux kernel within Google there were a number of common concerns and desires. Chrome OS already had an upstream first requirement, so I tried to consolidate the concerns and solve them upstream. It was challenging to land anything in other kernel team repos at Google, as they (correctly) wanted to minimize their delta from upstream, so I needed to work on any major improvements entirely in upstream and had a lot of support from Google to do that. As such, my focus shifted further from working directly on Chrome OS into being entirely upstream and being more of a consultant to internal teams, helping with integration or sometimes backporting. Since the volume of needed work was so gigantic I needed to find ways to inspire other developers (both inside and outside of Google) to help. Once I had a budget I tried to get folks paid (or hired) to work on these areas when it wasn t already their job.
Vagrant: So my understanding of some of your recent work is basically defining undefined behavior in the language or compiler? Kees: I ve found the term undefined behavior to have a really strict meaning within the compiler community, so I have tried to redefine my goal as eliminating unexpected behavior or ambiguous language constructs . At the end of the day ambiguity leads to bugs, and bugs lead to exploitable security flaws. I ve been taking a four-pronged approach: supporting the work people are doing to get rid of ambiguity, identify new areas where ambiguity needs to be removed, actually removing that ambiguity from the C language, and then dealing with any needed refactoring in the Linux kernel source to adapt to the new constraints. None of this is particularly novel; people have recognized how dangerous some of these language constructs are for decades and decades but I think it is a combination of hard problems and a lot of refactoring that nobody has the interest/resources to do. So, we have been incrementally going after the lowest hanging fruit. One clear example in recent years was the elimination of C s implicit fall-through in switch statements. The language would just fall through between adjacent cases if a break (or other code flow directive) wasn t present. But this is ambiguous: is the code meant to fall-through, or did the author just forget a break statement? By defining the [[fallthrough]] statement, and requiring its use in Linux, all switch statements now have explicit code flow, and the entire class of bugs disappeared. During our refactoring we actually found that 1 in 10 added [[fallthrough]] statements were actually missing break statements. This was an extraordinarily common bug! So getting rid of that ambiguity is where we have been. Another area I ve been spending a bit of time on lately is looking at how defensive security work has challenges associated with metrics. How do you measure your defensive security impact? You can t say because we installed locks on the doors, 20% fewer break-ins have happened. Much of our signal is always secondary or retrospective, which is frustrating: This class of flaw was used X much over the last decade so, and if we have eliminated that class of flaw and will never see it again, what is the impact? Is the impact infinity? Attackers will just move to the next easiest thing. But it means that exploitation gets incrementally more difficult. As attack surfaces are reduced, the expense of exploitation goes up.
Vagrant: So it is hard to identify how effective this is how bad would it be if people just gave up? Kees: I think it would be pretty bad, because as we have seen, using secondary factors, the work we have done in the industry at large, not just the Linux kernel, has had an impact. What we, Microsoft, Apple, and everyone else is doing for their respective software ecosystems, has shown that the price of functional exploits in the black market has gone up. Especially for really egregious stuff like a zero-click remote code execution. If those were cheap then obviously we are not doing something right, and it becomes clear that it s trivial for anyone to attack the infrastructure that our lives depend on. But thankfully we have seen over the last two decades that prices for exploits keep going up and up into millions of dollars. I think it is important to keep working on that because, as a central piece of modern computer infrastructure, the Linux kernel has a giant target painted on it. If we give up, we have to accept that our computers are not doing what they were designed to do, which I can t accept. The safety of my grandparents shouldn t be any different from the safety of journalists, and political activists, and anyone else who might be the target of attacks. We need to be able to trust our devices otherwise why use them at all?
Vagrant: What has been your biggest success in recent years? Kees: I think with all these things I am not the only actor. Almost everything that we have been successful at has been because of a lot of people s work, and one of the big ones that has been coordinated across the ecosystem and across compilers was initializing stack variables to 0 by default. This feature was added in Clang, GCC, and MSVC across the board even though there were a lot of fears about forking the C language. The worry was that developers would come to depend on zero-initialized stack variables, but this hasn t been the case because we still warn about uninitialized variables when the compiler can figure that out. So you still still get the warnings at compile time but now you can count on the contents of your stack at run-time and we drop an entire class of uninitialized variable flaws. While the exploitation of this class has mostly been around memory content exposure, it has also been used for control flow attacks. So that was politically and technically a large challenge: convincing people it was necessary, showing its utility, and implementing it in a way that everyone would be happy with, resulting in the elimination of a large and persistent class of flaws in C.
Vagrant: In a world where things are generally Reproducible do you see ways in which that might affect your work? Kees: One of the questions I frequently get is, What version of the Linux kernel has feature $foo? If I know how things are built, I can answer with just a version number. In a Reproducible Builds scenario I can count on the compiler version, compiler flags, kernel configuration, etc. all those things are known, so I can actually answer definitively that a certain feature exists. So that is an area where Reproducible Builds affects me most directly. Indirectly, it is just being able to trust the binaries you are running are going to behave the same for the same build environment is critical for sane testing.
Vagrant: Have you used diffoscope? Kees: I have! One subset of tree-wide refactoring that we do when getting rid of ambiguous language usage in the kernel is when we have to make source level changes to satisfy some new compiler requirement but where the binary output is not expected to change at all. It is mostly about getting the compiler to understand what is happening, what is intended in the cases where the old ambiguity does actually match the new unambiguous description of what is intended. The binary shouldn t change. We have used diffoscope to compare the before and after binaries to confirm that yep, there is no change in binary .
Vagrant: You cannot just use checksums for that? Kees: For the most part, we need to only compare the text segments. We try to hold as much stable as we can, following the Reproducible Builds documentation for the kernel, but there are macros in the kernel that are sensitive to source line numbers and as a result those will change the layout of the data segment (and sometimes the text segment too). With diffoscope there s flexibility where I can exclude or include different comparisons. Sometimes I just go look at what diffoscope is doing and do that manually, because I can tweak that a little harder, but diffoscope is definitely the default. Diffoscope is awesome!
Vagrant: Where has reproducible builds affected you? Kees: One of the notable wins of reproducible builds lately was dealing with the fallout of the XZ backdoor and just being able to ask the question is my build environment running the expected code? and to be able to compare the output generated from one install that never had a vulnerable XZ and one that did have a vulnerable XZ and compare the results of what you get. That was important for kernel builds because the XZ threat actor was working to expand their influence and capabilities to include Linux kernel builds, but they didn t finish their work before they were noticed. I think what happened with Debian proving the build infrastructure was not affected is an important example of how people would have needed to verify the kernel builds too.
Vagrant: What do you want to see for the near or distant future in security work? Kees: For reproducible builds in the kernel, in the work that has been going on in the ClangBuiltLinux project, one of the driving forces of code and usability quality has been the continuous integration work. As soon as something breaks, on the kernel side, the Clang side, or something in between the two, we get a fast signal and can chase it and fix the bugs quickly. I would like to see someone with funding to maintain a reproducible kernel build CI. There have been places where there are certain architecture configurations or certain build configuration where we lose reproducibility and right now we have sort of a standard open source development feedback loop where those things get fixed but the time in between introduction and fix can be large. Getting a CI for reproducible kernels would give us the opportunity to shorten that time.
Vagrant: Well, thanks for that! Any last closing thoughts? Kees: I am a big fan of reproducible builds, thank you for all your work. The world is a safer place because of it.
Vagrant: Likewise for your work!


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

28 September 2024

Dave Hibberd: EuroBSDCon 2024 Report

This year I attended EuroBSDCon 2024 in Dublin. I always appreciate an excuse to head over to Ireland, and this seemed like a great chance to spend some time in Dublin and learn new things. Due to constraints on my time I didn t go to the 2 day devsummit that precedes the conference, only the main event itself.

The Event EuroBSDCon was attended by about 200-250 people, the hardcore of the BSD community! Attendees came from all over, I met Canadians, USAians, Germanians, Belgians and Irelandians amongst other nationalities! The event was at UCD Dublin, which is a gorgeous university campus about 10km south of Dublin proper in Stillorgan. The speaker hotel was a 20 minute walk (at my ~9min/km pace) from the hotel, or a quick bus journey. It was a pleasant walk, through the leafy campus and then along some pretty broad pavements, albeit beside a dual carriageway. The cycle infrastructure was pretty excellent too, but I sadly was unable to lease a city bike and make my way around on 2 wheels - Dublinbikes don t extend that far out the city. Lunch each day was Irish themed food - Saturday was beef stew (a Frenchman asked me what it was called - his only equivalent words were Beef Bourguignon ) and Sunday was Bangers & Mash! The kitchen struggled a bit - food was brought out in bowls in waves, and that ensured there was artificial scarcity that clearly left anxiety for some that they weren t going to be fed! Everyone I met was friendly from the day I arrived, and that set me very much at ease and made the event much more enjoyable - things are better shared with others. Big shout out to dch and Blake Willis for spending a lot of time talking to me over the weekend!

Talks I Attended

Keynote: Evidence based Policy formation in the EU Tom Smyth This talk given by Tom Smyth was an interesting look into his work with EU Policymakers in ensuring fair competition for his small, Irish ISP. It was an enlightening look into the workings of the EU and the various bodies that set, and manage policy. It truly is a complicated beast, but the feeling I left with was that there are people all through the organisation who are desperate to do the right thing for EU citizens at all costs. Sadly none of it is directly applicable to me living in the UK, but I still get to have a say on policy and vote in polls as an Irish citizen abroad.

10(ish) years of FreeBSD/arm64 Andrew Turner I have been a fan of ARM platforms for a long, long time. I had an early ARM Chromebook and have been equal measures excited and frustrated by the raspberry pi since first contact. I tend to find other ARM people at events and this was no exception! It was an interesting view into one person s dedication to making arm64 a platform for FreeBSD, starting out with no documentation or hardware to becoming a first-class platform. It s interesting to see the roadmap and things upcoming too and makes me hopeful for the future of arm64 in various OSes!

1-800-RC(8)-HELP: Dial Into FreeBSD Service Scripts Mastery! Mateusz Piotrowski rc scripts and startup applications scare me a bit. I m better at systemd units than sysvinit scripts, but that isn t really a transferable skill! This was a deep dive into lots of the functionality that FreeBSD s RC offers, and highlighted things that I only thought were limited to Linux s systemd. I am much more aware of what it s capable of now, but I m still scared to take it on! Afterwards I had a great chat in the hallway with Mateusz about our OS s different approaches to this problem and was impressed with the pragmatic view he had on startup, systemd, rc and the future!

Package management without borders. Using Ravenports on multiple BSDs Michael Reim Ports on the BSDs interest me, but I hadn t realised that outside of each major BSD s collection there were other, cross platform ports collections offered. Ravenports is one of these under developments, and it was good to understand the hows, why and what s happenings of the system. Plus, with my hibbian obsession on building other people s software as my own packages, it s interesting to see how others are doing it!

Building a Modern Packet Radio Network using Open Software me I spoke for 45 minutes to share my passion and frustration for amateur radio, packet radio, the law, the technology and what we re doing in the UK Packet network. This was a lot of fun - it felt like I had a busy room, lots of people interested in the stupid stuff I do with technology and I had lots of conversations after the fact about radio, telecoms, networking and at one point was cornered by what I describe as the Erlang Mafia to talk about how they could help!

Hacking - 30 years ago Walter Belgers This unrecorded talk looked at the history of the Dutch hacker scene, and a young group of hackers explorations of the early internet before modern security was a thing. It was exciting, enrapturing, well presented and a great story of a well spent youth in front of computers.

Social By 1730 I was pretty drained so I took myself back to the hotel missing the last talks, had some down time, and got the DART train to the social event at Brewdog. This invovled about an hour s walking and some train time and that was a nice time to reset my head and just watch the world. The train I was on had a particularly interesting feature where when the motors were not loaded (slowdown or coasting) the lights slowly flickered dim-bright-dim. I don t know if this is across the fleet or just this one, but it was fun to pontificate as I looked out the window at South Dublin passing by. The social was good - a few beer tokens (cider in my case, trying to avoid beer-driven hangovers still), some pleasant junk food and plenty of good company to talk to, lots of people wanting to talk about radio and packet to me! Brewdog struggled a bit - both in bar speed (a linear queue formed despite the staff preferring the crowd-around method of queue) and buffet food appeared in somewhat disjointed waves, meaning that people loitered around the food tables and cleared the plates of wings, sliders, fries, onion rings, mac & cheese as they appeared 4-5 plates at a time. Perhaps a few hundred hungry bodies was a bit too much for them to feed at once. They had shuffleboard that was played all night by various groups! I caught the last bus home, which was relatively painless!

Is our software sustainable? Kent Inge Fagerland Simonsen This was an interesting look into reducing the footprint of software to make it a net benefit. Lots of examples of how little changes can barrel up to big, gigawatthour changes when aggregated over the entire installbase of android or iOS!

A Packet s Journey Through the OpenBSD Network Stack Alexander Bluhm This was an analysis of what happens at each stage of networking in OpenBSD and was pretty interesting to see. Lots of it was out my depth, but it s cool to get an explanation and appreciation for various elements of how software handles each packet that arrives and the differences in the ipv4 and ipv6 stack!

FreeBSD at 30 Years: Its Secrets to Success Kirk McKusick This was a great statistical breakdown of FreeBSD since inception, including top committers, why certain parts of the system and community work so well and what has given it staying power compared to some projects on the internet that peter out after just a few years! Kirk s excitement and passion for the project really shone through, and I want to read his similarly titled article in the FreeBSD Journal now!

Building an open native FreeBSD CI system from scratch with lua, C, jails & zfs Dave Cottlehuber Dave spoke pretty excitedly about his work on a CI system using tools that FreeBSD ships with, and introduced me to the integration of C and Lua which I wasn t fully aware of before. Or I was, and my brain forgot it! With my interest in software build this year, it was quite a timely look at how others are thinking of doing things (I am doing similar stuff with zfs!). I look forward to playing with it when it finally is released to the Real World!

Building an Appliance Allan Jude This was an interesting look into the tools that FreeBSD provides which can be used to make immutable, appliance OSes without too much overhead. Fail safe upgrades and boots with ZFS, running approved code with secure boot, factory resetting and more were discussed! I have had thoughts around this in the recent past, so it was good to have some ideas validated, some challenged and gave me food for thought.

Experience as a speaker I really enjoyed being a speaker at the event! I ve spoken at other things before, but this really was a cut above. The event having money to provide me a hotel was a really welcome surprise, and also receiving a gorgeous scarf as a speaker gift was a great surprise (and it has already been worn with the change of temperature here in Scotland this week!). I would definitely consider returning, either as an attendee or as a speaker. The community of attendees were pragmatic, interesting, engaging and welcoming, the organising committee were spot-on in their work making it happen and the whole event, while turning my brain to mush with all the information, was really enjoyable and I left energised and excited by things instead of ground down and tired.

26 September 2024

Vasudev Kamath: Disabling Lockdown Mode with Secure Boot on Distro Kernel

In my previous post, I mentioned that Lockdown mode is activated when Secure Boot is enabled. One way to override this was to use a self-compiled upstream kernel. However, sometimes we might want to use the distribution kernel itself. This post explains how to disable lockdown mode while keeping Secure Boot enabled with a distribution kernel.
Understanding Secure Boot Detection To begin, we need to understand how the kernel detects if Secure Boot is enabled. This is done by the efi_get_secureboot function, as shown in the image below: Secure Boot status check
Disabling Kernel Lockdown The kernel code uses the value of MokSBStateRT to identify the Secure Boot state, assuming that Secure Boot can only be enabled via shim. This assumption holds true when using the Microsoft certificate for signature validation (as Microsoft currently only signs shim). However, if we're using our own keys, we don't need shim and can sign the bootloader ourselves. In this case, the Secure Boot state of the system doesn't need to be tied to the MokSBStateRT variable. To disable kernel lockdown, we need to set the UEFI runtime variable MokSBStateRT. This essentially tricks the kernel into thinking Secure Boot is disabled when it's actually enabled. This is achieved using a UEFI initializing driver. The code for this was written by an anonymous colleague who also assisted me with various configuration guidance for setting up UKI and Secure Boot on my system. The code is available here.
Implementation Detailed instructions for compiling and deploying the code are provided in the repository, so I won't repeat them here.
Results I've tested this method with the default distribution kernel on my Debian unstable system, and it successfully disables lockdown while maintaining Secure Boot integrity. See the screenshot below for confirmation: Distribution kernel lockdown disabled

Melissa Wen: Reflections on 2024 Linux Display Next Hackfest

Hey everyone! The 2024 Linux Display Next hackfest concluded in May, and its outcomes continue to shape the Linux Display stack. Igalia hosted this year s event in A Coru a, Spain, bringing together leading experts in the field. Samuel Iglesias and I organized this year s edition and this blog post summarizes the experience and its fruits. One of the highlights of this year s hackfest was the wide range of backgrounds represented by our 40 participants (both on-site and remotely). Developers and experts from various companies and open-source projects came together to advance the Linux Display ecosystem. You can find the list of participants here. The event covered a broad spectrum of topics affecting the development of Linux projects, user experiences, and the future of display technologies on Linux. From cutting-edge topics to long-term discussions, you can check the event agenda here.

Organization Highlights The hackfest was marked by in-depth discussions and knowledge sharing among Linux contributors, making everyone inspired, informed, and connected to the community. Building on feedback from the previous year, we refined the unconference format to enhance participant preparation and engagement. Structured Agenda and Timeboxes: Each session had a defined scope, time limit (1h20 or 2h10), and began with an introductory talk on the topic.
  • Participant-Led Discussions: We pre-selected in-person participants to lead discussions, allowing them to prepare introductions, resources, and scope.
  • Transparent Scheduling: The schedule was shared in advance as GitHub issues, encouraging participants to review and prepare for sessions of interest.
Engaging Sessions: The hackfest featured a variety of topics, including presentations and discussions on how participants were addressing specific subjects within their companies.
  • No Breakout Rooms, No Overlaps: All participants chose to attend all sessions, eliminating the need for separate breakout rooms. We also adapted run-time schedule to keep everybody involved in the same topics.
  • Real-time Updates: We provided notifications and updates through dedicated emails and the event matrix room.
Strengthening Community Connections: The hackfest offered ample opportunities for networking among attendees.
  • Social Events: Igalia sponsored coffee breaks, lunches, and a dinner at a local restaurant.
  • Museum Visit: Participants enjoyed a sponsored visit to the Museum of Estrela Galicia Beer (MEGA).

Fruitful Discussions and Follow-up The structured agenda and breaks allowed us to cover multiple topics during the hackfest. These discussions have led to new display feature development and improvements, as evidenced by patches, merge requests, and implementations in project repositories and mailing lists. With the KMS color management API taking shape, we discussed refinements and best approaches to cover the variety of color pipeline from different hardware-vendors. We are also investigating techniques for a performant SDR<->HDR content reproduction and reducing latency and power consumption when using the color blocks of the hardware.

Color Management/HDR Color Management and HDR continued to be the hottest topic of the hackfest. We had three sessions dedicated to discuss Color and HDR across Linux Display stack layers.

Color/HDR (Kernel-Level) Harry Wentland (AMD) led this session. Here, kernel Developers shared the Color Management pipeline of AMD, Intel and NVidia. We counted with diagrams and explanations from HW-vendors developers that discussed differences, constraints and paths to fit them into the KMS generic color management properties such as advertising modeset needs, IN\_FORMAT, segmented LUTs, interpolation types, etc. Developers from Qualcomm and ARM also added information regarding their hardware. Upstream work related to this session:

Color/HDR (Compositor-Level) Sebastian Wick (RedHat) led this session. It started with Sebastian s presentation covering Wayland color protocols and compositor implementation. Also, an explanation of APIs provided by Wayland and how they can be used to achieve better color management for applications and discussions around ICC profiles and color representation metadata. There was also an intensive Q&A about LittleCMS with Marti Maria. Upstream work related to this session:

Color/HDR (Use Cases and Testing) Christopher Cameron (Google) and Melissa Wen (Igalia) led this session. In contrast to the other sessions, here we focused less on implementation and more on brainstorming and reflections of real-world SDR and HDR transformations (use and validation) and gainmaps. Christopher gave a nice presentation explaining HDR gainmap images and how we should think of HDR. This presentation and Q&A were important to put participants at the same page of how to transition between SDR and HDR and somehow emulating HDR. We also discussed on the usage of a kernel background color property. Finally, we discussed a bit about Chamelium and the future of VKMS (future work and maintainership).

Power Savings vs Color/Latency Mario Limonciello (AMD) led this session. Mario gave an introductory presentation about AMD ABM (adaptive backlight management) that is similar to Intel DPST. After some discussions, we agreed on exposing a kernel property for power saving policy. This work was already merged on kernel and the userspace support is under development. Upstream work related to this session:

Strategy for video and gaming use-cases Leo Li (AMD) led this session. Miguel Casas (Google) started this session with a presentation of Overlays in Chrome/OS Video, explaining the main goal of power saving by switching off GPU for accelerated compositing and the challenges of different colorspace/HDR for video on Linux. Then Leo Li presented different strategies for video and gaming and we discussed the userspace need of more detailed feedback mechanisms to understand failures when offloading. Also, creating a debugFS interface came up as a tool for debugging and analysis.

Real-time scheduling and async KMS API Xaver Hugl (KDE/BlueSystems) led this session. Compositor developers have exposed some issues with doing real-time scheduling and async page flips. One is that the Kernel limits the lifetime of realtime threads and if a modeset takes too long, the thread will be killed and thus the compositor as well. Also, simple page flips take longer than expected and drivers should optimize them. Another issue is the lack of feedback to compositors about hardware programming time and commit deadlines (the lastest possible time to commit). This is difficult to predict from drivers, since it varies greatly with the type of properties. For example, color management updates take much longer. In this regard, we discusssed implementing a hw_done callback to timestamp when the hardware programming of the last atomic commit is complete. Also an API to pre-program color pipeline in a kind of A/B scheme. It may not be supported by all drivers, but might be useful in different ways.

VRR/Frame Limit, Display Mux, Display Control, and more and beer We also had sessions to discuss a new KMS API to mitigate headaches on VRR and Frame Limit as different brightness level at different refresh rates, abrupt changes of refresh rates, low frame rate compensation (LFC) and precise timing in VRR more. On Display Control we discussed features missing in the current KMS interface for HDR mode, atomic backlight settings, source-based tone mapping, etc. We also discussed the need of a place where compositor developers can post TODOs to be developed by KMS people. The Content-adaptive Scaling and Sharpening session focused on sharpening and scaling filters. In the Display Mux session, we discussed proposals to expose the capability of dynamic mux switching display signal between discrete and integrated GPUs. In the last session of the 2024 Display Next Hackfest, participants representing different compositors summarized current and future work and built a Linux Display wish list , which includes: improvements to VTTY and HDR switching, better dmabuf API for multi-GPU support, definition of tone mapping, blending and scaling sematics, and wayland protocols for advertising to clients which colorspaces are supported. We closed this session with a status update on feature development by compositors, including but not limited to: plane offloading (from libcamera to output) / HDR video offloading (dma-heaps) / plane-based scrolling for web pages, color management / HDR / ICC profiles support, addressing issues such as flickering when color primaries don t match, etc. After three days of intensive discussions, all in-person participants went to a guided tour at the Museum of Extrela Galicia beer (MEGA), pouring and tasting the most famous local beer.

Feedback and Future Directions Participants provided valuable feedback on the hackfest, including suggestions for future improvements.
  • Schedule and Break-time Setup: Having a pre-defined agenda and schedule provided a better balance between long discussions and mental refreshments, preventing the fatigue caused by endless discussions.
  • Action Points: Some participants recommended explicitly asking for action points at the end of each session and assigning people to follow-up tasks.
  • Remote Participation: Remote attendees appreciated the inclusive setup and opportunities to actively participate in discussions.
  • Technical Challenges: There were bandwidth and video streaming issues during some sessions due to the large number of participants.

Thank you for joining the 2024 Display Next Hackfest We can t help but thank the 40 participants, who engaged in-person or virtually on relevant discussions, for a collaborative evolution of the Linux display stack and for building an insightful agenda. A big thank you to the leaders and presenters of the nine sessions: Christopher Cameron (Google), Harry Wentland (AMD), Leo Li (AMD), Mario Limoncello (AMD), Sebastian Wick (RedHat) and Xaver Hugl (KDE/BlueSystems) for the effort in preparing the sessions, explaining the topic and guiding discussions. My acknowledge to the others in-person participants that made such an effort to travel to A Coru a: Alex Goins (NVIDIA), David Turner (Raspberry Pi), Georges Stavracas (Igalia), Joan Torres (SUSE), Liviu Dudau (Arm), Louis Chauvet (Bootlin), Robert Mader (Collabora), Tian Mengge (GravityXR), Victor Jaquez (Igalia) and Victoria Brekenfeld (System76). It was and awesome opportunity to meet you and chat face-to-face. Finally, thanks virtual participants who couldn t make it in person but organized their days to actively participate in each discussion, adding different perspectives and valuable inputs even remotely: Abhinav Kumar (Qualcomm), Chaitanya Borah (Intel), Christopher Braga (Qualcomm), Dor Askayo (Red Hat), Jiri Koten (RedHat), Jonas dahl (Red Hat), Leandro Ribeiro (Collabora), Marti Maria (Little CMS), Marijn Suijten, Mario Kleiner, Martin Stransky (Red Hat), Michel D nzer (Red Hat), Miguel Casas-Sanchez (Google), Mitulkumar Golani (Intel), Naveen Kumar (Intel), Niels De Graef (Red Hat), Pekka Paalanen (Collabora), Pichika Uday Kiran (AMD), Shashank Sharma (AMD), Sriharsha PV (AMD), Simon Ser, Uma Shankar (Intel) and Vikas Korjani (AMD). We look forward to another successful Display Next hackfest, continuing to drive innovation and improvement in the Linux display ecosystem!

25 September 2024

Melissa Wen: Reflections on 2024 Linux Display Next Hackfest

Hey everyone! The 2024 Linux Display Next hackfest concluded in May, and its outcomes continue to shape the Linux Display stack. Igalia hosted this year s event in A Coru a, Spain, bringing together leading experts in the field. Samuel Iglesias and I organized this year s edition and this blog post summarizes the experience and its fruits. One of the highlights of this year s hackfest was the wide range of backgrounds represented by our 40 participants (both on-site and remotely). Developers and experts from various companies and open-source projects came together to advance the Linux Display ecosystem. You can find the list of participants here. The event covered a broad spectrum of topics affecting the development of Linux projects, user experiences, and the future of display technologies on Linux. From cutting-edge topics to long-term discussions, you can check the event agenda here.

Organization Highlights The hackfest was marked by in-depth discussions and knowledge sharing among Linux contributors, making everyone inspired, informed, and connected to the community. Building on feedback from the previous year, we refined the unconference format to enhance participant preparation and engagement. Structured Agenda and Timeboxes: Each session had a defined scope, time limit (1h20 or 2h10), and began with an introductory talk on the topic.
  • Participant-Led Discussions: We pre-selected in-person participants to lead discussions, allowing them to prepare introductions, resources, and scope.
  • Transparent Scheduling: The schedule was shared in advance as GitHub issues, encouraging participants to review and prepare for sessions of interest.
Engaging Sessions: The hackfest featured a variety of topics, including presentations and discussions on how participants were addressing specific subjects within their companies.
  • No Breakout Rooms, No Overlaps: All participants chose to attend all sessions, eliminating the need for separate breakout rooms. We also adapted run-time schedule to keep everybody involved in the same topics.
  • Real-time Updates: We provided notifications and updates through dedicated emails and the event matrix room.
Strengthening Community Connections: The hackfest offered ample opportunities for networking among attendees.
  • Social Events: Igalia sponsored coffee breaks, lunches, and a dinner at a local restaurant.
  • Museum Visit: Participants enjoyed a sponsored visit to the Museum of Estrela Galicia Beer (MEGA).

Fruitful Discussions and Follow-up The structured agenda and breaks allowed us to cover multiple topics during the hackfest. These discussions have led to new display feature development and improvements, as evidenced by patches, merge requests, and implementations in project repositories and mailing lists. With the KMS color management API taking shape, we discussed refinements and best approaches to cover the variety of color pipeline from different hardware-vendors. We are also investigating techniques for a performant SDR<->HDR content reproduction and reducing latency and power consumption when using the color blocks of the hardware.

Color Management/HDR Color Management and HDR continued to be the hottest topic of the hackfest. We had three sessions dedicated to discuss Color and HDR across Linux Display stack layers.

Color/HDR (Kernel-Level) Harry Wentland (AMD) led this session. Here, kernel Developers shared the Color Management pipeline of AMD, Intel and NVidia. We counted with diagrams and explanations from HW-vendors developers that discussed differences, constraints and paths to fit them into the KMS generic color management properties such as advertising modeset needs, IN\_FORMAT, segmented LUTs, interpolation types, etc. Developers from Qualcomm and ARM also added information regarding their hardware. Upstream work related to this session:

Color/HDR (Compositor-Level) Sebastian Wick (RedHat) led this session. It started with Sebastian s presentation covering Wayland color protocols and compositor implementation. Also, an explanation of APIs provided by Wayland and how they can be used to achieve better color management for applications and discussions around ICC profiles and color representation metadata. There was also an intensive Q&A about LittleCMS with Marti Maria. Upstream work related to this session:

Color/HDR (Use Cases and Testing) Christopher Cameron (Google) and Melissa Wen (Igalia) led this session. In contrast to the other sessions, here we focused less on implementation and more on brainstorming and reflections of real-world SDR and HDR transformations (use and validation) and gainmaps. Christopher gave a nice presentation explaining HDR gainmap images and how we should think of HDR. This presentation and Q&A were important to put participants at the same page of how to transition between SDR and HDR and somehow emulating HDR. We also discussed on the usage of a kernel background color property. Finally, we discussed a bit about Chamelium and the future of VKMS (future work and maintainership).

Power Savings vs Color/Latency Mario Limonciello (AMD) led this session. Mario gave an introductory presentation about AMD ABM (adaptive backlight management) that is similar to Intel DPST. After some discussions, we agreed on exposing a kernel property for power saving policy. This work was already merged on kernel and the userspace support is under development. Upstream work related to this session:

Strategy for video and gaming use-cases Leo Li (AMD) led this session. Miguel Casas (Google) started this session with a presentation of Overlays in Chrome/OS Video, explaining the main goal of power saving by switching off GPU for accelerated compositing and the challenges of different colorspace/HDR for video on Linux. Then Leo Li presented different strategies for video and gaming and we discussed the userspace need of more detailed feedback mechanisms to understand failures when offloading. Also, creating a debugFS interface came up as a tool for debugging and analysis.

Real-time scheduling and async KMS API Xaver Hugl (KDE/BlueSystems) led this session. Compositor developers have exposed some issues with doing real-time scheduling and async page flips. One is that the Kernel limits the lifetime of realtime threads and if a modeset takes too long, the thread will be killed and thus the compositor as well. Also, simple page flips take longer than expected and drivers should optimize them. Another issue is the lack of feedback to compositors about hardware programming time and commit deadlines (the lastest possible time to commit). This is difficult to predict from drivers, since it varies greatly with the type of properties. For example, color management updates take much longer. In this regard, we discusssed implementing a hw_done callback to timestamp when the hardware programming of the last atomic commit is complete. Also an API to pre-program color pipeline in a kind of A/B scheme. It may not be supported by all drivers, but might be useful in different ways.

VRR/Frame Limit, Display Mux, Display Control, and more and beer We also had sessions to discuss a new KMS API to mitigate headaches on VRR and Frame Limit as different brightness level at different refresh rates, abrupt changes of refresh rates, low frame rate compensation (LFC) and precise timing in VRR more. On Display Control we discussed features missing in the current KMS interface for HDR mode, atomic backlight settings, source-based tone mapping, etc. We also discussed the need of a place where compositor developers can post TODOs to be developed by KMS people. The Content-adaptive Scaling and Sharpening session focused on sharpening and scaling filters. In the Display Mux session, we discussed proposals to expose the capability of dynamic mux switching display signal between discrete and integrated GPUs. In the last session of the 2024 Display Next Hackfest, participants representing different compositors summarized current and future work and built a Linux Display wish list , which includes: improvements to VTTY and HDR switching, better dmabuf API for multi-GPU support, definition of tone mapping, blending and scaling sematics, and wayland protocols for advertising to clients which colorspaces are supported. We closed this session with a status update on feature development by compositors, including but not limited to: plane offloading (from libcamera to output) / HDR video offloading (dma-heaps) / plane-based scrolling for web pages, color management / HDR / ICC profiles support, addressing issues such as flickering when color primaries don t match, etc. After three days of intensive discussions, all in-person participants went to a guided tour at the Museum of Extrela Galicia beer (MEGA), pouring and tasting the most famous local beer.

Feedback and Future Directions Participants provided valuable feedback on the hackfest, including suggestions for future improvements.
  • Schedule and Break-time Setup: Having a pre-defined agenda and schedule provided a better balance between long discussions and mental refreshments, preventing the fatigue caused by endless discussions.
  • Action Points: Some participants recommended explicitly asking for action points at the end of each session and assigning people to follow-up tasks.
  • Remote Participation: Remote attendees appreciated the inclusive setup and opportunities to actively participate in discussions.
  • Technical Challenges: There were bandwidth and video streaming issues during some sessions due to the large number of participants.

Thank you for joining the 2024 Display Next Hackfest We can t help but thank the 40 participants, who engaged in-person or virtually on relevant discussions, for a collaborative evolution of the Linux display stack and for building an insightful agenda. A big thank you to the leaders and presenters of the nine sessions: Christopher Cameron (Google), Harry Wentland (AMD), Leo Li (AMD), Mario Limoncello (AMD), Sebastian Wick (RedHat) and Xaver Hugl (KDE/BlueSystems) for the effort in preparing the sessions, explaining the topic and guiding discussions. My acknowledge to the others in-person participants that made such an effort to travel to A Coru a: Alex Goins (NVIDIA), David Turner (Raspberry Pi), Georges Stavracas (Igalia), Joan Torres (SUSE), Liviu Dudau (Arm), Louis Chauvet (Bootlin), Robert Mader (Collabora), Tian Mengge (GravityXR), Victor Jaquez (Igalia) and Victoria Brekenfeld (System76). It was and awesome opportunity to meet you and chat face-to-face. Finally, thanks virtual participants who couldn t make it in person but organized their days to actively participate in each discussion, adding different perspectives and valuable inputs even remotely: Abhinav Kumar (Qualcomm), Chaitanya Borah (Intel), Christopher Braga (Qualcomm), Dor Askayo, Jiri Koten (RedHat), Jonas dahl (Red Hat), Leandro Ribeiro (Collabora), Marti Maria (Little CMS), Marijn Suijten, Mario Kleiner, Martin Stransky (Red Hat), Michel D nzer (Red Hat), Miguel Casas-Sanchez (Google), Mitulkumar Golani (Intel), Naveen Kumar (Intel), Niels De Graef (Red Hat), Pekka Paalanen (Collabora), Pichika Uday Kiran (AMD), Shashank Sharma (AMD), Sriharsha PV (AMD), Simon Ser, Uma Shankar (Intel) and Vikas Korjani (AMD). We look forward to another successful Display Next hackfest, continuing to drive innovation and improvement in the Linux display ecosystem!

24 September 2024

Vasudev Kamath: Note to Self: Enabling Secure Boot with UKI on Debian

Note

This post is a continuation of my previous article on enabling the Unified Kernel Image (UKI) on Debian.

In this guide, we'll implement Secure Boot by taking full control of the device, removing preinstalled keys, and installing our own. For a comprehensive overview of the benefits and process, refer to this excellent post from rodsbooks.
Key Components To implement Secure Boot, we need three essential keys:
  1. Platform Key (PK): The top-level key in Secure Boot, typically provided by the motherboard manufacturer. We'll replace the vendor-supplied PK with our own for complete control.
  2. Key Exchange Key (KEK): Used to sign updates for the Signatures Database and Forbidden Signatures Database.
  3. Database Key (DB): Used to sign or verify binaries (bootloaders, boot managers, shells, drivers, etc.).
There's also a Forbidden Signature Key (dbx), which is the opposite of the DB key. We won't be generating this key in this guide.
Preparing for Key Enrollment Before enrolling our keys, we need to put the device in Secure Boot Setup Mode. Verify the status using the bootctl status command. You should see output similar to the following image: UEFI Setup mode
Generating Keys Follow these instructions from the Arch Wiki to generate the keys manually. You'll need the efitools and openssl packages. I recommend using rsa:2048 as the key size for better compatibility with older firmware. After generating the keys, copy all .auth files to the /efi/loader/keys/<hostname>/ folder. For example:
  sudo ls /efi/loader/keys/chamunda
db.auth  KEK.auth  PK.auth
Signing the Bootloader Sign the systemd-boot bootloader with your new keys:
sbsign --key <path-to db.key> --cert <path-to db.crt> \
   /usr/lib/systemd/boot/efi/systemd-bootx64.efi
Install the signed bootloader using bootctl install. The output should resemble this: bootctl install

Note

If you encounter warnings about mount options, update your fstab with the umask=0077 option for the EFI partition.

Verify the signature using sbsign --verify: sbsign verify
Configuring UKI for Secure Boot Update the /etc/kernel/uki.conf file with your key paths:
SecureBootPrivateKey=/path/to/db.key
SecureBootCertificate=/path/to/db.crt
Signing the UKI Image On Debian, use dpkg-reconfigure to sign the UKI image for each kernel:
sudo dpkg-reconfigure linux-image-$(uname -r)
# Repeat for other kernel versions if necessary
You should see output similar to this:
sudo dpkg-reconfigure linux-image-$(uname -r)
/etc/kernel/postinst.d/dracut:
dracut: Generating /boot/initrd.img-6.10.9-amd64
Updating kernel version 6.10.9-amd64 in systemd-boot...
Signing unsigned original image
Using config file: /etc/kernel/uki.conf
+ sbverify --list /boot/vmlinuz-6.10.9-amd64
+ sbsign --key /home/vasudeva.sk/Documents/personal/secureboot/db.key --cert /home/vasudeva.sk/Documents/personal/secureboot/db.crt /tmp/ukicc7vcxhy --output /tmp/kernel-install.staging.QLeGLn/uki.efi
Wrote signed /tmp/kernel-install.staging.QLeGLn/uki.efi
/etc/kernel/postinst.d/zz-systemd-boot:
Installing kernel version 6.10.9-amd64 in systemd-boot...
Signing unsigned original image
Using config file: /etc/kernel/uki.conf
+ sbverify --list /boot/vmlinuz-6.10.9-amd64
+ sbsign --key /home/vasudeva.sk/Documents/personal/secureboot/db.key --cert /home/vasudeva.sk/Documents/personal/secureboot/db.crt /tmp/ukit7r1hzep --output /tmp/kernel-install.staging.dWVt5s/uki.efi
Wrote signed /tmp/kernel-install.staging.dWVt5s/uki.efi
Enrolling Keys in Firmware Use systemd-boot to enroll your keys:
systemctl reboot --boot-loader-menu=0
Select the enroll option with your hostname in the systemd-boot menu. After key enrollment, the system will reboot into the newly signed kernel. Verify with bootctl: uefi enabled
Dealing with Lockdown Mode Secure Boot enables lockdown mode on distro-shipped kernels, which restricts certain features like kprobes/BPF and DKMS drivers. To avoid this, consider compiling the upstream kernel directly, which doesn't enable lockdown mode by default. As Linus Torvalds has stated, "there is no reason to tie Secure Boot to lockdown LSM." You can read more about Torvalds' opinion on UEFI tied with lockdown.
Next Steps One thing that remains is automating the signing of systemd-boot on upgrade, which is currently a manual process. I'm exploring dpkg triggers for achieving this, and if I succeed, I will write a new post with details.
Acknowledgments Special thanks to my anonymous colleague who provided invaluable assistance throughout this process.

21 September 2024

Gunnar Wolf: 50 years of queries

This post is a review for Computing Reviews for 50 years of queries , a article published in Communications of the ACM
The relational model is probably the one innovation that brought computers to the mainstream for business users. This article by Donald Chamberlin, creator of one of the first query languages (that evolved into the ubiquitous SQL), presents its history as a commemoration of the 50th anniversary of his publication of said query language. The article begins by giving background on information processing before the advent of today s database management systems: with systems storing and processing information based on sequential-only magnetic tapes in the 1950s, adopting a record-based, fixed-format filing system was far from natural. The late 1960s and early 1970s saw many fundamental advances, among which one of the best known is E. F. Codd s relational model. The first five pages (out of 12) present the evolution of the data management community up to the 1974 SIGFIDET conference. This conference was so important in the eyes of the author that, in his words, it is the event that starts the clock on 50 years of relational databases. The second part of the article tells about the growth of the structured English query language (SEQUEL) eventually renamed SQL including the importance of its standardization and its presence in commercial products as the dominant database language since the late 1970s. Chamberlin presents short histories of the various implementations, many of which remain dominant names today, that is, Oracle, Informix, and DB2. Entering the 1990s, open-source communities introduced MySQL, PostgreSQL, and SQLite. The final part of the article presents controversies and criticisms related to SQL and the relational database model as a whole. Chamberlin presents the main points of controversy throughout the years: 1) the SQL language lacks orthogonality; 2) SQL tables, unlike formal relations, might contain null values; and 3) SQL tables, unlike formal relations, may contain duplicate rows. He explains the issues and tradeoffs that guided the language design as it unfolded. Finally, a section presents several points that explain how SQL and the relational model have remained, for 50 years, a winning concept, as well as some thoughts regarding the NoSQL movement that gained traction in the 2010s. This article is written with clear language and structure, making it easy and pleasant to read. It does not drive a technical point, but instead is a recap on half a century of developments in one of the fields most important to the commercial development of computing, written by one of the greatest authorities on the topic.

20 September 2024

Sahil Dhiman: Educational and Research Institutions With Own ASN in India

Another one of the ASN list. This turned out longer than I expected (which is good). If you want to briefly understand what is an ASN, my Personal ASNs From India post carries an introduction to it. Now, here re the Educational and Research Institutions with their own ASN in India, which I could find: Special Mentions Some observations: Let me know if I m missing someone.

17 September 2024

Dirk Eddelbuettel: nanotime 0.3.10 on CRAN: Update

A minor update 0.3.10 for our nanotime package is now on CRAN. nanotime relies on the RcppCCTZ package (as well as the RcppDate package for additional C++ operations) and offers efficient high(er) resolution time parsing and formatting up to nanosecond resolution, using the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. This release updates one S4 methods to very recent changes in r-devel for which CRAN had reached out. This concerns the setdiff() method when applied to two nanotime objects. As it only affected R 4.5.0, due next April, if rebuilt in the last two or so weeks it will not have been visible to that many users, if any. In any event, it now works again for that setup too, and should be going forward. We also retired one demo function from the very early days, apparently it relied on ggplot2 features that have since moved on. If someone would like to help out and resurrect the demo, please get in touch. We also cleaned out some no longer used tests, and updated DESCRIPTION to what is required now. The NEWS snippet below has the full details.

Changes in version 0.3.10 (2024-09-16)
  • Retire several checks for Solaris in test suite (Dirk in #130)
  • Switch to Authors@R in DESCRIPTION as now required by CRAN
  • Accommodate R-devel change for setdiff (Dirk in #133 fixing #132)
  • No longer ship defunction ggplot2 demo (Dirk fixing #131)

Thanks to my CRANberries, there is a diffstat report for this release. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository and all documentation is provided at the nanotime documentation site. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

14 September 2024

Evgeni Golov: Fixing the volume control in an Alesis M1Active 330 USB Speaker System

I've a set of Alesis M1Active 330 USB on my desk to listen to music. They were relatively inexpensive (~100 ), have USB and sound pretty good for their size/price. They were also sitting on my desk unused for a while, because the left speaker didn't produce any sound. Well, almost any. If you'd move the volume knob long enough you might have found a position where the left speaker would work a bit, but it'd be quieter than the right one and stop working again after some time. Pretty unacceptable when you want to listen to music. Given the right speaker was working just fine and the left would work a bit when the volume knob is moved, I was quite certain which part was to blame: the potentiometer. So just open the right speaker (it contains all the logic boards, power supply, etc), take out the broken potentiometer, buy a new one, replace, done. Sounds easy? Well, to open the speaker you gotta loosen 8 (!) screws on the back. At least it's not glued, right? Once the screws are removed you can pull out the back plate, which will bring the power supply, USB controller, sound amplifier and cables, lots of cables: two pairs of thick cables, one to each driver, one thin pair for the power switch and two sets of "WTF is this, I am not going to trace pinouts today", one with a 6 pin plug, one with a 5 pin one. Unplug all of these! Yes, they are plugged, nice. Nope, still no friggin' idea how to get to the potentiometer. If you trace the "thin pair" and "WTF1" cables, you see they go inside a small wooden box structure. So we have to pull the thing from the front? Okay, let's remove the plastic part of the knob Right, this looks like a potentiometer. Unscrew it. No, no need for a Makita wrench, I just didn't have anything else in the right size (10mm). right Alesis M1Active 330 USB speaker with a Makita wrench where the volume knob is Still, no movement. Let's look again from the inside! Oh ffs, there are six more screws inside, holding the front. Away with them! Just need a very long PH1 screwdriver. Now you can slowly remove the part of the front where the potentiometer is. Be careful, the top tweeter is mounted to the front, not the main case and so is the headphone jack, without an obvious way to detach it. But you can move away the front far enough to remove the small PCB with the potentiometer and the LED. right Alesis M1Active 330 USB speaker open Great, this was the easy part! The only thing printed on the potentiometer is "A10K". 10K is easy -- 10kOhm. A?! Wikipedia says "A" means "logarithmic", but only if made in the US or Asia. In Europe that'd be "linear". "B" in US/Asia means "linear", in Europe "logarithmic". Do I need to tap the sign again? (The sign is a print of XKCD#927.) My multimeter says in this case it's something like logarithmic. On the right channel anyway, the left one is more like a chopping board. And what's this green box at the end? Oh right, this thing also turns the power on and off. So it's a power switch. Where the fuck do I get a logarithmic 10kOhm stereo potentiometer with a power switch? And then in the exact right size too?! Of course not at any of the big German electronics pharmacies. But AliExpress saves the day, again. It's even the same color! Soldering without pulling out the cable out of the case was a bit challenging, but I've managed it and now have stereo sound again. Yay! PS: Don't operate this thing open to try it out. 230V are dangerous!

12 September 2024

Dirk Eddelbuettel: RcppArmadillo 14.0.2-1 on CRAN: Updates

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1164 other packages on CRAN, downloaded 36.1 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 595 times according to Google Scholar. Conrad released two small incremental releases to version 14.0.0. We did not immediately bring these to CRAN as we have to be mindful of the desired upload cadence of once every one or two months . But as 14.0.2 has been stable for a few weeks, we now decided to bring it to CRAN. Changes since the last CRAN release are summarised below, and overall fairly minimal. On the package side, we reorder what citation() returns, and now follow CRAN requirements via Authors@R.

Changes in RcppArmadillo version 14.0.2-1 (2024-09-11)
  • Upgraded to Armadillo release 14.0.2 (Stochastic Parrot)
    • Optionally use C++20 memory alignment
    • Minor corrections for several corner-cases
  • The order of items displayed by citation() is reversed (Conrad in #449)
  • The DESCRIPTION file now uses an Authors@R field with ORCID IDs

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

10 September 2024

Freexian Collaborators: Debian Contributions: Python 3 patches, OpenSSH GSS-API split, rebootstrap, salsa CI, etc. (by Anupa Ann Joseph)

Debian Contributions: 2024-08 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Debian Python 3 patch review, by Stefano Rivera Last month, at DebConf, Stefano reviewed the current patch set of Debian s cPython packages with Matthias Klose, the primary maintainer until now. As a result of that review, Stefano re-reviewed the patchset, updating descriptions, etc. A few patches were able to be dropped, and a few others were forwarded upstream. One finds all sorts of skeletons doing reviews like this. One of the patches had been inactive (fortunately, because it was buggy) since the day it was applied, 13 years ago. One is a cleanup that probably only fixes a bug on HPUX, and is a result of copying code from xfree86 into Python 25 years ago. It was fixed in xfree86 a year later. Others support just Debian-specific functionality and probably never seemed worth forwarding. Or good cleanup that only really applies to Debian. A trivial new patch would allow Debian to multiarch co-install Python stable ABI dynamic extensions (like we can with regular dynamic extensions). Performance concerns are stalling it in review, at the moment.

DebConf 24 Organization, by Stefano Rivera Stefano helped organize DebConf 24, which concluded in early August. The event is run by a large entirely volunteer team. The work involved in making this happen is far too varied to describe here. While Freexian provides funding for 20% of collaborator time to spend on Debian-related work, it only covers a small fraction of contributions to time-intensive tasks like this. Since the end of the event, Stefano has been doing some work on the conference finances, and initiated the reimbursement process for travel bursaries.

Archive rebuilds on Debusine, by Stefano Rivera The recent setuptools 73 upload to Debian unstable removed the test subcommand, breaking many packages that were using python3 setup.py test in their Debian packaging. Stefano did a partial archive-rebuild using debusine.debian.net to find the regressions and file bugs. Debusine will be a powerful tool to do QA work like this for Debian in the future, but it doesn t have all the features needed to coordinate rebuild-testing, yet. They are planned to be fleshed out in the next year. In the meantime, Debusine has the building blocks to work through a queue of package building tasks and store the results, it just needs to be driven from outside the system. So, Stefano started working on a set of tools using the Debusine client API to perform archive rebuilds, found and tagged existing bugs, and filed many more.

OpenSSH GSS-API split, by Colin Watson Colin landed the first stage of the planned split of GSS-API authentication and key exchange support in Debian s OpenSSH packaging. In order to allow for smooth upgrades, the second stage will have to wait until after the Debian 13 (trixie) release; but once that s done, as upstream puts it, this substantially reduces the amount of pre-authentication attack surface exposed on your users sshd by default .

OpenSSL vs. cryptography, by Colin Watson Colin facilitated a discussion between Debian s OpenSSL team and the upstream maintainers of Python cryptography about a new incompatibility between Debian s OpenSSL packaging and cryptography s handling of OpenSSL s legacy provider, which was causing a number of build and test failures. While the issue remains open, the Debian OpenSSL maintainers have effectively reverted the change now, so it s no longer a pressing problem.

/usr-move, by Helmut Grohne There are less than 40 source packages left to move files to /usr, so what we re left with is the long tail of the transition. Rather than fix all of them, Helmut started a discussion on removing packages from unstable and filed a first batch. As libvirt is being restructured in experimental, we re handling the fallout in collaboration with its maintainer Andrea Bolognani. Since base-files validates the aliasing symlinks before upgrading, it was discovered that systemd has its own ideas with no solution as of yet. Helmut also proposed that dash checks for ineffective diversions of /bin/sh and that lintian warns about aliased files.

rebootstrap by Helmut Grohne Bootstrapping Debian for a new or existing CPU architecture still is a quite manual process. The rebootstrap project attempts to automate part of the early stage, but it still is very sensitive to changes in unstable. We had a number of fairly intrusive changes this year already. August included a little more fallout from the earlier gcc-for-host work where the C++ include search path would end up being wrong in the generated cross toolchain. A number of packages such as util-linux (twice), libxml2, libcap-ng or systemd had their stage profiles broken. e2fsprogs gained a cycle with libarchive-dev due to having gained support for creating an ext4 filesystem from a tar archive. The restructuring of glib2.0 remains an unsolved problem for now, but libxt and cdebconf should be buildable without glib2.0.

Salsa CI, by Santiago Ruano Rinc n Santiago completed the initial RISC-V support (!523) in the Salsa CI s pipeline. The main work started in July, but it was required to take into account some comments in the review (thanks to Ahmed!) and some final details in [!534]. riscv64 is the most recently supported port in Debian, which will be part of trixie. As its name suggests, the new build-riscv64 job makes it possible to test that a package successfully builds in the riscv64 architecture. The RISC-V runner (salsaci riscv64 runner 01) runs in a couple of machines generously provided by lab.rvperf.org. Debian Developers interested in running this job in their projects should enable the runner (salsaci riscv64 runner 01) in Settings / CI / Runners, and follow the instructions available at https://salsa.debian.org/salsa-ci-team/pipeline/#build-job-on-risc-v. Santiago also took part in discussions about how to optimize the build jobs and reviewed !537 to make the build-source job to only satisfy the Build-Depends and Build-Conflicts fields by Andrea Pappacoda. Thanks a lot to him!

Miscellaneous contributions
  • Stefano submitted patches for BeautifulSoup to support the latest soupsieve and lxml.
  • Stefano uploaded pypy3 7.3.17, upgrading the cPython compatibility from 3.9 to 3.10. Then ran into a GCC-14-related regression, which had to be ignored for now as it s proving hard to fix.
  • Colin released libpipeline 1.5.8 and man-db 2.13.0; the latter included foundations allowing adding an autopkgtest for man-db.
  • Colin upgraded 19 Python packages to new upstream versions (fixing 5 CVEs), fixed several other build failures, fixed a Python 3.12 compatibility issue in zope.security, and made python-nacl build reproducibly.
  • Colin tracked down test failures in python-asyncssh and Ruby resulting from certain odd /etc/hosts configurations.
  • Carles upgraded the packages python-ring-doorbell and simplemonitor to new upstream versions.
  • Carles started discussions and implementation of a tool (still in early days) named po-debconf-manager : a way for translators and reviewers to collaborate using git as a backend instead of mailing list; and submit the translations using salsa MR. More information next month.
  • Carles (dog-fooding po-debconf-manager ) reviewed debconf templates translated by a collaborator.
  • Carles reviewed and submitted the translation of apt .
  • Helmut sent 19 patches for improving cross building.
  • Helmut implemented the cross-exe-wrapper proposed by Simon McVittie for use with glib2.0.
  • Helmut detailed what it takes to make Perl s ExtUtils::PkgConfig suitable for cross building.
  • Helmut made the deletion of the root password work in debvm in all situations and implemented a test case using expect.
  • Anupa attended Debian Publicity team meeting and is moderating and posting on Debian Administrators LinkedIn group.
  • Thorsten uploaded package gutenprint to fix a FTBFS with gcc14 and package ipp-usb to fix a /usr-merge issue.
  • Santiago updated bzip2 to fix a long-standing bug that requested to include a pkg-config file. An important impact of this change is that it makes it possible to use Rust bindings for libbz2 by Sequoia, an implementation of OpenPGP.

9 September 2024

Wouter Verhelst: NBD: Write Zeroes and Rotational

The NBD protocol has grown a number of new features over the years. Unfortunately, some of those features are not (yet?) supported by the Linux kernel. I suggested a few times over the years that the maintainer of the NBD driver in the kernel, Josef Bacik, take a look at these features, but he hasn't done so; presumably he has other priorities. As with anything in the open source world, if you want it done you must do it yourself. I'd been off and on considering to work on the kernel driver so that I could implement these new features, but I never really got anywhere. A few months ago, however, Christoph Hellwig posted a patch set that reworked a number of block device drivers in the Linux kernel to a new type of API. Since the NBD mailinglist is listed in the kernel's MAINTAINERS file, this patch series were crossposted to the NBD mailinglist, too, and when I noticed that it explicitly disabled the "rotational" flag on the NBD device, I suggested to Christoph that perhaps "we" (meaning, "he") might want to vary the decision on whether a device is rotational depending on whether the NBD server signals, through the flag that exists for that very purpose, whether the device is rotational. To which he replied "Can you send a patch". That got me down the rabbit hole, and now, for the first time in the 20+ years of being a C programmer who uses Linux exclusively, I got a patch merged into the Linux kernel... twice. So, what do these things do? The first patch adds support for the ROTATIONAL flag. If the NBD server mentions that the device is rotational, it will be treated as such, and the elevator algorithm will be used to optimize accesses to the device. For the reference implementation, you can do this by adding a line "rotational = true" to the relevant section (relating to the export where you want it to be used) of the config file. It's unlikely that this will be of much benefit in most cases (most nbd-server installations will be exporting a file on a filesystem and have the elevator algorithm implemented server side and then it doesn't matter whether the device has the rotational flag set), but it's there in case you wish to use it. The second set of patches adds support for the WRITE_ZEROES command. Most devices these days allow you to tell them "please write a N zeroes starting at this offset", which is a lot more efficient than sending over a buffer of N zeroes and asking the device to do DMA to copy buffers etc etc for just zeroes. The NBD protocol has supported its own WRITE_ZEROES command for a while now, and hooking it up was reasonably simple in the end. The only problem is that it expects length values in bytes, whereas the kernel uses it in blocks. It took me a few tries to get that right -- and then I also fixed up handling of discard messages, which required the same conversion.

8 September 2024

Jacob Adams: Linux's Bedtime Routine

How does Linux move from an awake machine to a hibernating one? How does it then manage to restore all state? These questions led me to read way too much C in trying to figure out how this particular hardware/software boundary is navigated. This investigation will be split into a few parts, with the first one going from invocation of hibernation to synchronizing all filesystems to disk. This article has been written using Linux version 6.9.9, the source of which can be found in many places, but can be navigated easily through the Bootlin Elixir Cross-Referencer: https://elixir.bootlin.com/linux/v6.9.9/source Each code snippet will begin with a link to the above giving the file path and the line number of the beginning of the snippet.

A Starting Point for Investigation: /sys/power/state and /sys/power/disk These two system files exist to allow debugging of hibernation, and thus control the exact state used directly. Writing specific values to the state file controls the exact sleep mode used and disk controls the specific hibernation mode1. This is extremely handy as an entry point to understand how these systems work, since we can just follow what happens when they are written to.

Show and Store Functions These two files are defined using the power_attr macro: kernel/power/power.h:80
#define power_attr(_name) \
static struct kobj_attribute _name##_attr =     \
    .attr   =               \
        .name = __stringify(_name), \
        .mode = 0644,           \
     ,                  \
    .show   = _name##_show,         \
    .store  = _name##_store,        \
 
show is called on reads and store on writes. state_show is a little boring for our purposes, as it just prints all the available sleep states. kernel/power/main.c:657
/*
 * state - control system sleep states.
 *
 * show() returns available sleep state labels, which may be "mem", "standby",
 * "freeze" and "disk" (hibernation).
 * See Documentation/admin-guide/pm/sleep-states.rst for a description of
 * what they mean.
 *
 * store() accepts one of those strings, translates it into the proper
 * enumerated value, and initiates a suspend transition.
 */
static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
			  char *buf)
 
	char *s = buf;
#ifdef CONFIG_SUSPEND
	suspend_state_t i;
	for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
		if (pm_states[i])
			s += sprintf(s,"%s ", pm_states[i]);
#endif
	if (hibernation_available())
		s += sprintf(s, "disk ");
	if (s != buf)
		/* convert the last space to a newline */
		*(s-1) = '\n';
	return (s - buf);
 
state_store, however, provides our entry point. If the string disk is written to the state file, it calls hibernate(). This is our entry point. kernel/power/main.c:715
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
			   const char *buf, size_t n)
 
	suspend_state_t state;
	int error;
	error = pm_autosleep_lock();
	if (error)
		return error;
	if (pm_autosleep_state() > PM_SUSPEND_ON)  
		error = -EBUSY;
		goto out;
	 
	state = decode_state(buf, n);
	if (state < PM_SUSPEND_MAX)  
		if (state == PM_SUSPEND_MEM)
			state = mem_sleep_current;
		error = pm_suspend(state);
	  else if (state == PM_SUSPEND_MAX)  
		error = hibernate();
	  else  
		error = -EINVAL;
	 
 out:
	pm_autosleep_unlock();
	return error ? error : n;
 
kernel/power/main.c:688
static suspend_state_t decode_state(const char *buf, size_t n)
 
#ifdef CONFIG_SUSPEND
	suspend_state_t state;
#endif
	char *p;
	int len;
	p = memchr(buf, '\n', n);
	len = p ? p - buf : n;
	/* Check hibernation first. */
	if (len == 4 && str_has_prefix(buf, "disk"))
		return PM_SUSPEND_MAX;
#ifdef CONFIG_SUSPEND
	for (state = PM_SUSPEND_MIN; state < PM_SUSPEND_MAX; state++)  
		const char *label = pm_states[state];
		if (label && len == strlen(label) && !strncmp(buf, label, len))
			return state;
	 
#endif
	return PM_SUSPEND_ON;
 
Could we have figured this out just via function names? Sure, but this way we know for sure that nothing else is happening before this function is called.

Autosleep Our first detour is into the autosleep system. When checking the state above, you may notice that the kernel grabs the pm_autosleep_lock before checking the current state. autosleep is a mechanism originally from Android that sends the entire system to either suspend or hibernate whenever it is not actively working on anything. This is not enabled for most desktop configurations, since it s primarily for mobile systems and inverts the standard suspend and hibernate interactions. This system is implemented as a workqueue2 that checks the current number of wakeup events, processes and drivers that need to run3, and if there aren t any, then the system is put into the autosleep state, typically suspend. However, it could be hibernate if configured that way via /sys/power/autosleep in a similar manner to using /sys/power/state to manually enable hibernation. kernel/power/main.c:841
static ssize_t autosleep_store(struct kobject *kobj,
			       struct kobj_attribute *attr,
			       const char *buf, size_t n)
 
	suspend_state_t state = decode_state(buf, n);
	int error;
	if (state == PM_SUSPEND_ON
	    && strcmp(buf, "off") && strcmp(buf, "off\n"))
		return -EINVAL;
	if (state == PM_SUSPEND_MEM)
		state = mem_sleep_current;
	error = pm_autosleep_set_state(state);
	return error ? error : n;
 
power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
kernel/power/autosleep.c:24
static DEFINE_MUTEX(autosleep_lock);
static struct wakeup_source *autosleep_ws;
static void try_to_suspend(struct work_struct *work)
 
	unsigned int initial_count, final_count;
	if (!pm_get_wakeup_count(&initial_count, true))
		goto out;
	mutex_lock(&autosleep_lock);
	if (!pm_save_wakeup_count(initial_count)  
		system_state != SYSTEM_RUNNING)  
		mutex_unlock(&autosleep_lock);
		goto out;
	 
	if (autosleep_state == PM_SUSPEND_ON)  
		mutex_unlock(&autosleep_lock);
		return;
	 
	if (autosleep_state >= PM_SUSPEND_MAX)
		hibernate();
	else
		pm_suspend(autosleep_state);
	mutex_unlock(&autosleep_lock);
	if (!pm_get_wakeup_count(&final_count, false))
		goto out;
	/*
	 * If the wakeup occurred for an unknown reason, wait to prevent the
	 * system from trying to suspend and waking up in a tight loop.
	 */
	if (final_count == initial_count)
		schedule_timeout_uninterruptible(HZ / 2);
 out:
	queue_up_suspend_work();
 
static DECLARE_WORK(suspend_work, try_to_suspend);
void queue_up_suspend_work(void)
 
	if (autosleep_state > PM_SUSPEND_ON)
		queue_work(autosleep_wq, &suspend_work);
 

The Steps of Hibernation

Hibernation Kernel Config It s important to note that most of the hibernate-specific functions below do nothing unless you ve defined CONFIG_HIBERNATION in your Kconfig4. As an example, hibernate itself is defined as the following if CONFIG_HIBERNATE is not set. include/linux/suspend.h:407
static inline int hibernate(void)   return -ENOSYS;  

Check if Hibernation is Available We begin by confirming that we actually can perform hibernation, via the hibernation_available function. kernel/power/hibernate.c:742
if (!hibernation_available())  
	pm_pr_dbg("Hibernation not available.\n");
	return -EPERM;
 
kernel/power/hibernate.c:92
bool hibernation_available(void)
 
	return nohibernate == 0 &&
		!security_locked_down(LOCKDOWN_HIBERNATION) &&
		!secretmem_active() && !cxl_mem_active();
 
nohibernate is controlled by the kernel command line, it s set via either nohibernate or hibernate=no. security_locked_down is a hook for Linux Security Modules to prevent hibernation. This is used to prevent hibernating to an unencrypted storage device, as specified in the manual page kernel_lockdown(7). Interestingly, either level of lockdown, integrity or confidentiality, locks down hibernation because with the ability to hibernate you can extract bascially anything from memory and even reboot into a modified kernel image. secretmem_active checks whether there is any active use of memfd_secret, and if so it prevents hibernation. memfd_secret returns a file descriptor that can be mapped into a process but is specifically unmapped from the kernel s memory space. Hibernating with memory that not even the kernel is supposed to access would expose that memory to whoever could access the hibernation image. This particular feature of secret memory was apparently controversial, though not as controversial as performance concerns around fragmentation when unmapping kernel memory (which did not end up being a real problem). cxl_mem_active just checks whether any CXL memory is active. A full explanation is provided in the commit introducing this check but there s also a shortened explanation from cxl_mem_probe that sets the relevant flag when initializing a CXL memory device. drivers/cxl/mem.c:186
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
* preserves contents over suspend, and there is no simple way
* to arrange for the suspend image to avoid CXL memory which
* would setup a circular dependency between PCI resume and save
* state restoration.

Check Compression The next check is for whether compression support is enabled, and if so whether the requested algorithm is enabled. kernel/power/hibernate.c:747
/*
 * Query for the compression algorithm support if compression is enabled.
 */
if (!nocompress)  
	strscpy(hib_comp_algo, hibernate_compressor, sizeof(hib_comp_algo));
	if (crypto_has_comp(hib_comp_algo, 0, 0) != 1)  
		pr_err("%s compression is not available\n", hib_comp_algo);
		return -EOPNOTSUPP;
	 
 
The nocompress flag is set via the hibernate command line parameter, setting hibernate=nocompress. If compression is enabled, then hibernate_compressor is copied to hib_comp_algo. This synchronizes the current requested compression setting (hibernate_compressor) with the current compression setting (hib_comp_algo). Both values are character arrays of size CRYPTO_MAX_ALG_NAME (128 in this kernel). kernel/power/hibernate.c:50
static char hibernate_compressor[CRYPTO_MAX_ALG_NAME] = CONFIG_HIBERNATION_DEF_COMP;
/*
 * Compression/decompression algorithm to be used while saving/loading
 * image to/from disk. This would later be used in 'kernel/power/swap.c'
 * to allocate comp streams.
 */
char hib_comp_algo[CRYPTO_MAX_ALG_NAME];
hibernate_compressor defaults to lzo if that algorithm is enabled, otherwise to lz4 if enabled5. It can be overwritten using the hibernate.compressor setting to either lzo or lz4. kernel/power/Kconfig:95
choice
	prompt "Default compressor"
	default HIBERNATION_COMP_LZO
	depends on HIBERNATION
config HIBERNATION_COMP_LZO
	bool "lzo"
	depends on CRYPTO_LZO
config HIBERNATION_COMP_LZ4
	bool "lz4"
	depends on CRYPTO_LZ4
endchoice
config HIBERNATION_DEF_COMP
	string
	default "lzo" if HIBERNATION_COMP_LZO
	default "lz4" if HIBERNATION_COMP_LZ4
	help
	  Default compressor to be used for hibernation.
kernel/power/hibernate.c:1425
static const char * const comp_alg_enabled[] =  
#if IS_ENABLED(CONFIG_CRYPTO_LZO)
	COMPRESSION_ALGO_LZO,
#endif
#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
	COMPRESSION_ALGO_LZ4,
#endif
 ;
static int hibernate_compressor_param_set(const char *compressor,
		const struct kernel_param *kp)
 
	unsigned int sleep_flags;
	int index, ret;
	sleep_flags = lock_system_sleep();
	index = sysfs_match_string(comp_alg_enabled, compressor);
	if (index >= 0)  
		ret = param_set_copystring(comp_alg_enabled[index], kp);
		if (!ret)
			strscpy(hib_comp_algo, comp_alg_enabled[index],
				sizeof(hib_comp_algo));
	  else  
		ret = index;
	 
	unlock_system_sleep(sleep_flags);
	if (ret)
		pr_debug("Cannot set specified compressor %s\n",
			 compressor);
	return ret;
 
static const struct kernel_param_ops hibernate_compressor_param_ops =  
	.set    = hibernate_compressor_param_set,
	.get    = param_get_string,
 ;
static struct kparam_string hibernate_compressor_param_string =  
	.maxlen = sizeof(hibernate_compressor),
	.string = hibernate_compressor,
 ;
We then check whether the requested algorithm is supported via crypto_has_comp. If not, we bail out of the whole operation with EOPNOTSUPP. As part of crypto_has_comp we perform any needed initialization of the algorithm, loading kernel modules and running initialization code as needed6.

Grab Locks The next step is to grab the sleep and hibernation locks via lock_system_sleep and hibernate_acquire. kernel/power/hibernate.c:758
sleep_flags = lock_system_sleep();
/* The snapshot device should not be opened while we're running */
if (!hibernate_acquire())  
	error = -EBUSY;
	goto Unlock;
 
First, lock_system_sleep marks the current thread as not freezable, which will be important later7. It then grabs the system_transistion_mutex, which locks taking snapshots or modifying how they are taken, resuming from a hibernation image, entering any suspend state, or rebooting.

The GFP Mask The kernel also issues a warning if the gfp mask is changed via either pm_restore_gfp_mask or pm_restrict_gfp_mask without holding the system_transistion_mutex. GFP flags tell the kernel how it is permitted to handle a request for memory. include/linux/gfp_types.h:12
 * GFP flags are commonly used throughout Linux to indicate how memory
 * should be allocated.  The GFP acronym stands for get_free_pages(),
 * the underlying memory allocation function.  Not every GFP flag is
 * supported by every function which may allocate memory.
In the case of hibernation specifically we care about the IO and FS flags, which are reclaim operators, ways the system is permitted to attempt to free up memory in order to satisfy a specific request for memory. include/linux/gfp_types.h:176
 * Reclaim modifiers
 * -----------------
 * Please note that all the following flags are only applicable to sleepable
 * allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them).
 *
 * %__GFP_IO can start physical IO.
 *
 * %__GFP_FS can call down to the low-level FS. Clearing the flag avoids the
 * allocator recursing into the filesystem which might already be holding
 * locks.
gfp_allowed_mask sets which flags are permitted to be set at the current time. As the comment below outlines, preventing these flags from being set avoids situations where the kernel needs to do I/O to allocate memory (e.g. read/writing swap8) but the devices it needs to read/write to/from are not currently available. kernel/power/main.c:24
/*
 * The following functions are used by the suspend/hibernate code to temporarily
 * change gfp_allowed_mask in order to avoid using I/O during memory allocations
 * while devices are suspended.  To avoid races with the suspend/hibernate code,
 * they should always be called with system_transition_mutex held
 * (gfp_allowed_mask also should only be modified with system_transition_mutex
 * held, unless the suspend/hibernate code is guaranteed not to run in parallel
 * with that modification).
 */
static gfp_t saved_gfp_mask;
void pm_restore_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	if (saved_gfp_mask)  
		gfp_allowed_mask = saved_gfp_mask;
		saved_gfp_mask = 0;
	 
 
void pm_restrict_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	WARN_ON(saved_gfp_mask);
	saved_gfp_mask = gfp_allowed_mask;
	gfp_allowed_mask &= ~(__GFP_IO   __GFP_FS);
 

Sleep Flags After grabbing the system_transition_mutex the kernel then returns and captures the previous state of the threads flags in sleep_flags. This is used later to remove PF_NOFREEZE if it wasn t previously set on the current thread. kernel/power/main.c:52
unsigned int lock_system_sleep(void)
 
	unsigned int flags = current->flags;
	current->flags  = PF_NOFREEZE;
	mutex_lock(&system_transition_mutex);
	return flags;
 
EXPORT_SYMBOL_GPL(lock_system_sleep);
include/linux/sched.h:1633
#define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
Then we grab the hibernate-specific semaphore to ensure no one can open a snapshot or resume from it while we perform hibernation. Additionally this lock is used to prevent hibernate_quiet_exec, which is used by the nvdimm driver to active its firmware with all processes and devices frozen, ensuring it is the only thing running at that time9. kernel/power/hibernate.c:82
bool hibernate_acquire(void)
 
	return atomic_add_unless(&hibernate_atomic, -1, 0);
 

Prepare Console The kernel next calls pm_prepare_console. This function only does anything if CONFIG_VT_CONSOLE_SLEEP has been set. This prepares the virtual terminal for a suspend state, switching away to a console used only for the suspend state if needed. kernel/power/console.c:130
void pm_prepare_console(void)
 
	if (!pm_vt_switch())
		return;
	orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
	if (orig_fgconsole < 0)
		return;
	orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
	return;
 
The first thing is to check whether we actually need to switch the VT kernel/power/console.c:94
/*
 * There are three cases when a VT switch on suspend/resume are required:
 *   1) no driver has indicated a requirement one way or another, so preserve
 *      the old behavior
 *   2) console suspend is disabled, we want to see debug messages across
 *      suspend/resume
 *   3) any registered driver indicates it needs a VT switch
 *
 * If none of these conditions is present, meaning we have at least one driver
 * that doesn't need the switch, and none that do, we can avoid it to make
 * resume look a little prettier (and suspend too, but that's usually hidden,
 * e.g. when closing the lid on a laptop).
 */
static bool pm_vt_switch(void)
 
	struct pm_vt_switch *entry;
	bool ret = true;
	mutex_lock(&vt_switch_mutex);
	if (list_empty(&pm_vt_switch_list))
		goto out;
	if (!console_suspend_enabled)
		goto out;
	list_for_each_entry(entry, &pm_vt_switch_list, head)  
		if (entry->required)
			goto out;
	 
	ret = false;
out:
	mutex_unlock(&vt_switch_mutex);
	return ret;
 
There is an explanation of the conditions under which a switch is performed in the comment above the function, but we ll also walk through the steps here. Firstly we grab the vt_switch_mutex to ensure nothing will modify the list while we re looking at it. We then examine the pm_vt_switch_list. This list is used to indicate the drivers that require a switch during suspend. They register this requirement, or the lack thereof, via pm_vt_switch_required. kernel/power/console.c:31
/**
 * pm_vt_switch_required - indicate VT switch at suspend requirements
 * @dev: device
 * @required: if true, caller needs VT switch at suspend/resume time
 *
 * The different console drivers may or may not require VT switches across
 * suspend/resume, depending on how they handle restoring video state and
 * what may be running.
 *
 * Drivers can indicate support for switchless suspend/resume, which can
 * save time and flicker, by using this routine and passing 'false' as
 * the argument.  If any loaded driver needs VT switching, or the
 * no_console_suspend argument has been passed on the command line, VT
 * switches will occur.
 */
void pm_vt_switch_required(struct device *dev, bool required)
Next, we check console_suspend_enabled. This is set to false by the kernel parameter no_console_suspend, but defaults to true. Finally, if there are any entries in the pm_vt_switch_list, then we check to see if any of them require a VT switch. Only if none of these conditions apply, then we return false. If a VT switch is in fact required, then we move first the currently active virtual terminal/console10 (vt_move_to_console) and then the current location of kernel messages (vt_kmsg_redirect) to the SUSPEND_CONSOLE. The SUSPEND_CONSOLE is the last entry in the list of possible consoles, and appears to just be a black hole to throw away messages. kernel/power/console.c:16
#define SUSPEND_CONSOLE	(MAX_NR_CONSOLES-1)
Interestingly, these are separate functions because you can use TIOCL_SETKMSGREDIRECT (an ioctl11) to send kernel messages to a specific virtual terminal, but by default its the same as the currently active console. The locations of the previously active console and the previous kernel messages location are stored in orig_fgconsole and orig_kmsg, to restore the state of the console and kernel messages after the machine wakes up again. Interestingly, this means orig_fgconsole also ends up storing any errors, so has to be checked to ensure it s not less than zero before we try to do anything with the kernel messages on both suspend and resume. drivers/tty/vt/vt_ioctl.c:1268
/* Perform a kernel triggered VT switch for suspend/resume */
static int disable_vt_switch;
int vt_move_to_console(unsigned int vt, int alloc)
 
	int prev;
	console_lock();
	/* Graphics mode - up to X */
	if (disable_vt_switch)  
		console_unlock();
		return 0;
	 
	prev = fg_console;
	if (alloc && vc_allocate(vt))  
		/* we can't have a free VC for now. Too bad,
		 * we don't want to mess the screen for now. */
		console_unlock();
		return -ENOSPC;
	 
	if (set_console(vt))  
		/*
		 * We're unable to switch to the SUSPEND_CONSOLE.
		 * Let the calling function know so it can decide
		 * what to do.
		 */
		console_unlock();
		return -EIO;
	 
	console_unlock();
	if (vt_waitactive(vt + 1))  
		pr_debug("Suspend: Can't switch VCs.");
		return -EINTR;
	 
	return prev;
 
Unlike most other locking functions we ve seen so far, console_lock needs to be careful to ensure nothing else is panicking and needs to dump to the console before grabbing the semaphore for the console and setting a couple flags.

Panics Panics are tracked via an atomic integer set to the id of the processor currently panicking. kernel/printk/printk.c:2649
/**
 * console_lock - block the console subsystem from printing
 *
 * Acquires a lock which guarantees that no consoles will
 * be in or enter their write() callback.
 *
 * Can sleep, returns nothing.
 */
void console_lock(void)
 
	might_sleep();
	/* On panic, the console_lock must be left to the panic cpu. */
	while (other_cpu_in_panic())
		msleep(1000);
	down_console_sem();
	console_locked = 1;
	console_may_schedule = 1;
 
EXPORT_SYMBOL(console_lock);
kernel/printk/printk.c:362
/*
 * Return true if a panic is in progress on a remote CPU.
 *
 * On true, the local CPU should immediately release any printing resources
 * that may be needed by the panic CPU.
 */
bool other_cpu_in_panic(void)
 
	return (panic_in_progress() && !this_cpu_in_panic());
 
kernel/printk/printk.c:345
static bool panic_in_progress(void)
 
	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
 
kernel/printk/printk.c:350
/* Return true if a panic is in progress on the current CPU. */
bool this_cpu_in_panic(void)
 
	/*
	 * We can use raw_smp_processor_id() here because it is impossible for
	 * the task to be migrated to the panic_cpu, or away from it. If
	 * panic_cpu has already been set, and we're not currently executing on
	 * that CPU, then we never will be.
	 */
	return unlikely(atomic_read(&panic_cpu) == raw_smp_processor_id());
 
console_locked is a debug value, used to indicate that the lock should be held, and our first indication that this whole virtual terminal system is more complex than might initially be expected. kernel/printk/printk.c:373
/*
 * This is used for debugging the mess that is the VT code by
 * keeping track if we have the console semaphore held. It's
 * definitely not the perfect debug tool (we don't know if _WE_
 * hold it and are racing, but it helps tracking those weird code
 * paths in the console code where we end up in places I want
 * locked without the console semaphore held).
 */
static int console_locked;
console_may_schedule is used to see if we are permitted to sleep and schedule other work while we hold this lock. As we ll see later, the virtual terminal subsystem is not re-entrant, so there s all sorts of hacks in here to ensure we don t leave important code sections that can t be safely resumed.

Disable VT Switch As the comment below lays out, when another program is handling graphical display anyway, there s no need to do any of this, so the kernel provides a switch to turn the whole thing off. Interestingly, this appears to only be used by three drivers, so the specific hardware support required must not be particularly common.
drivers/gpu/drm/omapdrm/dss
drivers/video/fbdev/geode
drivers/video/fbdev/omap2
drivers/tty/vt/vt_ioctl.c:1308
/*
 * Normally during a suspend, we allocate a new console and switch to it.
 * When we resume, we switch back to the original console.  This switch
 * can be slow, so on systems where the framebuffer can handle restoration
 * of video registers anyways, there's little point in doing the console
 * switch.  This function allows you to disable it by passing it '0'.
 */
void pm_set_vt_switch(int do_switch)
 
	console_lock();
	disable_vt_switch = !do_switch;
	console_unlock();
 
EXPORT_SYMBOL(pm_set_vt_switch);
The rest of the vt_switch_console function is pretty normal, however, simply allocating space if needed to create the requested virtual terminal and then setting the current virtual terminal via set_console.

Virtual Terminal Set Console With set_console, we begin (as if we haven t been already) to enter the madness that is the virtual terminal subsystem. As mentioned previously, modifications to its state must be made very carefully, as other stuff happening at the same time could create complete messes. All this to say, calling set_console does not actually perform any work to change the state of the current console. Instead it indicates what changes it wants and then schedules that work. drivers/tty/vt/vt.c:3153
int set_console(int nr)
 
	struct vc_data *vc = vc_cons[fg_console].d;
	if (!vc_cons_allocated(nr)   vt_dont_switch  
		(vc->vt_mode.mode == VT_AUTO && vc->vc_mode == KD_GRAPHICS))  
		/*
		 * Console switch will fail in console_callback() or
		 * change_console() so there is no point scheduling
		 * the callback
		 *
		 * Existing set_console() users don't check the return
		 * value so this shouldn't break anything
		 */
		return -EINVAL;
	 
	want_console = nr;
	schedule_console_callback();
	return 0;
 
The check for vc->vc_mode == KD_GRAPHICS is where most end-user graphical desktops will bail out of this change, as they re in graphics mode and don t need to switch away to the suspend console. vt_dont_switch is a flag used by the ioctls11 VT_LOCKSWITCH and VT_UNLOCKSWITCH to prevent the system from switching virtual terminal devices when the user has explicitly locked it. VT_AUTO is a flag indicating that automatic virtual terminal switching is enabled12, and thus deliberate switching to a suspend terminal is not required. However, if you do run your machine from a virtual terminal, then we indicate to the system that we want to change to the requested virtual terminal via the want_console variable and schedule a callback via schedule_console_callback. drivers/tty/vt/vt.c:315
void schedule_console_callback(void)
 
	schedule_work(&console_work);
 
console_work is a workqueue2 that will execute the given task asynchronously.

Console Callback drivers/tty/vt/vt.c:3109
/*
 * This is the console switching callback.
 *
 * Doing console switching in a process context allows
 * us to do the switches asynchronously (needed when we want
 * to switch due to a keyboard interrupt).  Synchronization
 * with other console code and prevention of re-entrancy is
 * ensured with console_lock.
 */
static void console_callback(struct work_struct *ignored)
 
	console_lock();
	if (want_console >= 0)  
		if (want_console != fg_console &&
		    vc_cons_allocated(want_console))  
			hide_cursor(vc_cons[fg_console].d);
			change_console(vc_cons[want_console].d);
			/* we only changed when the console had already
			   been allocated - a new console is not created
			   in an interrupt routine */
		 
		want_console = -1;
	 
...
console_callback first looks to see if there is a console change wanted via want_console and then changes to it if it s not the current console and has been allocated already. We do first remove any cursor state with hide_cursor. drivers/tty/vt/vt.c:841
static void hide_cursor(struct vc_data *vc)
 
	if (vc_is_sel(vc))
		clear_selection();
	vc->vc_sw->con_cursor(vc, false);
	hide_softcursor(vc);
 
A full dive into the tty driver is a task for another time, but this should give a general sense of how this system interacts with hibernation.

Notify Power Management Call Chain kernel/power/hibernate.c:767
pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION)
This will call a chain of power management callbacks, passing first PM_HIBERNATION_PREPARE and then PM_POST_HIBERNATION on startup or on error with another callback. kernel/power/main.c:98
int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
 
	int ret;
	ret = blocking_notifier_call_chain_robust(&pm_chain_head, val_up, val_down, NULL);
	return notifier_to_errno(ret);
 
The power management notifier is a blocking notifier chain, which means it has the following properties. include/linux/notifier.h:23
 *	Blocking notifier chains: Chain callbacks run in process context.
 *		Callouts are allowed to block.
The callback chain is a linked list with each entry containing a priority and a function to call. The function technically takes in a data value, but it is always NULL for the power management chain. include/linux/notifier.h:49
struct notifier_block;
typedef	int (*notifier_fn_t)(struct notifier_block *nb,
			unsigned long action, void *data);
struct notifier_block  
	notifier_fn_t notifier_call;
	struct notifier_block __rcu *next;
	int priority;
 ;
The head of the linked list is protected by a read-write semaphore. include/linux/notifier.h:65
struct blocking_notifier_head  
	struct rw_semaphore rwsem;
	struct notifier_block __rcu *head;
 ;
Because it is prioritized, appending to the list requires walking it until an item with lower13 priority is found to insert the current item before. kernel/notifier.c:252
/*
 *	Blocking notifier chain routines.  All access to the chain is
 *	synchronized by an rwsem.
 */
static int __blocking_notifier_chain_register(struct blocking_notifier_head *nh,
					      struct notifier_block *n,
					      bool unique_priority)
 
	int ret;
	/*
	 * This code gets used during boot-up, when task switching is
	 * not yet working and interrupts must remain disabled.  At
	 * such times we must not call down_write().
	 */
	if (unlikely(system_state == SYSTEM_BOOTING))
		return notifier_chain_register(&nh->head, n, unique_priority);
	down_write(&nh->rwsem);
	ret = notifier_chain_register(&nh->head, n, unique_priority);
	up_write(&nh->rwsem);
	return ret;
 
kernel/notifier.c:20
/*
 *	Notifier chain core routines.  The exported routines below
 *	are layered on top of these, with appropriate locking added.
 */
static int notifier_chain_register(struct notifier_block **nl,
				   struct notifier_block *n,
				   bool unique_priority)
 
	while ((*nl) != NULL)  
		if (unlikely((*nl) == n))  
			WARN(1, "notifier callback %ps already registered",
			     n->notifier_call);
			return -EEXIST;
		 
		if (n->priority > (*nl)->priority)
			break;
		if (n->priority == (*nl)->priority && unique_priority)
			return -EBUSY;
		nl = &((*nl)->next);
	 
	n->next = *nl;
	rcu_assign_pointer(*nl, n);
	trace_notifier_register((void *)n->notifier_call);
	return 0;
 
Each callback can return one of a series of options. include/linux/notifier.h:18
#define NOTIFY_DONE		0x0000		/* Don't care */
#define NOTIFY_OK		0x0001		/* Suits me */
#define NOTIFY_STOP_MASK	0x8000		/* Don't call further */
#define NOTIFY_BAD		(NOTIFY_STOP_MASK 0x0002)
						/* Bad/Veto action */
When notifying the chain, if a function returns STOP or BAD then the previous parts of the chain are called again with PM_POST_HIBERNATION14 and an error is returned. kernel/notifier.c:107
/**
 * notifier_call_chain_robust - Inform the registered notifiers about an event
 *                              and rollback on error.
 * @nl:		Pointer to head of the blocking notifier chain
 * @val_up:	Value passed unmodified to the notifier function
 * @val_down:	Value passed unmodified to the notifier function when recovering
 *              from an error on @val_up
 * @v:		Pointer passed unmodified to the notifier function
 *
 * NOTE:	It is important the @nl chain doesn't change between the two
 *		invocations of notifier_call_chain() such that we visit the
 *		exact same notifier callbacks; this rules out any RCU usage.
 *
 * Return:	the return value of the @val_up call.
 */
static int notifier_call_chain_robust(struct notifier_block **nl,
				     unsigned long val_up, unsigned long val_down,
				     void *v)
 
	int ret, nr = 0;
	ret = notifier_call_chain(nl, val_up, v, -1, &nr);
	if (ret & NOTIFY_STOP_MASK)
		notifier_call_chain(nl, val_down, v, nr-1, NULL);
	return ret;
 
Each of these callbacks tends to be quite driver-specific, so we ll cease discussion of this here.

Sync Filesystems The next step is to ensure all filesystems have been synchronized to disk. This is performed via a simple helper function that times how long the full synchronize operation, ksys_sync takes. kernel/power/main.c:69
void ksys_sync_helper(void)
 
	ktime_t start;
	long elapsed_msecs;
	start = ktime_get();
	ksys_sync();
	elapsed_msecs = ktime_to_ms(ktime_sub(ktime_get(), start));
	pr_info("Filesystems sync: %ld.%03ld seconds\n",
		elapsed_msecs / MSEC_PER_SEC, elapsed_msecs % MSEC_PER_SEC);
 
EXPORT_SYMBOL_GPL(ksys_sync_helper);
ksys_sync wakes and instructs a set of flusher threads to write out every filesystem, first their inodes15, then the full filesystem, and then finally all block devices, to ensure all pages are written out to disk. fs/sync.c:87
/*
 * Sync everything. We start by waking flusher threads so that most of
 * writeback runs on all devices in parallel. Then we sync all inodes reliably
 * which effectively also waits for all flusher threads to finish doing
 * writeback. At this point all data is on disk so metadata should be stable
 * and we tell filesystems to sync their metadata via ->sync_fs() calls.
 * Finally, we writeout all block devices because some filesystems (e.g. ext2)
 * just write metadata (such as inodes or bitmaps) to block device page cache
 * and do not sync it on their own in ->sync_fs().
 */
void ksys_sync(void)
 
	int nowait = 0, wait = 1;
	wakeup_flusher_threads(WB_REASON_SYNC);
	iterate_supers(sync_inodes_one_sb, NULL);
	iterate_supers(sync_fs_one_sb, &nowait);
	iterate_supers(sync_fs_one_sb, &wait);
	sync_bdevs(false);
	sync_bdevs(true);
	if (unlikely(laptop_mode))
		laptop_sync_completion();
 
It follows an interesting pattern of using iterate_supers to run both sync_inodes_one_sb and then sync_fs_one_sb on each known filesystem16. It also calls both sync_fs_one_sb and sync_bdevs twice, first without waiting for any operations to complete and then again waiting for completion17. When laptop_mode is enabled the system runs additional filesystem synchronization operations after the specified delay without any writes. mm/page-writeback.c:111
/*
 * Flag that puts the machine in "laptop mode". Doubles as a timeout in jiffies:
 * a full sync is triggered after this time elapses without any disk activity.
 */
int laptop_mode;
EXPORT_SYMBOL(laptop_mode);
However, when running a filesystem synchronization operation, the system will add an additional timer to schedule more writes after the laptop_mode delay. We don t want the state of the system to change at all while performing hibernation, so we cancel those timers. mm/page-writeback.c:2198
/*
 * We're in laptop mode and we've just synced. The sync's writes will have
 * caused another writeback to be scheduled by laptop_io_completion.
 * Nothing needs to be written back anymore, so we unschedule the writeback.
 */
void laptop_sync_completion(void)
 
	struct backing_dev_info *bdi;
	rcu_read_lock();
	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
		del_timer(&bdi->laptop_mode_wb_timer);
	rcu_read_unlock();
 
As a side note, the ksys_sync function is simply called when the system call sync is used. fs/sync.c:111
SYSCALL_DEFINE0(sync)
 
	ksys_sync();
	return 0;
 

The End of Preparation With that the system has finished preparations for hibernation. This is a somewhat arbitrary cutoff, but next the system will begin a full freeze of userspace to then dump memory out to an image and finally to perform hibernation. All this will be covered in future articles!
  1. Hibernation modes are outside of scope for this article, see the previous article for a high-level description of the different types of hibernation.
  2. Workqueues are a mechanism for running asynchronous tasks. A full description of them is a task for another time, but the kernel documentation on them is available here: https://www.kernel.org/doc/html/v6.9/core-api/workqueue.html 2
  3. This is a bit of an oversimplification, but since this isn t the main focus of this article this description has been kept to a higher level.
  4. Kconfig is Linux s build configuration system that sets many different macros to enable/disable various features.
  5. Kconfig defaults to the first default found
  6. Including checking whether the algorithm is larval? Which appears to indicate that it requires additional setup, but is an interesting choice of name for such a state.
  7. Specifically when we get to process freezing, which we ll get to in the next article in this series.
  8. Swap space is outside the scope of this article, but in short it is a buffer on disk that the kernel uses to store memory not current in use to free up space for other things. See Swap Management for more details.
  9. The code for this is lengthy and tangential, thus it has not been included here. If you re curious about the details of this, see kernel/power/hibernate.c:858 for the details of hibernate_quiet_exec, and drivers/nvdimm/core.c:451 for how it is used in nvdimm.
  10. Annoyingly this code appears to use the terms console and virtual terminal interchangeably.
  11. ioctls are special device-specific I/O operations that permit performing actions outside of the standard file interactions of read/write/seek/etc. 2
  12. I m not entirely clear on how this flag works, this subsystem is particularly complex.
  13. In this case a higher number is higher priority.
  14. Or whatever the caller passes as val_down, but in this case we re specifically looking at how this is used in hibernation.
  15. An inode refers to a particular file or directory within the filesystem. See Wikipedia for more details.
  16. Each active filesystem is registed with the kernel through a structure known as a superblock, which contains references to all the inodes contained within the filesystem, as well as function pointers to perform the various required operations, like sync.
  17. I m including minimal code in this section, as I m not looking to deep dive into the filesystem code at this time.

Next.

Previous.