Search Results: "ecki"

30 November 2024

Enrico Zini: New laptop setup

My new laptop Framework (Framework Laptop 13 DIY Edition (AMD Ryzen 7040 Series)) arrived, all the hardware works out of the box on Debian Stable, and I'm very happy indeed. This post has the notes of all the provisioning steps, so that I can replicate them again if needed. Installing Debian 12 Debian 12's installer just worked, with Secure Boot enabled no less, which was nice. The only glitch is an argument with the guided partitioner, which was uncooperative: I have been hit before by a /boot partition too small, and I wanted 1G of EFI and 1G of boot, while the partitioner decided that 512Mb were good enough. Frustratingly, there was no way of changing that, nor I found how to get more than 1G of swap, as I wanted enough swap to fit RAM for hybernation. I let it install the way it pleased, then I booted into grml for a round of gparted. The tricky part of that was resizing the root btrfs filesystem, which is in an LV, which is in a VG, which is in a PV, which is in LUKS. Here's a cheatsheet. Shrink partitions: note that I used an increasing size because I don't trust that each tool has a way of representing sizes that aligns to the byte. I'd be happy to find out that they do, but didn't want to find out the hard way that they didn't. Resize with gparted: Move and resize partitions at will. Shrinking first means it all takes a reasonable time, and you won't have to wait almost an hour for a terabyte-sized empty partition to be carefully moved around. Don't ask me why I know. Regrow partitions: Setup gnome When I get a new laptop I have a tradition of trying to make it work with Gnome and Wayland, which normally ended up in frustration and a swift move to X11 and Xfce: I have a lot of long-time muscle memory involved in how I use a computer, and it needs to fit like prosthetics. I can learn to do a thing or two in a different way, but any papercut that makes me break flow and I cannot fix will soon become a dealbreaker. This applies to Gnome as present in Debian Stable. General Gnome settings tips I can list all available settings with:
gsettings list-recursively
which is handy for grepping things like hotkeys. I can manually set a value with:
gsettings set <schema> <key> <value>
and I can reset it to its default with:
gsettings reset <schema> <key>
Some applications like Gnome Terminal use "relocatable schemas", and in those cases you also need to specify a path, which can be discovered using dconf-editor:
gsettings set <schema>:<path> <key> <value>
Install appindicators First thing first: app install gnome-shell-extension-appindicator, log out and in again: the Gnome Extension manager won't see the extension as available until you restart the whole session. I have no idea why that is so, and I have no idea why a notification area is not present in Gnome by default, but at least now I can get one. Fix font sizes across monitors My laptop screen and monitor have significantly different DPIs, so:
gsettings set org.gnome.mutter experimental-features "['scale-monitor-framebuffer']"
And in Settings/Displays, set a reasonable scaling factor for each display. Disable Alt/Super as hotkey for the Overlay Seeing all my screen reorganize and reshuffle every time I accidentally press Alt leaves me disoriented and seasick:
gsettings set org.gnome.mutter overlay-key ''
Focus-follows-mouse and Raise-or-lower My desktop is like my desktop: messy and cluttered. I have lots of overlapping window and I switch between them by moving the focus with the mouse, and when the visible part is not enough I have a handy hotkey mapped to raise-or-lower to bring forward what I need and send back what I don't need anymore. Thankfully Gnome can be configured that way, with some work: This almost worked, but sometimes it didn't do what I wanted, like I expected to find a window to the front but another window disappeared instead. I eventually figured that by default Gnome delays focus changes by a perceivable amount, which is evidently too slow for the way I move around windows. The amount cannot be shortened, but it can be removed with:
gsettings set org.gnome.shell.overrides focus-change-on-pointer-rest false
Mouse and keyboard shortcuts Gnome has lots of preconfigured sounds, shortcuts, animations and other distractions that I do not need. They also either interfere with key combinations I want to use in terminals, or cause accidental window moves or resizes that make me break flow, or otherwise provide sensory overstimulation that really does not work for me. It was a lot of work, and these are the steps I used to get rid of most of them. Disable Super+N combinations that accidentally launch a questionable choice of programs:
for i in  seq 1 9 ; do gsettings set org.gnome.shell.keybindings switch-to-application-$i '[]'; done
Gnome-Shell settings: gnome-tweak-tool settings: Gnome Terminal settings: Thankfully 10 years ago I took notes on how to customize Gnome Terminal, and they're still mostly valid: Other hotkeys that got in my way and had to disable the hard way:
for n in  seq 1 12 ; do gsettings set org.gnome.mutter.wayland.keybindings switch-to-session-$n '[]'; done
gsettings set org.gnome.desktop.wm.keybindings move-to-workspace-down '[]'
gsettings set org.gnome.desktop.wm.keybindings move-to-workspace-up '[]'
gsettings set org.gnome.desktop.wm.keybindings panel-main-menu '[]'
gsettings set org.gnome.desktop.interface menubar-accel '[]'
Note that even after removing F10 from being bound to menubar-accel, and after having to gsetting binding to F10 as is:
$ gsettings list-recursively grep F10
org.gnome.Terminal.Legacy.Keybindings switch-to-tab-10 '<Alt>F10'
I still cannot quit Midnight Commander using F10 in a terminal, as that moves the focus in the window title bar. This looks like a Gnome bug, and a very frustrating one for me. Appearance Gnome-Shell settings: gnome-tweak-tool settings: Gnome Terminal settings: Other decluttering and tweaks Gnome Shell Settings: Set a delay between screen blank and lock: when the screen goes blank, it is important for me to be able to say "nope, don't blank yet!", and maybe switch on caffeine mode during a presentation without needing to type my password in front of cameras. No UI for this, but at least gsettings has it:
gsettings set org.gnome.desktop.screensaver lock-delay 30
Extensions I enabled the Applications Menu extension, since it's impossible to find less famous applications in the Overview without knowing in advance how they're named in the desktop. This stole a precious hotkey, which I had to disable in gsettings:
gsettings set org.gnome.shell.extensions.apps-menu apps-menu-toggle-menu '[]'
I also enabled: I didn't go and look for Gnome Shell extentions outside what is packaged in Debian, as I'm very wary about running JavaScript code randomly downloaded from the internet with full access over my data and desktop interaction. I also took care of checking that the Gnome Shell Extensions web page complains about the lacking "GNOME Shell integration" browser extension, because the web browser shouldn't be allowed to download random JavaScript from the internet and run it with full local access. Yuck. Run program dialog The default run program dialog is almost, but not quite, totally useless to me, as it does not provide completion, not even just for executable names in path, and so it ends up being faster to open a new terminal window and type in there. It's possible, in Gnome Shell settings, to bind a custom command to a key. The resulting keybinding will now show up in gsettings, though it can be located in a more circuitous way by grepping first, and then looking up the resulting path in dconf-editor:
gsettings list-recursively grep custom-key
org.gnome.settings-daemon.plugins.media-keys custom-keybindings ['/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/']
I tried out several run dialogs present in Debian, with sad results, possibly due to most of them not being tested on wayland: Both gmrun and xfrun4 seem like workable options, with xfrun4 being customizable with convenient shortcut prefixes, so xfrun4 it is. TODO I'll try to update these notes as I investigate. Conclusion so far I now have something that seems to work for me. A few papercuts to figure out still, but they seem manageable. It all feels a lot harder than it should be: for something intended to be minimal, Gnome defaults feel horribly cluttered and noisy to me, continuosly getting in the way of getting things done until tamed into being out of the way unless called for. It felt like a device that boots into flashy demo mode, which needs to be switched off before actual use. Thankfully it can be switched off, and now I have notes to do it again if needed. gsettings oddly feels to me like a better UI than the interactive settings managers: it's more comprehensive, more discoverable, more scriptable, and more stable across releases. Most of the Q&A I found on the internet with guidance given on the UI was obsolete, while when given with gsettings command lines it kept being relevant. I also have the feeling that these notes would be easier to understand and follow if given as gsettings invocations instead of descriptions of UI navigation paths. At some point I'll upgrade to Trixie and reevaluate things, and these notes will be a useful checklist for that. Fingers crossed that this time I'll manage to stay on Wayland. If not, I know that Xfce is still there for me, and I can trust it to be both helpful and good at not getting in the way of my work.

20 November 2024

Russell Coker: Solving Spam and Phishing for Corporations

Centralisation and Corporations An advantage of a medium to large company is that it permits specialisation. For example I m currently working in the IT department of a medium sized company and because we have standardised hardware (Dell Latitude and Precision laptops, Dell Precision Tower workstations, and Dell PowerEdge servers) and I am involved in fixing all Linux compatibility issues on that I can fix most problems in a small fraction of the time that I would take to fix on a random computer. There is scope for a lot of debate about the extent to which companies should standardise and centralise things. But for computer problems which can escalate quickly from minor to serious if not approached in the correct manner it s clear that a good deal of centralisation is appropriate. For people doing technical computer work such as programming there s a large portion of the employees who are computer hobbyists who like to fiddle with computers. But if the support system is run well even they will appreciate having computers just work most of the time and for a large portion of the failures having someone immediately recognise the problem, like the issues with NVidia drivers that I have documented so that first line support can implement workarounds without the need for a lengthy investigation. A big problem with email in the modern Internet is the prevalence of Phishing scams. The current corporate approach to this is to send out test Phishing email to people and then force computer security training on everyone who clicks on them. One problem with this is that attackers only need to fool one person on one occasion and when you have hundreds of people doing something on rare occasions that s not part of their core work they will periodically get it wrong. When every test Phishing run finds several people who need extra training it seems obvious to me that this isn t a solution that s working well. I will concede that the majority of people who click on the test Phishing email would probably realise their mistake if asked to enter the password for the corporate email system, but I think it s still clear that this isn t a great solution. Let s imagine for the sake of discussion that everyone in a company was 100% accurate at identifying Phishing email and other scam email, if that was the case would the problem be solved? I believe that even in that hypothetical case it would not be a solved problem due to the wasted time and concentration. People can spend minutes determining if a single email is legitimate. On many occasions I have had relatives and clients forward me email because they are unsure if it s valid, it s great that they seek expert advice when they are unsure about things but it would be better if they didn t have to go to that effort. What we ideally want to do is centralise the anti-Phishing and anti-spam work to a small group of people who are actually good at it and who can recognise patterns by seeing larger quantities of spam. When a spam or Phishing message is sent to 600 people in a company you don t want 600 people to individually consider it, you want one person to recognise it and delete/block all 600. If 600 people each spend one minute considering the matter then that s 10 work hours wasted! The Rationale for Human Filtering For personal email human filtering usually isn t viable because people want privacy. But corporate email isn t private, it s expected that the company can read it under certain circumstances (in most jurisdictions) and having email open in public areas of the office where colleagues might see it is expected. You can visit gmail.com on your lunch break to read personal email but every company policy (and common sense) says to not have actually private correspondence on company systems. The amount of time spent by reception staff in sorting out such email would be less than that taken by individuals. When someone sends a spam to everyone in the company instead of 500 people each spending a couple of minutes working out whether it s legit you have one person who s good at recognising spam (because it s their job) who clicks on a remove mail from this sender from all mailboxes button and 500 messages are deleted and the sender is blocked. Delaying email would be a concern. It s standard practice for CEOs (and C*Os at larger companies) to have a PA receive their email and forward the ones that need their attention. So human vetting of email can work without unreasonable delays. If we had someone checking all email for the entire company probably email to the senior people would never get noticeably delayed and while people like me would get their mail delayed on occasion people doing technical work generally don t have notifications turned on for email because it s a distraction and a fast response isn t needed. There are a few senders where fast response is required, which is mostly corporations sending a click this link within 10 minutes to confirm your password change email. Setting up rules for all such senders that are relevant to work wouldn t be difficult to do. How to Solve This Spam and Phishing became serious problems over 20 years ago and we have had 20 years of evolution of email filtering which still hasn t solved the problem. The vast majority of email addresses in use are run by major managed service providers and they haven t managed to filter out spam/phishing mail effectively so I think we should assume that it s not going to be solved by filtering. There is talk about what AI technology might do for filtering spam/phishing but that same technology can product better crafted hostile email to avoid filters. An additional complication for corporate email filtering is that some criteria that are used to filter personal email don t apply to corporate mail. If someone sends email to me personally about millions of dollars then it s obviously not legit. If someone sends email to a company then it could be legit. Companies routinely have people emailing potential clients about how their products can save millions of dollars and make purchases over a million dollars. This is not a problem that s impossible to solve, it s just an extra difficulty that reduces the efficiency of filters. It seems to me that the best solution to the problem involves having all mail filtered by a human. A company could configure their mail server to not accept direct external mail for any employee s address. Then people could email files to colleagues etc without any restriction but spam and phishing wouldn t be a problem. The issue is how to manage inbound mail. One possibility is to have addresses of the form it+russell.coker@example.com (for me as an employee in the IT department) and you would have a team of people who would read those mailboxes and forward mail to the right people if it seemed legit. Having addresses like it+russell.coker means that all mail to the IT department would be received into folders of the same account and they could be filtered by someone with suitable security level and not require any special configuration of the mail server. So the person who read the is mailbox would have a folder named russell.coker receiving mail addressed to me. The system could be configured to automate the processing of mail from known good addresses (and even domains), so they could just put in a rule saying that when Dell sends DMARC authenticated mail to is+$USER it gets immediately directed to $USER. This is the sort of thing that can be automated in the email client (mail filtering is becoming a common feature in MUAs). For a FOSS implementation of such things the server side of it (including extracting account data from a directory to determine which department a user is in) would be about a day s work and then an option would be to modify a webmail program to have extra functionality for approving senders and sending change requests to the server to automatically direct future mail from the same sender. As an aside I have previously worked on a project that had a modified version of the Horde webmail system to do this sort of thing for challenge-response email and adding certain automated messages to the allow-list. The Change One of the first things to do is configuring the system to add every recipient of an outbound message to the allow list for receiving a reply. Having a script go through the sent-mail folders of all accounts and adding the recipients to the allow lists would be easy and catch the common cases. But even with processing the sent mail folders going from a working system without such things to a system like this will take some time for the initial work of adding addresses to the allow lists, particularly for domain wide additions of all the sites that send password confirmation messages. You would need rules to direct inbound mail to the old addresses to the new style and then address a huge amount of mail that needs to be categorised. If you have 600 employees and the average amount of time taken on the first day is 10 minutes per user then that s 100 hours of work, 12 work days. If you had everyone from the IT department, reception, and executive assistants working on it that would be viable. After about a week there wouldn t be much work involved in maintaining it. Then after that it would be a net win for the company. The Benefits If the average employee spends one minute a day dealing with spam and phishing email then with 600 employees that s 10 hours of wasted time per day. Effectively wasting one employee s work! I m sure that s the low end of the range, 5 minutes average per day doesn t seem unreasonable especially when people are unsure about phishing email and send it to Slack so multiple employees spend time analysing it. So you could have 5 employees being wasted by hostile email and avoiding that would take a fraction of the time of a few people adding up to less than an hour of total work per day. Then there s the training time for phishing mail. Instead of having every employee spend half an hour doing email security training every few months (that s 300 hours or 7.5 working weeks every time you do it) you just train the few experts. In addition to saving time there are significant security benefits to having experts deal with possibly hostile email. Someone who deals with a lot of phishing email is much less likely to be tricked. Will They Do It? They probably won t do it any time soon. I don t think it s expensive enough for companies yet. Maybe government agencies already have equivalent measures in place, but for regular corporations it s probably regarded as too difficult to change anything and the costs aren t obvious. I have been unsuccessful in suggesting that managers spend slightly more on computer hardware to save significant amounts of worker time for 30 years.

18 November 2024

Russ Allbery: Review: Delilah Green Doesn't Care

Review: Delilah Green Doesn't Care, by Ashley Herring Blake
Series: Bright Falls #1
Publisher: Jove
Copyright: February 2022
ISBN: 0-593-33641-0
Format: Kindle
Pages: 374
Delilah Green Doesn't Care is a sapphic romance novel. It's the first of a trilogy, although in the normal romance series fashion each book follows a different protagonist and has its own happy ending. It is apparently classified as romantic comedy, which did not occur to me while reading but which I suppose I can see in retrospect. Delilah Green got the hell out of Bright Falls as soon as she could and tried not to look back. After her father died, her step-mother lavished all of her perfectionist attention on her overachiever step-sister, leaving Delilah feeling like an unwanted ghost. She escaped to New York where there was space for a queer woman with an acerbic personality and a burgeoning career in photography. Her estranged step-sister's upcoming wedding was not a good enough reason to return to the stifling small town of her childhood. The pay for photographing the wedding was, since it amounted to three months of rent and trying to sell photographs in galleries was not exactly a steady living. So back to Bright Falls Delilah goes. Claire never left Bright Falls. She got pregnant young and ended up with a different life than she expected, although not a bad one. Now she's raising her daughter as a single mom, running the town bookstore, and dealing with her unreliable ex. She and Iris are Astrid Parker's best friends and have been since fifth grade, which means she wants to be happy for Astrid's upcoming wedding. There's only one problem: the groom. He's a controlling, boorish ass, but worse, Astrid seems to turn into a different person around him. Someone Claire doesn't like. Then, to make life even more complicated, Claire tries to pick up Astrid's estranged step-sister in Bright Falls's bar without recognizing her. I have a lot of things to say about this novel, but here's the core of my review: I started this book at 4pm on a Saturday because I hadn't read anything so far that day and wanted to at least start a book. I finished it at 11pm, having blown off everything else I had intended to do that evening, completely unable to put it down. It turns out there is a specific type of romance novel protagonist that I absolutely adore: the sarcastic, confident, no-bullshit character who is willing to pick the fights and say the things that the other overly polite and anxious characters aren't able to get out. Astrid does not react well to criticism, for reasons that are far more complicated than it may first appear, and Claire and Iris have been dancing around the obvious problems with her surprise engagement. As the title says, Delilah thinks she doesn't care: she's here to do a job and get out, and maybe she'll get to tweak her annoying step-sister a bit in the process. But that also means that she is unwilling to play along with Astrid's obsessively controlling mother or her obnoxious fiance, and thus, to the barely disguised glee of Claire and Iris, is a direct threat to the tidy life that Astrid's mother is trying to shoehorn her daughter into. This book is a great example of why I prefer sapphic romances: I think this character setup would not work, at least for me, in a heterosexual romance. Delilah's role only works if she's a woman; if a male character were the sarcastic conversational bulldozer, it would be almost impossible to avoid falling into the gender stereotype of a male rescuer. If this were a heterosexual romance trying to avoid that trap, the long-time friend who doesn't know how to directly confront Astrid would have to be the male protagonist. That could work, but it would be a tricky book to write without turning it into a story focused primarily on the subversion of gender roles. Making both protagonists women dodges the problem entirely and gives them so much narrative and conceptual space to simply be themselves, rather than characters obscured by the shadows of societal gender rules. This is also, at it's core, a book about friendship. Claire, Astrid, and Iris have the sort of close-knit friend group that looks exclusive and unapproachable from the outside. Delilah was the stereotypical outsider, mocked and excluded when they thought of her at all. This, at least, is how the dynamics look at the start of the book, but Blake did an impressive job of shifting my understanding of those relationships without changing their essential nature. She fleshes out all of the characters, not just the romantic leads, and adds complexity, nuance, and perspective. And, yes, past misunderstanding, but it's mostly not the cheap sort that sometimes drives romance plots. It's the misunderstanding rooted in remembered teenage social dynamics, the sort of misunderstanding that happens because communication is incredibly difficult, even more difficult when one has no practice or life experience, and requires knowing oneself well enough to even know what to communicate. The encounter between Delilah and Claire in the bar near the start of the book is cornerstone of the plot, but the moment that grabbed me and pulled me in was Delilah's first interaction with Claire's daughter Ruby. That was the point when I knew these were characters I could trust, and Blake never let me down. I love how Ruby is handled throughout this book, with all of the messy complexity of a kid of divorced parents with her own life and her own personality and complicated relationships with both parents that are independent of the relationship their parents have with each other. This is not a perfect book. There's one prank scene that I thought was excessively juvenile and should have been counter-productive, and there's one tricky question of (nonsexual) consent that the book raises and then later seems to ignore in a way that bugged me after I finished it. There is a third-act breakup, which is not my favorite plot structure, but I think Blake handles it reasonably well. I would probably find more niggles and nitpicks if I re-read it more slowly. But it was utterly engrossing reading that exactly matched my mood the day that I picked it up, and that was a fantastic reading experience. I'm not much of a romance reader and am not the traditional audience for sapphic romance, so I'm probably not the person you should be looking to for recommendations, but this is the sort of book that got me to immediately buy all of the sequels and start thinking about a re-read. It's also the sort of book that dragged me back in for several chapters when I was fact-checking bits of my review. Take that recommendation for whatever it's worth. Content note: Reviews of Delilah Green Doesn't Care tend to call it steamy or spicy. I have no calibration for this for romance novels. I did not find it very sex-focused (I have read genre fantasy novels with more sex), but there are several on-page sex scenes if that's something you care about one way or the other. Followed by Astrid Parker Doesn't Fail. Rating: 9 out of 10

8 November 2024

Reproducible Builds (diffoscope): diffoscope 283 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 283. This version includes the following changes:
[ Martin Abente Lahaye ]
* Fix crash when objdump is missing when checking .EFI files.
You find out more by visiting the project homepage.

27 October 2024

Enrico Zini: Typing decorators for class members with optional arguments

This looks straightforward and is far from it. I expect tool support will improve in the future. Meanwhile, this blog post serves as a step by step explanation for what is going on in code that I'm about to push to my team.
Let's take this relatively straightforward python code. It has a function printing an int, and a decorator that makes it argument optional, taking it from a global default if missing:
from unittest import mock
default = 42
def with_default(f):
    def wrapped(self, value=None):
        if value is None:
            value = default
        return f(self, value)
    return wrapped
class Fiddle:
    @with_default
    def print(self, value):
        print("Answer:", value)
fiddle = Fiddle()
fiddle.print(12)
fiddle.print()
def mocked(self, value=None):
    print("Mocked answer:", value)
with mock.patch.object(Fiddle, "print", autospec=True, side_effect=mocked):
    fiddle.print(12)
    fiddle.print()
It works nicely as expected:
$ python3 test0.py
Answer: 12
Answer: 42
Mocked answer: 12
Mocked answer: None
It lacks functools.wraps and typing, though. Let's add them. Adding functools.wraps Adding a simple @functools.wraps, mock unexpectedly stops working:
# python3 test1.py
Answer: 12
Answer: 42
Mocked answer: 12
Traceback (most recent call last):
  File "/home/enrico/lavori/freexian/tt/test1.py", line 42, in <module>
    fiddle.print()
  File "<string>", line 2, in print
  File "/usr/lib/python3.11/unittest/mock.py", line 186, in checksig
    sig.bind(*args, **kwargs)
  File "/usr/lib/python3.11/inspect.py", line 3211, in bind
    return self._bind(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/inspect.py", line 3126, in _bind
    raise TypeError(msg) from None
TypeError: missing a required argument: 'value'
This is the new code, with explanations and a fix:
# Introduce functools
import functools
from unittest import mock
default = 42
def with_default(f):
    @functools.wraps(f)
    def wrapped(self, value=None):
        if value is None:
            value = default
        return f(self, value)
    # Fix:
    # del wrapped.__wrapped__
    return wrapped
class Fiddle:
    @with_default
    def print(self, value):
        assert value is not None
        print("Answer:", value)
fiddle = Fiddle()
fiddle.print(12)
fiddle.print()
def mocked(self, value=None):
    print("Mocked answer:", value)
with mock.patch.object(Fiddle, "print", autospec=True, side_effect=mocked):
    fiddle.print(12)
    # mock's autospec uses inspect.getsignature, which follows __wrapped__ set
    # by functools.wraps, which points to a wrong signature: the idea that
    # value is optional is now lost
    fiddle.print()
Adding typing For simplicity, from now on let's change Fiddle.print to match its wrapped signature:
      # Give up with making value not optional, to simplify things :(
      def print(self, value: int   None = None) -> None:
          assert value is not None
          print("Answer:", value)
Typing with ParamSpec
# Introduce typing, try with ParamSpec
import functools
from typing import TYPE_CHECKING, ParamSpec, Callable
from unittest import mock
default = 42
P = ParamSpec("P")
def with_default(f: Callable[P, None]) -> Callable[P, None]:
    # Using ParamSpec we forward arguments, but we cannot use them!
    @functools.wraps(f)
    def wrapped(self, value: int   None = None) -> None:
        if value is None:
            value = default
        return f(self, value)
    return wrapped
class Fiddle:
    @with_default
    def print(self, value: int   None = None) -> None:
        assert value is not None
        print("Answer:", value)
mypy complains inside the wrapper, because while we forward arguments we don't constrain them, so we can't be sure there is a value in there:
test2.py:17: error: Argument 2 has incompatible type "int"; expected "P.args"  [arg-type]
test2.py:19: error: Incompatible return value type (got "_Wrapped[P, None, [Any, int   None], None]", expected "Callable[P, None]")  [return-value]
test2.py:19: note: "_Wrapped[P, None, [Any, int   None], None].__call__" has type "Callable[[Arg(Any, 'self'), DefaultArg(int   None, 'value')], None]"
Typing with Callable We can use explicit Callable argument lists:
# Introduce typing, try with Callable
import functools
from typing import TYPE_CHECKING, Callable, TypeVar
from unittest import mock
default = 42
A = TypeVar("A")
# Callable cannot represent the fact that the argument is optional, so now mypy
# complains if we try to omit it
def with_default(f: Callable[[A, int   None], None]) -> Callable[[A, int   None], None]:
    @functools.wraps(f)
    def wrapped(self: A, value: int   None = None) -> None:
        if value is None:
            value = default
        return f(self, value)
    return wrapped
class Fiddle:
    @with_default
    def print(self, value: int   None = None) -> None:
        assert value is not None
        print("Answer:", value)
if TYPE_CHECKING:
    reveal_type(Fiddle.print)
fiddle = Fiddle()
fiddle.print(12)
# !! Too few arguments for "print" of "Fiddle"  [call-arg]
fiddle.print()
def mocked(self, value=None):
    print("Mocked answer:", value)
with mock.patch.object(Fiddle, "print", autospec=True, side_effect=mocked):
    fiddle.print(12)
    fiddle.print()
Now mypy complains when we try to omit the optional argument, because Callable cannot represent optional arguments:
test3.py:32: note: Revealed type is "def (test3.Fiddle, Union[builtins.int, None])"
test3.py:37: error: Too few arguments for "print" of "Fiddle"  [call-arg]
test3.py:46: error: Too few arguments for "print" of "Fiddle"  [call-arg]
typing's documentation says:
Callable cannot express complex signatures such as functions that take a variadic number of arguments, overloaded functions, or functions that have keyword-only parameters. However, these signatures can be expressed by defining a Protocol class with a call() method:
Let's do that! Typing with Protocol, take 1
# Introduce typing, try with Protocol
import functools
from typing import TYPE_CHECKING, Protocol, TypeVar, Generic, cast
from unittest import mock
default = 42
A = TypeVar("A", contravariant=True)
class Printer(Protocol, Generic[A]):
    def __call__(_, self: A, value: int   None = None) -> None:
        ...
def with_default(f: Printer[A]) -> Printer[A]:
    @functools.wraps(f)
    def wrapped(self: A, value: int   None = None) -> None:
        if value is None:
            value = default
        return f(self, value)
    return cast(Printer, wrapped)
class Fiddle:
    # function has a __get__ method to generated bound versions of itself
    # the Printer protocol does not define it, so mypy is now unable to type
    # the bound method correctly
    @with_default
    def print(self, value: int   None = None) -> None:
        assert value is not None
        print("Answer:", value)
if TYPE_CHECKING:
    reveal_type(Fiddle.print)
fiddle = Fiddle()
# !! Argument 1 to "__call__" of "Printer" has incompatible type "int"; expected "Fiddle"
fiddle.print(12)
fiddle.print()
def mocked(self, value=None):
    print("Mocked answer:", value)
with mock.patch.object(Fiddle, "print", autospec=True, side_effect=mocked):
    fiddle.print(12)
    fiddle.print()
New mypy complaints:
test4.py:41: error: Argument 1 to "__call__" of "Printer" has incompatible type "int"; expected "Fiddle"  [arg-type]
test4.py:42: error: Missing positional argument "self" in call to "__call__" of "Printer"  [call-arg]
test4.py:50: error: Argument 1 to "__call__" of "Printer" has incompatible type "int"; expected "Fiddle"  [arg-type]
test4.py:51: error: Missing positional argument "self" in call to "__call__" of "Printer"  [call-arg]
What happens with class methods, is that the function object has a __get__ method that generates a bound versions of itself. Our Printer protocol does not define it, so mypy is now unable to type the bound method correctly. Typing with Protocol, take 2 So... we add the function descriptor methos to our Protocol! A lot of this is taken from this discussion.
# Introduce typing, try with Protocol, harder!
import functools
from typing import TYPE_CHECKING, Protocol, TypeVar, Generic, cast, overload, Union
from unittest import mock
default = 42
A = TypeVar("A", contravariant=True)
# We now produce typing for the whole function descriptor protocol
#
# See https://github.com/python/typing/discussions/1040
class BoundPrinter(Protocol):
    """Protocol typing for bound printer methods."""
    def __call__(_, value: int   None = None) -> None:
        """Bound signature."""
class Printer(Protocol, Generic[A]):
    """Protocol typing for printer methods."""
    # noqa annotations are overrides for flake8 being confused, giving either D418:
    # Function/ Method decorated with @overload shouldn't contain a docstring
    # or D105:
    # Missing docstring in magic method
    #
    # F841 is for vulture being confused:
    #   unused variable 'objtype' (100% confidence)
    @overload
    def __get__(  # noqa: D105
        self, obj: A, objtype: type[A]   None = None  # noqa: F841
    ) -> BoundPrinter:
        ...
    @overload
    def __get__(  # noqa: D105
        self, obj: None, objtype: type[A]   None = None  # noqa: F841
    ) -> "Printer[A]":
        ...
    def __get__(
        self, obj: A   None, objtype: type[A]   None = None  # noqa: F841
    ) -> Union[BoundPrinter, "Printer[A]"]:
        """Implement function descriptor protocol for class methods."""
    def __call__(_, self: A, value: int   None = None) -> None:
        """Unbound signature."""
def with_default(f: Printer[A]) -> Printer[A]:
    @functools.wraps(f)
    def wrapped(self: A, value: int   None = None) -> None:
        if value is None:
            value = default
        return f(self, value)
    return cast(Printer, wrapped)
class Fiddle:
    # function has a __get__ method to generated bound versions of itself
    # the Printer protocol does not define it, so mypy is now unable to type
    # the bound method correctly
    @with_default
    def print(self, value: int   None = None) -> None:
        assert value is not None
        print("Answer:", value)
fiddle = Fiddle()
fiddle.print(12)
fiddle.print()
def mocked(self, value=None):
    print("Mocked answer:", value)
with mock.patch.object(Fiddle, "print", autospec=True, side_effect=mocked):
    fiddle.print(12)
    fiddle.print()
It works! It's typed! And mypy is happy!

7 October 2024

Reproducible Builds: Reproducible Builds in September 2024

Welcome to the September 2024 report from the Reproducible Builds project! Our reports attempt to outline what we ve been up to over the past month, highlighting news items from elsewhere in tech where they are related. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. New binsider tool to analyse ELF binaries
  2. Unreproducibility of GHC Haskell compiler 95% fixed
  3. Mailing list summary
  4. Towards a 100% bit-for-bit reproducible OS
  5. Two new reproducibility-related academic papers
  6. Distribution work
  7. diffoscope
  8. Other software development
  9. Android toolchain core count issue reported
  10. New Gradle plugin for reproducibility
  11. Website updates
  12. Upstream patches
  13. Reproducibility testing framework

New binsider tool to analyse ELF binaries Reproducible Builds developer Orhun Parmaks z has announced a fantastic new tool to analyse the contents of ELF binaries. According to the project s README page:
Binsider can perform static and dynamic analysis, inspect strings, examine linked libraries, and perform hexdumps, all within a user-friendly terminal user interface!
More information about Binsider s features and how it works can be found within Binsider s documentation pages.

Unreproducibility of GHC Haskell compiler 95% fixed A seven-year-old bug about the nondeterminism of object code generated by the Glasgow Haskell Compiler (GHC) received a recent update, consisting of Rodrigo Mesquita noting that the issue is:
95% fixed by [merge request] !12680 when -fobject-determinism is enabled. [ ]
The linked merge request has since been merged, and Rodrigo goes on to say that:
After that patch is merged, there are some rarer bugs in both interface file determinism (eg. #25170) and in object determinism (eg. #25269) that need to be taken care of, but the great majority of the work needed to get there should have been merged already. When merged, I think we should close this one in favour of the more specific determinism issues like the two linked above.

Mailing list summary On our mailing list this month:
  • Fay Stegerman let everyone know that she started a thread on the Fediverse about the problems caused by unreproducible zlib/deflate compression in .zip and .apk files and later followed up with the results of her subsequent investigation.
  • Long-time developer kpcyrd wrote that there has been a recent public discussion on the Arch Linux GitLab [instance] about the challenges and possible opportunities for making the Linux kernel package reproducible , all relating to the CONFIG_MODULE_SIG flag. [ ]
  • Bernhard M. Wiedemann followed-up to an in-person conversation at our recent Hamburg 2024 summit on the potential presence for Reproducible Builds in recognised standards. [ ]
  • Fay Stegerman also wrote about her worry about the possible repercussions for RB tooling of Debian migrating from zlib to zlib-ng as reproducibility requires identical compressed data streams. [ ]
  • Martin Monperrus wrote the list announcing the latest release of maven-lockfile that is designed aid building Maven projects with integrity . [ ]
  • Lastly, Bernhard M. Wiedemann wrote about potential role of reproducible builds in combatting silent data corruption, as detailed in a recent Tweet and scholarly paper on faulty CPU cores. [ ]

Towards a 100% bit-for-bit reproducible OS Bernhard M. Wiedemann began writing on journey towards a 100% bit-for-bit reproducible operating system on the openSUSE wiki:
This is a report of Part 1 of my journey: building 100% bit-reproducible packages for every package that makes up [openSUSE s] minimalVM image. This target was chosen as the smallest useful result/artifact. The larger package-sets get, the more disk-space and build-power is required to build/verify all of them.
This work was sponsored by NLnet s NGI Zero fund.

Distribution work In Debian this month, 14 reviews of Debian packages were added, 12 were updated and 20 were removed, all adding to our knowledge about identified issues. A number of issue types were updated as well. [ ][ ] In addition, Holger opened 4 bugs against the debrebuild component of the devscripts suite of tools. In particular:
  • #1081047: Fails to download .dsc file.
  • #1081048: Does not work with a proxy.
  • #1081050: Fails to create a debrebuild.tar.
  • #1081839: Fails with E: mmdebstrap failed to run error.
Last month, an issue was filed to update the Salsa CI pipeline (used by 1,000s of Debian packages) to no longer test for reproducibility with reprotest s build_path variation. Holger Levsen provided a rationale for this change in the issue, which has already been made to the tests being performed by tests.reproducible-builds.org. This month, this issue was closed by Santiago R. R., nicely explaining that build path variation is no longer the default, and, if desired, how developers may enable it again. In openSUSE news, Bernhard M. Wiedemann published another report for that distribution.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading version 278 to Debian:
  • New features:
    • Add a helpful contextual message to the output if comparing Debian .orig tarballs within .dsc files without the ability to fuzzy-match away the leading directory. [ ]
  • Bug fixes:
    • Drop removal of calculated os.path.basename from GNU readelf output. [ ]
    • Correctly invert X% similar value and do not emit 100% similar . [ ]
  • Misc:
    • Temporarily remove procyon-decompiler from Build-Depends as it was removed from testing (via #1057532). (#1082636)
    • Update copyright years. [ ]
For trydiffoscope, the command-line client for the web-based version of diffoscope, Chris Lamb also:
  • Added an explicit python3-setuptools dependency. (#1080825)
  • Bumped the Standards-Version to 4.7.0. [ ]

Other software development disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into system calls to reliably flush out reproducibility issues. This month, version 0.5.11-4 was uploaded to Debian unstable by Holger Levsen making the following changes:
  • Replace build-dependency on the obsolete pkg-config package with one on pkgconf, following a Lintian check. [ ]
  • Bump Standards-Version field to 4.7.0, with no related changes needed. [ ]

In addition, reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, version 0.7.28 was uploaded to Debian unstable by Holger Levsen including a change by Jelle van der Waa to move away from the pipes Python module to shlex, as the former will be removed in Python version 3.13 [ ].

Android toolchain core count issue reported Fay Stegerman reported an issue with the Android toolchain where a part of the build system generates a different classes.dex file (and thus a different .apk) depending on the number of cores available during the build, thereby breaking Reproducible Builds:
We ve rebuilt [tag v3.6.1] multiple times (each time in a fresh container): with 2, 4, 6, 8, and 16 cores available, respectively:
  • With 2 and 4 cores we always get an unsigned APK with SHA-256 14763d682c9286ef .
  • With 6, 8, and 16 cores we get an unsigned APK with SHA-256 35324ba4c492760 instead.

New Gradle plugin for reproducibility A new plugin for the Gradle build tool for Java has been released. This easily-enabled plugin results in:
reproducibility settings [being] applied to some of Gradle s built-in tasks that should really be the default. Compatible with Java 8 and Gradle 8.3 or later.

Website updates There were a rather substantial number of improvements made to our website this month, including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In September, a number of changes were made by Holger Levsen, including:
  • Debian-related changes:
    • Upgrade the osuosl4 node to Debian trixie in anticipation of running debrebuild and rebuilderd there. [ ][ ][ ]
    • Temporarily mark the osuosl4 node as offline due to ongoing xfs_repair filesystem maintenance. [ ][ ]
    • Do not warn about (very old) broken nodes. [ ]
    • Add the risc64 architecture to the multiarch version skew tests for Debian trixie and sid. [ ][ ][ ]
    • Mark the virt 32,64 b nodes as down. [ ]
  • Misc changes:
    • Add support for powercycling OpenStack instances. [ ]
    • Update the fail2ban to ban hosts for 4 weeks in total [ ][ ] and take care to never ban our own Jenkins instance. [ ]
In addition, Vagrant Cascadian recorded a disk failure for the virt32b and virt64b nodes [ ], performed some maintenance of the cbxi4a node [ ][ ] and marked most armhf architecture systems as being back online.

Finally, If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

29 September 2024

Vasudev Kamath: Signing the systemd-boot on Upgrade Using Dpkg Triggers

In my previous post on enabling SecureBoot, I mentioned that one pending improvement was signing the systemd-boot EFI binary with my keys on every upgrade. In this post, we'll explore the implementation of this process using dpkg triggers. For an excellent introduction to dpkg triggers, refer to this archived blog post. The source code mentioned in that post can be downloaded from alioth archive. From /usr/share/doc/dpkg/spec/triggers.txt, triggers are described as follows:
A dpkg trigger is a facility that allows events caused by one package but of interest to another package to be recorded and aggregated, and processed later by the interested package. This feature simplifies various registration and system-update tasks and reduces duplication of processing.
To implement this, we create a custom package with a single script that signs the systemd-boot EFI binary using our key. The script is as simple as:
#!/bin/bash
set -e
echo "Signing the new systemd-bootx64.efi"
sbsign --key /etc/secureboot/db.key --cert /etc/secureboot/db.crt \
       /usr/lib/systemd/boot/efi/systemd-bootx64.efi
echo "Invoking bootctl install to copy stuff"
bootctl install
Invoking bootctl install is optional if we have enabled systemd-boot-update.service, which will update the signed bootloader on the next boot. We need to have a triggers file under the debian/ folder of the package, which declares its interest in modifications to the path /usr/lib/systemd/boot/efi/systemd-bootx64.efi. The trigger file looks like this:
# trigger 1 interest on systemd-bootx64.efi
interest-noawait /usr/lib/systemd/boot/efi/systemd-bootx64.efi
You can read about various directives and their meanings that can be used in the triggers file in the deb-triggers man page. Once we build and install the package, this request is added to /var/lib/dpkg/triggers/File. See the screenshot below after installation of our package: installed trigger To test the functionality, I performed a re-installation of the systemd-boot-efi package, which provides the EFI binary for systemd-boot, using the following command:
sudo apt install --reinstall systemd-boot-efi
During installation, you can see the debug message being printed in the screenshot below: systemd-boot-signer triggered To test the systemd-boot-update.service, I commented out the bootctl install line from the above script, performed a reinstallation, and restarted the systemd-boot-update.service. Checking the log, I saw the following:
Sep 29 13:42:51 chamunda systemd[1]: Stopping systemd-boot-update.service - Automatic Boot Loader Update...
Sep 29 13:42:51 chamunda systemd[1]: Starting systemd-boot-update.service - Automatic Boot Loader Update...
Sep 29 13:42:51 chamunda bootctl[1801516]: Skipping "/efi/EFI/systemd/systemd-bootx64.efi", same boot loader version in place already.
Sep 29 13:42:51 chamunda bootctl[1801516]: Skipping "/efi/EFI/BOOT/BOOTX64.EFI", same boot loader version in place already.
Sep 29 13:42:51 chamunda bootctl[1801516]: Skipping "/efi/EFI/BOOT/BOOTX64.EFI", same boot loader version in place already.
Sep 29 13:42:51 chamunda systemd[1]: Finished systemd-boot-update.service - Automatic Boot Loader Update.
Sep 29 13:43:37 chamunda systemd[1]: systemd-boot-update.service: Deactivated successfully.
Sep 29 13:43:37 chamunda systemd[1]: Stopped systemd-boot-update.service - Automatic Boot Loader Update.
Sep 29 13:43:37 chamunda systemd[1]: Stopping systemd-boot-update.service - Automatic Boot Loader Update...
Indeed, the service attempted to copy the bootloader but did not do so because there was no actual update to the binary; it was just a reinstallation trigger. The complete code for this package can be found here. With this post the entire series on using UKI to Secureboot with Debian comes to an end. Happy hacking!.

26 September 2024

Melissa Wen: Reflections on 2024 Linux Display Next Hackfest

Hey everyone! The 2024 Linux Display Next hackfest concluded in May, and its outcomes continue to shape the Linux Display stack. Igalia hosted this year s event in A Coru a, Spain, bringing together leading experts in the field. Samuel Iglesias and I organized this year s edition and this blog post summarizes the experience and its fruits. One of the highlights of this year s hackfest was the wide range of backgrounds represented by our 40 participants (both on-site and remotely). Developers and experts from various companies and open-source projects came together to advance the Linux Display ecosystem. You can find the list of participants here. The event covered a broad spectrum of topics affecting the development of Linux projects, user experiences, and the future of display technologies on Linux. From cutting-edge topics to long-term discussions, you can check the event agenda here.

Organization Highlights The hackfest was marked by in-depth discussions and knowledge sharing among Linux contributors, making everyone inspired, informed, and connected to the community. Building on feedback from the previous year, we refined the unconference format to enhance participant preparation and engagement. Structured Agenda and Timeboxes: Each session had a defined scope, time limit (1h20 or 2h10), and began with an introductory talk on the topic.
  • Participant-Led Discussions: We pre-selected in-person participants to lead discussions, allowing them to prepare introductions, resources, and scope.
  • Transparent Scheduling: The schedule was shared in advance as GitHub issues, encouraging participants to review and prepare for sessions of interest.
Engaging Sessions: The hackfest featured a variety of topics, including presentations and discussions on how participants were addressing specific subjects within their companies.
  • No Breakout Rooms, No Overlaps: All participants chose to attend all sessions, eliminating the need for separate breakout rooms. We also adapted run-time schedule to keep everybody involved in the same topics.
  • Real-time Updates: We provided notifications and updates through dedicated emails and the event matrix room.
Strengthening Community Connections: The hackfest offered ample opportunities for networking among attendees.
  • Social Events: Igalia sponsored coffee breaks, lunches, and a dinner at a local restaurant.
  • Museum Visit: Participants enjoyed a sponsored visit to the Museum of Estrela Galicia Beer (MEGA).

Fruitful Discussions and Follow-up The structured agenda and breaks allowed us to cover multiple topics during the hackfest. These discussions have led to new display feature development and improvements, as evidenced by patches, merge requests, and implementations in project repositories and mailing lists. With the KMS color management API taking shape, we discussed refinements and best approaches to cover the variety of color pipeline from different hardware-vendors. We are also investigating techniques for a performant SDR<->HDR content reproduction and reducing latency and power consumption when using the color blocks of the hardware.

Color Management/HDR Color Management and HDR continued to be the hottest topic of the hackfest. We had three sessions dedicated to discuss Color and HDR across Linux Display stack layers.

Color/HDR (Kernel-Level) Harry Wentland (AMD) led this session. Here, kernel Developers shared the Color Management pipeline of AMD, Intel and NVidia. We counted with diagrams and explanations from HW-vendors developers that discussed differences, constraints and paths to fit them into the KMS generic color management properties such as advertising modeset needs, IN\_FORMAT, segmented LUTs, interpolation types, etc. Developers from Qualcomm and ARM also added information regarding their hardware. Upstream work related to this session:

Color/HDR (Compositor-Level) Sebastian Wick (RedHat) led this session. It started with Sebastian s presentation covering Wayland color protocols and compositor implementation. Also, an explanation of APIs provided by Wayland and how they can be used to achieve better color management for applications and discussions around ICC profiles and color representation metadata. There was also an intensive Q&A about LittleCMS with Marti Maria. Upstream work related to this session:

Color/HDR (Use Cases and Testing) Christopher Cameron (Google) and Melissa Wen (Igalia) led this session. In contrast to the other sessions, here we focused less on implementation and more on brainstorming and reflections of real-world SDR and HDR transformations (use and validation) and gainmaps. Christopher gave a nice presentation explaining HDR gainmap images and how we should think of HDR. This presentation and Q&A were important to put participants at the same page of how to transition between SDR and HDR and somehow emulating HDR. We also discussed on the usage of a kernel background color property. Finally, we discussed a bit about Chamelium and the future of VKMS (future work and maintainership).

Power Savings vs Color/Latency Mario Limonciello (AMD) led this session. Mario gave an introductory presentation about AMD ABM (adaptive backlight management) that is similar to Intel DPST. After some discussions, we agreed on exposing a kernel property for power saving policy. This work was already merged on kernel and the userspace support is under development. Upstream work related to this session:

Strategy for video and gaming use-cases Leo Li (AMD) led this session. Miguel Casas (Google) started this session with a presentation of Overlays in Chrome/OS Video, explaining the main goal of power saving by switching off GPU for accelerated compositing and the challenges of different colorspace/HDR for video on Linux. Then Leo Li presented different strategies for video and gaming and we discussed the userspace need of more detailed feedback mechanisms to understand failures when offloading. Also, creating a debugFS interface came up as a tool for debugging and analysis.

Real-time scheduling and async KMS API Xaver Hugl (KDE/BlueSystems) led this session. Compositor developers have exposed some issues with doing real-time scheduling and async page flips. One is that the Kernel limits the lifetime of realtime threads and if a modeset takes too long, the thread will be killed and thus the compositor as well. Also, simple page flips take longer than expected and drivers should optimize them. Another issue is the lack of feedback to compositors about hardware programming time and commit deadlines (the lastest possible time to commit). This is difficult to predict from drivers, since it varies greatly with the type of properties. For example, color management updates take much longer. In this regard, we discusssed implementing a hw_done callback to timestamp when the hardware programming of the last atomic commit is complete. Also an API to pre-program color pipeline in a kind of A/B scheme. It may not be supported by all drivers, but might be useful in different ways.

VRR/Frame Limit, Display Mux, Display Control, and more and beer We also had sessions to discuss a new KMS API to mitigate headaches on VRR and Frame Limit as different brightness level at different refresh rates, abrupt changes of refresh rates, low frame rate compensation (LFC) and precise timing in VRR more. On Display Control we discussed features missing in the current KMS interface for HDR mode, atomic backlight settings, source-based tone mapping, etc. We also discussed the need of a place where compositor developers can post TODOs to be developed by KMS people. The Content-adaptive Scaling and Sharpening session focused on sharpening and scaling filters. In the Display Mux session, we discussed proposals to expose the capability of dynamic mux switching display signal between discrete and integrated GPUs. In the last session of the 2024 Display Next Hackfest, participants representing different compositors summarized current and future work and built a Linux Display wish list , which includes: improvements to VTTY and HDR switching, better dmabuf API for multi-GPU support, definition of tone mapping, blending and scaling sematics, and wayland protocols for advertising to clients which colorspaces are supported. We closed this session with a status update on feature development by compositors, including but not limited to: plane offloading (from libcamera to output) / HDR video offloading (dma-heaps) / plane-based scrolling for web pages, color management / HDR / ICC profiles support, addressing issues such as flickering when color primaries don t match, etc. After three days of intensive discussions, all in-person participants went to a guided tour at the Museum of Extrela Galicia beer (MEGA), pouring and tasting the most famous local beer.

Feedback and Future Directions Participants provided valuable feedback on the hackfest, including suggestions for future improvements.
  • Schedule and Break-time Setup: Having a pre-defined agenda and schedule provided a better balance between long discussions and mental refreshments, preventing the fatigue caused by endless discussions.
  • Action Points: Some participants recommended explicitly asking for action points at the end of each session and assigning people to follow-up tasks.
  • Remote Participation: Remote attendees appreciated the inclusive setup and opportunities to actively participate in discussions.
  • Technical Challenges: There were bandwidth and video streaming issues during some sessions due to the large number of participants.

Thank you for joining the 2024 Display Next Hackfest We can t help but thank the 40 participants, who engaged in-person or virtually on relevant discussions, for a collaborative evolution of the Linux display stack and for building an insightful agenda. A big thank you to the leaders and presenters of the nine sessions: Christopher Cameron (Google), Harry Wentland (AMD), Leo Li (AMD), Mario Limoncello (AMD), Sebastian Wick (RedHat) and Xaver Hugl (KDE/BlueSystems) for the effort in preparing the sessions, explaining the topic and guiding discussions. My acknowledge to the others in-person participants that made such an effort to travel to A Coru a: Alex Goins (NVIDIA), David Turner (Raspberry Pi), Georges Stavracas (Igalia), Joan Torres (SUSE), Liviu Dudau (Arm), Louis Chauvet (Bootlin), Robert Mader (Collabora), Tian Mengge (GravityXR), Victor Jaquez (Igalia) and Victoria Brekenfeld (System76). It was and awesome opportunity to meet you and chat face-to-face. Finally, thanks virtual participants who couldn t make it in person but organized their days to actively participate in each discussion, adding different perspectives and valuable inputs even remotely: Abhinav Kumar (Qualcomm), Chaitanya Borah (Intel), Christopher Braga (Qualcomm), Dor Askayo (Red Hat), Jiri Koten (RedHat), Jonas dahl (Red Hat), Leandro Ribeiro (Collabora), Marti Maria (Little CMS), Marijn Suijten, Mario Kleiner, Martin Stransky (Red Hat), Michel D nzer (Red Hat), Miguel Casas-Sanchez (Google), Mitulkumar Golani (Intel), Naveen Kumar (Intel), Niels De Graef (Red Hat), Pekka Paalanen (Collabora), Pichika Uday Kiran (AMD), Shashank Sharma (AMD), Sriharsha PV (AMD), Simon Ser, Uma Shankar (Intel) and Vikas Korjani (AMD). We look forward to another successful Display Next hackfest, continuing to drive innovation and improvement in the Linux display ecosystem!

25 September 2024

Melissa Wen: Reflections on 2024 Linux Display Next Hackfest

Hey everyone! The 2024 Linux Display Next hackfest concluded in May, and its outcomes continue to shape the Linux Display stack. Igalia hosted this year s event in A Coru a, Spain, bringing together leading experts in the field. Samuel Iglesias and I organized this year s edition and this blog post summarizes the experience and its fruits. One of the highlights of this year s hackfest was the wide range of backgrounds represented by our 40 participants (both on-site and remotely). Developers and experts from various companies and open-source projects came together to advance the Linux Display ecosystem. You can find the list of participants here. The event covered a broad spectrum of topics affecting the development of Linux projects, user experiences, and the future of display technologies on Linux. From cutting-edge topics to long-term discussions, you can check the event agenda here.

Organization Highlights The hackfest was marked by in-depth discussions and knowledge sharing among Linux contributors, making everyone inspired, informed, and connected to the community. Building on feedback from the previous year, we refined the unconference format to enhance participant preparation and engagement. Structured Agenda and Timeboxes: Each session had a defined scope, time limit (1h20 or 2h10), and began with an introductory talk on the topic.
  • Participant-Led Discussions: We pre-selected in-person participants to lead discussions, allowing them to prepare introductions, resources, and scope.
  • Transparent Scheduling: The schedule was shared in advance as GitHub issues, encouraging participants to review and prepare for sessions of interest.
Engaging Sessions: The hackfest featured a variety of topics, including presentations and discussions on how participants were addressing specific subjects within their companies.
  • No Breakout Rooms, No Overlaps: All participants chose to attend all sessions, eliminating the need for separate breakout rooms. We also adapted run-time schedule to keep everybody involved in the same topics.
  • Real-time Updates: We provided notifications and updates through dedicated emails and the event matrix room.
Strengthening Community Connections: The hackfest offered ample opportunities for networking among attendees.
  • Social Events: Igalia sponsored coffee breaks, lunches, and a dinner at a local restaurant.
  • Museum Visit: Participants enjoyed a sponsored visit to the Museum of Estrela Galicia Beer (MEGA).

Fruitful Discussions and Follow-up The structured agenda and breaks allowed us to cover multiple topics during the hackfest. These discussions have led to new display feature development and improvements, as evidenced by patches, merge requests, and implementations in project repositories and mailing lists. With the KMS color management API taking shape, we discussed refinements and best approaches to cover the variety of color pipeline from different hardware-vendors. We are also investigating techniques for a performant SDR<->HDR content reproduction and reducing latency and power consumption when using the color blocks of the hardware.

Color Management/HDR Color Management and HDR continued to be the hottest topic of the hackfest. We had three sessions dedicated to discuss Color and HDR across Linux Display stack layers.

Color/HDR (Kernel-Level) Harry Wentland (AMD) led this session. Here, kernel Developers shared the Color Management pipeline of AMD, Intel and NVidia. We counted with diagrams and explanations from HW-vendors developers that discussed differences, constraints and paths to fit them into the KMS generic color management properties such as advertising modeset needs, IN\_FORMAT, segmented LUTs, interpolation types, etc. Developers from Qualcomm and ARM also added information regarding their hardware. Upstream work related to this session:

Color/HDR (Compositor-Level) Sebastian Wick (RedHat) led this session. It started with Sebastian s presentation covering Wayland color protocols and compositor implementation. Also, an explanation of APIs provided by Wayland and how they can be used to achieve better color management for applications and discussions around ICC profiles and color representation metadata. There was also an intensive Q&A about LittleCMS with Marti Maria. Upstream work related to this session:

Color/HDR (Use Cases and Testing) Christopher Cameron (Google) and Melissa Wen (Igalia) led this session. In contrast to the other sessions, here we focused less on implementation and more on brainstorming and reflections of real-world SDR and HDR transformations (use and validation) and gainmaps. Christopher gave a nice presentation explaining HDR gainmap images and how we should think of HDR. This presentation and Q&A were important to put participants at the same page of how to transition between SDR and HDR and somehow emulating HDR. We also discussed on the usage of a kernel background color property. Finally, we discussed a bit about Chamelium and the future of VKMS (future work and maintainership).

Power Savings vs Color/Latency Mario Limonciello (AMD) led this session. Mario gave an introductory presentation about AMD ABM (adaptive backlight management) that is similar to Intel DPST. After some discussions, we agreed on exposing a kernel property for power saving policy. This work was already merged on kernel and the userspace support is under development. Upstream work related to this session:

Strategy for video and gaming use-cases Leo Li (AMD) led this session. Miguel Casas (Google) started this session with a presentation of Overlays in Chrome/OS Video, explaining the main goal of power saving by switching off GPU for accelerated compositing and the challenges of different colorspace/HDR for video on Linux. Then Leo Li presented different strategies for video and gaming and we discussed the userspace need of more detailed feedback mechanisms to understand failures when offloading. Also, creating a debugFS interface came up as a tool for debugging and analysis.

Real-time scheduling and async KMS API Xaver Hugl (KDE/BlueSystems) led this session. Compositor developers have exposed some issues with doing real-time scheduling and async page flips. One is that the Kernel limits the lifetime of realtime threads and if a modeset takes too long, the thread will be killed and thus the compositor as well. Also, simple page flips take longer than expected and drivers should optimize them. Another issue is the lack of feedback to compositors about hardware programming time and commit deadlines (the lastest possible time to commit). This is difficult to predict from drivers, since it varies greatly with the type of properties. For example, color management updates take much longer. In this regard, we discusssed implementing a hw_done callback to timestamp when the hardware programming of the last atomic commit is complete. Also an API to pre-program color pipeline in a kind of A/B scheme. It may not be supported by all drivers, but might be useful in different ways.

VRR/Frame Limit, Display Mux, Display Control, and more and beer We also had sessions to discuss a new KMS API to mitigate headaches on VRR and Frame Limit as different brightness level at different refresh rates, abrupt changes of refresh rates, low frame rate compensation (LFC) and precise timing in VRR more. On Display Control we discussed features missing in the current KMS interface for HDR mode, atomic backlight settings, source-based tone mapping, etc. We also discussed the need of a place where compositor developers can post TODOs to be developed by KMS people. The Content-adaptive Scaling and Sharpening session focused on sharpening and scaling filters. In the Display Mux session, we discussed proposals to expose the capability of dynamic mux switching display signal between discrete and integrated GPUs. In the last session of the 2024 Display Next Hackfest, participants representing different compositors summarized current and future work and built a Linux Display wish list , which includes: improvements to VTTY and HDR switching, better dmabuf API for multi-GPU support, definition of tone mapping, blending and scaling sematics, and wayland protocols for advertising to clients which colorspaces are supported. We closed this session with a status update on feature development by compositors, including but not limited to: plane offloading (from libcamera to output) / HDR video offloading (dma-heaps) / plane-based scrolling for web pages, color management / HDR / ICC profiles support, addressing issues such as flickering when color primaries don t match, etc. After three days of intensive discussions, all in-person participants went to a guided tour at the Museum of Extrela Galicia beer (MEGA), pouring and tasting the most famous local beer.

Feedback and Future Directions Participants provided valuable feedback on the hackfest, including suggestions for future improvements.
  • Schedule and Break-time Setup: Having a pre-defined agenda and schedule provided a better balance between long discussions and mental refreshments, preventing the fatigue caused by endless discussions.
  • Action Points: Some participants recommended explicitly asking for action points at the end of each session and assigning people to follow-up tasks.
  • Remote Participation: Remote attendees appreciated the inclusive setup and opportunities to actively participate in discussions.
  • Technical Challenges: There were bandwidth and video streaming issues during some sessions due to the large number of participants.

Thank you for joining the 2024 Display Next Hackfest We can t help but thank the 40 participants, who engaged in-person or virtually on relevant discussions, for a collaborative evolution of the Linux display stack and for building an insightful agenda. A big thank you to the leaders and presenters of the nine sessions: Christopher Cameron (Google), Harry Wentland (AMD), Leo Li (AMD), Mario Limoncello (AMD), Sebastian Wick (RedHat) and Xaver Hugl (KDE/BlueSystems) for the effort in preparing the sessions, explaining the topic and guiding discussions. My acknowledge to the others in-person participants that made such an effort to travel to A Coru a: Alex Goins (NVIDIA), David Turner (Raspberry Pi), Georges Stavracas (Igalia), Joan Torres (SUSE), Liviu Dudau (Arm), Louis Chauvet (Bootlin), Robert Mader (Collabora), Tian Mengge (GravityXR), Victor Jaquez (Igalia) and Victoria Brekenfeld (System76). It was and awesome opportunity to meet you and chat face-to-face. Finally, thanks virtual participants who couldn t make it in person but organized their days to actively participate in each discussion, adding different perspectives and valuable inputs even remotely: Abhinav Kumar (Qualcomm), Chaitanya Borah (Intel), Christopher Braga (Qualcomm), Dor Askayo, Jiri Koten (RedHat), Jonas dahl (Red Hat), Leandro Ribeiro (Collabora), Marti Maria (Little CMS), Marijn Suijten, Mario Kleiner, Martin Stransky (Red Hat), Michel D nzer (Red Hat), Miguel Casas-Sanchez (Google), Mitulkumar Golani (Intel), Naveen Kumar (Intel), Niels De Graef (Red Hat), Pekka Paalanen (Collabora), Pichika Uday Kiran (AMD), Shashank Sharma (AMD), Sriharsha PV (AMD), Simon Ser, Uma Shankar (Intel) and Vikas Korjani (AMD). We look forward to another successful Display Next hackfest, continuing to drive innovation and improvement in the Linux display ecosystem!

11 September 2024

Jamie McClelland: MariaDB mystery

I keep getting an error in our backup logs:
Sep 11 05:08:03 Warning: mysqldump: Error 2013: Lost connection to server during query when dumping table  1C4Uonkwhe_options  at row: 1402
Sep 11 05:08:03 Warning: Failed to dump mysql databases ic_wp
It s a WordPress database having trouble dumping the options table. The error log has a corresponding message:
Sep 11 13:50:11 mysql007 mariadbd[580]: 2024-09-11 13:50:11 69577 [Warning] Aborted connection 69577 to db: 'ic_wp' user: 'root' host: 'localhost' (Got an error writing communication packets)
The Internet is full of suggestions, almost all of which either focus on the network connection between the client and the server or the FEDERATED plugin. We aren t using the federated plugin and this error happens when conneting via the socket. Check it out - what is better than a consistently reproducible problem! It happens if I try to select all the values in the table:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options' ic_wp > /dev/null
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
It happens when I specifiy one specific offset:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
It happens if I specify the field name explicitly:
root@mysql007:~# mysql --protocol=socket -e 'select option_id,option_name,option_value,autoload from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
It doesn t happen if I specify the key field:
root@mysql007:~# mysql --protocol=socket -e 'select option_id from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
+-----------+
  option_id  
+-----------+
   16296351  
+-----------+
root@mysql007:~#
It does happen if I specify the value field:
root@mysql007:~# mysql --protocol=socket -e 'select option_value from 1C4Uonkwhe_options limit 1 offset 1402' ic_wp
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
It doesn t happen if I query the specific row by key field:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
+-----------+----------------------+--------------+----------+
  option_id   option_name            option_value   autoload  
+-----------+----------------------+--------------+----------+
   16296351   z_taxonomy_image8905                  yes       
+-----------+----------------------+--------------+----------+
root@mysql007:~#
Hm. Surely there is some funky non-printing character in that option_value right?
root@mysql007:~# mysql --protocol=socket -e 'select CHAR_LENGTH(option_value) from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
+---------------------------+
  CHAR_LENGTH(option_value)  
+---------------------------+
                          0  
+---------------------------+
root@mysql007:~# mysql --protocol=socket -e 'select HEX(option_value) from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
+-------------------+
  HEX(option_value)  
+-------------------+
                     
+-------------------+
root@mysql007:~#
Resetting the value to an empty value doesn t make a difference:
root@mysql007:~# mysql --protocol=socket -e 'update 1C4Uonkwhe_options set option_value = "" where option_id = 16296351' ic_wp
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options' ic_wp > /dev/null
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
Deleting the row in question causes the error to specify a new offset:
root@mysql007:~# mysql --protocol=socket -e 'delete from 1C4Uonkwhe_options where option_id = 16296351' ic_wp
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options' ic_wp > /dev/null
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~# mysqldump ic_wp > /dev/null
mysqldump: Error 2013: Lost connection to server during query when dumping table  1C4Uonkwhe_options  at row: 1401
root@mysql007:~#
If I put the record I deleted back in, we return to the old offset:
root@mysql007:~# mysql --protocol=socket -e 'insert into 1C4Uonkwhe_options VALUES(16296351,"z_taxonomy_image8905","","yes");' ic_wp 
root@mysql007:~# mysqldump ic_wp > /dev/null
mysqldump: Error 2013: Lost connection to server during query when dumping table  1C4Uonkwhe_options  at row: 1402
root@mysql007:~#
I m losing my little mind. Let s get drastic and create a whole new table, copy over the data delicately working around the deadly offset:
oot@mysql007:~# mysql --protocol=socket -e 'create table 1C4Uonkwhe_new_options like 1C4Uonkwhe_options;' ic_wp 
root@mysql007:~# mysql --protocol=socket -e 'insert into 1C4Uonkwhe_new_options select * from 1C4Uonkwhe_options limit 1402 offset 0;' ic_wp 
--- There is only 33 more records, not sure how to specify unlimited limit but 100 does the trick.
root@mysql007:~# mysql --protocol=socket -e 'insert into 1C4Uonkwhe_new_options select * from 1C4Uonkwhe_options limit 100 offset 1403;' ic_wp 
Now let s make sure all is working properly:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_new_options' ic_wp >/dev/null;
Now let s examine which row we are missing:
root@mysql007:~# mysql --protocol=socket -e 'select option_id from 1C4Uonkwhe_options where option_id not in (select option_id from 1C4Uonkwhe_new_options) ;' ic_wp 
+-----------+
  option_id  
+-----------+
   18405297  
+-----------+
root@mysql007:~#
Wait, what? I was expecting option_id 16296351. Oh, now we are getting somewhere. And I see my mistake: when using offsets, you need to use ORDER BY or you won t get consistent results.
root@mysql007:~# mysql --protocol=socket -e 'select option_id from 1C4Uonkwhe_options order by option_id limit 1 offset 1402' ic_wp ;
+-----------+
  option_id  
+-----------+
   18405297  
+-----------+
root@mysql007:~#
Now that I have the correct row what is in it:
root@mysql007:~# mysql --protocol=socket -e 'select * from 1C4Uonkwhe_options where option_id = 18405297' ic_wp ;
ERROR 2013 (HY000) at line 1: Lost connection to server during query
root@mysql007:~#
Well, that makes a lot more sense. Let s start over with examining the value:
root@mysql007:~# mysql --protocol=socket -e 'select CHAR_LENGTH(option_value) from 1C4Uonkwhe_options where option_id = 18405297' ic_wp ;
+---------------------------+
  CHAR_LENGTH(option_value)  
+---------------------------+
                   50814767  
+---------------------------+
root@mysql007:~#
Wow, that s a lot of characters. If it were a book, it would be 35,000 pages long (I just discovered this site). It s a LONGTEXT field so it should be able to handle it. But now I have a better idea of what could be going wrong. The name of the option is rewrite_rules so it seems like something is going wrong with the generation of that option. I imagine there is some tweak I can make to allow MariaDB to cough up the value (read_buffer_size? tmp_table_size?). But I ll start with checking in with the database owner because I don t think 35,000 pages of rewrite rules is appropriate for any site.

8 September 2024

Jacob Adams: Linux's Bedtime Routine

How does Linux move from an awake machine to a hibernating one? How does it then manage to restore all state? These questions led me to read way too much C in trying to figure out how this particular hardware/software boundary is navigated. This investigation will be split into a few parts, with the first one going from invocation of hibernation to synchronizing all filesystems to disk. This article has been written using Linux version 6.9.9, the source of which can be found in many places, but can be navigated easily through the Bootlin Elixir Cross-Referencer: https://elixir.bootlin.com/linux/v6.9.9/source Each code snippet will begin with a link to the above giving the file path and the line number of the beginning of the snippet.

A Starting Point for Investigation: /sys/power/state and /sys/power/disk These two system files exist to allow debugging of hibernation, and thus control the exact state used directly. Writing specific values to the state file controls the exact sleep mode used and disk controls the specific hibernation mode1. This is extremely handy as an entry point to understand how these systems work, since we can just follow what happens when they are written to.

Show and Store Functions These two files are defined using the power_attr macro: kernel/power/power.h:80
#define power_attr(_name) \
static struct kobj_attribute _name##_attr =     \
    .attr   =               \
        .name = __stringify(_name), \
        .mode = 0644,           \
     ,                  \
    .show   = _name##_show,         \
    .store  = _name##_store,        \
 
show is called on reads and store on writes. state_show is a little boring for our purposes, as it just prints all the available sleep states. kernel/power/main.c:657
/*
 * state - control system sleep states.
 *
 * show() returns available sleep state labels, which may be "mem", "standby",
 * "freeze" and "disk" (hibernation).
 * See Documentation/admin-guide/pm/sleep-states.rst for a description of
 * what they mean.
 *
 * store() accepts one of those strings, translates it into the proper
 * enumerated value, and initiates a suspend transition.
 */
static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
			  char *buf)
 
	char *s = buf;
#ifdef CONFIG_SUSPEND
	suspend_state_t i;
	for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
		if (pm_states[i])
			s += sprintf(s,"%s ", pm_states[i]);
#endif
	if (hibernation_available())
		s += sprintf(s, "disk ");
	if (s != buf)
		/* convert the last space to a newline */
		*(s-1) = '\n';
	return (s - buf);
 
state_store, however, provides our entry point. If the string disk is written to the state file, it calls hibernate(). This is our entry point. kernel/power/main.c:715
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
			   const char *buf, size_t n)
 
	suspend_state_t state;
	int error;
	error = pm_autosleep_lock();
	if (error)
		return error;
	if (pm_autosleep_state() > PM_SUSPEND_ON)  
		error = -EBUSY;
		goto out;
	 
	state = decode_state(buf, n);
	if (state < PM_SUSPEND_MAX)  
		if (state == PM_SUSPEND_MEM)
			state = mem_sleep_current;
		error = pm_suspend(state);
	  else if (state == PM_SUSPEND_MAX)  
		error = hibernate();
	  else  
		error = -EINVAL;
	 
 out:
	pm_autosleep_unlock();
	return error ? error : n;
 
kernel/power/main.c:688
static suspend_state_t decode_state(const char *buf, size_t n)
 
#ifdef CONFIG_SUSPEND
	suspend_state_t state;
#endif
	char *p;
	int len;
	p = memchr(buf, '\n', n);
	len = p ? p - buf : n;
	/* Check hibernation first. */
	if (len == 4 && str_has_prefix(buf, "disk"))
		return PM_SUSPEND_MAX;
#ifdef CONFIG_SUSPEND
	for (state = PM_SUSPEND_MIN; state < PM_SUSPEND_MAX; state++)  
		const char *label = pm_states[state];
		if (label && len == strlen(label) && !strncmp(buf, label, len))
			return state;
	 
#endif
	return PM_SUSPEND_ON;
 
Could we have figured this out just via function names? Sure, but this way we know for sure that nothing else is happening before this function is called.

Autosleep Our first detour is into the autosleep system. When checking the state above, you may notice that the kernel grabs the pm_autosleep_lock before checking the current state. autosleep is a mechanism originally from Android that sends the entire system to either suspend or hibernate whenever it is not actively working on anything. This is not enabled for most desktop configurations, since it s primarily for mobile systems and inverts the standard suspend and hibernate interactions. This system is implemented as a workqueue2 that checks the current number of wakeup events, processes and drivers that need to run3, and if there aren t any, then the system is put into the autosleep state, typically suspend. However, it could be hibernate if configured that way via /sys/power/autosleep in a similar manner to using /sys/power/state to manually enable hibernation. kernel/power/main.c:841
static ssize_t autosleep_store(struct kobject *kobj,
			       struct kobj_attribute *attr,
			       const char *buf, size_t n)
 
	suspend_state_t state = decode_state(buf, n);
	int error;
	if (state == PM_SUSPEND_ON
	    && strcmp(buf, "off") && strcmp(buf, "off\n"))
		return -EINVAL;
	if (state == PM_SUSPEND_MEM)
		state = mem_sleep_current;
	error = pm_autosleep_set_state(state);
	return error ? error : n;
 
power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
kernel/power/autosleep.c:24
static DEFINE_MUTEX(autosleep_lock);
static struct wakeup_source *autosleep_ws;
static void try_to_suspend(struct work_struct *work)
 
	unsigned int initial_count, final_count;
	if (!pm_get_wakeup_count(&initial_count, true))
		goto out;
	mutex_lock(&autosleep_lock);
	if (!pm_save_wakeup_count(initial_count)  
		system_state != SYSTEM_RUNNING)  
		mutex_unlock(&autosleep_lock);
		goto out;
	 
	if (autosleep_state == PM_SUSPEND_ON)  
		mutex_unlock(&autosleep_lock);
		return;
	 
	if (autosleep_state >= PM_SUSPEND_MAX)
		hibernate();
	else
		pm_suspend(autosleep_state);
	mutex_unlock(&autosleep_lock);
	if (!pm_get_wakeup_count(&final_count, false))
		goto out;
	/*
	 * If the wakeup occurred for an unknown reason, wait to prevent the
	 * system from trying to suspend and waking up in a tight loop.
	 */
	if (final_count == initial_count)
		schedule_timeout_uninterruptible(HZ / 2);
 out:
	queue_up_suspend_work();
 
static DECLARE_WORK(suspend_work, try_to_suspend);
void queue_up_suspend_work(void)
 
	if (autosleep_state > PM_SUSPEND_ON)
		queue_work(autosleep_wq, &suspend_work);
 

The Steps of Hibernation

Hibernation Kernel Config It s important to note that most of the hibernate-specific functions below do nothing unless you ve defined CONFIG_HIBERNATION in your Kconfig4. As an example, hibernate itself is defined as the following if CONFIG_HIBERNATE is not set. include/linux/suspend.h:407
static inline int hibernate(void)   return -ENOSYS;  

Check if Hibernation is Available We begin by confirming that we actually can perform hibernation, via the hibernation_available function. kernel/power/hibernate.c:742
if (!hibernation_available())  
	pm_pr_dbg("Hibernation not available.\n");
	return -EPERM;
 
kernel/power/hibernate.c:92
bool hibernation_available(void)
 
	return nohibernate == 0 &&
		!security_locked_down(LOCKDOWN_HIBERNATION) &&
		!secretmem_active() && !cxl_mem_active();
 
nohibernate is controlled by the kernel command line, it s set via either nohibernate or hibernate=no. security_locked_down is a hook for Linux Security Modules to prevent hibernation. This is used to prevent hibernating to an unencrypted storage device, as specified in the manual page kernel_lockdown(7). Interestingly, either level of lockdown, integrity or confidentiality, locks down hibernation because with the ability to hibernate you can extract bascially anything from memory and even reboot into a modified kernel image. secretmem_active checks whether there is any active use of memfd_secret, and if so it prevents hibernation. memfd_secret returns a file descriptor that can be mapped into a process but is specifically unmapped from the kernel s memory space. Hibernating with memory that not even the kernel is supposed to access would expose that memory to whoever could access the hibernation image. This particular feature of secret memory was apparently controversial, though not as controversial as performance concerns around fragmentation when unmapping kernel memory (which did not end up being a real problem). cxl_mem_active just checks whether any CXL memory is active. A full explanation is provided in the commit introducing this check but there s also a shortened explanation from cxl_mem_probe that sets the relevant flag when initializing a CXL memory device. drivers/cxl/mem.c:186
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
* preserves contents over suspend, and there is no simple way
* to arrange for the suspend image to avoid CXL memory which
* would setup a circular dependency between PCI resume and save
* state restoration.

Check Compression The next check is for whether compression support is enabled, and if so whether the requested algorithm is enabled. kernel/power/hibernate.c:747
/*
 * Query for the compression algorithm support if compression is enabled.
 */
if (!nocompress)  
	strscpy(hib_comp_algo, hibernate_compressor, sizeof(hib_comp_algo));
	if (crypto_has_comp(hib_comp_algo, 0, 0) != 1)  
		pr_err("%s compression is not available\n", hib_comp_algo);
		return -EOPNOTSUPP;
	 
 
The nocompress flag is set via the hibernate command line parameter, setting hibernate=nocompress. If compression is enabled, then hibernate_compressor is copied to hib_comp_algo. This synchronizes the current requested compression setting (hibernate_compressor) with the current compression setting (hib_comp_algo). Both values are character arrays of size CRYPTO_MAX_ALG_NAME (128 in this kernel). kernel/power/hibernate.c:50
static char hibernate_compressor[CRYPTO_MAX_ALG_NAME] = CONFIG_HIBERNATION_DEF_COMP;
/*
 * Compression/decompression algorithm to be used while saving/loading
 * image to/from disk. This would later be used in 'kernel/power/swap.c'
 * to allocate comp streams.
 */
char hib_comp_algo[CRYPTO_MAX_ALG_NAME];
hibernate_compressor defaults to lzo if that algorithm is enabled, otherwise to lz4 if enabled5. It can be overwritten using the hibernate.compressor setting to either lzo or lz4. kernel/power/Kconfig:95
choice
	prompt "Default compressor"
	default HIBERNATION_COMP_LZO
	depends on HIBERNATION
config HIBERNATION_COMP_LZO
	bool "lzo"
	depends on CRYPTO_LZO
config HIBERNATION_COMP_LZ4
	bool "lz4"
	depends on CRYPTO_LZ4
endchoice
config HIBERNATION_DEF_COMP
	string
	default "lzo" if HIBERNATION_COMP_LZO
	default "lz4" if HIBERNATION_COMP_LZ4
	help
	  Default compressor to be used for hibernation.
kernel/power/hibernate.c:1425
static const char * const comp_alg_enabled[] =  
#if IS_ENABLED(CONFIG_CRYPTO_LZO)
	COMPRESSION_ALGO_LZO,
#endif
#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
	COMPRESSION_ALGO_LZ4,
#endif
 ;
static int hibernate_compressor_param_set(const char *compressor,
		const struct kernel_param *kp)
 
	unsigned int sleep_flags;
	int index, ret;
	sleep_flags = lock_system_sleep();
	index = sysfs_match_string(comp_alg_enabled, compressor);
	if (index >= 0)  
		ret = param_set_copystring(comp_alg_enabled[index], kp);
		if (!ret)
			strscpy(hib_comp_algo, comp_alg_enabled[index],
				sizeof(hib_comp_algo));
	  else  
		ret = index;
	 
	unlock_system_sleep(sleep_flags);
	if (ret)
		pr_debug("Cannot set specified compressor %s\n",
			 compressor);
	return ret;
 
static const struct kernel_param_ops hibernate_compressor_param_ops =  
	.set    = hibernate_compressor_param_set,
	.get    = param_get_string,
 ;
static struct kparam_string hibernate_compressor_param_string =  
	.maxlen = sizeof(hibernate_compressor),
	.string = hibernate_compressor,
 ;
We then check whether the requested algorithm is supported via crypto_has_comp. If not, we bail out of the whole operation with EOPNOTSUPP. As part of crypto_has_comp we perform any needed initialization of the algorithm, loading kernel modules and running initialization code as needed6.

Grab Locks The next step is to grab the sleep and hibernation locks via lock_system_sleep and hibernate_acquire. kernel/power/hibernate.c:758
sleep_flags = lock_system_sleep();
/* The snapshot device should not be opened while we're running */
if (!hibernate_acquire())  
	error = -EBUSY;
	goto Unlock;
 
First, lock_system_sleep marks the current thread as not freezable, which will be important later7. It then grabs the system_transistion_mutex, which locks taking snapshots or modifying how they are taken, resuming from a hibernation image, entering any suspend state, or rebooting.

The GFP Mask The kernel also issues a warning if the gfp mask is changed via either pm_restore_gfp_mask or pm_restrict_gfp_mask without holding the system_transistion_mutex. GFP flags tell the kernel how it is permitted to handle a request for memory. include/linux/gfp_types.h:12
 * GFP flags are commonly used throughout Linux to indicate how memory
 * should be allocated.  The GFP acronym stands for get_free_pages(),
 * the underlying memory allocation function.  Not every GFP flag is
 * supported by every function which may allocate memory.
In the case of hibernation specifically we care about the IO and FS flags, which are reclaim operators, ways the system is permitted to attempt to free up memory in order to satisfy a specific request for memory. include/linux/gfp_types.h:176
 * Reclaim modifiers
 * -----------------
 * Please note that all the following flags are only applicable to sleepable
 * allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them).
 *
 * %__GFP_IO can start physical IO.
 *
 * %__GFP_FS can call down to the low-level FS. Clearing the flag avoids the
 * allocator recursing into the filesystem which might already be holding
 * locks.
gfp_allowed_mask sets which flags are permitted to be set at the current time. As the comment below outlines, preventing these flags from being set avoids situations where the kernel needs to do I/O to allocate memory (e.g. read/writing swap8) but the devices it needs to read/write to/from are not currently available. kernel/power/main.c:24
/*
 * The following functions are used by the suspend/hibernate code to temporarily
 * change gfp_allowed_mask in order to avoid using I/O during memory allocations
 * while devices are suspended.  To avoid races with the suspend/hibernate code,
 * they should always be called with system_transition_mutex held
 * (gfp_allowed_mask also should only be modified with system_transition_mutex
 * held, unless the suspend/hibernate code is guaranteed not to run in parallel
 * with that modification).
 */
static gfp_t saved_gfp_mask;
void pm_restore_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	if (saved_gfp_mask)  
		gfp_allowed_mask = saved_gfp_mask;
		saved_gfp_mask = 0;
	 
 
void pm_restrict_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	WARN_ON(saved_gfp_mask);
	saved_gfp_mask = gfp_allowed_mask;
	gfp_allowed_mask &= ~(__GFP_IO   __GFP_FS);
 

Sleep Flags After grabbing the system_transition_mutex the kernel then returns and captures the previous state of the threads flags in sleep_flags. This is used later to remove PF_NOFREEZE if it wasn t previously set on the current thread. kernel/power/main.c:52
unsigned int lock_system_sleep(void)
 
	unsigned int flags = current->flags;
	current->flags  = PF_NOFREEZE;
	mutex_lock(&system_transition_mutex);
	return flags;
 
EXPORT_SYMBOL_GPL(lock_system_sleep);
include/linux/sched.h:1633
#define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
Then we grab the hibernate-specific semaphore to ensure no one can open a snapshot or resume from it while we perform hibernation. Additionally this lock is used to prevent hibernate_quiet_exec, which is used by the nvdimm driver to active its firmware with all processes and devices frozen, ensuring it is the only thing running at that time9. kernel/power/hibernate.c:82
bool hibernate_acquire(void)
 
	return atomic_add_unless(&hibernate_atomic, -1, 0);
 

Prepare Console The kernel next calls pm_prepare_console. This function only does anything if CONFIG_VT_CONSOLE_SLEEP has been set. This prepares the virtual terminal for a suspend state, switching away to a console used only for the suspend state if needed. kernel/power/console.c:130
void pm_prepare_console(void)
 
	if (!pm_vt_switch())
		return;
	orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
	if (orig_fgconsole < 0)
		return;
	orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
	return;
 
The first thing is to check whether we actually need to switch the VT kernel/power/console.c:94
/*
 * There are three cases when a VT switch on suspend/resume are required:
 *   1) no driver has indicated a requirement one way or another, so preserve
 *      the old behavior
 *   2) console suspend is disabled, we want to see debug messages across
 *      suspend/resume
 *   3) any registered driver indicates it needs a VT switch
 *
 * If none of these conditions is present, meaning we have at least one driver
 * that doesn't need the switch, and none that do, we can avoid it to make
 * resume look a little prettier (and suspend too, but that's usually hidden,
 * e.g. when closing the lid on a laptop).
 */
static bool pm_vt_switch(void)
 
	struct pm_vt_switch *entry;
	bool ret = true;
	mutex_lock(&vt_switch_mutex);
	if (list_empty(&pm_vt_switch_list))
		goto out;
	if (!console_suspend_enabled)
		goto out;
	list_for_each_entry(entry, &pm_vt_switch_list, head)  
		if (entry->required)
			goto out;
	 
	ret = false;
out:
	mutex_unlock(&vt_switch_mutex);
	return ret;
 
There is an explanation of the conditions under which a switch is performed in the comment above the function, but we ll also walk through the steps here. Firstly we grab the vt_switch_mutex to ensure nothing will modify the list while we re looking at it. We then examine the pm_vt_switch_list. This list is used to indicate the drivers that require a switch during suspend. They register this requirement, or the lack thereof, via pm_vt_switch_required. kernel/power/console.c:31
/**
 * pm_vt_switch_required - indicate VT switch at suspend requirements
 * @dev: device
 * @required: if true, caller needs VT switch at suspend/resume time
 *
 * The different console drivers may or may not require VT switches across
 * suspend/resume, depending on how they handle restoring video state and
 * what may be running.
 *
 * Drivers can indicate support for switchless suspend/resume, which can
 * save time and flicker, by using this routine and passing 'false' as
 * the argument.  If any loaded driver needs VT switching, or the
 * no_console_suspend argument has been passed on the command line, VT
 * switches will occur.
 */
void pm_vt_switch_required(struct device *dev, bool required)
Next, we check console_suspend_enabled. This is set to false by the kernel parameter no_console_suspend, but defaults to true. Finally, if there are any entries in the pm_vt_switch_list, then we check to see if any of them require a VT switch. Only if none of these conditions apply, then we return false. If a VT switch is in fact required, then we move first the currently active virtual terminal/console10 (vt_move_to_console) and then the current location of kernel messages (vt_kmsg_redirect) to the SUSPEND_CONSOLE. The SUSPEND_CONSOLE is the last entry in the list of possible consoles, and appears to just be a black hole to throw away messages. kernel/power/console.c:16
#define SUSPEND_CONSOLE	(MAX_NR_CONSOLES-1)
Interestingly, these are separate functions because you can use TIOCL_SETKMSGREDIRECT (an ioctl11) to send kernel messages to a specific virtual terminal, but by default its the same as the currently active console. The locations of the previously active console and the previous kernel messages location are stored in orig_fgconsole and orig_kmsg, to restore the state of the console and kernel messages after the machine wakes up again. Interestingly, this means orig_fgconsole also ends up storing any errors, so has to be checked to ensure it s not less than zero before we try to do anything with the kernel messages on both suspend and resume. drivers/tty/vt/vt_ioctl.c:1268
/* Perform a kernel triggered VT switch for suspend/resume */
static int disable_vt_switch;
int vt_move_to_console(unsigned int vt, int alloc)
 
	int prev;
	console_lock();
	/* Graphics mode - up to X */
	if (disable_vt_switch)  
		console_unlock();
		return 0;
	 
	prev = fg_console;
	if (alloc && vc_allocate(vt))  
		/* we can't have a free VC for now. Too bad,
		 * we don't want to mess the screen for now. */
		console_unlock();
		return -ENOSPC;
	 
	if (set_console(vt))  
		/*
		 * We're unable to switch to the SUSPEND_CONSOLE.
		 * Let the calling function know so it can decide
		 * what to do.
		 */
		console_unlock();
		return -EIO;
	 
	console_unlock();
	if (vt_waitactive(vt + 1))  
		pr_debug("Suspend: Can't switch VCs.");
		return -EINTR;
	 
	return prev;
 
Unlike most other locking functions we ve seen so far, console_lock needs to be careful to ensure nothing else is panicking and needs to dump to the console before grabbing the semaphore for the console and setting a couple flags.

Panics Panics are tracked via an atomic integer set to the id of the processor currently panicking. kernel/printk/printk.c:2649
/**
 * console_lock - block the console subsystem from printing
 *
 * Acquires a lock which guarantees that no consoles will
 * be in or enter their write() callback.
 *
 * Can sleep, returns nothing.
 */
void console_lock(void)
 
	might_sleep();
	/* On panic, the console_lock must be left to the panic cpu. */
	while (other_cpu_in_panic())
		msleep(1000);
	down_console_sem();
	console_locked = 1;
	console_may_schedule = 1;
 
EXPORT_SYMBOL(console_lock);
kernel/printk/printk.c:362
/*
 * Return true if a panic is in progress on a remote CPU.
 *
 * On true, the local CPU should immediately release any printing resources
 * that may be needed by the panic CPU.
 */
bool other_cpu_in_panic(void)
 
	return (panic_in_progress() && !this_cpu_in_panic());
 
kernel/printk/printk.c:345
static bool panic_in_progress(void)
 
	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
 
kernel/printk/printk.c:350
/* Return true if a panic is in progress on the current CPU. */
bool this_cpu_in_panic(void)
 
	/*
	 * We can use raw_smp_processor_id() here because it is impossible for
	 * the task to be migrated to the panic_cpu, or away from it. If
	 * panic_cpu has already been set, and we're not currently executing on
	 * that CPU, then we never will be.
	 */
	return unlikely(atomic_read(&panic_cpu) == raw_smp_processor_id());
 
console_locked is a debug value, used to indicate that the lock should be held, and our first indication that this whole virtual terminal system is more complex than might initially be expected. kernel/printk/printk.c:373
/*
 * This is used for debugging the mess that is the VT code by
 * keeping track if we have the console semaphore held. It's
 * definitely not the perfect debug tool (we don't know if _WE_
 * hold it and are racing, but it helps tracking those weird code
 * paths in the console code where we end up in places I want
 * locked without the console semaphore held).
 */
static int console_locked;
console_may_schedule is used to see if we are permitted to sleep and schedule other work while we hold this lock. As we ll see later, the virtual terminal subsystem is not re-entrant, so there s all sorts of hacks in here to ensure we don t leave important code sections that can t be safely resumed.

Disable VT Switch As the comment below lays out, when another program is handling graphical display anyway, there s no need to do any of this, so the kernel provides a switch to turn the whole thing off. Interestingly, this appears to only be used by three drivers, so the specific hardware support required must not be particularly common.
drivers/gpu/drm/omapdrm/dss
drivers/video/fbdev/geode
drivers/video/fbdev/omap2
drivers/tty/vt/vt_ioctl.c:1308
/*
 * Normally during a suspend, we allocate a new console and switch to it.
 * When we resume, we switch back to the original console.  This switch
 * can be slow, so on systems where the framebuffer can handle restoration
 * of video registers anyways, there's little point in doing the console
 * switch.  This function allows you to disable it by passing it '0'.
 */
void pm_set_vt_switch(int do_switch)
 
	console_lock();
	disable_vt_switch = !do_switch;
	console_unlock();
 
EXPORT_SYMBOL(pm_set_vt_switch);
The rest of the vt_switch_console function is pretty normal, however, simply allocating space if needed to create the requested virtual terminal and then setting the current virtual terminal via set_console.

Virtual Terminal Set Console With set_console, we begin (as if we haven t been already) to enter the madness that is the virtual terminal subsystem. As mentioned previously, modifications to its state must be made very carefully, as other stuff happening at the same time could create complete messes. All this to say, calling set_console does not actually perform any work to change the state of the current console. Instead it indicates what changes it wants and then schedules that work. drivers/tty/vt/vt.c:3153
int set_console(int nr)
 
	struct vc_data *vc = vc_cons[fg_console].d;
	if (!vc_cons_allocated(nr)   vt_dont_switch  
		(vc->vt_mode.mode == VT_AUTO && vc->vc_mode == KD_GRAPHICS))  
		/*
		 * Console switch will fail in console_callback() or
		 * change_console() so there is no point scheduling
		 * the callback
		 *
		 * Existing set_console() users don't check the return
		 * value so this shouldn't break anything
		 */
		return -EINVAL;
	 
	want_console = nr;
	schedule_console_callback();
	return 0;
 
The check for vc->vc_mode == KD_GRAPHICS is where most end-user graphical desktops will bail out of this change, as they re in graphics mode and don t need to switch away to the suspend console. vt_dont_switch is a flag used by the ioctls11 VT_LOCKSWITCH and VT_UNLOCKSWITCH to prevent the system from switching virtual terminal devices when the user has explicitly locked it. VT_AUTO is a flag indicating that automatic virtual terminal switching is enabled12, and thus deliberate switching to a suspend terminal is not required. However, if you do run your machine from a virtual terminal, then we indicate to the system that we want to change to the requested virtual terminal via the want_console variable and schedule a callback via schedule_console_callback. drivers/tty/vt/vt.c:315
void schedule_console_callback(void)
 
	schedule_work(&console_work);
 
console_work is a workqueue2 that will execute the given task asynchronously.

Console Callback drivers/tty/vt/vt.c:3109
/*
 * This is the console switching callback.
 *
 * Doing console switching in a process context allows
 * us to do the switches asynchronously (needed when we want
 * to switch due to a keyboard interrupt).  Synchronization
 * with other console code and prevention of re-entrancy is
 * ensured with console_lock.
 */
static void console_callback(struct work_struct *ignored)
 
	console_lock();
	if (want_console >= 0)  
		if (want_console != fg_console &&
		    vc_cons_allocated(want_console))  
			hide_cursor(vc_cons[fg_console].d);
			change_console(vc_cons[want_console].d);
			/* we only changed when the console had already
			   been allocated - a new console is not created
			   in an interrupt routine */
		 
		want_console = -1;
	 
...
console_callback first looks to see if there is a console change wanted via want_console and then changes to it if it s not the current console and has been allocated already. We do first remove any cursor state with hide_cursor. drivers/tty/vt/vt.c:841
static void hide_cursor(struct vc_data *vc)
 
	if (vc_is_sel(vc))
		clear_selection();
	vc->vc_sw->con_cursor(vc, false);
	hide_softcursor(vc);
 
A full dive into the tty driver is a task for another time, but this should give a general sense of how this system interacts with hibernation.

Notify Power Management Call Chain kernel/power/hibernate.c:767
pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION)
This will call a chain of power management callbacks, passing first PM_HIBERNATION_PREPARE and then PM_POST_HIBERNATION on startup or on error with another callback. kernel/power/main.c:98
int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
 
	int ret;
	ret = blocking_notifier_call_chain_robust(&pm_chain_head, val_up, val_down, NULL);
	return notifier_to_errno(ret);
 
The power management notifier is a blocking notifier chain, which means it has the following properties. include/linux/notifier.h:23
 *	Blocking notifier chains: Chain callbacks run in process context.
 *		Callouts are allowed to block.
The callback chain is a linked list with each entry containing a priority and a function to call. The function technically takes in a data value, but it is always NULL for the power management chain. include/linux/notifier.h:49
struct notifier_block;
typedef	int (*notifier_fn_t)(struct notifier_block *nb,
			unsigned long action, void *data);
struct notifier_block  
	notifier_fn_t notifier_call;
	struct notifier_block __rcu *next;
	int priority;
 ;
The head of the linked list is protected by a read-write semaphore. include/linux/notifier.h:65
struct blocking_notifier_head  
	struct rw_semaphore rwsem;
	struct notifier_block __rcu *head;
 ;
Because it is prioritized, appending to the list requires walking it until an item with lower13 priority is found to insert the current item before. kernel/notifier.c:252
/*
 *	Blocking notifier chain routines.  All access to the chain is
 *	synchronized by an rwsem.
 */
static int __blocking_notifier_chain_register(struct blocking_notifier_head *nh,
					      struct notifier_block *n,
					      bool unique_priority)
 
	int ret;
	/*
	 * This code gets used during boot-up, when task switching is
	 * not yet working and interrupts must remain disabled.  At
	 * such times we must not call down_write().
	 */
	if (unlikely(system_state == SYSTEM_BOOTING))
		return notifier_chain_register(&nh->head, n, unique_priority);
	down_write(&nh->rwsem);
	ret = notifier_chain_register(&nh->head, n, unique_priority);
	up_write(&nh->rwsem);
	return ret;
 
kernel/notifier.c:20
/*
 *	Notifier chain core routines.  The exported routines below
 *	are layered on top of these, with appropriate locking added.
 */
static int notifier_chain_register(struct notifier_block **nl,
				   struct notifier_block *n,
				   bool unique_priority)
 
	while ((*nl) != NULL)  
		if (unlikely((*nl) == n))  
			WARN(1, "notifier callback %ps already registered",
			     n->notifier_call);
			return -EEXIST;
		 
		if (n->priority > (*nl)->priority)
			break;
		if (n->priority == (*nl)->priority && unique_priority)
			return -EBUSY;
		nl = &((*nl)->next);
	 
	n->next = *nl;
	rcu_assign_pointer(*nl, n);
	trace_notifier_register((void *)n->notifier_call);
	return 0;
 
Each callback can return one of a series of options. include/linux/notifier.h:18
#define NOTIFY_DONE		0x0000		/* Don't care */
#define NOTIFY_OK		0x0001		/* Suits me */
#define NOTIFY_STOP_MASK	0x8000		/* Don't call further */
#define NOTIFY_BAD		(NOTIFY_STOP_MASK 0x0002)
						/* Bad/Veto action */
When notifying the chain, if a function returns STOP or BAD then the previous parts of the chain are called again with PM_POST_HIBERNATION14 and an error is returned. kernel/notifier.c:107
/**
 * notifier_call_chain_robust - Inform the registered notifiers about an event
 *                              and rollback on error.
 * @nl:		Pointer to head of the blocking notifier chain
 * @val_up:	Value passed unmodified to the notifier function
 * @val_down:	Value passed unmodified to the notifier function when recovering
 *              from an error on @val_up
 * @v:		Pointer passed unmodified to the notifier function
 *
 * NOTE:	It is important the @nl chain doesn't change between the two
 *		invocations of notifier_call_chain() such that we visit the
 *		exact same notifier callbacks; this rules out any RCU usage.
 *
 * Return:	the return value of the @val_up call.
 */
static int notifier_call_chain_robust(struct notifier_block **nl,
				     unsigned long val_up, unsigned long val_down,
				     void *v)
 
	int ret, nr = 0;
	ret = notifier_call_chain(nl, val_up, v, -1, &nr);
	if (ret & NOTIFY_STOP_MASK)
		notifier_call_chain(nl, val_down, v, nr-1, NULL);
	return ret;
 
Each of these callbacks tends to be quite driver-specific, so we ll cease discussion of this here.

Sync Filesystems The next step is to ensure all filesystems have been synchronized to disk. This is performed via a simple helper function that times how long the full synchronize operation, ksys_sync takes. kernel/power/main.c:69
void ksys_sync_helper(void)
 
	ktime_t start;
	long elapsed_msecs;
	start = ktime_get();
	ksys_sync();
	elapsed_msecs = ktime_to_ms(ktime_sub(ktime_get(), start));
	pr_info("Filesystems sync: %ld.%03ld seconds\n",
		elapsed_msecs / MSEC_PER_SEC, elapsed_msecs % MSEC_PER_SEC);
 
EXPORT_SYMBOL_GPL(ksys_sync_helper);
ksys_sync wakes and instructs a set of flusher threads to write out every filesystem, first their inodes15, then the full filesystem, and then finally all block devices, to ensure all pages are written out to disk. fs/sync.c:87
/*
 * Sync everything. We start by waking flusher threads so that most of
 * writeback runs on all devices in parallel. Then we sync all inodes reliably
 * which effectively also waits for all flusher threads to finish doing
 * writeback. At this point all data is on disk so metadata should be stable
 * and we tell filesystems to sync their metadata via ->sync_fs() calls.
 * Finally, we writeout all block devices because some filesystems (e.g. ext2)
 * just write metadata (such as inodes or bitmaps) to block device page cache
 * and do not sync it on their own in ->sync_fs().
 */
void ksys_sync(void)
 
	int nowait = 0, wait = 1;
	wakeup_flusher_threads(WB_REASON_SYNC);
	iterate_supers(sync_inodes_one_sb, NULL);
	iterate_supers(sync_fs_one_sb, &nowait);
	iterate_supers(sync_fs_one_sb, &wait);
	sync_bdevs(false);
	sync_bdevs(true);
	if (unlikely(laptop_mode))
		laptop_sync_completion();
 
It follows an interesting pattern of using iterate_supers to run both sync_inodes_one_sb and then sync_fs_one_sb on each known filesystem16. It also calls both sync_fs_one_sb and sync_bdevs twice, first without waiting for any operations to complete and then again waiting for completion17. When laptop_mode is enabled the system runs additional filesystem synchronization operations after the specified delay without any writes. mm/page-writeback.c:111
/*
 * Flag that puts the machine in "laptop mode". Doubles as a timeout in jiffies:
 * a full sync is triggered after this time elapses without any disk activity.
 */
int laptop_mode;
EXPORT_SYMBOL(laptop_mode);
However, when running a filesystem synchronization operation, the system will add an additional timer to schedule more writes after the laptop_mode delay. We don t want the state of the system to change at all while performing hibernation, so we cancel those timers. mm/page-writeback.c:2198
/*
 * We're in laptop mode and we've just synced. The sync's writes will have
 * caused another writeback to be scheduled by laptop_io_completion.
 * Nothing needs to be written back anymore, so we unschedule the writeback.
 */
void laptop_sync_completion(void)
 
	struct backing_dev_info *bdi;
	rcu_read_lock();
	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
		del_timer(&bdi->laptop_mode_wb_timer);
	rcu_read_unlock();
 
As a side note, the ksys_sync function is simply called when the system call sync is used. fs/sync.c:111
SYSCALL_DEFINE0(sync)
 
	ksys_sync();
	return 0;
 

The End of Preparation With that the system has finished preparations for hibernation. This is a somewhat arbitrary cutoff, but next the system will begin a full freeze of userspace to then dump memory out to an image and finally to perform hibernation. All this will be covered in future articles!
  1. Hibernation modes are outside of scope for this article, see the previous article for a high-level description of the different types of hibernation.
  2. Workqueues are a mechanism for running asynchronous tasks. A full description of them is a task for another time, but the kernel documentation on them is available here: https://www.kernel.org/doc/html/v6.9/core-api/workqueue.html 2
  3. This is a bit of an oversimplification, but since this isn t the main focus of this article this description has been kept to a higher level.
  4. Kconfig is Linux s build configuration system that sets many different macros to enable/disable various features.
  5. Kconfig defaults to the first default found
  6. Including checking whether the algorithm is larval? Which appears to indicate that it requires additional setup, but is an interesting choice of name for such a state.
  7. Specifically when we get to process freezing, which we ll get to in the next article in this series.
  8. Swap space is outside the scope of this article, but in short it is a buffer on disk that the kernel uses to store memory not current in use to free up space for other things. See Swap Management for more details.
  9. The code for this is lengthy and tangential, thus it has not been included here. If you re curious about the details of this, see kernel/power/hibernate.c:858 for the details of hibernate_quiet_exec, and drivers/nvdimm/core.c:451 for how it is used in nvdimm.
  10. Annoyingly this code appears to use the terms console and virtual terminal interchangeably.
  11. ioctls are special device-specific I/O operations that permit performing actions outside of the standard file interactions of read/write/seek/etc. 2
  12. I m not entirely clear on how this flag works, this subsystem is particularly complex.
  13. In this case a higher number is higher priority.
  14. Or whatever the caller passes as val_down, but in this case we re specifically looking at how this is used in hibernation.
  15. An inode refers to a particular file or directory within the filesystem. See Wikipedia for more details.
  16. Each active filesystem is registed with the kernel through a structure known as a superblock, which contains references to all the inodes contained within the filesystem, as well as function pointers to perform the various required operations, like sync.
  17. I m including minimal code in this section, as I m not looking to deep dive into the filesystem code at this time.

31 August 2024

Andrew Cater: Debian release weekend - media team update 202408311900 UTC

We're doing fairly well: Debian release team have been working really hard on a double point release today. Final release for Bullseye as 11.11 as it moves to LTS.

12.7 Bookworm install media finishing tests - it's been quite a long day so far.For 11.11 we're part way through media tests.

We've been joined by a lot of enthusiastic folk from Cape Town who've been a great help. Always nice to see old friends and new people join us on IRC - and they've just joined us for a short video call.

This has gone well: two release day media checking and bug-squashing groups on two continents is excellent.

Dear Cape Town - feel free to join us for the next time and we'll hold the video call open for longer. If we don't see any of you here in Cambridge for mini-Debconf, we'll meet up in Brest for Debconf 25.


28 July 2024

Marco d'Itri: An interesting architecture-dependent autopkgtest regression

More than two years after I last uploaded the purity-off Debian package, its autopkgtest (the Debian distribution-wide continuous integration system) started failing on arm64, and only on this architecture. The failing test is very simple: it prints a long stream of "y" or "n" characters to purity(6)'s standard input and then checks the output for the expected result. While investigating the live autopkgtest system, I figured out that: I did not have time to verify this, but I have noticed that the 25 years old purity(6) program calls TIOCGWINSZ to determine the screen length, and then uses the results in the answer buffer without checking if the ioctl(2) call returned an error. Which it obviously does in this case, because standard input is not a console but a pipe. So my theory is that paging is enabled because the undefined result of the ioctl has changed, and only on this architecture. Since I do not want to fix purity(6) right now, I have implemented the workaround of printing many more "y" characters as input.

17 July 2024

Kalyani Kenekar: Securing Your Website: Installing and Configuring Nginx with SSL

Logo Nginx

The Initial Encounter: I recently started to work with Nginx to explore the requirements on how to configure a then so called server block. It s quite different than within Apache. But there are a tons of good websites out there which do explain the different steps and options quite well. I also realized quickly that I need to be able to configure my Nginx setups in a way so the content is delivered through https with some automatic redirection from http URLs.
  • Let s install Nginx

Installing Nginx
$ sudo apt update
$ sudo apt install nginx

Checking your Web Server
  • We can check now nginx service is active or inactive
Output
  nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-02-12 09:59:20 UTC; 3h ago
       Docs: man:nginx(8)
   Main PID: 2887 (nginx)
      Tasks: 2 (limit: 1132)
     Memory: 4.2M
        CPU: 81ms
     CGroup: /system.slice/nginx.service
              2887 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
              2890 nginx: worker process
  • Now we successfully installed nginx and it in running state.

How To Secure Nginx with Let s Encrypt on Debian 12
  • In this documentation, you will use Certbot to obtain a free SSL certificate for Nginx on Debian 12 and set up your certificate.

Step 1 Installing Certbot $ sudo apt install certbot python3-certbot-nginx
  • Certbot is now ready to use, but in order for it to automatically configure SSL for Nginx, we need to verify some of Nginx s configuration.

Step 2 Confirming Nginx s Configuration
  • Certbot needs to be able to find the correct server block in your Nginx configuration for it to be able to automatically configure SSL. Specifically, it does this by looking for a server_name directive that matches the domain you request a certificate for. To check, open the configuration file for your domain using nano or your favorite text editor.
$ sudo vi /etc/nginx/sites-available/example.com
 server  
    listen 80;
    root /var/www/html/;
    index index.html;
    server_name example.com
    location /  
        try_files $uri $uri/ =404;
     
    location /test.html  
        try_files $uri $uri/ =404;
        auth_basic "admin area";
        auth_basic_user_file /etc/nginx/.htpasswd;
     
 
  • Fillup above data your project wise and then save the file, quit your editor, and verify the syntax of your configuration edits.
$ sudo nginx -t

Step 3 Obtaining an SSL Certificate
  • Certbot provides a variety of ways to obtain SSL certificates through plugins. The Nginx plugin will take care of reconfiguring Nginx and reloading the config whenever necessary. To use this plugin, type the following command line.
$ sudo certbot --nginx -d example.com
  • The configuration will be updated, and Nginx will reload to pick up the new settings. certbot will wrap up with a message telling you the process was successful and where your certificates are stored.
Output
IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/example.com/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/example.com/privkey.pem
   Your cert will expire on 2024-05-12. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot again
   with the "certonly" option. To non-interactively renew *all* of
   your certificates, run "certbot renew"
 - If you like Certbot, please consider supporting our work by:
   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
   Donating to EFF:                    https://eff.org/donate-le
  • Your certificates are downloaded, installed, and loaded. Next check the syntax again of your configuration.
$ sudo nginx -t
  • If you get an error, reopen the server block file and check for any typos or missing characters. Once your configuration file s syntax is correct, reload Nginx to load the new configuration.
$ sudo systemctl reload nginx
  • Try reloading your website using https:// and notice your browser s security indicator. It should indicate that the site is properly secured, usually with a lock icon.
Now your website is secured by SSL usage.

Gunnar Wolf: Script for weather reporting in Waybar

While I was living in Argentina, we (my family) found ourselves checking for weather forecasts almost constantly weather there can be quite unexpected, much more so that here in Mexico. So it took me a bit of tinkering to come up with a couple of simple scripts to show the weather forecast as part of my Waybar setup. I haven t cared to share with anybody, as I believe them to be quite trivial and quite dirty. But today, V ctor was asking for some slightly-related things, so here I go. Please do remember I warned: Dirty. Forecast I am using OpenWeather s open API. I had to register to get an APPID, and it allows me for up to 1,000 API calls per day, more than plenty for my uses, even if I am logged in at my desktops at three different computers (not an uncommon situation). Having that, I set up a file named /etc/get_weather/, that currently reads:
# Home, Mexico City
LAT=19.3364
LONG=-99.1819
# # Home, Paran , Argentina
# LAT=-31.7208
# LONG=-60.5317
# # PKNU, Busan, South Korea
# LAT=35.1339
#LONG=129.1055
APPID=SomeLongRandomStringIAmNotSharing
Then, I have a simple script, /usr/local/bin/get_weather, that fetches the current weather and the forecast, and stores them as /run/weather.json and /run/forecast.json:
#!/usr/bin/bash
CONF_FILE=/etc/get_weather
if [ -e "$CONF_FILE" ]; then
    . "$CONF_FILE"
else
    echo "Configuration file $CONF_FILE not found"
    exit 1
fi
if [ -z "$LAT" -o -z "$LONG" -o -z "$APPID" ]; then
    echo "Configuration file must declare latitude (LAT), longitude (LONG) "
    echo "and app ID (APPID)."
    exit 1
fi
CURRENT=/run/weather.json
FORECAST=/run/forecast.json
wget -q "https://api.openweathermap.org/data/2.5/weather?lat=$ LAT &lon=$ LONG &units=metric&appid=$ APPID " -O "$ CURRENT "
wget -q "https://api.openweathermap.org/data/2.5/forecast?lat=$ LAT &lon=$ LONG &units=metric&appid=$ APPID " -O "$ FORECAST "
This script is called by the corresponding systemd service unit, found at /etc/systemd/system/get_weather.service:
[Unit]
Description=Get the current weather
[Service]
Type=oneshot
ExecStart=/usr/local/bin/get_weather
And it is run every 15 minutes via the following systemd timer unit, /etc/systemd/system/get_weather.timer:
[Unit]
Description=Get the current weather every 15 minutes
[Timer]
OnCalendar=*:00/15:00
Unit=get_weather.service
[Install]
WantedBy=multi-user.target
(yes, it runs even if I m not logged in, wasting some of my free API calls but within reason) Then, I declare a "custom/weather" module in the desired position of my ~/.config/waybar/waybar.config, and define it as:
"custom/weather":  
    "exec": "while true;do /home/gwolf/bin/parse_weather.rb;sleep 10; done",
"return-type": "json",
 ,
This script basically morphs a generic weather JSON description into another set of JSON bits that display my weather in the way I prefer to have it displayed as:
#!/usr/bin/ruby
require 'json'
Sources =  :weather => '/run/weather.json',
           :forecast => '/run/forecast.json'
           
Icons =  '01d' => ' ', # d   day
         '01n' => ' ', # n   night
         '02d' => ' ',
         '02n' => ' ',
         '03d' => ' ',
         '03n' => ' ',
         '04d'  => ' ',
         '04n' => ' ',
         '09d' => ' ',
         '10n' =>  '  ',
         '10d' => ' ',
         '13d' => ' ',
         '50d' => ' '
         
ret =  'text': nil, 'tooltip': nil, 'class': 'weather', 'percentage': 100 
# Current weather report: Main text of the module
begin
  weather = JSON.parse(open(Sources[:weather],'r').read)
  loc_name = weather['name']
  icon = Icons[weather['weather'][0]['icon']]   ' ' + f['weather'][0]['icon'] + f['weather'][0]['main']
  temp = weather['main']['temp']
  sens = weather['main']['feels_like']
  hum = weather['main']['humidity']
  wind_vel = weather['wind']['speed']
  wind_dir = weather['wind']['deg']
  portions =  
  portions[:loc] = loc_name
  portions[:temp] = '%s  %2.2f C (%2.2f)' % [icon, temp, sens]
  portions[:hum] = '  %2d%%' % hum
  portions[:wind] = ' %2.2fm/s %d ' % [wind_vel, wind_dir]
  ret['text'] = [:loc, :temp, :hum, :wind].map   p  portions[p] .join(' ')
rescue => err
  ret['text'] = 'Could not process weather file (%s   %s: %s)' % [Sources[:weather], err.class, err.to_s]
end
# Weather prevision for the following hours/days
begin
  cast = []
  forecast = JSON.parse(open(Sources[:forecast], 'r').read)
  min = ''
  max = ''
  day=Time.now.strftime('%Y.%m.%d')
  by_day =  
  forecast['list'].each_with_index do  f,i 
    by_day[day]  = []
    time = Time.at(f['dt'])
    time_lbl = '%02d:%02d' % [time.hour, time.min]
    icon = Icons[f['weather'][0]['icon']]   ' ' + f['weather'][0]['icon'] + f['weather'][0]['main']
    by_day[day] << f['main']['temp']
    if time.hour == 0
      min = '%2.2f' % by_day[day].min
      max = '%2.2f' % by_day[day].max
      cast << '          min: <b>%s C</b> max: <b>%s C</b>' % [min, max]
      day = time.strftime('%Y.%m.%d')
      cast << '        <b>%04d.%02d.%02d</b>  ' %
              [time.year, time.month, time.day]
    end
    cast << '%s   %2.2f C    %2d%%   %s %s' % [time_lbl,
                                                f['main']['temp'],
                                                f['main']['humidity'],
                                                icon,
                                                f['weather'][0]['description']
                                               ]
  end
  cast << '          min: <b>%s</b> C max: <b>%s C</b>' % [min, max]
  ret['tooltip'] = cast.join("\n")
  
rescue => err
  ret['tooltip'] = 'Could not process forecast file (%s   %s)' % [Sources[:forecast], err.class, err.to_s]
end
# Print out the result for Waybar to process
puts ret.to_json
The end result? Nothing too stunning, but definitively something I find useful and even nicely laid out: Screenshot Do note that it seems OpenWeather will return the name of the closest available meteorology station with (most?) recent data for my home, I often get Ciudad Universitaria, but sometimes Coyoac n or even San ngel Inn.

13 July 2024

Russ Allbery: podlators v6.0.1

This is a small bug-fix release to remove use of autodie from the build system for the module. podlators is included in Perl core, and at the point when it is built during the core build, the prerequisites of the autodie module are not yet met, so the module is not available. This release reverts to explicit error checking in all the files used by the build system. Thanks to James E Keenan for the report and the analysis. You can get the latest version from CPAN or the podlators distribution page.

4 July 2024

Arturo Borrero Gonz lez: Wikimedia Toolforge: migrating Kubernetes from PodSecurityPolicy to Kyverno

Le ch teau de Val re et le Haut de Cry en juillet 2022 Christian David, CC BY-SA 4.0, via Wikimedia Commons This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez. Summary: this article shares the experience and learnings of migrating away from Kubernetes PodSecurityPolicy into Kyverno in the Wikimedia Toolforge platform. Wikimedia Toolforge is a Platform-as-a-Service, built with Kubernetes, and maintained by the Wikimedia Cloud Services team (WMCS). It is completely free and open, and we welcome anyone to use it to build and host tools (bots, webservices, scheduled jobs, etc) in support of Wikimedia projects. We provide a set of platform-specific services, command line interfaces, and shortcuts to help in the task of setting up webservices, jobs, and stuff like building container images, or using databases. Using these interfaces makes the underlying Kubernetes system pretty much invisible to users. We also allow direct access to the Kubernetes API, and some advanced users do directly interact with it. Each account has a Kubernetes namespace where they can freely deploy their workloads. We have a number of controls in place to ensure performance, stability, and fairness of the system, including quotas, RBAC permissions, and up until recently PodSecurityPolicies (PSP). At the time of this writing, we had around 3.500 Toolforge tool accounts in the system. We early adopted PSP in 2019 as a way to make sure Pods had the correct runtime configuration. We needed Pods to stay within the safe boundaries of a set of pre-defined parameters. Back when we adopted PSP there was already the option to use 3rd party agents, like OpenPolicyAgent Gatekeeper, but we decided not to invest in them, and went with a native, built-in mechanism instead. In 2021 it was announced that the PSP mechanism would be deprecated, and removed in Kubernetes 1.25. Even though we had been warned years in advance, we did not prioritize the migration of PSP until we were in Kubernetes 1.24, and blocked, unable to upgrade forward without taking actions. The WMCS team explored different alternatives for this migration, but eventually we decided to go with Kyverno as a replacement for PSP. And so with that decision it began the journey described in this blog post. First, we needed a source code refactor for one of the key components of our Toolforge Kubernetes: maintain-kubeusers. This custom piece of software that we built in-house, contains the logic to fetch accounts from LDAP and do the necessary instrumentation on Kubernetes to accommodate each one: create namespace, RBAC, quota, a kubeconfig file, etc. With the refactor, we introduced a proper reconciliation loop, in a way that the software would have a notion of what needs to be done for each account, what would be missing, what to delete, upgrade, and so on. This would allow us to easily deploy new resources for each account, or iterate on their definitions. The initial version of the refactor had a number of problems, though. For one, the new version of maintain-kubeusers was doing more filesystem interaction than the previous version, resulting in a slow reconciliation loop over all the accounts. We used NFS as the underlying storage system for Toolforge, and it could be very slow because of reasons beyond this blog post. This was corrected in the next few days after the initial refactor rollout. A side note with an implementation detail: we stored a configmap on each account namespace with the state of each resource. Storing more state on this configmap was our solution to avoid additional NFS latency. I initially estimated this refactor would take me a week to complete, but unfortunately it took me around three weeks instead. Previous to the refactor, there were several manual steps and cleanups required to be done when updating the definition of a resource. The process is now automated, more robust, performant, efficient and clean. So in my opinion it was worth it, even if it took more time than expected. Then, we worked on the Kyverno policies themselves. Because we had a very particular PSP setting, in order to ease the transition, we tried to replicate their semantics on a 1:1 basis as much as possible. This involved things like transparent mutation of Pod resources, then validation. Additionally, we had one different PSP definition for each account, so we decided to create one different Kyverno namespaced policy resource for each account namespace remember, we had 3.5k accounts. We created a Kyverno policy template that we would then render and inject for each account. For developing and testing all this, maintain-kubeusers and the Kyverno bits, we had a project called lima-kilo, which was a local Kubernetes setup replicating production Toolforge. This was used by each engineer in their laptop as a common development environment. We had planned the migration from PSP to Kyverno policies in stages, like this:
  1. update our internal template generators to make Pod security settings explicit
  2. introduce Kyverno policies in Audit mode
  3. see how the cluster would behave with them, and if we had any offending resources reported by the new policies, and correct them
  4. modify Kyverno policies and set them in Enforce mode
  5. drop PSP
In stage 1, we updated things like the toolforge-jobs-framework and tools-webservice. In stage 2, when we deployed the 3.5k Kyverno policy resources, our production cluster died almost immediately. Surprise. All the monitoring went red, the Kubernetes apiserver became irresponsibe, and we were unable to perform any administrative actions in the Kubernetes control plane, or even the underlying virtual machines. All Toolforge users were impacted. This was a full scale outage that required the energy of the whole WMCS team to recover from. We temporarily disabled Kyverno until we could learn what had occurred. This incident happened despite having tested before in lima-kilo and in another pre-production cluster we had, called Toolsbeta. But we had not tested that many policy resources. Clearly, this was something scale-related. After the incident, I went on and created 3.5k Kyverno policy resources on lima-kilo, and indeed I was able to reproduce the outage. We took a number of measures, corrected a few errors in our infrastructure, reached out to the Kyverno upstream developers, asking for advice, and at the end we did the following to accommodate the setup to our needs: I have to admit, I was briefly tempted to drop Kyverno, and even stop pursuing using an external policy agent entirely, and write our own custom admission controller out of concerns over performance of this architecture. However, after applying all the measures listed above, the system became very stable, so we decided to move forward. The second attempt at deploying it all went through just fine. No outage this time When we were in stage 4 we detected another bug. We had been following the Kubernetes upstream documentation for setting securityContext to the right values. In particular, we were enforcing the procMount to be set to the default value, which per the docs it was DefaultProcMount . However, that string is the name of the internal variable in the source code, whereas the actual default value is the string Default . This caused pods to be rightfully rejected by Kyverno while we figured the problem. I sent a patch upstream to fix this problem. We finally had everything in place, reached stage 5, and we were able to disable PSP. We unloaded the PSP controller from the kubernetes apiserver, and deleted every individual PSP definition. Everything was very smooth in this last step of the migration. This whole PSP project, including the maintain-kubeusers refactor, the outage, and all the different migration stages took roughly three months to complete. For me there are a number of valuable reasons to learn from this project. For one, the scale is something to consider, and test, when evaluating a new architecture or software component. Not doing so can lead to service outages, or unexpectedly poor performances. This is in the first chapter of the SRE handbook, but we got a reminder the hard way This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.

3 July 2024

Sahil Dhiman: RTI to NPL Regarding Their NTP Infrastructure

I became interested in Network Time Protocol (NTP) last year after learning how fundamental this protocol is to the functioning of the global Internet. NTP helps synchronize clocks on devices over the Internet, which is essential for secure browsing, timestamping, keeping everyone in sync or just checking what time it is. Computers usually have a hardware real-time clock (RTC) but that deviates over time, so an occasional sync over NTP is required to keep the time accurate. Many network and IoT devices don t have hardware RTC so have even more reliance on NTP. Accurate time keeping starts with reference clocks like atomic clocks, GPS etc. Multiple government standard agencies host these reference clocks, which are regarded as Stratum 0. Stratum 1 servers are known as primary servers, and directly connect to Stratum 0 clocks for time. Stratum 1 servers then distribute time to Stratum 2 and further down the hierarchy. Computers typically connects to one or more Stratum 1/2/3 servers to get their time. Someone has to host these public Stratum 1,2,3 NTP servers. That s what NTP pool, a global effort by volunteers does. They provide NTP servers for the public to use. As of today, there are 4700+ servers in the pool which are free to use for anyone. Now let s come to the reason for writing this post. Indian Computer Emergency Response Team (CERT-In) in April 2022 released a set of cybersecurity directions which set the alarm bells ringing. Internet Society (and almost everyone else) wrote about it. And then there was this specific section about NTP:
All service providers, intermediaries, data centres, body corporate and Government organisations shall connect to the Network Time Protocol (NTP) Server of National Informatics Centre (NIC) or National Physical Laboratory (NPL) or with NTP servers traceable to these NTP servers, for synchronisation of all their ICT systems clocks. Entities having ICT infrastructure spanning multiple geographies may also use accurate and standard time source other than NPL and NIC, however it is to be ensured that their time source shall not deviate from NPL and NIC.
CSIR-National Physical Laboratory (NPL) is the official timekeeper for India and hosts the only public Stratum 1 clock in India, according to NTP pool website. So I was naturally curious to know what kind of infrastructure they re running for NTP. India has a Right to Information (RTI) Act which, like the Freedom of Information Act (FOIA) in the United States, gives citizens rights to request information from governmental entities, to which they have to respond in under 30 days. So last year, I filed two sets of RTI (one after the first reply came) inquiring about NPL s public NTP server setup. The first RTI had some generic questions: RTI 1
First RTI. Click to enlarge
This gave a vague idea about the setup, so I sat down and came with some specific questions in the next RTI. RTI 2
Second RTI. Click to enlarge
Feel free to make your conclusions from it now. Bear in mind these were filled last year so things might have changed. Do let me know if you have more information about it. Update (07/07/2024): Found an article from Medianama about Indian government time servers with information sourced through RTI.

28 June 2024

Matthew Palmer: Checking for Compromised Private Keys has Never Been Easier

As regular readers would know, since I never stop banging on about it, I run Pwnedkeys, a service which finds and collates private keys which have been disclosed or are otherwise compromised. Until now, the only way to check if a key is compromised has been to use the Pwnedkeys API, which is not necessarily trivial for everyone. Starting today, that s changing. The next phase of Pwnedkeys is to start offering more user-friendly tools for checking whether keys being used are compromised. These will typically be web-based or command-line tools intended to answer the question is the key in this (certificate, CSR, authorized_keys file, TLS connection, email, etc) known to Pwnedkeys to have been compromised? .

Opening the Toolbox Available right now are the first web-based key checking tools in this arsenal. These tools allow you to:
  1. Check the key in a PEM-format X509 data structure (such as a CSR or certificate);
  2. Check the keys in an authorized_keys file you upload; and
  3. Check the SSH keys used by a user at any one of a number of widely-used code-hosting sites.
Further planned tools include live checking of the certificates presented in TLS connections (for HTTPS, etc), SSH host keys, command-line utilities for checking local authorized_keys files, and many other goodies.

If You Are Intrigued By My Ideas and wish to subscribe to my newsletter, now you can! I m not going to be blogging every little update to Pwnedkeys, because that would probably get a bit tedious for readers who aren t as intrigued by compromised keys as I am. Instead, I ll be posting every little update in the Pwnedkeys newsletter. So, if you want to keep up-to-date with the latest and greatest news and information, subscribe to the newsletter.

Supporting Pwnedkeys All this work I m doing on my own time, and I m paying for the infrastructure from my own pocket. If you ve got a few dollars to spare, I d really appreciate it if you bought me a refreshing beverage. It helps keep the lights on here at Pwnedkeys Global HQ.

8 June 2024

Thorsten Alteholz: My Debian Activities in May 2024

FTP master This month I accepted 347 and rejected 49 packages. The overall number of packages that got accepted was 348.

Debian LTS This was my hundred-nineteenth month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on: I also continued to work on tiff and last but not least did a week of FD and attended the monthly LTS/ELTS meeting. Unfortunately I used lots of time to debug an issue with nghttp2. Please see my odyssey below. Debian ELTS This month was the seventieth ELTS month. During my allocated time I uploaded: For some tests I installed the new nghttp2 package on my Stretch VM and started the daemon. Unfortunately I got an unexpected error from getaddrinfo() about ai_socktype not supported. The daemon was configured to listen on lo, the device was available, but the error remained. I was pretty sure that my patch was not the reason for this and indeed the unpatched version showed this error as well. I didn t want to release an untested package, so nghttp2 had to start at least! Therefore I built a minimal example to reproduce the issue. getaddrinfo() failed for hints.ai_socktype=SOCK_STREAM and a numerical IP address. Having no hints at all or localhost instead of 127.0.0.1 made the error disappear (as a remark: localhost resolves to 127.0.0.1, the ipv6 variant is ip6-localhost ). I could see that in nghttp2 as well. Configuring it with localhost let the error vanish but the daemon still exited due to other reasons. After some time of debugging, I added another network interface to my VM and configured it with a dummy IPv4 address. Voila, everything worked as expected. According to Wikipedia, IPv6 was ratified as standard in 2017 and Stretch was also released in 2017. No wonder that a IPv6-only-VM had problems back then and these problems survived to the present. I also continued to work on an update for tiff in Jessie and Stretch, did a week of FD and attended the LTS/ELTS meeting. Debian Printing This month I uploaded new upstream or bugfix versions of: This work is generously funded by Freexian! Debian Astro This month I uploaded a new upstream or bugfix version of: Debian IoT This month I uploaded new upstream or bugfix versions of: Debian Mobcom Due to more and more problems with time_t, I removed osmo-iuh and all dependencies from armel, armhf and i386, sorry. If there is really anybody using this software on 32-bit architectures don t hesitate to get in touch. It is official now, the GSoC student working on the Mobcom packages is Nathan Doris. He already finished the hardest part of the job and I could upload the latest version of libosmocore. I really enjoy working with him and look forward to a pleasant SoC :-). misc This month I uploaded new upstream or bugfix versions of: Did I already mention that I love lists with topics I can work on. I print out such lists and enjoy checking off one after the other. End of May Helmut told me that I am a bit lazy and gave me such a list with all my packages that have one or the other issue with /usr-move. Most of the uploads above are packages on that list and I could check off a lot :-).

Next.