Search Results: "beh"

1 October 2024

Ravi Dwivedi: State of the Map Conference in Kenya

Last month, I traveled to Kenya to attend a conference called State of the Map 2024 ( SotM for short), which is an annual meetup of OpenStreetMap contributors from all over the world. It was held at the University of Nairobi Towers in Nairobi, from the 6th to the 8th of September.
University of Nairobi.
I have been contributing to OpenStreetMap for the last three years, and this conference seemed like a great opportunity to network with others in the community. As soon as I came across the travel grant announcement, I jumped in and filled the form immediately. I was elated when I was selected for the grant and couldn t wait to attend. The grant had an upper limit of 1200 and covered food, accommodation, travel and miscellaneous expenses such as visa fee. Pre-travel tasks included obtaining Kenya s eTA and getting a yellow fever vaccine. Before the conference, Mikko from the Humanitarian OpenStreetMap Team introduced me to Rabina and Pragya from Nepal, Ibtehal from Bangladesh, and Sajeevini from Sri Lanka. We all booked the Nairobi Transit Hotel, which was within walking distance of the conference venue. Pragya, Rabina, and I traveled together from Delhi to Nairobi, while Ibtehal was my roommate in the hotel.
Our group at the conference.
The venue, University of Nairobi Towers, was a tall building and the conference was held on the fourth, fifth and sixth floors. The open area on the fifth floor of the building had a nice view of Nairobi s skyline and was a perfect spot for taking pictures. Interestingly, the university had a wing dedicated to Mahatma Gandhi, who is regarded in India as the Father of the Nation.
View of Nairobi's skyline from the open area on the fifth floor.
A library in Mahatma Gandhi wing of the University of Nairobi.
The diversity of the participants was mind-blowing, with people coming from a whopping 54 countries. I was surprised to notice that I was the only participant traveling from India, despite India having a large OpenStreetMap community. That said, there were two other Indian participants who traveled from other countries. I finally got to meet Arnalie (from the Phillipines) and Letwin (from Zimbabwe), both of whom I had only met online before. I had met Anisa (from Albania) earlier during DebConf 2023. But I missed Mikko and Honey from the Humanitarian OpenStreetMap Team, whom I knew from the Open Mapping Guru program. I learned about the extent of OSM use through Pragya and Rabina s talk; about the logistics of running the OSM Board, in the OSMF (OpenStreetMap Foundation) session; about the Youth Mappers from Sajeevini, about the OSM activities in Malawi from Priscilla Kapolo, and about mapping in Zimbabwe from Letwin. However, I missed Ibtehal s lightning session. The ratio of women speakers and participants at the conference was impressive, and I hope we can get such gender representation in our Delhi/NCR mapping parties.
One of the conference halls where talks took place.
Outside of talks, the conference also had lunch and snack breaks, giving ample time for networking with others. In the food department, there were many options for a lacto-ovo vegetarian like myself, including potatoes, rice, beans, chips etc. I found out that the milk tea in Kenya (referred to as white tea ) is usually not as strong compared to India, so I switched to coffee (which is also called white coffee when taken with milk). The food wasn t spicy, but I can t complain :) Fruit juices served as a nice addition to lunch.
One of the lunch meals served during the conference.
At the end of the second day of the conference, there was a surprise in store for us a bus ride to the Bao Box restaurant. The ride gave us the experience of a typical Kenyan matatu (privately-owned minibuses used as share taxis), complete with loud rap music. I remember one of the songs being Kraff s Nursery Rhymes. That day, I was wearing an original Kenyan cricket jersey - one that belonged to Dominic Wesonga, who represented Kenya in four ODIs. This confused Priscilla Kapolo, who asked if I was from Kenya! Anyway, while it served as a good conversation starter, it didn t attract as much attention as I expected :) I had some pizza and chips there, and later some drinks with Ibtehal. After the party, Piyush went with us to our hotel and we played a few games of UNO.
Minibus which took us from the university to Bao Box restaurant.
This minibus in the picture gave a sense of a real matatu.
I am grateful to the organizers Laura and Dorothea for introducing me to Nikhil when I was searching for a companion for my post-conference trip. Nikhil was one of the aforementioned Indian participants, and a wildlife lover. We had some nice conversations; he wanted to go to the Masai Maara Natural Reserve, but it was too expensive for me. In addition, all the safaris were multi-day affairs, and I wasn t keen on being around wildlife for that long. Eventually I chose to go my own way, exploring the coastal side and visiting Mombasa. While most of the work regarding the conference was done using free software (including the reimbursement form and Mastodon announcements), I was disappointed by the use of WhatsApp for coordination with the participants. I don t use WhatsApp and so was left out. WhatsApp is proprietary software (they do not provide the source code) and users don t control it. It is common to highlight that OpenStreetMap is controlled by users and the community, rather than a company - this should apply to WhatsApp as well. My suggestion is to use XMPP, which shares similar principles with OpenStreetMap, as it is privacy-respecting, controlled by users, and powered by free software. I understand the concern that there might not be many participants using XMPP already. Although it is a good idea to onboard people to free software like XMPP, we can also create a Matrix group, and bridge it with both the XMPP group and the Telegram group. In fact, using Matrix and bridging it with Telegram is how I communicated with the South Asian participants. While it s not ideal - as Telegram s servers are proprietary and centralized - but it s certainly much better than creating a WhatsApp-only group. The setup can be bridged with IRC as well. On the other hand, self-hosted mailing lists for participants is also a good idea. Finally, I would like to thank SotM for the generous grant, enabling me to attend this conference, meet the diverse community behind OSM and visit the beautiful country of Kenya. Stay tuned for the blog post on Kenya trip. Thanks to Sahilister, Contrapunctus, Snehal and Badri for reviewing the draft of this blog post before publishing.

29 September 2024

Reproducible Builds: Supporter spotlight: Kees Cook on Linux kernel security

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the eighth installment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project, David A. Wheeler and Simon Butler. Today, however, we will be talking with Kees Cook, founder of the Kernel Self-Protection Project.

Vagrant Cascadian: Could you tell me a bit about yourself? What sort of things do you work on? Kees Cook: I m a Free Software junkie living in Portland, Oregon, USA. I have been focusing on the upstream Linux kernel s protection of itself. There is a lot of support that the kernel provides userspace to defend itself, but when I first started focusing on this there was not as much attention given to the kernel protecting itself. As userspace got more hardened the kernel itself became a bigger target. Almost 9 years ago I formally announced the Kernel Self-Protection Project because the work necessary was way more than my time and expertise could do alone. So I just try to get people to help as much as possible; people who understand the ARM architecture, people who understand the memory management subsystem to help, people who understand how to make the kernel less buggy.
Vagrant: Could you describe the path that lead you to working on this sort of thing? Kees: I have always been interested in security through the aspect of exploitable flaws. I always thought it was like a magic trick to make a computer do something that it was very much not designed to do and seeing how easy it is to subvert bugs. I wanted to improve that fragility. In 2006, I started working at Canonical on Ubuntu and was mainly focusing on bringing Debian and Ubuntu up to what was the state of the art for Fedora and Gentoo s security hardening efforts. Both had really pioneered a lot of userspace hardening with compiler flags and ELF stuff and many other things for hardened binaries. On the whole, Debian had not really paid attention to it. Debian s packaging building process at the time was sort of a chaotic free-for-all as there wasn t centralized build methodology for defining things. Luckily that did slowly change over the years. In Ubuntu we had the opportunity to apply top down build rules for hardening all the packages. In 2011 Chrome OS was following along and took advantage of a bunch of the security hardening work as they were based on ebuild out of Gentoo and when they looked for someone to help out they reached out to me. We recognized the Linux kernel was pretty much the weakest link in the Chrome OS security posture and I joined them to help solve that. Their userspace was pretty well handled but the kernel had a lot of weaknesses, so focusing on hardening was the next place to go. When I compared notes with other users of the Linux kernel within Google there were a number of common concerns and desires. Chrome OS already had an upstream first requirement, so I tried to consolidate the concerns and solve them upstream. It was challenging to land anything in other kernel team repos at Google, as they (correctly) wanted to minimize their delta from upstream, so I needed to work on any major improvements entirely in upstream and had a lot of support from Google to do that. As such, my focus shifted further from working directly on Chrome OS into being entirely upstream and being more of a consultant to internal teams, helping with integration or sometimes backporting. Since the volume of needed work was so gigantic I needed to find ways to inspire other developers (both inside and outside of Google) to help. Once I had a budget I tried to get folks paid (or hired) to work on these areas when it wasn t already their job.
Vagrant: So my understanding of some of your recent work is basically defining undefined behavior in the language or compiler? Kees: I ve found the term undefined behavior to have a really strict meaning within the compiler community, so I have tried to redefine my goal as eliminating unexpected behavior or ambiguous language constructs . At the end of the day ambiguity leads to bugs, and bugs lead to exploitable security flaws. I ve been taking a four-pronged approach: supporting the work people are doing to get rid of ambiguity, identify new areas where ambiguity needs to be removed, actually removing that ambiguity from the C language, and then dealing with any needed refactoring in the Linux kernel source to adapt to the new constraints. None of this is particularly novel; people have recognized how dangerous some of these language constructs are for decades and decades but I think it is a combination of hard problems and a lot of refactoring that nobody has the interest/resources to do. So, we have been incrementally going after the lowest hanging fruit. One clear example in recent years was the elimination of C s implicit fall-through in switch statements. The language would just fall through between adjacent cases if a break (or other code flow directive) wasn t present. But this is ambiguous: is the code meant to fall-through, or did the author just forget a break statement? By defining the [[fallthrough]] statement, and requiring its use in Linux, all switch statements now have explicit code flow, and the entire class of bugs disappeared. During our refactoring we actually found that 1 in 10 added [[fallthrough]] statements were actually missing break statements. This was an extraordinarily common bug! So getting rid of that ambiguity is where we have been. Another area I ve been spending a bit of time on lately is looking at how defensive security work has challenges associated with metrics. How do you measure your defensive security impact? You can t say because we installed locks on the doors, 20% fewer break-ins have happened. Much of our signal is always secondary or retrospective, which is frustrating: This class of flaw was used X much over the last decade so, and if we have eliminated that class of flaw and will never see it again, what is the impact? Is the impact infinity? Attackers will just move to the next easiest thing. But it means that exploitation gets incrementally more difficult. As attack surfaces are reduced, the expense of exploitation goes up.
Vagrant: So it is hard to identify how effective this is how bad would it be if people just gave up? Kees: I think it would be pretty bad, because as we have seen, using secondary factors, the work we have done in the industry at large, not just the Linux kernel, has had an impact. What we, Microsoft, Apple, and everyone else is doing for their respective software ecosystems, has shown that the price of functional exploits in the black market has gone up. Especially for really egregious stuff like a zero-click remote code execution. If those were cheap then obviously we are not doing something right, and it becomes clear that it s trivial for anyone to attack the infrastructure that our lives depend on. But thankfully we have seen over the last two decades that prices for exploits keep going up and up into millions of dollars. I think it is important to keep working on that because, as a central piece of modern computer infrastructure, the Linux kernel has a giant target painted on it. If we give up, we have to accept that our computers are not doing what they were designed to do, which I can t accept. The safety of my grandparents shouldn t be any different from the safety of journalists, and political activists, and anyone else who might be the target of attacks. We need to be able to trust our devices otherwise why use them at all?
Vagrant: What has been your biggest success in recent years? Kees: I think with all these things I am not the only actor. Almost everything that we have been successful at has been because of a lot of people s work, and one of the big ones that has been coordinated across the ecosystem and across compilers was initializing stack variables to 0 by default. This feature was added in Clang, GCC, and MSVC across the board even though there were a lot of fears about forking the C language. The worry was that developers would come to depend on zero-initialized stack variables, but this hasn t been the case because we still warn about uninitialized variables when the compiler can figure that out. So you still still get the warnings at compile time but now you can count on the contents of your stack at run-time and we drop an entire class of uninitialized variable flaws. While the exploitation of this class has mostly been around memory content exposure, it has also been used for control flow attacks. So that was politically and technically a large challenge: convincing people it was necessary, showing its utility, and implementing it in a way that everyone would be happy with, resulting in the elimination of a large and persistent class of flaws in C.
Vagrant: In a world where things are generally Reproducible do you see ways in which that might affect your work? Kees: One of the questions I frequently get is, What version of the Linux kernel has feature $foo? If I know how things are built, I can answer with just a version number. In a Reproducible Builds scenario I can count on the compiler version, compiler flags, kernel configuration, etc. all those things are known, so I can actually answer definitively that a certain feature exists. So that is an area where Reproducible Builds affects me most directly. Indirectly, it is just being able to trust the binaries you are running are going to behave the same for the same build environment is critical for sane testing.
Vagrant: Have you used diffoscope? Kees: I have! One subset of tree-wide refactoring that we do when getting rid of ambiguous language usage in the kernel is when we have to make source level changes to satisfy some new compiler requirement but where the binary output is not expected to change at all. It is mostly about getting the compiler to understand what is happening, what is intended in the cases where the old ambiguity does actually match the new unambiguous description of what is intended. The binary shouldn t change. We have used diffoscope to compare the before and after binaries to confirm that yep, there is no change in binary .
Vagrant: You cannot just use checksums for that? Kees: For the most part, we need to only compare the text segments. We try to hold as much stable as we can, following the Reproducible Builds documentation for the kernel, but there are macros in the kernel that are sensitive to source line numbers and as a result those will change the layout of the data segment (and sometimes the text segment too). With diffoscope there s flexibility where I can exclude or include different comparisons. Sometimes I just go look at what diffoscope is doing and do that manually, because I can tweak that a little harder, but diffoscope is definitely the default. Diffoscope is awesome!
Vagrant: Where has reproducible builds affected you? Kees: One of the notable wins of reproducible builds lately was dealing with the fallout of the XZ backdoor and just being able to ask the question is my build environment running the expected code? and to be able to compare the output generated from one install that never had a vulnerable XZ and one that did have a vulnerable XZ and compare the results of what you get. That was important for kernel builds because the XZ threat actor was working to expand their influence and capabilities to include Linux kernel builds, but they didn t finish their work before they were noticed. I think what happened with Debian proving the build infrastructure was not affected is an important example of how people would have needed to verify the kernel builds too.
Vagrant: What do you want to see for the near or distant future in security work? Kees: For reproducible builds in the kernel, in the work that has been going on in the ClangBuiltLinux project, one of the driving forces of code and usability quality has been the continuous integration work. As soon as something breaks, on the kernel side, the Clang side, or something in between the two, we get a fast signal and can chase it and fix the bugs quickly. I would like to see someone with funding to maintain a reproducible kernel build CI. There have been places where there are certain architecture configurations or certain build configuration where we lose reproducibility and right now we have sort of a standard open source development feedback loop where those things get fixed but the time in between introduction and fix can be large. Getting a CI for reproducible kernels would give us the opportunity to shorten that time.
Vagrant: Well, thanks for that! Any last closing thoughts? Kees: I am a big fan of reproducible builds, thank you for all your work. The world is a safer place because of it.
Vagrant: Likewise for your work!


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

23 September 2024

Jonathan McDowell: The (lack of a) return-to-office conspiracy

During COVID companies suddenly found themselves able to offer remote working where it hadn t previously been on offer. That s changed over the past 2 or so years, with most places I m aware of moving back from a fully remote situation to either some sort of hybrid, or even full time office attendance. For example last week Amazon announced a full return to office, having already pulled remote-hired workers in for 3 days a week. I ve seen a lot of folk stating they ll never work in an office again, and that RTO is insanity. Despite being lucky enough to work fully remotely (for a role I d been approached about before, but was never prepared to relocate for), I feel the objections from those who are pro-remote often fail to consider the nuances involved. So let s talk about some of the reasons why companies might want to enforce some sort of RTO.

Real estate value Let s clear this one up first. It s not about real estate value, for most companies. City planners and real estate investors might care, but even if your average company owned their building they d close it in an instant all other things being equal. An unoccupied building costs a lot less to maintain. And plenty of companies rent and would save money even if there s a substantial exit fee.

Occupancy levels That said, once you have anyone in the building the equation changes. If you re having to provide power, heating, internet, security/front desk staff etc, you want to make sure you re getting your money s worth. There s no point heating a building that can seat 100 for only 10 people present. One option is to downsize the building, but that leads to not being able to assign everyone a desk, for example. No one I know likes hot desking. There are also scheduling problems about ensuring there are enough desks for everyone who might turn up on a certain day, and you ve ruled out the option of company/office wide events.

Coexistence builds relationships As a remote worker I wish it wasn t true that most people find it easier to form relationships in person, but it is. Some of this can be worked on with specific teambuilding style events, rather than in office working, but I know plenty of folk who hate those as much as they hate the idea of being in the office. I am lucky in that I work with a bunch of folk who are terminally online, so it s much easier to have those casual conversations even being remote, but I also accept I miss out on some things because I m just not in the office regularly enough. You might not care about this ( I just need to put my head down and code, not talk to people ), but don t discount it as a valid reason why companies might want their workers to be in the office. This often matters even more for folk at the start of their career, where having a bunch of experience folk around to help them learn and figure things out ends up working much better in person (my first job offered to let me go mostly remote when I moved to Norwich, but I said no as I knew I wasn t ready for it yet).

Coexistence allows for unexpected interactions People hate the phrase water cooler chat , and I get that, but it covers the idea of casual conversations that just won t happen the same way when people are remote. I experienced this while running Black Cat; every time Simon and I met up in person we had a bunch of useful conversations even though we were on IRC together normally, and had a VoIP setup that meant we regularly talked too. Equally when I was at Nebulon there were conversations I overheard in the office where I was able to correct a misconception or provide extra context. Some of this can be replicated with the right online chat culture, but I ve found many places end up with folk taking conversations to DMs, or they happen in private channels. It happens more naturally in an office environment.

It s easier for bad managers to manage bad performers Again, this falls into the category of things that shouldn t be true, but are. Remote working has increased the ability for people who want to slack off to do so without being easily detected. Ideally what you want is that these folk, if they fail to perform, are then performance managed out of the organisation. That s hard though, there are (rightly) a bunch of rights workers have (I m writing from a UK perspective) around the procedure that needs to be followed. Managers need organisational support in this to make sure they get it right (and folk are given a chance to improve), which is often lacking.

Summary Look, I get there are strong reasons why offering remote is a great thing from the company perspective, but what I ve tried to outline here is that a return-to-office mandate can have some compelling reasons behind it too. Some of those might be things that wouldn t exist in an ideal world, but unfortunately fixing them is a bigger issue than just changing where folk work from. Not acknowledging that just makes any reaction against office work seem ill-informed, to me.

8 September 2024

Antonio Terceiro: gotcha: using ccache in Debian package builds

Before I upload packages to Debian, I always do a full build from source under sbuild. This ensures that the package can build from source on a clean environment, implying that the set of build dependencies is complete. But when iterating on a non-trivial package locally, I will usually build the package directly on my Debian testing system, and I want to take advantage of ccache to cache native (C/C++) code compilation to speed things up. In Debian, the easiest way to enable ccache is to add /usr/lib/ccache to your $PATH. I do this by doing something similar to the following in my ~/.bashrc:
export PATH=/usr/lib/ccache:$PATH
I noticed, however, that my Debian package builds were not using the cache. When building the same small package manually using make, the cache was used, but not when the build was wrapped with dpkg-buildpackage. I tracked it down to the fact that in compatibility level 13+, debhelper will set $HOME to a temporary directory. For what's it worth, I think that's a good thing: you don't want package builds reaching for your home directory as that makes it harder to make builds reproducible, among other things. This behavior, however, breaks ccache. The default cache directory is $HOME/.ccache, but that only gets resolved when ccache is actually used. So we end up starting with an empty cache on each build, get a 100% cache miss rate, and still pay for the overhead of populating the cache. The fix is to explicitly set $CCACHE_DIR upfront, so that by the time $HOME gets overriden, it doesn't matter anymore for ccache. I did this in my ~/.bashrc:
export CCACHE_DIR=$HOME/.ccache
This way, $HOME will be expanded right there when the shell starts, and by the time ccache is called, it will use the persistent cache in my home directory even though $HOME will, at that point, refer to a temporary directory.

Jacob Adams: Linux's Bedtime Routine

How does Linux move from an awake machine to a hibernating one? How does it then manage to restore all state? These questions led me to read way too much C in trying to figure out how this particular hardware/software boundary is navigated. This investigation will be split into a few parts, with the first one going from invocation of hibernation to synchronizing all filesystems to disk. This article has been written using Linux version 6.9.9, the source of which can be found in many places, but can be navigated easily through the Bootlin Elixir Cross-Referencer: https://elixir.bootlin.com/linux/v6.9.9/source Each code snippet will begin with a link to the above giving the file path and the line number of the beginning of the snippet.

A Starting Point for Investigation: /sys/power/state and /sys/power/disk These two system files exist to allow debugging of hibernation, and thus control the exact state used directly. Writing specific values to the state file controls the exact sleep mode used and disk controls the specific hibernation mode1. This is extremely handy as an entry point to understand how these systems work, since we can just follow what happens when they are written to.

Show and Store Functions These two files are defined using the power_attr macro: kernel/power/power.h:80
#define power_attr(_name) \
static struct kobj_attribute _name##_attr =     \
    .attr   =               \
        .name = __stringify(_name), \
        .mode = 0644,           \
     ,                  \
    .show   = _name##_show,         \
    .store  = _name##_store,        \
 
show is called on reads and store on writes. state_show is a little boring for our purposes, as it just prints all the available sleep states. kernel/power/main.c:657
/*
 * state - control system sleep states.
 *
 * show() returns available sleep state labels, which may be "mem", "standby",
 * "freeze" and "disk" (hibernation).
 * See Documentation/admin-guide/pm/sleep-states.rst for a description of
 * what they mean.
 *
 * store() accepts one of those strings, translates it into the proper
 * enumerated value, and initiates a suspend transition.
 */
static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
			  char *buf)
 
	char *s = buf;
#ifdef CONFIG_SUSPEND
	suspend_state_t i;
	for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
		if (pm_states[i])
			s += sprintf(s,"%s ", pm_states[i]);
#endif
	if (hibernation_available())
		s += sprintf(s, "disk ");
	if (s != buf)
		/* convert the last space to a newline */
		*(s-1) = '\n';
	return (s - buf);
 
state_store, however, provides our entry point. If the string disk is written to the state file, it calls hibernate(). This is our entry point. kernel/power/main.c:715
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
			   const char *buf, size_t n)
 
	suspend_state_t state;
	int error;
	error = pm_autosleep_lock();
	if (error)
		return error;
	if (pm_autosleep_state() > PM_SUSPEND_ON)  
		error = -EBUSY;
		goto out;
	 
	state = decode_state(buf, n);
	if (state < PM_SUSPEND_MAX)  
		if (state == PM_SUSPEND_MEM)
			state = mem_sleep_current;
		error = pm_suspend(state);
	  else if (state == PM_SUSPEND_MAX)  
		error = hibernate();
	  else  
		error = -EINVAL;
	 
 out:
	pm_autosleep_unlock();
	return error ? error : n;
 
kernel/power/main.c:688
static suspend_state_t decode_state(const char *buf, size_t n)
 
#ifdef CONFIG_SUSPEND
	suspend_state_t state;
#endif
	char *p;
	int len;
	p = memchr(buf, '\n', n);
	len = p ? p - buf : n;
	/* Check hibernation first. */
	if (len == 4 && str_has_prefix(buf, "disk"))
		return PM_SUSPEND_MAX;
#ifdef CONFIG_SUSPEND
	for (state = PM_SUSPEND_MIN; state < PM_SUSPEND_MAX; state++)  
		const char *label = pm_states[state];
		if (label && len == strlen(label) && !strncmp(buf, label, len))
			return state;
	 
#endif
	return PM_SUSPEND_ON;
 
Could we have figured this out just via function names? Sure, but this way we know for sure that nothing else is happening before this function is called.

Autosleep Our first detour is into the autosleep system. When checking the state above, you may notice that the kernel grabs the pm_autosleep_lock before checking the current state. autosleep is a mechanism originally from Android that sends the entire system to either suspend or hibernate whenever it is not actively working on anything. This is not enabled for most desktop configurations, since it s primarily for mobile systems and inverts the standard suspend and hibernate interactions. This system is implemented as a workqueue2 that checks the current number of wakeup events, processes and drivers that need to run3, and if there aren t any, then the system is put into the autosleep state, typically suspend. However, it could be hibernate if configured that way via /sys/power/autosleep in a similar manner to using /sys/power/state to manually enable hibernation. kernel/power/main.c:841
static ssize_t autosleep_store(struct kobject *kobj,
			       struct kobj_attribute *attr,
			       const char *buf, size_t n)
 
	suspend_state_t state = decode_state(buf, n);
	int error;
	if (state == PM_SUSPEND_ON
	    && strcmp(buf, "off") && strcmp(buf, "off\n"))
		return -EINVAL;
	if (state == PM_SUSPEND_MEM)
		state = mem_sleep_current;
	error = pm_autosleep_set_state(state);
	return error ? error : n;
 
power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
kernel/power/autosleep.c:24
static DEFINE_MUTEX(autosleep_lock);
static struct wakeup_source *autosleep_ws;
static void try_to_suspend(struct work_struct *work)
 
	unsigned int initial_count, final_count;
	if (!pm_get_wakeup_count(&initial_count, true))
		goto out;
	mutex_lock(&autosleep_lock);
	if (!pm_save_wakeup_count(initial_count)  
		system_state != SYSTEM_RUNNING)  
		mutex_unlock(&autosleep_lock);
		goto out;
	 
	if (autosleep_state == PM_SUSPEND_ON)  
		mutex_unlock(&autosleep_lock);
		return;
	 
	if (autosleep_state >= PM_SUSPEND_MAX)
		hibernate();
	else
		pm_suspend(autosleep_state);
	mutex_unlock(&autosleep_lock);
	if (!pm_get_wakeup_count(&final_count, false))
		goto out;
	/*
	 * If the wakeup occurred for an unknown reason, wait to prevent the
	 * system from trying to suspend and waking up in a tight loop.
	 */
	if (final_count == initial_count)
		schedule_timeout_uninterruptible(HZ / 2);
 out:
	queue_up_suspend_work();
 
static DECLARE_WORK(suspend_work, try_to_suspend);
void queue_up_suspend_work(void)
 
	if (autosleep_state > PM_SUSPEND_ON)
		queue_work(autosleep_wq, &suspend_work);
 

The Steps of Hibernation

Hibernation Kernel Config It s important to note that most of the hibernate-specific functions below do nothing unless you ve defined CONFIG_HIBERNATION in your Kconfig4. As an example, hibernate itself is defined as the following if CONFIG_HIBERNATE is not set. include/linux/suspend.h:407
static inline int hibernate(void)   return -ENOSYS;  

Check if Hibernation is Available We begin by confirming that we actually can perform hibernation, via the hibernation_available function. kernel/power/hibernate.c:742
if (!hibernation_available())  
	pm_pr_dbg("Hibernation not available.\n");
	return -EPERM;
 
kernel/power/hibernate.c:92
bool hibernation_available(void)
 
	return nohibernate == 0 &&
		!security_locked_down(LOCKDOWN_HIBERNATION) &&
		!secretmem_active() && !cxl_mem_active();
 
nohibernate is controlled by the kernel command line, it s set via either nohibernate or hibernate=no. security_locked_down is a hook for Linux Security Modules to prevent hibernation. This is used to prevent hibernating to an unencrypted storage device, as specified in the manual page kernel_lockdown(7). Interestingly, either level of lockdown, integrity or confidentiality, locks down hibernation because with the ability to hibernate you can extract bascially anything from memory and even reboot into a modified kernel image. secretmem_active checks whether there is any active use of memfd_secret, and if so it prevents hibernation. memfd_secret returns a file descriptor that can be mapped into a process but is specifically unmapped from the kernel s memory space. Hibernating with memory that not even the kernel is supposed to access would expose that memory to whoever could access the hibernation image. This particular feature of secret memory was apparently controversial, though not as controversial as performance concerns around fragmentation when unmapping kernel memory (which did not end up being a real problem). cxl_mem_active just checks whether any CXL memory is active. A full explanation is provided in the commit introducing this check but there s also a shortened explanation from cxl_mem_probe that sets the relevant flag when initializing a CXL memory device. drivers/cxl/mem.c:186
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
* preserves contents over suspend, and there is no simple way
* to arrange for the suspend image to avoid CXL memory which
* would setup a circular dependency between PCI resume and save
* state restoration.

Check Compression The next check is for whether compression support is enabled, and if so whether the requested algorithm is enabled. kernel/power/hibernate.c:747
/*
 * Query for the compression algorithm support if compression is enabled.
 */
if (!nocompress)  
	strscpy(hib_comp_algo, hibernate_compressor, sizeof(hib_comp_algo));
	if (crypto_has_comp(hib_comp_algo, 0, 0) != 1)  
		pr_err("%s compression is not available\n", hib_comp_algo);
		return -EOPNOTSUPP;
	 
 
The nocompress flag is set via the hibernate command line parameter, setting hibernate=nocompress. If compression is enabled, then hibernate_compressor is copied to hib_comp_algo. This synchronizes the current requested compression setting (hibernate_compressor) with the current compression setting (hib_comp_algo). Both values are character arrays of size CRYPTO_MAX_ALG_NAME (128 in this kernel). kernel/power/hibernate.c:50
static char hibernate_compressor[CRYPTO_MAX_ALG_NAME] = CONFIG_HIBERNATION_DEF_COMP;
/*
 * Compression/decompression algorithm to be used while saving/loading
 * image to/from disk. This would later be used in 'kernel/power/swap.c'
 * to allocate comp streams.
 */
char hib_comp_algo[CRYPTO_MAX_ALG_NAME];
hibernate_compressor defaults to lzo if that algorithm is enabled, otherwise to lz4 if enabled5. It can be overwritten using the hibernate.compressor setting to either lzo or lz4. kernel/power/Kconfig:95
choice
	prompt "Default compressor"
	default HIBERNATION_COMP_LZO
	depends on HIBERNATION
config HIBERNATION_COMP_LZO
	bool "lzo"
	depends on CRYPTO_LZO
config HIBERNATION_COMP_LZ4
	bool "lz4"
	depends on CRYPTO_LZ4
endchoice
config HIBERNATION_DEF_COMP
	string
	default "lzo" if HIBERNATION_COMP_LZO
	default "lz4" if HIBERNATION_COMP_LZ4
	help
	  Default compressor to be used for hibernation.
kernel/power/hibernate.c:1425
static const char * const comp_alg_enabled[] =  
#if IS_ENABLED(CONFIG_CRYPTO_LZO)
	COMPRESSION_ALGO_LZO,
#endif
#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
	COMPRESSION_ALGO_LZ4,
#endif
 ;
static int hibernate_compressor_param_set(const char *compressor,
		const struct kernel_param *kp)
 
	unsigned int sleep_flags;
	int index, ret;
	sleep_flags = lock_system_sleep();
	index = sysfs_match_string(comp_alg_enabled, compressor);
	if (index >= 0)  
		ret = param_set_copystring(comp_alg_enabled[index], kp);
		if (!ret)
			strscpy(hib_comp_algo, comp_alg_enabled[index],
				sizeof(hib_comp_algo));
	  else  
		ret = index;
	 
	unlock_system_sleep(sleep_flags);
	if (ret)
		pr_debug("Cannot set specified compressor %s\n",
			 compressor);
	return ret;
 
static const struct kernel_param_ops hibernate_compressor_param_ops =  
	.set    = hibernate_compressor_param_set,
	.get    = param_get_string,
 ;
static struct kparam_string hibernate_compressor_param_string =  
	.maxlen = sizeof(hibernate_compressor),
	.string = hibernate_compressor,
 ;
We then check whether the requested algorithm is supported via crypto_has_comp. If not, we bail out of the whole operation with EOPNOTSUPP. As part of crypto_has_comp we perform any needed initialization of the algorithm, loading kernel modules and running initialization code as needed6.

Grab Locks The next step is to grab the sleep and hibernation locks via lock_system_sleep and hibernate_acquire. kernel/power/hibernate.c:758
sleep_flags = lock_system_sleep();
/* The snapshot device should not be opened while we're running */
if (!hibernate_acquire())  
	error = -EBUSY;
	goto Unlock;
 
First, lock_system_sleep marks the current thread as not freezable, which will be important later7. It then grabs the system_transistion_mutex, which locks taking snapshots or modifying how they are taken, resuming from a hibernation image, entering any suspend state, or rebooting.

The GFP Mask The kernel also issues a warning if the gfp mask is changed via either pm_restore_gfp_mask or pm_restrict_gfp_mask without holding the system_transistion_mutex. GFP flags tell the kernel how it is permitted to handle a request for memory. include/linux/gfp_types.h:12
 * GFP flags are commonly used throughout Linux to indicate how memory
 * should be allocated.  The GFP acronym stands for get_free_pages(),
 * the underlying memory allocation function.  Not every GFP flag is
 * supported by every function which may allocate memory.
In the case of hibernation specifically we care about the IO and FS flags, which are reclaim operators, ways the system is permitted to attempt to free up memory in order to satisfy a specific request for memory. include/linux/gfp_types.h:176
 * Reclaim modifiers
 * -----------------
 * Please note that all the following flags are only applicable to sleepable
 * allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them).
 *
 * %__GFP_IO can start physical IO.
 *
 * %__GFP_FS can call down to the low-level FS. Clearing the flag avoids the
 * allocator recursing into the filesystem which might already be holding
 * locks.
gfp_allowed_mask sets which flags are permitted to be set at the current time. As the comment below outlines, preventing these flags from being set avoids situations where the kernel needs to do I/O to allocate memory (e.g. read/writing swap8) but the devices it needs to read/write to/from are not currently available. kernel/power/main.c:24
/*
 * The following functions are used by the suspend/hibernate code to temporarily
 * change gfp_allowed_mask in order to avoid using I/O during memory allocations
 * while devices are suspended.  To avoid races with the suspend/hibernate code,
 * they should always be called with system_transition_mutex held
 * (gfp_allowed_mask also should only be modified with system_transition_mutex
 * held, unless the suspend/hibernate code is guaranteed not to run in parallel
 * with that modification).
 */
static gfp_t saved_gfp_mask;
void pm_restore_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	if (saved_gfp_mask)  
		gfp_allowed_mask = saved_gfp_mask;
		saved_gfp_mask = 0;
	 
 
void pm_restrict_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	WARN_ON(saved_gfp_mask);
	saved_gfp_mask = gfp_allowed_mask;
	gfp_allowed_mask &= ~(__GFP_IO   __GFP_FS);
 

Sleep Flags After grabbing the system_transition_mutex the kernel then returns and captures the previous state of the threads flags in sleep_flags. This is used later to remove PF_NOFREEZE if it wasn t previously set on the current thread. kernel/power/main.c:52
unsigned int lock_system_sleep(void)
 
	unsigned int flags = current->flags;
	current->flags  = PF_NOFREEZE;
	mutex_lock(&system_transition_mutex);
	return flags;
 
EXPORT_SYMBOL_GPL(lock_system_sleep);
include/linux/sched.h:1633
#define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
Then we grab the hibernate-specific semaphore to ensure no one can open a snapshot or resume from it while we perform hibernation. Additionally this lock is used to prevent hibernate_quiet_exec, which is used by the nvdimm driver to active its firmware with all processes and devices frozen, ensuring it is the only thing running at that time9. kernel/power/hibernate.c:82
bool hibernate_acquire(void)
 
	return atomic_add_unless(&hibernate_atomic, -1, 0);
 

Prepare Console The kernel next calls pm_prepare_console. This function only does anything if CONFIG_VT_CONSOLE_SLEEP has been set. This prepares the virtual terminal for a suspend state, switching away to a console used only for the suspend state if needed. kernel/power/console.c:130
void pm_prepare_console(void)
 
	if (!pm_vt_switch())
		return;
	orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
	if (orig_fgconsole < 0)
		return;
	orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
	return;
 
The first thing is to check whether we actually need to switch the VT kernel/power/console.c:94
/*
 * There are three cases when a VT switch on suspend/resume are required:
 *   1) no driver has indicated a requirement one way or another, so preserve
 *      the old behavior
 *   2) console suspend is disabled, we want to see debug messages across
 *      suspend/resume
 *   3) any registered driver indicates it needs a VT switch
 *
 * If none of these conditions is present, meaning we have at least one driver
 * that doesn't need the switch, and none that do, we can avoid it to make
 * resume look a little prettier (and suspend too, but that's usually hidden,
 * e.g. when closing the lid on a laptop).
 */
static bool pm_vt_switch(void)
 
	struct pm_vt_switch *entry;
	bool ret = true;
	mutex_lock(&vt_switch_mutex);
	if (list_empty(&pm_vt_switch_list))
		goto out;
	if (!console_suspend_enabled)
		goto out;
	list_for_each_entry(entry, &pm_vt_switch_list, head)  
		if (entry->required)
			goto out;
	 
	ret = false;
out:
	mutex_unlock(&vt_switch_mutex);
	return ret;
 
There is an explanation of the conditions under which a switch is performed in the comment above the function, but we ll also walk through the steps here. Firstly we grab the vt_switch_mutex to ensure nothing will modify the list while we re looking at it. We then examine the pm_vt_switch_list. This list is used to indicate the drivers that require a switch during suspend. They register this requirement, or the lack thereof, via pm_vt_switch_required. kernel/power/console.c:31
/**
 * pm_vt_switch_required - indicate VT switch at suspend requirements
 * @dev: device
 * @required: if true, caller needs VT switch at suspend/resume time
 *
 * The different console drivers may or may not require VT switches across
 * suspend/resume, depending on how they handle restoring video state and
 * what may be running.
 *
 * Drivers can indicate support for switchless suspend/resume, which can
 * save time and flicker, by using this routine and passing 'false' as
 * the argument.  If any loaded driver needs VT switching, or the
 * no_console_suspend argument has been passed on the command line, VT
 * switches will occur.
 */
void pm_vt_switch_required(struct device *dev, bool required)
Next, we check console_suspend_enabled. This is set to false by the kernel parameter no_console_suspend, but defaults to true. Finally, if there are any entries in the pm_vt_switch_list, then we check to see if any of them require a VT switch. Only if none of these conditions apply, then we return false. If a VT switch is in fact required, then we move first the currently active virtual terminal/console10 (vt_move_to_console) and then the current location of kernel messages (vt_kmsg_redirect) to the SUSPEND_CONSOLE. The SUSPEND_CONSOLE is the last entry in the list of possible consoles, and appears to just be a black hole to throw away messages. kernel/power/console.c:16
#define SUSPEND_CONSOLE	(MAX_NR_CONSOLES-1)
Interestingly, these are separate functions because you can use TIOCL_SETKMSGREDIRECT (an ioctl11) to send kernel messages to a specific virtual terminal, but by default its the same as the currently active console. The locations of the previously active console and the previous kernel messages location are stored in orig_fgconsole and orig_kmsg, to restore the state of the console and kernel messages after the machine wakes up again. Interestingly, this means orig_fgconsole also ends up storing any errors, so has to be checked to ensure it s not less than zero before we try to do anything with the kernel messages on both suspend and resume. drivers/tty/vt/vt_ioctl.c:1268
/* Perform a kernel triggered VT switch for suspend/resume */
static int disable_vt_switch;
int vt_move_to_console(unsigned int vt, int alloc)
 
	int prev;
	console_lock();
	/* Graphics mode - up to X */
	if (disable_vt_switch)  
		console_unlock();
		return 0;
	 
	prev = fg_console;
	if (alloc && vc_allocate(vt))  
		/* we can't have a free VC for now. Too bad,
		 * we don't want to mess the screen for now. */
		console_unlock();
		return -ENOSPC;
	 
	if (set_console(vt))  
		/*
		 * We're unable to switch to the SUSPEND_CONSOLE.
		 * Let the calling function know so it can decide
		 * what to do.
		 */
		console_unlock();
		return -EIO;
	 
	console_unlock();
	if (vt_waitactive(vt + 1))  
		pr_debug("Suspend: Can't switch VCs.");
		return -EINTR;
	 
	return prev;
 
Unlike most other locking functions we ve seen so far, console_lock needs to be careful to ensure nothing else is panicking and needs to dump to the console before grabbing the semaphore for the console and setting a couple flags.

Panics Panics are tracked via an atomic integer set to the id of the processor currently panicking. kernel/printk/printk.c:2649
/**
 * console_lock - block the console subsystem from printing
 *
 * Acquires a lock which guarantees that no consoles will
 * be in or enter their write() callback.
 *
 * Can sleep, returns nothing.
 */
void console_lock(void)
 
	might_sleep();
	/* On panic, the console_lock must be left to the panic cpu. */
	while (other_cpu_in_panic())
		msleep(1000);
	down_console_sem();
	console_locked = 1;
	console_may_schedule = 1;
 
EXPORT_SYMBOL(console_lock);
kernel/printk/printk.c:362
/*
 * Return true if a panic is in progress on a remote CPU.
 *
 * On true, the local CPU should immediately release any printing resources
 * that may be needed by the panic CPU.
 */
bool other_cpu_in_panic(void)
 
	return (panic_in_progress() && !this_cpu_in_panic());
 
kernel/printk/printk.c:345
static bool panic_in_progress(void)
 
	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
 
kernel/printk/printk.c:350
/* Return true if a panic is in progress on the current CPU. */
bool this_cpu_in_panic(void)
 
	/*
	 * We can use raw_smp_processor_id() here because it is impossible for
	 * the task to be migrated to the panic_cpu, or away from it. If
	 * panic_cpu has already been set, and we're not currently executing on
	 * that CPU, then we never will be.
	 */
	return unlikely(atomic_read(&panic_cpu) == raw_smp_processor_id());
 
console_locked is a debug value, used to indicate that the lock should be held, and our first indication that this whole virtual terminal system is more complex than might initially be expected. kernel/printk/printk.c:373
/*
 * This is used for debugging the mess that is the VT code by
 * keeping track if we have the console semaphore held. It's
 * definitely not the perfect debug tool (we don't know if _WE_
 * hold it and are racing, but it helps tracking those weird code
 * paths in the console code where we end up in places I want
 * locked without the console semaphore held).
 */
static int console_locked;
console_may_schedule is used to see if we are permitted to sleep and schedule other work while we hold this lock. As we ll see later, the virtual terminal subsystem is not re-entrant, so there s all sorts of hacks in here to ensure we don t leave important code sections that can t be safely resumed.

Disable VT Switch As the comment below lays out, when another program is handling graphical display anyway, there s no need to do any of this, so the kernel provides a switch to turn the whole thing off. Interestingly, this appears to only be used by three drivers, so the specific hardware support required must not be particularly common.
drivers/gpu/drm/omapdrm/dss
drivers/video/fbdev/geode
drivers/video/fbdev/omap2
drivers/tty/vt/vt_ioctl.c:1308
/*
 * Normally during a suspend, we allocate a new console and switch to it.
 * When we resume, we switch back to the original console.  This switch
 * can be slow, so on systems where the framebuffer can handle restoration
 * of video registers anyways, there's little point in doing the console
 * switch.  This function allows you to disable it by passing it '0'.
 */
void pm_set_vt_switch(int do_switch)
 
	console_lock();
	disable_vt_switch = !do_switch;
	console_unlock();
 
EXPORT_SYMBOL(pm_set_vt_switch);
The rest of the vt_switch_console function is pretty normal, however, simply allocating space if needed to create the requested virtual terminal and then setting the current virtual terminal via set_console.

Virtual Terminal Set Console With set_console, we begin (as if we haven t been already) to enter the madness that is the virtual terminal subsystem. As mentioned previously, modifications to its state must be made very carefully, as other stuff happening at the same time could create complete messes. All this to say, calling set_console does not actually perform any work to change the state of the current console. Instead it indicates what changes it wants and then schedules that work. drivers/tty/vt/vt.c:3153
int set_console(int nr)
 
	struct vc_data *vc = vc_cons[fg_console].d;
	if (!vc_cons_allocated(nr)   vt_dont_switch  
		(vc->vt_mode.mode == VT_AUTO && vc->vc_mode == KD_GRAPHICS))  
		/*
		 * Console switch will fail in console_callback() or
		 * change_console() so there is no point scheduling
		 * the callback
		 *
		 * Existing set_console() users don't check the return
		 * value so this shouldn't break anything
		 */
		return -EINVAL;
	 
	want_console = nr;
	schedule_console_callback();
	return 0;
 
The check for vc->vc_mode == KD_GRAPHICS is where most end-user graphical desktops will bail out of this change, as they re in graphics mode and don t need to switch away to the suspend console. vt_dont_switch is a flag used by the ioctls11 VT_LOCKSWITCH and VT_UNLOCKSWITCH to prevent the system from switching virtual terminal devices when the user has explicitly locked it. VT_AUTO is a flag indicating that automatic virtual terminal switching is enabled12, and thus deliberate switching to a suspend terminal is not required. However, if you do run your machine from a virtual terminal, then we indicate to the system that we want to change to the requested virtual terminal via the want_console variable and schedule a callback via schedule_console_callback. drivers/tty/vt/vt.c:315
void schedule_console_callback(void)
 
	schedule_work(&console_work);
 
console_work is a workqueue2 that will execute the given task asynchronously.

Console Callback drivers/tty/vt/vt.c:3109
/*
 * This is the console switching callback.
 *
 * Doing console switching in a process context allows
 * us to do the switches asynchronously (needed when we want
 * to switch due to a keyboard interrupt).  Synchronization
 * with other console code and prevention of re-entrancy is
 * ensured with console_lock.
 */
static void console_callback(struct work_struct *ignored)
 
	console_lock();
	if (want_console >= 0)  
		if (want_console != fg_console &&
		    vc_cons_allocated(want_console))  
			hide_cursor(vc_cons[fg_console].d);
			change_console(vc_cons[want_console].d);
			/* we only changed when the console had already
			   been allocated - a new console is not created
			   in an interrupt routine */
		 
		want_console = -1;
	 
...
console_callback first looks to see if there is a console change wanted via want_console and then changes to it if it s not the current console and has been allocated already. We do first remove any cursor state with hide_cursor. drivers/tty/vt/vt.c:841
static void hide_cursor(struct vc_data *vc)
 
	if (vc_is_sel(vc))
		clear_selection();
	vc->vc_sw->con_cursor(vc, false);
	hide_softcursor(vc);
 
A full dive into the tty driver is a task for another time, but this should give a general sense of how this system interacts with hibernation.

Notify Power Management Call Chain kernel/power/hibernate.c:767
pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION)
This will call a chain of power management callbacks, passing first PM_HIBERNATION_PREPARE and then PM_POST_HIBERNATION on startup or on error with another callback. kernel/power/main.c:98
int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
 
	int ret;
	ret = blocking_notifier_call_chain_robust(&pm_chain_head, val_up, val_down, NULL);
	return notifier_to_errno(ret);
 
The power management notifier is a blocking notifier chain, which means it has the following properties. include/linux/notifier.h:23
 *	Blocking notifier chains: Chain callbacks run in process context.
 *		Callouts are allowed to block.
The callback chain is a linked list with each entry containing a priority and a function to call. The function technically takes in a data value, but it is always NULL for the power management chain. include/linux/notifier.h:49
struct notifier_block;
typedef	int (*notifier_fn_t)(struct notifier_block *nb,
			unsigned long action, void *data);
struct notifier_block  
	notifier_fn_t notifier_call;
	struct notifier_block __rcu *next;
	int priority;
 ;
The head of the linked list is protected by a read-write semaphore. include/linux/notifier.h:65
struct blocking_notifier_head  
	struct rw_semaphore rwsem;
	struct notifier_block __rcu *head;
 ;
Because it is prioritized, appending to the list requires walking it until an item with lower13 priority is found to insert the current item before. kernel/notifier.c:252
/*
 *	Blocking notifier chain routines.  All access to the chain is
 *	synchronized by an rwsem.
 */
static int __blocking_notifier_chain_register(struct blocking_notifier_head *nh,
					      struct notifier_block *n,
					      bool unique_priority)
 
	int ret;
	/*
	 * This code gets used during boot-up, when task switching is
	 * not yet working and interrupts must remain disabled.  At
	 * such times we must not call down_write().
	 */
	if (unlikely(system_state == SYSTEM_BOOTING))
		return notifier_chain_register(&nh->head, n, unique_priority);
	down_write(&nh->rwsem);
	ret = notifier_chain_register(&nh->head, n, unique_priority);
	up_write(&nh->rwsem);
	return ret;
 
kernel/notifier.c:20
/*
 *	Notifier chain core routines.  The exported routines below
 *	are layered on top of these, with appropriate locking added.
 */
static int notifier_chain_register(struct notifier_block **nl,
				   struct notifier_block *n,
				   bool unique_priority)
 
	while ((*nl) != NULL)  
		if (unlikely((*nl) == n))  
			WARN(1, "notifier callback %ps already registered",
			     n->notifier_call);
			return -EEXIST;
		 
		if (n->priority > (*nl)->priority)
			break;
		if (n->priority == (*nl)->priority && unique_priority)
			return -EBUSY;
		nl = &((*nl)->next);
	 
	n->next = *nl;
	rcu_assign_pointer(*nl, n);
	trace_notifier_register((void *)n->notifier_call);
	return 0;
 
Each callback can return one of a series of options. include/linux/notifier.h:18
#define NOTIFY_DONE		0x0000		/* Don't care */
#define NOTIFY_OK		0x0001		/* Suits me */
#define NOTIFY_STOP_MASK	0x8000		/* Don't call further */
#define NOTIFY_BAD		(NOTIFY_STOP_MASK 0x0002)
						/* Bad/Veto action */
When notifying the chain, if a function returns STOP or BAD then the previous parts of the chain are called again with PM_POST_HIBERNATION14 and an error is returned. kernel/notifier.c:107
/**
 * notifier_call_chain_robust - Inform the registered notifiers about an event
 *                              and rollback on error.
 * @nl:		Pointer to head of the blocking notifier chain
 * @val_up:	Value passed unmodified to the notifier function
 * @val_down:	Value passed unmodified to the notifier function when recovering
 *              from an error on @val_up
 * @v:		Pointer passed unmodified to the notifier function
 *
 * NOTE:	It is important the @nl chain doesn't change between the two
 *		invocations of notifier_call_chain() such that we visit the
 *		exact same notifier callbacks; this rules out any RCU usage.
 *
 * Return:	the return value of the @val_up call.
 */
static int notifier_call_chain_robust(struct notifier_block **nl,
				     unsigned long val_up, unsigned long val_down,
				     void *v)
 
	int ret, nr = 0;
	ret = notifier_call_chain(nl, val_up, v, -1, &nr);
	if (ret & NOTIFY_STOP_MASK)
		notifier_call_chain(nl, val_down, v, nr-1, NULL);
	return ret;
 
Each of these callbacks tends to be quite driver-specific, so we ll cease discussion of this here.

Sync Filesystems The next step is to ensure all filesystems have been synchronized to disk. This is performed via a simple helper function that times how long the full synchronize operation, ksys_sync takes. kernel/power/main.c:69
void ksys_sync_helper(void)
 
	ktime_t start;
	long elapsed_msecs;
	start = ktime_get();
	ksys_sync();
	elapsed_msecs = ktime_to_ms(ktime_sub(ktime_get(), start));
	pr_info("Filesystems sync: %ld.%03ld seconds\n",
		elapsed_msecs / MSEC_PER_SEC, elapsed_msecs % MSEC_PER_SEC);
 
EXPORT_SYMBOL_GPL(ksys_sync_helper);
ksys_sync wakes and instructs a set of flusher threads to write out every filesystem, first their inodes15, then the full filesystem, and then finally all block devices, to ensure all pages are written out to disk. fs/sync.c:87
/*
 * Sync everything. We start by waking flusher threads so that most of
 * writeback runs on all devices in parallel. Then we sync all inodes reliably
 * which effectively also waits for all flusher threads to finish doing
 * writeback. At this point all data is on disk so metadata should be stable
 * and we tell filesystems to sync their metadata via ->sync_fs() calls.
 * Finally, we writeout all block devices because some filesystems (e.g. ext2)
 * just write metadata (such as inodes or bitmaps) to block device page cache
 * and do not sync it on their own in ->sync_fs().
 */
void ksys_sync(void)
 
	int nowait = 0, wait = 1;
	wakeup_flusher_threads(WB_REASON_SYNC);
	iterate_supers(sync_inodes_one_sb, NULL);
	iterate_supers(sync_fs_one_sb, &nowait);
	iterate_supers(sync_fs_one_sb, &wait);
	sync_bdevs(false);
	sync_bdevs(true);
	if (unlikely(laptop_mode))
		laptop_sync_completion();
 
It follows an interesting pattern of using iterate_supers to run both sync_inodes_one_sb and then sync_fs_one_sb on each known filesystem16. It also calls both sync_fs_one_sb and sync_bdevs twice, first without waiting for any operations to complete and then again waiting for completion17. When laptop_mode is enabled the system runs additional filesystem synchronization operations after the specified delay without any writes. mm/page-writeback.c:111
/*
 * Flag that puts the machine in "laptop mode". Doubles as a timeout in jiffies:
 * a full sync is triggered after this time elapses without any disk activity.
 */
int laptop_mode;
EXPORT_SYMBOL(laptop_mode);
However, when running a filesystem synchronization operation, the system will add an additional timer to schedule more writes after the laptop_mode delay. We don t want the state of the system to change at all while performing hibernation, so we cancel those timers. mm/page-writeback.c:2198
/*
 * We're in laptop mode and we've just synced. The sync's writes will have
 * caused another writeback to be scheduled by laptop_io_completion.
 * Nothing needs to be written back anymore, so we unschedule the writeback.
 */
void laptop_sync_completion(void)
 
	struct backing_dev_info *bdi;
	rcu_read_lock();
	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
		del_timer(&bdi->laptop_mode_wb_timer);
	rcu_read_unlock();
 
As a side note, the ksys_sync function is simply called when the system call sync is used. fs/sync.c:111
SYSCALL_DEFINE0(sync)
 
	ksys_sync();
	return 0;
 

The End of Preparation With that the system has finished preparations for hibernation. This is a somewhat arbitrary cutoff, but next the system will begin a full freeze of userspace to then dump memory out to an image and finally to perform hibernation. All this will be covered in future articles!
  1. Hibernation modes are outside of scope for this article, see the previous article for a high-level description of the different types of hibernation.
  2. Workqueues are a mechanism for running asynchronous tasks. A full description of them is a task for another time, but the kernel documentation on them is available here: https://www.kernel.org/doc/html/v6.9/core-api/workqueue.html 2
  3. This is a bit of an oversimplification, but since this isn t the main focus of this article this description has been kept to a higher level.
  4. Kconfig is Linux s build configuration system that sets many different macros to enable/disable various features.
  5. Kconfig defaults to the first default found
  6. Including checking whether the algorithm is larval? Which appears to indicate that it requires additional setup, but is an interesting choice of name for such a state.
  7. Specifically when we get to process freezing, which we ll get to in the next article in this series.
  8. Swap space is outside the scope of this article, but in short it is a buffer on disk that the kernel uses to store memory not current in use to free up space for other things. See Swap Management for more details.
  9. The code for this is lengthy and tangential, thus it has not been included here. If you re curious about the details of this, see kernel/power/hibernate.c:858 for the details of hibernate_quiet_exec, and drivers/nvdimm/core.c:451 for how it is used in nvdimm.
  10. Annoyingly this code appears to use the terms console and virtual terminal interchangeably.
  11. ioctls are special device-specific I/O operations that permit performing actions outside of the standard file interactions of read/write/seek/etc. 2
  12. I m not entirely clear on how this flag works, this subsystem is particularly complex.
  13. In this case a higher number is higher priority.
  14. Or whatever the caller passes as val_down, but in this case we re specifically looking at how this is used in hibernation.
  15. An inode refers to a particular file or directory within the filesystem. See Wikipedia for more details.
  16. Each active filesystem is registed with the kernel through a structure known as a superblock, which contains references to all the inodes contained within the filesystem, as well as function pointers to perform the various required operations, like sync.
  17. I m including minimal code in this section, as I m not looking to deep dive into the filesystem code at this time.

4 September 2024

Dirk Eddelbuettel: RcppCNPy 0.2.13 on CRAN: Micro Bugfix

Another (again somewhat minor) maintenance release of the RcppCNPy package arrived on CRAN earlier today. RcppCNPy provides R with read and write access to NumPy files thanks to the cnpy library by Carl Rogers along with Rcpp for the glue to R. A change in the most recent Rcpp appears to cause void functions wrapper via Rcpp Modules to return NULL, as opposed to being silent. That tickles discrepancy between the current output and the saved (reference) output of one test file, leading CRAN to display a NOTE which we were asked to take care of. Done here in this release and now that we know we will also look into restoring the prior Rcpp behaviour. Other small changes involved standard maintenance for continuous integration and updates to files README.md and DESCRIPTION. More details are below.

Changes in version 0.2.13 (2024-09-03)
  • A test script was updated to account for the fact that it now returns a few instances of NULL under current Rcpp.
  • Small package maintenance updates have been made to the README and DESCRIPTION files as well as to the continuous integration setup.

CRANberries also provides a diffstat report for the latest release. As always, feedback is welcome and the best place to start a discussion may be the GitHub issue tickets page. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Samuel Henrique: DebConf24 was fun!: Security, curl, wcurl, Debian's quality

A picture of a badger2040w with Samuel's badge and the curl manpage PCB on the side

tl;dr DebConf24 was fun! A playlist of all of my talks, with subtitles (en, pt-br) and chapters is available on YouTube.

Overview DebConf24 was held in Busan, South Korea, between Sunday July 28th to Sunday August 4th 2024. As usual for DebConfs, I had a great time meeting my friends, but also met new people and got to learn a bit about the interesting things they're working on. I ended up getting too excited during the talk submission stage of the conference and as a result I presented 5 different activities (3 talks, 1 BoF and 1 lightning talk). Since I was too busy with the presentations, I did not have a lot of time to actually hang out with folks, or even to go out in the city, I guess I've learned my lesson for next time. The main purpose of this post is to write about all of the things I presented at the conference. I did want to list some of the interesting talks I've watched, but that I would not be able to be fair as I'm sure I would miss some. You can get the schedule and the recordings of any talks from the conference's website: https://debconf24.debconf.org/schedule/

wcurl Lightning Talk The most fun of my presentations, during the second-to-last day of the conference, I've asked for help from Sergio Durigan Junior <sergiodj> to setup an URL containing a whitespace and redirecting that to wcurl's manpage. I then did a little demo to showcase why me (and a lot others) struggle with downloading things with curl, and how wcurl solves that. https://www.youtube.com/watch?v=eM8M5qa4pPM

Fixing CVEs on Debian: Everything you probably know already I've always felt like DebConf was missing security-related talks, so I decided to do something about it and presented a few of the things I've learned when fixing CVEs for Debian. This is an area where we don't get a lot of new contributors, I'm trying to change that, and this talk can be used to introduce newcomers to it. https://www.youtube.com/watch?v=XzNVVILVyUM

The secret sauce of Debian Debian is not very vocal about all of the nice things it has regarding quality-assurance, testing, or CI, even though it's at the state-of-the-art for a lot of things. This talk is an initial step towards making people aware of the cool things happening behind the scenes. Ideally we should have it well-documented somewhere. https://www.youtube.com/watch?v=x_X2IBnpjic

"I use Debian BTW": fzf, tmux, zoxide and friends One of my earliest good memories of Debian was when it started coming with a colored PS1 by default, I still remember the feeling of relief whenever I jumped into a Debian server and didn't have to deal with a black and white PS1. There's still a lot of room for Debian to ship better defaults, and I think some of them can actually happen. This talk is a bit of a silly one where I'm just making people aware of the existence of a few Golang/Rust CLI tools, and also some dotfiles configurations that should probably be the default. https://www.youtube.com/watch?v=tfto3Seokn4

curl The curl project does such a great job with their security advisories that it will likely never receive the amount of praise it deserves, but I did my best at mentioning it throughout my CVEs talk. Maybe I will write more extensively about this someday, but in case I don't:
There's no other project which always consistently mentions the exact range of commits that are affected by a given CVE. Forget about whether the versions are EOL, curl doesn't have LTS releases, yet they do such a great job at clearly documenting their CVEs that I would take that over having LTS releases anytime (that's for curl at least, I acknowledge some types of projects have a different need for LTS releases). Not only that, but they are also always careful about explaining alternative mitigations such as configuration changes, build flags that defuse the exploitation, or parameters that you should not use.
Just like we tend to do every time we meet, me and the other Debian curl maintainers spent the first 2 or 3 days of the conference talking about how we wanted to eventually meet up to discuss the package. It was going to be informal, maybe during the Cheese and Wine party, but then I've realized we should make it part of the official schedule, which would also give us the recordings for later. And so the "curl maintainers BoF" happened, where we spoke about HTTP3, GnutTLS, wcurl and other things. https://www.youtube.com/watch?v=fL7hSypUTdM

wcurl Right after that BoF, Daniel Stenberg asked if we were interested in having wcurl adopted into curl, which we definitely were, so wcurl is now part of the curl project. Daniel was also kind enough to design a logo for the project, which makes me especially happy because I can stop with my own approach at a logo (which I had to redo every few days): A laptop with a curl and a GoHorse sticker, there's a 'w' handwritten with a marker on the right side of the curl sticker, making it 'wcurl' And here is the new logo: 'wcurl' written with the same font and colors as the curl logo, with the 'w' being green instead of blue, and a download icon at the end Much better, I would say :)

curl Swag DebConf24 was my chance at forwarding some curl swag items to the other curl maintainers, so both Sergio Durigan Junior <sergiodj> and Carlos Henrique Lima Melara <charles> got the curl-up t-shirt and the very cool curl PCB coaster, both gifted by Daniel Stenberg. Unfortunately I didn't have any of that for DebConf attendees, but I did drop loads of curl stickers at the stickers table, they were gone very quickly. A table full of different stickers, curl stickers can be seen over the whole table

For the future I used to think the most humbling experience you could have as someone who presented a talk was to have to watch it yourself, you notice a lot of mistakes and you instantly think about things that should be done differently. It turns out the most humbling thing to do is actually to write subtitles for your talks, I noticed every single mistake, often multiple times. So after spending more than 30 hours writing the subtitles for both English and Brazilian Portuguese for my talks, I feel like it's going to be much easier to avoid committing the same mistakes again. After some time you stop feeling shame about those mistakes and you're just left with feelings of annoyance, and at that point it becomes easier to consciously avoid them. I am collecting a list of things I wish I had done differently on all of those talks, so if I end up presenting any one of them again, it will be an improved version. A picture from the top of a group of conference attendees, there's about 150 people in the picture

1 September 2024

Russ Allbery: Review: Reasons Not to Worry

Review: Reasons Not to Worry, by Brigid Delaney
Publisher: Harper
Copyright: 2022
Printing: October 2023
ISBN: 0-06-331484-3
Format: Kindle
Pages: 295
Reasons Not to Worry is a self-help non-fiction book about stoicism, focusing specifically on quotes from Seneca, Epictetus, and Marcus Aurelius. Brigid Delaney is a long-time Guardian columnist who has written on a huge variety of topics, including (somewhat relevantly to this book) her personal experiences trying weird fads. Stoicism is having a moment among the sort of men who give people life advice in podcast form. Ryan Holiday, a former marketing executive, has made a career out of being the face of stoicism in everyone's podcast feed (and, of course, hosting his own). He is far from alone. If you pay attention to anyone in the male self-help space right now (Cal Newport, in my case), you have probably heard something vague about the "wisdom of the stoics." Given that the core of stoicism is easily interpreted as a strategy for overcoming your emotions with logic, this isn't surprising. Philosophies that lean heavily on college dorm room logic, discount emotion, and argue that society is full of obvious flaws that can be analyzed and debunked by one dude with some blog software and a free afternoon have been very popular in tech circles for the past ten to fifteen years, and have spread to some extent into popular culture. Intriguingly, though, stoicism is a system of virtue ethics, which means it is historically in opposition to consequentialist philosophies like utilitarianism, the ethical philosophy behind effective altruism and other related Silicon Valley fads. I am pretty exhausted with the whole genre of men talking to each other about how to live a better life Cal Newport by himself more than satisfies the amount of that I want to absorb but I was still mildly curious about stoicism. My education didn't provide me with a satisfying grounding in major historical philosophical movements, so I occasionally look around for good introductions. Stoicism also has some reputation as an anxiety-reduction technique, and I could use more of those. When I saw a Discord recommendation for Reasons Not to Worry that specifically mentioned its lack of bro perspective, I figured I'd give it a shot. Reasons Not to Worry is indeed not a bro book, although I would have preferred fewer appearances of the author's friend Andrew, whose opinions on stoicism I could not possibly care less about. What it is, though, is a shallow and credulous book that falls squarely in the middle of the lightweight self-help genre. Delaney is here to explain why stoicism is awesome and to convince you that a school of Greek and Roman philosophers knew exactly how you should think about your life today. If this sounds quasi-religious, well, I'll get to that. Delaney does provide a solid introduction to stoicism that I think is a bit more approachable than reading the relevant Wikipedia article. In her presentation, the core of stoicism is the practice of four virtues: wisdom, courage, moderation, and justice. The modern definition of "stoic" as someone who is impassive in the presence of pleasure or pain is somewhat misleading, but Delaney does emphasize a goal of ataraxia, or tranquility of mind. By making that the goal rather than joy or pleasure, stoicism tries to avoid the trap of the hedonic treadmill in favor of a more achievable persistent contentment. As an aside, some quick Internet research makes me doubt Delaney's summary here. Other material about stoicism I found focuses on apatheia and associates ataraxia with Epicureanism instead. But I won't start quibbling with Delaney's definitions; I'm not qualified and this review is already too long. The key to ataraxia, in Delaney's summary of stoicism, is to focus only on those parts of life we can control. She summarizes those as our character, how we treat others, and our actions and reactions. Everything else wealth, the esteem of our colleagues, good health, good fortune is at least partly outside of our control, and therefore we should enjoy it when we have it but try to be indifferent to whether it will last. Attempting to control things that are outside of our control is doomed to failure and will disturb our tranquility. Essentially all of this book is elaborations and variations on this theme, specialized to some specific area of life like social media, anxiety, or grief and written in the style of a breezy memoir. If you're familiar with modern psychological treatment frameworks like cognitive behavioral therapy or acceptance and commitment therapy, this summary of stoicism may sound familiar. (Apparently this is not an accident; the predecessor to CBT used stoicism as a philosophical basis.) Stoicism, like those treatment approaches, tries to refocus your attention on the things that you can improve and de-emphasizes the things outside of your control. This is a lot of the appeal, at least to me (and I think to Delaney as well). Hearing that definition, you may have some questions. Why those virtues specifically? They sound good, but all virtues sound good almost by definition. Is there any measure of your success in following those virtues outside your subjective feeling of ataraxia? Does the focus on only things you can control lead to ignoring problems only mostly outside of your control, where your actions would matter but only to a small degree? Doesn't this whole philosophy sound a little self-centered? What do non-stoic virtue ethics look like, and why do they differ from stoicism? What is the consequentialist critique of stoicism? This is where the shortcomings of this book become clear: Delaney is not very interested in questions like this. There are sections on some of those topics, particularly the relationship between stoicism and social justice, but her treatment is highly unsatisfying. She raises the question, talks about her doubts about stoicism's applicability, and then says that, after further thought, she decided stoicism is entirely consistent with social justice and the stoics were right after all. There is a little bit more explanation than that, but not much. Stoicism can apparently never be wrong; it can only be incompletely understood. Self-help books often fall short here, and I suspect this may be what the audience wants. Part of the appeal of the self-help genre is artificial certainty. Becoming a better manager, starting a business, becoming more productive, or working out an entire life philosophy are not problems amenable to a highly approachable and undemanding book. We all know that at some level, but the seductive allure of the self-help genre is the promise of simplifying complex problems down to a few approachable bullet points. Here is a life philosophy in a neatly packaged form, and if you just think deeply about its core principles, you will find they can be applied to any situation and any doubts you were harboring will turn out to be incorrect. I am all too familiar with this pattern because it's also how fundamentalist Christianity works. The second time Delaney talked about her doubts about the applicability of stoicism and then claimed a few pages later that those doubts disappeared with additional thought and discussion, my radar went off. This book was sounding less like a thoughtful examination of one specific philosophy out of many and more like the soothing adoption of religious certainty by a convert. I was therefore entirely unsurprised when Delaney all but says outright in the epilogue that she's adopted stoicism as her religion and approaches it with the same dedicated practice that she used to bring to Catholicism. I think this is where a lot of self-help books end up, although most of them don't admit it. There's nothing wrong with this, to be clear. It sounds like she was looking for a non-theistic religion, found one that she liked, and is excited to tell other people about it. But it's a profound mismatch with what I was looking for in an introduction to stoicism. I wanted context, history, and a frank discussion of the problems with adopting philosophy to everyday issues. I also wanted some acknowledgment that it is highly unlikely that a few men who lived 2000 years ago in a wildly different social context, and with drastically limited information about cultures other than their own, figured out a foolproof recipe for how to approach life. The subsequent two millennia of philosophical debates prove that stoicism didn't end the argument, and that a lot of other philosophers thought that stoicism got a few things wrong. You would never know that from this book. What I wanted is outside the scope of this sort of undemanding self-help book, though, and this is the problem that I keep having with philosophy. The books I happen across are either nigh-incomprehensibly dense and academic, or they're simplified into catechism. This was the latter. That's probably more the fault of my reading selection than it is the fault of the book, but it was still annoying. What I will say for this book, and what I suspect may be the most useful property of self-help books in general, is that it prompts you to think about basic stoic principles without getting in the way of your thoughts. It's like background music for the brain: nothing Delaney wrote was very thorny or engaging, but she kept quietly and persistently repeating the basic stoic formula and turning my thoughts back to it. Some of those thoughts may have been useful? As a source of prompts for me to ponder, Reasons Not to Worry was therefore somewhat successful. The concept of not trying to control things outside of my control is simple but valid, and it probably didn't hurt me to spend a week thinking about it. "It kind of works as an undemanding meditation aid" is not a good enough reason for me to recommend this book, but maybe that's what someone else is looking for. Rating: 5 out of 10

22 August 2024

Matthew Garrett: What the fuck is an SBAT and why does everyone suddenly care

Short version: Secure Boot Advanced Targeting and if that's enough for you you can skip the rest you're welcome.

Long version: When UEFI Secure Boot was specified, everyone involved was, well, a touch naive. The basic security model of Secure Boot is that all the code that ends up running in a kernel-level privileged environment should be validated before execution - the firmware verifies the bootloader, the bootloader verifies the kernel, the kernel verifies any additional runtime loaded kernel code, and now we have a trusted environment to impose any other security policy we want. Obviously people might screw up, but the spec included a way to revoke any signed components that turned out not to be trustworthy: simply add the hash of the untrustworthy code to a variable, and then refuse to load anything with that hash even if it's signed with a trusted key.

Unfortunately, as it turns out, scale. Every Linux distribution that works in the Secure Boot ecosystem generates their own bootloader binaries, and each of them has a different hash. If there's a vulnerability identified in the source code for said bootloader, there's a large number of different binaries that need to be revoked. And, well, the storage available to store the variable containing all these hashes is limited. There's simply not enough space to add a new set of hashes every time it turns out that grub (a bootloader initially written for a simpler time when there was no boot security and which has several separate image parsers and also a font parser and look you know where this is going) has another mechanism for a hostile actor to cause it to execute arbitrary code, so another solution was needed.

And that solution is SBAT. The general concept behind SBAT is pretty straightforward. Every important component in the boot chain declares a security generation that's incorporated into the signed binary. When a vulnerability is identified and fixed, that generation is incremented. An update can then be pushed that defines a minimum generation - boot components will look at the next item in the chain, compare its name and generation number to the ones stored in a firmware variable, and decide whether or not to execute it based on that. Instead of having to revoke a large number of individual hashes, it becomes possible to push one update that simply says "Any version of grub with a security generation below this number is considered untrustworthy".

So why is this suddenly relevant? SBAT was developed collaboratively between the Linux community and Microsoft, and Microsoft chose to push a Windows update that told systems not to trust versions of grub with a security generation below a certain level. This was because those versions of grub had genuine security vulnerabilities that would allow an attacker to compromise the Windows secure boot chain, and we've seen real world examples of malware wanting to do that (Black Lotus did so using a vulnerability in the Windows bootloader, but a vulnerability in grub would be just as viable for this). Viewed purely from a security perspective, this was a legitimate thing to want to do.

(An aside: the "Something has gone seriously wrong" message that's associated with people having a bad time as a result of this update? That's a message from shim, not any Microsoft code. Shim pays attention to SBAT updates in order to avoid violating the security assumptions made by other bootloaders on the system, so even though it was Microsoft that pushed the SBAT update, it's the Linux bootloader that refuses to run old versions of grub as a result. This is absolutely working as intended)

The problem we've ended up in is that several Linux distributions had not shipped versions of grub with a newer security generation, and so those versions of grub are assumed to be insecure (it's worth noting that grub is signed by individual distributions, not Microsoft, so there's no externally introduced lag here). Microsoft's stated intention was that Windows Update would only apply the SBAT update to systems that were Windows-only, and any dual-boot setups would instead be left vulnerable to attack until the installed distro updated its grub and shipped an SBAT update itself. Unfortunately, as is now obvious, that didn't work as intended and at least some dual-boot setups applied the update and that distribution's Shim refused to boot that distribution's grub.

What's the summary? Microsoft (understandably) didn't want it to be possible to attack Windows by using a vulnerable version of grub that could be tricked into executing arbitrary code and then introduce a bootkit into the Windows kernel during boot. Microsoft did this by pushing a Windows Update that updated the SBAT variable to indicate that known-vulnerable versions of grub shouldn't be allowed to boot on those systems. The distribution-provided Shim first-stage bootloader read this variable, read the SBAT section from the installed copy of grub, realised these conflicted, and refused to boot grub with the "Something has gone seriously wrong" message. This update was not supposed to apply to dual-boot systems, but did anyway. Basically:

1) Microsoft applied an update to systems where that update shouldn't have been applied
2) Some Linux distros failed to update their grub code and SBAT security generation when exploitable security vulnerabilities were identified in grub

The outcome is that some people can't boot their systems. I think there's plenty of blame here. Microsoft should have done more testing to ensure that dual-boot setups could be identified accurately. But also distributions shipping signed bootloaders should make sure that they're updating those and updating the security generation to match, because otherwise they're shipping a vector that can be used to attack other operating systems and that's kind of a violation of the social contract around all of this.

It's unfortunate that the victims here are largely end users faced with a system that suddenly refuses to boot the OS they want to boot. That should never happen. I don't think asking arbitrary end users whether they want secure boot updates is likely to result in good outcomes, and while I vaguely tend towards UEFI Secure Boot not being something that benefits most end users it's also a thing you really don't want to discover you want after the fact so I have sympathy for it being default on, so I do sympathise with Microsoft's choices here, other than the failed attempt to avoid the update on dual boot systems.

Anyway. I was extremely involved in the implementation of this for Linux back in 2012 and wrote the first prototype of Shim (which is now a massively better bootloader maintained by a wider set of people and that I haven't touched in years), so if you want to blame an individual please do feel free to blame me. This is something that shouldn't have happened, and unless you're either Microsoft or a Linux distribution it's not your fault. I'm sorry.

comment count unavailable comments

11 August 2024

Ravi Dwivedi: My Austrian Visa Refusal Story

Vienna - the capital of Austria - is one of the most visited cities in the world, popular for its rich history, gardens, and cafes, along with well-known artists like Beethoven, Mozart, G del, and Freud. It has also been consistently ranked as the most livable city in the world. For these reasons, I was elated when my friend Snehal invited me last year to visit Vienna for a few days. We included Christmas and New Year s Eve in my itinerary due to the city s popular Christmas markets and lively events. The festive season also ensured that Snehal had some days off for sightseeing. Indians require a visa to visit Austria. Since the travel dates were near, I rushed to book an appointment online with VFS Global in Delhi, and quickly arranged the required documents. However, at VFS, I found out that I had applied in the wrong appointment category (tourist), which depends on the purpose of the visit, and that my travel dates do not allow enough time for visa authorities to make a decision. Apparently, even if you plan to stay only for a part of the trip with the host, you need to apply under the category Visiting Friends and Family . Thus, I had to book another appointment under this category, and took the opportunity to shift my travel dates to allow at least 15 business days for the visa application to be processed, removing Christmas and New Year s Eve from my itinerary. The process went smoothly, and my visa application was submitted by VFS. For reference, here s a list of documents I submitted - The following charges were collected from me.
Service Description Amount (Indian Rupees)
Cash Handling Charge - SAC Code: (SAC:998599) 0
VFS Fee - India - SAC Code: (SAC:998599) 1,820
VISA Fee - India - SAC Code: 7,280
Convenience Fee - SAC Code: (SAC:998599) 182
Courier Service - SAC Code: (SAC:998599) 728
Courier Assurance - SAC Code: (SAC:998599) 182
Total 10,192
I later learned that the courier charges (728 INR) and the courier assurance charges (182 INR) mentioned above were optional. However, VFS didn t ask whether I wanted to include them. When the emabssy is done processing your application, it will send your passport back to VFS, from where you can either collect it yourself or get it couriered back home, which requires you to pay courier charges. However, courier assurance charges do not add any value as VFS cannot assure anything about courier and I suggest you get them removed. My visa application was submitted on the 21st of December 2023. A few days later, on the 29th of December 2023, I received an email from the Austrian embassy asking me to submit an additional document -
Subject: AUSTRIAN VISA APPLICATION - AMENDMENT REQUEST: Ravi Dwivedi VIS 4331 Dear Applicant, On 22.12.2023 your application for Visa C was registered at the Embassy. You are requested to kindly send the scanned copies of the following documents via email to the Embassy or submit the documents at the nearest VFS centre, for further processing of your application:
  • Kindly submit Electronic letter of guarantee EVE- Elektronische Verpflichtungserkl rung obtained from the Fremdenpolizeibeh rde of the sponsor s district in Austria. Once your host company/inviting company has obtained the EVE, please share the reference number (starting from DEL_____) received from the authorities, with the Embassy.
Kindly Note: It is in your own interest to fulfil the requirements as indicated above and submit the missing documents within 14 days of the receipt of this email. Otherwise a decision will be taken based on the documentation available. Sie werden in Ihrem Interesse ersucht, die gekennzeichneten M ngel so schnell wie m glich zu beheben bzw. fehlende Unterlagen umgehend nachzureichen, um die weitere Bearbeitung des Antrages zu erm glichen. Sollten Sie innerhalb 14 Tagen die gekennzeichneten M ngel nicht beheben bzw. die fehlenden Unterlagen nicht nachreichen, wird ber den vorliegenden Antrag ohne diese Unterlagen bzw. M ngelbehebung entschieden. Austrian Embassy New Delhi R.J/ Consular Section +91 11 2419 2700 EP-13, Chandragupta Marg, Chanakyapuri, New Delhi 110 021, India bmeia.gv.at/botschaft/new-delhi facebook.at/AustrianEmbassyNewDelhihttp://www.facebook.at/AustrianEmbassyNewDelhi twitter.com/MFA_Austriahttp://www.twitter.com/MFA_Austria [refocus1][Signatur_V+30]https://www.bmeia.gv.at/en/european-foreign-policy/foreign-trade/refocus-austria/[Logo_AT_IN_22px]
I misunderstood the required document (the EVE) to be a scanned copy of the letter of guarantee form signed by Snehal, and responded by attaching it. Upon researching, Snehal determined that the document is an electronic letter of guarantee, and is supposed to be obtained at a local police station in Vienna. He visited a police station the next day and had a hard time conversing due to the language barrier (German is the common language in Austria, whereas Snehal speaks English). That day was a weekend, so he took an appointment for Monday, but in the meantime the embassy had finished processing my visa. My visa was denied, and the refusal letter stated:
The Austrian embassy in Delhi examined your application; the visa has been refused. The decision is based on the following reason(s):
  • The information submitted regarding the justification for the purpose and conditions of the intended stay was not reliable.
  • There are reasonable doubts as to your intention to leave the territory of the Member States before the expiry of the visa.
Other remarks: You have been given an amendment request, which you have failed to fulfil, or have only fulfilled inadequately, within the deadline set. You are a first-time traveller. The social and economic roots with the home country are not evident. The return from Schengen territory does therefore not seem to be certain.
I could have reapplied after obtaining the EVE, but I didn t because I found the following line
The social and economic roots with the home country are not evident.
offensive for someone who was born and raised in India, got the impression that the absence of electronic guarantee letter was not the only reason behind the refusal, had already wasted 12,000 INR on this application, and my friend s stay in Austria was uncertain after January. In fact, my friend soon returned to India. To summarize -
  1. If you are visiting a host, then the category of appointment at VFS must be Visiting Friends and Family rather than Tourist .
  2. VFS charged me for courier assurance, which is an optional service. Make sure to get these removed from your bill.
  3. Neither my travel agent nor the VFS application center mentioned the EVE.
  4. While the required documents list from the VFS website does mention it in point 6, it leads to a dead link.
  5. Snehal informed me that a mere two months ago, his wife s visa was approved without an EVE. This hints at inconsistency in processing of applications, even those under identical categories.
Such incidents are a waste of time and money for applicants, and an embarrassment to VFS and the Austrian visa authorities. I suggest that the Austrian visa authorities fix that URL, and provide instructions for hosts to obtain the EVE. Credits to Snehal and Contrapunctus for editing, Badri for proofreading.

7 August 2024

Sahil Dhiman: Banks With Own ASN in India

Most banks are behind CDNs and DDoS mitigation providers nowadays, though they still hold their own IP space. Was interested in this, so compiled a list from BGP.Tools and Hurricane Electric BGP Toolkit. Other noteable mentions: Let me know if I m missing someone. Many thanks to Saswata Sarkar for helping with the list.

4 August 2024

Matthias Klumpp: Freedesktop Specs Website Update

The Freedesktop.org Specifications directory contains a list of common specifications that have accumulated over the decades and define how common desktop environment functionality works. The specifications are designed to increase interoperability between desktops. Common specifications make the life of both desktop-environment developers and especially application developers (who will almost always want to maximize the amount of Linux DEs their app can run on and behave as expected, to increase their apps target audience) a lot easier. Unfortunately, building the HTML specifications and maintaining the directory of available specs has become a bit of a difficult chore, as the pipeline for building the site has become fairly old and unmaintained (parts of it still depended on Python 2). In order to make my life of maintaining this part of Freedesktop easier, I aimed to carefully modernize the website. I do have bigger plans to maybe eventually restructure the site to make it easier to navigate and not just a plain alphabetical list of specifications, and to integrate it with the Wiki, but in the interest of backwards compatibility and to get anything done in time (rather than taking on a mega-project that can t be finished), I decided to just do the minimum modernization first to get a viable website, and do the rest later. So, long story short: Most Freedesktop specs are written in DocBook XML. Some were plain HTML documents, some were DocBook SGML, a few were plaintext files. To make things easier to maintain, almost every specification is written in DocBook now. This also simplifies the review process and we may be able to switch to something else like AsciiDoc later if we want to. Of course, one could have switched to something else than DocBook, but that would have been a much bigger chore with a lot more broken links, and I did not want this to become an even bigger project than it already was and keep its scope somewhat narrow. DocBook is a markup language for documentation which has been around for a very long time, and therefore has older tooling around it. But fortunately our friends at openSUSE created DAPS (DocBook Authoring and Publishing Suite) as a modern way to render DocBook documents to HTML and other file formats. DAPS is now used to generate all Freedesktop specifications on our website. The website index and the specification revisions are also now defined in structured TOML files, to make them easier to read and to extend. A bunch of specifications that had been missing from the original website are also added to the index and rendered on the website now. Originally, I wanted to put the website live in a temporary location and solicit feedback, especially since some links have changed and not everything may have redirects. However, due to how GitLab Pages worked (and due to me not knowing GitLab CI well enough ) the changes went live before their MR was actually merged. Rather than reverting the change, I decided to keep it (as the old website did not build properly anymore) and to see if anything breaks. So far, no dead links or bad side effects have been observed, but: If you notice any broken link to specifications.fd.o or anything else weird, please file a bug so that we can fix it! Thank you, and I hope you enjoy reading the specifications in better rendering and more coherent look!

1 August 2024

Guido G nther: Free Software Activities July 2024

A short status update on what happened on my side last month. Looking at unified push support for Chatty prompted some libcmatrix fixes and Chatty improvements (benefiting other protocols like SMS/MMS as well). The Bluetooth status page in Phosh was a slightly larger change code wise as we also enhanced our common widgets for building status pages, simplifying the Wi-Fi status page and making future status pages simpler. But as usual investigating bugs, reviewing patches (thanks!) and keeping up with the changing world around us is what ate most of the time. Phosh A Wayland Shell for mobile devices Phoc A Wayland compositor for mobile devices libphosh-rs Phosh Rust bindings phosh-osk-stub A on screen keyboard for Phosh phosh-mobile-settings phosh-wallpapers Wallpapers, Sounds and other artwork git-buildpackage Suite to help with Debian packages in Git repositories Whatmaps Tool to find processes mapping shared objects Debian The universal operating system Mobian A Debian derivative for mobile devices Calls PSTN and SIP calls for GNOME Livi Minimalistic video player targeting mobile devices libcall-ui Common user interface parts for call handling in GNOME and Phosh. feedbackd DBus service for haptic/visual/audio feedback Chatty Messaging application for mobile and desktop libcmatrix A matrix client client library Eigenvalue A libcmatrix test client Help Development If you want to support my work see donations. This includes list of hardware we want to improve support for.

21 July 2024

Mike Gabriel: Polis - a FLOSS Tool for Civic Participation -- Issues extending Polis and adjusting our Goals

Here comes the 3rd article of the 5-episode blog post series on Polis, written by Guido Berh rster, member of staff at my company Fre(i)e Software GmbH. Enjoy also this read on Guido's work on Polis,
Mike
Table of Contents of the Blog Post Series
  1. Introduction
  2. Initial evaluation and adaptation
  3. Issues extending Polis and adjusting our goals (this article)
  4. Creating (a) new frontend(s) for Polis
  5. Current status and roadmap
Polis - Issues extending Polis and adjusting our Goals After the initial implementation of limited branding support, user feedback and the involvement of an UX designer lead to the conclusion that we needed more far-reaching changes to the user interface in order to reduce visual clutter, rearrange and improve UI elements, and provide better integration with the websites in which conversations are embedded. Challenges when visualizing Data in Polis Polis visualizes groups using a spatial projection of users based on similarities in voting behavior and places them in two to five groups using a clustering algorithm. During our testing and evaluation users were rarely able to interpret the visualization and often intuitively made incorrect assumptions e.g. by associating the filled area of a group with its significance or size. After consultation with a member of the Multi-Agent Systems (MAS) Group at the University of Groningen we chose to temporarily replace the visualization offered by Polis with simple bar charts representing agreement or disagreement with statements of a group or the majority. We intend to revisit this and explore different forms of visualization at a later point in time. The different factors playing into the weight attached to statements which determine the pseuodo-random order in which they are presented for voting ( comment routing ) proved difficult to explain to stakeholders and users and the admission of the ad-hoc and heuristic nature of the used algorithm1 by Polis authors lead to the decision to temporarily remove this feature. Instead, statements should be placed into three groups, namely
  1. metadata questions,
  2. seed statements,
  3. and participant statements
Statements should then be sorted by group but in a fully randomized order within the group so that metadata questions would be presented before seed statements which would be presented before participant s statements. This simpler method was deemed sufficient for the scale of our pilot projects, however we intend to revisit this decision and explore different methods of comment routing in cooperation with our scientific partners at a later point in time. An evaluation of the requirements for implementing mandatory authentication and adding support for additional authentication methods to Polis showed that significant changes to both the administration and participation frontend were needed due to a lack of an abstraction layer or extension mechanism and the current authentication providers being hardcoded in many parts of the code base. A New Frontend is born: Particiapp Based on the implementation details of the participation frontend, the invasive nature of the changes required, and the overhead of keeping up with active upstream development it became clear that a different, more flexible approach to development was needed. This ultimately lead to the creation of Particiapp, a new Open Source project providing the building blocks and necessary abstraction layers for rapid protoyping and experimentation with different fontends which are compatible with but independent from Polis.
  1. Small, Christopher T., Bjorkegren, Michael, Erkkil , Timo, Shaw, Lynette and Megill, Colin (2021). Polis: Scaling deliberation by mapping high dimensional opinion spaces. Recerca. Revista de Pensament i An lisi, 26(2), pp. 1-26.

14 July 2024

Russ Allbery: DocKnot 8.0.1

DocKnot is my static web site generator, with some additional features for managing software releases. This release fixes some bugs in the newly-added conversion of text to HTML that were due to my still-incomplete refactoring of that code. It still uses some global variables, and they were leaking between different documents and breaking the formatting. It also fixes consistency problems with how the style parameter in *.spin files was interpreted, and fixes some incorrect docknot update-spin behavior. You can get the latest version from CPAN or from the DocKnot distribution page.

13 July 2024

Anuradha Weeraman: Windows of Opportunity: Microsoft's Open Source Renaissance

Windows of Opportunity: Microsoft's Open Source RenaissanceTwenty years ago, it was easy to dislike Microsoft. It was the quintessential evil MegaCorp that was quick to squash competition, often ruthlessly, but in some cases slowly through a more insidious process of embracing, extending, and exterminating anything that got in the way. This was the signature personality of Ballmer-era Microsoft that also inspired and united the software freedom fighting forces that came together to safeguard things that mattered to them and were at risk.I remember the era when the Novell, SCO, and Microsoft saga cast fear, uncertainty, and doubt on the future of open Unix and Linux and on what would happen to the operating systems that we loved if the suits of Redmond prevailed. Looking back, I&aposm glad that the arc of this story has bent towards justice, and I shudder at the possibilities had it worked out differently.Looking at today&aposs Microsoft, I&aposm amazed at how much change a leader with the right vision can make to the trajectory of a company that even makes an old-school software freedom advocate as me admire and even applaud the strides it has taken in the last 10 or so years that has dramatically shifted the perception of Microsoft. The personality of the Satya-era Microsoft is one to behold. While it will take more time to win back the trust, we see the tides changing and the positivity is important for the entire industry.For Microsoft, it was TypeScript and VS Code that helped change the narrative internally which led to its internal resurgence and acceptance of open source. Its acquisition of GitHub propelled it forward within the community overnight. Its contributions to the Linux kernel and other major software projects have also been consequential in changing its public perceptions.It takes a while to claw back trust and is very easy to breach. This time, however, Microsoft seems to understand this dynamic more than it did 20 years ago. All it took was the right leadership.

9 July 2024

Russ Allbery: Review: Raising Steam

Review: Raising Steam, by Terry Pratchett
Series: Discworld #40
Publisher: Anchor Books
Copyright: 2013
Printing: October 2014
ISBN: 0-8041-6920-9
Format: Trade paperback
Pages: 365
Raising Steam is the 40th Discworld novel and the third Moist von Lipwig novel, following Making Money. This is not a good place to start reading the series. Dick Simnel is a tinkerer from a line of tinkerers. He has been obsessed with mastering the power of steam since the age of ten, when his father died in a steam accident. That pursuit took him deeper into mathematics and precision, calculations and experiments, until he built Iron Girder: Discworld's first steam-powered locomotive. His early funding came from some convenient family pirate treasure, but turning his prototype into something more will require significantly more resources. That is how he ends up in the office of Harry King, Ankh-Morpork's sanitation magnate. Simnel's steam locomotive has the potential to solve some obvious logistical problems, such as getting fish from the docks of Quirm to the streets of Ankh-Morpork before it stops being vaguely edible. That's not what makes railways catch fire, however. As soon as Iron Girder is huffing and puffing its way around King's compound, it becomes the most popular attraction in the city. People stand in line for hours to ride it over and over again for reasons that they cannot entirely explain. There is something wild and uncontrollable going on. Vetinari is not sure he likes wild and uncontrollable, but he knows the lap into which such problems can be dumped: Moist von Lipwig, who is already getting bored with being a figurehead for the city's banking system. The setup for Raising Steam reminds me more of Moving Pictures than the other Moist von Lipwig novels. Simnel himself is a relentlessly practical engineer, but the trains themselves have tapped some sort of primal magic. Unlike Moving Pictures, Pratchett doesn't provide an explicit fantasy explanation involving intruding powers from another world. It might have been a more interesting book if he had. Instead, this book expects the reader to believe there is something inherently appealing and fascinating about trains, without providing much logic or underlying justification. I think some readers will be willing to go along with this, and others (myself included) will be left wishing the story had more world-building and fewer exclamation points. That's not the real problem with this book, though. Sadly, its true downfall is that Pratchett's writing ability had almost completely collapsed by the time he wrote it. As mentioned in my review of Snuff, we're now well into the period where Pratchett was suffering the effects of early-onset Alzheimer's. In that book, his health issues mostly affected the dialogue near the end of the novel. In this book, published two years later, it's pervasive and much worse. Here's a typical passage from early in the book:
It is said that a soft answer turneth away wrath, but this assertion has a lot to do with hope and was now turning out to be patently inaccurate, since even a well-spoken and thoughtful soft answer could actually drive the wrong kind of person into a state of fury if wrath was what they had in mind, and that was the state the elderly dwarf was now enjoying.
One of the best things about Discworld is Pratchett's ability to drop unexpected bits of wisdom in a sentence or two, or twist a verbal knife in an unexpected and surprising direction. Raising Steam still shows flashes of that ability, but it's buried in run-on sentences, drowned in cliches and repetition, and often left behind as the containing sentence meanders off into the weeds and sputters to a confused halt. The idea is still there; the delivery, sadly, is not. This is the first Discworld novel that I found mentally taxing to read. Sentences are often so overpacked that they require real effort to untangle, and the untangled meaning rarely feels worth the effort. The individual voice of the characters is almost gone. Vetinari's monologues, rather than being a rare event with dangerous layers, are frequent, rambling, and indecisive, often sounding like an entirely different character than the Vetinari we know. The constant repetition of the name any given character is speaking to was impossible for me to ignore. And the momentum of the story feels wrong; rather than constructing the events of the story in a way that sweeps the reader along, it felt like Pratchett was constantly pushing, trying to convince the reader that trains were the most exciting thing to ever happen to Discworld. The bones of a good story are here, including further development of dwarf politics from The Fifth Elephant and Thud! and the further fallout of the events of Snuff. There are also glimmers of Pratchett's typically sharp observations and turns of phrase that could have been unearthed and polished. But at the very least this book needed way more editing and a lot of rewriting. I suspect it could have dropped thirty pages just by tightening the dialogue and removing some of the repetition. I'm afraid I did not enjoy this. I am a bit of a hard sell for the magic fascination of trains I love trains, but my model railroad days are behind me and I'm now more interested in them as part of urban transportation policy. Previous Discworld books on technology and social systems did more of the work of drawing the reader in, providing character hooks and additional complexity, and building a firmer foundation than "trains are awesome." The main problem, though, was the quality of the writing, particularly when compared to the previous novels with the same characters. I dragged myself through this book out of a sense of completionism and obligation, and was relieved when I finished it. This is the first Discworld novel that I don't recommend. I think the only reason to read it is if you want to have read all of Discworld. Otherwise, consider stopping with Snuff and letting it be the send-off for the Ankh-Morpork characters. Followed by The Shepherd's Crown, a Tiffany Aching story and the last Discworld novel. Rating: 3 out of 10

4 July 2024

Arturo Borrero Gonz lez: Wikimedia Toolforge: migrating Kubernetes from PodSecurityPolicy to Kyverno

Le ch teau de Val re et le Haut de Cry en juillet 2022 Christian David, CC BY-SA 4.0, via Wikimedia Commons This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez. Summary: this article shares the experience and learnings of migrating away from Kubernetes PodSecurityPolicy into Kyverno in the Wikimedia Toolforge platform. Wikimedia Toolforge is a Platform-as-a-Service, built with Kubernetes, and maintained by the Wikimedia Cloud Services team (WMCS). It is completely free and open, and we welcome anyone to use it to build and host tools (bots, webservices, scheduled jobs, etc) in support of Wikimedia projects. We provide a set of platform-specific services, command line interfaces, and shortcuts to help in the task of setting up webservices, jobs, and stuff like building container images, or using databases. Using these interfaces makes the underlying Kubernetes system pretty much invisible to users. We also allow direct access to the Kubernetes API, and some advanced users do directly interact with it. Each account has a Kubernetes namespace where they can freely deploy their workloads. We have a number of controls in place to ensure performance, stability, and fairness of the system, including quotas, RBAC permissions, and up until recently PodSecurityPolicies (PSP). At the time of this writing, we had around 3.500 Toolforge tool accounts in the system. We early adopted PSP in 2019 as a way to make sure Pods had the correct runtime configuration. We needed Pods to stay within the safe boundaries of a set of pre-defined parameters. Back when we adopted PSP there was already the option to use 3rd party agents, like OpenPolicyAgent Gatekeeper, but we decided not to invest in them, and went with a native, built-in mechanism instead. In 2021 it was announced that the PSP mechanism would be deprecated, and removed in Kubernetes 1.25. Even though we had been warned years in advance, we did not prioritize the migration of PSP until we were in Kubernetes 1.24, and blocked, unable to upgrade forward without taking actions. The WMCS team explored different alternatives for this migration, but eventually we decided to go with Kyverno as a replacement for PSP. And so with that decision it began the journey described in this blog post. First, we needed a source code refactor for one of the key components of our Toolforge Kubernetes: maintain-kubeusers. This custom piece of software that we built in-house, contains the logic to fetch accounts from LDAP and do the necessary instrumentation on Kubernetes to accommodate each one: create namespace, RBAC, quota, a kubeconfig file, etc. With the refactor, we introduced a proper reconciliation loop, in a way that the software would have a notion of what needs to be done for each account, what would be missing, what to delete, upgrade, and so on. This would allow us to easily deploy new resources for each account, or iterate on their definitions. The initial version of the refactor had a number of problems, though. For one, the new version of maintain-kubeusers was doing more filesystem interaction than the previous version, resulting in a slow reconciliation loop over all the accounts. We used NFS as the underlying storage system for Toolforge, and it could be very slow because of reasons beyond this blog post. This was corrected in the next few days after the initial refactor rollout. A side note with an implementation detail: we stored a configmap on each account namespace with the state of each resource. Storing more state on this configmap was our solution to avoid additional NFS latency. I initially estimated this refactor would take me a week to complete, but unfortunately it took me around three weeks instead. Previous to the refactor, there were several manual steps and cleanups required to be done when updating the definition of a resource. The process is now automated, more robust, performant, efficient and clean. So in my opinion it was worth it, even if it took more time than expected. Then, we worked on the Kyverno policies themselves. Because we had a very particular PSP setting, in order to ease the transition, we tried to replicate their semantics on a 1:1 basis as much as possible. This involved things like transparent mutation of Pod resources, then validation. Additionally, we had one different PSP definition for each account, so we decided to create one different Kyverno namespaced policy resource for each account namespace remember, we had 3.5k accounts. We created a Kyverno policy template that we would then render and inject for each account. For developing and testing all this, maintain-kubeusers and the Kyverno bits, we had a project called lima-kilo, which was a local Kubernetes setup replicating production Toolforge. This was used by each engineer in their laptop as a common development environment. We had planned the migration from PSP to Kyverno policies in stages, like this:
  1. update our internal template generators to make Pod security settings explicit
  2. introduce Kyverno policies in Audit mode
  3. see how the cluster would behave with them, and if we had any offending resources reported by the new policies, and correct them
  4. modify Kyverno policies and set them in Enforce mode
  5. drop PSP
In stage 1, we updated things like the toolforge-jobs-framework and tools-webservice. In stage 2, when we deployed the 3.5k Kyverno policy resources, our production cluster died almost immediately. Surprise. All the monitoring went red, the Kubernetes apiserver became irresponsibe, and we were unable to perform any administrative actions in the Kubernetes control plane, or even the underlying virtual machines. All Toolforge users were impacted. This was a full scale outage that required the energy of the whole WMCS team to recover from. We temporarily disabled Kyverno until we could learn what had occurred. This incident happened despite having tested before in lima-kilo and in another pre-production cluster we had, called Toolsbeta. But we had not tested that many policy resources. Clearly, this was something scale-related. After the incident, I went on and created 3.5k Kyverno policy resources on lima-kilo, and indeed I was able to reproduce the outage. We took a number of measures, corrected a few errors in our infrastructure, reached out to the Kyverno upstream developers, asking for advice, and at the end we did the following to accommodate the setup to our needs: I have to admit, I was briefly tempted to drop Kyverno, and even stop pursuing using an external policy agent entirely, and write our own custom admission controller out of concerns over performance of this architecture. However, after applying all the measures listed above, the system became very stable, so we decided to move forward. The second attempt at deploying it all went through just fine. No outage this time When we were in stage 4 we detected another bug. We had been following the Kubernetes upstream documentation for setting securityContext to the right values. In particular, we were enforcing the procMount to be set to the default value, which per the docs it was DefaultProcMount . However, that string is the name of the internal variable in the source code, whereas the actual default value is the string Default . This caused pods to be rightfully rejected by Kyverno while we figured the problem. I sent a patch upstream to fix this problem. We finally had everything in place, reached stage 5, and we were able to disable PSP. We unloaded the PSP controller from the kubernetes apiserver, and deleted every individual PSP definition. Everything was very smooth in this last step of the migration. This whole PSP project, including the maintain-kubeusers refactor, the outage, and all the different migration stages took roughly three months to complete. For me there are a number of valuable reasons to learn from this project. For one, the scale is something to consider, and test, when evaluating a new architecture or software component. Not doing so can lead to service outages, or unexpectedly poor performances. This is in the first chapter of the SRE handbook, but we got a reminder the hard way This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.

3 July 2024

Samuel Henrique: Announcing wcurl: a curl wrapper to download files

tl;dr Whenever you need to download files through the terminal and don't feel like using wget:
wcurl example.com/filename.txt
Manpage:
https://manpages.debian.org/unstable/curl/wcurl.1.en.html

Availability (comes installed with the curl package):
  • Debian unstable - Since 2024-07-02
  • Debian testing - Since 2024-07-18
  • Debian 12/bookworm backports - Expected by the end of August 2024.
  • Debian 12/bookworm - Depends on whether Debian's release team will approve it, it could be available in the next point release.
  • Debian derivatives - Rolling releases will get it by the time it's on Debian testing (e.g.: Kali Linux). Stable derivatives only in their next major release.
If you don't want to wait for the package update to arrive, you can always copy the script and place it in your /usr/bin, the code is here:
https://github.com/Debian/wcurl/blob/main/wcurl
https://salsa.debian.org/debian/wcurl/-/blob/main/wcurl?ref_type=heads

Smoother CLI experience Starting with curl version 8.8.0-2, the Debian's curl package now ships a wcurl executable. wcurl is the solution for those who just need to download files without having to remember curl's parameters for things like automatically naming the files. Some people, myself included, would fall back to using wget whenever there was a need to download a file. Sometimes even installing wget just for that usecase. After all, it's easier to remember "apt install wget" rather than "curl -L -O -C - ...". wcurl consists of a simple shell script that provides sane defaults for the curl invocation, for when the use case is to just download files. By default, wcurl will:
  • Encode whitespaces in URLs;
  • Download multiple URLs in parallel if the installed curl's version is >= 7.66.0;
  • Follow redirects;
  • Automatically choose a filename as output;
  • Avoid overwriting files if the installed curl's version is >= 7.83.0 (--no-clobber);
  • Perform retries;
  • Set the downloaded file timestamp to the value provided by the server, if available;
  • Default to the protocol used as https if the URL doesn't contain any;
  • Disable curl's URL globbing parser so and [] characters in URLs are not treated specially.
Example to download a single file:
wcurl example.com/filename.txt
If you ever need to set a custom flag, you can make use of the --curl-options wcurl option, anything set there will be passed to the curl invocation. Just beware that if you need to set any custom flags, it's likely you will be better served by calling curl directly. The --curl-options option is there to allow for some flexibility in unforeseen circumstances.

The need for wcurl I've always felt a bit ashamed of not remembering curl's parameters for downloading a file and automatically naming it, having resorted to wget most of the times this was needed (even installing wget when it wasn't there, just for this). I've spoken to a few other experienced people I know and confirmed what could be obvious to others: a lot of people struggle with this. Recently, the curl project released the results of 2024's curl survey, which also showed this is as a much needed feature, just look at some of the answers:

Q: Which curl command line option do you think needs improvement and how?
-O, I really want wget like functionality where I don't have to specify the name
Downloading a file (like wget) could be improved - with automatic naming of the file
downloading files - wget is much cleaner
I wish the default behaviour when GETting a binary was to drop it on disk. That's the only reason 'wget foo.tgz" is still ingrained in my muscle memory .
Maybe have a way to download without specifying something in -o (the only reason i used wget still)
--remote-time should be default
--remote-name-all could really use a short flag

Q: If you miss support for something, tell us what!
"Write the data to the file named in the URL (or in redirects if I'm feeling daring), and timestamp the file to the last-modified-date". This is the main reason I'm still using wget.
I can finally feel less bad about falling back to wget due to not remembering the parameters I want.

Idealization vs. reality I don't believe curl will ever change its default behavior in such a way that would accommodate this need, as that would have a side-effect of breaking things which expect the current behavior (the blast radius is literally the solar system). This means a new executable needs to be shipped side-by-side with curl, an opportunity to start fresh and work with a more focused use case (to download files). Ideally, this new executable would be maintained by the curl project, make use of libcurl under-the-hood, and be available everywhere. Nobody wants to worry if their systems have the tool or not, it should always be there. Given I'm just a Debian Developer, with not as much free time as I wish, I've decided to write a simple shell script wrapper calling the curl CLI under-the-hood. wcurl will come installed with the curl package from now on, and I will check with the release team about shipping it on the current Debian stable as well. Shipping wcurl in other distros will be up to them (Debian-derivatives should pick it up automatically, though). We've tried to make it easy for anyone to ship this by using the curl license, keeping the script POSIX-compliant, and shipping a manpage. Maybe if there's enough interest across distributions, someone might sign up for implementing this in upstream curl and increase its reach. I would be happy with the curl project reusing the wcurl name when that happens. It's unlikely that wcurl would be shipped by curl upstream as it is, assuming they would prefer a solution that uses libcurl direclty (more similar to curl the CLI, to maintain). In the worst case, wcurl becomes a Debian-specific tool that only a few people are aware of, in the best case, it becomes the new go-to CLI tool for simply downloading files. I would be happy if at least someone other than me finds it useful.

Naming is hard When I started working on it, I was calling the new executable "curld" (stands for "curl download"), but then when discussing this in one of our weekly calls in the Debian Bras lia community, it was mentioned that this could be confused for a daemon. We then settled for the name "wcurl", suggested by Carlos Henrique Lima Melara <charles>. It doesn't really stand for anything, but it's very easy to remember. You know... "it's that wget alternative for when you want to use curl instead" :)

Feedback I'm hosting the code on Github and Debian's GitLab instance, feel free to open an issue to provide feedback.
https://salsa.debian.org/debian/wcurl
https://github.com/Debian/wcurl We also have a Matrix room for the Debian curl maintainers:
https://matrix.to/#/#debian-curl-maintainers:matrix.org

Acknowledgments The idea for wcurl came a few days before the curl-up conference 2024. I've been thinking a lot about developer productivity in the terminal lately, different tools and better defaults. Before curl-up, I was also thinking about packaging improvements for the curl package. I don't remember what exactly happened, but I likely had to download something and felt a bit ashamed of maintaining curl and not remembering the parameters to download files the way I wanted. I first discussed this idea in the conference, where I asked the participants about it and there were no concerns raised, and some people said I should give it a go. Participating in curl-up was a really great experience and I'm thankful for the interactions I've had there. On the Debian side, I've got reviews of the code and manpage by Sergio Durigan Junior <sergiodj>, Guilherme Puida Moreira <puida> and Carlos Henrique Lima Melara <charles>. Sergio ended up rewriting the tool to be POSIX-compliant (my version was written in bash), so he takes all the credit for the portability.

Changes since publication

2024-07-18
  • Update date of availability for Debian testing and expected date for bookworm backports.
  • Mention charles as the person who suggested "wcurl" as a name.
  • Update wcurl's -o/--opts options, it's now just --curl-options.
  • Remove mention of language spoken in the Matrix room, we are using English now.
  • Update list of features of wcurl.

2 July 2024

Bits from Debian: Bits from the DPL

Dear Debian community, Statement on Daniel Pocock The Debian project has successfully taken action to secure its trademarks and interests worldwide, as detailed in our press statement. I would like to personally thank everyone in the community who was involved in this process. I would have loved for you all to have spent your volunteer time on more fruitful things. Debian Boot team might need help I think I've identified the issue that finally motivated me to contact our teams: for a long time, I have had the impression that Debian is driven by several "one-person teams" (to varying extents of individual influence and susceptibility to burnout). As DPL, I see it as my task to find ways to address this issue and provide support. I received private responses from Debian Boot team members, which motivated me to kindly invite volunteers to some prominent and highly visible fields of work that you might find personally challenging. I recommend subscribing to the Debian Boot mailing list to see where you might be able to provide assistance. /usrmerge Helmut Grohne confirmed that the last remaining packages shipping aliased files inside the package set relevant to debootstrap were uploaded. Thanks a lot for Helmut and all contributors that helped to implement DEP17. Contacting more teams I'd like to repeat that I've registered a BoF for DebConf24 in Busan with the following description: This BoF is an attempt to gather as much as possible teams inside Debian to exchange experiences, discuss workflows inside teams, share their ways to attract newcomers etc. Each participant team should prepare a short description of their work and what team roles ( openings ) they have for new contributors. Even for delegated teams (membership is less fluid), it would be good to present the team, explain what it takes to be a team member, and what steps people usually go to end up being invited to participate. Some other teams can easily absorb contributions from salsa MRs, and at some point people get commit access. Anyway, the point is that we work on the idea that the pathway to become a team member becomes more clear from an outsider point-of-view. I'm lagging a bit behind my team contacting schedule and will not manage to contact every team before DebConf. As a (short) summary, I can draw some positive conclusions about my efforts to reach out to teams. I was able to identify some issues that were new to me and which I am now working on. Examples include limitations in Salsa and Salsa CI. I consider both essential parts of our infrastructure and will support both teams in enhancing their services. Some teams confirmed that they are basically using some common infrastructure (Salsa team space, mailing lists, IRC channels) but that the individual members of the team work on their own problems without sharing any common work. I have also not read about convincing strategies to attract newcomers to the team, as we have established, for instance, in the Debian Med team. DebConf attendance The amount of money needed to fly people to South Korea was higher than usual, so the DebConf bursary team had to make some difficult decisions about who could be reimbursed for travel expenses. I extended the budget for diversity and newcomers, which enabled us to invite some additional contributors. We hope that those who were not able to come this year can make it next year to Brest or to MiniDebConf Cambridge or Toulouse tag2upload On June 12, Sean Whitton requested comments on the debian-vote list regarding a General Resolution (GR) about tag2upload. The discussion began with technical details but unfortunately, as often happens in long threads, it drifted into abrasive language, prompting the community team to address the behavior of an opponent of the GR supporters. After 560 emails covering technical details, including a detailed security review by Russ Allbery, Sean finally proposed the GR on June 27, 2024 (two weeks after requesting comments). Firstly, I would like to thank the drivers of this GR and acknowledge the technical work behind it, including the security review. I am positively convinced that Debian can benefit from modernizing its infrastructure, particularly through stronger integration of Git into packaging workflows. Sam Hartman provided some historical context [1], [2], [3], [4], noting that this discussion originally took place five years ago with no results from several similarly lengthy threads. My favorite summary of the entire thread was given by Gregor Herrmann, which reflects the same gut feeling I have and highlights a structural problem within Debian that hinders technical changes. Addressing this issue is definitely a matter for the Debian Project Leader, and I will try to address it during my term. At the time of writing these bits, a proposal from ftpmaster, which is being continuously discussed, might lead to a solution. I was also asked to extend the GR discussion periods which I will do in separate mail. Talk: Debian GNU/Linux for Scientific Research I was invited to have a talk in the Systems-Facing Track of University of British Columbia (who is sponsoring rack space for several Debian servers). I admit it felt a bit strange to me after working more than 20 years for establishing Debian in scientific environments to be invited to such a talk "because I'm DPL". Kind regards Andreas.

Next.