Search Results: "torin"

29 September 2024

Reproducible Builds: Supporter spotlight: Kees Cook on Linux kernel security

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the eighth installment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project, David A. Wheeler and Simon Butler. Today, however, we will be talking with Kees Cook, founder of the Kernel Self-Protection Project.

Vagrant Cascadian: Could you tell me a bit about yourself? What sort of things do you work on? Kees Cook: I m a Free Software junkie living in Portland, Oregon, USA. I have been focusing on the upstream Linux kernel s protection of itself. There is a lot of support that the kernel provides userspace to defend itself, but when I first started focusing on this there was not as much attention given to the kernel protecting itself. As userspace got more hardened the kernel itself became a bigger target. Almost 9 years ago I formally announced the Kernel Self-Protection Project because the work necessary was way more than my time and expertise could do alone. So I just try to get people to help as much as possible; people who understand the ARM architecture, people who understand the memory management subsystem to help, people who understand how to make the kernel less buggy.
Vagrant: Could you describe the path that lead you to working on this sort of thing? Kees: I have always been interested in security through the aspect of exploitable flaws. I always thought it was like a magic trick to make a computer do something that it was very much not designed to do and seeing how easy it is to subvert bugs. I wanted to improve that fragility. In 2006, I started working at Canonical on Ubuntu and was mainly focusing on bringing Debian and Ubuntu up to what was the state of the art for Fedora and Gentoo s security hardening efforts. Both had really pioneered a lot of userspace hardening with compiler flags and ELF stuff and many other things for hardened binaries. On the whole, Debian had not really paid attention to it. Debian s packaging building process at the time was sort of a chaotic free-for-all as there wasn t centralized build methodology for defining things. Luckily that did slowly change over the years. In Ubuntu we had the opportunity to apply top down build rules for hardening all the packages. In 2011 Chrome OS was following along and took advantage of a bunch of the security hardening work as they were based on ebuild out of Gentoo and when they looked for someone to help out they reached out to me. We recognized the Linux kernel was pretty much the weakest link in the Chrome OS security posture and I joined them to help solve that. Their userspace was pretty well handled but the kernel had a lot of weaknesses, so focusing on hardening was the next place to go. When I compared notes with other users of the Linux kernel within Google there were a number of common concerns and desires. Chrome OS already had an upstream first requirement, so I tried to consolidate the concerns and solve them upstream. It was challenging to land anything in other kernel team repos at Google, as they (correctly) wanted to minimize their delta from upstream, so I needed to work on any major improvements entirely in upstream and had a lot of support from Google to do that. As such, my focus shifted further from working directly on Chrome OS into being entirely upstream and being more of a consultant to internal teams, helping with integration or sometimes backporting. Since the volume of needed work was so gigantic I needed to find ways to inspire other developers (both inside and outside of Google) to help. Once I had a budget I tried to get folks paid (or hired) to work on these areas when it wasn t already their job.
Vagrant: So my understanding of some of your recent work is basically defining undefined behavior in the language or compiler? Kees: I ve found the term undefined behavior to have a really strict meaning within the compiler community, so I have tried to redefine my goal as eliminating unexpected behavior or ambiguous language constructs . At the end of the day ambiguity leads to bugs, and bugs lead to exploitable security flaws. I ve been taking a four-pronged approach: supporting the work people are doing to get rid of ambiguity, identify new areas where ambiguity needs to be removed, actually removing that ambiguity from the C language, and then dealing with any needed refactoring in the Linux kernel source to adapt to the new constraints. None of this is particularly novel; people have recognized how dangerous some of these language constructs are for decades and decades but I think it is a combination of hard problems and a lot of refactoring that nobody has the interest/resources to do. So, we have been incrementally going after the lowest hanging fruit. One clear example in recent years was the elimination of C s implicit fall-through in switch statements. The language would just fall through between adjacent cases if a break (or other code flow directive) wasn t present. But this is ambiguous: is the code meant to fall-through, or did the author just forget a break statement? By defining the [[fallthrough]] statement, and requiring its use in Linux, all switch statements now have explicit code flow, and the entire class of bugs disappeared. During our refactoring we actually found that 1 in 10 added [[fallthrough]] statements were actually missing break statements. This was an extraordinarily common bug! So getting rid of that ambiguity is where we have been. Another area I ve been spending a bit of time on lately is looking at how defensive security work has challenges associated with metrics. How do you measure your defensive security impact? You can t say because we installed locks on the doors, 20% fewer break-ins have happened. Much of our signal is always secondary or retrospective, which is frustrating: This class of flaw was used X much over the last decade so, and if we have eliminated that class of flaw and will never see it again, what is the impact? Is the impact infinity? Attackers will just move to the next easiest thing. But it means that exploitation gets incrementally more difficult. As attack surfaces are reduced, the expense of exploitation goes up.
Vagrant: So it is hard to identify how effective this is how bad would it be if people just gave up? Kees: I think it would be pretty bad, because as we have seen, using secondary factors, the work we have done in the industry at large, not just the Linux kernel, has had an impact. What we, Microsoft, Apple, and everyone else is doing for their respective software ecosystems, has shown that the price of functional exploits in the black market has gone up. Especially for really egregious stuff like a zero-click remote code execution. If those were cheap then obviously we are not doing something right, and it becomes clear that it s trivial for anyone to attack the infrastructure that our lives depend on. But thankfully we have seen over the last two decades that prices for exploits keep going up and up into millions of dollars. I think it is important to keep working on that because, as a central piece of modern computer infrastructure, the Linux kernel has a giant target painted on it. If we give up, we have to accept that our computers are not doing what they were designed to do, which I can t accept. The safety of my grandparents shouldn t be any different from the safety of journalists, and political activists, and anyone else who might be the target of attacks. We need to be able to trust our devices otherwise why use them at all?
Vagrant: What has been your biggest success in recent years? Kees: I think with all these things I am not the only actor. Almost everything that we have been successful at has been because of a lot of people s work, and one of the big ones that has been coordinated across the ecosystem and across compilers was initializing stack variables to 0 by default. This feature was added in Clang, GCC, and MSVC across the board even though there were a lot of fears about forking the C language. The worry was that developers would come to depend on zero-initialized stack variables, but this hasn t been the case because we still warn about uninitialized variables when the compiler can figure that out. So you still still get the warnings at compile time but now you can count on the contents of your stack at run-time and we drop an entire class of uninitialized variable flaws. While the exploitation of this class has mostly been around memory content exposure, it has also been used for control flow attacks. So that was politically and technically a large challenge: convincing people it was necessary, showing its utility, and implementing it in a way that everyone would be happy with, resulting in the elimination of a large and persistent class of flaws in C.
Vagrant: In a world where things are generally Reproducible do you see ways in which that might affect your work? Kees: One of the questions I frequently get is, What version of the Linux kernel has feature $foo? If I know how things are built, I can answer with just a version number. In a Reproducible Builds scenario I can count on the compiler version, compiler flags, kernel configuration, etc. all those things are known, so I can actually answer definitively that a certain feature exists. So that is an area where Reproducible Builds affects me most directly. Indirectly, it is just being able to trust the binaries you are running are going to behave the same for the same build environment is critical for sane testing.
Vagrant: Have you used diffoscope? Kees: I have! One subset of tree-wide refactoring that we do when getting rid of ambiguous language usage in the kernel is when we have to make source level changes to satisfy some new compiler requirement but where the binary output is not expected to change at all. It is mostly about getting the compiler to understand what is happening, what is intended in the cases where the old ambiguity does actually match the new unambiguous description of what is intended. The binary shouldn t change. We have used diffoscope to compare the before and after binaries to confirm that yep, there is no change in binary .
Vagrant: You cannot just use checksums for that? Kees: For the most part, we need to only compare the text segments. We try to hold as much stable as we can, following the Reproducible Builds documentation for the kernel, but there are macros in the kernel that are sensitive to source line numbers and as a result those will change the layout of the data segment (and sometimes the text segment too). With diffoscope there s flexibility where I can exclude or include different comparisons. Sometimes I just go look at what diffoscope is doing and do that manually, because I can tweak that a little harder, but diffoscope is definitely the default. Diffoscope is awesome!
Vagrant: Where has reproducible builds affected you? Kees: One of the notable wins of reproducible builds lately was dealing with the fallout of the XZ backdoor and just being able to ask the question is my build environment running the expected code? and to be able to compare the output generated from one install that never had a vulnerable XZ and one that did have a vulnerable XZ and compare the results of what you get. That was important for kernel builds because the XZ threat actor was working to expand their influence and capabilities to include Linux kernel builds, but they didn t finish their work before they were noticed. I think what happened with Debian proving the build infrastructure was not affected is an important example of how people would have needed to verify the kernel builds too.
Vagrant: What do you want to see for the near or distant future in security work? Kees: For reproducible builds in the kernel, in the work that has been going on in the ClangBuiltLinux project, one of the driving forces of code and usability quality has been the continuous integration work. As soon as something breaks, on the kernel side, the Clang side, or something in between the two, we get a fast signal and can chase it and fix the bugs quickly. I would like to see someone with funding to maintain a reproducible kernel build CI. There have been places where there are certain architecture configurations or certain build configuration where we lose reproducibility and right now we have sort of a standard open source development feedback loop where those things get fixed but the time in between introduction and fix can be large. Getting a CI for reproducible kernels would give us the opportunity to shorten that time.
Vagrant: Well, thanks for that! Any last closing thoughts? Kees: I am a big fan of reproducible builds, thank you for all your work. The world is a safer place because of it.
Vagrant: Likewise for your work!


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

21 September 2024

Gunnar Wolf: 50 years of queries

This post is a review for Computing Reviews for 50 years of queries , a article published in Communications of the ACM
The relational model is probably the one innovation that brought computers to the mainstream for business users. This article by Donald Chamberlin, creator of one of the first query languages (that evolved into the ubiquitous SQL), presents its history as a commemoration of the 50th anniversary of his publication of said query language. The article begins by giving background on information processing before the advent of today s database management systems: with systems storing and processing information based on sequential-only magnetic tapes in the 1950s, adopting a record-based, fixed-format filing system was far from natural. The late 1960s and early 1970s saw many fundamental advances, among which one of the best known is E. F. Codd s relational model. The first five pages (out of 12) present the evolution of the data management community up to the 1974 SIGFIDET conference. This conference was so important in the eyes of the author that, in his words, it is the event that starts the clock on 50 years of relational databases. The second part of the article tells about the growth of the structured English query language (SEQUEL) eventually renamed SQL including the importance of its standardization and its presence in commercial products as the dominant database language since the late 1970s. Chamberlin presents short histories of the various implementations, many of which remain dominant names today, that is, Oracle, Informix, and DB2. Entering the 1990s, open-source communities introduced MySQL, PostgreSQL, and SQLite. The final part of the article presents controversies and criticisms related to SQL and the relational database model as a whole. Chamberlin presents the main points of controversy throughout the years: 1) the SQL language lacks orthogonality; 2) SQL tables, unlike formal relations, might contain null values; and 3) SQL tables, unlike formal relations, may contain duplicate rows. He explains the issues and tradeoffs that guided the language design as it unfolded. Finally, a section presents several points that explain how SQL and the relational model have remained, for 50 years, a winning concept, as well as some thoughts regarding the NoSQL movement that gained traction in the 2010s. This article is written with clear language and structure, making it easy and pleasant to read. It does not drive a technical point, but instead is a recap on half a century of developments in one of the fields most important to the commercial development of computing, written by one of the greatest authorities on the topic.

17 September 2024

Dirk Eddelbuettel: nanotime 0.3.10 on CRAN: Update

A minor update 0.3.10 for our nanotime package is now on CRAN. nanotime relies on the RcppCCTZ package (as well as the RcppDate package for additional C++ operations) and offers efficient high(er) resolution time parsing and formatting up to nanosecond resolution, using the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. This release updates one S4 methods to very recent changes in r-devel for which CRAN had reached out. This concerns the setdiff() method when applied to two nanotime objects. As it only affected R 4.5.0, due next April, if rebuilt in the last two or so weeks it will not have been visible to that many users, if any. In any event, it now works again for that setup too, and should be going forward. We also retired one demo function from the very early days, apparently it relied on ggplot2 features that have since moved on. If someone would like to help out and resurrect the demo, please get in touch. We also cleaned out some no longer used tests, and updated DESCRIPTION to what is required now. The NEWS snippet below has the full details.

Changes in version 0.3.10 (2024-09-16)
  • Retire several checks for Solaris in test suite (Dirk in #130)
  • Switch to Authors@R in DESCRIPTION as now required by CRAN
  • Accommodate R-devel change for setdiff (Dirk in #133 fixing #132)
  • No longer ship defunction ggplot2 demo (Dirk fixing #131)

Thanks to my CRANberries, there is a diffstat report for this release. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository and all documentation is provided at the nanotime documentation site. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

8 September 2024

Jacob Adams: Linux's Bedtime Routine

How does Linux move from an awake machine to a hibernating one? How does it then manage to restore all state? These questions led me to read way too much C in trying to figure out how this particular hardware/software boundary is navigated. This investigation will be split into a few parts, with the first one going from invocation of hibernation to synchronizing all filesystems to disk. This article has been written using Linux version 6.9.9, the source of which can be found in many places, but can be navigated easily through the Bootlin Elixir Cross-Referencer: https://elixir.bootlin.com/linux/v6.9.9/source Each code snippet will begin with a link to the above giving the file path and the line number of the beginning of the snippet.

A Starting Point for Investigation: /sys/power/state and /sys/power/disk These two system files exist to allow debugging of hibernation, and thus control the exact state used directly. Writing specific values to the state file controls the exact sleep mode used and disk controls the specific hibernation mode1. This is extremely handy as an entry point to understand how these systems work, since we can just follow what happens when they are written to.

Show and Store Functions These two files are defined using the power_attr macro: kernel/power/power.h:80
#define power_attr(_name) \
static struct kobj_attribute _name##_attr =     \
    .attr   =               \
        .name = __stringify(_name), \
        .mode = 0644,           \
     ,                  \
    .show   = _name##_show,         \
    .store  = _name##_store,        \
 
show is called on reads and store on writes. state_show is a little boring for our purposes, as it just prints all the available sleep states. kernel/power/main.c:657
/*
 * state - control system sleep states.
 *
 * show() returns available sleep state labels, which may be "mem", "standby",
 * "freeze" and "disk" (hibernation).
 * See Documentation/admin-guide/pm/sleep-states.rst for a description of
 * what they mean.
 *
 * store() accepts one of those strings, translates it into the proper
 * enumerated value, and initiates a suspend transition.
 */
static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
			  char *buf)
 
	char *s = buf;
#ifdef CONFIG_SUSPEND
	suspend_state_t i;
	for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
		if (pm_states[i])
			s += sprintf(s,"%s ", pm_states[i]);
#endif
	if (hibernation_available())
		s += sprintf(s, "disk ");
	if (s != buf)
		/* convert the last space to a newline */
		*(s-1) = '\n';
	return (s - buf);
 
state_store, however, provides our entry point. If the string disk is written to the state file, it calls hibernate(). This is our entry point. kernel/power/main.c:715
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
			   const char *buf, size_t n)
 
	suspend_state_t state;
	int error;
	error = pm_autosleep_lock();
	if (error)
		return error;
	if (pm_autosleep_state() > PM_SUSPEND_ON)  
		error = -EBUSY;
		goto out;
	 
	state = decode_state(buf, n);
	if (state < PM_SUSPEND_MAX)  
		if (state == PM_SUSPEND_MEM)
			state = mem_sleep_current;
		error = pm_suspend(state);
	  else if (state == PM_SUSPEND_MAX)  
		error = hibernate();
	  else  
		error = -EINVAL;
	 
 out:
	pm_autosleep_unlock();
	return error ? error : n;
 
kernel/power/main.c:688
static suspend_state_t decode_state(const char *buf, size_t n)
 
#ifdef CONFIG_SUSPEND
	suspend_state_t state;
#endif
	char *p;
	int len;
	p = memchr(buf, '\n', n);
	len = p ? p - buf : n;
	/* Check hibernation first. */
	if (len == 4 && str_has_prefix(buf, "disk"))
		return PM_SUSPEND_MAX;
#ifdef CONFIG_SUSPEND
	for (state = PM_SUSPEND_MIN; state < PM_SUSPEND_MAX; state++)  
		const char *label = pm_states[state];
		if (label && len == strlen(label) && !strncmp(buf, label, len))
			return state;
	 
#endif
	return PM_SUSPEND_ON;
 
Could we have figured this out just via function names? Sure, but this way we know for sure that nothing else is happening before this function is called.

Autosleep Our first detour is into the autosleep system. When checking the state above, you may notice that the kernel grabs the pm_autosleep_lock before checking the current state. autosleep is a mechanism originally from Android that sends the entire system to either suspend or hibernate whenever it is not actively working on anything. This is not enabled for most desktop configurations, since it s primarily for mobile systems and inverts the standard suspend and hibernate interactions. This system is implemented as a workqueue2 that checks the current number of wakeup events, processes and drivers that need to run3, and if there aren t any, then the system is put into the autosleep state, typically suspend. However, it could be hibernate if configured that way via /sys/power/autosleep in a similar manner to using /sys/power/state to manually enable hibernation. kernel/power/main.c:841
static ssize_t autosleep_store(struct kobject *kobj,
			       struct kobj_attribute *attr,
			       const char *buf, size_t n)
 
	suspend_state_t state = decode_state(buf, n);
	int error;
	if (state == PM_SUSPEND_ON
	    && strcmp(buf, "off") && strcmp(buf, "off\n"))
		return -EINVAL;
	if (state == PM_SUSPEND_MEM)
		state = mem_sleep_current;
	error = pm_autosleep_set_state(state);
	return error ? error : n;
 
power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
kernel/power/autosleep.c:24
static DEFINE_MUTEX(autosleep_lock);
static struct wakeup_source *autosleep_ws;
static void try_to_suspend(struct work_struct *work)
 
	unsigned int initial_count, final_count;
	if (!pm_get_wakeup_count(&initial_count, true))
		goto out;
	mutex_lock(&autosleep_lock);
	if (!pm_save_wakeup_count(initial_count)  
		system_state != SYSTEM_RUNNING)  
		mutex_unlock(&autosleep_lock);
		goto out;
	 
	if (autosleep_state == PM_SUSPEND_ON)  
		mutex_unlock(&autosleep_lock);
		return;
	 
	if (autosleep_state >= PM_SUSPEND_MAX)
		hibernate();
	else
		pm_suspend(autosleep_state);
	mutex_unlock(&autosleep_lock);
	if (!pm_get_wakeup_count(&final_count, false))
		goto out;
	/*
	 * If the wakeup occurred for an unknown reason, wait to prevent the
	 * system from trying to suspend and waking up in a tight loop.
	 */
	if (final_count == initial_count)
		schedule_timeout_uninterruptible(HZ / 2);
 out:
	queue_up_suspend_work();
 
static DECLARE_WORK(suspend_work, try_to_suspend);
void queue_up_suspend_work(void)
 
	if (autosleep_state > PM_SUSPEND_ON)
		queue_work(autosleep_wq, &suspend_work);
 

The Steps of Hibernation

Hibernation Kernel Config It s important to note that most of the hibernate-specific functions below do nothing unless you ve defined CONFIG_HIBERNATION in your Kconfig4. As an example, hibernate itself is defined as the following if CONFIG_HIBERNATE is not set. include/linux/suspend.h:407
static inline int hibernate(void)   return -ENOSYS;  

Check if Hibernation is Available We begin by confirming that we actually can perform hibernation, via the hibernation_available function. kernel/power/hibernate.c:742
if (!hibernation_available())  
	pm_pr_dbg("Hibernation not available.\n");
	return -EPERM;
 
kernel/power/hibernate.c:92
bool hibernation_available(void)
 
	return nohibernate == 0 &&
		!security_locked_down(LOCKDOWN_HIBERNATION) &&
		!secretmem_active() && !cxl_mem_active();
 
nohibernate is controlled by the kernel command line, it s set via either nohibernate or hibernate=no. security_locked_down is a hook for Linux Security Modules to prevent hibernation. This is used to prevent hibernating to an unencrypted storage device, as specified in the manual page kernel_lockdown(7). Interestingly, either level of lockdown, integrity or confidentiality, locks down hibernation because with the ability to hibernate you can extract bascially anything from memory and even reboot into a modified kernel image. secretmem_active checks whether there is any active use of memfd_secret, and if so it prevents hibernation. memfd_secret returns a file descriptor that can be mapped into a process but is specifically unmapped from the kernel s memory space. Hibernating with memory that not even the kernel is supposed to access would expose that memory to whoever could access the hibernation image. This particular feature of secret memory was apparently controversial, though not as controversial as performance concerns around fragmentation when unmapping kernel memory (which did not end up being a real problem). cxl_mem_active just checks whether any CXL memory is active. A full explanation is provided in the commit introducing this check but there s also a shortened explanation from cxl_mem_probe that sets the relevant flag when initializing a CXL memory device. drivers/cxl/mem.c:186
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
* preserves contents over suspend, and there is no simple way
* to arrange for the suspend image to avoid CXL memory which
* would setup a circular dependency between PCI resume and save
* state restoration.

Check Compression The next check is for whether compression support is enabled, and if so whether the requested algorithm is enabled. kernel/power/hibernate.c:747
/*
 * Query for the compression algorithm support if compression is enabled.
 */
if (!nocompress)  
	strscpy(hib_comp_algo, hibernate_compressor, sizeof(hib_comp_algo));
	if (crypto_has_comp(hib_comp_algo, 0, 0) != 1)  
		pr_err("%s compression is not available\n", hib_comp_algo);
		return -EOPNOTSUPP;
	 
 
The nocompress flag is set via the hibernate command line parameter, setting hibernate=nocompress. If compression is enabled, then hibernate_compressor is copied to hib_comp_algo. This synchronizes the current requested compression setting (hibernate_compressor) with the current compression setting (hib_comp_algo). Both values are character arrays of size CRYPTO_MAX_ALG_NAME (128 in this kernel). kernel/power/hibernate.c:50
static char hibernate_compressor[CRYPTO_MAX_ALG_NAME] = CONFIG_HIBERNATION_DEF_COMP;
/*
 * Compression/decompression algorithm to be used while saving/loading
 * image to/from disk. This would later be used in 'kernel/power/swap.c'
 * to allocate comp streams.
 */
char hib_comp_algo[CRYPTO_MAX_ALG_NAME];
hibernate_compressor defaults to lzo if that algorithm is enabled, otherwise to lz4 if enabled5. It can be overwritten using the hibernate.compressor setting to either lzo or lz4. kernel/power/Kconfig:95
choice
	prompt "Default compressor"
	default HIBERNATION_COMP_LZO
	depends on HIBERNATION
config HIBERNATION_COMP_LZO
	bool "lzo"
	depends on CRYPTO_LZO
config HIBERNATION_COMP_LZ4
	bool "lz4"
	depends on CRYPTO_LZ4
endchoice
config HIBERNATION_DEF_COMP
	string
	default "lzo" if HIBERNATION_COMP_LZO
	default "lz4" if HIBERNATION_COMP_LZ4
	help
	  Default compressor to be used for hibernation.
kernel/power/hibernate.c:1425
static const char * const comp_alg_enabled[] =  
#if IS_ENABLED(CONFIG_CRYPTO_LZO)
	COMPRESSION_ALGO_LZO,
#endif
#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
	COMPRESSION_ALGO_LZ4,
#endif
 ;
static int hibernate_compressor_param_set(const char *compressor,
		const struct kernel_param *kp)
 
	unsigned int sleep_flags;
	int index, ret;
	sleep_flags = lock_system_sleep();
	index = sysfs_match_string(comp_alg_enabled, compressor);
	if (index >= 0)  
		ret = param_set_copystring(comp_alg_enabled[index], kp);
		if (!ret)
			strscpy(hib_comp_algo, comp_alg_enabled[index],
				sizeof(hib_comp_algo));
	  else  
		ret = index;
	 
	unlock_system_sleep(sleep_flags);
	if (ret)
		pr_debug("Cannot set specified compressor %s\n",
			 compressor);
	return ret;
 
static const struct kernel_param_ops hibernate_compressor_param_ops =  
	.set    = hibernate_compressor_param_set,
	.get    = param_get_string,
 ;
static struct kparam_string hibernate_compressor_param_string =  
	.maxlen = sizeof(hibernate_compressor),
	.string = hibernate_compressor,
 ;
We then check whether the requested algorithm is supported via crypto_has_comp. If not, we bail out of the whole operation with EOPNOTSUPP. As part of crypto_has_comp we perform any needed initialization of the algorithm, loading kernel modules and running initialization code as needed6.

Grab Locks The next step is to grab the sleep and hibernation locks via lock_system_sleep and hibernate_acquire. kernel/power/hibernate.c:758
sleep_flags = lock_system_sleep();
/* The snapshot device should not be opened while we're running */
if (!hibernate_acquire())  
	error = -EBUSY;
	goto Unlock;
 
First, lock_system_sleep marks the current thread as not freezable, which will be important later7. It then grabs the system_transistion_mutex, which locks taking snapshots or modifying how they are taken, resuming from a hibernation image, entering any suspend state, or rebooting.

The GFP Mask The kernel also issues a warning if the gfp mask is changed via either pm_restore_gfp_mask or pm_restrict_gfp_mask without holding the system_transistion_mutex. GFP flags tell the kernel how it is permitted to handle a request for memory. include/linux/gfp_types.h:12
 * GFP flags are commonly used throughout Linux to indicate how memory
 * should be allocated.  The GFP acronym stands for get_free_pages(),
 * the underlying memory allocation function.  Not every GFP flag is
 * supported by every function which may allocate memory.
In the case of hibernation specifically we care about the IO and FS flags, which are reclaim operators, ways the system is permitted to attempt to free up memory in order to satisfy a specific request for memory. include/linux/gfp_types.h:176
 * Reclaim modifiers
 * -----------------
 * Please note that all the following flags are only applicable to sleepable
 * allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them).
 *
 * %__GFP_IO can start physical IO.
 *
 * %__GFP_FS can call down to the low-level FS. Clearing the flag avoids the
 * allocator recursing into the filesystem which might already be holding
 * locks.
gfp_allowed_mask sets which flags are permitted to be set at the current time. As the comment below outlines, preventing these flags from being set avoids situations where the kernel needs to do I/O to allocate memory (e.g. read/writing swap8) but the devices it needs to read/write to/from are not currently available. kernel/power/main.c:24
/*
 * The following functions are used by the suspend/hibernate code to temporarily
 * change gfp_allowed_mask in order to avoid using I/O during memory allocations
 * while devices are suspended.  To avoid races with the suspend/hibernate code,
 * they should always be called with system_transition_mutex held
 * (gfp_allowed_mask also should only be modified with system_transition_mutex
 * held, unless the suspend/hibernate code is guaranteed not to run in parallel
 * with that modification).
 */
static gfp_t saved_gfp_mask;
void pm_restore_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	if (saved_gfp_mask)  
		gfp_allowed_mask = saved_gfp_mask;
		saved_gfp_mask = 0;
	 
 
void pm_restrict_gfp_mask(void)
 
	WARN_ON(!mutex_is_locked(&system_transition_mutex));
	WARN_ON(saved_gfp_mask);
	saved_gfp_mask = gfp_allowed_mask;
	gfp_allowed_mask &= ~(__GFP_IO   __GFP_FS);
 

Sleep Flags After grabbing the system_transition_mutex the kernel then returns and captures the previous state of the threads flags in sleep_flags. This is used later to remove PF_NOFREEZE if it wasn t previously set on the current thread. kernel/power/main.c:52
unsigned int lock_system_sleep(void)
 
	unsigned int flags = current->flags;
	current->flags  = PF_NOFREEZE;
	mutex_lock(&system_transition_mutex);
	return flags;
 
EXPORT_SYMBOL_GPL(lock_system_sleep);
include/linux/sched.h:1633
#define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
Then we grab the hibernate-specific semaphore to ensure no one can open a snapshot or resume from it while we perform hibernation. Additionally this lock is used to prevent hibernate_quiet_exec, which is used by the nvdimm driver to active its firmware with all processes and devices frozen, ensuring it is the only thing running at that time9. kernel/power/hibernate.c:82
bool hibernate_acquire(void)
 
	return atomic_add_unless(&hibernate_atomic, -1, 0);
 

Prepare Console The kernel next calls pm_prepare_console. This function only does anything if CONFIG_VT_CONSOLE_SLEEP has been set. This prepares the virtual terminal for a suspend state, switching away to a console used only for the suspend state if needed. kernel/power/console.c:130
void pm_prepare_console(void)
 
	if (!pm_vt_switch())
		return;
	orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
	if (orig_fgconsole < 0)
		return;
	orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
	return;
 
The first thing is to check whether we actually need to switch the VT kernel/power/console.c:94
/*
 * There are three cases when a VT switch on suspend/resume are required:
 *   1) no driver has indicated a requirement one way or another, so preserve
 *      the old behavior
 *   2) console suspend is disabled, we want to see debug messages across
 *      suspend/resume
 *   3) any registered driver indicates it needs a VT switch
 *
 * If none of these conditions is present, meaning we have at least one driver
 * that doesn't need the switch, and none that do, we can avoid it to make
 * resume look a little prettier (and suspend too, but that's usually hidden,
 * e.g. when closing the lid on a laptop).
 */
static bool pm_vt_switch(void)
 
	struct pm_vt_switch *entry;
	bool ret = true;
	mutex_lock(&vt_switch_mutex);
	if (list_empty(&pm_vt_switch_list))
		goto out;
	if (!console_suspend_enabled)
		goto out;
	list_for_each_entry(entry, &pm_vt_switch_list, head)  
		if (entry->required)
			goto out;
	 
	ret = false;
out:
	mutex_unlock(&vt_switch_mutex);
	return ret;
 
There is an explanation of the conditions under which a switch is performed in the comment above the function, but we ll also walk through the steps here. Firstly we grab the vt_switch_mutex to ensure nothing will modify the list while we re looking at it. We then examine the pm_vt_switch_list. This list is used to indicate the drivers that require a switch during suspend. They register this requirement, or the lack thereof, via pm_vt_switch_required. kernel/power/console.c:31
/**
 * pm_vt_switch_required - indicate VT switch at suspend requirements
 * @dev: device
 * @required: if true, caller needs VT switch at suspend/resume time
 *
 * The different console drivers may or may not require VT switches across
 * suspend/resume, depending on how they handle restoring video state and
 * what may be running.
 *
 * Drivers can indicate support for switchless suspend/resume, which can
 * save time and flicker, by using this routine and passing 'false' as
 * the argument.  If any loaded driver needs VT switching, or the
 * no_console_suspend argument has been passed on the command line, VT
 * switches will occur.
 */
void pm_vt_switch_required(struct device *dev, bool required)
Next, we check console_suspend_enabled. This is set to false by the kernel parameter no_console_suspend, but defaults to true. Finally, if there are any entries in the pm_vt_switch_list, then we check to see if any of them require a VT switch. Only if none of these conditions apply, then we return false. If a VT switch is in fact required, then we move first the currently active virtual terminal/console10 (vt_move_to_console) and then the current location of kernel messages (vt_kmsg_redirect) to the SUSPEND_CONSOLE. The SUSPEND_CONSOLE is the last entry in the list of possible consoles, and appears to just be a black hole to throw away messages. kernel/power/console.c:16
#define SUSPEND_CONSOLE	(MAX_NR_CONSOLES-1)
Interestingly, these are separate functions because you can use TIOCL_SETKMSGREDIRECT (an ioctl11) to send kernel messages to a specific virtual terminal, but by default its the same as the currently active console. The locations of the previously active console and the previous kernel messages location are stored in orig_fgconsole and orig_kmsg, to restore the state of the console and kernel messages after the machine wakes up again. Interestingly, this means orig_fgconsole also ends up storing any errors, so has to be checked to ensure it s not less than zero before we try to do anything with the kernel messages on both suspend and resume. drivers/tty/vt/vt_ioctl.c:1268
/* Perform a kernel triggered VT switch for suspend/resume */
static int disable_vt_switch;
int vt_move_to_console(unsigned int vt, int alloc)
 
	int prev;
	console_lock();
	/* Graphics mode - up to X */
	if (disable_vt_switch)  
		console_unlock();
		return 0;
	 
	prev = fg_console;
	if (alloc && vc_allocate(vt))  
		/* we can't have a free VC for now. Too bad,
		 * we don't want to mess the screen for now. */
		console_unlock();
		return -ENOSPC;
	 
	if (set_console(vt))  
		/*
		 * We're unable to switch to the SUSPEND_CONSOLE.
		 * Let the calling function know so it can decide
		 * what to do.
		 */
		console_unlock();
		return -EIO;
	 
	console_unlock();
	if (vt_waitactive(vt + 1))  
		pr_debug("Suspend: Can't switch VCs.");
		return -EINTR;
	 
	return prev;
 
Unlike most other locking functions we ve seen so far, console_lock needs to be careful to ensure nothing else is panicking and needs to dump to the console before grabbing the semaphore for the console and setting a couple flags.

Panics Panics are tracked via an atomic integer set to the id of the processor currently panicking. kernel/printk/printk.c:2649
/**
 * console_lock - block the console subsystem from printing
 *
 * Acquires a lock which guarantees that no consoles will
 * be in or enter their write() callback.
 *
 * Can sleep, returns nothing.
 */
void console_lock(void)
 
	might_sleep();
	/* On panic, the console_lock must be left to the panic cpu. */
	while (other_cpu_in_panic())
		msleep(1000);
	down_console_sem();
	console_locked = 1;
	console_may_schedule = 1;
 
EXPORT_SYMBOL(console_lock);
kernel/printk/printk.c:362
/*
 * Return true if a panic is in progress on a remote CPU.
 *
 * On true, the local CPU should immediately release any printing resources
 * that may be needed by the panic CPU.
 */
bool other_cpu_in_panic(void)
 
	return (panic_in_progress() && !this_cpu_in_panic());
 
kernel/printk/printk.c:345
static bool panic_in_progress(void)
 
	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
 
kernel/printk/printk.c:350
/* Return true if a panic is in progress on the current CPU. */
bool this_cpu_in_panic(void)
 
	/*
	 * We can use raw_smp_processor_id() here because it is impossible for
	 * the task to be migrated to the panic_cpu, or away from it. If
	 * panic_cpu has already been set, and we're not currently executing on
	 * that CPU, then we never will be.
	 */
	return unlikely(atomic_read(&panic_cpu) == raw_smp_processor_id());
 
console_locked is a debug value, used to indicate that the lock should be held, and our first indication that this whole virtual terminal system is more complex than might initially be expected. kernel/printk/printk.c:373
/*
 * This is used for debugging the mess that is the VT code by
 * keeping track if we have the console semaphore held. It's
 * definitely not the perfect debug tool (we don't know if _WE_
 * hold it and are racing, but it helps tracking those weird code
 * paths in the console code where we end up in places I want
 * locked without the console semaphore held).
 */
static int console_locked;
console_may_schedule is used to see if we are permitted to sleep and schedule other work while we hold this lock. As we ll see later, the virtual terminal subsystem is not re-entrant, so there s all sorts of hacks in here to ensure we don t leave important code sections that can t be safely resumed.

Disable VT Switch As the comment below lays out, when another program is handling graphical display anyway, there s no need to do any of this, so the kernel provides a switch to turn the whole thing off. Interestingly, this appears to only be used by three drivers, so the specific hardware support required must not be particularly common.
drivers/gpu/drm/omapdrm/dss
drivers/video/fbdev/geode
drivers/video/fbdev/omap2
drivers/tty/vt/vt_ioctl.c:1308
/*
 * Normally during a suspend, we allocate a new console and switch to it.
 * When we resume, we switch back to the original console.  This switch
 * can be slow, so on systems where the framebuffer can handle restoration
 * of video registers anyways, there's little point in doing the console
 * switch.  This function allows you to disable it by passing it '0'.
 */
void pm_set_vt_switch(int do_switch)
 
	console_lock();
	disable_vt_switch = !do_switch;
	console_unlock();
 
EXPORT_SYMBOL(pm_set_vt_switch);
The rest of the vt_switch_console function is pretty normal, however, simply allocating space if needed to create the requested virtual terminal and then setting the current virtual terminal via set_console.

Virtual Terminal Set Console With set_console, we begin (as if we haven t been already) to enter the madness that is the virtual terminal subsystem. As mentioned previously, modifications to its state must be made very carefully, as other stuff happening at the same time could create complete messes. All this to say, calling set_console does not actually perform any work to change the state of the current console. Instead it indicates what changes it wants and then schedules that work. drivers/tty/vt/vt.c:3153
int set_console(int nr)
 
	struct vc_data *vc = vc_cons[fg_console].d;
	if (!vc_cons_allocated(nr)   vt_dont_switch  
		(vc->vt_mode.mode == VT_AUTO && vc->vc_mode == KD_GRAPHICS))  
		/*
		 * Console switch will fail in console_callback() or
		 * change_console() so there is no point scheduling
		 * the callback
		 *
		 * Existing set_console() users don't check the return
		 * value so this shouldn't break anything
		 */
		return -EINVAL;
	 
	want_console = nr;
	schedule_console_callback();
	return 0;
 
The check for vc->vc_mode == KD_GRAPHICS is where most end-user graphical desktops will bail out of this change, as they re in graphics mode and don t need to switch away to the suspend console. vt_dont_switch is a flag used by the ioctls11 VT_LOCKSWITCH and VT_UNLOCKSWITCH to prevent the system from switching virtual terminal devices when the user has explicitly locked it. VT_AUTO is a flag indicating that automatic virtual terminal switching is enabled12, and thus deliberate switching to a suspend terminal is not required. However, if you do run your machine from a virtual terminal, then we indicate to the system that we want to change to the requested virtual terminal via the want_console variable and schedule a callback via schedule_console_callback. drivers/tty/vt/vt.c:315
void schedule_console_callback(void)
 
	schedule_work(&console_work);
 
console_work is a workqueue2 that will execute the given task asynchronously.

Console Callback drivers/tty/vt/vt.c:3109
/*
 * This is the console switching callback.
 *
 * Doing console switching in a process context allows
 * us to do the switches asynchronously (needed when we want
 * to switch due to a keyboard interrupt).  Synchronization
 * with other console code and prevention of re-entrancy is
 * ensured with console_lock.
 */
static void console_callback(struct work_struct *ignored)
 
	console_lock();
	if (want_console >= 0)  
		if (want_console != fg_console &&
		    vc_cons_allocated(want_console))  
			hide_cursor(vc_cons[fg_console].d);
			change_console(vc_cons[want_console].d);
			/* we only changed when the console had already
			   been allocated - a new console is not created
			   in an interrupt routine */
		 
		want_console = -1;
	 
...
console_callback first looks to see if there is a console change wanted via want_console and then changes to it if it s not the current console and has been allocated already. We do first remove any cursor state with hide_cursor. drivers/tty/vt/vt.c:841
static void hide_cursor(struct vc_data *vc)
 
	if (vc_is_sel(vc))
		clear_selection();
	vc->vc_sw->con_cursor(vc, false);
	hide_softcursor(vc);
 
A full dive into the tty driver is a task for another time, but this should give a general sense of how this system interacts with hibernation.

Notify Power Management Call Chain kernel/power/hibernate.c:767
pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION)
This will call a chain of power management callbacks, passing first PM_HIBERNATION_PREPARE and then PM_POST_HIBERNATION on startup or on error with another callback. kernel/power/main.c:98
int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
 
	int ret;
	ret = blocking_notifier_call_chain_robust(&pm_chain_head, val_up, val_down, NULL);
	return notifier_to_errno(ret);
 
The power management notifier is a blocking notifier chain, which means it has the following properties. include/linux/notifier.h:23
 *	Blocking notifier chains: Chain callbacks run in process context.
 *		Callouts are allowed to block.
The callback chain is a linked list with each entry containing a priority and a function to call. The function technically takes in a data value, but it is always NULL for the power management chain. include/linux/notifier.h:49
struct notifier_block;
typedef	int (*notifier_fn_t)(struct notifier_block *nb,
			unsigned long action, void *data);
struct notifier_block  
	notifier_fn_t notifier_call;
	struct notifier_block __rcu *next;
	int priority;
 ;
The head of the linked list is protected by a read-write semaphore. include/linux/notifier.h:65
struct blocking_notifier_head  
	struct rw_semaphore rwsem;
	struct notifier_block __rcu *head;
 ;
Because it is prioritized, appending to the list requires walking it until an item with lower13 priority is found to insert the current item before. kernel/notifier.c:252
/*
 *	Blocking notifier chain routines.  All access to the chain is
 *	synchronized by an rwsem.
 */
static int __blocking_notifier_chain_register(struct blocking_notifier_head *nh,
					      struct notifier_block *n,
					      bool unique_priority)
 
	int ret;
	/*
	 * This code gets used during boot-up, when task switching is
	 * not yet working and interrupts must remain disabled.  At
	 * such times we must not call down_write().
	 */
	if (unlikely(system_state == SYSTEM_BOOTING))
		return notifier_chain_register(&nh->head, n, unique_priority);
	down_write(&nh->rwsem);
	ret = notifier_chain_register(&nh->head, n, unique_priority);
	up_write(&nh->rwsem);
	return ret;
 
kernel/notifier.c:20
/*
 *	Notifier chain core routines.  The exported routines below
 *	are layered on top of these, with appropriate locking added.
 */
static int notifier_chain_register(struct notifier_block **nl,
				   struct notifier_block *n,
				   bool unique_priority)
 
	while ((*nl) != NULL)  
		if (unlikely((*nl) == n))  
			WARN(1, "notifier callback %ps already registered",
			     n->notifier_call);
			return -EEXIST;
		 
		if (n->priority > (*nl)->priority)
			break;
		if (n->priority == (*nl)->priority && unique_priority)
			return -EBUSY;
		nl = &((*nl)->next);
	 
	n->next = *nl;
	rcu_assign_pointer(*nl, n);
	trace_notifier_register((void *)n->notifier_call);
	return 0;
 
Each callback can return one of a series of options. include/linux/notifier.h:18
#define NOTIFY_DONE		0x0000		/* Don't care */
#define NOTIFY_OK		0x0001		/* Suits me */
#define NOTIFY_STOP_MASK	0x8000		/* Don't call further */
#define NOTIFY_BAD		(NOTIFY_STOP_MASK 0x0002)
						/* Bad/Veto action */
When notifying the chain, if a function returns STOP or BAD then the previous parts of the chain are called again with PM_POST_HIBERNATION14 and an error is returned. kernel/notifier.c:107
/**
 * notifier_call_chain_robust - Inform the registered notifiers about an event
 *                              and rollback on error.
 * @nl:		Pointer to head of the blocking notifier chain
 * @val_up:	Value passed unmodified to the notifier function
 * @val_down:	Value passed unmodified to the notifier function when recovering
 *              from an error on @val_up
 * @v:		Pointer passed unmodified to the notifier function
 *
 * NOTE:	It is important the @nl chain doesn't change between the two
 *		invocations of notifier_call_chain() such that we visit the
 *		exact same notifier callbacks; this rules out any RCU usage.
 *
 * Return:	the return value of the @val_up call.
 */
static int notifier_call_chain_robust(struct notifier_block **nl,
				     unsigned long val_up, unsigned long val_down,
				     void *v)
 
	int ret, nr = 0;
	ret = notifier_call_chain(nl, val_up, v, -1, &nr);
	if (ret & NOTIFY_STOP_MASK)
		notifier_call_chain(nl, val_down, v, nr-1, NULL);
	return ret;
 
Each of these callbacks tends to be quite driver-specific, so we ll cease discussion of this here.

Sync Filesystems The next step is to ensure all filesystems have been synchronized to disk. This is performed via a simple helper function that times how long the full synchronize operation, ksys_sync takes. kernel/power/main.c:69
void ksys_sync_helper(void)
 
	ktime_t start;
	long elapsed_msecs;
	start = ktime_get();
	ksys_sync();
	elapsed_msecs = ktime_to_ms(ktime_sub(ktime_get(), start));
	pr_info("Filesystems sync: %ld.%03ld seconds\n",
		elapsed_msecs / MSEC_PER_SEC, elapsed_msecs % MSEC_PER_SEC);
 
EXPORT_SYMBOL_GPL(ksys_sync_helper);
ksys_sync wakes and instructs a set of flusher threads to write out every filesystem, first their inodes15, then the full filesystem, and then finally all block devices, to ensure all pages are written out to disk. fs/sync.c:87
/*
 * Sync everything. We start by waking flusher threads so that most of
 * writeback runs on all devices in parallel. Then we sync all inodes reliably
 * which effectively also waits for all flusher threads to finish doing
 * writeback. At this point all data is on disk so metadata should be stable
 * and we tell filesystems to sync their metadata via ->sync_fs() calls.
 * Finally, we writeout all block devices because some filesystems (e.g. ext2)
 * just write metadata (such as inodes or bitmaps) to block device page cache
 * and do not sync it on their own in ->sync_fs().
 */
void ksys_sync(void)
 
	int nowait = 0, wait = 1;
	wakeup_flusher_threads(WB_REASON_SYNC);
	iterate_supers(sync_inodes_one_sb, NULL);
	iterate_supers(sync_fs_one_sb, &nowait);
	iterate_supers(sync_fs_one_sb, &wait);
	sync_bdevs(false);
	sync_bdevs(true);
	if (unlikely(laptop_mode))
		laptop_sync_completion();
 
It follows an interesting pattern of using iterate_supers to run both sync_inodes_one_sb and then sync_fs_one_sb on each known filesystem16. It also calls both sync_fs_one_sb and sync_bdevs twice, first without waiting for any operations to complete and then again waiting for completion17. When laptop_mode is enabled the system runs additional filesystem synchronization operations after the specified delay without any writes. mm/page-writeback.c:111
/*
 * Flag that puts the machine in "laptop mode". Doubles as a timeout in jiffies:
 * a full sync is triggered after this time elapses without any disk activity.
 */
int laptop_mode;
EXPORT_SYMBOL(laptop_mode);
However, when running a filesystem synchronization operation, the system will add an additional timer to schedule more writes after the laptop_mode delay. We don t want the state of the system to change at all while performing hibernation, so we cancel those timers. mm/page-writeback.c:2198
/*
 * We're in laptop mode and we've just synced. The sync's writes will have
 * caused another writeback to be scheduled by laptop_io_completion.
 * Nothing needs to be written back anymore, so we unschedule the writeback.
 */
void laptop_sync_completion(void)
 
	struct backing_dev_info *bdi;
	rcu_read_lock();
	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
		del_timer(&bdi->laptop_mode_wb_timer);
	rcu_read_unlock();
 
As a side note, the ksys_sync function is simply called when the system call sync is used. fs/sync.c:111
SYSCALL_DEFINE0(sync)
 
	ksys_sync();
	return 0;
 

The End of Preparation With that the system has finished preparations for hibernation. This is a somewhat arbitrary cutoff, but next the system will begin a full freeze of userspace to then dump memory out to an image and finally to perform hibernation. All this will be covered in future articles!
  1. Hibernation modes are outside of scope for this article, see the previous article for a high-level description of the different types of hibernation.
  2. Workqueues are a mechanism for running asynchronous tasks. A full description of them is a task for another time, but the kernel documentation on them is available here: https://www.kernel.org/doc/html/v6.9/core-api/workqueue.html 2
  3. This is a bit of an oversimplification, but since this isn t the main focus of this article this description has been kept to a higher level.
  4. Kconfig is Linux s build configuration system that sets many different macros to enable/disable various features.
  5. Kconfig defaults to the first default found
  6. Including checking whether the algorithm is larval? Which appears to indicate that it requires additional setup, but is an interesting choice of name for such a state.
  7. Specifically when we get to process freezing, which we ll get to in the next article in this series.
  8. Swap space is outside the scope of this article, but in short it is a buffer on disk that the kernel uses to store memory not current in use to free up space for other things. See Swap Management for more details.
  9. The code for this is lengthy and tangential, thus it has not been included here. If you re curious about the details of this, see kernel/power/hibernate.c:858 for the details of hibernate_quiet_exec, and drivers/nvdimm/core.c:451 for how it is used in nvdimm.
  10. Annoyingly this code appears to use the terms console and virtual terminal interchangeably.
  11. ioctls are special device-specific I/O operations that permit performing actions outside of the standard file interactions of read/write/seek/etc. 2
  12. I m not entirely clear on how this flag works, this subsystem is particularly complex.
  13. In this case a higher number is higher priority.
  14. Or whatever the caller passes as val_down, but in this case we re specifically looking at how this is used in hibernation.
  15. An inode refers to a particular file or directory within the filesystem. See Wikipedia for more details.
  16. Each active filesystem is registed with the kernel through a structure known as a superblock, which contains references to all the inodes contained within the filesystem, as well as function pointers to perform the various required operations, like sync.
  17. I m including minimal code in this section, as I m not looking to deep dive into the filesystem code at this time.

4 September 2024

Dirk Eddelbuettel: RcppCNPy 0.2.13 on CRAN: Micro Bugfix

Another (again somewhat minor) maintenance release of the RcppCNPy package arrived on CRAN earlier today. RcppCNPy provides R with read and write access to NumPy files thanks to the cnpy library by Carl Rogers along with Rcpp for the glue to R. A change in the most recent Rcpp appears to cause void functions wrapper via Rcpp Modules to return NULL, as opposed to being silent. That tickles discrepancy between the current output and the saved (reference) output of one test file, leading CRAN to display a NOTE which we were asked to take care of. Done here in this release and now that we know we will also look into restoring the prior Rcpp behaviour. Other small changes involved standard maintenance for continuous integration and updates to files README.md and DESCRIPTION. More details are below.

Changes in version 0.2.13 (2024-09-03)
  • A test script was updated to account for the fact that it now returns a few instances of NULL under current Rcpp.
  • Small package maintenance updates have been made to the README and DESCRIPTION files as well as to the continuous integration setup.

CRANberries also provides a diffstat report for the latest release. As always, feedback is welcome and the best place to start a discussion may be the GitHub issue tickets page. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

31 August 2024

Vincent Bernat: Fixing layout shifts caused by web fonts

In 2020, Google introduced Core Web Vitals metrics to measure some aspects of real-world user experience on the web. This blog has consistently achieved good scores for two of these metrics: Largest Contentful Paint and Interaction to Next Paint. However, optimizing the third metric, Cumulative Layout Shift, which measures unexpected layout changes, has been more challenging. Let s face it: optimizing for this metric is not really useful for a site like this one. But getting a better score is always a good distraction. To prevent the flash of invisible text when using web fonts, developers should set the font-display property to swap in @font-face rules. This method allows browsers to initially render text using a fallback font, then replace it with the web font after loading. While this improves the LCP score, it causes content reflow and layout shifts if the fallback and web fonts are not metrically compatible. These shifts negatively affect the CLS score. CSS provides properties to address this issue by overriding font metrics when using fallback fonts: size-adjust, ascent-override, descent-override, and line-gap-override. Two comprehensive articles explain each property and their computation methods in detail: Creating Perfect Font Fallbacks in CSS and Improved font fallbacks.

Interactive tuning tool Instead of computing each property from font average metrics, I put together a tool for interactively tuning fallback fonts.1

Instructions
  1. Load your custom font.
  2. Select a fallback font to tune.
  3. Adjust the size-adjust property to match the width of your custom font with the fallback font. With a proportional font, it is not possible to achieve a perfect match.
  4. Fine-tune the ascent-override property. Aim to align the final dot of the last paragraph while monitoring the font s baseline. For more precise adjustment, disable the option.
  5. Modify the descent-override property. The goal is to make the two boxes match. You may need to alternate between this and the previous property for optimal results.
  6. If necessary, adjust the line-gap-override property. This step is typically not required.
The process needs to be repeated for each fallback font. Some platforms may not include certain fonts. Notably, Android lacks most fonts found in other operating systems. It replaces Georgia with Noto Serif, which is not metrically-compatible.

Tool

This tool is not available from the Atom feed.

Results For the body text of this blog, I get the following CSS definition:
@font-face  
  font-family: Merriweather;
  font-style: normal;
  font-weight: 400;
  src: url("../fonts/merriweather.woff2") format("woff2");
  font-display: swap;
 
@font-face  
  font-family: "Fallback for Merriweather";
  src: local("Noto Serif"), local("Droid Serif");
  size-adjust: 98.3%;
  ascent-override: 99%;
  descent-override: 27%;
 
@font-face  
  font-family: "Fallback for Merriweather";
  src: local("Georgia");
  size-adjust: 106%;
  ascent-override: 90.4%;
  descent-override: 27.3%;
 
font-family: Merriweather, "Fallback for Merriweather", serif;
After a month, the CLS metric improved to 0:
Core Web Vitals scores for vincent.bernat.ch showing all 6 metrics as green. Notably the Cumulative Layout Shift is 0.
Recent Core Web Vitals scores for vincent.bernat.ch

About custom fonts Using safe web fonts or a modern font stack is often simpler. However, I prefer custom web fonts. Merriweather and Iosevka, which are used in this blog, enhance the reading experience. An alternative approach could be to use Georgia as a serif option. Unfortunately, most default monospace fonts are ugly. Furthermore, paragraphs that combine proportional and monospace fonts can create visual disruption. This occurs due to mismatched vertical metrics or weights. To address this issue, I adjust Iosevka s metrics and weight to align with Merriweather s characteristics.

  1. Similar tools already exist, like the Fallback Font Generator, but they were missing a few features, such as the ability to load the fallback font or to have decimals for the CSS properties. And no source code.

22 August 2024

Jonathan McDowell: Thoughts on Advent of Code + Rust

Diego wrote about his dislike for Advent of Code and that reminded me I hadn t written up my experience from 2023. Mostly because, spoiler, I never actually completed it and always intended to do so and then write it up. I think it s time to accept I m not going to do that, and write down some thoughts before I forget all of them. These are somewhat vague, given the time that s elapsed, but I think still relevant. You might also find Roger s problem write up interesting. I ve tried AoC a couple of times before; I think I had a very brief attempt back in 2021, and I got 4 days in for 2022. For Advent of Code 2023 I tried much harder to actually complete the challenges, and got most of the way there. I didn t allow myself to move on to the next day until fully completing the previous day, and didn t end up doing the second half of December 24th, or any of December 25th.

Rust First I want to talk about Rust, which is the language I chose to use for the problems. I ve dabbled a little in it, but I d like more familiarity with the basic language, and some programming problems seemed like a good way to get that. It s a language I want to like; I ve spent a lot of my career writing C, do more in Go these days, and generally think Rust promises a low level, run-time light environment like C but with the rough edges taken off. I set myself the challenge of using just bare Rust; no external crates, no use of cargo. I was accused of playing on hard mode by doing this, but it really wasn t the intention - I figured that I should be able to do what I needed without recourse to anything outside the core language, and didn t want what seemed like the extra complexity of dealing with cargo. That caused problems, however. I m used to by-default generic error handling in Go through the error type, but Rust seems to have much more tightly typed errors. I was pointed at anyhow as the right way to do this in Rust. I still find this surprising; I ended up using unwrap() a lot when I think with more generic error handling I could have used ?. The other thing I discovered is that by default rustc is heavy on the debug output. I got significantly better results on some of the solutions with rustc -O -C target-cpu=native source.rs. I probably shouldn t be surprised by this, but worth noting. Rust, to me, has a syntax only a C++ programmer could love. I am not a C++ programmer. Coming from C I found Go to be a nice, simple syntax to learn. Rust has not been the same. There s a lot more punctuation, and it s not always clear to me what it s doing. This applies more when reading other people s code than when writing it myself, obviously, but I see a lot of Rust code that could give Perl a run for its money in terms of looking like line noise. The borrow checker didn t bug me too much, but did add overhead to my thinking. The Rust compiler is generally very good at outputting helpful error messages when the programmer is an idiot. I ended up having to use a RefCell for one solution, and using .iter() for loops rather than explicit iterators (why, why is this different?). I also kept forgetting to explicitly mark variables as mutable when declaring them. Things I liked? There s a rich set of first class data types. Look, I m a C programmer, I m easily pleased. You give me some sort of hash array and I ll be happy. Rust manages that, tuples, strings, all the standard bits any modern language can provide. The whole impl thing for adding methods to structures I like as a way of providing some abstraction, though I think Go has a nicer syntax for it. The compiler, as mentioned, is great at spitting out useful errors for the most part. Also although I wasn t using external crates for AoC I do appreciate there s a decent ecosystem there now (though that brings up another gripe: rust seems to still be a fairly fast moving target, to the extent I can no longer rely on the compiler in Debian stable to be able to compile random projects I find).

Advent of Code Let s talk about the advent of code bit now. Hopefully it s long enough since it came out that this won t be spoilers for anyone, but if you haven t attempted the 2023 AoC and might, you might want to stop reading here. First, a refresher on the format for those who might not be aware of it. Problems are posted daily from December 1st until the 25th. Each is in 2 parts; the second part is not viewable until you have provided the correct answer for the first part. There s a whole leaderboard thing going on, but the puzzle opens at midnight UTC-5 so generally by the time I wake up and have time to look the problem has been solved many times over; no chance of getting listed. Credit to AoC creator, Eric Wastl, for writing up the set of problems in an entertaining fashion. I quite enjoyed seeing how the puzzle would be phrased each day, and the whole thing obviously brings a lot of joy to folk I know. I always start AoC thinking it ll be a fun set of puzzles to solve. Then something happens and I miss a day or two, and all of a sudden I ve a bunch of catching up to do and it s all a bit more of a chore. I hit that at some points this time, but made a concerted effort to try and power through it. That perseverance was required up front, because I found the second part of Day 1 to be ill specified, and had to iterate a few times to actually calculate the desired solution (IIRC, issues about whether sevenone at the end of a line ended up as 7 or 1 really tripped me up). I don t recall any other problems that bit me as hard on the specification as this one, but it happening up front was unfortunate. The short example input doesn t always help with this either; either it s not enough to be able to extrapolate patterns, or it doesn t show all the variations you need to account for (that aren t fully specified in the text), or in a few cases it turned out I needed to understand the shape of the actual data to produce a solution that could actually complete in a reasonable time. Which brings me to another matter, sometimes brute force doesn t actually work. This is fine, but the second part of the day s problem can change the approach you d take. So sometimes I got lucky in the way I handled the first half, and doing the second half was a simple 5 minute tweak, and sometimes I had to entirely change the way I was storing data. You might claim that if I was a better programmer I d have always produced a first half solution that was amenable to extension for the second half. First, I dispute that; I think there are always situations where the problem domain can change in enough directions that you can t handle all of them without a lot of effort. Secondly, I didn t find AoC an environment that encouraged me to optimise for generic solutions. Maybe some of the puzzles in isolation would allow for that, but a month of daily problems to solve while still engaging in regular life meant I hacked things up, took short cuts based on the knowledge I had of the input data, etc, etc. Overall I can see the appeal, but the sheer quantity and the fact I write code as part of my day job just made it feel too much like a chore, rather than a fun mental exercise. I did wonder how they d look as a set of interview puzzles (obviously a subset, rather than all of them), but I m not sure how you d actually use them for that - I wouldn t want anyone to have to solve them in a live interview. So, in case it s not obvious, I m not planning to engage in AoC again this yet. But I m continuing to persevere with Rust (though most of my work stuff is thankfully still Go).

8 August 2024

Reproducible Builds: Reproducible Builds in July 2024

Welcome to the July 2024 report from the Reproducible Builds project! In our reports, we outline what we ve been up to over the past month and highlight news items in software supply-chain security more broadly. As always, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. Reproducible Builds Summit 2024
  2. Pulling Linux up by its bootstraps
  3. Towards Idempotent Rebuilds?
  4. AROMA: Automatic Reproduction of Maven Artifacts
  5. Community updates
  6. Android Reproducible Builds at IzzyOnDroid with rbtlog
  7. Extending the Scalability, Flexibility and Responsiveness of Secure Software Update Systems
  8. Development news
  9. Website updates
  10. Upstream patches
  11. Reproducibility testing framework


Reproducible Builds Summit 2024 Last month, we were very pleased to announce the upcoming Reproducible Builds Summit, set to take place from September 17th 19th 2024 in Hamburg, Germany. We are thrilled to host the seventh edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin and Athens. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim is to create an inclusive space that fosters collaboration, innovation and problem-solving. If you re interesting in joining us this year, please make sure to read the event page, which has more details about the event and location. We are very much looking forward to seeing many readers of these reports there.

Pulling Linux up by its bootstraps (LWN) In a recent edition of Linux Weekly News, Daroc Alden has written an article on bootstrappable builds. Starting with a brief introduction that
a bootstrappable build is one that builds existing software from scratch for example, building GCC without relying on an existing copy of GCC. In 2023, the Guix project announced that the project had reduced the size of the binary bootstrap seed needed to build its operating system to just 357-bytes not counting the Linux kernel required to run the build process.
The article goes onto to describe that now, the live-bootstrap project has gone a step further and removed the need for an existing kernel at all. and concludes:
The real benefit of bootstrappable builds comes from a few things. Like reproducible builds, they can make users more confident that the binary packages downloaded from a package mirror really do correspond to the open-source project whose source code they can inspect. Bootstrappable builds have also had positive effects on the complexity of building a Linux distribution from scratch [ ]. But most of all, bootstrappable builds are a boon to the longevity of our software ecosystem. It s easy for old software to become unbuildable. By having a well-known, self-contained chain of software that can build itself from a small seed, in a variety of environments, bootstrappable builds can help ensure that today s software is not lost, no matter where the open-source community goes from here

Towards Idempotent Rebuilds? Trisquel developer Simon Josefsson wrote an interesting blog post comparing the output of the .deb files from our tests.reproducible-builds.org testing framework and the ones in the official Debian archive. Following up from a previous post on the reproducibility of Trisquel, Simon notes that typically [the] rebuilds do not match the official packages, even when they say the package is reproducible , Simon correctly identifies that the purpose of [these] rebuilds are not to say anything about the official binary build, instead the purpose is to offer a QA service to maintainers by performing two builds of a package and declaring success if both builds match. However, Simon s post swiftly moves on to announce a new tool called debdistrebuild that performs rebuilds of the difference between two distributions in a GitLab pipeline and displays diffoscope output for further analysis.

AROMA: Automatic Reproduction of Maven Artifacts Mehdi Keshani, Tudor-Gabriel Velican, Gideon Bot and Sebastian Proksch of the Delft University of Technology, Netherlands, have published a new paper in the ACM Software Engineering on a new tool to automatically reproduce Apache Maven artifacts:
Reproducible Central is an initiative that curates a list of reproducible Maven libraries, but the list is limited and challenging to maintain due to manual efforts. [We] investigate the feasibility of automatically finding the source code of a library from its Maven release and recovering information about the original release environment. Our tool, AROMA, can obtain this critical information from the artifact and the source repository through several heuristics and we use the results for reproduction attempts of Maven packages. Overall, our approach achieves an accuracy of up to 99.5% when compared field-by-field to the existing manual approach [and] we reveal that automatic reproducibility is feasible for 23.4% of the Maven packages using AROMA, and 8% of these packages are fully reproducible.

Community updates On our mailing list this month:
  • Nichita Morcotilo reached out to the community, first to share their efforts to build reproducible packages cross-platform with a new build tool called rattler-build, noting that as you can imagine, building packages reproducibly on Windows is the hardest challenge (so far!) . Nichita goes onto mention that the Apple ecosystem appears to be using ZERO_AR_DATE over SOURCE_DATE_EPOCH. [ ]
  • Roland Clobus announced that the Debian bookworm 12.6 live images are nearly reproducible , with more detail in the post itself and input in the thread from other contributors.
  • As reported in last month s report, Pol Dellaiera completed his master thesis on Reproducibility in Software Engineering at the University of Mons, Belgium. This month, Pol announced this on the list with more background info. Since the master thesis sources have been available, it has received some feedback and contributions. As a result, an updated version of the thesis has been published containing those community fixes.
  • Daniel Gr ber asked for help in getting the Yosys documentation to build reproducibly, citing issues in inter alia the PDF generation causing differing CreationDate metadata values.
  • James Addison continued his long journey towards getting the Sphinx documentation generator to build reproducible documentation. In this thread, James concerns himself with the problem that even when SOURCE_DATE_EPOCH is configured, Sphinx projects that have configured their copyright notices using dynamic elements can produce nonsensical output under some circumstances. James query ended up generating a number of replies.
  • Allen gunner Gunner posted a brief update on the progress the core team is making towards introducing a Code of Conduct (CoC) such that it is in place in time for the RB Summit in Hamburg in September . In particular, gunner asks if you are interested in helping with CoC design and development in the weeks ahead, simply email rb-core@lists.reproducible-builds.org and let us know . [ ]

Android Reproducible Builds at IzzyOnDroid with rbtlog On our mailing list, Fay Stegerman announced a new Reproducible Builds collaboration in the Android ecosystem:
We are pleased to announce Reproducible Builds, special client support and more in our repo : a collaboration between various independent interoperable projects: the IzzyOnDroid team, 3rd-party clients Droid-ify & Neo Store, and rbtlog (part of my collection of tools for Android Reproducible Builds) to bring Reproducible Builds to IzzyOnDroid and the wider Android ecosystem.

Extending the Scalability, Flexibility and Responsiveness of Secure Software Update Systems Congratulations to Marina Moore of the New York Tandon School of Engineering who has submitted her PhD thesis on Extending the Scalability, Flexibility and Responsiveness of Secure Software Update Systems. The introduction outlines its contributions to the field:
[S]oftware repositories are a vital component of software development and release, with packages downloaded both for direct use and to use as dependencies for other software. Further, when software is updated due to patched vulnerabilities or new features, it is vital that users are able to see and install this patched version of the software. However, this process of updating software can also be the source of attack. To address these attacks, secure software update systems have been proposed. However, these secure software update systems have seen barriers to widespread adoption. The Update Framework (TUF) was introduced in 2010 to address several attacks on software update systems including repository compromise, rollback attacks, and arbitrary software installation. Despite this, compromises continue to occur, with millions of users impacted by such compromises. My work has addressed substantial challenges to adoption of secure software update systems grounded in an understanding of practical concerns. Work with industry and academic communities provided opportunities to discover challenges, expand adoption, and raise awareness about secure software updates. [ ]

Development news In Debian this month, 12 reviews of Debian packages were added, 13 were updated and 6 were removed this month adding to our knowledge about identified issues. A new toolchain issue type was identified as well, specifically ordering_differences_in_pkg_info.
Colin Percival filed a bug against the LLVM compiler noting that building i386 binaries on the i386 architecture is different when building i386 binaries under amd64. The fix was narrowed down to x87 excess precision, which can result in slightly different register choices when the compiler is hosted on x86_64 or i386 and a fix committed. [ ]
Fay Stegerman performed some in-depth research surrounding her apksigcopier tool, after some Android .apk files signed with the latest apksigner could no longer be verified as reproducible. Fay identified the issue as follows:
Since build-tools >= 35.0.0-rc1, backwards-incompatible changes to apksigner break apksigcopier as it now by default forcibly replaces existing alignment padding and changed the default page alignment from 4k to 16k (same as Android Gradle Plugin >= 8.3, so the latter is only an issue when using older AGP). [ ]
She documented multiple available workarounds and filed a bug in Google s issue tracker.
Lastly, diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb uploaded version 272 and Mattia Rizzolo uploaded version 273 to Debian, and the following changes were made as well:
  • Chris Lamb:
    • Ensure that the convert utility is from ImageMagick version 6.x. The command-line interface has seemingly changed with the 7.x series of ImageMagick. [ ]
    • Factor out version detection in test_jpeg_image. [ ]
    • Correct the import of the identify_version method after a refactoring change in a previous commit. [ ]
    • Move away from using DSA OpenSSH keys in tests as support has been deprecated and removed in OpenSSH version 9.8p1. [ ]
    • Move to assert_diff in the test_openssh_pub_key package. [ ]
    • Update copyright years. [ ]
  • Mattia Rizzolo:
    • Add support for ffmpeg version 7.x which adds some extra context to the diff. [ ]
    • Rework the handling of OpenSSH testing of DSA keys if OpenSSH is strictly 9.7, and add an OpenSSH key test with a ed25519-format key [ ][ ][ ]
    • Temporarily disable a few packages that are not available in Debian testing. [ ][ ]
    • Stop ignoring the results of Debian testing in the continuous integration system. [ ]
    • Adjust options in debian/source to make sure not to pack the Python sdist directory into the binary Debian package. [ ]
    • Adjust Lintian overrides. [ ]

Website updates There were a number of improvements made to our website this month, including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In July, a number of changes were made by Holger Levsen, including:
  • Grant bremner access to the ionos7 node. [ ][ ]
  • Perform a dummy change to force update of all jobs. [ ][ ]
In addition, Vagrant Cascadian performed some necessary node maintenance of the underlying build hosts. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

31 July 2024

Reproducible Builds (diffoscope): diffoscope 273 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 273. This version includes the following changes:
[ Chris Lamb ]
* Factor out version detection in test_jpeg_image. (Re:
  reproducible-builds/diffoscope#384)
* Ensure that 'convert' is from Imagemagick 6.x; we will need to update a
  few things with IM7. (Closes: reproducible-builds/diffoscope#384)
* Correct import of identify_version after refactoring change in 037bdcbb0.
[ Mattia Rizzolo ]
* tests:
  + Add OpenSSH key test with a ed25519 key.
  + Skip the OpenSSH test with DSA key if openssh is >> 9.7
  + Support ffmpeg >= 7 that adds some extra context to the diff
* Do not ignore testing in gitlab-ci.
* debian:
  + Temporarily remove aapt, androguard and dexdump from the build/test
    dependencies as they are not available in testin/trixie.  Closes: #1070416
  + Bump Standards-Version to 4.7.0, no changes needed.
  + Adjust options to make sure not to pack the python s-dist directory
    into the debian source package.
  + Adjust the lintian overrides.
You find out more by visiting the project homepage.

24 July 2024

Dirk Eddelbuettel: RQuantLib 0.4.23 on CRAN: Updates

A new minor release 0.4.23 of RQuantLib just arrived at CRAN earlier today, and will be uploaded to Debian in due course. QuantLib is a rather comprehensice free/open-source library for quantitative finance. RQuantLib connects (some parts of) it to the R environment and language, and has been part of CRAN for more than twenty-two years (!!) as it was one of the first packages I uploaded. This release of RQuantLib updates to QuantLib version 1.35 released this morning. It accommodates some removals following earlier deprecations, and also updates most of the code in the function for a more readable and compact form of creating shared pointers via make_shared() along with auto.

Changes in RQuantLib version 0.4.23 (2024-07-23)
  • Adjustments for QuantLib 1.35 and removal of deprecated code (in utility functions and dividend case of vanilla options)
  • Adjustments for new changes in QuantLib 1.35
  • Refactoring most C++ files making more use of both auto and make_shared to simplify and shorten expressions

Courtesy of my CRANberries, there is also a diffstat report for the this release. As always, more detailed information is on the RQuantLib page. Questions, comments etc should go to the rquantlib-devel mailing list. Issue tickets can be filed at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

14 July 2024

Russ Allbery: podlators v6.0.2

podlators contains the Perl modules and scripts used to convert Perl's documentation language, POD, to text and manual pages. This is another small bug fix release that is part of iterating on getting the new podlators incorproated into Perl core. The bug fixed in this release was another build system bug I introduced in recent refactorings, this time breaking the realclean target so that some generated scripts were not removed. Thanks to James E Keenan for the report. You can get the latest version from CPAN or from the podlators distribution page.

Russ Allbery: DocKnot 8.0.1

DocKnot is my static web site generator, with some additional features for managing software releases. This release fixes some bugs in the newly-added conversion of text to HTML that were due to my still-incomplete refactoring of that code. It still uses some global variables, and they were leaking between different documents and breaking the formatting. It also fixes consistency problems with how the style parameter in *.spin files was interpreted, and fixes some incorrect docknot update-spin behavior. You can get the latest version from CPAN or from the DocKnot distribution page.

7 July 2024

Russ Allbery: DocKnot v8.0.0

DocKnot is my static web site generator, with additional features for managing software releases and package documentation. This release switches to semantic versioning for the Perl modules, hence the v prefix to the version number. This appears to be the standard in the Perl world for indicating that the version follows the semantic versioning standard. That also required adding support to DocKnot for release tarballs whose version string starts with v. I plan to convert all of my Perl modules to semantic versioning in their next releases. The main change in this release is that it incorporates my old faq2html script that was previously used to convert text documents to HTML for publication. This code was imported into DocKnot and integrated with the rest of the package, so text documents can now be referenced by *.spin pointers rather than the (now-deprecated) *.faq files. The long delay in the release is because I was hoping to refactor the faq2html code to work as a proper module with no global state and to follow my current Perl coding guidelines before releasing a version of DocKnot containing it, but the refactoring is taking forever and support for v-prefixed versions was blocking other releases, so I'm releasing it in a less-than-ideal state and hoping to fix it later. There are also a few other bug fixes and improvements, the most notable of which is probably that the footer on generated web pages now points properly to the DocKnot distribution page rather than my old spin page. You can get the latest version from CPAN or from the DocKnot distribution page.

4 July 2024

Arturo Borrero Gonz lez: Wikimedia Toolforge: migrating Kubernetes from PodSecurityPolicy to Kyverno

Le ch teau de Val re et le Haut de Cry en juillet 2022 Christian David, CC BY-SA 4.0, via Wikimedia Commons This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez. Summary: this article shares the experience and learnings of migrating away from Kubernetes PodSecurityPolicy into Kyverno in the Wikimedia Toolforge platform. Wikimedia Toolforge is a Platform-as-a-Service, built with Kubernetes, and maintained by the Wikimedia Cloud Services team (WMCS). It is completely free and open, and we welcome anyone to use it to build and host tools (bots, webservices, scheduled jobs, etc) in support of Wikimedia projects. We provide a set of platform-specific services, command line interfaces, and shortcuts to help in the task of setting up webservices, jobs, and stuff like building container images, or using databases. Using these interfaces makes the underlying Kubernetes system pretty much invisible to users. We also allow direct access to the Kubernetes API, and some advanced users do directly interact with it. Each account has a Kubernetes namespace where they can freely deploy their workloads. We have a number of controls in place to ensure performance, stability, and fairness of the system, including quotas, RBAC permissions, and up until recently PodSecurityPolicies (PSP). At the time of this writing, we had around 3.500 Toolforge tool accounts in the system. We early adopted PSP in 2019 as a way to make sure Pods had the correct runtime configuration. We needed Pods to stay within the safe boundaries of a set of pre-defined parameters. Back when we adopted PSP there was already the option to use 3rd party agents, like OpenPolicyAgent Gatekeeper, but we decided not to invest in them, and went with a native, built-in mechanism instead. In 2021 it was announced that the PSP mechanism would be deprecated, and removed in Kubernetes 1.25. Even though we had been warned years in advance, we did not prioritize the migration of PSP until we were in Kubernetes 1.24, and blocked, unable to upgrade forward without taking actions. The WMCS team explored different alternatives for this migration, but eventually we decided to go with Kyverno as a replacement for PSP. And so with that decision it began the journey described in this blog post. First, we needed a source code refactor for one of the key components of our Toolforge Kubernetes: maintain-kubeusers. This custom piece of software that we built in-house, contains the logic to fetch accounts from LDAP and do the necessary instrumentation on Kubernetes to accommodate each one: create namespace, RBAC, quota, a kubeconfig file, etc. With the refactor, we introduced a proper reconciliation loop, in a way that the software would have a notion of what needs to be done for each account, what would be missing, what to delete, upgrade, and so on. This would allow us to easily deploy new resources for each account, or iterate on their definitions. The initial version of the refactor had a number of problems, though. For one, the new version of maintain-kubeusers was doing more filesystem interaction than the previous version, resulting in a slow reconciliation loop over all the accounts. We used NFS as the underlying storage system for Toolforge, and it could be very slow because of reasons beyond this blog post. This was corrected in the next few days after the initial refactor rollout. A side note with an implementation detail: we stored a configmap on each account namespace with the state of each resource. Storing more state on this configmap was our solution to avoid additional NFS latency. I initially estimated this refactor would take me a week to complete, but unfortunately it took me around three weeks instead. Previous to the refactor, there were several manual steps and cleanups required to be done when updating the definition of a resource. The process is now automated, more robust, performant, efficient and clean. So in my opinion it was worth it, even if it took more time than expected. Then, we worked on the Kyverno policies themselves. Because we had a very particular PSP setting, in order to ease the transition, we tried to replicate their semantics on a 1:1 basis as much as possible. This involved things like transparent mutation of Pod resources, then validation. Additionally, we had one different PSP definition for each account, so we decided to create one different Kyverno namespaced policy resource for each account namespace remember, we had 3.5k accounts. We created a Kyverno policy template that we would then render and inject for each account. For developing and testing all this, maintain-kubeusers and the Kyverno bits, we had a project called lima-kilo, which was a local Kubernetes setup replicating production Toolforge. This was used by each engineer in their laptop as a common development environment. We had planned the migration from PSP to Kyverno policies in stages, like this:
  1. update our internal template generators to make Pod security settings explicit
  2. introduce Kyverno policies in Audit mode
  3. see how the cluster would behave with them, and if we had any offending resources reported by the new policies, and correct them
  4. modify Kyverno policies and set them in Enforce mode
  5. drop PSP
In stage 1, we updated things like the toolforge-jobs-framework and tools-webservice. In stage 2, when we deployed the 3.5k Kyverno policy resources, our production cluster died almost immediately. Surprise. All the monitoring went red, the Kubernetes apiserver became irresponsibe, and we were unable to perform any administrative actions in the Kubernetes control plane, or even the underlying virtual machines. All Toolforge users were impacted. This was a full scale outage that required the energy of the whole WMCS team to recover from. We temporarily disabled Kyverno until we could learn what had occurred. This incident happened despite having tested before in lima-kilo and in another pre-production cluster we had, called Toolsbeta. But we had not tested that many policy resources. Clearly, this was something scale-related. After the incident, I went on and created 3.5k Kyverno policy resources on lima-kilo, and indeed I was able to reproduce the outage. We took a number of measures, corrected a few errors in our infrastructure, reached out to the Kyverno upstream developers, asking for advice, and at the end we did the following to accommodate the setup to our needs: I have to admit, I was briefly tempted to drop Kyverno, and even stop pursuing using an external policy agent entirely, and write our own custom admission controller out of concerns over performance of this architecture. However, after applying all the measures listed above, the system became very stable, so we decided to move forward. The second attempt at deploying it all went through just fine. No outage this time When we were in stage 4 we detected another bug. We had been following the Kubernetes upstream documentation for setting securityContext to the right values. In particular, we were enforcing the procMount to be set to the default value, which per the docs it was DefaultProcMount . However, that string is the name of the internal variable in the source code, whereas the actual default value is the string Default . This caused pods to be rightfully rejected by Kyverno while we figured the problem. I sent a patch upstream to fix this problem. We finally had everything in place, reached stage 5, and we were able to disable PSP. We unloaded the PSP controller from the kubernetes apiserver, and deleted every individual PSP definition. Everything was very smooth in this last step of the migration. This whole PSP project, including the maintain-kubeusers refactor, the outage, and all the different migration stages took roughly three months to complete. For me there are a number of valuable reasons to learn from this project. For one, the scale is something to consider, and test, when evaluating a new architecture or software component. Not doing so can lead to service outages, or unexpectedly poor performances. This is in the first chapter of the SRE handbook, but we got a reminder the hard way This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.

21 June 2024

Dirk Eddelbuettel: nanotime 0.3.9 on CRAN: Bugfix

A quick bug fix release 0.3.9 for our nanotime package is now on CRAN, following up on the 0.3.8 release made this week. nanotime relies on the RcppCCTZ package (as well as the RcppDate package for additional C++ operations) and offers efficient high(er) resolution time parsing and formatting up to nanosecond resolution, using the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. The 0.3.8 release added a accurate parameter for POSIXct conversions, and it turns out that this did not test as expected on arm64 so we disabled the test on that platform. The NEWS snippet below has the full details.

Changes in version 0.3.9 (2024-06-21)
  • Condition two tests to not run on arm64 (Dirk in #129 fixing #128)

Thanks to my CRANberries, there is a diffstat report for this release. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository and all documentation is provided at the nanotime documentation site. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

19 June 2024

Dirk Eddelbuettel: nanotime 0.3.8 on CRAN: More Maintenance

Leonardo and I are happy to annunce that a new version 0.3.8 of our nanotime package arrived on CRAN today. It is the first release in over 1 1/2 years. nanotime relies on the RcppCCTZ package (as well as the RcppDate package for additional C++ operations) and offers efficient high(er) resolution time parsing and formatting up to nanosecond resolution, using the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. This release responds to a number of enhancements including a new paramter accurate for POSIXct to nanotime conversions, a vector date converter, a switch to double return value when durations objects are dividded as well as a small battery of CRAN requests for changes and updates. This started with a move away from the now non-API function SET_S4_OBJECT which has been replaced by use of Rf_asS4. We also no longer need a custom compiler flag on Windows (where for some reasons nobody understands or remembers, bitfields are not packed) to small enhancements of manual page formatting and last-but-not-least avoidance of some new UBSAN warnings. The NEWS snippet has the full details.

Changes in version 0.3.8 (2024-06-19)
  • Time format documentation now has a reference to RcppCCTZ
  • The package no longer sets a default C++ compilation standard of C++11 (Dirk initially in #114, and later switched to C++17)
  • New accurate parameter for conversion from POSIXct to nanotime (Davor Josipovic and Leonardo in #116 closing #115)
  • The as.Date() function is now vectorized and can take a TZ argument (Leonardo and Dirk in #119 closing #118)
  • Use of internal function SET_S4_OBJECT has been replaced by API function Rf_asS4 (Leonardo in #121 closing #120)
  • An nanoduration / nanoduration expression now returns a double (Leonardo in #122 closing #117)
  • Bitfield calculations no longer require an Windows-only compiler switch (Leonardo in #124)
  • A simple manual page format nag involving has been addressed (Dirk in #126 fixing #125)
  • An set of tests tickling an UBSAN issue via Rcpp code no longer run unless CI is set (Dirk in #127 fixing #123)

Thanks to my CRANberries, there is a diffstat report for this release. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository and all documentation is provided at the nanotime documentation site. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

18 May 2024

Russell Coker: Kogan 5120*2160 40 Monitor

I ve just got a new Kogan 5120*2160 40 curved monitor. It cost $599 including shipping etc which is much cheaper than the Dell monitor with similar specs selling for about $2500. For monitors with better than 4K resolution (by which I don t mean 5K*1440) this is the cheapest option. The nearest competitors are the 27 monitors that do 5120*2880 from Apple and some companies copying Apple s specs. While 5120*2880 is a significantly better resolution than what I got it s probably not going to help me at 27 size. I ve had a Dell 32 4K monitor since the 1st of July 2022 [1]. It is a really good monitor and I had no complaints at all about it. It was clearer than the Samsung 27 4K monitor I used before it and I m not sure how much of that is due to better display technology (the Samsung was from 2017) and how much was due to larger size. But larger size was definitely a significant factor. I briefly owned a Phillips 43 4K monitor [2] and determined that a 43 flat screen was definitely too big. At the time I thought that about 35 would have been ideal but after a couple of years using a flat 32 screen I think that 32 is about the upper limit for a flat screen. This is the first curved monitor I ve used but I m already thinking that maybe 40 is too big for a 21:9 aspect ratio even with a curved screen. Maybe if it was 4:4 or even 16:9 that would be ok. Otherwise the ideal for a curved screen for me would be something between about 36 and 38 . Also 43 is awkward to move around my desk. But this is still quite close to ideal. The first system I tested this on was a work laptop, a Dell Latitude 7400 2in1. On the Dell dock that did 4K resolution and on a HDMI cable it did 1440p which was a disappointment as that laptop has talked to many 4K monitors at native resolution on the HDMI port with the same cable. This isn t an impossible problem, as I work in the IT department I can just go through all the laptops in the store room until I find one that supports it. But the 2in1 is a very nice laptop, so I might even just keep using it in 4K resolution when WFH. The laptop in question is deemed an executive laptop so I have to wait another 2 years for the executives to get new laptops before I can get a newer 2in1. On my regular desktop I had the problem of the display going off for a few seconds every minute or so and also occasionally giving a white flicker. That was using 5120*2160 with a DisplayPort switch as described in the blog post about the Dell 32 monitor. When I ran it in 4K resolution with the DisplayPort switch from my desktop it was fine. I then used the DisplayPort cable that came with the monitor directly connecting the video card to the display and it was fine at 5120*2160 with 75Hz. The monitor has the joystick thing that seems to have become some sort of standard for controlling modern monitors. It s annoying that pressing it in powers it off. I think there should be a separate button for that. Also the UI in general made me wonder if one of the vendors of expensive monitors had paid whoever designed it to make the UI suck. The monitor had a single dead pixel in the center of the screen about 1/4 the way down from the top when I started writing this post. Now it s gone away which is a concern as I don t know which pixels might have problems next or if the number of stuck pixels will increase. Also it would be good if there was a dark mode for the WordPress editor. I use dark mode wherever possible so I didn t notice the dead pixel for several hours until I started writing this blog post. I watched a movie on Netflix and it took the entire screen area, I don t know if they are storing movies in 64:27 ratio or if the clipped the top and bottom, it was probably clipped but still looked OK. The monitor has different screen modes which make it look different, I can t see much benefit to the different modes. The standard mode is what I usually use and it s brighter and the movie mode seems OK for the one movie I ve watched so far. In other news BenQ has just announced a 3840*2560 28 monitor specifically designed for programming [3]. This is the first time I ve heard of a monitor with 3:2 ratio with modern resolution, we still aren t at the 4:3 type ratio that we were used to when 640*480 was high resolution but it s a definite step in the right direction. It s also the only time I recall ever seeing a monitor advertised as being designed for programming. In the 80s there were home computers advertised as being computers for kids to program, but at that time it was either TV sets for monitors or monitors sold with computers. It was only after the IBM PC compatible market took off that having a choice of different monitors for one computer was a thing. In recent years monitors advertised as being for office use (meaning bright and expensive) have become common as are monitors designed for gamer use (meaning high refresh rate). But BenQ seems to be the first to advertise a monitor for the purpose of programming. They have a desktop partition feature (which could be software or hardware the article doesn t make it clear) to give some of the benefits of a tiled window manager to people who use OSs that don t support such things. The BenQ monitor is a bit small for my taste, I don t know if my vision is good enough to take advantage of 3840*2560 in a 28 monitor nowadays. I think at least 32 would be better. Google seems to be really into buying good monitors for their programmers, if every Google programmer got one of those BenQ monitors then that would be enough sales to make it worth-while for them. I had hoped that we would have 6K monitors become affordable this year and 8K become less expensive than most cars. Maybe that won t happen and we will instead have a wider range of products like the ultra wide monitor I just bought and the BenQ programmer s monitor. If so I don t think that will be a bad result. Now the question is whether I can use this monitor for 2 years before finding something else that makes me want to upgrade. I can afford to spend the equivalent of a bit under $1/day on monitor upgrades.

12 May 2024

Freexian Collaborators: Debian Contributions: Salsa CI updates, OpenSSH option review, and more! (by Utkarsh Gupta)

Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services. P.S. We ve completed over a year of writing these blogs. If you have any suggestions on how to make them better or what you d like us to cover, or any other opinions/reviews you might have, et al, please let us know by dropping an email to us. We d be happy to hear your thoughts. :)

Salsa CI updates & GSoC candidacy, by Santiago Ruano Rincon In the context of Google Summer of Code (GSoC), Santiago continued the mentoring work, following the applications of three of the candidates. This work started in March, but Aquila Macedo, Ahmed Siam and Piyush Raj continued in April to propose and review MRs. For example, Update CI pipeline to utilize specific blhc image per release and Remove references to buster-backports by Aquila, or the reviews the candidates made to Document the structure of the different components of the pipeline (see below). Unfortunately, the Salsa CI project didn t get any slot from the GSoC program in the end. Along with the Salsa CI related work, Santiago improved the documentation of Salsa CI, to make it easier for newcomers (as the GSoC candidates) or people willing to fork the project to understand its internals. Documentation is an aspect where a lot of improvements can be made.

OpenSSH option review, by Colin Watson In light of last month s xz-utils backdoor, Colin did an extensive review of some of the choices in Debian s OpenSSH packaging. Some work on this has already been done (removing uses of libsystemd and reducing tcp-wrappers linkage); the next step is likely to be to start work on the plan to split out GSS-API key exchange again.

Miscellaneous contributions
  • Utkarsh Gupta started to put together and kickstart the bursary team ahead of DebConf 24, to be held in Busan, South Korea.
  • Utkarsh Gupta reviewed some MRs and docs for the bursary team for the DC24 website.
  • Helmut Grohne sent patches for 19 cross build failures and submitted a gcc patch removing LIMITS_H_TEST upstream.
  • Helmut sent 8 bug reports with 3 patches related to the /usr-move.
  • Helmut diagnosed why /dev/stdout is not accessible in sbuild --mode=unshare.
  • Helmut diagnosed the time64-induced glibc FTBFS.
  • Helmut sent patches for fixing initramfs triggers on firmware removal.
  • Thorsten Alteholz uploaded foo2zjs and fixed two bugs, one related to /usr-merge. Likewise the upload of cups-filters (from the 1.x branch) fixed three bugs. In order to fix an RC bug in cpdb-backends-cups, which was updated to the 2.x branch, the new package libcupsfilters has been introduced. Last but not least an upload of hplip fixed one RC bug and an upload of gutenprint fixed two of them. All of these RC bugs were more or less related to the time_t transition.
  • Santiago continued to work in the DebConf organization tasks, including some for the DebConf 24 Content Team, and looking to build a local community for DebConf 25.
  • Stefano Rivera made a couple of uploads of dh-python to Debian, and a few other general package update uploads.
  • Stefano did some winding up of DebConf 23 finances, including closing bursary claims and recording the amounts spent on travel bursaries.
  • Stefano opened DebConf 24 registration, which always requires some last-minute work on the website.
  • Colin released man-db 2.12.1.
  • Colin fixed a regression in groff s PDF output.
  • In the Python team, Colin fixed build/autopkgtest failures in seven packages, and updated ten packages to new upstream versions.

1 May 2024

Dirk Eddelbuettel: RcppInt64 0.0.5 on CRAN: Minor Maintenance

The new-ish package RcppInt64 (announced last fall in this post, with three small updates following) arrived on CRAN yesterday as relase 0.0.5. RcppInt64 collects some of the previous conversions between 64-bit integer values in R and C++, and regroups them in a single package. It offers two interfaces: both a more standard as<>() converter from R values along with its companions wrap() to return to R, as well as more dedicated functions from and to . This release addresses an new nag from CRAN who no longer want us to use the non-API header function SET_S4_OBJECT so a small change was made. The brief NEWS entry follows:

Changes in version 0.0.5 (2024-04-30)
  • Minor refactoring of internal code to not rely on SET_S4_OBJECT.

Courtesy of my CRANberries, there is a diffstat report relative to the previous release. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Bits from Debian: Debian welcomes the 2024 GSOC contributors/students

GSoC logo We are very excited to announce that Debian has selected seven contributors to work under mentorship on a variety of projects with us during the Google Summer of Code. Here are the list of the projects, students, and details of the tasks to be performed.
Project: Android SDK Tools in Debian Deliverables of the project: Make the entire Android toolchain, Android Target Platform Framework, and SDK tools available in the Debian archives.
Project: Benchmarking Parallel Performance of Numerical MPI Packages Deliverables of the project: Deliver an automated method for Debian maintainers to test selected numerical Debian packages for their parallel performance in clusters, in particular to catch performance regressions from updates, and to verify expected performance gains, such as Amdahl s and Gufstafson s law, from increased cluster resources.
Project: Debian MobCom Deliverables of the project: Update the outdated mobile packages and recreate aged packages due to new dependencies. Bring in more mobile communication tools by adding about 5 new packages.
Project: Improve support of the Rust coreutils in Debian Deliverables of the project: Make uutils behave more like GNU s coreutils by improving compatibility with GNU coreutils test suit.
Project: Improve support of the Rust findutils in Debian Deliverables of the project: A safer and more performant implementation of the GNU suite's xargs, find, locate and updatedb tools in rust.
Project: Expanding ROCm support within Debian and derivatives Deliverables of the project: Building, packaging, and uploading missing ROCm software into Debian repositories, starting with simple tools and progressing to high-level applications like PyTorch, with the final deliverables comprising a series of ROCm packages meeting community quality assurance standards.
Project: procps: Development of System Monitoring, Statistics and Information Tools in Rust Deliverables of the project: Improve the usability of the entire Rust-based implementation of the procps utility on Linux.
Congratulations and welcome to all the contributors! The Google Summer of Code program is possible in Debian thanks to the efforts of Debian Developers and Debian Contributors that dedicate part of their free time to mentor contributors and outreach tasks. Join us and help extend Debian! You can follow the contributors' weekly reports on the debian-outreach mailing-list, chat with us on our IRC channel or reach out to the individual projects' team mailing lists.

Next.

Previous.